Ranking of Books and Other Long Documents in Search Results
A newly granted Google patent describes information retrieval and searcher interfaces for returning relevant search results for a query. It focuses on longer content, such as books, newsletters, or catalogs, appearing in search results. I have seen results from Google Books mixed in with organic results. A result from a book often treats the whole book, covering many pages, as one large document, and it may show excerpts from that book in the search results. These results are not as different from organic results as news results, local results, or knowledge-based results are. Still, it was interesting to see a patent that focuses on these larger documents in SERPs.
Having a specific process for longer documents like this suggests that content length is not, in itself, a ranking signal. Google seems to want both longer and shorter documents to show up in search results, and this patent describes a way for longer documents to show up in a meaningful way.
The patent provides this background on search engines before introducing the processes that make it work:
Modern computer networks and the Web have made information widely and easily available. Free search engines index many millions of web documents linked to the Internet. A person connected to the Web can enter a query to locate relevant web documents.
A category of content that is not widely available on the Web involves more traditional printed works of authorship, such as books and magazines.
These works are not generally available because of the difficulty of converting printed versions of those works to a digital form.
Optical character recognition (OCR), which converts optically scanned images of text into characters in a computer-readable format such as an ASCII file, is a known technique for converting printed text to a useful digital form.
OCR systems generally include an optical scanner for generating images of printed pages and software for analyzing the images.
The patent's summary breaks the invention down into the following features.
How to Return Relevant Search Results
A method to return relevant search results from a search engine may involve:
- Receiving a search query
- Identifying a document based on the search query
- Providing relevant search results based on the document
Relevant Search results may include:
- Images associated with the document
- Excerpts from the documents associated with the query
- Links to other excerpts in the document associated with the query
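The patent publishes no code, but the three steps above (receive a query, identify a document, provide a result with an image, an excerpt, and links to other excerpts) can be sketched as follows. This is a minimal illustration under stated assumptions: `Document`, `SearchResult`, `pages_with_term`, and `search` are hypothetical names, each document is modeled as a list of OCR'd page texts, and the whole query is treated as a single term to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical scanned document: a title, a cover image, and OCR'd page texts."""
    title: str
    cover_image: str
    pages: list[str]


@dataclass
class SearchResult:
    """One result: image, first matching excerpt, and page numbers of other excerpts."""
    title: str
    image: str
    excerpt: str
    other_excerpt_pages: list[int] = field(default_factory=list)


def pages_with_term(doc: Document, term: str) -> list[int]:
    """Return 1-based page numbers whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [i + 1 for i, text in enumerate(doc.pages) if term in text.lower()]


def search(query: str, repository: list[Document]) -> list[SearchResult]:
    """Receive a query, identify matching documents, and build one result per document."""
    results = []
    for doc in repository:
        hits = pages_with_term(doc, query)
        if not hits:
            continue  # this document is not identified as relevant
        first = hits[0]
        results.append(SearchResult(
            title=doc.title,
            image=doc.cover_image,
            excerpt=doc.pages[first - 1],     # the first page containing the term
            other_excerpt_pages=hits[1:],     # remaining pages become "other excerpt" links
        ))
    return results
```

A real system would rank documents and trim the excerpt to a window around the term; both of those steps are left out here.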
A GUI (graphical user interface) may include relevant search results associated with a set of documents.
The search results may be generated based on a search query.
One of the search results may include:
- An image associated with the document
- An excerpt from the document that includes a search term of the search query
- Links to other excerpts in the document that include a search term of the search query
A Graphical User Interface of Search Results
The graphical searcher interface may include:
- Links to portions of a document
- Excerpts from the document, where the excerpt may include an image of text from the document
- Descriptions of content of the document
- Information about web documents associated with the document
- Bibliographic information associated with the document
The GUI may include a page of a document, which includes:
- A search term
- A set of links to portions of the document
- A link to a next or previous page of the document that includes the search term
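A link to the next or previous page that includes the search term requires finding the nearest matching page in each direction. The following is a minimal sketch, assuming pages are stored as a list of OCR'd texts and numbered from 1; the function names are hypothetical and not taken from the patent.

```python
from typing import Optional


def next_page_with_term(pages: list[str], current: int, term: str) -> Optional[int]:
    """Return the first 1-based page after `current` containing `term`, or None."""
    term = term.lower()
    for number in range(current + 1, len(pages) + 1):
        if term in pages[number - 1].lower():
            return number
    return None


def previous_page_with_term(pages: list[str], current: int, term: str) -> Optional[int]:
    """Return the closest 1-based page before `current` containing `term`, or None."""
    term = term.lower()
    for number in range(current - 1, 0, -1):
        if term in pages[number - 1].lower():
            return number
    return None
```

Returning `None` lets the interface hide the "next" or "previous" link when there is no further matching page.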
The GUI may include a:
- First excerpt with part of the text and a thumbnail image
- Second excerpt with part of the text and a thumbnail image
The GUI may include:
- Images from a document including a search term
- Links associated with images, where the links may permit a larger view of the image
- Links to other portions of the document
The GUI may include information about:
- A page of a document
- Links to earlier accessed pages, where each link is based on the searcher having previously accessed that page
The GUI may include information about:
- Previously accessed pages associated with a set of documents
- An image associated with one of the documents
The information may be based on the searcher having accessed those pages earlier.
What Should a Searcher Interface Contain?
A computer-readable medium may include instructions for:
- Identifying a document based on the search query.
- Providing a search result based on the document
The search result may include:
- An excerpt from the document that includes a search term associated with the search query
- Links to other excerpts in the document that include a search term associated with the search query
This patent can be found at:
searcher interfaces for a document search engine
Inventors: Siraj Khaliq, Joe Sriver, Frederick G. M. Roebert, William Brougher, Adam Smith
Assignee: Google LLC
US Patent: 11,023,550
Granted: June 1, 2021
Filed: October 26, 2016
Abstract
A method includes receiving a search query, identifying a document based on the search query, and providing a search result based on the document.
The search result includes, for example, an image associated with the document, an excerpt from the document associated with the search query, and links to other excerpts in the document associated with the search query.
The method may also include providing other information associated with the document.
Returning Relevant Search Results from Larger Documents
More types of documents are becoming searchable via search engines.
This includes books, magazines, and catalogs scanned with their text recognized via OCR.
It is beneficial to present information about these and other documents in a way that is useful to searchers seeking such information. I have seen search results from sources such as books included in search results, and this patent reminds me of many of them.
“Systems and methods consistent with the principles of the invention may provide information about documents identified as relevant to search queries in a manner that is useful to the searchers who provided the search queries.”
In many ways, these seem like other organic search results, but they show information from larger documents such as books. Illustrations in this patent show search results that include excerpts from books and other larger documents relevant to search queries.
Exemplary Processing
In this patent, processing begins when a searcher enters a search term (or a group of search terms) as a search query for searching a document repository. The document repository can include documents available from the Web and from a database. A search engine searches this repository, and the searcher may provide the search query through web browser software on a client.
The search engine may use the search query to identify documents (e.g., books, magazines, newspapers, articles, catalogs, etc.) related to it.
Identifying Documents Related to a Search Query
Many techniques exist for identifying documents related to a search query. For example, one technique identifies documents that contain the search term or synonyms of the search term. In addition, when the search query includes more than one search term, a technique might identify documents that contain the search terms as a phrase, contain the search terms but not necessarily together, or contain fewer than all of the search terms.
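The matching options in that paragraph (exact phrase, all terms anywhere, fewer than all terms, and synonyms) can be sketched with a simple tokenizer. This is only an illustration, assuming whitespace-and-punctuation tokenization and a caller-supplied synonym dictionary; the `matches` function and its `mode` values are hypothetical, and the patent does not say how Google implements matching.

```python
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(text: str, query: str, mode: str = "all",
            synonyms: Optional[dict[str, set[str]]] = None) -> bool:
    """Decide whether a document's text matches the query.

    mode="phrase": the query terms appear consecutively, in order.
    mode="all":    every query term appears somewhere in the text.
    mode="any":    at least one query term appears (fewer than all is acceptable).
    `synonyms` optionally maps a query term to alternative words that also count.
    """
    synonyms = synonyms or {}
    tokens = tokenize(text)
    terms = tokenize(query)

    def variants(term: str) -> set[str]:
        return {term} | synonyms.get(term, set())

    if mode == "phrase":
        n = len(terms)
        return any(
            all(tokens[i + j] in variants(terms[j]) for j in range(n))
            for i in range(len(tokens) - n + 1)
        )
    present = set(tokens)
    hits = [bool(variants(t) & present) for t in terms]
    if mode == "all":
        return all(hits)
    if mode == "any":
        return any(hits)
    raise ValueError(f"unknown mode: {mode}")
```

For example, "Short-term memory fades quickly" matches the phrase "term memory" but not the phrase "memory term," and it matches the query "recall" only if "memory" is supplied as a synonym of "recall."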
An Information Retrieval Score May Be Generated
The documents may optionally be scored, for example with an information retrieval (IR) score. Several techniques exist for generating an IR score. For example, an IR score for a document may be based on the number of occurrences of the search terms in the document text, where the search terms occur within the document (e.g., title, content, footer, header, etc.), or characteristics of those occurrences (e.g., font, size, color, etc.).
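The patent names the factors that can feed an IR score but gives no formula or weights. The sketch below is one hypothetical way to combine them: each occurrence of a query term adds a weight that depends on where it appears, boosted if it is bold or in a larger font. The `Occurrence` class, `FIELD_WEIGHTS` values, and boost factors are all assumptions, and color is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    """One occurrence of a word in a document (hypothetical representation)."""
    term: str
    location: str          # "title", "header", "content", or "footer"
    bold: bool = False
    font_size: int = 12


# Assumed weights for illustration only; the patent gives no values.
FIELD_WEIGHTS = {"title": 3.0, "header": 2.0, "content": 1.0, "footer": 0.5}


def ir_score(occurrences: list[Occurrence], query_terms: list[str]) -> float:
    """Sum a location- and formatting-weighted count of query-term occurrences."""
    terms = {t.lower() for t in query_terms}
    score = 0.0
    for occ in occurrences:
        if occ.term.lower() not in terms:
            continue  # only occurrences of query terms count
        weight = FIELD_WEIGHTS.get(occ.location, 1.0)
        if occ.bold:
            weight *= 1.5      # emphasized text counts more
        if occ.font_size > 12:
            weight *= 1.2      # larger text counts more
        score += weight
    return score
```

With these weights, "memory" appearing once in the title, once in the body, and once in bold body text scores 3.0 + 1.0 + 1.5 = 5.5.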
Search results based on the documents, and optionally on their scores, may be presented to the searcher. The search results may include information associated with the documents, such as links to the documents ordered according to their scores. The search results may be an HTML document, like the results provided by conventional search engines, or they may be provided in another format agreed upon by the search engine and the client (e.g., Extensible Markup Language (XML) or PDF).
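To illustrate the XML option, here is a minimal sketch that orders scored documents from highest to lowest score and serializes them with Python's standard `xml.etree.ElementTree`. The `results_to_xml` function and the `results`, `result`, `title`, and `link` element names are hypothetical; the patent does not define a schema.

```python
import xml.etree.ElementTree as ET


def results_to_xml(scored_docs: list[tuple[str, str, float]]) -> str:
    """Serialize (title, url, score) tuples as XML, highest score first."""
    root = ET.Element("results")
    for title, url, score in sorted(scored_docs, key=lambda d: d[2], reverse=True):
        item = ET.SubElement(root, "result", score=f"{score:.2f}")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(root, encoding="unicode")
```

An HTML renderer would walk the same sorted list; only the output format changes.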
Searcher Interfaces for Presenting Search Results
Assume that a searcher provides a search query that includes the search term “memory,” and that a search based on the query identifies a set of documents related to it.
A search result may include a:
- Document title
- Author information
- Excerpt from the document
- Address associated with the document
- Links to other relevant excerpts in the document
- Images associated with the document
The document title may include a title associated with the document. Selecting the document title may display detailed information associated with the document, possibly in the form of a reference page or an excerpt page (both described below). The author information may include the name(s) of the author(s) of the document.
An Excerpt May Include a Part of the Document that Includes a Search Term of the Query
An excerpt may include a part of the document that includes a search term of the search query. Optionally, occurrences of the search term may be visually distinguished (e.g., highlighted) in that part of the document. An excerpt may also include a page number associated with the excerpt, and selecting the page number may present an excerpt page associated with the excerpt.
An address may include the address where the document is stored. Links may allow one or more other excerpts from the document to be shown to the searcher. An image may include an image of the front cover (or another part) of the document, if available, such as a thumbnail version of the front cover.
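An excerpt like the one described above, with the search term highlighted and a page number attached, might be built as follows. This is a sketch under assumptions: the excerpt is a fixed character window around the first match, highlighting uses HTML `<b>` tags, and the `make_excerpt` name and `window` parameter are hypothetical.

```python
import html
import re
from typing import Optional


def make_excerpt(page_text: str, page_number: int, term: str,
                 window: int = 40) -> Optional[tuple[str, int]]:
    """Return (highlighted HTML snippet, page number) around the first match, or None."""
    pattern = re.compile(f"({re.escape(term)})", re.IGNORECASE)
    match = pattern.search(page_text)
    if match is None:
        return None
    start = max(0, match.start() - window)
    end = min(len(page_text), match.end() + window)
    snippet = page_text[start:end]
    # Splitting on a pattern with one capture group puts matches at odd indices.
    parts = pattern.split(snippet)
    # Escape every piece so page text cannot inject markup, then wrap matches in <b>.
    highlighted = "".join(
        f"<b>{html.escape(part)}</b>" if i % 2 else html.escape(part)
        for i, part in enumerate(parts)
    )
    return highlighted, page_number
```

Escaping each piece separately, rather than the whole snippet after highlighting, keeps the `<b>` tags intact while still escaping any markup characters in the scanned text.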
A search result may include:
- Document title and author information
- A first excerpt from the document
- A second excerpt from the document
- Optionally, a link to other relevant excerpts in the document
- An image associated with the document
Reference Pages That May Be Presented
Assume that a searcher provided a search query that included the search term “memory,” and that a search identified a set of documents related to the search query.
A reference page may include:
- An excerpt from the document
- A synopsis of the document
- A jacket or flap description associated with the document
- Related information
- Bibliographic information
- Links to different portions of the document
An excerpt may include a part of the text from the document that includes a search term of the search query. This part of the text may correspond to an image of the document text or to a text version of it. Occurrences of the search term may be visually distinguished (e.g., highlighted) in the part of the text. In this example, the searcher can view three excerpts from the document and can select an object such as “Next” or “Previous” to view other excerpts. Other implementations may show more or fewer excerpts.
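Paging through excerpts three at a time with "Next" and "Previous" controls comes down to slicing the list of excerpts and deciding which controls to show. The sketch below assumes the excerpts are already computed and ordered; the `excerpt_window` name and `per_page` parameter are hypothetical, with `per_page=3` matching the patent's example.

```python
def excerpt_window(excerpts: list[str], page_index: int,
                   per_page: int = 3) -> tuple[list[str], bool, bool]:
    """Return the excerpts for one view, plus whether Previous and Next should be shown."""
    start = page_index * per_page
    shown = excerpts[start:start + per_page]
    has_previous = page_index > 0
    has_next = start + per_page < len(excerpts)
    return shown, has_previous, has_next
```

With seven excerpts, the first view shows three with only "Next" enabled, and the third view shows the last one with only "Previous" enabled.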
A Synopsis May Include a Brief Description of the Contents of the Document
A synopsis may include a brief description of the contents of the document. A jacket or flap description may include text from a jacket, cover, or flap associated with the document.
Related information may include information about web documents related to the document or an author associated with the document.
Related information may include:
- Information relating to web document(s) with a review of the document
- Web document(s) with a biography of the author
- Other web document(s) related to the document
- Web document(s) and image(s) related to the author
- News article(s) related to the document, the author, or product(s) related to the document
Bibliographic information may include information such as the ISBN or ISSN, the name of the publisher, a category code that identifies the topical category of the document's content, the publication date, the title, the name of an author associated with the document, and a format (e.g., hardcover, paperback, etc.) associated with the document. Bibliographic information may also include more, fewer, or different pieces of information. Links may include links to various portions of the document, such as the front cover, the table of contents, the index, and the back cover.
What Would a Reference Page Include?
The reference page may also include an image and an advertisement (ad) associated with the document. The image may include an image of, for example, a front cover (or another part) of the document (if available).
The image can include a thumbnail version of the front cover of the document. The advertisement may include a set of advertisements associated with a business that sells the document, other documents associated with the author, and documents related to this document. The advertisement may also include an advertisement associated with or derived from the search query, other (related) documents, or searcher behavior.
A reference page may also include a synopsis of the document, a jacket or flap description associated with the document, related information, bibliographic information, a set of links to different portions of the document, an image associated with the document, and an advertisement associated with the document. The reference page may also include a set of excerpts from the document. The excerpts may include portions of text from the document that include a search term of the search query. These portions of text may correspond to images of the document text or to text versions of it. Occurrences of the search term may be visually distinguished (e.g., highlighted) in the portions of text. In this implementation, three excerpts from the document may be presented.
Prior Accessed Pages
The patent tells us that it may be beneficial to give searchers easy access to pages of a document that they accessed earlier, and to pages from different documents that they accessed earlier. Either would help searchers find information of interest. Techniques exist for tracking the pages that searchers access.
An excerpt page may also include a set of links associated with prior accessed pages. For example, these may include links to individual prior accessed pages and a link to all prior accessed pages. Selecting a link to an individual page may present an excerpt page for that page, while selecting the link to all prior accessed pages may present those pages together.
A Page of Prior Accessed Pages Associated with a Document
A page of prior accessed pages associated with a document may include the document title and author information, an image associated with the document, links to different portions of the document, a set of excerpts from prior accessed pages of the document, and an advertisement for the document.
Document title and author information may include a title associated with the document and the name(s) of the author(s) of the document. The image may include an image of a front cover (or another part) of the document (if available).
The image could include a thumbnail version of the front cover of the document. Links may include links to various portions of the document. For example, the links may reference the front cover, the table of contents, an excerpt, the index, and the back cover associated with the document. The links may also reference more, fewer, or different portions of the document. The advertisement may include a set of advertisements associated with a business that sells the document, other documents associated with the author, or documents related to this document. The advertisement may also include an advertisement associated with or derived from the search query, other (related) documents, or searcher behavior.
The excerpts may include portions of text from previously accessed pages of the document. These portions may correspond to images of the document text or to text versions of it. Occurrences of a search term may be visually distinguished (e.g., highlighted) within the portions of text. Each excerpt may include an associated page number, and in one implementation, selecting the page number presents the excerpt page associated with that excerpt. The number of excerpts may be configurable based on time (e.g., all pages accessed within the last 10 hours) or number (e.g., the last 20 pages accessed). Even long documents such as books may be returned as relevant search results.
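Limiting prior accessed pages by time (such as the last 10 hours) or by count (such as the last 20 pages) can be sketched with a simple access log. This is a hypothetical illustration: `AccessHistory`, `record`, and `recent_pages` are invented names, timestamps are in seconds, and repeat visits to the same page are collapsed so that each page appears once, at its most recent access.

```python
import time
from typing import Optional


class AccessHistory:
    """Hypothetical log of (timestamp, document id, page number) accesses."""

    def __init__(self) -> None:
        self._log: list[tuple[float, str, int]] = []

    def record(self, doc_id: str, page: int, timestamp: Optional[float] = None) -> None:
        """Store one page access; defaults to the current time."""
        ts = time.time() if timestamp is None else timestamp
        self._log.append((ts, doc_id, page))

    def recent_pages(self, doc_id: Optional[str] = None,
                     max_pages: Optional[int] = None,
                     within_seconds: Optional[float] = None,
                     now: Optional[float] = None) -> list[tuple[str, int]]:
        """Return distinct (doc_id, page) pairs, most recent first.

        doc_id limits results to one document (None covers all documents);
        max_pages caps the count; within_seconds keeps only recent accesses.
        """
        now = time.time() if now is None else now
        seen: set[tuple[str, int]] = set()
        result: list[tuple[str, int]] = []
        for ts, d, p in reversed(self._log):  # newest accesses first
            if doc_id is not None and d != doc_id:
                continue
            if within_seconds is not None and now - ts > within_seconds:
                continue
            if (d, p) in seen:
                continue  # keep only the most recent access to each page
            seen.add((d, p))
            result.append((d, p))
            if max_pages is not None and len(result) >= max_pages:
                break
        return result
```

The patent's examples would correspond to `within_seconds=10 * 3600` or `max_pages=20`. Passing a `doc_id` gives the per-document view, and omitting it gives prior accessed pages across different documents.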