An Automated Assistant Enabled Vehicle

What will Google’s Automated Assistants be capable of doing tomorrow? Chances are they will be involved in running smart homes and Internet of Things devices and helping us drive vehicles. A patent was just granted to Google this week about using an automated assistant to control a vehicle. This won’t be implemented soon, but it might be something we find in the vehicles we drive in the not-too-distant future.

An Automated Assistant Controlling a Vehicle in the Future

Humans may engage in human-to-computer dialogs with interactive software applications referred to herein as an “automated assistant.”

I have written a few different posts about Google’s Automated Assistants which interact with humans in a variety of ways.

Here are some previous posts I have written about automated assistants:

I have a speaker device that acts as an automated assistant. I use it to perform some searches, listen to music, and send some search results to my phone. It doesn’t do anything as involved as helping me drive a vehicle, but this patent may be an illustration of what Google’s automated assistant will be able to do in the future.

Related Content:

Under this patent, humans may provide commands and requests to an automated assistant using spoken natural language input (such as utterances), which may in some cases get converted into text and then processed, or by providing textual (e.g., typed) natural language input.

An automated assistant can get integrated into a variety of electronic devices, including vehicles. Unlike other computers such as mobile phones, vehicles are generally in motion over a large area, and thus are more susceptible to bandwidth restrictions during communications with an outside server.


This can in part result from the vehicle moving through areas that do not provide adequate network coverage. This can affect automated assistant operations, which may involve many round trips between a vehicle computer and a remote server.

Automated assistants may have access to publicly-available data as well as user-specific data, which can get associated with a personal user account served by the automated assistant. An automated assistant serving many users may have many accounts with different data available for each account.

Commanding Automated Assistant

Thus, if one user makes a request to an automated assistant, and responding to the request involves accessing a second user account, the automated assistant may not be able to complete the request without prompting the second user to log in to their account and repeat the request.

commanding automated assistant

As a result, computational and communication resources, such as network bandwidth and channel usage time, can get consumed by an increased number of interactions between the vehicle computer and the server.

Other Users Overriding Restrictions

Implementations described herein relate to limiting vehicle automated assistant responsiveness according to restrictions that get used to determine whether certain input commands and certain users get restricted in certain vehicle contexts. Furthermore, implementations described herein allow for other users to override certain restrictions by providing authorization via an input to the vehicle computer or another computer.

Allowing other users to override such restrictions can preserve computational resources, as fewer processing resources and less network bandwidth would get consumed when a restricted user does not have to rephrase and resubmit certain inputs in a way that would make the inputs permissible.

As an example, a passenger that provides a spoken input to a vehicle automated assistant such as “Assistant, send a message to Karen,” may get denied because the passenger is not the owner of the vehicle or otherwise permitted to access contacts accessible to the vehicle automated assistant.

As a result, the vehicle automated assistant can provide a response such as “I’m sorry, you are not authorized for such commands,” and the passenger would have to rephrase and resubmit the spoken input as, for example, “Ok, Assistant, send a message to 971-555-3141.”

Such a dialog session between the passenger and the vehicle automated assistant can waste computational resources as the later spoken input would have to get converted to audio data, transmitted over a network, and processed.

In a situation where available bandwidth is limited or variable, such as in a moving vehicle, this might be particularly undesirable, since the channel over which data gets communicated from the assistant device over the network may need to get used for longer than desirable.

The length of time such a channel gets used might impact not only the operations of the automated assistant but also other software applications which rely on the network to send and receive information.

Such software applications may, for example, be present in the same device as the automated assistant (e.g. other in-vehicle software applications). However, implementations provided herein can eliminate such wasting of computational and communication resources by at least allowing other users to authorize the execution of certain input commands from a user, without requesting the user to re-submit the commands.

Restriction Of Access To Commands

A vehicle computer and an automated assistant can operate according to different restrictions for restricting access to commands and data that would otherwise be accessible via the vehicle computer and the automated assistant. A restriction can characterize particular commands, data, types of data, and any other inputs and outputs that can get associated with an automated assistant, thereby defining certain information that is available to other users via the automated assistant and the vehicle computer.

When a user provides a spoken utterance corresponding to a particular command characterized by a restriction, the automated assistant can respond according to any restriction that gets associated with the user and the particular command. As an example, when a user provides a spoken utterance that corresponds to data that originated at a computer owned by another user, the spoken utterance can satisfy a criterion for restricting access to such data.

In response to receiving the spoken utterance, the automated assistant can determine that the criterion is satisfied and await authorization from the other user. The authorization can get provided by the other user to the vehicle computer or a separate computer, via another spoken utterance or any other input capable of getting received at a computer.

A vehicle that includes the vehicle computer can include an interface, such as a button (e.g., on the steering wheel of the vehicle), that the other user can interact with (e.g., depress the button) in order to indicate authorization to the automated assistant.

In response to the automated assistant receiving authorization from the other user, the automated assistant can proceed with executing the command provided by the user, without necessarily requesting further input from the user.
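
To make that flow a little more concrete, here is a minimal sketch in Python of how an authorize-then-execute check might look. This is purely my own illustration under assumed names and data structures, not code from the patent or any Google system.

```python
# Hypothetical sketch of the authorize-then-execute flow described above.
# All names and structures here are illustrative assumptions.
from dataclasses import dataclass

RESTRICTED_COMMANDS = {"send_message", "call_contact", "read_schedule"}

@dataclass
class User:
    name: str
    is_unrestricted: bool  # e.g., the vehicle owner or driver

def authorization_received() -> bool:
    """Stand-in for a real authorization signal, such as a steering-wheel
    button press or a spoken "Ok, Assistant" from an unrestricted user."""
    return True  # assume the driver pressed the button in this example

def handle_command(command: str, user: User) -> str:
    """Run the command, pausing for authorization when a restricted user
    asks for a restricted command, so the request never has to be repeated."""
    if user.is_unrestricted or command not in RESTRICTED_COMMANDS:
        return f"Executing '{command}' for {user.name}"
    if authorization_received():
        return f"Authorized by an unrestricted user; executing '{command}'"
    return "Sorry, I need authorization to do that."

print(handle_command("send_message", User("passenger", is_unrestricted=False)))
```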

Automated Assistant Limiting Access For Passengers

The other user can limit a passenger’s access to certain data while they are both riding in the vehicle. The other user can limit access to certain data while the vehicle is navigating along a particular route or to a particular destination. Therefore, when the vehicle completes the route and arrives at the particular destination, a restriction on access to the particular data for the passenger can get released, thereby allowing the passenger to subsequently access such data.

For instance, when the other user is driving the vehicle and the passenger is riding in the vehicle, the passenger can provide a spoken utterance to an automated assistant interface of the vehicle. The spoken utterance can be, “Assistant, call Aunt Lucy.”

Automated Assistant Awaiting Authorization From The User

In response, and because the spoken utterance includes a request that will result in accessing the contact information of the user, the automated assistant can await authorization from the user before fulfilling the request. However, in order to eliminate having to repeatedly authorize or not authorize requests originating from the passenger, the user can provide another spoken utterance such as, “Assistant, do not respond to the passenger for the remainder of this trip.”

In response, the automated assistant can cause restriction data to get generated for limiting access to services (e.g., making phone calls) that would otherwise be available via the automated assistant.

In this way, the user would not have to repeatedly authorize or not authorize the automated assistant to respond to requests from the passenger, thereby eliminating the waste of computational resources and network resources. Furthermore, because the access restrictions can be set to “reset” at the end of a trip, or upon reaching a destination, the user would not have to explicitly request a reset of restrictions, thereby further eliminating the waste of computational resources and network resources.

The user can also limit a passenger’s access to certain data indefinitely, or for the operational lifetime of the vehicle.

For instance, subsequent to the passenger providing the spoken utterance, “Assistant, call Aunt Lucy,” and while the automated assistant is awaiting authorization from the user, the user can provide a separate spoken utterance such as, “Assistant, never respond to that passenger.”

Automated Assistant Causing Restriction Data To Get Generated

In response, the automated assistant can cause restriction data to get generated that indefinitely (or for the operational lifetime of the vehicle, the vehicle computer, and/or the automated assistant) limits access to services that would otherwise be available to a particular user via the automated assistant.

Depending on the occupancy of the vehicle, the automated assistant and the vehicle computer can operate according to an operating mode that limits access to the automated assistant and the vehicle computer for certain passengers. As an example, when a user is the only person occupying a vehicle, the vehicle computer and an automated assistant that is accessible via the vehicle computer can operate according to a first operating mode.

Occupancy Of Vehicle Determined Based On Output Of Sensors Or Operating Modes

The occupancy can get determined based on an output of sensors of the vehicle, the vehicle computer, or any other device that can provide an output from which occupancy can get estimated. The first operating mode can get selected based on the occupancy and can provide the user access to a first set of services, data, and commands associated with the automated assistant.

When the occupancy gets determined to include more than the user, such as when the user is driving with passengers (e.g., a parent driving with many children as passengers), a second operating mode can get selected. In accordance with the second operating mode, the user can still access the first set of services, data, and commands–however, the passengers would only be able to access the second set of services, data, and commands.

The second set can be different from the first set, and can be a reduced subset relative to the first set. For example, when only a driver (e.g., an unrestricted user) is in the vehicle, pushing the “talk” button on the head unit can cause the assistant to respond with private data without any further authorization.

However, if the “talk” button on the head unit gets pushed when a passenger (e.g., a restricted user) is in the vehicle with the driver, the automated assistant can request further authorization before responding to the person (e.g., the passenger) pressing the button.

While the second operating mode (e.g., a shared operating mode) is active, a passenger can attempt to access a service, data, or a command that is exclusively provided in the first set, and not the second set. In order to permit such access, the user (e.g., the driver) can provide inputs to the automated assistant or the vehicle computer to authorize it.

The user can provide, for example, an input to an interface such as a button or touch display panel, which can get located within reach of the driver of the vehicle (e.g., a button on a steering wheel, or a touch display panel integral to a dashboard or console). The authorizing input can get provided in response to the automated assistant soliciting authorization from the user (e.g., “Sorry, I need authorization to do that . . . [authorizing input received]”).

Alternatively, the automated assistant can bypass soliciting the user for authorization, and, rather, passively wait to respond to a request from a passenger until the user provides an authorizing input.

Alternatively, the user can elect to have their automated assistant and their vehicle computer operate according to a third operating mode.

In the third operating mode, no option to provide such authorization is available: the automated assistant and the vehicle computer operate such that the availability of certain operations, data, and services gets limited for some passengers (at least relative to a user that is a primary and “master” user with respect to the automated assistant and the vehicle computer).
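
A rough way to picture the three modes is a simple selection function keyed on occupancy. The mode names and service sets below are my own assumptions, used only to illustrate the idea of a full set, a reduced passenger set, and a mode with no override option.

```python
# Hypothetical sketch of operating-mode selection based on vehicle occupancy.
FIRST_SET = {"contacts", "calendar", "messages", "navigation", "media"}
SECOND_SET = {"navigation", "media"}  # reduced subset available to passengers

def select_operating_mode(occupant_count: int, allow_overrides: bool = True) -> dict:
    if occupant_count <= 1:
        # Driver alone: full access, no authorization prompts needed.
        return {"mode": "first", "driver": FIRST_SET, "passenger": FIRST_SET}
    if allow_overrides:
        # Shared mode: passengers get the reduced set, but the driver can
        # authorize individual requests that fall outside it.
        return {"mode": "second", "driver": FIRST_SET, "passenger": SECOND_SET}
    # Third mode: restricted services stay unavailable to passengers,
    # with no option to authorize them during the trip.
    return {"mode": "third", "driver": FIRST_SET, "passenger": SECOND_SET}

print(select_operating_mode(occupant_count=3))
```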

Automated Assistant Routines

An automated assistant can perform automated assistant routines. An automated assistant routine can correspond to a set and sequence of actions performed and initialized by the automated assistant in response to a user providing a particular input. The user can provide a spoken utterance such as, “Assistant, let’s go to work,” when the user enters their vehicle, in order to cause the automated assistant to perform a “Going to Work” routine.

The “Going to Work” routine can involve the automated assistant causing the vehicle computer to render graphical data corresponding to a daily schedule of the user and render audio data corresponding to a podcast selected by the user. It can also generate a message to a spouse of the user indicating that the user is headed to work (e.g., “Hi Billy, I’m headed to work.”). In some instances, however, a passenger of the vehicle can provide the spoken utterance, “Assistant, let’s go to work.”

Depending on the mode that the vehicle computer and the automated assistant are operating in, the automated assistant can request that the driver, or another authorized user, provide permission to perform actions of a requested routine.

The Automated Assistant “Going to Work” Routine

For example, in response to the passenger invoking the “Going to Work” routine, the automated assistant can initialize performance of the routine by rendering audio data corresponding to a particular podcast, and also prompt the driver for authorization to initialize other actions of the routine.

Specifically, the vehicle computer and server device can identify actions of the routine that involve accessing restricted data. In this instance, the vehicle computer and the server device can determine that the schedule of the user and the contacts of the user (for sending the message) are restricted data.

As a result, during the performance of the routine, the driver can get prompted one or more times to give permission to execute any actions that involve accessing restricted data.

If the driver gives authorization (e.g., via an assistant invocation task), by speaking an invocation phrase (e.g., “Ok, Assistant.”) or interacting with an interface (e.g., pressing a button), the routine can get completed. For instance, the message can get sent to the spouse and the schedule of the driver can get rendered audibly.

However, if authorization is not provided by the driver (e.g., the driver does not perform an assistant invocation task), the automated assistant can bypass the performance of such actions. When the driver does not provide authorization to complete the actions, alternative actions can get provided as options to the passenger.

For instance, instead of audibly rendering the schedule of the driver, the automated assistant can render public information about events that are occurring in the nearby geographic region.

Sending A Message

Instead of sending a message to the spouse of the driver, the automated assistant can prompt the passenger regarding whether they would like to have a message transmitted via their own account (e.g., “Would you like to login, in order to send a message?”). Restrictions on the data of the driver would get enforced while simultaneously providing assistance to a passenger who may be in the vehicle due to, for example, participation in a ride-sharing activity.
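
Here is one way the routine handling could be sketched: each action in the routine is flagged as restricted or not, and restricted actions either run (if the driver authorizes) or fall back to a public alternative. The action names and fallbacks are assumptions drawn from the example above, not the patent’s implementation.

```python
# Illustrative sketch of running a "Going to Work" routine with restricted steps.
ROUTINE = [
    {"action": "play_podcast", "restricted": False},
    {"action": "show_schedule", "restricted": True,
     "fallback": "show_public_local_events"},
    {"action": "message_spouse", "restricted": True,
     "fallback": "offer_passenger_login_to_send_message"},
]

def run_routine(routine: list[dict], driver_authorizes: bool) -> list[str]:
    performed = []
    for step in routine:
        if not step["restricted"] or driver_authorizes:
            performed.append(step["action"])
        else:
            # Restricted data stays protected; an alternative is offered instead.
            performed.append(step["fallback"])
    return performed

print(run_routine(ROUTINE, driver_authorizes=False))
# ['play_podcast', 'show_public_local_events', 'offer_passenger_login_to_send_message']
```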

The above description gets provided as an overview of some implementations of the present disclosure.

Other implementations may include a system of computers and/or robots that include processors operable to execute stored instructions to perform a method such as one of the methods described above and elsewhere herein.

This Automated Assistant-Enabled Vehicle is Described in this patent:

Modalities for authorizing access when operating an automated assistant enabled vehicle
Inventors: Vikram Aggarwal and Moises Morgenstern Gali
Assignee: GOOGLE LLC
US Patent: 11,318,955
Granted: May 3, 2022
Filed: February 28, 2019

Abstract:

Implementations relate to enabling of authorization of certain automated assistant functions via one or more modalities available within a vehicle.

Implementations can eliminate wasting of computational and communication resources by at least allowing other users to authorize execution of certain input commands from a user, without requesting the user to re-submit the commands.

The vehicle can include a computing device that provides access to restricted data, which can be accessed in order for an action to be performed by the automated assistant.

However, when a restricted user requests that the automated assistant perform an action involving accessing the restricted data, the automated assistant can be authorized or unauthorized to proceed with fulfilling the request via a modality controlled by an unrestricted user.

The unrestricted user can also cause contextual restrictions to be established for limiting functionality of the automated assistant during a trip, for certain types of requests, and/or for certain passengers.

 

Automated Assistant Enabled Vehicle Conclusion

I have only written about the summary of this patent in this post. If you want more details about how this automated assistant patent could work, click through to the patent itself. This summary provides some insight into how control over a vehicle would be established using an automated assistant.

At this time, Automated Assistants tend to be smaller devices such as smart speakers. Chances are that they will grow to do things such as power vehicles, as shown in this patent. The interface is different from the one that Google devices tend to use: it is more conversational than a desktop or laptop computer. I was reminded of Android Auto while reading this patent. I can see Google wanting to have cars controlled by something like Android Auto or the Automated Assistant.

 

 

 

An Automated Assistant Enabled Vehicle is an original blog post first published on Go Fish Digital.

Query Categorization Based On Image Results

Google was recently granted a patent on Query Categorization Based On Image Results.

The patent tells us that: “internet search engines provide information about Internet-accessible resources (such as Web pages, images, text documents, multimedia content)  responsive to a user’s search query by returning, when image searching, a set of image search results in response to the query.”

A search result includes, for example, a Uniform Resource Locator (URL) of an image or a document containing the image and a snippet of information.

Related Content:

Ranking SERPs Using a Scoring Function

The search results can be ranked (that is, placed in an order) according to scores assigned by a scoring function.

The scoring function ranks the search results according to various signals:

  • Where (and how often) query text appears in document text surrounding an image
  • An image caption or alternative text for the image
  • How common the query terms are in the documents indexed by the search engine.

In general, the subject described in this patent is in a method that includes:

  • Obtaining images from first image results for a first query, where a number of the obtained images are associated with scores and user behavior data that describe user interaction with the obtained images when the obtained images are presented as search results for the query
  • Selecting a number of the obtained images, each having respective behavior data that satisfies a threshold
  • Associating the selected first images with several annotations based on analysis of the selected images’ content

These can optionally include the following features.

The first query can be associated with categories based on the annotations. The query categorization and annotation associations can get stored for future use. Second image results responsive to a second query that is the same as or similar to the first query can then get received.

Each of the second images gets associated with a score, and those scores can get modified based on the categories related to the first query.

One of the query categorizations can state that the first query is a single-person query and increase the scores of second images whose annotations say that they contain a single face.

One query categorization can state that the first query is diverse and increase the scores of the second images, whose annotations say that the set of second images is diverse.

One of the categories can state that the first query is a text query and increase the scores of second images whose annotations say that they contain text.

The first query can get provided to a trained classifier to determine a query categorization in the categories.

Analysis of the selected first images’ content can include clustering the first image results to determine an annotation in the annotations. User behavior data can be the number of times users select the image in search results for the first query.

The subject matter described in this patent can get implemented so as to realize the following advantages:

The image result set gets analyzed to derive image annotations and a query categorization, and user interaction with image search results can get used to derive types for queries.

Query Categorization

Query categories can, in turn, improve the relevance, quality, and diversity of image search results.

Query categorization can also get used as part of query processing or in an off-line process.

Query categories can get used to provide automated query suggestions such as “show only images with faces” or “show only clip art.”


 

Query categorization based on image results
Inventors: Anna Majkowska and Cristian Tapus
Assignee: GOOGLE LLC
US Patent: 11,308,149
Granted: April 19, 2022
Filed: November 3, 2017

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for query categorization based on image results.

In one aspect, a method includes receiving images from image results responsive to a query, wherein each of the images gets associated with an order in the image results and respective user behavior data for the image as a search result for the first query, and associating the first images with a plurality of annotations based on analysis of the selected first images’ content.

A System That Uses Query Categorization To Improve The Set Of Results Returned For A Query

A client, such as a web browser or other process executing on a computing device submits an input query to a search engine, and the search engine returns image search results to the client. In some implementations, a query comprises text such as characters in a character set (e.g., “red tomato”).

A query comprises images, sounds, videos, or combinations of these. Other query types are possible. The search engine will search for results based on alternate query versions equal to, broader than, or more specific than the input query.

The image search results are an ordered or ranked list of documents or links to such, which are determined to be responsive to the input query, with the documents determined to be most relevant having the highest rank. A document can be a web page, an image, or another electronic file.

In the case of image search, the search engine determines an image’s relevance based, at least in part, on the following:

  • Image’s content
  • The text surrounding the image
  • Image caption
  • Alternative text for the image

Categories Associated With A Query

In producing the image search results, the search engine in some implementations submits a request for categories associated with the query. The search engine can use the associated categories to re-order the image search results by increasing the rank of image results determined to belong to the related categories.

In some cases, it may decrease the rank of image results that do not belong to the associated categories, or do both.

The search engine can also use the categories of the results to determine how they should get ranked in the finalized set of results, in combination with or instead of the query category.

A categorizer engine or other process employs image results retrieved for the query and a user behavior data repository to derive categories for the query. The repository contains user behavior data that indicates the number of times populations of users selected an image result for a given query.

Image selection can be accomplished in various ways, including using the keyboard, a computer mouse or a finger gesture, a voice command, or other methods. User behavior data includes “click data.”

Click Data Indicates How Long A User Views Or “Dwells” On An Image Result

Click data indicates how long a user views or “dwells” on an image result after selecting it in a results list for the query. For example, a long time dwelling on an image (such as greater than 1 minute), termed a “long click,” can indicate that a user found the image relevant to the user’s query.

A brief period of viewing an image (e.g., less than 30 seconds), termed a “short click,” can get interpreted as a lack of image relevance. Other types of user behavior data are possible.
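
A tiny sketch of that long-click / short-click distinction, using the example thresholds mentioned above (over a minute versus under 30 seconds), might look like this. The cutoff values and labels are illustrative, not prescribed by the patent.

```python
def classify_click(dwell_seconds: float) -> str:
    if dwell_seconds >= 60:
        return "long_click"    # treated as a signal the image was relevant
    if dwell_seconds < 30:
        return "short_click"   # interpreted as a lack of relevance
    return "medium_click"      # in-between dwell times carry a weaker signal

print(classify_click(75))  # long_click
print(classify_click(12))  # short_click
```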

By way of illustration, user behavior data can get generated by a process that creates a record for result documents selected by users in response to a specific query. Each record can get represented as a tuple <document, query, data> that includes:

  • A query reference indicating the query submitted by users
  • A document reference indicating a document selected by users in response to the query
  • An aggregation of click data (such as a count of each click type) for all users or a subset of all users that selected the document reference in response to the query.

Extensions of this tuple-based approach to user behavior data are possible. For instance, the user behavior data can get extended to include location-specific (such as country or state) or language-specific identifiers.

With such identifiers included, a country-specific tuple would include the country from which the user query originated, and a language-specific tuple would include the language of the user query.
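
If I were to model the tuple described above, it might look something like the record below, with optional country and language fields for the extended versions. The field names are my own; the patent only describes the tuple conceptually.

```python
# Hypothetical representation of the <document, query, data> tuple.
from dataclasses import dataclass, field

@dataclass
class ClickRecord:
    query: str                                  # the query submitted by users
    document: str                               # the result selected for that query
    clicks: dict = field(default_factory=dict)  # e.g., {"long": 40, "short": 12}
    country: str | None = None                  # optional location-specific identifier
    language: str | None = None                 # optional language-specific identifier

record = ClickRecord(
    query="red tomato",
    document="https://example.com/tomato.jpg",
    clicks={"long": 40, "short": 12},
    country="US",
    language="en",
)
print(record)
```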

For simplicity of presentation, the user behavior data associated with documents A-CCC for the query get depicted in the table as being either a “high,” “med,” or “low” amount of favorable user behavior data (such as user behavior data indicating relevance between the document and the query).

User Behavior Data For A Document

Favorable user behavior data for a document can indicate that the document is selected by users when it gets viewed in the results for the query, or that, when users view the document after choosing it from the results for the query, they view the document for an extended period (suggesting that the users find the document to be relevant to the query).

The categorizer engine works in conjunction with the search engine using returned results and user behavior data to determine query categories and then re-rank the results before they get returned to the user.

In general, for the query (such as a query or an alternate form of the query) specified in the query category request, the categorizer engine analyzes image results for the query to determine if the query belongs to categories. In some implementations, the image results analyzed are those that have been selected by users as a search result for the query a total number of times above a threshold (such as at least ten times).

In other implementations, the categorizer engine analyzes all image results retrieved by the search engine for a given query.

The categorizer engine analyzes image results for the query where a metric (e.g., the total number of selections or another measure) for the click data is above a threshold.

The image results can be analyzed using computer vision techniques in various ways, either offline or online during the scoring process. Images get annotated with information extracted from their visual content.

Image Annotations

For example, image annotations can get stored in the annotation store. Each analyzed image (e.g., image 1, image 2, etc.) gets associated with annotations (e.g., A1, A2, and so on) in an image-to-annotation association.

The annotations can include:

  • The number of faces in the image
  • The size of each face
  • The dominant colors of the image
  • Whether an image contains text or a graph
  • Whether an image is a screenshot

Additionally, each image can get annotated with a fingerprint, which can then get used to determine if two images are identical or near-identical.
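
As a toy illustration of how a fingerprint comparison might work, two bit-string fingerprints could be compared by Hamming distance and flagged as near-duplicates under a threshold. Real systems derive fingerprints from visual features; the representation and threshold here are assumptions.

```python
def hamming_distance(fp_a: str, fp_b: str) -> int:
    return sum(a != b for a, b in zip(fp_a, fp_b))

def near_duplicates(fp_a: str, fp_b: str, max_distance: int = 4) -> bool:
    return hamming_distance(fp_a, fp_b) <= max_distance

print(near_duplicates("1011001110100011", "1011001110100111"))  # True
print(near_duplicates("1011001110100011", "0100110001011100"))  # False
```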

Next, the categorizer engine analyzes image results for a given query and their annotations to determine query categories. Associations of query categories (e.g., C1, C2, and so on) for a given query (such as query 1, query 2, etc.) can be determined in many ways, such as using a simple heuristic or using an automated classifier.

A Simple Query Categorizer Based On A Heuristic

As an example, a simple query categorizer based on a heuristic can get used to determine the desired dominant color for the query (and whether there is one).

The heuristic can be, for example, that if out of the top 20 most often clicked images for the query, at least 70% have a dominant color red, then the query can get categorized as “red query.” For such queries, the search engine can re-order the retrieved results to increase the rank of all images annotated with red as a dominant color.

The same categorization can get used with all other standard colors. An advantage of this approach over analyzing the text of the query is that it works for all languages without the need for translation (it will promote images with a dominant red color for the query “red apple” in any language). It is also more robust (it will not increase the rank of red images for the query “red sea”).
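
A sketch of that heuristic, following the numbers in the example (the top 20 most-clicked images, a 70% share), might look like the function below. The data structures are my own; only the thresholds come from the text.

```python
from collections import Counter

def dominant_color_category(images: list[dict], top_n: int = 20,
                            share_threshold: float = 0.7) -> str | None:
    """images: [{"clicks": int, "dominant_color": str}, ...]"""
    top = sorted(images, key=lambda img: img["clicks"], reverse=True)[:top_n]
    if not top:
        return None
    color, count = Counter(img["dominant_color"] for img in top).most_common(1)[0]
    if count / len(top) >= share_threshold:
        return f"{color} query"
    return None

results = [{"clicks": 100 - i, "dominant_color": "red" if i < 16 else "green"}
           for i in range(20)]
print(dominant_color_category(results))  # "red query"
```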

An Example Categorizer Engine

The categorizer engine can work in an online mode or offline mode in which query category associations get stored ahead of time (e.g., in the table) for use by the search engine during query processing.

The engine receives query image results for a given query and provides the image results to image annotators. Each image annotator analyzes image results and extracts information about the visual content of the image, which gets stored as an image annotation for the image.

A Face Image Annotator

By way of illustration:

  • A face image annotator determines how many faces are in an image and the size of each face
  • A fingerprint image annotator extracts visual image features in a condensed form (a fingerprint), which can then get compared with the fingerprint of another image to determine if the two images are similar
  • A screenshot image annotator determines if an image is a screenshot
  • A text image annotator determines if an image contains text
  • A graph/chart image annotator determines if an image includes graphs or charts (e.g., bar graphs)
  • A dominant color annotator determines if an image has a dominant color

Other image annotators can also get used. For example, several image annotators get described in a paper entitled “Rapid Object Detection Using a Boosted Cascade of Simple Features,” by Viola, P.; Jones, M., Mitsubishi Electric Research Laboratories, TR2004-043 (May 2004).

 


Next, the categorizer engine analyzes image results for a given query and their annotations to determine query categories. Query categories can be determined using a classifier, and a query classifier can get realized using a machine learning system.

Use of Adaptive Boosting

By way of illustration, AdaBoost, short for Adaptive Boosting, is a machine learning system that can be used with other learning algorithms to improve their performance. AdaBoost can get used to generate a query categorization. (Other learning algorithms are possible.)

AdaBoost invokes a “weak” classifier repeatedly in a series of rounds. By way of illustration, a single-person query classifier can get based on a machine learning algorithm trained to determine whether a query calls for images of a single person.

By way of illustration, such a query classifier can get trained with data sets comprising a query, a set of feature vectors representing result images for the query with zero or more faces, and the correct categorization for the query (i.e., faces or not). For each round, the query classifier updates a distribution of weights that indicates the importance of examples in the training data set for the classification.

On each round, the weights of each incorrectly classified training example get increased (or the weights of each correctly classified training example get decreased), so the new query classifier focuses more on the misclassified examples. The resulting trained query classifier can take as input a query and output a probability that the query calls for images containing a single person.
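
For a rough feel of how such a classifier might be trained, here is a sketch using scikit-learn’s AdaBoost implementation as a stand-in learner. The features (per-query aggregates of face annotations over result images) and the tiny toy dataset are assumptions for illustration, not the patent’s training setup.

```python
from sklearn.ensemble import AdaBoostClassifier

# Each row summarizes one query's clicked image results:
# [share of images with exactly one face, average face count, share with text]
X = [
    [0.9, 1.0, 0.0],   # "barack obama"       -> single-person query
    [0.8, 1.1, 0.1],   # "albert einstein"    -> single-person query
    [0.1, 4.2, 0.0],   # "crowd at concert"   -> not single-person
    [0.0, 0.0, 0.9],   # "python cheat sheet" -> not single-person
]
y = [1, 1, 0, 0]

clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Probability that a new query calls for images of a single person.
print(clf.predict_proba([[0.85, 1.0, 0.05]]))
```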

A diverse/homogeneous query classifier takes as input a query and outputs a probability that the query calls for diverse images. The classifier uses a clustering algorithm to cluster image results according to their fingerprints, based on a measure of distance from each other. Each image gets associated with a cluster identifier.

The cluster identifiers get used to determine the number of clusters, the size of the clusters, and the similarity between clusters formed by images in the result set. This information gets used to associate with the query a probability that the query is specific (or inviting duplicates) or not.

Associating Queries With Canonical Meanings And Representations

The query categorization can also get used to associate queries with canonical meanings and representations. For example, if there is a single large cluster or several large clusters, the probability that the query is related to duplicate image results is high. If there are many smaller clusters, then the likelihood that the query is associated with duplicate image results is low.

Duplicate images are usually not very useful, as they provide no additional information, so they should get demoted as query results. But there are exceptions. For example, if there are many duplicates in the initial results (a few large clusters), the query is specific, and duplicates should not get demoted.
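
One way to picture the diversity signal is to cluster the fingerprints of a query’s result images and treat a large dominant cluster as evidence the query invites duplicates. The greedy clustering and thresholds below are my own simplifications, not the patent’s algorithm.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def cluster_fingerprints(fingerprints: list[str], max_distance: int = 3) -> list[list[str]]:
    clusters: list[list[str]] = []
    for fp in fingerprints:
        for cluster in clusters:
            if hamming(fp, cluster[0]) <= max_distance:
                cluster.append(fp)
                break
        else:
            clusters.append([fp])
    return clusters

def duplicate_probability(fingerprints: list[str]) -> float:
    # Share of results falling into the single largest cluster.
    clusters = cluster_fingerprints(fingerprints)
    return max(len(c) for c in clusters) / len(fingerprints)

fps = ["11110000", "11110001", "11100001", "00001111", "01011010", "10100101"]
print(duplicate_probability(fps))  # 0.5 -- half the results fall into one cluster
```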

A screenshot/non-screenshot query categorization takes as input a query and outputs a probability that the query calls for images that are screenshots. A text/non-text query classifier accepts as input a query and outputs a probability that the query calls for images that contain text.

A graph/non-graph query categorization takes as input a query and outputs a probability that the query calls for images that contain a graph or a chart. A color query classifier takes as input a query and outputs a probability that the query calls for images that get dominated by a single color. Other query classifiers are possible.

Improving The Relevance Of Image Results Based On Query Categorization

A searcher can interact with the system through a client or other device. For example, the client device can be a computer terminal within a local area network (LAN) or a wide area network (WAN). The client device can be a mobile device (e.g., a mobile phone, a mobile computer, a personal digital assistant, etc.) capable of communicating over a LAN, a WAN, or some other network (e.g., a cellular phone network).

The client device can include a random access memory (RAM) (or other memory and a storage device) and a processor.

The processor is structured to process instructions and data within the system. The processor can be a single-threaded or multi-threaded microprocessor having one or more processing cores. The processor is structured to execute instructions stored in the RAM (or other memory or a storage device included with the client device) to render graphical information for a user interface.

A searcher can connect to the search engine within a server system to submit an input query. The search engine is an image search engine or a generic search engine that can retrieve images and other types of content such as documents (e.g., HTML pages).

When the user submits the input query through an input device attached to a client device, a client-side query gets sent over a network and forwarded to the server system as a server-side query. The server system can be multiple server devices in one or more locations. A server device includes a memory device with the search engine loaded therein.

A processor gets structured to process instructions within the device. These instructions can implement components of the search engine. The processor can be single-threaded or multi-threaded and include many processing cores. The processor can process instructions stored in the memory related to the search engine and send information to the client device through the network to create a graphical presentation in the user interface of the client device (e.g., search results on a web page displayed in a web browser).

The server-side query gets received by the search engine. The search engine uses the information within the input query (such as query terms) to find relevant documents. The search engine can include an indexing engine that searches a corpus (e.g., web pages on the Internet) to index the documents found in that corpus. The index information for the corpus documents can be stored in an index database.

This index database can get accessed to identify documents related to the user query. Note that an electronic document (which will simply get referred to as a document) does not necessarily correspond to a file. A document can get stored in a part of a file that holds other documents, in a single file dedicated to the document in question, or in many coordinated files. Moreover, a document can get stored in memory without being stored in a file.

The search engine can include a ranking engine to rank the documents related to the input query. The documents’ ranking can get performed using traditional techniques to determine an Information Retrieval (IR) score for indexed documents given a particular query.

Any appropriate method may determine the relevance of a particular document to a specific search term or to other provided information. For example, the general level of back-links to a document containing matches for a search term may get used to infer a document’s relevance.

In particular, if a document gets linked to (e.g., is the target of a hyperlink) by many other relevant documents (such as documents containing matches for the search terms), it can get inferred that the target document is particularly relevant. This inference can get made because the authors of the pointing documents presumably point, for the most part, to other documents that are relevant to their audience.

If the pointing documents are themselves the targets of links from other relevant documents, they can be considered more relevant, and the first document can be considered particularly relevant because it is the target of relevant (or even highly relevant) documents.

Such a technique may be the sole determinant of a document’s relevance, or one of many determinants. Appropriate measures can also get taken to identify and discount attempts to cast fraudulent votes to drive up the relevance of a page.

To further improve such traditional document ranking techniques, the ranking engine can receive more signals from a rank modifier engine to assist in determining an appropriate ranking for the documents.

In conjunction with the image annotators and query categorization described above, the rank modifier engine provides relevance measures for the documents, which the ranking engine can use to improve the ranking of the search results provided to the user.

The rank modifier engine can perform operations to generate the measures of relevance.

For each image category considered, whether an image result’s score increases or decreases depends on whether the image’s visual content (as represented in its image annotations) matches the query categorization.

For example, if the query’s categorization is “single person,” then an image result that gets classified both as a “screenshot” and “single face” would first have its score decreased because of the “screenshot” category. It can then increase its score because of the “single face” category.
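
The example above suggests a simple multiplicative adjustment: each query category nudges an image’s score up when the image’s annotations match it and down when they conflict. The rules and boost factor in this sketch are assumptions, chosen only to reproduce the screenshot-then-single-face example.

```python
CATEGORY_RULES = {
    "single_person": {"boost_if": "single_face", "demote_if": "screenshot"},
    "text":          {"boost_if": "contains_text", "demote_if": None},
}

def adjust_score(ir_score: float, query_categories: list[str],
                 image_annotations: set[str], factor: float = 0.2) -> float:
    score = ir_score
    for category in query_categories:
        rule = CATEGORY_RULES.get(category)
        if rule is None:
            continue
        if rule["demote_if"] in image_annotations:
            score *= (1 - factor)   # e.g., a screenshot penalized for a people query
        if rule["boost_if"] in image_annotations:
            score *= (1 + factor)   # e.g., a single-face image boosted
    return score

print(adjust_score(1.0, ["single_person"], {"screenshot", "single_face"}))
# decreased for "screenshot", then increased for "single_face": 0.8 * 1.2 = 0.96
```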

The search engine can forward the final, ranked result list within server-side search results through the network. Exiting the network, client-side search results can get received by the client device, where the results can get stored within the RAM and used by the processor to display the results on an output device for the user.

An Information Retrieval System

These components include an:

  • Indexing engine
  • Scoring engine
  • Ranking engine
  • Rank modifier engine

The indexing engine functions as described above. The scoring engine generates scores for document results based on many features, including content-based features that link a query to document results and query-independent features that generally indicate the quality of document results.

Content-based features for images include aspects of the document that contains the image, such as query matches to the document’s title or the image’s caption.


The query-independent features include, for example, aspects of document cross-referencing for the document or its domain, or image dimensions.

Moreover, the particular functions used by the scoring engine can get tuned to adjust the various feature contributions to the final IR score, using automatic or semi-automatic processes.

The ranking engine ranks document results for display to a user based on IR scores received from the scoring engine and signals from the rank modifier engine.

The rank modifier engine provides relevance measures for the documents, which the ranking engine can use to improve the search results’ ranking provided to the user. A tracking component records user behavior information, such as individual user selections of the results presented in the order.

The tracking component can be embedded JavaScript code, included in a web page displaying the ranking, that identifies user selections of individual document results and identifies when the user returns to the results page, thus indicating the amount of time the user spent viewing the selected document result.

The tracking component can alternatively be a proxy system through which user selections of the document results get routed. The tracking component can also include pre-installed software for the client (such as a toolbar plug-in to the client’s operating system).

Other implementations are also possible, for example, one that uses a feature of a web browser that allows a tag/directive to get included in a page, which requests the browser to connect back to the server with messages about links clicked by the user.

The recorded information gets stored in result selection logs. The recorded information includes log entries that state user interaction with each result document presented for each query submitted.

For each user selection of a result document presented for a query, the log entries state the query (Q), the document (D), the user’s dwell time (T) on the document, the language (L) employed by the user, the country (C) where the user is likely located (e.g., based on the server used to access the IR system), and a region code (R) identifying the metropolitan area of the user.

The log entries also record negative information, such as that a document result gets presented to a user but was not selected.

Other information such as:

  • Positions of clicks (i.e., user selections) in the user interface
  • Information about the session (such as existence and type of previous clicks (post-click session activity))
  • IR scores of clicked results
  • IR scores of all results shown before the click
  • Titles and snippets displayed to the user before the click
  • User’s cookie
  • Cookie age
  • IP (Internet Protocol) address
  • User-agent of the browser
  • So on

The time (T) between the initial click-through to the document result and the user returning to the results page and clicking on another document result (or submitting a new search query) also gets recorded.

An assessment gets made of the time (T) as to whether it indicates a longer view of the document or a shorter one, since longer view times generally signal quality or relevance for the clicked-through result. This assessment can be made in conjunction with various weighting techniques.

The components shown can be combined in various manners and in multiple system configurations. The scoring and ranking engines can merge into a single ranking engine. The rank modifier engine and the ranking engine can also get merged. In general, a ranking engine includes any software component that generates a ranking of document results for a query. Moreover, a ranking engine can live in a client system in addition to (or rather than) in a server system.

Another example is an information retrieval system in which the server system includes an indexing engine and a scoring/ranking engine.

In this system, a client system includes:

  • A user interface for presenting a ranking
  • A tracking component
  • Result selection logs
  • A ranking/rank modifier engine.

For example, the client system can include a company’s enterprise network and personal computers, in which a browser plug-in incorporates the ranking/rank modifier engine.

When an employee in the company initiates a search on the server system, the scoring/ranking engine can return the search results along with either an initial ranking or the actual IR scores for the results. The browser plug-in then re-ranks the results based on tracked page selections for the company-specific user base.

 

A Technique For Query Categorization

This technique can be performed online (as part of query processing) or in an offline manner.

First image results responsive to the first query get received. Each of the first images gets associated with an order (such as an IR score) and a respective user behavior data (such as click data).

A number of the first images get selected where a metric for the respective behavior data for each selected image satisfies a threshold.

The selected first images get associated with several annotations based on analysis of the selected images’ content. The image annotations can get persisted in an image annotation store.

Categories are then associated with the first query based on the annotations.

The query category associations can get persisted for future use.

Second image results responsive to a second query that is the same as or similar to the first query are then received.

(If the second query is not found in the query categorization, the second query can get transformed or “rewritten” to determine if an alternate form matches a query in the query categorization.)

In this example, the second query is the same as or can be rewritten as the first query.

The second image results are re-ordered based on the query categories previously associated with the first query.

Query Categorization Based On Image Results is an original blog post first published on Go Fish Digital.

Topicality Scores, Social Scores and User-Generated Content At Google

Just What is a Topicality Score?

Topicality Scores give you an idea of what content on a webpage is about – what the topical subject of that page might be.  And they provide a way for Google to rank pages based on those topicality scores.

A recent Google patent about searching was just published and looks at Topicality Scores, Social Scores, and User Generated Content.

I have written about topicality scores at Google before. The latest post was: Topical Search Results at Google?

Search engines identify resources (e.g., images, audio, video, web pages, text, documents) relevant to a searcher’s needs and present information about the resources in a most helpful manner.

Related Content:

Search engines return search results in response to a searcher-submitted text query.

In response to an image search text query, the search engine returns a set of search results identifying resources responsive to the query.

A large number of search results can get returned for a given query.

It can get difficult for a searcher to choose the most relevant result, or one that provides advice the searcher is comfortable relying on.

A searcher may give more weight to search results associated with reviews, opinions, or other content related to the searcher’s social graph (e.g., contacts of the searcher) and other searchers.

These search results can get clouded by content associated with other searchers.  This may be when a search engine will look at Topicality Scores to better understand what those pages and the information on them are about.

Technologies For Searching

This patent describes technologies for searching, including topicality scores.

In general,  the subject matter from this patent includes:

  • Receiving a search query
  • Identifying potential search results responsive to the search query, the potential search results corresponding to digital content stored in computer-readable storage media
  • Deciding that the potential search results include user-generated content that gets generated using computer-implemented social services
  • Retrieving data associated with the searcher-generated content, the data including scores
  • Choosing, based on the scores, that the searcher-generated content is to get provided as a search result
  • Generating search results (SERPs), where the search results include web-based search results and at least a portion of the searcher-generated content
  • Transmitting the search results to a client computing device for display to the searcher


These implementations can include the following features:

Topicality Scores

  • Determining that a topicality score associated with the searcher-generated content is greater than or equal to a threshold topicality score, the topicality score being included in the scores, where determining that the searcher-generated content is to get provided as a search result occurs in response to determining that the topicality score associated with the searcher-generated content is greater than or equal to the threshold topicality score
  • The topicality score indicates the degree to which the searcher-generated content pertains to the search query
  • The topicality score can also indicate the degree to which the searcher-generated content relates to a matter of interest
  • Actions can further include determining that the searcher-generated content is recently generated content, wherein determining that the topicality score associated with the searcher-generated content is greater than or equal to the threshold topicality score occurs in response to determining that the searcher-generated content is recently generated content

Trending Search Queries

  • Deciding that the search query is a trending search query

User-Generated Content

  • Determining that the searcher-generated content is recently generated content, wherein determining that the topicality score associated with the searcher-generated content is greater than or equal to the threshold topicality score occurs in response to determining that the search query is a trending search query and determining that the searcher-generated content is recently generated content.

An Overall Score

  • Choosing that an overall score associated with the searcher-generated content is greater than or equal to an overall threshold score; the overall score gets included in the data, wherein determining that the searcher-generated content is to get provided as a search result occurs in response to determining that the overall score associated with the searcher-generated content is greater than or equal to the overall threshold score; actions further include determining that the search query is not a trending search query, wherein determining that the overall score associated with the searcher-generated content is greater than or equal to the overall threshold score occurs in response to determining that the search query is not a trending search query. The score reflects the quality of the searcher-generated content and the relevance of the searcher-generated content to the searcher
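
Pulling together the threshold logic from the bullets above (topicality thresholds for trending queries with recent content, an overall threshold otherwise), a hedged sketch of the decision might look like this. The threshold values and field names are my assumptions, not numbers from the patent.

```python
TOPICALITY_THRESHOLD = 0.6
OVERALL_THRESHOLD = 0.7

def include_user_generated_content(content: dict, query_is_trending: bool) -> bool:
    if query_is_trending and content.get("is_recent", False):
        return content["topicality_score"] >= TOPICALITY_THRESHOLD
    return content["overall_score"] >= OVERALL_THRESHOLD

post = {"topicality_score": 0.8, "overall_score": 0.5, "is_recent": True}
print(include_user_generated_content(post, query_is_trending=True))   # True
print(include_user_generated_content(post, query_is_trending=False))  # False
```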

A Digital Image

  • Picking that the searcher-generated content comprises a digital image
  • Noting that the digital image is to get displayed within an image search results portion of the search results; actions further include determining that the searcher-generated content is without text associated with the digital image, wherein determining that the digital image is to get displayed within the image search results portion of the search results occurs in response to determining that the searcher-generated content is without text associated with the digital image; the searcher-generated content includes content generated by the searcher;

User-Generated Content Generated by An Author User

  • The searcher-generated content comprises content generated by an author user; the author user is a member of a social graph of the searcher; the searcher-generated content includes at least one of: an electronic message, text provided in a chat session, a post to a social networking service, or a digital image; and the social computer-implemented services include at least one of:
  • Social networking services,
  • Electronic messaging service
  • Chat service
  • Micro-blogging service
  • Blogging service
  • Digital content sharing service.

 

This recently granted patent is at:

 

Selective presentation of content types and sources in search
Inventors: Daniel Belov, Matthew E. Kulick, Adam D. Bursey, David Yen, and Maureen Heymans
Assignee: GOOGLE LLC
US Patent 11,288,331
Granted: March 29, 2022
Filed: May 15, 2019

Abstract

Implementations of the present disclosure include actions of receiving a search query, identifying potential search results responsive to the search query, the potential search results corresponding to digital content stored in computer-readable storage media, determining that the potential search results include user-generated content that is generated using computer-implemented social services, receiving data associated with the user-generated content, the data including scores, determining, based on the scores, that the user-generated content is to be provided as a search result, generating search results, the search results including web-based search results and at least a portion of the user-generated content, and transmitting the search results to a client computing device for display to the searcher.

 

Aspects of this specification are directed to retrieving and displaying searcher-generated content in search results.

Searcher-generated content can include content that is generated using social computer-implemented services.

Social Computer-Implemented Services

Example social computer-implemented services can include:

  • Social networking service
  • Electronic messaging service
  • Chat service
  • Micro-blogging service
  • Blogging service
  • Digital content sharing service

User-Generated Content

The user-generated content can include:

Content provided in:

  • Electronic messages
  • Chat sessions
  • Posts to social networking services
  • Content posted to sharing services (e.g., photo sharing services)
  • Content posted to a blogging service.

For purposes of illustration, and by way of non-limiting example, implementations of the present disclosure will get discussed in the context of digital content generated and distributed by searchers of social networking services.

The present disclosure can be applied to other content types including, for example, electronic message content and chat content.

Search results can get generated based on a search query provided by a searcher. The search results can include publicly available content. The search results can also include searcher-generated content, which covers a range of content generated by the searcher and by other searchers. Whether and how the searcher-generated content is displayed in the search results can get determined based on the characteristics of the searcher-generated content.

Access Controlled Content

Searcher-generated content can include content that is access controlled. Access controlled content can consist of content that is associated with privacy settings such that only select users can access the content. Example access-controlled content can include content provided in electronic messages, chat sessions, and posts to social networking services. For example, an electronic message can have privacy settings.

The content of the electronic message is only accessible to the author of the electronic message and the recipients to whom the electronic message was sent. As another example, a chat session can have privacy settings such that the content of the chat session is only accessible to the participants in the chat session. As another example, a post to a social networking service can have privacy settings such that the content of the post is only accessible to the author of the post and to searchers whom the author has allowed access.

Author Users Associated With A Particular Searcher Can Get Identified Using A Social Graph

Author users associated with a particular searcher can get identified using a social graph of the searcher. A social graph can refer to a single social graph or multiple interconnected social graphs as used in this specification. Different social graphs can get generated for different types of connections a user has. For example, a user can get connected with chat contacts in one social graph, electronic message contacts in a second social graph, and connections (or contacts) from a particular social networking service in a third social graph.

Each social graph can include edges to additional individuals or entities at higher degrees of separation from the user. These contacts can, in turn, have other contacts at another degree of separation from the user. Similarly, a user’s connection to someone in a particular social network can then get used to identify additional connections based on that person’s connections. The distinct social graphs can include edges connecting social graphs to other social graphs.

Types of Connections And Social Graphs

Types of connections and social graphs can include, but are not limited to:

  • Other searchers with whom the searcher is a direct contact (e.g., searcher mail or chat contact, direct contacts on social sites)
  • Other searchers with whom the searcher is an indirect contact (e.g., friends of friends, connections of searchers that have a direct connection to the searcher)
  • Content generated by individuals (e.g., blog posts, reviews)

The social graph can include connections within a single network or across multiple networks (separable or integrated). Public social graph relationships can also be considered. In some examples, public relationships can get established through public profiles and public social networking services.

Sources of the social graph information

The searcher’s social graph is a collection of connections (such as searchers, and resources) identified as having a relationship to the searcher within a specified degree of separation. The searcher’s social graph can include people and particular content at different degrees of separation.

For example, the social graph of a searcher can include:

  • Friends,
  • Friends of friends (e.g., as defined by a searcher, social graphing site, or another metric)
  • The searcher’s social circle
  • People followed by the searcher (such as subscribed blogs, feeds, or websites)
  • Co-workers
  • Other specifically identified content of interest to the searcher (e.g., particular websites)

The diagram shows a searcher and example connections that extend a searcher’s social graph to people and content both within a system and across external networks and shown at different degrees of separation. For example, a searcher can have a:

  • Profile or contacts list that includes a set of identified friends
  • Links to external resources (e.g., web pages)
  • Subscriptions to the content of the system (e.g., a system that provides various content and applications including e-mail, chat, video, photo albums, feeds, or blogs)

Each of these groups can get connected to other searchers or resources at another degree of separation from the searcher. For example, the friends of the searcher each have their own profile that includes links to resources as well as friends of the respective friends.

The Social Graph Of The Searcher

Connections to a searcher within a specified number of degrees of separation can get considered in the social graph of the searcher. The number of degrees of separation used in determining the searcher’s social graph can get specified by the searcher. A default number of degrees of separation is used. Moreover, a dynamic number of degrees of separation can get used that is based on, for example, the type of connection.

The membership and degree of separation in the social graph can also be based on other factors, including the frequency of interaction, for example, how often the searcher visits a particular social graphing site, or the type of interaction, for example, endorsing or selecting items associated with friends. As interaction changes, the relationship of a particular contact in the social graph can also dynamically change. Thus, the social graph can be dynamic rather than static.

Social signals can get layered over the social graph (e.g., using weighted edges or other weights between connections in the social graph). These signals, for example, frequency of interaction or type of interaction between the searcher and a particular connection, can then get used to weight particular connections in the social graph or social graphs without modifying the actual social graph connections. These weights can change as the interaction with the searcher changes.

Social graphs can get stored using suitable data structures (e.g., list or matrix type data structures). Information describing any aspect of a stored social graph can get considered relationship data. For example, relationship data can include information describing how particular members of a searcher’s social graph are connected to the searcher (e.g., through what social path is a particular entity connected to the searcher).

 

Social Signals In the Social Graph

Relationship data can also include information describing any relevant social signals incorporated in the searcher’s social graph. Relationship data can get stored in a relationship lookup table (e.g., a hash table).

Suitable keys for locating values (e.g., relationship data) within the lookup table can include information describing the respective identities of both a searcher and any member of the searcher’s social graph. For example, a suitable key for locating relationship data within the lookup table can be (Searcher X, Searcher Y), where Searcher Y is a member of Searcher X’s social graph.
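To make that lookup-table idea concrete, here is a minimal Python sketch assuming a simple in-memory dictionary; the searcher names, services, degrees, and weights are hypothetical placeholders rather than details from the patent.

```python
# A minimal sketch (not from the patent) of a relationship lookup table keyed
# by (searcher, member) pairs. Names, services, degrees, and weights below are
# hypothetical placeholders used only for illustration.
from typing import Optional

relationship_table = {
    ("searcher_x", "searcher_y"): {
        "services": ["chat", "electronic_message"],  # how the two are connected
        "degrees_of_separation": 1,                  # direct contact
        "weight": 0.8,                               # layered social signal, e.g., interaction frequency
    },
    ("searcher_x", "searcher_z"): {
        "services": ["social_network"],
        "degrees_of_separation": 2,                  # friend of a friend
        "weight": 0.3,
    },
}


def relationship_data(searcher: str, member: str) -> Optional[dict]:
    """Return relationship data if the member is in the searcher's social graph."""
    return relationship_table.get((searcher, member))


print(relationship_data("searcher_x", "searcher_y"))
```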

 

social posts scores

 

 

 

Using Social Graph Information

The system identifies a searcher. The searcher can get identified, for example, based on a searcher profile associated with the system. The searcher profile can get identified, for example, when the searcher logs in to the system using a searcher name, electronic message address, or another identifier.

The system finds the searcher’s social graph. The searcher’s social graph identifies people and resources associated with the searcher, for example, in which the searcher has indicated an interest. The social graph is limited to a specified number of degrees of separation from the searcher or particular relationships or types of interaction with the searcher.

The searcher’s social graph is generated by another system and provided upon request. In some examples, the searcher’s social graph can get provided as an index that identifies each member of the searcher’s social graph and indicates the services through which the searcher and the member are connected (e.g., electronic message contacts, social networking contacts, etc.).

The Searcher’s Social Graph Is Determined Using Searcher Profile Data

To look at Topicality scores, the searcher’s social graph is determined using searcher profile data, as well as extracting information from searchers and resources identified in the searcher profile data. For example, the searcher’s profile can include a list of the searcher’s friends. The searcher’s friends can include friends within the system (e.g., using the same e-mail or chat service that is affiliated with the system) or external to the system (e.g., social graphs or a list of contacts associated with third-party applications or services providers). The searcher’s profile can also include a list of subscriptions to which the searcher belongs (e.g., identifying content that the searcher follows, for example, particular blogs or feeds).

The searcher’s profile can also include external links identified by the searcher. These links can identify particular content of interest. The searcher’s profile also identifies other aliases used by the searcher (e.g., as associated with particular content providers or social graph sources).

A searcher may have a first identity for a chat application and a second identity for a restaurant review website. These two identities can get linked together in order to unify the content associated with that searcher.

The social graph can get further expanded by extracting information from the identified people and content in the searcher’s profile. For example, public profile information can exist for identified friends from which information can get extracted (e.g., their friends, links, and subscriptions). The searcher can adjust the members of the social graph directly. For example, the searcher can group their contacts (e.g., e-mail contacts) into particular groups accessed by the system in building the searcher’s social graph.

Similarly, a searcher can prevent the system from adding members to the searcher’s social graph, for example, by an opt-out option or by keeping contacts out of the particular groups used by the system to generate the social graph. Privacy features provide a searcher with an opt-in or opt-out option to allow or prevent, respectively, being included (or remove the searcher if already included) as a member of another’s social graph. Thus, searchers can have control over what personal information or connection information, if any, is included in social graphs.

The System Can Identify Information Associated With The Searcher’s Social Graph

The system can identify information associated with the searcher’s social graph. The identified information associated with the searcher’s social graph can include, for example, content or postings to web resources subscribed to by the searcher (e.g., particular blogs and microblogs). The identified information can also include content generated by members of the searcher’s social graph. For example, members of a searcher’s social graph can generate content including local reviews (e.g., for restaurants or services), video reviews and ratings, product reviews, book reviews, blog comments, news comments, maps, public web annotations, public documents, streaming updates, photos, and photo albums.

The system can index the identified information associated with the searcher’s social graph for use in information retrieval. Identified information associated with the searcher’s social graph can get indexed by generating and incorporating suitable data structures, such as social restrictions, in an existing search index.

The system can generate social restrictions by mapping the identified information to corresponding web resources referenced in a search index and determining the social connection between the web resources and the searcher. For example, the system can access a relationship lookup table which includes relationship data describing a searcher’s social graph to determine such social connections. In some examples, social restrictions can get provided in the form of an information tag associated with a referenced web resource included in the search index.
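As a rough, speculative illustration of that mapping step, the sketch below tags entries in a toy search index with social-restriction data drawn from a relationship lookup table; the URL, identifiers, and tag format are invented for illustration and are not taken from the patent.

```python
# Hedged sketch: attach a social-restriction tag to an indexed resource when
# the relationship lookup table shows a connection between the searcher and
# the resource's author. The URL, names, and tag format are illustrative only.

search_index = {
    "https://example.com/tanzania-safari-review": {
        "terms": ["safari", "tanzania", "review"],
        "social_restrictions": [],
    },
}

relationship_lookup = {
    ("searcher_x", "jane_friend"): {"connection": "direct contact"},
}


def add_social_restriction(url: str, searcher: str, author: str) -> None:
    """Tag an indexed resource with the social connection behind it."""
    if (searcher, author) in relationship_lookup:
        search_index[url]["social_restrictions"].append(
            {"searcher": searcher, "author": author}
        )


add_social_restriction(
    "https://example.com/tanzania-safari-review", "searcher_x", "jane_friend"
)
print(search_index["https://example.com/tanzania-safari-review"]["social_restrictions"])
```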

 

Retrieving and Presenting Search Results Including Social Graph Information

The search system receives a search query from a searcher. For example, the searcher can input a search query into a search interface of a particular system. The search query includes terms and can be general or directed to particular types of resources (e.g., a web search or an image search).

The searcher can submit the search query from a client device. The client can be a computer coupled to the search system through a local area network (LAN) or wide area network (WAN), e.g., the Internet. The search system and the client device can also be a single machine; for example, a searcher can install a desktop search application on the client device. The searcher can submit the search query to a search engine within the search system.

When the searcher submits the search query, the search query is transmitted through a network to the search system. The search system can get implemented as, for example, computer programs running on computers in locations that are coupled to each other through a network.

Retrieving Search Results Relevant To The Received Query

The search system retrieves search results including search results associated with the searcher’s social graph. For example, the system can retrieve search results including content generated by members of the searcher’s social graph. The search system can include a search engine for retrieving search results relevant to the received query. The search engine can include:

  • An indexing engine that indexes resources (e.g., web documents such as web pages, images, or news articles on the Internet) found in a corpus (e.g., a collection or repository of content)
  • A search index that stores the index information
  • A resource locator for identifying resources within the search index that are responsive to the query (for example, by implementing a query text matching routine)
  • In some examples, the search engine can also include a ranking engine (or other software) to rank web resources that match the query

The indexing and ranking of the web resources can get performed using conventional or other techniques. The identified information associated with the searcher’s social graph can get included in the same index as other resources or a separate index. Consequently, a separate search can get performed for general search results responsive to the query, as well as particular search results that identify resources associated with the searcher’s social graph (e.g., endorsed web resources).

The system presents search results including search results associated with the searcher’s social graph. For example, the search system can present search results representing content generated by members of the searcher’s social graph and the searcher themself.

The search engine can transmit retrieved search results through the network to the client device for presentation to the searcher, for example, as search results on a web page to get displayed in a web browser running on the client device. The search system presents responsive search results associated with the searcher’s social graph together in a cluster, separate from any general search results. The system presents search results associated with the searcher’s social graph intermixed with any retrieved general search results.

SERPs That Includes Results Associated With The Searcher’s Social Graph

The search results page displays example search results responsive to the example query “safari in Tanzania.” In the depicted example, the displayed search results include web search results and image search results. The web search results include search results. The search results are associated with resources (e.g., web pages) that are publicly accessible on the Internet.

The search result includes searcher-generated content that is deemed to get relevant to the search query. In the example, the search result includes access-controlled content provided as a post that is distributed using a social networking service. For example, the author user “Jane Friend” generated the post and distributed the post to select searchers. In the depicted example, the distribution for the post is provided as “Limited,” indicating that only searchers selected by the author user are able to access the post.

Consequently, “Jane Friend” is a member of the searcher’s social graph and the searcher has been identified in the distribution. In some examples, the distribution can include a public distribution, such that any searcher, whether or not a contact of the author user, is able to access the post.

The image results include responsive search results associated with images that are publicly available and images that are associated with a social graph of the user. In the depicted example, the image results include publicly available images as well as an image that is posted by a member of the searcher’s social graph. For example, the image can be an image posted by “Jane Friend,” who authored the post provided as the search result.

Searcher-Generated Content In SERPs Based On A Searcher’s Social Graph

The example components include a search component, a content data source, a searcher-generated content data source, and a profile data source. In some examples, the search component can get provided as computer programs executed using computing devices (e.g., servers). In some examples, each of the data sources can get provided as computer-readable storage devices (e.g., databases).

The search component can communicate with each of the data sources via a network (e.g., a local area network (LAN) or wide area network (WAN), the Internet). The search component receives searcher input, processes the searcher input based on data provided by the data sources, and generates search results. The searcher input can get provided via a computing device (e.g., a client computing device) and the search results can get provided to the computing device for display to the searcher.

The search component can identify a searcher profile based on the searcher input and can retrieve profile data corresponding to the searcher from the profile data source. In some examples, the searcher profile data can include a contact index. The contact index can get used to identify members of the searcher’s social graph. For example, the searcher’s social graph can include the searcher’s contacts U1, . . . , Un.

The searcher input can include a search query that is received by the search component. In response to receiving the search query, the search component can process data provided by the content data source and the searcher-generated data source to generate search results. In some examples, in response to receiving the search query, the search component can retrieve the contact index corresponding to the searcher that provided the search query (e.g., based on the searcher’s log-in information).

 

Accessing The Searcher-Generated Data Source

The search component can access the searcher-generated data source to retrieve searcher-generated content that may be relevant to the search results and that the searcher is allowed to access. In some examples, the searcher-generated content can include electronic messages, chats, posts to social networking services, blog posts, and micro-blog posts.

The searcher-generated content can be content that is generated by members of the searcher’s social graph or by the searcher themselves.

The search component can receive the searcher-generated content and data associated with the searcher-generated content. The search component can determine whether particular searcher-generated content is to get provided as search results. In some examples, and as discussed in further detail herein, the search component can determine whether and how to display particular searcher-generated content as search results based on the parameters. In some examples, whether the particular searcher-generated content is to get displayed can get determined based on the search query.

By way of non-limiting example, the searcher-generated content can include a post that is posted to a social networking service. Example data associated with the post can include a timestamp, topicality scores (TS), and post scores (PS) (also referred to as an overall score).

The timestamp indicates the time that the post was distributed to the social networking service. In some examples, the timestamp indicates a time when an event occurred to the post. Example events can include a comment on the post, a re-sharing of the post, and an endorsement of the post.

 

The Topicality Score Indicates The Degree To Which The Content Pertains To The Search Query

Topicality scores can indicate the degree to which the content of the post pertains to the search query. In some examples, topicality scores can indicate the degree to which the content of the post pertains to a matter of interest. In some examples, content can pertain to a matter that is recently in the news.

For example, a matter of interest can include a natural disaster and can be a frequent topic of content distributed on the Internet within a given time period. If the content of the post relates to the natural disaster, the post may get deemed to be topical and can have associated topicality scores reflecting this.

 

The Post Score and Topicality Scores

In some examples, the post score (or overall score) reflects the quality of the post and the relevance of the post to the particular searcher. For example, the post can have a first post score associated therewith that reflects the quality of the post and the relevance of the post to a first searcher. The post can have a second post score associated therewith that reflects the quality of the post and the relevance of the post to a second searcher. The first post score and the second post score can get different from one another.

The topicality scores and the post scores are generated by a scoring service and can get provided to the searcher-generated content data store.

Whether the searcher-generated content is to get displayed in the search results can get determined based on the search query. It can get determined whether the search query provided by the searcher is a trending search query.

A Trending Search Query

A trending search query can include a search query that is frequently provided to a searching service for a given period of time. By way of non-limiting example, a first search query can get provided to the searching service X times by various searchers within the last Y days. A second search query can get provided to the searching service Z times by various searchers within the last Y days. A first frequency can get determined based on X and a second frequency can get determined based on Z.

The first frequency and the second frequency can get compared to a threshold frequency. If a frequency is greater than or equal to the threshold frequency, the associated search query can get deemed to be a trending search query. For example, suppose the first frequency is greater than or equal to the threshold frequency and the second frequency is less than the threshold frequency. Consequently, the first search query is determined to be a trending search query, and the second search query is not determined to be a trending search query.
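As a minimal sketch of that frequency test, assuming made-up counts and a made-up threshold (the patent specifies no concrete numbers):

```python
# Hedged sketch: a query is treated as "trending" when its submission
# frequency over a recent window meets a threshold. The counts, window,
# and threshold below are placeholder values, not figures from the patent.

def is_trending(query_count: int, window_days: int, threshold_per_day: float) -> bool:
    frequency = query_count / window_days
    return frequency >= threshold_per_day


# First query: seen 5,000 times in the last 7 days; second query: 50 times.
print(is_trending(5000, 7, threshold_per_day=100.0))  # True  -> trending
print(is_trending(50, 7, threshold_per_day=100.0))    # False -> not trending
```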

Searcher-generated content can get identified as a potential search result based on the relevance of the searcher-generated content to the search query. In some examples, if the identified searcher-generated content is determined to get sufficiently recent and is determined to get sufficiently topical, the searcher-generated content is displayed as a search result.

If the searcher-generated content is not deemed to get sufficiently recent or the searcher-generated content is not deemed to get sufficiently topical, it can get determined whether the search query used to identify the searcher-generated content as a potential search result is a trending query. If the search query is a trending query, if the searcher-generated content is deemed to get somewhat recent, and if the searcher-generated content is determined to get somewhat topical, the searcher-generated content is displayed as a search result.

If The Search Query Is Not A Trending Query

If the query is not a trending query, if the searcher-generated content is not deemed to get somewhat recent or if the searcher-generated content is not determined to get somewhat topical, and if the post score of the searcher-generated content is greater than or equal to a threshold post score, the searcher-generated content is displayed as a search result.

If the search query is not a trending query, if the searcher-generated content is not deemed to get somewhat recent or if the searcher-generated content is not determined to get somewhat topical, and if the post score of the searcher-generated content is less than a threshold post score, the searcher-generated content is not displayed as a search result.

In some examples, whether searcher-generated content is sufficiently recent can get determined based on a current time (t.sub.CURR), the timestamp of the searcher-generated content (t.sub.POST), and a first threshold (t.sub.THR1).

The current time is provided as the time at which the search query is submitted by the searcher. In some examples, a time difference (t.sub.DIFF) can get determined as a difference between the current time and the timestamp of the searcher-generated content. If the time difference is less than the first threshold, the searcher-generated content can get determined to get sufficiently recent.

Whether searcher-generated content is somewhat recent can get determined based on the current time, the timestamp of the searcher-generated content, and a second threshold (t.sub.THR2). In some examples, if the time difference is less than the second threshold, the searcher-generated content can get determined to get somewhat recent. In some examples, the first threshold is less than the second threshold.

Whether Searcher-Generated Content Has Sufficient Topicality Scores

Whether searcher-generated content is sufficiently topical can get determined based on a topicality score of the searcher-generated content (TS.sub.POST) and a first topicality score threshold (TS.sub.THR1). If the topicality score of the searcher-generated content is greater than or equal to the first topicality score threshold, the searcher-generated content can get determined to get sufficiently topical.

Whether searcher-generated content is somewhat topical can get determined based on topicality scores of the searcher-generated content and a second topicality score threshold (TS.sub.THR2). If the topicality scores of the searcher-generated content are greater than or equal to the second topicality score threshold, the searcher-generated content can get determined to get somewhat topical. In some examples, the first topicality score threshold is greater than the second topicality score threshold.
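Putting the recency checks, topicality checks, trending test, and post-score fallback together, the sketch below is one plausible reading of the display decision described in this section; every threshold value is an invented placeholder, since the patent gives none.

```python
# Hedged sketch of the display decision. Threshold values are illustrative
# placeholders; the relationships (t_THR1 < t_THR2, TS_THR1 > TS_THR2) follow
# the description above.

T_THR1 = 60 * 60          # "sufficiently recent": within 1 hour (placeholder)
T_THR2 = 24 * 60 * 60     # "somewhat recent": within 1 day (placeholder)
TS_THR1 = 0.8             # "sufficiently topical" (placeholder)
TS_THR2 = 0.5             # "somewhat topical" (placeholder)
PS_THR = 0.7              # threshold post (overall) score (placeholder)


def should_display(t_curr, t_post, topicality_score, post_score, query_is_trending):
    t_diff = t_curr - t_post
    sufficiently_recent = t_diff < T_THR1
    somewhat_recent = t_diff < T_THR2
    sufficiently_topical = topicality_score >= TS_THR1
    somewhat_topical = topicality_score >= TS_THR2

    if sufficiently_recent and sufficiently_topical:
        return True
    if query_is_trending and somewhat_recent and somewhat_topical:
        return True
    # Otherwise, fall back to the post (overall) score threshold.
    return post_score >= PS_THR


# A stale, mildly topical post on a non-trending query is shown only if its
# overall post score clears the threshold.
print(should_display(t_curr=100_000, t_post=0, topicality_score=0.6,
                     post_score=0.9, query_is_trending=False))  # True
```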

If it is determined that the searcher-generated content is to get displayed in the search results, how and where the searcher-generated content is displayed can get determined. In some examples, the searcher-generated content can get displayed at the bottom of a search results page. In some examples, the searcher-generated content can get displayed within other search results (e.g., in the middle of a search results page).

By way of non-limiting example, if the time difference, discussed above, is less than a third threshold (t.sub.THR3) and the topicality score is greater than or equal to a third threshold topicality score (TS.sub.THR3), the searcher-generated content is provided within other search results (e.g., in the middle or towards the top of a search results page).

In some examples, the first threshold is equal to the third threshold. In some examples, the first topicality score threshold is equal to the third topicality score threshold. It can get determined that the searcher-generated content of the search result is associated with a time difference that is less than the third threshold and a topicality score that is greater than or equal to the third topicality score threshold.

Consequently, the searcher-generated content of the search result is displayed in line with the other search results.

 

Searcher-Generated Content That Includes An Image

Searcher-generated content that includes an image can get analyzed to determine where to display the searcher-generated content within the search results. If the searcher-generated content includes a single image and text, the searcher-generated content can get displayed as a web search result. If the searcher-generated content includes images without text, the image can get displayed within the image search results.

The image can be an image that was provided in a post that was distributed using a social networking service and that did not include text. Consequently, the image is displayed in the image search results instead of the underlying post getting displayed as a search result in and of itself. If the searcher-generated content includes a plurality of images with text, the searcher-generated content can get displayed as a web search result while the images can get displayed as image search results.
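A short sketch of that placement rule, assuming a simplified content record with an image count and a text flag:

```python
# Hedged sketch of where image-bearing searcher-generated content is shown,
# following the rule described above. The content representation is a
# simplified assumption, not a structure defined by the patent.

def place_image_content(num_images: int, has_text: bool) -> str:
    if num_images == 0:
        return "web_results"
    if not has_text:
        return "image_results"                  # image(s) with no accompanying text
    if num_images == 1:
        return "web_results"                    # single image plus text
    return "web_results_and_image_results"      # several images plus text


print(place_image_content(num_images=1, has_text=False))  # image_results
print(place_image_content(num_images=3, has_text=True))   # web_results_and_image_results
```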

An Account With The Searcher’s Confidential Or Non-Public Searcher-Generated Content

A searcher may provide permission (e.g., to a search engine) to access an account containing the searcher’s confidential or non-public searcher-generated content. The searcher may give a search engine permission to access an electronic messaging account, a calendar, a cloud drive, and so forth. The search engine may:

  • Index messages or other content in the account
  • Retrieve messages or other content that match a search query
  • Present these messages, or portions thereof, in search results

If an input search query does not specifically request electronic messaging content (e.g., if the query were to read “biking in Tahoe” only), the search engine may still make confidential or non-public search content available to the searcher. The search results for such a query (e.g., “biking in Tahoe”) can include an option to identify the type of searcher-generated content that is available. For example, the option can specify electronic messages.

Additional options may also get made available to provide relevant content, e.g., from a searcher’s online calendar, cloud drive, and so forth.

Selecting a corresponding option displays the corresponding content. For example, selecting the option to view electronic messages may cause the display of portions of electronic messages. Selecting a displayed electronic message may direct the searcher to their messaging account to view the entire contents of that message. The same may get true for other types of content, such as calendar content and cloud drive documents.

Processes Involving Topicality Scores From The Present

For convenience, the topicality scores process will get described using a system including computing devices that performs the process.

  • The ID of the searcher is determined
  • And the ID of the searcher can get determined based on searcher log-in information (e.g., searcher name and password)
  • A contact index corresponding to the searcher ID is retrieved
  • A search query is received
  • Whether the search query is a trending search query gets determined
  • If the search query is a trending search query, a trending search query indicator is set

Whether Search Results Include Searcher-Generated Content

Search results are generated and are received. It is determined whether the search results include searcher-generated content. In the example context, it is determined whether the search results include digital content (e.g., posts) distributed by contacts of the searcher within a computer-implemented social networking service. If the search results do not include searcher-generated content, the search results are displayed.

If the SERPs include searcher-generated content, it is determined whether the searcher-generated content is to get displayed in the search results. In the example context, it is determined whether digital content (e.g., posts) distributed by contacts of the searcher within the computer-implemented social networking service is to get displayed.

If the searcher-generated content is not to get displayed, the searcher-generated content is removed from the search results and the search results are displayed. If it is determined that the searcher-generated content is to get displayed, the searcher-generated content is blended with the other search results and the search results are displayed.

 

 

 

 

Topicality Scores, Social Scores and User-Generated Content At Google is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/topicality-scores-social-scores-and-user-generated-content/feed/ 0
Processing and Editing Natural Language Queries https://gofishdigital.com/blog/processing-and-editing-natural-language-queries/ https://gofishdigital.com/blog/processing-and-editing-natural-language-queries/#respond Mon, 11 Apr 2022 15:10:20 +0000 https://gofishdigital.com/?p=5148 A newly granted Google patent has come out about processing and editing natural language queries. As seen from Google, patents assigned to the search engine provide us with insights about processes developed at Mountain View, Ca. This one lets us look at how Google is working on editing natural language queries. Last May, I wrote […]

Processing and Editing Natural Language Queries is an original blog post first published on Go Fish Digital.

]]>
A newly granted Google patent has come out about processing and editing natural language queries.

As seen from Google, patents assigned to the search engine provide us with insights about processes developed at Mountain View, Ca. This one lets us look at how Google is working on editing natural language queries.

Last May, I wrote about another Google Patent in a post worth revisiting called: Natural Language Query Responses.

Related Content:

Computing systems capable of editing natural language queries interpret the user’s statement and immediately take some action, such as performing a search or generating an item.

But, if the machine interpretation of the user’s statement is off by a single word or a slight nuance, the interpretation of the statement can be completely wrong, useless, and even detrimental. To remedy this, existing systems require the user to repeat the entire statement, possibly varying a few words, to achieve the desired result.

 

 

Systems and methods get disclosed herein for editing natural language queries. A receiver circuitry receives the natural language query from a user. A natural language interpreter circuitry parses the natural language query to convert the natural language query into several categories and several variables. Each variable in the number of variables corresponds to one category in the number of categories.

A user interface displays the number of categories and the number of variables. It allows the user to change at least one variable in the number of variables by providing a natural language utterance.

Processing Natural Language Queries

editing natural language queries

Another aspect relates to a system including means for editing natural language queries. Receiving means receiving the natural language query from a user.

Natural language interpreting means parsing the natural language query to convert the natural language query into several categories and several variables. Each variable in the number of variables corresponds to one category in the number of categories.

Interfacing means displaying the number of categories and the number of variables and allowing the user to change at least one variable in the number of variables by providing a natural language utterance.

The natural language query is a request to display a list of files on a web-based storage system. The number of categories may include at least two of:

  • File type
  • File owner
  • Time
  • Location

The method may further include means for filtering a number of user files on the web-based storage system based on the number of categories and the number of variables.

The user modifies at least one variable by selecting at least one variable and speaking a phrase to replace at least one variable. The system may further comprise means for allowing the user to change the natural language query by saying a word to add one or more categories and variables to the natural language query.

The system may further comprise means for determining whether to update the natural language query or generate a new query based on many categories and variables in the natural language utterance provided by the user.

Allowing the user to change at least one variable may constitute a modification to the natural language query, and the means for allowing the user to modify at least one variable may further enable the user to undo the change to return to the original natural language query.
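The sketch below models a parsed query as category-to-variable pairs with a simple modify-and-undo history. The category names follow the patent's examples (file type, file owner, time, location); the values and the class design are assumptions made for illustration.

```python
# Hedged sketch of a parsed natural language query: each category holds one
# variable, a spoken phrase can replace a variable, and the change can be
# undone. Values shown are made-up examples.

class ParsedQuery:
    def __init__(self, variables: dict):
        self.variables = dict(variables)   # e.g. {"file_type": "spreadsheet", ...}
        self._history = []

    def modify(self, category: str, new_value: str) -> None:
        """Replace one variable, e.g., after the user speaks a new phrase."""
        self._history.append(dict(self.variables))
        self.variables[category] = new_value

    def undo(self) -> None:
        """Return to the query as it was before the last modification."""
        if self._history:
            self.variables = self._history.pop()


query = ParsedQuery({"file_type": "spreadsheet", "file_owner": "Alice",
                     "time": "last week", "location": "shared drive"})
query.modify("file_owner", "Bob")     # "show me Bob's spreadsheets instead"
query.undo()                          # back to Alice's spreadsheets
print(query.variables["file_owner"])  # Alice
```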

Filtering Lists of Natural Language Queries

filtering lists of natural language Queries

The system includes filtering a list of items based on categories and variables, providing the filtered list of items to the user, and flagging items in the filtered list in response to receiving a user request to flag the items.

In response to receiving a user input indicative of a request for a machine-generated natural language query that would result in the number of categories and the number of variables, the interfacing means may further provide the machine-generated natural language query to the user.

Editing Natural Language Queries

A system for editing natural language queries – in particular, a device gets described that allows for efficient processing and editing of queries in a natural language format. But, it will get understood by one of ordinary skill in the art that the systems and methods described herein may get adapted and modified as is appropriate for the application getting addressed, that the systems and methods described herein may get employed in other suitable applications, and that such other additions and modifications will not depart from the scope thereof.

Generally, the computerized systems described herein may comprise one or more engines, which include a processing device or devices, such as a computer, microprocessor, logic device, or other device or processor that gets configured with hardware, firmware, and software to carry out one or more of the computerized methods described herein.

The patent provides for editing and processing queries in a natural language format. The device described is easy to use and allows a user to give instructions to a device for displaying and organizing documents. The systems and methods described overcome many technical difficulties associated with existing natural language interpreters and get described in the context of a web-based storage system, which may communicate with other systems over a network to store and share user data.

In general, one of ordinary skill in the art will understand that the systems and methods described herein apply to systems that get interconnected without departing from the scope thereof.

 

Systems and methods for editing and replaying natural language queries
Inventors: Robert Brett Rose, Gregory Brandon Owen, and Keith Charles Bottner
Assignee: Google LLC
US Patent: 11,288,321
Granted: March 29, 2022
Filed: May 30, 2019

Abstract

A sign of a first natural language utterance identifying a user request gets received. A natural language query gets generated based on the first natural language utterance. The natural language query comprises

(i) a plurality of categories, and
(ii) a plurality of variables.

A sign of a second natural language utterance identifying a modification to the user request gets received. Whether to change the natural language query or to generate a new natural language query based on the second natural language utterance gets determined.

Responsive to determining that the natural language query is to get modified based on the second natural language utterance, at least one of the plurality of variables of the plurality of categories of the natural language query gets modified to correspond to the second natural language utterance. A response to the user request gets provided based on the modified natural language query.

 

Editing Natural Language Queries Conclusion

The earlier patent on natural language processing goes into more detail on grammar rules. This one clarifies that the Google search engine uses natural language processing in the communication between it and humans using it to search. I’ve summarized the summary of the patent, skipping past the legal analysis behind the patent filing. I have also been writing about human-to-computer dialog patents from Google, which also focus on how people and computers interact with each other. This is a movement towards better input and question answering with the search engine.

 

 

 

Processing and Editing Natural Language Queries is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/processing-and-editing-natural-language-queries/feed/ 0
Document Processing Using Structured Key-Value Pairs https://gofishdigital.com/blog/document-processing-using-structured-key-value-pairs/ https://gofishdigital.com/blog/document-processing-using-structured-key-value-pairs/#respond Thu, 31 Mar 2022 13:04:21 +0000 https://gofishdigital.com/?p=5124 Why Key-Value Pairs in this document processing system? Writing this post reminded me of a 2007 post I wrote about local search and structured data where Key-value pairs were an important aspect of that 2007 patent.  The post was: Structured Information in Google’s Local Search. t struck me as interesting to see Google writing about […]

Document Processing Using Structured Key-Value Pairs is an original blog post first published on Go Fish Digital.

]]>
Why Key-Value Pairs in this document processing system?

Writing this post reminded me of a 2007 post I wrote about local search and structured data where Key-value pairs were an important aspect of that 2007 patent.  The post was:

Structured Information in Google’s Local Search.

It struck me as interesting to see Google writing about extracting key-value pairs in a document processing system like the one here, with a machine learning approach at its heart, getting into technical SEO.

Related Content:

The use of key-value pairs is still important now, 15 years later.

 

Document Processing At Google

document processing with key value pairs

Understanding documents (e.g., invoices, pay stubs, sales receipts, and the like) is a crucial business need. A large fraction (e.g., 90% or more) of enterprise data gets stored and represented in unstructured documents. Extracting structured data from such records can be expensive, time-consuming, and error-prone.

This patent describes a document processing parsing system and a method implemented as computer programs on computers in locations that convert unstructured documents to structured key-value pairs.

The parsing system gets configured to process a document to identify “key” textual data and corresponding “value” textual data in the document. The key defines a label that characterizes (i.e., is descriptive of) a corresponding value.

For example, the key “Date” may correspond to the value “2-23-2019”.

There is a method performed by data processing apparatus, which provides an image of a document to a detection model, wherein: the detection model gets configured to process the image by values of a plurality of detection model parameters to generate an output that defines bounding boxes generated for the image.

Each bounding box generated for the image gets predicted to enclose a key-value pair comprising key textual data and value textual data, wherein the key textual data defines a label that characterizes the value textual data.

For each of the bounding boxes generated for the image, the method: identifies textual information enclosed by the bounding box using an optical character recognition technique; determines whether the textual data enclosed by the bounding box defines a key-value pair; and, in response to determining that the textual data enclosed by the bounding box represents a key-value pair, provides the key-value pair for use in characterizing the document.
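Read as a pipeline, the claim describes three steps: a detection model proposes boxes, OCR reads the text in each box, and a filter keeps only the boxes whose text forms a key-value pair. The sketch below is a hypothetical rendering of that flow; `detect_boxes` and `run_ocr` are stand-in callables, and the colon test is a toy simplification of the patent's filtering step.

```python
# Hedged sketch of the detection -> OCR -> filtering flow. The detection model
# and OCR engine are represented by stand-in callables; the key-value test is
# a deliberate toy simplification.
from dataclasses import dataclass


@dataclass
class Box:
    x: int
    y: int
    width: int
    height: int


def parse_document(image, detect_boxes, run_ocr) -> dict:
    """Return key-value pairs extracted from a document image."""
    pairs = {}
    for box in detect_boxes(image):        # boxes predicted to enclose one pair each
        text = run_ocr(image, box)         # e.g., "Date: 2-23-2019"
        if ":" in text:                    # toy check standing in for the real filter
            key, value = text.split(":", 1)
            pairs[key.strip()] = value.strip()
    return pairs


# Toy demo with stub components.
boxes = [Box(10, 10, 200, 30), Box(10, 50, 200, 30)]
texts = {id(boxes[0]): "Date: 2-23-2019", id(boxes[1]): "Thank you!"}
print(parse_document(
    image=None,
    detect_boxes=lambda img: boxes,
    run_ocr=lambda img, box: texts[id(box)],
))  # {'Date': '2-23-2019'}
```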

The detection model is a neural network model.

The neural network model comprises a convolutional neural network.

The neural network model gets trained on a set of training examples. Each training example comprises a training input and a target output; the training input includes a training image of a training document. The target output contains data defining bounding boxes in the training image enclosing a respective key-value pair.

The document is an invoice.

document processing - customer invoice

Providing an image of a document to a detection model comprises: identifying a particular class of the document; and providing the image of the document to a detection model that gets trained to process documents of the particular class.

  • Determining whether the textual data enclosed by the bounding box defines a key-value pair comprises:
  • Deciding that the textual information enclosed by the bounding box includes a key from a predetermined set of valid keys;
  • Finding a type of a part of the textual data enclosed by the bounding box that does not include the key; identifying a set of valid types for values corresponding to the key
  • Choosing that the type of the part of the textual data enclosed by the bounding box that does not include the key gets included in the set of valid types for values corresponding to the key.
  • Identifying a set of valid types for values corresponding to the key comprises mapping the key to the set of valid types for values corresponding to the key using a predetermined mapping.

The set of valid keys and the mapping from keys to corresponding sets of valid types for values corresponding to the keys get provided by a user.

The bounding boxes have a rectangular shape.

The method further comprises: receiving the document from a user; and converting the document to an image, wherein the image depicts the document.

A method performed by the document processing system, the method comprising:

  • Providing an image of a document to a detection model configured to process the image to identify in the image bounding boxes predicted to enclose a key-value pair comprising key textual data and value textual data, wherein the key defines a label that characterizes a value corresponding to the key; for each of the bounding boxes generated for the image,
  • Identifying textual data enclosed by the bounding box using an optical character recognition technique and determining whether the textual information held by the bounding box defines a key-value pair
  • Outputting the key-value pair for use in characterizing the document.

The detection model is a machine learning model with parameters that can be trained on a training data set.

The machine learning model comprises a neural network model, particularly a convolutional neural network.

The machine learning model gets trained on a set of training examples, and each training example has a training input and a target output.

 

The training input comprises a training image of a training document. The target output includes data defining bounding boxes in the training image that each enclose a respective key-value pair.

The document is an invoice.

Providing an image of a document to a detection model comprises: identifying a particular class of the document; and providing the image of the document to a detection model that gets trained to process documents of the particular class.

Is It a Key-Value Pair?

Determining whether the textual data enclosed by the bounding box defines a key-value pair means:

  • Deciding that the textual information enclosed by the bounding box includes a key from a predetermined set of valid keys
  • Finding a type of a part of the textual data enclosed by the bounding box that does not include the key
  • Noting a set of valid types for values corresponding to the key
  • Picking that the type of the part of the textual data enclosed by the bounding box that does not include the key gets included in the set of valid types for values corresponding to the key.

Identifying a set of valid types for values corresponding to the key comprises: mapping the key to the set of valid types for values corresponding to the key using a predetermined mapping.

The set of valid keys and the mapping from keys to corresponding sets of valid types for values corresponding to the keys get provided by a user.
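A hedged sketch of that filtering logic follows. The valid keys, the type-inference rules, and the key-to-type mapping are invented examples of the user-provided data the claim refers to, not values from the patent.

```python
# Hedged sketch: text in a box defines a key-value pair only if it contains a
# key from a predetermined set of valid keys and the rest of the text has a
# type allowed for that key. Keys, types, and regexes are illustrative only.
import re

VALID_KEYS = {"Date", "Total", "Name"}

# Predetermined mapping from each key to the set of valid value types.
VALUE_TYPES_FOR_KEY = {
    "Date": {"date"},
    "Total": {"currency", "number"},
    "Name": {"text"},
}


def infer_type(value: str) -> str:
    if re.fullmatch(r"\d{1,2}-\d{1,2}-\d{4}", value):
        return "date"
    if re.fullmatch(r"\$?\d+(\.\d{2})?", value):
        return "currency"
    return "text"


def is_key_value_pair(text: str) -> bool:
    if ":" not in text:
        return False
    key, value = (part.strip() for part in text.split(":", 1))
    if key not in VALID_KEYS:
        return False
    return infer_type(value) in VALUE_TYPES_FOR_KEY[key]


print(is_key_value_pair("Date: 2-23-2019"))  # True
print(is_key_value_pair("Date: hello"))      # False -- value has the wrong type
```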

The bounding boxes have a rectangular shape.

The method further comprises: receiving the document from a user; and converting the document to an image, wherein the image depicts the document.

According to another aspect, there is a system comprising:  computers; and storage devices coupled to computers, wherein the storage devices store instructions that, when executed by computers, cause computers to perform operations comprising the operations of the earlier described method.

Advantages Of This Document Processing Approach

document processing-flowchart

The system described in this specification can get used to convert large numbers of unstructured documents into structured key-value pairs. Thus, the system obviates the need for manually extracting structured data from unstructured documents, which can be expensive, time-consuming, and error-prone.

The system described in this specification can identify key-value pairs in documents with a high level of accuracy (e.g., for some types of documents, with greater than 99% accuracy). Thus, the system may be suitable for deployment in applications (e.g., processing financial documents) that need a high level of accuracy.

The system described in this specification can generalize better than some conventional systems, i.e., it has improved generalization capabilities compared to some traditional methods.

In particular, by leveraging a machine-learned detection model trained to recognize visual signals that distinguish key-value pairs in documents, the system can identify key-value pairs regardless of the specific style, structure, or content of the documents.

The Identifying Key-Value Pairs in Document Processing Patent

Identifying key-value pairs in documents
Inventors: Yang Xu, Jiang Wang, and Shengyang Dai
Assignee: Google LLC
US Patent: 11,288,719
Granted: March 29, 2022
Filed: February 27, 2020

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for converting unstructured documents to structured key-value pairs.

In one aspect, a method comprises: providing an image of a document to a detection model, wherein: the detection model gets configured to process the image to generate an output that defines bounding boxes generated for the image; and each bounding box generated for the image gets predicted to enclose a key-value pair comprising key textual data and value textual data, wherein the key textual data defines a label that characterizes the value textual data, and for each of the bounding boxes generated for the image: identifying textual data enclosed by the bounding box using an optical character recognition technique, and determining whether the textual data enclosed by the bounding box defines a key-value pair.

An Example Parsing System

The parsing system is an example of a method implemented as computer programs on computers in locations where the systems, components, and techniques described below get implemented.

The parsing system gets configured to process a document (e.g., an invoice, pay stub, or sale receipt) to identify key-value pairs in the document. A “key-value pair” refers to a key and a corresponding value, generally textual data. “Textual data” should get understood to refer to at least: alphabetical characters, numbers, and special symbols. As described earlier, a key defines a label that characterizes a corresponding value.

The system may receive the document in a variety of ways.

For example, the system can receive the document as an upload from a remote system user over a data communication network (e.g., using an application programming interface (API) made available by the system). The document can get represented in any appropriate unstructured data format, for example, as a Portable Document Format (PDF) document or as an image document (e.g., a Portable Network Graphics (PNG) or Joint Photographic Experts Group (JPEG) document).

Identify Key-Value Pairs In Document Processing

The system uses a detection model, an optical character recognition (OCR) engine, and a filtering engine to identify key-value pairs in document processing.

The detection model gets configured to process an image of the document to generate an output that defines bounding boxes in the image. Each gets predicted to enclose textual data representing a respective key-value pair. That is, each bounding box gets expected to have textual information that defines:

(i) a key, and
(ii) a value corresponding to the key. For example, a bounding box may enclose the textual data "Name: John Smith," which defines the key "Name" and the corresponding value "John Smith." The detection model may be configured to generate bounding boxes that enclose a single key-value pair (i.e., rather than multiple key-value pairs).
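To make the flow concrete, here is a minimal sketch (not Google's implementation) of the three-stage flow the patent describes: detection model, then OCR engine, then filtering engine. The detect_boxes, run_ocr, and parse_key_value functions below are hypothetical placeholders standing in for those components.

```python
# A minimal sketch of the detection -> OCR -> filtering flow described above.
# All three helper functions are hypothetical placeholders, not real components.
def detect_boxes(image):
    """Placeholder for the trained detection model; returns (x, y, width, height) boxes."""
    return [(40, 10, 220, 18)]

def run_ocr(image, box):
    """Placeholder for the OCR engine reading the text inside one bounding box."""
    return "Name: John Smith"

def parse_key_value(text):
    """Very rough stand-in for the filtering engine: split on the first colon."""
    if ":" not in text:
        return None
    key, value = text.split(":", 1)
    return key.strip(), value.strip()

def extract_pairs(image):
    pairs = []
    for box in detect_boxes(image):
        candidate = parse_key_value(run_ocr(image, box))
        if candidate is not None:
            pairs.append(candidate)
    return pairs

print(extract_pairs(image=None))  # [('Name', 'John Smith')]
```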

The image of the document is an ordered collection of numerical values that represent the document's visual appearance. The image may be a black-and-white image of the document. In this example, the image may get described as a two-dimensional array of numerical intensity values. As another example, the image may be a color image of the document. In this example, the image may get represented as a multi-channel image. Each channel corresponds to a respective color (e.g., red, green, or blue) and gets defined as a two-dimensional array of numerical intensity values.

The bounding boxes may be rectangular bounding boxes. A rectangular bounding box may get represented by the coordinates of a particular corner of the bounding box and the corresponding width and height of the bounding box. More generally, other bounding box shapes and other ways of representing the bounding boxes are possible.
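As a small illustration of that representation, here is a corner-plus-size rectangle converted to its four vertices. Assuming the "particular corner" is the top-left corner is my own simplification, not something the patent specifies.

```python
# A sketch of the corner-plus-width-and-height representation mentioned above,
# assuming the stored corner is the top-left corner of the box.
from dataclasses import dataclass

@dataclass
class Box:
    x: float       # top-left corner, horizontal coordinate
    y: float       # top-left corner, vertical coordinate
    width: float
    height: float

    def vertices(self):
        """Return the four corners (top-left, top-right, bottom-right, bottom-left)."""
        return [
            (self.x, self.y),
            (self.x + self.width, self.y),
            (self.x + self.width, self.y + self.height),
            (self.x, self.y + self.height),
        ]

print(Box(40, 10, 220, 18).vertices())
```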

While the detection model may recognize and use any frames or borders present in the document as visual signals, the bounding boxes are not constrained to align (i.e., be coincident) with any existing structures or boundaries present in the document. Moreover, the system may generate the bounding boxes without displaying the bounding boxes in the image of the document.

That is, the system may generate data defining the bounding boxes without giving a visual indication of the position of the bounding boxes to a user of the system.

The detection model is generally a machine learning model, that is, a model having a set of parameters that can get trained on a set of training data. The training data includes many training examples, each of which includes:

(i) a training image that depicts a training document, and
(ii) a target output that defines bounding boxes that each enclose a respective key-value pair in the training image.

The training data may get generated by manual annotation, that is, by a person identifying bounding boxes around key-value pairs in the training document (e.g., using an appropriate annotation software).

Training the detection model using machine learning techniques on a set of training data enables it to recognize visual signals that will allow it to identify key-value pairs in documents. For example, the detection model may be trained to recognize local signals (e.g., text styles and the relative spatial positions of words) and global signals (e.g., the presence of borders in the document) to identify key-value pairs.

The visual cues that enable the detection model to recognize key-value pairs in documents generally do not include signals representing the explicit meaning of the words in the document.

Visual Signals That Distinguish Key-Value Pairs

Training the detection model to recognize visual signals that distinguish key-value pairs in documents enables the detection model to "generalize" beyond the training data used to prepare the detection model. The trained detection model might process an image depicting a document to generate bounding boxes enclosing key-value pairs in the document even if the document was not included in the training data used to train the detection model.

In one example, the detection model may be a neural network object detection model (e.g., including convolutional neural networks), where the “objects” correspond to key-value pairs in the document. The trainable parameters of the neural network model include the weights of the neural network model, for example, weights that define convolutional filters in the neural network model.

The neural network model may get trained on the training data set using an appropriate machine learning training procedure, for example, stochastic gradient descent. In particular, at each training iteration, the neural network model may process training images from a "batch" (i.e., a set) of training examples to generate bounding boxes predicted to enclose respective key-value pairs in the training images. The system may evaluate an objective function that characterizes a measure of similarity between the bounding boxes generated by the neural network model and the bounding boxes specified by the corresponding target outputs of the training examples.

The measure of similarity between two bounding boxes may be, for example, a sum of squared distances between the respective vertices of the bounding boxes. The system can determine gradients of the objective function with respect to the neural network parameter values (e.g., using backpropagation) and then use the gradients to adjust the current neural network parameter values.

In particular, the system can use the parameter update rule from any appropriate gradient descent optimization algorithm (e.g., Adam or RMSprop) to adjust the current neural network parameter values using the gradients. The system trains the neural network model until a training termination criterion is met (e.g., until a predetermined number of training iterations have been performed or a change in the value of the objective function between training iterations falls below a predetermined threshold).
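A hedged sketch of that similarity measure: the sum of squared distances between corresponding vertices of a predicted box and a target box. A real training loop would plug this loss into an object-detection architecture with an optimizer such as Adam or RMSprop; only the loss itself is shown here.

```python
# Sum of squared distances between corresponding vertices of two boxes,
# each given as (x, y, width, height) of its top-left corner.
def box_vertices(x, y, w, h):
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]

def vertex_loss(pred, target):
    loss = 0.0
    for (px, py), (tx, ty) in zip(box_vertices(*pred), box_vertices(*target)):
        loss += (px - tx) ** 2 + (py - ty) ** 2
    return loss

# Example: a prediction shifted 2 pixels to the right on every vertex.
print(vertex_loss((42, 10, 220, 18), (40, 10, 220, 18)))  # 16.0
```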

Before using the detection model, the system may identify a "class" of the document (e.g., invoice, pay stub, or sales receipt). A user of the system may identify the class of the document upon providing the document to the system. The system may use a classification neural network to determine the class of the document. The system may use OCR techniques to identify the text in the document and, after that, determine the document's class based on the text in the document. In a particular example, in response to determining the phrase "Net Pay," the system may identify the document class as a "pay stub."

In another particular example, in response to identifying the phrase "Sales tax," the system may identify the class of the document as "invoice." After identifying the particular class of the document, the system may use a detection model that gets trained to process documents of that specific class. The system may use a detection model that got trained on training data that included only documents of the same particular class as the document.

Using a detection model that gets trained to process documents of the same class as the document may enhance the performance of the detection model (e.g., by enabling the detection model to generate bounding boxes around key-value pairs with greater accuracy).
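Here is a toy version of that routing step, assuming OCR text is already available. The phrase-to-class table is illustrative and not taken from the patent.

```python
# A minimal sketch of class identification: look for characteristic phrases in the
# OCR text and route the document to a class-specific detection model.
CLASS_PHRASES = {
    "pay stub": ["net pay", "gross pay"],
    "invoice": ["sales tax", "invoice #", "amount due"],
    "sales receipt": ["change due", "cashier"],
}

def identify_document_class(ocr_text):
    text = ocr_text.lower()
    for doc_class, phrases in CLASS_PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return doc_class
    return None  # fall back to a generic detection model

print(identify_document_class("Employee: J. Smith  Net Pay: $2,410.77"))  # pay stub
```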

For each bounding box, the system processes the part of the image enclosed by the bounding box using the OCR engine to identify the textual data (i.e., the text) held by the bounding box. In particular, the OCR engine identifies the text enclosed by a bounding box by identifying each alphabetical, numerical, or special character enclosed by the bounding box. The OCR engine can use any appropriate technique to identify the text enclosed by a bounding box.

The filtering engine determines whether the text enclosed by a bounding box represents a key-value pair. The filtering engine can decide if the text enclosed by the bounding box represents a key-value pair in any appropriate way. For example, the filtering engine may determine whether the text enclosed by the bounding box includes a valid key from a predetermined set of valid keys. For example, the set of valid keys may consist of: "Date," "Time," "Invoice #," "Amount Due," and the like.

In comparing different portions of text to determine whether the text enclosed by the bounding box includes a valid key, the filtering engine may determine that two pieces of text are "matching" even if they are not identical. For example, the filtering engine may determine that two portions of text are matching even if they include different capitalization or punctuation (e.g., the filtering system may determine that "Date," "Date:" "date," and "date:" are all matching).

In response to determining that the text enclosed by the bounding box does not include a valid key from the set of valid keys, the filtering engine determines that the text enclosed by the bounding box does not represent a key-value pair.

In response to determining that the text enclosed by the bounding box includes a valid key, the filtering engine identifies a “type” (e.g., alphabetical, numerical, temporal) of the part of text enclosed by the bounding box not identified as the key (i.e., the “non-key” text). For example, for a bounding box that has the text: “Date: 2-23-2019”, where the filtering engine identifies “Date:” as the key (as described earlier), the filtering engine may identify the type of the non-key text “2-23-2019” as being “temporal.”

Besides identifying the type of the non-key text, the filtering engine identifies a set of valid types for values corresponding to the key. In particular, the filtering engine may map the key to a set of valid data types for values corresponding to the key using a predetermined mapping. For example, the filtering engine may map the key "Name" to the corresponding value data type "alphabetical," indicating that the value corresponding to the key should have an alphabetical data type (e.g., "John Smith").

As another example, the filtering engine may map the key “Date” to the corresponding value data type “temporal,” indicating that the value corresponding to the key should have a temporal data type (e.g., “2-23-2019” or “17:30:22”).

The filtering engine determines whether the type of the non-key text gets included in the set of valid types for values corresponding to the key. In response to determining that the type of the non-key text gets included in the set of valid types for values corresponding to the key, the filtering engine determines that the text enclosed by the bounding box represents a key-value pair. In particular, the filtering engine identifies the non-key text as the value corresponding to the key. Otherwise, the filtering engine determines that the text enclosed by the bounding box does not represent a key-value pair.

The set of valid keys and the mapping from valid keys to sets of valid data types for values corresponding to those keys may get provided by a system user (e.g., through an API made available by the system).
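Putting the filtering checks together, a rough sketch might look like the following. The colon-splitting, the regular expressions for inferring a value's type, and the example key set are simplifying assumptions on my part; the patent only says the valid keys and the key-to-type mapping may be supplied by a user of the system.

```python
# A sketch of the filtering engine's two checks: is there a valid key, and does
# the non-key text have a valid type for that key?
import re

VALID_KEYS = {"date", "time", "invoice #", "amount due", "name"}
KEY_TO_TYPES = {"date": {"temporal"}, "time": {"temporal"},
                "invoice #": {"numerical"}, "amount due": {"numerical"},
                "name": {"alphabetical"}}

def normalize(key):
    # Treat "Date", "Date:", "date", and "date:" as the same key.
    return key.strip().rstrip(":").lower()

def infer_type(text):
    text = text.strip()
    if re.fullmatch(r"\d{1,2}-\d{1,2}-\d{4}|\d{2}:\d{2}(:\d{2})?", text):
        return "temporal"
    if re.fullmatch(r"[\d.,$#\s-]+", text):
        return "numerical"
    return "alphabetical"

def filter_key_value(box_text):
    if ":" not in box_text:
        return None
    raw_key, raw_value = box_text.split(":", 1)
    key = normalize(raw_key)
    if key not in VALID_KEYS:
        return None                               # no valid key -> not a key-value pair
    if infer_type(raw_value) not in KEY_TO_TYPES[key]:
        return None                               # value type not valid for this key
    return key, raw_value.strip()

print(filter_key_value("Date: 2-23-2019"))              # ('date', '2-23-2019')
print(filter_key_value("Thank you for your business!"))  # None
```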

After identifying key-value pairs from the text enclosed by respective bounding boxes using the filtering engine, the system outputs the identified key-value pairs. For example, the system can provide the key-value pairs to a remote user of the system over a data communication network (e.g., using an API made available by the system). As another example, the system can store data defining the identified key-value pairs in a database (or other data structure) accessible to the system's user.

In some cases, a system user may request that the system identify the value corresponding to a particular key in the document (e.g., "Invoice #"). In these cases, rather than identifying and providing every key-value pair in the document, the system may process the text enclosed in respective bounding boxes until the requested key-value pair is recognized, and then return that key-value pair.

As described above, the detection model can get trained to generate bounding boxes that each enclose a respective key-value pair. Or, rather than using a single detection model, the system may include:

(i) a “key detection model” that gets trained to generate bounding boxes that enclose respective keys, and
(ii) a “value detection model” that gets trained to generate bounding boxes that enclose respective values.

The system can identify key-value pairs from the key bounding boxes and the value bounding boxes appropriately. For example, for each pair of bounding boxes that includes a key bounding box and a value bounding box, the system can generate a "match score" based on:

(i) the spatial proximity of the bounding boxes,
(ii) whether the key bounding box encloses a valid key, and
(iii) whether the type of the value enclosed by the value bounding box gets included in a set of valid types for values corresponding to the key.

The system may identify the key enclosed by a key bounding box and the value surrounded by a value bounding box as a key-value pair if the match score between the key bounding box and the value bounding box exceeds a threshold.
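One way such a match score could be combined — purely as an illustration, since the patent does not give weights or a distance measure — is a weighted sum of the three signals listed above:

```python
# A hedged sketch of scoring a (key box, value box) pair from spatial proximity,
# key validity, and value-type validity. The weights and 0.5 threshold are invented.
import math

def center(box):                       # box = (x, y, width, height)
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def match_score(key_box, key_text, value_box, value_type, valid_keys, key_to_types):
    kx, ky = center(key_box)
    vx, vy = center(value_box)
    proximity = 1.0 / (1.0 + math.hypot(kx - vx, ky - vy))   # closer -> higher
    key = key_text.strip().rstrip(":").lower()
    key_ok = 1.0 if key in valid_keys else 0.0
    type_ok = 1.0 if value_type in key_to_types.get(key, set()) else 0.0
    return 0.4 * proximity + 0.3 * key_ok + 0.3 * type_ok

score = match_score((40, 10, 50, 18), "Date:", (95, 10, 90, 18), "temporal",
                    {"date"}, {"date": {"temporal"}})
print(score > 0.5)  # True -> treat the two boxes as one key-value pair
```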

An Example Of An Invoice Document

A user of the document processing system may provide the invoice (e.g., as a scanned image or a PDF file) to the parsing system.

Bounding boxes are generated by the detection model of the parsing system. Each bounding box is predicted to enclose textual data that defines a key-value pair. The detection model does not generate a bounding box around text such as "Thank you for your business!" since this text does not represent a key-value pair.

The parsing system uses OCR techniques to identify the text inside each bounding box and thereafter identifies valid key-value pairs enclosed by the bounding boxes.

For example, the key (i.e., "Date:") and the value (i.e., "2-23-2019") are both enclosed by the same bounding box.

Key-Value Pairs And Document Processing

A parsing system as described in this specification can perform document processing.

The system receives a document as an upload from a remote system user over a data communication network (e.g., using an API made available by the system). The document can be represented in any appropriate unstructured data format, such as a PDF document or an image document (e.g., a PNG or JPEG document).

The system converts the document to an image, that is, an ordered collection of numerical values that represents the visual appearance of the document. For example, the image may be a black-and-white image of the document that gets described as a two-dimensional array of numerical intensity values.

The system processes the image using the detection model, in accordance with a set of detection model parameters, to generate an output that defines bounding boxes in the image of the document. Each bounding box gets predicted to enclose a key-value pair including key textual data and value textual data, where the key defines a label that characterizes the value.

The detection model may be an object detection model that includes convolutional neural networks.

Document Processing Using Structured Key-Value Pairs is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/document-processing-using-structured-key-value-pairs/feed/ 0
Determining Dialog States For Language Models Updated https://gofishdigital.com/blog/determining-dialog-states-for-language-models-updated/ https://gofishdigital.com/blog/determining-dialog-states-for-language-models-updated/#respond Tue, 15 Mar 2022 16:00:52 +0000 https://gofishdigital.com/?p=5086 The First Claims of Determining Dialog States For Language Models   Chances are that you have seen human-to-computer dialog patents from Google.  I have written about some in the past.   Here are two that provide a lot of details about such dialog: Human to Computer Dialog at Google Unsolicited Content in Human to Computer Dialog […]

Determining Dialog States For Language Models Updated is an original blog post first published on Go Fish Digital.

]]>
The First Claims of Determining Dialog States For Language Models

Determining Dialog States Using Language Models

 

Chances are that you have seen human-to-computer dialog patents from Google.  I have written about some in the past.   Here are two that provide a lot of details about such dialog:

In addition to looking carefully at patents involving human to computer dialog, it is worth spending time with Natural Language Processing, and communications between human beings and computers.  I have also written about a few of those.  Here are a couple of them:

This Google Determining Dialog States For Language Models patent has been updated twice now, with the latest version being granted earlier this week.  The latest first claim is a little longer and has some new words added to it.

Related Content:

Ideally, reviewing these patents has to start with a deep look at the language of the claims.

The second version of Determining dialog states for language models, filed in 2018 and granted February 4, 2020, starts off with the following claim:

  • What is claimed is:
  • 1. A computer-implemented method, comprising:
  • Receiving, by a computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device
  • Determining an initial prediction for the unknown stage of the multi-stage voice dialog
    Providing, by the computing device and to a voice dialog system,
  • (i) the audio data for the voice input to the computing device and
  • (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog
  • Receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and
  • (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of
  • the voice input; and presenting the transcription of the voice input with the computing device.

The first version of this continuation patent, Determining dialog states for language models, filed March 16, 2016, and granted May 22, 2018, begins with this claim:

  • What is claimed is:
  • 1. A computer-implemented method, comprising:
  • Receiving, at a computing system, audio data that indicates a first voice input that was provided to a computing device
  • Determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and
    (ii) a set of n-grams
  • Receiving, at the computing system, first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device; selecting, by the computing system, a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state; biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and transcribing the voice input using the biased language model.

The most recent first claim in the latest version of this patent, Determining dialog states for language models, was filed January 2, 2020, and granted on March 1, 2022. It tells us:

  • What is claimed is:
  • 1. A computer-implemented method, comprising:
  • Obtaining transcriptions of voice inputs from a training set of voice inputs, wherein each voice input in the training set of voice inputs is directed to one of a plurality of stages of a multi-stage voice activity
  • Obtaining display data associated with each voice input from the training set of voice inputs that characterizes content that is designated for display when the associated voice input is received; generating a plurality of groups of transcriptions, wherein each group of transcriptions includes a different subset of the transcriptions of voice inputs from the training set of voice inputs
  • Assigning each group of transcriptions to a different dialog state of a dialog-state model that includes a plurality of dialog states, wherein each dialog state of the plurality of dialog states: corresponds to a different stage of the multi-stage voice activity; and is mapped to a respective set of the display data characterizing content that is designated for display when voice inputs from the training set of voice inputs that are associated with the group of transcriptions assigned to the dialog state are received; for each group of transcriptions, determining a representative set of n-grams for the group, and associating the representative set of n-grams for the group with the corresponding dialog state of the dialog-state model to which the group is assigned, wherein the representative set of n-grams determined for the group of transcriptions comprise n-grams-satisfying a threshold number of occurrences in the group of transcriptions assigned to the dialog state of the dialog-state model
  • Receiving a subsequent voice input and first display data characterizing content that was displayed on a screen when the subsequent voice input was received, the subsequent voice input directed toward a particular stage of the multi-stage voice activity
    Determining a match between the first display data and the respective set of display data mapped to the dialog state in the dialog-state model that corresponds to the particular stage of the multi-voice activity
  • Processing, with a speech recognizer, the subsequent voice input, and the first display data, including biasing the speech recognizer using the representative set of n-grams associated with the dialog state in the dialog-state model that corresponds to the particular stage of the multi-voice activity

Comparing the Claims of the Determining Dialog States for Language Models

These are some of the differences that I am seeing with the different versions of the patent:

1. All three versions tell us that they are about "voice inputs," which act as part of a training set.

So unlike the previous patents about Dialog states between humans and computers, which focused on the content of the dialog, this patent primarily looks at verbal language and actual voice inputs.

2. The second and third versions of the patent describe breaking transcripts of the voice inputs into n-grams, which can be helpful in calculating statistics about occurrences of the voice inputs used.

3. The first claim of the newest, third version of the patent Determining dialog states for language models mentions the use of a speech recognizer.


 

Determining dialog states for language models

Inventors: Petar Aleksic, and Pedro J. Moreno Mengibar
Assignee: Google LLC
U.S. Patent: 11,264,028
Granted: March 1, 2022
Filed: January 2, 2020

 

Abstract

Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
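To show the biasing idea from the abstract in miniature, here is a sketch where a dialog state maps to a set of n-grams whose language-model scores get boosted before a transcription is picked. The dialog states, n-grams, scores, and boost factor are all invented for illustration and are not from the patent.

```python
# A minimal sketch of biasing a language model toward n-grams mapped to the
# current dialog state before choosing among candidate transcriptions.
DIALOG_STATE_NGRAMS = {
    "collect_pizza_size": {"small", "medium", "large"},
    "collect_toppings": {"pepperoni", "mushrooms", "extra cheese"},
}

BASE_SCORES = {"small": 0.02, "medium": 0.02, "large": 0.02,
               "mushrooms": 0.01, "marge": 0.02}  # "marge": an acoustically close word

def biased_scores(dialog_state, boost=3.0):
    ngrams = DIALOG_STATE_NGRAMS.get(dialog_state, set())
    return {w: s * boost if w in ngrams else s for w, s in BASE_SCORES.items()}

def pick_transcription(candidates, dialog_state):
    scores = biased_scores(dialog_state)
    return max(candidates, key=lambda w: scores.get(w, 0.0))

# Without biasing, "large" and "marge" tie; with the dialog state known, "large" wins.
print(pick_transcription(["large", "marge"], "collect_pizza_size"))  # large
```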

 

Determining Dialog States For Language Models Updated is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/determining-dialog-states-for-language-models-updated/feed/ 0
Identifying Entity Attribute Relations https://gofishdigital.com/blog/identifying-entity-attribute-relations/ https://gofishdigital.com/blog/identifying-entity-attribute-relations/#respond Wed, 02 Mar 2022 15:31:02 +0000 https://gofishdigital.com/?p=5013 This patent, granted March 1, 2022, is about identifying entity-attribute relationships in bodies of text. Search applications, like search engines and knowledge bases, try to meet a searcher’s informational needs and show the most advantageous resources to the searcher. Structured Data May Help With Identifying Attribute Relationships Better   Identifying attributes entity relationships gets done […]

Identifying Entity Attribute Relations is an original blog post first published on Go Fish Digital.

]]>

This patent, granted March 1, 2022, is about identifying entity-attribute relationships in bodies of text.

Search applications, like search engines and knowledge bases, try to meet a searcher’s informational needs and show the most advantageous resources to the searcher.

Structured Data May Help With Identifying Attribute Relationships Better

 
Identifying entity-attribute relationships gets done in structured search results.
 
Structured search results present a list of attributes with answers for an entity specified in a user request, such as a query.
 
So, the structured search results for “Kevin Durant” may include attributes such as salary, team, birth year, family, etc., along with answers that provide information about these attributes.
 
Constructing such structured search results can require identifying entity-attribute relations.
 
An entity-attribute relation is a particular case of a text relation between a pair of terms.
 
The first term in the pair of terms is an entity, such as a person, place, organization, or concept.
 
The second term is an attribute or a string that describes an aspect of the entity.
 
Examples include:
  • “Date of birth” of a person
  • “Population” of a country
  • “Salary” of the athlete
  • “CEO” of an organization

Providing more information about entities in content and schema (and structured data) gives a search engine more to work with: it can explore better information about specific entities, test and collect data, disambiguate what it knows, and build more and better confidence about the entities it is aware of.

Entity-Attribute Candidate Pairs

This patent obtains an entity-attribute candidate pair that defines an entity and an attribute, where the attribute is a candidate attribute of the entity.  In addition to learning from facts about entities in structured data, Google can look at the context of that information and learn from vectors and the co-occurrence of other words and facts about those entities too.
Take a look at the word vectors patent to get a sense of how a search engine may now get a better sense of the meanings and context of words and information about entities. (This is a chance to learn from patent exploration about how Google may be doing some of the things it does.)  Google collects facts and data about the things it indexes and may learn about the entities that it has in its index, and the attributes it knows about them.
It does this by:
  • Determining, with sentences that include the entity and attribute, whether the attribute is an actual attribute of the entity in the entity-attribute candidate pair
  • Generating embeddings for words in the set of sentences that include the entity and the attribute
  • Creating, with known entity-attribute pairs, a distributional attribute embedding for the entity, where the distributional attribute embedding for the entity specifies an embedding for the entity based on other attributes associated with the entity from the known entity-attribute pairs
  • Determining, based on the embeddings for words in the sentences, the distributional attribute embedding for the entity, and the distributional attribute embedding for the attribute, whether the attribute in the entity-attribute candidate pair is an actual attribute of the entity in the entity-attribute candidate pair.

Embeddings For Words Get Made From The Sentences That Include The Entity And The Attribute

  • Building a first vector representation specifying a first embedding of the words between the entity and the attribute in the set of sentences
  • Making a second vector representation specifying a second embedding for the entity based on the set of sentences
  • Constructing a third vector representation specifying a third embedding for the attribute based on the set of sentences
  • Generating, using known entity-attribute pairs, a fourth vector representation specifying the distributional attribute embedding for the entity
  • Generating, using the known entity-attribute pairs, a fifth vector representation specifying the distributional attribute embedding for the attribute
  • Determining, based on the embeddings for words in the set of sentences, the distributional attribute embedding for the entity, and the distributional attribute embedding for the attribute, whether the attribute in the entity-attribute candidate pair is an actual attribute of the entity in the entity-attribute candidate pair
  • Determining, based on the first, second, third, fourth, and fifth vector representations, whether the attribute in the entity-attribute candidate pair is an actual attribute of the entity; this determination gets performed using a feedforward network
  • Generating a single vector representation by concatenating the first, second, third, fourth, and fifth vector representations, and inputting the single vector representation into the feedforward network
  • Determining, by the feedforward network and using the single vector representation, whether the attribute in the entity-attribute candidate pair is an actual attribute of the entity in the entity-attribute candidate pair
 
Making a fourth vector representation, with known entity-attribute pairs, specifying the distributional attribute embedding for the entity comprises:
  • Identifying a set of attributes associated with the entity in the known entity-attribute pairs, wherein the set of attributes omits the attribute
  • Generating a distributional attribute embedding for the entity by computing a weighted sum of the attributes in the set of attributes
 
Generating a fifth vector representation, with known entity-attribute pairs, specifying the distributional attribute embedding for the attribute comprises:
  • Identifying, using the attribute, a set of entities from among the known entity-attribute pairs; for each entity in the set of entities
  • Determining a set of attributes associated with the entity, where the set of attributes does not include the attribute
  • Generating a distributional attribute embedding for the attribute by computing a weighted sum of the attributes in the set of attributes

 The Advantage Of More Accurate Entity-Attribute Relations Over Prior Art Model-Based Entity-Attribute Identification

Earlier art entity-attribute identification techniques used model-based approaches such as natural language processing (NLP) features, distant supervision, and traditional machine learning models, which identify entity-attribute relations by representing entities and attributes based on the sentences of the dataset in which these terms appear.
 
In contrast, the innovations described in this specification identify entity-attribute relations in datasets by using information about how entities and attributes get expressed in the data within which these terms appear and by representing entities and attributes using other features that are known to be associated with these terms. This enables representing entities and attributes with details shared by similar entities, improving the accuracy of identifying entity-attribute relations that otherwise cannot be discerned by considering only the sentences within which these terms appear.
 
For example, consider a scenario in which the dataset includes sentences that have two entities, "Ronaldo" and "Messi," getting described using a "record" attribute, and a sentence where the entity "Messi" gets described using a "goals" attribute. In such a scenario, the prior art techniques may identify the following entity-attribute pairs: (Ronaldo, record), (Messi, record), and (Messi, goals). The innovations described in this specification go beyond these prior art approaches by identifying entity-attribute relations that might not be discerned by how these terms get used in the dataset.
 
Using the above example, the innovation described in this specification determines that "Ronaldo" and "Messi" are similar entities because they share the "record" attribute and then represents the "record" attribute using the "goals" attribute. In this way, the innovations described in this specification, for example, can enable identifying entity-attribute relations, e.g., (Ronaldo, goals), even though such a relationship may not be discernible from the dataset.
 

 The Identifying Attribute Relationships Patent

Identifying entity attribute relations flowchart

 
Inventors: Dan Iter, Xiao Yu, and Fangtao Li
Assignee: Google LLC
US Patent: 11,263,400
Granted: March 1, 2022
Filed: July 5, 2019
 
 
Abstract
 
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate identifying entity-attribute relationships in text corpora.
Methods include determining whether an attribute in a candidate entity-attribute pair is an actual attribute of the entity in the entity-attribute candidate pair.
This includes generating embeddings for words in the set of sentences that include the entity and the attribute and generating, using known entity-attribute pairs.
This also includes generating an attribute distributional embedding for the entity based on other attributes associated with the entity from the known entity-attribute pairs, and generating an attribute distributional embedding for the attribute based on known attributes associated with known entities of the attribute in the known entity-attribute pairs.
Based on these embeddings, a feedforward network determines whether the attribute in the entity-attribute candidate pair is an actual attribute of the entity in the entity-attribute candidate pair.

Identifying Entity Attribute Relationships In Text

 
A candidate entity-attribute pair (where the attribute is a candidate attribute of entity) is input to a classification model. The classification model uses a path embedding engine, a distributional representation engine, a distributional attribute engine, and a feedforward network. It determines whether the attribute in the candidate entity-attribute pair is an actual attribute of the entity in the candidate entity-attribute pair.
 
The path embedding engine generates a vector representing an embedding of the paths or the words that connect the joint occurrences of the entity and the attribute in a set of sentences (e.g., 30 or more sentences) of a dataset. The distributional representation engine generates vectors representing an embedding for the entity and attribute terms based on the context within which these terms appear in the set of sentences. The distributional attribute engine generates a vector representing an embedding for the entity and another vector representing an embedding for the attribute.
 
The distributional attribute engine's embedding for the entity gets based on other attributes (i.e., attributes other than the candidate attribute) known to be associated with the entity in the dataset. The distributional attribute engine's embedding for the attribute gets based on other attributes associated with known entities of the candidate attribute.
 
The classification model concatenates the vector representations from the path embedding engine,  the distributional representation engine, and the distributional attribute engine into a single vector representation. The classification model then inputs the single vector representation into a feedforward network that determines, using the single vector representation, whether the attribute in the candidate entity-attribute pair is an essential attribute of the entity in the candidate entity-attribute pair.
Suppose the feedforward network determines that the attribute in the candidate entity-attribute pair is an actual attribute of the entity in the candidate entity-attribute pair. In that case, the candidate entity-attribute pair gets stored in the knowledge base along with other known/actual entity-attribute pairs.
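A minimal sketch of that final step — concatenating the five vector representations and running them through a small feedforward network — might look like the following. The dimensions, random weights, and single hidden layer are illustrative stand-ins for a trained model, not values from the patent.

```python
# Concatenate five toy vectors (path, entity context, attribute context,
# entity distributional, attribute distributional) and score the pair.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding size for each of the five representations

def feedforward(x, w1, b1, w2, b2):
    hidden = np.maximum(0.0, x @ w1 + b1)          # ReLU hidden layer
    logit = float(hidden @ w2 + b2)
    return 1.0 / (1.0 + np.exp(-logit))            # sigmoid -> probability

vectors = [rng.normal(size=DIM) for _ in range(5)]
x = np.concatenate(vectors)                        # single concatenated vector

w1 = rng.normal(size=(5 * DIM, 16)); b1 = np.zeros(16)
w2 = rng.normal(size=16);            b2 = 0.0

prob = feedforward(x, w1, b1, w2, b2)
print(f"P(actual attribute) = {prob:.2f}")         # compare against a threshold
```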
 

Extracting Entity Attribute Relations

 
The environment includes a classification model that, for candidate entity-attribute pairs in a knowledge base, determines whether an attribute in a candidate entity-attribute pair is an actual attribute of the entity in the candidate pair. The classification model is a neural network model whose components get described below. The classification model can also be implemented using other supervised and unsupervised machine learning models.
 
The knowledge base, which can include databases (or other appropriate data storage structures) stored in non-transitory data storage media (e.g., hard drive(s), flash memory, etc.), holds a set of candidate entity-attribute pairs. The candidate entity-attribute pairs get obtained using a set of content in text documents, such as webpages and news articles, obtained from a data source. The Data Source can include any source of content, such as a news website, a data aggregator platform, a social media platform, etc.
 
The data source obtains news articles from a data aggregator platform. The data source can use a supervised or unsupervised machine learning model (e.g., a natural language processing model) that generates a set of candidate entity-attribute pairs by extracting sentences from the articles and tokenizing and labeling the extracted sentences, e.g., as entities and attributes, using part-of-speech and dependency parse tree tags.
Such a machine learning model can, for example, get trained using a set of training sentences and their associated entity-attribute pairs. The data source can then input the extracted sentences into the trained model, which outputs the candidate entity-attribute pairs for the input extracted sentences.
 
In the knowledge base, the data source stores the candidate entity-attribute pairs and the sentences extracted by the data source that include the words of the candidate entity-attribute pairs. The candidate entity-attribute pairs are only stored in the knowledge base if the number of sentences in which the entity and attribute are present satisfies (e.g., meets or exceeds) a threshold number of sentences (e.g., 30 sentences).
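That storage rule is easy to picture in code. Here is a sketch where an in-memory dictionary stands in for the knowledge base and 30 sentences is the threshold from the example above.

```python
# Keep a candidate entity-attribute pair only when it appears in at least a
# threshold number of sentences.
from collections import defaultdict

MIN_SENTENCES = 30

def build_candidate_store(candidates):
    """candidates: (entity, attribute, sentence) triples extracted upstream."""
    sentences = defaultdict(list)
    for entity, attribute, sentence in candidates:
        sentences[(entity, attribute)].append(sentence)
    return {pair: sents for pair, sents in sentences.items()
            if len(sents) >= MIN_SENTENCES}

triples = [("Messi", "goals", f"sentence {i}") for i in range(35)]
triples += [("Messi", "shoe size", "only one sentence")]
store = build_candidate_store(triples)
print(list(store))  # [('Messi', 'goals')] -- the rare pair is filtered out
```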
 
A classification model determines whether the attribute in a candidate entity-attribute pair (stored in the knowledge base) is an actual attribute of the entity in the candidate entity-attribute pair. The classification model includes a path embedding engine, a distributional representation engine, a distributional attribute engine, and a feedforward network. As used herein, the term engine refers to a data processing apparatus that performs a set of tasks. The operations of these engines of the classification model in determining whether the attribute in a candidate entity-attribute pair is an actual attribute of the entity are described below.
 

An Example Process For Identifying Entity Attribute Relations

Operations of the process are described below as being performed by the system’s components, and functions of the process are described below for illustration purposes only. Operations of the process can get accomplished by any appropriate device or system, e.g., any applicable data processing apparatus. Functions of the process can also get implemented as instructions stored on a non-transitory computer-readable medium. Execution of the instructions causes data processing apparatus to perform operations of the process.
 
The knowledge base obtains an entity-attribute candidate pair from the data source.
 
The knowledge base obtains a set of sentences from the data source that include the words of the entity and the attribute in the candidate entity-attribute pair.
 
Based on the set of sentences and the candidate entity-attribute pair, the classification model determines whether the candidate attribute is an actual attribute of the candidate entity. The set of sentences can be a large number of sentences, e.g., 30 or more sentences.

The Classification Model Performs The Following Operations

  • Generating embeddings for words in the set of sentences that include the entity and the attribute, which gets described in greater detail below
  • Generating, using known entity-attribute pairs, a distributional attribute embedding for the entity, which gets described in greater detail below
  • Generating, using the known entity-attribute pairs, a distributional attribute embedding for the attribute, which gets described in greater detail below
  • Determining, based on the embeddings for words in the set of sentences, the distributional attribute embedding for the entity, and the distributional attribute embedding for the attribute, whether the attribute in the entity-attribute candidate pair is an actual attribute of the entity in the entity-attribute candidate pair, which gets described in greater detail below
 
The path embedding engine generates a first vector representation specifying a first embedding of the words between the entity and the attribute in the sentences. The path embedding engine detects relationships between candidate entity-attribute terms by embedding the paths or the words that connect the joint occurrences of these terms in the set of sentences.
For the phrase "snake is a reptile," the path embedding engine generates an embedding for the path "is a," which can get used to detect, e.g., genus-species relationships, that can then get used to identify other entity-attribute pairs.

Generating The Words Between the Entity And The Attribute

 
The path embedding engine does the following to generate words between the entity and the attribute in the sentences. For each sentence in the set of sentences, the path embedding engine first extracts the dependency path (which specifies a group of words) between the entity and the attribute. The path embedding engine converts the sentence from a string to a list, where the first term is the entity and the last term is the attribute (or the first term is the attribute and the last term is the entity).
 
Each term (which is also referred to as an edge) in the dependency path gets represented using the following features: the lemma of the term, a part-of-speech tag, the dependency label, and the direction of the dependency path (left, right or root). Each of these features gets embedded and concatenated to produce a vector representation for the term or edge (v_e), which comprises a sequence of vectors (v_lemma, v_pos, v_dep, v_dir), as shown by the equation below: v_e = [v_lemma, v_pos, v_dep, v_dir]
 
The path embedding engine then inputs the sequence of vectors for the terms or edges in each path into a long short-term memory (LSTM) network, which produces a single vector representation for the sentence (v_s), as shown by the equation below: v_s = LSTM(v_e^(1), . . . , v_e^(k))
 
Finally, the path embedding engine inputs the single vector representations for all sentences in the set of sentences into an attention mechanism, which determines a weighted mean of the sentence representations (v_sents(e,a)), as shown by the equation below: v_sents(e,a) = ATTN(v_s^(1), . . . , v_s^(n))
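Here is a rough, dependency-free sketch of the inputs to the path embedding engine: each edge on the dependency path is represented by its (lemma, part-of-speech tag, dependency label, direction) features, each feature gets an embedding, and the concatenated edge vectors are combined. Mean pooling stands in here for the LSTM and attention steps the patent describes, so this is only a shape-level illustration, not the real model.

```python
# Toy edge-feature embeddings for a dependency path, combined by mean pooling
# as a stand-in for LSTM(v_e^(1) ... v_e^(k)) followed by attention.
import numpy as np

rng = np.random.default_rng(0)
FEATURE_DIM = 4
_tables = {}

def embed(token):
    """Toy lookup-table embedding, created on first use."""
    if token not in _tables:
        _tables[token] = rng.normal(size=FEATURE_DIM)
    return _tables[token]

def edge_vector(lemma, pos, dep, direction):
    # v_e = [v_lemma, v_pos, v_dep, v_dir]
    return np.concatenate([embed(lemma), embed(pos), embed(dep), embed(direction)])

def sentence_vector(path_edges):
    # Placeholder for the sequential LSTM; a real model would not use a mean here.
    return np.mean([edge_vector(*e) for e in path_edges], axis=0)

# "snake is a reptile": dependency path between "snake" and "reptile".
path = [("be", "VERB", "cop", "left"), ("a", "DET", "det", "left")]
print(sentence_vector(path).shape)  # (16,)
```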
 
The distributional representation engine generates a second vector representation for the entity and a third vector representation for the attribute based on the sentences. The distributional representation engine detects relationships between candidate entity-attribute terms based on the context within which the attribute and the entity of the candidate entity-attribute pair occur in the set of sentences. For example, the distributional representation engine may determine that the entity "New York" gets used in the collection of sentences in a way that suggests that this entity refers to a city or state in the United States.
 
As another example, the distributional representation engine may determine that the attribute "capital" gets used in the set of sentences in a way that suggests that this attribute refers to a significant city within a state or country. Thus, the distributional representation engine generates a vector representation specifying an embedding for the entity (v_e) using the context (i.e., the set of sentences) within which the entity appears. The distributional representation engine generates a vector representation (v_a) specifying an embedding for the attribute using the set of sentences in which the attribute appears.
 
The distributional attribute engine generates a fourth vector representation specifying a distributional attribute embedding for the entity using known entity-attribute pairs. The known entity-attribute pairs, which get stored in the knowledge base, are entity-attribute pairs for which it has been confirmed (e.g., using prior processing by the classification model or based on a human evaluation) that each attribute in the entity-attribute pair is an actual attribute of the entity in the entity-attribute pair.
 
The distributional attribute engine performs the following operations to determine a distributional attribute embedding that specifies an embedding for the entity using some (e.g., the most common) or all the other known attributes among the known entity-attribute pairs with which that entity gets associated.

Identifying Other Attributes For Entities

For entities in the entity-attribute candidate pair, the distributional attribute engine identifies attributes, other than those included in the entity-attribute candidate pair, that are associated with the entity in the known entity-attribute pairs.
 
For an entity "Michael Jordan" in the candidate entity-attribute pair (Michael Jordan, famous), the attribute distributional engine can use the known entity-attribute pairs for Michael Jordan, such as (Michael Jordan, wealthy) and (Michael Jordan, record), to identify attributes such as wealthy and record.
 
The attribute distributional engine then generates an embedding for the entity by computing a weighted sum of the identified known attributes (as described in the preceding paragraph), where the weights get learned through an attention mechanism, as shown in the equation below: v_e = ATTN(ε(a_1), . . . , ε(a_m))
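The weighted-sum idea can be sketched as follows. Softmax over dot products with a query vector is my stand-in for the learned attention weights, and the attribute vocabulary is invented for the example.

```python
# The entity's distributional attribute embedding as a weighted combination of
# the embeddings of its other known attributes.
import numpy as np

rng = np.random.default_rng(1)
DIM = 6
ATTRIBUTE_EMBEDDINGS = {a: rng.normal(size=DIM) for a in ["wealthy", "record", "team"]}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distributional_entity_embedding(known_attributes, query):
    embs = np.stack([ATTRIBUTE_EMBEDDINGS[a] for a in known_attributes])
    weights = softmax(embs @ query)          # stand-in for learned attention
    return weights @ embs                    # weighted sum of attribute embeddings

query = rng.normal(size=DIM)
vec = distributional_entity_embedding(["wealthy", "record"], query)
print(vec.shape)  # (6,)
```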
 
The distributional attribute engine generates a fifth vector representation specifying a distributional attribute embedding for the attribute using the known entity-attribute pairs. The distributional attribute engine performs the following operations to determine an embedding based on some (e.g., the most common) or all of the known attributes associated with known entities of the candidate attribute.
For the attribute in the entity-attribute candidate pair, the distributional attribute engine identifies the known entities among the known entity-attribute pairs that have the attribute.
 
For each identified known entity, the distributional attribute engine identifies other attributes (i.e., attributes other than the one included in the entity-attribute candidate pair) associated with the entity in the known entity-attribute pairs. The distributional attribute engine can identify a subset of attributes from among the identified attributes by:
 
(1) Ranking attributes based on the number of known entities associated with each attribute, such as assigning a higher rank to attributes associated with a higher number of entities than to those associated with fewer entities
 

Identifying Entity Attribute Relations is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/identifying-entity-attribute-relations/feed/ 0
Searching Quotes of Entities Modified at Google https://gofishdigital.com/blog/searching-quotes-of-entities-modified-at-google/ https://gofishdigital.com/blog/searching-quotes-of-entities-modified-at-google/#respond Fri, 18 Feb 2022 19:12:29 +0000 https://gofishdigital.com/?p=4979 The Patent Behind Searching Quotes of Entities Has Been Modified by Continuation Patent Again   When Google updates some processes, they may file an updated patent to protect the intellectual property behind the process.  This may mean filing a patent where most of the description for the patent is identical or nearly a duplicate to […]

Searching Quotes of Entities Modified at Google is an original blog post first published on Go Fish Digital.

]]>
The Patent Behind Searching Quotes of Entities Has Been Modified by Continuation Patent Again

 

Searching Quotes of Entities

When Google updates some processes, they may file an updated patent to protect the intellectual property behind the process.  This may mean filing a patent where most of the description for the patent is identical or nearly a duplicate to earlier versions of the patent.  Titles sometimes change a little, but the list of authors mostly remains the same (I have seen one where a new author was added.)

Related Content:

Google’s patent “Systems and methods for searching quotes of entities using a database” has been updated a second time.  Trying to understand what has changed involves reading through the patent’s claims and seeing how the description of how the patent works has changed.

When the USPTO decides whether or not to grant a patent, they have prosecuting agents go through the claims to see if they are new, non-obvious, and useful. Since a continuation patent is trying to update the protection and use the date of the original patent as the start of the exclusion period, the patent agent makes sure that those new claims are valid before granting a continuation patent.

I first wrote about this patent in an earlier post at Go Fish Digital: Google Searching Quotes of Entities.  If you want a good idea of how the process behind that patent worked when it came out originally, I would recommend reading through that post before you go too much further here.

I followed up with a post at SEObythesea: Quote Searching Updated at Google to Focus on Videos.  It describes the changes to the process described in the claims of the patent, after the first continuation patent.

Those claims have been updated again and provide hints at how Google treats entity information that may have been initially kept in the knowledge graph.

Comparing Claims From The Searching Quotes of Entities Patent Versions

August 8, 2017 – Systems and methods for searching quotes of entities using a database:

1. A computerized system for searching and identifying quotes, the system comprising: a memory device that stores a set of instructions; and at least one processor that executes the set of instructions to: receive a search query for a quote from a user; parse the query to identify one or more key words; match the one or more key words to knowledge graph items associated with candidate subject entities in a knowledge graph stored in one or more databases, wherein the knowledge graph includes a plurality of items associated with a plurality of subject entities and a plurality of relationships between the plurality of items; determine, based on the matching knowledge graph items, a relevance score for each of the candidate subject entities; identify, from the candidate subject entities, one or more subject entities for the query based on the relevance scores associated with the candidate subject entities; identify a set of quotes corresponding to the one or more subject entities; determine quote scores for the identified quotes based on at least one of the relationship of each quote to the one or more subject entities, the recency of each quote, or the popularity of each quote; select quotes from the identified quotes based on the quote scores; and transmit information to a display device to display the selected quotes to the user.

February 5, 2019 – Systems and methods for searching quotes of entities using a database:

1. A method comprising the following operations performed by one or more processors: receiving audio content from a client device of a user; performing audio analysis on the audio content to identify a quote in the audio content; determining the user as an author of the audio content based on recognizing the user as the speaker of the audio content; identifying, based on words or phrases extracted from the quote, one or more subject entities associated with the quote; storing, in a database, the quote, and an association of the quote to the subject entities and to the user being the author; subsequent to storing the quote and the association: receiving, from the user, a search query; parsing the search query to identify that the search query requests one or more quotes by the user about one or more of the subject entities; identifying, from the database and responsive to the search query, a set of quotes by the user corresponding to the one or more of the subject entities, the set of quotes including the quote; selecting the quote from the quotes of the set based at least in part on the recency of each quote; and transmitting, in response to the search query, information for presenting the selected quote to the user via the client device or an additional client device of the user.

Compare those first two claims to the first claim from the newest version of the patent, which was granted earlier this week.  It has a few changes from the first two versions.

February 15, 2022 – Systems and methods for searching quotes of entities using a database:

1. A computer system, the system comprising: a memory device that stores a set of instructions; and at least one processor that executes the set of instructions to: retrieve an electronic resource, wherein the electronic resource is a webpage or is a document; parse the electronic resource to identify one or more key words; match the one or more key words to a subject entity from a subject entity database; identify a plurality of quotes based on the subject entity of the subject entity database, wherein each quote of the plurality of quotes is identified from an additional electronic resource comprising a webpage; identify an additional subject entity that is associated with the subject entity of the subject entity database; select a subset of the identified plurality of quotes based on the subset of the identified quotes being associated with the additional subject entity; determine quote scores for the subset of identified quotes, wherein each of the quote scores is for a corresponding one of the quotes of the subset and is determined based on one or multiple of: a relationship of the corresponding quote to the subject entity, a recency of the corresponding quote, and a popularity of the corresponding quote; select, based on the quote scores, a quote from the subset of identified quotes; and transmit information to a client device accessing the electronic resource, wherein transmitting the information causes the client device to display the selected quote and a selectable hyperlink to the webpage from which the selected quote was identified.
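Here is a similarly rough Python sketch of the newest claim's flow. The data structure, the helper names, and the simple additive scoring are my own illustrative choices; the claim only says that quote scores are based on one or multiple of relationship, recency, and popularity, and that the selected quote is returned with a selectable hyperlink to the page it came from.

```python
from dataclasses import dataclass

@dataclass
class CandidateQuote:
    text: str
    source_url: str      # the webpage the quote was identified from
    entities: set        # subject entities the quote is associated with
    relationship: float  # strength of the tie to the matched subject entity (0-1)
    recency: float       # newer quotes score higher (0-1)
    popularity: float    # e.g., how often the quote is repeated (0-1)

def select_quote(page_keywords, entity_db, related_entities, candidates):
    """Rough outline of the 2022 claim; the equal weighting is illustrative, not from the patent."""
    # Match the page's keywords to a subject entity in the subject entity database
    subject = next((entity for entity in entity_db if entity in page_keywords), None)
    if subject is None:
        return None
    # Keep quotes tied to the subject entity and to an additional, related entity
    related = related_entities.get(subject, set())
    subset = [q for q in candidates if subject in q.entities and (q.entities & related)]
    if not subset:
        return None
    # Score on relationship, recency, and popularity, then pick the best
    best = max(subset, key=lambda q: q.relationship + q.recency + q.popularity)
    # Return the quote along with a selectable hyperlink to its source webpage
    return {"quote": best.text, "source": best.source_url}
```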

The titles of the patents haven't changed, and no new inventors were added.  The drawings and most of the descriptions are the same.

  1. The first version's claim refers to "matching keywords to knowledge graph items."
  2. The second version's first claim drops the knowledge graph and instead describes "performing audio analysis on the audio content to identify a quote in the audio content."
  3. The newest first claim replaces the knowledge graph from the first version with a "subject entity database," saying it will "match the one or more keywords to a subject entity from a subject entity database."
  4. Unlike the first two versions, the newest first claim describes making the quote information attributable, with a selectable hyperlink to the webpage the quote was identified from.

So I Am Left With Questions About Searching Quotes of Entities

 

  1. Why is a subject entity database introduced, and why might that be different from the knowledge graph? It sounds like it could be a thesaurus of information that isn’t browseable and transparent to searchers the way that information from the knowledge graph might be. Is other information about entities kept separate from the knowledge graph, too, until it is decided how best to display that information?
  2. Where are the quotes stored?  The second first claim tells us that audio content is analyzed instead of looking in the knowledge graph. In the third first claim, searching quotes of entities is done by looking through quotes in the subject entity database. When I first read the second version of the patent, I took it to mean that information about the quotes was kept in an index of video information.  It is, however, likely that Google may have information about some quotes that it doesn’t necessarily have a video for.
  3. Where is quote information coming from? The third first claim tells us that it may provide to the searcher a "selectable hyperlink to the webpage from which the selected quote was identified." The two earlier versions state that the quote may be presented to a searcher but do not mention providing attribution to the source of the quote, or information about it, in any way. Attribution now seems more important to Google, whereas the second first claim seemed to assume that information would be coming from YouTube.

 

The Newest Version of the Searching Quotes of Entities Patent

Systems and methods for searching quotes of entities using a database
Inventors: Eyal Segalis, Gal Chechik, Yossi Matias, Yaniv Leviathan, and Yoav Tzur
Assignee: GOOGLE LLC
US Patent: 11,250,052
Granted: February 15, 2022
Filed: December 26, 2018

Abstract

Systems and methods are provided for searching and identifying quotes in response to a query from a user.

Consistent with certain embodiments, systems and methods are provided for identifying one or more subject entities associated with the query and identifying, from a database or search results obtained in response to the query, a set of quotes corresponding to the one or more subject entities.

Further, systems and methods are provided for determining quote scores for the identified quotes based on at least one of the relationships of each quote to the one or more subject entities, the recency of each quote, and the popularity of each quote.

Additionally, systems and methods are provided for organizing the identified quotes in a rank order based on the quote scores and selecting quotes based on the rank order or the quote scores. In addition, systems and methods are provided for transmitting information to display the selected quotes on a display device.


Searching Quotes of Entities Modified at Google is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/searching-quotes-of-entities-modified-at-google/feed/ 0
Accurate Establishment Locations in Google https://gofishdigital.com/blog/accurate-establishment-locations-in-google/ https://gofishdigital.com/blog/accurate-establishment-locations-in-google/#comments Mon, 07 Feb 2022 17:10:27 +0000 https://gofishdigital.com/?p=4926 Placing Accurate Establishment Locations Establishments, such as restaurants, gas stations, grocery stores, and other businesses, are constantly opening, closing, and moving to different locations. I have seen local companies referred to previously in Google patents as local business entities, but like the establishment’s name for them. Such movement of businesses is every day, and tracking […]

Accurate Establishment Locations in Google is an original blog post first published on Go Fish Digital.

]]>
Placing Accurate Establishment Locations

Establishments, such as restaurants, gas stations, grocery stores, and other businesses, are constantly opening, closing, and moving to different locations. I have seen local companies referred to in previous Google patents as local business entities, but I like the "establishments" name used for them here. Such movement of businesses is an everyday occurrence, and tracking new locations is helpful.

Related Content:

Google has consistently relied on other sites listing location information for businesses; however, getting large numbers of other sites to update location information can be time-consuming and take a lot of effort (and potentially some cost).  It is in Google's best interest to indicate the proper locations of new businesses as quickly as possible.

I have written about business locations and the importance of location prominence in ranking search results for local search. But this is the first Google patent I have seen about tracking the physical sites of businesses to keep establishment locations accurate. I have also written about authority pages for businesses in specific areas, and this patent brings the idea of authority web pages back to local search again.

Directories tracking such businesses need to be updated to maintain accurate establishment locations.

Someone may need to update a directory manually when an incorrect establishment gets linked to an area. This need for manual input may result in delays, or in a failure to update the directory, leaving inaccurate establishment locations. That is the problem this patent aims to help solve.

This patent relates to placing accurate establishment locations.

Processing devices may receive a first image, including location data associated with the first image's capture.

The processing devices may then identify a set of images that include geographic location information and identification marks, where each identification mark is associated with an establishment. The devices may then:

  • Determine whether the first image contains an identification mark from any of the set of images.
  • Determine that one of the establishments, associated with one of the identification marks contained in the first image, is currently located within a set proximity of the first image location.
  • Update a location database by associating that establishment with a set location within the set proximity of the first image location.

Another embodiment provides a system for determining and updating accurate establishment locations.

This system may include computing devices having processors, and memory storing instructions executable by the processors. The instructions may include receiving a first image, including location data associated with the first image's capture, wherein the location data includes a first image location; and then:

  • Identifying, with the computing devices, a set of images, wherein each image of the set includes geographic location information and identification marks, and each identification mark is associated with an establishment.
  • Comparing the first image to the set of images.
  • Determining, based on the comparison, that the first image contains one of the identification marks.
  • Determining that one of the establishments, associated with one of the identification marks contained in the first image, is currently located within a set proximity of where the first image was captured.
  • Updating a location database by associating that establishment with a set location within the set proximity of the first image location.
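As a rough illustration of that flow (and not Google's actual implementation), the anchoring step might look something like the Python sketch below, with the image-within-image matching left as a placeholder function I named myself:

```python
from dataclasses import dataclass

@dataclass
class LogoImage:
    establishment: str   # the establishment associated with this identification mark

def contains_mark(captured_image, logo):
    """Placeholder for the image-within-image matching the patent describes."""
    raise NotImplementedError

def anchor_establishment(captured_image, first_image_location, logo_images, location_db):
    """If the captured image contains a known identification mark, associate that mark's
    establishment with a location within a set proximity of the first image location."""
    for logo in logo_images:
        if contains_mark(captured_image, logo):
            location_db[logo.establishment] = first_image_location
            return logo.establishment
    return None  # no identification mark matched, so nothing gets updated
```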

 

Establishment anchoring with geolocated imagery
Inventors: Brian Edmond Brewington, and Kirk Johnson
Assignee: Google LLC
US Patent: 11,232,149
Granted: January 25, 2022
Filed: July 24, 2019

Abstract

The technology relates to determining an establishment’s presence at geolocation.

A computing device may receive a first image, including location data associated with the first image’s capture.

A set of images, including location information and identification marks associated with establishments, may also be received.

The computing device may compare the first image to determine whether the first image contains an identification mark and decide that the establishment, associated with the identification mark from the first image, is currently located within a set proximity of the first image location.

The computing device may also update a location database by associating one of the establishments with a location within a set proximity of the first image location.

 

 

Logo-Labeled Images Help Determine Accurate Establishment Locations

The technology relates to determining an establishment’s presence at a specific geographic location. For example, images received from various sources may contain location information, such as geolocation information.

These images may get analyzed by processing devices to determine whether the pictures include any identification marks indicative of an establishment. For every photo that includes such an identification mark, a logo label indicating that the image contains an identification mark may be associated with the image.

In another example, logo-labeled images may be retrieved from a storage system that stores snapshots of identification marks such as logos.

Further, an establishment associated with the identification mark may also be associated with the image.

A captured image, taken at a known location, may then be compared to a set of the logo-labeled images associated with places within a predetermined distance of where the captured image was taken. In this regard, the captured image may be searched for any identification marks in the set of logo-labeled images. Upon finding a matching identification mark, the presence of the establishment associated with the matched identification mark may be anchored at the location of the captured image.

To associate, or disassociate, an establishment at or from a specific location, publicly available images, for example, web-based images from the Internet, may be gathered. Images from websites may get collected and stored in a database or cache. For example, a web crawler may continually crawl through Internet websites and keep every image it finds.

Further, the web crawler may store the images associated with the web address from which the image was found. Photographs may be retrieved from storage systems such as those that hold various photos or those that specifically store photos of identification marks such as logos.

Each image may be assigned a label that identifies, suggests, or otherwise indicates the contents of the picture. For example, automatic photo label technology may attach labels to each photo with confidence levels.

Images that include identification marks of an establishment, such as a logo, business name, sign, etc., may be labeled as “logo.” Each image labeled as a logo may also be associated with a location, such as an address or geolocation.

In this regard, each logo labeled image may contain either implicit or explicit location information.

Additionally, for any web-based images, each image may be associated with an address found on the website from which the web-based image was found.

 

Authority Webpages for Identification Marks

Websites associated with a logo labeled web-based image may be considered authority webpages for the identification mark within the web-based image.

A captured image may then be compared to logo-labeled images. In this regard, the processing devices may perform an image-within-image search to determine whether any portion of the captured image matches any identification marks found in the logo-labeled images.

Image-within-image searching may get performed using an image matching algorithm. While performing the image-within-image search, variants of the captured image and logo-labeled images may also be compared.

The captured image may also be compared to a set of logo-labeled images. In this regard, the captured image may be compared only to a group of logo-labeled images within a predetermined distance of the captured image.
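That "predetermined distance" filter is easy to picture in code. Here is a small Python sketch using the haversine formula to build the comparison set; the one-kilometer radius is my own example value, not something taken from the patent:

```python
from math import radians, sin, cos, asin, sqrt

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two lat/lon points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearby_logo_images(logo_images, image_lat, image_lon, radius_km=1.0):
    """Keep only logo-labeled images located within the radius of where the
    captured image was taken."""
    return [img for img in logo_images
            if distance_km(image_lat, image_lon, img["lat"], img["lon"]) <= radius_km]
```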

On finding a matching identification mark between the captured image and one of the logo images, the establishment associated with the matched identification mark may be anchored at or associated with a location within a set proximity of the area of the captured image.

As such, location data, such as mapping data, business directories, etc., may be updated to provide current location information for the establishment associated with the matched identification mark.

Further, if the identification mark is associated with an authoritative website, and the location of the captured image is at or near the location found on that authoritative website, the anchoring of the establishment may be done with high confidence.

In one example, if an establishment moves or closes, newly captured images may not include an identification mark present in past captured images.

Accordingly, as newly captured images get compared to logo-labeled images, an identification mark that was not present before may start to appear, and the previous identification mark may no longer be current.

This use of images might indicate that the establishment associated with the previous identification mark should get marked as closed or moved.

Mapping data, business directories, and other location-dependent data may be continuously updated to provide accurate establishment locations.

In addition, such an indication of closure may be further verified by searching the authority webpages of the previous identification mark and the new identification mark. Suppose the authority page of the new identification mark indicates the first location, and the authority page of the old identification mark indicates a different location than the first location.

In that case, high confidence can be inferred that a new establishment is present at the first location.

To Associate or Disassociate An Establishment At A Certain Location

Many images may be collected. In one example, web-based images may be gathered from the Internet and stored as a collection of web-based images. In this regard, a web crawler may continually crawl through internet websites and keep every picture found.

The images from the websites may be gathered and stored in a database, cache, etc., of the storage system. The patent uses the example of a company's website, "Circle Pies." A web crawler may crawl to that website by going to the web address of Circle Pies.

Accurate Establishment locations

 

The web crawler may then review the data on the website and determine that the website contains two web-based images. Based on this determination, the web crawler may store web-based photos, for example, at a storage system.

The web crawler may also store the web-based images associated with the website’s web address on which the image was found. For example, the website may get located at the web address “https://www.a.com/circlepies.” The web-based photos may then be stored in association with the web address in a collection of web-based images stored at the storage system.
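A toy version of that crawl-and-store step, using only the Python standard library, might look like the sketch below. It simply pulls image URLs out of a page and remembers the page address each one was found on; a real crawler would obviously do far more than this:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ImageCollector(HTMLParser):
    """Collects the img tags on a page, keeping the page address each image was found on."""
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append({
                    "image_url": urljoin(self.page_url, src),
                    "found_on": self.page_url,   # stored with the website's web address
                })

def collect_images(page_url):
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    parser = ImageCollector(page_url)
    parser.feed(html)
    return parser.images

# e.g., collect_images("https://www.a.com/circlepies") might return records like
# {"image_url": "https://www.a.com/logo.png", "found_on": "https://www.a.com/circlepies"}
# (the logo.png name is just a made-up example)
```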

Images may be retrieved from other storage systems, such as those storing various images and associated EXIF data and those that specifically hold images of identification marks such as logos.

In this example, each logo image may get associated with address or location information for the logo corresponding to the business or location where the logo can be found.

Each collected image may be assigned a label that indicates the contents of the image. For example, automatic photo label technology may attach labels to each web-based photo with confidence levels. Labels may include “person” for a picture of an individual and “car” for images that identify cars.

Confidence levels may have a rating, such as a value from 0-1, 1-100, or 1-10 or other rating systems, which indicates that a label applied to one of the images accurately describes the contents of the image.
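Since the patent allows those confidence ratings to live on different scales (0-1, 1-100, or 1-10), here is a minimal sketch of normalizing them and keeping only labels above a minimum threshold; the 0.7 cutoff is just my example of a "minimum threshold value":

```python
def normalized_confidence(value, scale_max=1.0):
    """Map a label confidence expressed on a 0-1, 1-10, or 1-100 scale onto 0-1."""
    return value / scale_max

def confident_labels(labeled_images, min_confidence=0.7):
    """Keep only labeled-image records whose normalized confidence meets the threshold."""
    return [record for record in labeled_images
            if normalized_confidence(record["confidence"],
                                     record.get("scale_max", 1.0)) >= min_confidence]

# e.g., {"label": "logo", "confidence": 85, "scale_max": 100} normalizes to 0.85 and is kept
```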

The processors may analyze images to determine whether they include identification marks that indicate an establishment. Establishments may consist of businesses, organizations, associations, condominiums, etc. In this regard, the photo label technology may get used to find images with identification marks and assign a logo label indicating that the image contains an identification mark. Photographs that include identification marks of an establishment, such as a logo, business name, clip art, sign, etc., may be labeled as "logo" by the automated photo label technology. Images that include identification marks of an establishment may also be assigned a label other than "logo" to indicate the pictures have identification marks.

For example, images that include identification marks of an establishment may get clustered into a group of pictures with labels such as “business name,” “sign,” “clip art,” etc. As another example, images may already get associated with information identifying the image as including a logo specified in EXIF data. This information may also label the image as having a logo, for example, associating the image with the logo label.

Additionally, an establishment associated with an identification mark may also be associated with the image. For example, the automatic photo label technology may find that an image is an image of a pizza and assign the image a "food" label. In one example, techniques that analyze contents within a photo to assign an annotation describing the contents to the photo, such as those that use statistical classification methods to index pictures, may be used to label photos automatically.

A Trained Machine Learning Model for Accurate Establishment Locations

This use of images to update accurate establishment locations was interesting to learn about. The patent tells us that a machine learning model may be trained on manually labeled images relative to a reference taxonomy.

The trained machine learning model may automatically assign labels to images by the reference taxonomy. For example, the automatic photo label technology may determine that an image is the logo for the establishment Circle Pies, and therefore a logo label may be assigned to the image. Further, the image may also be associated with the establishment Circle Pies.

Each image labeled as a logo may also be associated with a location, such as an address or geolocation. Each logo-labeled image may contain explicit location information stored directly in the metadata associated with that image. (This patent makes more references to EXIF data in images than any other Google patent I have seen.)

For example, an embodiment may include a precise longitude and latitude reading in the image’s metadata, such as the EXIF information. EXIF data may provide the location where the photo was captured.

Alternatively, or in addition to the explicit location information, implicit location information may get derived from determining the location of objects captured in each image. For example, an image may have captured the Statue of Liberty; since the Statue of Liberty's location is known, an estimate of where the image was captured can be made based on that known location.

The estimated location can be refined based on image data, such as the direction in which the image was captured. An implicit image location for a web-based image may be inferred from the website where the photo was found. For example, a website that hosts a web-based image may include an address, and the address on the website may then be associated with the web-based image hosted on the website.

Additionally, each web-based image may be associated with an address found on the website from which the web-based image was found. If the website includes a street address, a logo-labeled web-based image may then be associated with that street address as its location.

Authority WebPages Associated with Locations

Websites associated with a logo labeled web-based image may be considered authority webpages for the identification mark within the web-based image.

In other words, an authority page may be an official or unofficial webpage of the establishment associated with the identification mark found within the respective web-based image.

For example, the website may be the official website for the establishment "Circle Pies." In this regard, the website at that web address may be made an authority page for the web-based image that includes the identification mark belonging to the establishment Circle Pies. Accordingly, the web-based image may be associated with an indication that it was found on an authority page.

Websites that contain copyrighted or proprietary material may not be used as authority pages.

Images Can Help Indicate Locations

Logo labeled images may have been captured or otherwise stored at a storage system associated with locations. A set of the logo-labeled images may be within a predetermined distance of the place. Accordingly, the logo-labeled images found within a location radius may be added to or included in the set of logo-labeled images. Thus, logo-labeled images captured at geolocations may be added to or included in the set of the logo-labeled photos compared to the captured image.

In addition, the set of logo-labeled images may also be identified based on the confidence levels of the images. In this regard, a given logo-labeled image may be added to or included in the set of logo-labeled images if its assigned confidence level meets or exceeds a minimum threshold value.

Upon finding a matching identification mark between the captured image and one of the logo labeled images, the processors, such as processors of server computing devices, may anchor or associate the establishment associated with the matched identification mark to or with the location of the captured image.

The establishment may get associated with a set location within a predetermined proximity of the captured image. The set location may be a street address. As such, location data stored in a location database, such as mapping data, business directories, etc., may be updated to provide accurate establishment locations associated with the matched identification mark.

To ensure anchoring is done with high confidence, specific criteria may need to be met before anchoring the identification mark at the location of the captured image. For example, if the identification mark is associated with an authoritative website, and the captured image's location is at or near the location given on that authoritative website, the anchoring of the establishment may be done with high confidence.

To further ensure high confidence, a set number of newly captured images (the patent uses five as an example, though it could be more or fewer) may need to contain the same matching identification mark before an establishment gets anchored to the captured image's location.
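Put together, those criteria suggest a decision rule something like the toy sketch below. The five-image minimum echoes the patent's example; the high/medium/low buckets are my own simplification, not language from the patent:

```python
def anchoring_confidence(new_images_with_mark, authority_page_agrees, min_matches=5):
    """new_images_with_mark: how many newly captured images contain the same identification mark.
    authority_page_agrees: whether the mark's authority webpage lists a location at or near
    the place the images were captured."""
    if new_images_with_mark >= min_matches and authority_page_agrees:
        return "high"    # anchor the establishment at the captured image location
    if new_images_with_mark >= min_matches or authority_page_agrees:
        return "medium"  # some support, but more evidence may be wanted
    return "low"         # do not anchor yet
```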

What to Do If Your Business Moves, According to the Patent

It looks like you ideally should take new photos of your new location and submit them to Google. You should also update the authority site for the business and make sure that the new location is listed there. According to the patent:

In one example, if an establishment moves or closes, newly captured images may not include an identification mark present in past captured images. Accordingly, as freshly captured images get compared to a set of logo-labeled images, an identification mark that was not present before may start to appear, and the previous identification mark may no longer be current. This might indicate that the establishment associated with the previous identification mark should get marked as closed or moved.

In addition, such an indication of closure may be further verified by searching the authority webpages of the previous identification mark and the new identification mark. Suppose the authority page of the new identification mark indicates the first location, and the authority page of the old identification mark indicates a different location than the first location. In that case, high confidence can be inferred that a new establishment is present at the first location.
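Here is a small sketch of how that closed-or-moved check could be expressed. The boolean flags stand in for the image comparisons and authority-page lookups described above; none of these names come from the patent:

```python
def closure_check(old_mark_in_new_images, new_mark_in_new_images,
                  new_authority_matches_location, old_authority_matches_location):
    """Illustrative only: decide whether the previous establishment looks closed or moved."""
    if old_mark_in_new_images or not new_mark_in_new_images:
        return None  # nothing in the new imagery suggests a change yet
    if new_authority_matches_location and not old_authority_matches_location:
        # the new mark's authority page points here and the old one points elsewhere
        return "high confidence: previous establishment moved or closed"
    return "possible change: previous mark no longer appears in newly captured images"
```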


Accurate Establishment Locations in Google is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/accurate-establishment-locations-in-google/feed/ 2
Modified Search Queries https://gofishdigital.com/blog/modified-search-queries/ https://gofishdigital.com/blog/modified-search-queries/#respond Wed, 19 Jan 2022 20:35:17 +0000 https://gofishdigital.com/?p=4819 Modified Search Queries Based on Misspellings and Synonyms This patent is about ranking modifications of a previous query based on a current query. Internet search engines provide information about Internet accessible documents such as web pages, images, text documents, and multimedia content. A search engine may identify the records in response to a searcher’s search […]

Modified Search Queries is an original blog post first published on Go Fish Digital.

]]>
Modified Search Queries Based on Misspellings and Synonyms

This patent is about ranking modifications of a previous query based on a current query.

Internet search engines provide information about Internet-accessible documents such as web pages, images, text documents, and multimedia content. A search engine may identify those documents in response to a searcher's query that includes search terms. The search engine ranks the documents based on their relevance to the query and on the importance of the documents, and provides search results that include aspects of, and links to, the identified documents.

Related Content:

The searcher's search query may be modified and used in identifying search results.

Search query misspellings may be corrected to create modified search queries used to identify documents.

Search query synonyms may also get used in creating modified search queries, and such modified search queries may be used to identify documents.

The present disclosure is directed to methods and apparatus for ranking modifications of a previous query. For example, modifications of a previous query may be generated based on a current query issued after that previous query.

For example, modifications of the previous query may be generated based on substituting n-grams of the previous query with n-grams of the current query.

For example, the previous query may be [weather tomorrow], and the current query may be [how about on tuesday]. Modifications of the previous query may be generated by substituting the n-gram "tuesday" for each term of the previous query to form the modifications [tuesday tomorrow] and [weather tuesday].

Modifications of the previous query may additionally and alternatively get generated by substituting the n-gram "on tuesday" for each term of the previous query to form the modifications [on tuesday tomorrow] and [weather on tuesday]. Each modification may be identified, and a ranking of each of the modifications may be determined.

At least one of the modifications may get selected as a submission query based on the rankings of the modifications. The selected modification may be submitted instead of, or in addition to, the current query.
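That substitution step is simple enough to sketch in a few lines of Python, using the [weather tomorrow] example above; this is just an illustration of the idea, not code from the patent:

```python
def generate_modifications(previous_query, modification_ngram):
    """Substitute the modification n-gram for each term of the previous query."""
    terms = previous_query.split()
    return [" ".join(terms[:i] + [modification_ngram] + terms[i + 1:])
            for i in range(len(terms))]

# generate_modifications("weather tomorrow", "tuesday")
#   -> ["tuesday tomorrow", "weather tuesday"]
# generate_modifications("weather tomorrow", "on tuesday")
#   -> ["on tuesday tomorrow", "weather on tuesday"]
```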

modified search queries

How Modified Search Queries May Take Place

A computer-implemented method may be provided that includes the steps of:

  • Receiving a current query including many current query terms
  • Determining, based on one or more of the current query terms, that the current query is indicative of an intent of the searcher to refine a query
  • Selecting a previous query associated with the current query, the previous query including a plurality of previous query terms and issued before the current query by at least one of a computing device and a searcher that issued the current query
  • Determining a modification n-gram based on one or more of the current query terms
  • Generating modifications of the previous query that each include the modification n-gram substituted for one or more of the previous query terms
  • Identifying, for each of multiple of the modifications, a popularity measure and a related concept measure, where the popularity measure is indicative of the popularity of the modification and the related concept measure is indicative of a likelihood of co-occurrence, in documents, of the modification n-gram and the previous query terms replaced by the modification n-gram in the modification
  • Determining a ranking for each of the modifications based on the popularity measure for the modification and the related concept measure of the modification
  • Selecting one of the modifications to utilize as a submission query when the ranking of that modification is greater than at least the rankings of the other modifications
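A toy version of the ranking-and-selection steps in that list might look like the sketch below. The additive scoring and the boost parameter are my own illustrative choices; the patent only says the ranking is based on a popularity measure and a related concept measure:

```python
def rank_and_select(modifications, current_query, popularity, related_concept,
                    current_query_boost=0.0):
    """Score each candidate and return the highest-ranked one as the submission query."""
    scored = {m: popularity(m) + related_concept(m) for m in modifications}
    # The current query is also a candidate; it may be boosted and can win outright
    scored[current_query] = popularity(current_query) + current_query_boost
    return max(scored, key=scored.get)

# e.g., rank_and_select(
#     ["weather tuesday", "tuesday tomorrow"], "how about on tuesday",
#     popularity=lambda q: {"weather tuesday": 0.9}.get(q, 0.1),
#     related_concept=lambda q: 0.2 if "weather" in q else 0.0,
# ) -> "weather tuesday"
```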

Modifying search queries may include the following features.

  1. Submitting one modification to a query system in place of the current query. The query system may be a search system, and the method may further include: determining search results responsive to the one modification; and providing the search results to the searcher.
  2. Determining, for at least a first modification of the modifications, a query pattern of the first modification, wherein the popularity measure for the first modification includes a query pattern popularity measure indicative of the popularity of the query pattern of the first modification. Determining the query pattern of the first modification may consist of selecting a category of an n-gram in the first modification and substituting the n-gram with an identifier of the category.
  3. The ranking of a modification may get boosted when the modification n-gram, and the previous query terms replaced by the modification n-gram in the modification, both indicate geographic locations.

The modified search queries process may further include:

  • Identifying a current query popularity measure indicative of the popularity of the current query based on previous queries
  • Determining a ranking for the current query based on the current query popularity measure, wherein selecting the one modification of the modifications to utilize as a submission query occurs only when the ranking of that modification is greater than the ranking for the current query
  • Selecting the current query to utilize as the submission query when the ranking for the current query is greater than the rankings of the modifications
  • Boosting the ranking of the current query, where the boost promotes the ranking of the current query relative to the modifications
  • Utilizing a default measure for the current query as the related concept measure of the current query
  • Determining the ranking for the current query based on the default measure
  • The previous query and the current query may be provided via spoken input of the searcher; determining that the current query indicates a potential intent to refine the previous query may be based on both the previous query and the current query being provided via spoken input of the searcher
  • Determining that the current query is indicative of a potential intent of the searcher to refine the previous query may be based on determining that the current query includes refinement-intent n-grams

The modified search queries process behind the patent can also include:

  • Finding a second modification n-gram based on the current query terms
  • Generating additional modifications of the previous query that each include the second modification n-gram substituted for one or more of the previous query terms
  • Identifying the popularity measure and the related concept measure for each of multiple of the additional modifications
  • Determining a ranking for each of multiple of the additional modifications, based on the popularity measure for the additional modification and the related concept measure of the additional modification
  • Selecting the one modification of the modifications to utilize as a submission query occurs only when the ranking of that modification is also greater than the rankings of the additional modifications
  • Determining that a matching one of the additional modifications includes the same terms in the same order as a matching one of the modifications
  • Calculating a combined ranking of the matching one of the additional modifications and the matching one of the modifications, the combined ranking being greater than the individual ranking of either of the matching modifications

A computer-implemented method may be provided that includes the steps of:

  • Receiving a current query including a plurality of current query terms
  • Determining, based on the current query terms, that the current query is indicative of an intent of the searcher to refine a query
  • Determining a modification n-gram based on the current query terms
  • Generating modifications of the previous query that each include the modification n-gram substituted for one or more of the previous query terms
  • Finding candidate queries, the candidate queries including multiple of the generated modifications and including the current query
  • Identifying, for each of the candidate queries, a popularity measure indicative of the popularity of the candidate query based on previous queries
  • Determining a ranking of each candidate query, wherein the ranking of a given candidate query is based on the popularity measure for the given candidate query.
  • Selecting one candidate query of the candidate queries to utilize as a submission query, the selecting based on the ranking of the one candidate query.

The modified search queries patent is at:

Ranking modifications of a previous query
Inventors: Bruce Christensen, Kumar Pravir Gupta, and Jan Kuipers
Assignee: GOOGLE LLC
US Patent: 11,169,989
Granted: November 9, 2021
Filed: October 13, 2015

Abstract

Methods and apparatus related to ranking modifications of a previous query.

For example, modifications of a previous query may be generated based on a current query issued subsequent to the previous query by substituting n-grams of the previous query with n-grams of the current query. Measures of each of the modifications may be identified and, based on such measures, a ranking of each of the modifications may be determined.

One of the modifications may be selected as a submission query based on the rankings of the modifications.

The submission query may be selected for submission in lieu of, or in addition to, the current query.

Modified Search Queries Conclusion

I have summarized the patent's own summary here, and if you want to learn more about how it works, you may want to click through to read the whole patent.  There have been many patents from Google that involve rewriting search queries to correct misspellings or provide synonyms so that a searcher receives search results with very similar meanings.

I wanted to include links to some posts about providing those synonyms because there are a number of different ways that Google may choose to provide them to searchers.  Many of the modified search queries patents involving synonyms first started appearing at Google around 2003, and they have evolved into synonym substitutions involving Hummingbird and RankBrain.  One way that Google refers to modified search queries is rewriting queries. Google has been modifying queries for a long time, and these are some examples of posts about Google doing that:

5/25/2007 – Using a Local Category Synonym to Refine Queries

12/29/2008 – How a Search Engine Might Use Synonyms to Rewrite Search Queries

12/22/2009 – Google Search Synonyms Are Found in Queries

1/19/2010 – Google Synonyms Update

2/16/2011 – More Ways Search Engine Synonyms Might be Used to Rewrite Queries

8/12/2013 – How Google May Substitute Query Terms with Co-Occurrence

9/27/2013 – The Google Hummingbird Update and the Likely Patent Behind Hummingbird

10/27/2015 – Investigating Google RankBrain and Query Term Substitutions

12/21/2015 – How Google Might Make Better Synonym Substitutions Using Knowledge Base Categories

8/23/2019 – How Google May Do Query Rewriting by Looking at a Searcher’s Prior Queries

9/27/2021 – Rewritten Queries and User-Specific Knowledge Graphs

Modified Search Queries is an original blog post first published on Go Fish Digital.

]]>
https://gofishdigital.com/blog/modified-search-queries/feed/ 0