ReconciliationServiceApi
Specification for a standard reconciliation service API.
Introduction

A reconciliation service is a web service that, given some text which is a name or label for something, and optionally some additional details, returns a ranked list of potential entities matching the criteria. The candidate text does not have to match each entity's official name perfectly, and that's the whole point of reconciliation: to get from an ambiguous text name to precisely identified entities. For instance, given the text "apple", a reconciliation service probably should return the Apple Inc. company, the apple fruit, and New York City (also known as the Big Apple).

Entities are identified by strong identifiers in some particular identifier space. Within one identifier space, all identifiers follow the same syntax. For example, given the string "apple", a reconciliation service might return entities identified by "/en/apple", "/en/apple_inc", and "/en/apple_ii", in the Freebase ID space. Given the same string, another reconciliation service might return entities identified by URIs, which are in the URI space, so to speak. Some other identifier spaces are ISBN and LSID. (Although ISBN and LSID identifiers are numbers, for uniformity, they are still stored as strings in Google Refine.)

Each reconciliation service can only reconcile to one single identifier space, but several reconciliation services can reconcile to the same identifier space.

So that it will be easy for users to plug in a variety of interchangeable reconciliation services, we define the API for a standard reconciliation service here. A standard reconciliation service is an HTTP-based RESTful JSON-formatted API. It operates in 2 modes: single query mode and multiple query mode. The second is an optimization that reduces the number of HTTP calls. Reconciliation services must support both query modes as well as the metadata query.

(Much of this standard is based on the Freebase relevance service and the Freebase experimental recon service.)
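To make the idea of a ranked list of candidates concrete, here is a toy sketch in Python. It is not how any real reconciliation service works: the in-memory ENTITIES table and the reconcile function are hypothetical, and plain string similarity from the standard library's difflib stands in for whatever matching a real service uses. In particular, it would never find New York City for "apple", which takes alias data.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory catalogue: identifier -> official name.
# The identifiers follow the Freebase ID syntax used in the examples above.
ENTITIES = {
    "/en/apple": "Apple",
    "/en/apple_inc": "Apple Inc.",
    "/en/apple_ii": "Apple II",
    "/en/new_york": "New York City",
}


def reconcile(text, limit=3):
    """Return up to `limit` (identifier, name, score) tuples, best first."""
    scored = []
    for identifier, name in ENTITIES.items():
        # Case-insensitive similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, text.lower(), name.lower()).ratio()
        scored.append((identifier, name, score))
    # Highest score first, so the caller sees a ranked list.
    scored.sort(key=lambda candidate: candidate[2], reverse=True)
    return scored[:limit]
```

Calling reconcile("apple") returns the three best-scoring candidates, each carrying an identifier from a single identifier space.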
Before you begin, see the mailing list thread in which Tom Morris gives a helpful basic reconciliation layout.

Query Request

Single Query Mode

A call to a reconciliation service API for a single query looks like either of these:

http://foo.com/bar/reconcile?query=...string...
http://foo.com/bar/reconcile?query={...json object literal...}

If the query parameter is a string, then it is an abbreviation of query={"query":...string...}.

Here are two live examples:
The query JSON object literal has a few fields

Each JSON object literal in the "properties" array is of this form:
{
"p" : string, property name, e.g., "country", or
"pid" : string, property ID, e.g., "/people/person/nationality" in the Freebase ID space
"v" : a single, or an array of, string or number or object literal, e.g., "Japan"
}

A "v" object literal has a single key "id" whose value is an identifier previously resolved in the same identifier space.

Here is an example of a full query parameter:
{
"query" : "Ford Taurus",
"limit" : 3,
"type" : "/automotive/model",
"type_strict" : "any",
"properties" : [
{ "p" : "year", "v" : 2009 },
{ "pid" : "/automotive/model/make" , "v" : { "id" : "/en/ford" } }
]
}

Multiple Query Mode

A call to a standard reconciliation service API for multiple queries looks like this:

http://foo.com/bar/reconcile?queries={...json object literal...}

The JSON object literal has zero or more arbitrary keys, each of which has a value in the same format as a single query, e.g.

http://foo.com/bar/reconcile?queries={ "q0" : { "query" : "foo" }, "q1" : { "query" : "bar" } }

"q0" and "q1" can be arbitrary strings. Note that the query counter starts at zero.

Here is a live example:

http://standard-reconcile.freebaseapps.com/reconcile?queries={%22q0%22:{%22query%22:%22boston%22,%22type%22:%22/music/musical_group%22},%22q1%22:{%22query%22:%22jaguar%22}}

Query Response

The response for a single query is a JSON object literal:
{
"result" : [
{
"id" : ... string, database ID ...
"name" : ... string ...
"type" : ... array of strings ...
"score" : ... double ...
"match" : ... boolean, true if the service is quite confident about the match ...
},
... more results ...
],
... potentially some useful envelope data, such as timing stats ...
}

For multiple queries, the response is a JSON object literal with the same keys as in the request. Each value has the same shape as a single-query response:
{
"q0" : {
"result" : [ ... ]
},
"q1" : {
"result" : [ ... ]
}
}

The service must also support JSONP through a callback parameter.

Service Metadata

When a service is called with just a JSONP callback parameter and no other parameters, it must return its metadata as a JSON object literal with at least 3 fields: "name", "identifierSpace", and "schemaSpace". Other fields are optional for reconciliation services which can make use of the default Freebase preview, suggest, etc. services, but non-Freebase reconciliation services may need to implement them all. Here are two live examples:
{
"name" : "Netflix Reconciliation through Freebase",
"identifierSpace" : "http://rdf.freebase.com/ns/authority.netflix.movie",
"schemaSpace" : "http://rdf.freebase.com/ns/type.object.id",
"view" : {
"url" : "http://www.netflix.com/WiMovie//{{id}}"
},
"preview" : {
"url" : "http://netflix-reconcile.freebaseapps.com/preview/{{id}}",
"width" : 430,
"height" : 300
},
"suggest" : {
"type" : {
"service_url" : "http://netflix-reconcile.freebaseapps.com",
"service_path" : "/suggest_type",
"flyout_service_url" : "http://www.freebase.com"
},
"property" : {
"service_url" : "http://netflix-reconcile.freebaseapps.com",
"service_path" : "/suggest_property",
"flyout_service_url" : "http://www.freebase.com"
},
"entity" : {
"service_url" : "http://netflix-reconcile.freebaseapps.com",
"service_path" : "/suggest",
"flyout_service_path" : "/flyout"
}
},
"defaultTypes" : []
}

Note that Freebase itself supports at least 3 identifier spaces:
Thus, it's not enough to say that the identifier space is http://www.freebase.com/; we have to use the URI of the specific property (in this case, mid). The schema space is the identifier space for types and properties. It might be different from the entities' identifier space. The other fields are for
Preview API

The preview service API (complementary to the reconciliation service API) is quite simple. Pass it an identifier and it renders information about the corresponding entity in an HTML page, which will be shown in an iframe inside Google Refine. The given width and height dimensions tell Google Refine how to size that iframe. If there is no preview service specified in the reconciliation service's metadata, and if the entity identifier space is a Freebase ID or Mid identifier space, then Google Refine knows to use Freebase's topicblock service. Here is an example of what an entity's preview looks like:

http://www.freebase.com/widget/topic/en/apple_inc?mode=content

Suggest APIs

In the "Start Reconciling" dialog box in Google Refine, you can specify which type of entities the column in question contains. For instance, the column might contain names of politicians. But you don't know the identifier corresponding to the "politician" type. So we need a suggest API that translates "politician" to something like, say, "/government/politician" if we're reconciling against Freebase.

In the same dialog box, you can specify that other columns should be used to provide more details for the reconciliation. For instance, if there is a column specifying the politicians' home city, passing that data on to the reconciliation service might make reconciliation more accurate. You might want to specify how that second column is related to the column being reconciled, but you might not know how to specify "home city" as a precise relationship. So we need a suggest API that translates "home city" to something like "/people/person/places_lived".

There is also a need for a suggest service for entities rather than just for types and properties. When a cell has no good candidate, you would want to perform a search yourself (by clicking on "search for match" in that cell).

These suggest APIs are required to work with a modified version of the Freebase Suggest widget.
As illustrated on that site, each suggest API has 2 jobs to do:
The metadata for each suggest API (type, property, or entity) is as follows:
{
"service_url" : "... url including only the domain ...",
"service_path" : "... optional relative path ...",
"flyout_service_url" : "... optional url including only the domain ...",
"flyout_service_path" : "... optional relative path ..."
}

The service_url field is required and it should look like this: http://foo.com. There should be no trailing / at the end. The other fields are optional and have defaults if not provided:

The Freebase Suggest widgets embedded in Google Refine will concatenate _url and _path to make the full URL. Refer to the specification for a Suggest API for details.

Google Refine Integration & Testing

Google Refine includes a mechanism for adding reconciliation service API URLs. After choosing Reconcile->Start Reconciling for a column, look at the bottom of the page for two buttons: "Add Standard Service..." and "Add Namespaced Service...". Each column in Google Refine should only be reconciled against one identifier space.

To test and debug a new reconciliation service ... TODO
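The testing procedure above is still a TODO, and the following does not fill it in. It is only a minimal Python sketch of helpers one might write to poke at a service by hand, assuming the request and response shapes described in this document. The function names are hypothetical, and the endpoint passed in is whatever URL your own service exposes.

```python
import json
from urllib.parse import urlencode

# The three metadata fields this specification requires.
REQUIRED_METADATA_FIELDS = ("name", "identifierSpace", "schemaSpace")


def single_query_url(endpoint, query):
    """Build a single-query URL; `query` is a plain string or a dict."""
    # A string is sent as-is; a dict becomes a JSON object literal.
    value = query if isinstance(query, str) else json.dumps(query)
    return endpoint + "?" + urlencode({"query": value})


def multi_query_url(endpoint, queries):
    """Build a multiple-query URL from a dict such as {"q0": {...}, "q1": {...}}."""
    return endpoint + "?" + urlencode({"queries": json.dumps(queries)})


def missing_metadata_fields(metadata):
    """Return the required metadata fields absent from a parsed metadata dict."""
    return [field for field in REQUIRED_METADATA_FIELDS if field not in metadata]


def confident_matches(response):
    """Return the candidates of a parsed single-query response flagged "match"."""
    return [c for c in response.get("result", []) if c.get("match")]
```

The URLs built this way can be pasted into a browser or fetched with any HTTP client, and the parsed JSON passed to missing_metadata_fields or confident_matches. For a multiple-query response, apply confident_matches to each value, since each has the same shape as a single-query response.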