I spent the last couple of days in Manchester at the “end of programme” meeting for the JISCexpo programme, under which LOCAH is funded. It was a pretty busy time, with representatives of all the projects presenting their work, describing their experiences and discussing some of the issues that had arisen.
Yesterday I found myself acting as “scribe” for a discussion of the “co-referencing” question, i.e. how to deal with the fact that different data providers assign and use different URIs for “the same thing”. What follows are my rather hasty notes from that discussion.
- the creation/use of co-references is inevitable: people will always end up creating URIs for things that already have URIs
- one approach to this problem has been the use of the owl:sameAs property. However, this property makes a very “strong” assertion of equivalence, with consequences for inferencing: if X owl:sameAs Y, then everything said about X is also taken to hold for Y, and vice versa
- the way properties are actually used in practice sometimes introduces a dimension of “social/community semantics” that may be at odds with the “semantics” defined by the creator/owner of a term
- the notion of “sameness” is often qualified by a degree of confidence, a “similarity score”, rather than being a statement of certainty
- the notion of “sameness”/similarity is often context-sensitive: rather than saying “X and Y are names for the same thing in all contexts”, we probably want to say something closer to “for the purposes of this application, or in this context, it’s sufficient to work on the basis that X and Y are names for the same thing”
- is there a contrast between approaches based on “top-down” “authority” and those based more on context-dependent “grouping”?
- how do we “correct” assertions which turn out to be “wrong”?
- we decide whether to use assertions made by other parties, and those decisions are based on an understanding of their source: who made them, on what basis, etc.
- such assessment may include a consideration of how many sources make or support an assertion
- it is easy for assertions of similarity to become “detached” from such information about provenance/attribution (if it is provided at all!)
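The contrast between owl:sameAs and a weaker, qualified link can be made a little more concrete with a small sketch. It assumes a hypothetical `CoreferenceLink` record carrying a similarity score, a context and a source; none of these names come from the discussion or from any particular vocabulary, and the example.org/.com/.net URIs and provider names are placeholders. `same_as_closure` approximates what happens if every link is read as owl:sameAs (symmetric and transitive, so links chain into one equivalence class), while `accepted_links` lets a consumer decide, per context, which claims to rely on, based on score, source and the number of distinct supporting sources.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreferenceLink:
    """Hypothetical qualified identity link: a claim of similarity, not owl:sameAs."""
    subject: str   # URI assigned by one data provider
    target: str    # URI assigned by another provider for "the same thing"
    score: float   # confidence/similarity score in [0, 1]
    context: str   # context or application in which the claim is meant to hold
    source: str    # who made the assertion (provenance/attribution)


def same_as_closure(pairs):
    """Group URIs as if every pair were asserted with owl:sameAs.

    owl:sameAs is symmetric and transitive, so links chain together into
    equivalence classes; a union-find structure computes those classes.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return sorted(sorted(group) for group in groups.values())


def accepted_links(links, context, min_score, trusted_sources, min_sources=1):
    """Return the URI pairs a consumer chooses to treat as co-referent.

    A link counts only if it was made for this context, meets the score
    threshold and comes from a trusted source; a pair is accepted only when
    at least `min_sources` distinct sources support it.
    """
    support = defaultdict(set)
    for link in links:
        if (link.context == context
                and link.score >= min_score
                and link.source in trusted_sources):
            pair = tuple(sorted((link.subject, link.target)))
            support[pair].add(link.source)
    return sorted(pair for pair, sources in support.items()
                  if len(sources) >= min_sources)


# Placeholder URIs and source names, for illustration only.
links = [
    CoreferenceLink("http://example.org/a/smith", "http://example.com/b/j-smith",
                    0.95, "author-lookup", "provider-a"),
    CoreferenceLink("http://example.org/a/smith", "http://example.com/b/j-smith",
                    0.90, "author-lookup", "provider-b"),
    CoreferenceLink("http://example.com/b/j-smith", "http://example.net/c/smith-john",
                    0.60, "author-lookup", "provider-c"),
]

# Treating every link as owl:sameAs merges all three URIs into one group.
print(same_as_closure((link.subject, link.target) for link in links))

# A context-sensitive, provenance-aware reading keeps only the well-supported pair.
print(accepted_links(links, "author-lookup", min_score=0.8,
                     trusted_sources={"provider-a", "provider-b"}, min_sources=2))
```

Keeping the score, context and source attached to each link is what makes these choices possible: a consumer can set a different threshold per application, count independent sources, or drop a source from `trusted_sources` if its claims turn out to be wrong. None of that can be recovered once the links have been collapsed into a single sameAs equivalence class.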
Some related reading:
- Heath, T. and Bizer, C. (2011). Linked Data: Evolving the Web into a Global Data Space, section “Identity Links”.
- Glaser, H., Millard, I., Jaffri, A., Lewy, T. and Dowling, B. (2008). “On Coreference and The Semantic Web”.
- O’Steen, B. “‘Bundling’ instances of author names together without using owl:sameAs”.