In some (most) cases, these will be newly created URIs, under a domain that we (well, MIMAS and the Archives Hub service) own. For these URIs, the project is responsible for choosing the URIs and putting in place the mechanisms to ensure that their dereferencing results in the provision of some “useful information”. In other cases, we will simply be citing existing URIs, defined by other agencies who (hopefully!) provide for their dereferencing.
Our RDF data is being generated, at least in the first instance, by processing EAD XML documents, so we want to construct our URIs for our “things” from content within those XML documents. And we want to do so in a way that, as far as possible, ensures that each of those URIs is an unambiguous name/referrer, i.e. it identifies a single “thing”, and we don’t end up with a single URI being used for what are in fact two different things. On the other hand, we can live with the case where we end up with multiple URIs, all of which identify a single thing, because information can be added at a later stage to indicate that they are synonyms.
The other point to note is that the initial transformation step is being performed on a “document-by-document basis”, i.e. taking a single EAD document as input and outputting RDF/XML. So for any given resource, the information we generate – including the URI of the resource – is based only on the content of that document (and any generally applicable information we can embed in the transform itself). There may be other data “about” that “thing” in another EAD document but we don’t have access to it at the time of transformation.
Also, it’s desirable that we construct our URIs in such a way that if we need to re-run the transform, we generate the same URIs from the same input data (unless we explicitly decide to change the patterns for some reason).
Finally, although the patterns below often make use of human-readable strings from the EAD document content, I haven’t treated human-readability as a major consideration. Having said that, I’ve tended to make use of (slightly normalised forms of) human-readable strings where possible, rather than, say, creating opaque “hashes”.
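As a rough illustration of what a “slightly normalised form” might look like, the following Python sketch lower-cases a string, removes whitespace and percent-encodes whatever is left. The function name `normalise` and the exact steps are assumptions made for illustration, not the project's actual transform. These steps happen to reproduce the corporate-name and place-name examples listed later in this post, but other examples drop punctuation altogether, so the real rules evidently vary by resource type.

```python
from urllib.parse import quote


def normalise(text: str) -> str:
    """Lower-case, remove all whitespace, then percent-encode the remainder.

    Hypothetical helper: the real transform's normalisation rules may differ.
    """
    collapsed = "".join(text.lower().split())
    # safe="" so that characters such as "(", ")" and "," are encoded too.
    return quote(collapsed, safe="")


print(normalise("Daily Mail (London, England)"))
print(normalise("McMurdo Sound (Antarctica)"))
```

Because the function depends only on its input string, re-running it over the same EAD content yields the same URI segment, which is the repeatability property we want from the transform.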
As with other aspects of the work, at this stage, this is a first cut at tackling the issue, and we may revise our approaches based on the experience of applying them over the dataset. Having gone through and constructed patterns for the various resource types, looking back over them now, I think I can see a small number of distinct methods that we’ve used:
- Identifiers: For some of these “things”, the EAD documents contain some sort of formally assigned identification code or number, which unambiguously – at least within the scope of the Hub collection – identifies that instance within the set of resources of that type (i.e. it serves as a “reference” in the terms of the Designing URI Sets… document). This is the case, for example, with the languages of the materials, using the did/langmaterial/language/@langcode attribute value. A variant of this is the case where such an identifier can be constructed from a combination of multiple pieces of content. Repositories, for example, can be identified by the pair of country code (ead/eadheader/eadid/@countrycode) and maintenance agency code (ead/eadheader/eadid/@mainagencycode). For these cases a combination of the name of the resource type and that identification code provides the basis for the “reference” part of the URI.
- “Authority-Controlled” Names: For many of the “things”, however, the EAD documents do not contain such a code; rather, they refer to things only by name. In some cases, the form of the name is drawn from an “authority file” – indicated in the EAD document – and the name includes sufficient information (e.g. birth/death dates, titles etc for a person) to make the resulting string an unambiguous referrer within the set of names from that source. For these cases, a combination of a name for the authority file and the name provides the basis for the “reference”. However, this does depend on the creator of the EAD document having accurately transcribed the “authoritative” form of the name, at least sufficiently to maintain unambiguity of reference.
- “Rule-Based” Names: In other cases, the “thing” is named, not using a name from a controlled list, but rather a name constructed according to some codified set of rules, where the rules used are indicated in the EAD document. The intent behind such rules is to try to ensure consistency of form and unambiguity of reference. The National Council on Archives’ Rules for the Construction of Personal, Place and Corporate Names (one of the rule sets recommended to Hub data creators) states “A personal name is constructed by combining mandatory and optional components of the name so that the person concerned can be identified with certainty and distinguished from others bearing similar names. An individual should have only one authorised form of name and each name should apply to only one individual.” Typically, as for the “authority file” case, this is achieved through the inclusion of dates, titles etc for persons. For these cases, a combination of a name for the rules and the name itself should provide the basis for the “reference”. However, in practice, the picture with the Hub data is somewhat more complex. First, in some cases where it is claimed that rules are followed, the content itself indicates that this is not the case. For example, the NCA Rules mandate that a personal name should include “the year in which a person was born or died, the span of years of his/her lifetime or the approximate period covered by his/her activities”, even if those dates are estimated. But there are cases in the data marked up as following the NCA Rules which do not meet this requirement (e.g. personal names providing only surname and forename with no dates), which I suspect may result in ambiguous references.
Second, even where the rule is followed and the mandatory components are present, the distributed nature of Hub data creation means that I suspect there is still some possibility that a single personal name may be used in two different sources to refer to what are in fact two different people (consider, for example, two data providers both using the name “Smith, John, fl 1920-1950”).
- “Locally-Scoped” Names: In other cases, the form of the name is neither authority-controlled nor rule-based, but nevertheless there is some expectation that the form of the name used is sufficient to make it an unambiguous referrer within some context. This is the case, for example, with the content of the did/origination element. The difficulty, however, is in establishing reliably what that context is. What is that “local scope”? We’ve tentatively taken the approach that such names have been constructed in such a way as at least to be unambiguous within the collection of submissions to the Hub by a single repository. So by combining the repository identifier and the name, hopefully, we can arrive at a “reference” which avoids ambiguity. Again, it may turn out that this assumption is unreliable, and results in ambiguous references, so we may need to revisit this approach.
- “Identifier Inheritance”: (I’m sure there must be a formal term for this but I’m not sure what it is!) In these cases the EAD document does not provide an unambiguous name for the “thing” itself; however, the “thing” has a simple relationship with some other “thing” for which identification fits into one of the other categories. Where the relationship is one-to-one, a URI can be constructed by adopting the pattern for that other “thing” and substituting the name of the resource type. An example of this is the case of the “biographical history” associated with a “unit of description”. The unit of description has an identifier (based on a pattern described below) and since – in data constructed using the Hub template – each unit has at most one biographical history, replacing the “unit” resource type name with a “bioghist” resource type name gives us a suitable URI path, e.g. for a unit of description for which the URI path contains “/unit/gb15abc”, the URI for the biographical history would contain “/bioghist/gb15abc”. A variant of this is the case where the relationship is many-to-one, rather than one-to-one. Here the approach needs to be extended to include e.g. a sequence number to distinguish the multiple “things”. This is the approach taken for the Unit of Description, where a “child” (“part”) unit of description uses the URI of the “parent” (“whole”) unit suffixed with a sequence number, e.g. for a unit of description for which the URI path contains “/unit/gb15abc”, the URIs for the “child” units would contain “/unit/gb15abc-1”, “/unit/gb15abc-2” and so on. In theory, this should not be necessary as the unitid for a unit should be unique within an EAD document, but in practice we’ve found that this is not the case in the actual data. (In this case, the identifier would be “reproducible” only if any new units are inserted at the end of a sequence rather than in the middle.)
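The one-to-one form of “identifier inheritance” is simple enough to sketch in a few lines of Python. This is only an illustration: `ROOT` is a placeholder for the undecided {root}, and the function name `inherit_uri` is hypothetical rather than part of the actual transform.

```python
ROOT = "http://example.org"  # placeholder for {root}; the real domain is undecided


def inherit_uri(unit_uri: str, resource_type: str) -> str:
    """Swap the "unit" resource type name for another resource type name."""
    prefix = f"{ROOT}/id/unit/"
    if not unit_uri.startswith(prefix):
        raise ValueError(f"not a unit URI: {unit_uri}")
    # Everything after the prefix is the identifier inherited from the unit.
    return f"{ROOT}/id/{resource_type}/{unit_uri[len(prefix):]}"


unit = f"{ROOT}/id/unit/gb15abc"
print(inherit_uri(unit, "bioghist"))
```

The same substitution would serve for any other resource type that stands in a one-to-one relationship with a unit of description.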
So, with the caveat above that this is all somewhat tentative at this stage, I summarise below the approaches taken to generating URIs for instances of each of the classes in the Hub model. Note that sometimes, an instance of the same class is generated in different “contexts” within the EAD document, and in these cases different rules for URI construction may be applied in those different contexts, depending on the information available within the EAD document.
We haven’t yet finalised the domain name we’ll be using, so for the purposes of the following, {root} represents the domain and the first part of the path. Italicised text is used for the URI patterns (or parts of them); bold text is used for XPath(-ish!) representations of the source of data within the EAD XML document.
Finding Aid
Pattern(s)
{root}/id/findingaid/{eadid}
- eadid
- normalised form of ead/eadheader/eadid
Example(s)
{root}/id/findingaid/gb15sirernesthenryshackleton
EAD document
Pattern(s)
{root}/id/ead/{eadid}
- eadid
- normalised form of ead/eadheader/eadid
Example(s)
{root}/id/ead/gb15sirernesthenryshackleton
Repository (Agent)
Pattern(s)
{root}/id/repository/{repositoryid}
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
Example(s)
{root}/id/repository/gb15
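To make the repository pattern concrete, here is a minimal Python sketch that reads the two attributes from an EAD header and builds the URI. The sample XML is a made-up, heavily trimmed fragment without a namespace, `ROOT` is a placeholder for {root}, and “normalised” is assumed here to mean simply lower-cased.

```python
import xml.etree.ElementTree as ET

ROOT = "http://example.org"  # placeholder for {root}

# Made-up, heavily trimmed EAD fragment (no namespace), for illustration only.
SAMPLE_EAD = """
<ead>
  <eadheader>
    <eadid countrycode="GB" mainagencycode="15">GB 15 Sir Ernest Henry Shackleton</eadid>
  </eadheader>
</ead>
"""


def repository_uri(ead: ET.Element) -> str:
    """Build the repository URI from ead/eadheader/eadid attributes."""
    eadid = ead.find("eadheader/eadid")
    if eadid is None:
        raise ValueError("no ead/eadheader/eadid element found")
    country = eadid.get("countrycode", "")
    agency = eadid.get("mainagencycode", "")
    # Assumption: normalisation of the concatenated codes is just lower-casing.
    repository_id = (country + agency).lower()
    return f"{ROOT}/id/repository/{repository_id}"


ead = ET.fromstring(SAMPLE_EAD.strip())
print(repository_uri(ead))
```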
Repository (Place)
Pattern(s)
{root}/id/place/{repositoryid}
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
Example(s)
{root}/id/place/gb15
Unit of Description
Pattern(s)
{root}/id/unit/{unitid}
- unitid
- normalised form of archdesc/did/unitid and position within archdesc/dsc/c tree
Note: In principle, it should be possible to use c/unitid content rather than position in tree, but in practice, there are cases where unitid content is not unique within the EAD document.
Example(s)
{root}/id/unit/gb15sirernesthenryshackleton
{root}/id/unit/gb15sirernesthenryshackleton-1
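The position-based numbering of child units could be sketched as follows in Python. The sample XML is invented (note the deliberately repeated unitid values), and the handling of nested levels (appending a further “-n” for each level) is an assumption, since only the first level is shown in the examples above.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator

# Matches unnumbered <c> and numbered <c01>, <c02>, ... component elements.
COMPONENT_TAG = re.compile(r"c(\d\d)?")

# Invented fragment: the component unitids are not unique, so they are ignored.
SAMPLE = """
<archdesc>
  <did><unitid>GB 15 abc</unitid></did>
  <dsc>
    <c01><did><unitid>1</unitid></did></c01>
    <c01>
      <did><unitid>1</unitid></did>
      <c02><did><unitid>1</unitid></did></c02>
    </c01>
  </dsc>
</archdesc>
"""


def child_unit_ids(parent: ET.Element, parent_id: str) -> Iterator[str]:
    """Yield an id for every descendant component, based on sibling position."""
    components = [e for e in parent if COMPONENT_TAG.fullmatch(e.tag)]
    for position, component in enumerate(components, start=1):
        component_id = f"{parent_id}-{position}"
        yield component_id
        # Assumption: nested components extend their parent's id in the same way.
        yield from child_unit_ids(component, component_id)


archdesc = ET.fromstring(SAMPLE.strip())
top_id = "gb15abc"  # assumed to be the already-normalised archdesc/did/unitid
ids = [top_id, *child_unit_ids(archdesc.find("dsc"), top_id)]
print(ids)
```

As noted earlier, identifiers built this way stay stable across re-runs only while new components are appended at the end of a sequence rather than inserted in the middle.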
Level
Pattern(s)
{root}/id/level/{level-name}
- level-name
- archdesc/@level or archdesc/@otherlevel or c{n}/@level or c{n}/@otherlevel
Example(s)
{root}/id/level/fonds
Language
Pattern(s)
http://lexvo.org/id/iso639-3/{langcode}
Note: use existing lexvo.org URIs for languages.
- langcode
- did/langmaterial/language/@langcode
Example(s)
http://lexvo.org/id/iso639-3/eng
Creation (Event)
Pattern(s)
{root}/id/creation/{unitid}
- unitid
- normalised form of archdesc/did/unitid and position within archdesc/dsc/c tree
Example(s)
{root}/id/creation/gb15sirernesthenryshackleton
Creation (Time)
Pattern(s)
{root}/id/creationtime/{unitid}
- unitid
- normalised form of archdesc/did/unitid and position within archdesc/dsc/c tree
Example(s)
{root}/id/creationtime/gb15sirernesthenryshackleton
Extent
Pattern(s)
{root}/id/extent/{unitid}
- unitid
- normalised form of archdesc/did/unitid and position within archdesc/dsc/c tree
Example(s)
{root}/id/extent/gb15sirernesthenryshackleton
Biographical History
Pattern(s)
{root}/id/bioghist/{unitid}
- unitid
- normalised form of archdesc/did/unitid and position within archdesc/dsc/c tree
Example(s)
{root}/id/bioghist/gb15sirernesthenryshackleton
Concept (Origination)
Pattern(s)
{root}/id/concept/agent/{repositoryid}/{origination-name}
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- origination-name
- normalised form of did/origination
Example(s)
{root}/id/concept/agent/gb15/sirernesthenryshackleton
Agent (Origination)
Pattern(s)
{root}/id/agent/{repositoryid}/{origination-name}
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- origination-name
- normalised form of did/origination
Example(s)
{root}/id/agent/gb15/sirernesthenryshackleton
Concept (ControlAccess – Subject)
Pattern(s)
{root}/id/concept/{source}/{subject-name}
{root}/id/concept/{repositoryid}/{subject-name}
- source
- controlaccess/subject/@source
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- subject-name
- normalised form of controlaccess/subject
Example(s)
{root}/id/concept/lcsh/antiquities
Concept (ControlAccess – Persname)
Pattern(s)
{root}/id/concept/person/{source}/{person-name}
{root}/id/concept/person/{rules}/{person-name}
{root}/id/concept/person/{repositoryid}/{person-name}
- source
- controlaccess/persname/@source
- rules
- controlaccess/persname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- person-name
- normalised form of controlaccess/persname
Example(s)
{root}/id/concept/person/nra/shackletonernesthenry1874-1922sirknightexplorer
{root}/id/concept/person/ncarules/holdenwendyfl1990cartoonist
{root}/id/concept/person/gb1832/berlinisaiah1909-1997sirknighthistorian
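The three patterns correspond to the authority-controlled, rule-based and locally-scoped cases described earlier. A minimal Python sketch of choosing between them might look like this; the precedence (source first, then rules, then repository identifier) is an assumption, as are the function name and the `ROOT` placeholder, and the name is taken to be already normalised.

```python
from typing import Optional

ROOT = "http://example.org"  # placeholder for {root}


def person_concept_uri(name: str, repository_id: str,
                       source: Optional[str] = None,
                       rules: Optional[str] = None) -> str:
    """Pick the scoping segment for a persname concept URI.

    Assumed precedence: @source, then @rules, then the repository identifier.
    """
    scope = source or rules or repository_id
    return f"{ROOT}/id/concept/person/{scope.lower()}/{name}"


print(person_concept_uri("holdenwendyfl1990cartoonist", "gb15", rules="ncarules"))
print(person_concept_uri("berlinisaiah1909-1997sirknighthistorian", "gb1832"))
```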
Person (ControlAccess – Persname)
Pattern(s)
{root}/id/person/{source}/{person-name}
{root}/id/person/{rules}/{person-name}
{root}/id/person/{repositoryid}/{person-name}
- source
- controlaccess/persname/@source
- rules
- controlaccess/persname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- person-name
- normalised form of controlaccess/persname
Example(s)
{root}/id/person/nra/shackletonernesthenry1874-1922sirknightexplorer
{root}/id/person/ncarules/holdenwendyfl1990cartoonist
{root}/id/person/gb1832/berlinisaiah1909-1997sirknighthistorian
Concept (ControlAccess – Famname)
Pattern(s)
{root}/id/concept/family/{source}/{family-name}
{root}/id/concept/family/{rules}/{family-name}
{root}/id/concept/family/{repositoryid}/{family-name}
- source
- controlaccess/famname/@source
- rules
- controlaccess/famname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- family-name
- normalised form of controlaccess/famname
Example(s)
{root}/id/concept/family/nra/dundasviscountsmelvilledunira
{root}/id/concept/family/ncarules/boucicault
Family (ControlAccess – Famname)
Pattern(s)
{root}/id/family/{source}/{family-name}
{root}/id/family/{rules}/{family-name}
{root}/id/family/{repositoryid}/{family-name}
- source
- controlaccess/famname/@source
- rules
- controlaccess/famname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- family-name
- normalised form of controlaccess/famname
Example(s)
{root}/id/family/nra/dundasviscountsmelvilledunira
{root}/id/family/ncarules/boucicault
Concept (ControlAccess – Corpname)
Pattern(s)
{root}/id/concept/organisation/{source}/{org-name}
{root}/id/concept/organisation/{rules}/{org-name}
{root}/id/concept/organisation/{repositoryid}/{org-name}
- source
- controlaccess/corpname/@source
- rules
- controlaccess/corpname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- org-name
- normalised form of controlaccess/corpname
Example(s)
{root}/id/concept/organisation/nra/britishbroadcastingcorporation
{root}/id/concept/organisation/aacr2/dailymail%28london%2Cengland%29
{root}/id/concept/organisation/gb1578/vizards%2Csolicitors%2Cmonmouth
Organisation (ControlAccess – Corpname)
Pattern(s)
{root}/id/organisation/{source}/{org-name}
{root}/id/organisation/{rules}/{org-name}
{root}/id/organisation/{repositoryid}/{org-name}
- source
- controlaccess/corpname/@source
- rules
- controlaccess/corpname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- org-name
- normalised form of controlaccess/corpname
Example(s)
{root}/id/organisation/nra/britishbroadcastingcorporation
{root}/id/organisation/aacr2/dailymail%28london%2Cengland%29
{root}/id/organisation/gb1578/vizards%2Csolicitors%2Cmonmouth
Concept (ControlAccess – Geogname)
Pattern(s)
{root}/id/concept/place/{source}/{place-name}
{root}/id/concept/place/{rules}/{place-name}
{root}/id/concept/place/{repositoryid}/{place-name}
- source
- controlaccess/geogname/@source
- rules
- controlaccess/geogname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- place-name
- normalised form of controlaccess/geogname
Example(s)
{root}/id/concept/place/lcsh/mcmurdosound%28antarctica%29
{root}/id/concept/place/ncarules/canada
{root}/id/concept/place/gb982/meirionethshire%28wales%29
Place (ControlAccess – Geogname)
Pattern(s)
{root}/id/place/{source}/{place-name}
{root}/id/place/{rules}/{place-name}
{root}/id/place/{repositoryid}/{place-name}
- source
- controlaccess/geogname/@source
- rules
- controlaccess/geogname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- place-name
- normalised form of controlaccess/geogname
Example(s)
{root}/id/place/lcsh/mcmurdosound%28antarctica%29
{root}/id/place/ncarules/canada
{root}/id/place/gb982/meirionethshire%28wales%29
Concept (ControlAccess – GenreForm)
Pattern(s)
{root}/id/concept/{source}/{genreform-name}
{root}/id/concept/{rules}/{genreform-name}
{root}/id/concept/{repositoryid}/{genreform-name}
- source
- controlaccess/genreform/@source
- rules
- controlaccess/genreform/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- genreform-name
- normalised form of controlaccess/genreform
Example(s)
{root}/id/concept/aat/buildingplans
Concept (ControlAccess – Function)
Pattern(s)
{root}/id/concept/{source}/{function-name}
{root}/id/concept/{rules}/{function-name}
{root}/id/concept/{repositoryid}/{function-name}
- source
- controlaccess/function/@source
- rules
- controlaccess/function/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- function-name
- normalised form of controlaccess/function
Example(s)
{root}/id/concept/agift/miningregulations
Book
Pattern(s)
{root}/id/document/{source}/{title}
{root}/id/document/{rules}/{title}
{root}/id/document/{repositoryid}/{title}
- source
- controlaccess/title/@source
- rules
- controlaccess/title/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- title
- normalised form of controlaccess/title
Example(s)
{root}/id/document/aacr2/thecastlediaries1974-761980
Birth (Event)
Pattern(s)
{root}/id/birth/{source}/{person-name}
{root}/id/birth/{rules}/{person-name}
{root}/id/birth/{repositoryid}/{person-name}
- source
- controlaccess/persname/@source
- rules
- controlaccess/persname/@rules
- repositoryid
- normalised form of concatenation of ead/eadheader/eadid/@countrycode and ead/eadheader/eadid/@mainagencycode
- person-name
- normalised form of controlaccess/persname
Example(s)
{root}/id/birth/nra/shackletonernesthenry1874-1922sirknightexplorer
{root}/id/birth/ncarules/allenjim1926-1999playwright
{root}/id/birth/gb1832/berlinisaiah1909-1997sirknighthistorian
Object
Pattern(s)
{object-uri}
- object-uri
- dao/@href or daogrp/daoloc/@href
Example(s)
http://library.kent.ac.uk/library/special/html/specoll/jack.gif
Object Group
Pattern(s)
{root}/id/group/{unitid}-{groupno}
- unitid
- normalised form of archdesc/did/unitid and position within archdesc/dsc/c tree
- groupno
- position within daogrp sequence for archdesc or c{n}
Example(s)
{root}/id/group/gb0254ms274-1
Time Interval (Year, Month, Day)
i.e. specific intervals of time.
Pattern(s)
http://reference.data.gov.uk/id/year/{yyyy}
http://reference.data.gov.uk/id/month/{yyyy}-{mm}
http://reference.data.gov.uk/id/day/{yyyy}-{mm}-{dd}
Note: use existing reference.data.gov.uk URIs for intervals.
- yyyy, mm, dd
- four-digit year, two-digit month and two-digit day of the interval
Example(s)
http://reference.data.gov.uk/id/year/1921
http://reference.data.gov.uk/id/month/1921-06
http://reference.data.gov.uk/id/day/1921-06-03
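For completeness, a small Python sketch of filling in the three interval patterns from a single date. The function name `interval_uris` is hypothetical, and how the date is obtained from the EAD document is outside the scope of this sketch.

```python
from datetime import date

BASE = "http://reference.data.gov.uk/id"


def interval_uris(d: date) -> dict:
    """Return the year, month and day interval URIs covering the given date."""
    return {
        "year": f"{BASE}/year/{d.year:04d}",
        "month": f"{BASE}/month/{d.year:04d}-{d.month:02d}",
        "day": f"{BASE}/day/{d.isoformat()}",
    }


for uri in interval_uris(date(1921, 6, 3)).values():
    print(uri)
```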