Modelling Copac data

With the Archives Hub data well under way, it was time to start looking at the Copac data.  The first decision to be made was which version of Copac data to use – consolidated or unconsolidated.  As part of the process of adding records to Copac they are de-duplicated, allowing different institutions’ records for the same item to be presented as one record, instead of several.  For more info on Copac de-duplication, see this blog post.

So our first question was: deal with the individual records from each library, or with the consolidated records created for Copac?  This made us think about the nature of what we were describing. The unconsolidated records (generally!) relate to the actual, physical ‘thing’ – what in FRBR would be the ‘item’.

The consolidated records are closer to (but by no means a perfect example of) the FRBR manifestation.  That is to say, they are describing different physical instances of the same theoretical work; in linked data terms, they are ‘same as’.  They aren’t perfect manifestation-level records, as there may be other records on Copac for the same manifestation which haven’t been consolidated due to cataloguing differences.  At Copac we err on the side of caution: we would rather leave some matching records unconsolidated than merge records which aren’t the same.

So we could do our mapping and our transformations at unconsolidated level, and then use ‘same as’ to link together the descriptions that would later be consolidated in Copac.  But as we’re accepting Copac’s judgement that they are describing the same set of items, why not save ourselves that trouble, and work from the consolidated description?  We can then hang the individual bibliographic records off this central unit of description.
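To make the two options concrete, here is a rough sketch using plain (subject, predicate, object) tuples rather than a real RDF library. It is an illustration only, not the project's code: the record and unit identifiers are invented, `ex:describedBy` is a placeholder predicate, and `owl:sameAs` is assumed to be what ‘same as’ would become in practice.

```python
from itertools import combinations

# Hypothetical identifiers for four library records that Copac consolidates.
records = ["rec:lib-a/1", "rec:lib-b/7", "rec:lib-c/3", "rec:lib-d/9"]

def same_as_links(recs):
    """Option 1: model each record separately, then link every pair as 'same as'."""
    return [(a, "owl:sameAs", b) for a, b in combinations(recs, 2)]

def hub_links(recs, unit):
    """Option 2: hang each record off one central unit of description."""
    # 'ex:describedBy' is a placeholder, not a term from a real vocabulary.
    return [(unit, "ex:describedBy", rec) for rec in recs]

pairwise = same_as_links(records)
hub = hub_links(records, "copac:unit/42")

print(len(pairwise), "pairwise links versus", len(hub), "links to the central unit")
```

Pairwise linking is the worst case (a chain of ‘same as’ links would also do, given a reasoner), but either way the second option gives every statement one obvious place to attach.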

This means that all of the information provided by the different libraries is related to the same unit of description.  The bibliographic records that go together to make up a consolidated Copac record may not contain all of the same information, but they won’t contain any contradictory information.  Thus two records which are the same in all details except date of publication (say 1983 in one, 1984 in the other) will not consolidate, but records which are the same in all details except that one contains a subject where the other does not, will consolidate.
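The ‘no contradictions’ rule can be sketched in a few lines. This is a deliberate simplification with invented field names and values; Copac's real de-duplication is considerably more involved. The assumption here is that a field missing from one record is not a contradiction, while a field present in both with different values is.

```python
def can_consolidate(a: dict, b: dict) -> bool:
    """True if no field present in both records has differing values."""
    shared = a.keys() & b.keys()
    return all(a[field] == b[field] for field in shared)

# Invented example records, flattened to simple dicts for illustration.
base = {"title": "Management theory", "publisher": "Example Press", "date": "1983"}
later = {**base, "date": "1984"}
with_subject = {**base, "subject": "Management"}

print(can_consolidate(base, later))         # False: the dates contradict
print(can_consolidate(base, with_subject))  # True: the subject is only missing
```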

In fact, subjects are one of the things (along with notes) that don’t affect consolidation at all.  We will combine all of the subjects that come in individual descriptions, so that a consolidated record might end up with the subjects:

Management

Management — theory

Management (theoretical)

Business & management

We will leave these in the linked data description for the same reason they are left in the Copac description – while such similar terms may seem superfluous, they actually increase discoverability, by providing multiple access points.  They will link into the central ‘unit of description’, rather than the individual bibliographic records.
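As a sketch of how the combined subject list might be built, the snippet below gathers subjects from each contributing record and attaches them to the central unit. The record structure and identifiers are hypothetical, and dropping exact repeats is an assumption; near-duplicates are kept on purpose, since they are the extra access points.

```python
def merge_subjects(records):
    """Collect every subject string from the records, dropping exact repeats only."""
    merged = []
    for rec in records:
        for subject in rec.get("subjects", []):
            if subject not in merged:
                merged.append(subject)
    return merged

# Hypothetical contributing records; library C supplies no subjects at all.
records = [
    {"library": "A", "subjects": ["Management", "Management (theoretical)"]},
    {"library": "B", "subjects": ["Management", "Business & management"]},
    {"library": "C"},
]

# Subjects hang off the central unit, not off the individual records.
unit = {"id": "copac:unit/42", "subjects": merge_subjects(records)}
print(unit["subjects"])
```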

Once we’d decided on this central unit of description (name TBD, but likely to be ‘Copac record’ or something boringly similar), other aspects of the description started to fall into place.  Some of these were straightforward – publication date, for instance, is fairly obviously a literal – while others took more thought and discussion.

Among the more complicated issues was that of creator.  We are working with MODS data, which has come from MARC data, and MARC allows you to have only one ‘creator’.  This creator sits in the 100s as the main access point, and all other contributors (including co-authors!) are relegated to the 700s, where they become what we have decided to call ‘other person associated with this unit of description’.  Not very snappy, but hopefully fairly accurate.  In theory, the role that this person has in the creation of the item should be reflected in the MARC record (as a relator term or code), but in practice this is not often included in descriptions.  Where it is indicated that a person (or a corporate body) is an editor, contributor, translator, illustrator etc., we can build these into the modelling; where not, they will have to be satisfied with the vague title of ‘associated person’.
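The decision logic might look something like the sketch below. It works on simplified dicts rather than real MODS or MARC, and the `tag` and `role` keys, the names, and the list of recognised roles are all assumptions made for illustration (the real list of roles would be longer).

```python
# Roles named in the post; treated here as a closed list purely for the example.
KNOWN_ROLES = {"editor", "contributor", "translator", "illustrator"}

def relationship(name: dict) -> str:
    """Pick the relationship between a name and the unit of description."""
    if name["tag"].startswith("1"):   # 1XX: main access point
        return "creator"
    role = (name.get("role") or "").strip().lower()
    if role in KNOWN_ROLES:           # 7XX with a stated, recognised role
        return role
    return "associated person"        # 7XX with no usable role

# Invented names standing in for entries from one record.
names = [
    {"tag": "100", "name": "Smith, A."},
    {"tag": "700", "name": "Jones, B.", "role": "Editor"},
    {"tag": "700", "name": "Brown, C."},
]

for entry in names:
    print(entry["name"], "->", relationship(entry))
```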

This will work for most situations, but it does still leave room for error.  Where a person is named in the 700s with no indicator of role, it is possible that they are a person who was associated with one particular item, rather than the manifestation – a former owner or bookseller, for example.  While we do want to present this information, which works as another access point, and may be of interest to users, we have the problem that this information should really be associated with the item, not our quasi-manifestation.  This information only concerns one specific physical item, as described in one of the individual bibliographic records. Should it really have a link to our central unit of description?  If not, where do we link it to?  Our entries for individual bib records describe only the records themselves, not a physical real-world item.  It’s an interesting point, and one we’ll be discussing more as the project goes on.

We’re continuing to work with Copac data, and will discuss other issues here as they arise.