I spent the last couple of days in Manchester at the “end of programme” meeting for the JISCexpo programme under which LOCAH is funded. It was a pretty busy couple of days with representatives of all the projects talking about their projects and their experiences and some of the issues arising.

Yesterday I found myself as “scribe” for a discussion on the “co-referencing” question, i.e. how to deal with the fact that different data providers assign and use different URIs for “the same thing”. And these are my rather hasty notes of that discussion.

  • the creation/use of co-references is inevitable; people will always end up creating URIs for things for which URIs already exist;
  • one approach to this problem has been the use of the owl:sameAs property. However, using this property makes a very “strong” assertion of equivalence with consequences in terms of inferencing
  • the actual use of properties sometimes introduces a dimension of “social/community semantics” that may be at odds with the “semantics” provided by the creator/owner of a term
  • the notion of “sameness” is often qualified by a degree of confidence, a “similarity score”, rather than being a statement of certainty
  • the notion of “sameness”/similarity is often context-sensitive: rather than saying “X and Y are names for the same thing in all contexts”, we probably want to say something closer to “for the purposes of this application, or in this context, it’s sufficient to work on the basis that X and Y are names for the same thing”
  • is there a contrast between approaches based on “top-down” “authority” and those based more on context-dependent “grouping”?
  • how do we “correct” assertions which turn out to be “wrong”?
  • we decide whether to make use of such assertions made by other parties, and those decisions are based on an understanding of their source: who made them, on what basis etc.
  • such assessment may include a consideration of how many sources made/support an assertion
  • it is easy for assertions of similarity to become “detached” from such information about provenance/attribution (if it is provided at all!)
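To make the contrast in the discussion above concrete, here is a small sketch (in the RDF/XML serialisation we use elsewhere in the project, with invented example URIs) of the difference between a “strong” owl:sameAs assertion and a weaker SKOS mapping assertion. Note the caveat that the SKOS mapping properties are strictly defined for skos:Concept instances, so applying one to, say, a person is itself a pragmatic compromise:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <!-- Strong assertion: under OWL inference, anything said about
       one URI can be inferred to hold for the other -->
  <rdf:Description rdf:about="http://example.org/id/person/smithj">
    <owl:sameAs rdf:resource="http://other.example.net/people/john-smith"/>
  </rdf:Description>

  <!-- Weaker assertion: "close enough for some applications",
       with no entailment of full interchangeability -->
  <rdf:Description rdf:about="http://example.org/id/person/smithj">
    <skos:closeMatch rdf:resource="http://other.example.net/people/john-smith"/>
  </rdf:Description>
</rdf:RDF>
```

The choice between the two is exactly the “context-sensitivity” point above: owl:sameAs commits every consumer to full equivalence, while a mapping property leaves the decision about “sameness for my purposes” with the application.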

Some references:

Explaining Linked Data to Your Pro Vice Chancellor

At the JISCEXPO Programme meeting today I led a session on ‘Explaining linked data to your Pro Vice Chancellor’, and this post is a summary of that session. The attendees were: myself (Adrian Stevenson), Rob Hawton, Alex Dutton, and Zeth, with later contributions from Chris Gutteridge.

It seemed clear to us that this is really about focussing on institutional administrative data, as it’s probably harder to sell the idea of providing research data in linked data form to the Pro VC. Linked data probably doesn’t allow you to do things that you couldn’t do by other means, but in the long run, once you’ve got your linked data available, it is easier than other approaches. Linked Data can be of value without having to be open:

“Southampton’s data is used internally. You could draw a ring around the data and say ‘that’s closed’, and it would still have the same value.”

== Benefits ==

Quantifying the value of linked data efficiencies can be tricky, but providing open data allows quicker development of tools, as the data the tools hook into already exists and is standardised.

== Strategies ==

Don’t mention the term ‘linked data’ to the Pro VC, or get into discussing the technology. It’s about the outcomes and the solutions, not the technologies. Getting ‘Champions’ who have the ear of the Pro VC will help.  Some enticing prototype example mash-up demonstrators that help sell the idea are also important. Also, pointing out that other universities are deploying and using linked open data to their advantage may help. Your University will want to be part of the club.

Making it easy for others to supply data that can be utilised as part of linked data efforts is important. This can be via Google spreadsheets or e-mailed spreadsheets, for example. You need to offload the difficult jobs to the people who are motivated and know what they’re doing.

It will also help to sell the idea to other potential consumers, such as libraries and other data providers. For libraries, one possible selling point is the “increasing prominence of holdings”, which helps bring attention and re-use.

It’s worth emphasising that linked data simplifies the Freedom of Information (FOI) process: we can say “yes, we’ve already published that FOI data”. You have a responsibility to publish this data if asked via FOI anyway. This is an example of a sheer curation approach.

Linked data may provide decreased bureaucracy. There’s no need to ask other parts of the University for their data, wasting their time, if it’s already published centrally. Examples here are estates, HR, library, student statistics.

== Targets ==

Some possible targets are: saving money, bringing in new business, funding, students.

The potential for increased business intelligence is a great sell, and Linked Data can provide the means to do this. Again, you need to sell a solution to a problem, not a technology. The University ‘implementation’ managers need to be involved and brought on board as well as the Pro VC.

It can be a problem that some institutions adopt a ‘best of breed’ policy with technology. Linked data doesn’t fit too well with this. However, it’s worth noting that Linked Data doesn’t need to change the user experience.

A lot of the arguments being made here don’t just apply to linked data. Much is about issues such as opening access to data generally. It was noted that there have been many efforts from JISC to solve the institutional data silo problem.

If we were setting a new University up from scratch, going for Linked Data from the start would be a realistic option, but it’s always hard to change currently embedded practice. Universities having Chief Technology Officers would help here, or perhaps a PVC for Technology?

Putting the Case for Linked Data

This is a summary of a break-out group discussion at the JISC Expo Programme meeting, July 2011, looking at ‘Skills required for Linked Data’.

We started off by thinking about the first steps when deciding to create Linked Data. We took a step back from the skills required and thought more about the basic need for understanding, and the importance of putting the case for Linked Data (or otherwise).

Do you have suitable data?

Firstly, do you have data that is suitable to output as Linked Data? This comes down to the question: what is suitable data? It would be useful to provide more advice in this area.

Is Linked Data worth doing?

Why do you want Linked Data? Maybe you are producing data that others will find interesting and link into? If you give your data identifiers, others can link into it. But is Linked Data the right approach? Is what you really want open data more than Linked Data? Or just APIs into the data? Sometimes a simpler solution may give you the benefits that you are after.

Maybe for some organisations approaches other than Linked Data are appropriate, or are a way to start off – maybe just something as simple as outputting CSV. You need to think about what is appropriate for you. By putting your data out in a more low-barrier way, you may be able to find out more about who might use your data. However, there is an argument that it is very early days for Linked Data, and low levels of use right now may not reflect the potential for the data and how it is used in the future.

Are you the authority on the data? Is someone else the authority? Do you want to link into their stuff? These are the sorts of questions you need to be thinking about.

The group agreed that use cases would be useful here. They could act as a hook to bring people in. Maybe there should be somewhere to go to look up use cases – people can get a better idea of how they (their users) would benefit from creating Linked Data, they can see what others have done and compare their situation.

We talked around the issues involved in making a case for Linked Data. It would be useful if there was more information for people on how it can bring things to the fore. For example, we talked about a set of photographs – a single photograph might be seen in a new context, with new connections made that can help to explain it and what it signifies.

What next?

There does appear to be a move of Linked Data from a ‘clique’ into the mainstream – this should make it easier to understand and engage with. There are more tutorials, more support, more understanding. New tools will be developed that will make the process easier.

You need to think about different skills – data modelling and data transformation are very different things. We agreed that development is not always top down. Developers can be very self-motivated, in an environment where a continual learning process is often required. It may be that organisations will start to ask for skills around Linked Data when hiring people.

We felt that there is still a need for more support and more tutorials. We should move towards a critical mass, where questions raised are being answered and developers have more of a sense that there is help out there and they will get those answers. It can really help to talk to other developers, so providing opportunities for this is important. The JISC Expo projects were tasked with providing documentation – explaining clearly what they have done to help others. We felt that these projects have helped to progress the Linked Data agenda, and that requiring processes and results to be written up plays an important part in encouraging people to acquire these skills.

Realistically, for many people, expertise needs to be brought in, as most organisations do not have in-house resources to call upon. Often this is going to be cheaper than up-skilling – a steep learning curve can take weeks or months to negotiate, whereas someone expert in this domain could do the work in just a few days. We talked about a role for (JISC) data centres in contributing to this kind of thing. However, we did acknowledge the important contribution that conferences, workshops and other events play in getting people familiar with Linked Data from a range of perspectives (as users of the data as well as providers). It can be useful to have tutorials that address your particular domain – data that you are familiar with. Maybe we need a combination of approaches – it depends where you are starting from and what you want to know. But for many people, the need to understand why Linked Data is useful and worth doing is an essential starting point.

We saw the value in having someone involved who is outward facing – otherwise there is a danger of a gap between the requirements of people using your data and what you are doing. There is a danger of going off in the wrong direction.

We concluded that for many, Linked Data is still a big hill to climb. People still need a hand up. We also agreed that Linked Data will get good press if there are products that people can understand – they need to see the benefits.

As maybe there is still an element of self-doubt about Linked Data, it is essential not just to output the data but to raise its profile, to advocate what you have done and why. Enthusiasm can start small but it can quickly spread out.

Finally, we agreed that people don’t always know when products are built around Linked Data, so you may not realise how it is benefitting you. We need to explain what we have done, as well as providing the attractive interface/product, and we need to relate it to what people are familiar with.

Final Product Post: Archives Hub EAD to RDF XSLT Stylesheet

Archives Hub EAD to RDF XSLT Stylesheet

Please note: Although this is the ‘final’ formal post of the LOCAH JISC project, it will not be the last post. Our project is due to complete at the end of July, and we still have plenty to do, so there’ll be more blog posts to come.

Users this product is for: Archives Hub contributors, EAD-aware archivists, software developers, technical librarians, JISC Discovery Programme (SALDA Project), BBC Digital Space.

Description of prototype/product:

We consider the Archives Hub EAD to RDF XSLT stylesheet to be a key product of the Locah project. The stylesheet encapsulates the Locah-developed Linked Data model and provides a simple, standards-based means to transform archival data to Linked Data RDF/XML. It can straightforwardly be re-used and re-purposed by anyone wishing to transform archival data in EAD form to Linked Data-ready RDF/XML.

The stylesheet is available directly from

The stylesheet is the primary source from which we were able to develop our main access point to the Archives Hub Linked Data. This access point provides both human- and machine-readable views of our Linked Data, as well as access to our SPARQL endpoint for querying the Hub data and a bulk download of the entire Locah Archives Hub Linked Dataset.

The stylesheet also provided the means necessary to supply data for our first ‘Timemap’ visualisation prototype. This visualisation currently allows researchers to access the Hub data by a small range of pre-selected subjects: travel and exploration, science and politics. Having selected a subject, the researcher can then drag a time slider to view the spread of a range of archive sources through time. If a researcher then selects an archive she/he is interested in on the timeline, a pin appears on the map below showing the location of the archive, and a call-out box appears providing some simple information such as the title, size and dates of the archive. We hope to include data from other Linked Data sources, such as Wikipedia, in these information boxes.

This visualisation of the Archives Hub data and links to other data sets provides an intuitive view to the user that would be very difficult to provide by means other than exploiting the potential of Linked Data.

Please note these visualisations are currently still work in progress:

Screenshots: home page:

Screenshot of homepage

Prototype visualisation for subject ‘science’ (work in progress):

Screenshot of Locah Visualisation for subject ‘science’

Working prototype/product:

There are a large number of resources available on the Web for using XSLT stylesheets, as well as our own ‘XSLT’ tagged blog posts.

Instructional documentation:

Our instructional documentation can be found in a series of posts, all tagged with ‘instructionaldocs’. We provide instructional posts on the following main topics:

Project tag: locah

Full project name: Linked Open Copac Archives Hub

Short description: A JISC-funded project working to make data from Copac and the Archives Hub available as Linked Data.

Longer description: The Archives Hub and Copac national services provide a wealth of rich interdisciplinary information that we will expose as Linked Data. We will be working with partners who are leaders in their fields: OCLC, Talis and Eduserv. We will be investigating the creation of links between the Hub, Copac and other data sources including DBPedia and the BBC, as well as links with OCLC for name authorities and with the Library of Congress for subject headings. This project will put archival and bibliographic data at the heart of the Linked Data Web, making new links between diverse content sources, enabling the free and flexible exploration of data and enabling researchers to make new connections between subjects, people, organisations and places to reveal more about our history and society.

Key deliverables: Output of structured Linked Data for the Archives Hub and Copac services. A prototype visualisation for browsing archives by subject, time and location. Opportunities and barriers reporting via the project blog.

Lead Institution: UKOLN, University of Bath

Person responsible for documentation: Adrian Stevenson

Project Team: Adrian Stevenson, Project Manager (UKOLN); Jane Stevenson, Archives Hub Manager (Mimas); Pete Johnston, Technical Researcher (Eduserv); Bethan Ruddock, Project Officer (Mimas); Yogesh Patel, Software Developer (Mimas); Julian Cheal, Software Developer (UKOLN). Read more about the LOCAH Project team.

Project partners and roles: Talis are our technology partner on the project, providing us with access to store our data in the Talis Store. Leigh Dodds and Tim Hodson are our main contacts at the company. OCLC also partnered, mainly to help with VIAF. Our contacts at OCLC are John MacColl, Ralph LeVan and Thom Hickey. Ed Summers is also helping us out as a voluntary consultant.

The address of the LOCAH Project blog is . The main atom feed is

All reusable program code produced by the Locah project will be available as free software under the Apache License 2.0. You will be able to get the code from our project SourceForge repository.

The LOCAH dataset content is licensed under a Creative Commons CC0 1.0 licence.

The contents of this blog are available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

LOCAH Datasets
LOCAH Blog Content
Locah Code

Project start date: 1st Aug 2010
Project end date: 31st July 2011
Project budget: £100,000

LOCAH was funded by JISC as part of the #jiscexpo programme. See our JISC PIMS project management record.

Transforming EAD XML into RDF/XML using XSLT

This is a (brief!) second post revisiting my “process” diagram from an early post. Here I’ll focus on the “transform” process on the left of the diagram:

Diagram showing process of transforming EAD to RDF and exposing as Linked Data

The “transform” process is currently performed using XSLT to read an EAD XML document and output RDF/XML, and the current version of the stylesheet is now available:

(The data currently available via was actually generated using the previous version. The 20110630 version includes a few tweaks and bug fixes which will be reflected when we reload the data, hopefully within the next week.)

As I’ve noted previously, we initially focused our efforts on processing the set of EAD documents held by the Archives Hub, and on the particular set of markup conventions recommended by the Hub for data contributors – what I sometimes referred to as the Archives Hub EAD “profile” – though in practice, the actual dataset we’ve worked with encompasses a good degree of variation. But it remains the case that the transform is really designed to handle the set of EAD XML documents within that particular dataset rather than EAD in general. (I admit that it also remains somewhat “untidy” – the date handling is particularly messy! And parts of it were developed in a rather ad hoc fashion as I amended things as I encountered new variations in new batches of data. I should try to spend some time cleaning it up before the end of the project.)
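For readers unfamiliar with the general shape of such a transform, the sketch below shows the bare bones of the approach: match an EAD structure, mint a URI for the resource it describes, and emit an rdf:Description. This is an illustration only, not the Locah stylesheet itself – it assumes un-namespaced EAD 2002 input, the URI pattern is invented, and dcterms:title stands in for whatever properties the actual Locah model uses:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">

  <xsl:output method="xml" indent="yes"/>

  <!-- One rdf:Description for the top-level archival description -->
  <xsl:template match="/ead">
    <rdf:RDF>
      <rdf:Description>
        <!-- Mint a URI from the unit id (hypothetical URI pattern) -->
        <xsl:attribute name="rdf:about">
          <xsl:value-of select="concat('http://example.org/id/findingaid/',
                                       normalize-space(archdesc/did/unitid))"/>
        </xsl:attribute>
        <dcterms:title>
          <xsl:value-of select="normalize-space(archdesc/did/unittitle)"/>
        </dcterms:title>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>

  <!-- Suppress any text content not explicitly handled -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

The real stylesheet does far more than this – multiple resource types, component-level descriptions, names, places and dates – but every part of it follows this same match-and-emit pattern.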

Over the last few months, I’ve also been working on another JISC-funded project, SALDA, with Karen Watson and Chris Keene of the University of Sussex Library, focusing on making available their catalogue data for the Mass Observation Archive as Linked Data.

I wrote a post over on the SALDA blog on how I’d gone about applying and adapting the transform we developed in LOCAH for use with the SALDA data. That work has prompted me to think a bit more about the different facets of the data and how they are reflected in aspects of the transform process:

  • aspects which are generic/common to all EAD documents
  • aspects which are common to some quite large subset of EAD documents (like the Archives Hub dataset, with its (more or less) common set of conventions)
  • aspects which are “generic” in some way, but require some sort of “local” parameterisation – here, I’m thinking of the sort of “name/keyword lookup” techniques I describe in the SALDA post: the technique is broadly usable but the “lookup tables” used would vary from one dataset to another
  • aspects which reflect very specific, “local” characteristics of the data – e.g., some of the SALDA processing is based on testing for text patterns/structures which are very particular to the Mass Observation catalogue data
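The “local parameterisation” facet in the third bullet can be sketched in XSLT 1.0 as an external lookup document read with the document() function, so that the table can be swapped per dataset without touching the transform logic. The file name and structure here are invented for illustration:

```xml
<!-- lookup.xml (hypothetical): maps name strings found in the data
     to externally defined URIs; one such file per dataset.

     <lookup>
       <entry key="Mass Observation Archive"
              uri="http://example.org/id/agent/mass-observation"/>
     </lookup>
-->

<!-- In the stylesheet: resolve a name via the dataset-specific table -->
<xsl:variable name="lookup"
              select="document('lookup.xml')/lookup/entry"/>

<xsl:template name="name-to-uri">
  <xsl:param name="name"/>
  <xsl:value-of select="$lookup[@key = normalize-space($name)]/@uri"/>
</xsl:template>
```

The technique is the broadly usable part; only the lookup table itself is “local” to a given dataset.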

What I’d like to do (but haven’t done yet) is to reorganise the transform to try to make it a little more “modular” and to separate the “general”/”generic” from the “local”/”specific”, so that it might be easier for other users to “plug in” components more suitable for their own data.