LOCAH continues as the ‘Linking Lives’ Project

On doing a bit of spring cleaning around here, I’ve noticed that we haven’t been linking very clearly to the project blog for ‘Linking Lives‘, the Locah continuation project, so here it is:


Linking Lives logo

Linking Lives is exploring ways to present Linked Data. It’s aiming to show that archives can benefit from being presented as a part of the diverse data sources on the Web to create full biographical pictures, enabling researchers to make connections between people and events.

Here’s the blurb from the Linking Lives ‘About Us’ page:

“The Linking Lives project (2011-12) is a follow on from the Locah project (2010-11) that created Linked Data for a sub-set of Archives Hub and Copac data. The Locah blog documents the whole process, from the data modelling through to decisions about URIs, external datasets and visualisation work.

The primary aim of Linking Lives is to explore ways to present Linked Data for the benefit of research. The Archives Hub data is rich in information about people and organisations, but many researchers want to access a whole range of data sources in order to get a full perspective for their research. We should recognise that researchers may not just be interested in archives. Indeed, they may not really have thought about using primary source material, but they may be very interested in biographical information, known and unknown connections, events during a person’s lifetime, etc. We want to show that archives can benefit from being presented not in isolation, but as a part of all of the diverse data sources that can be found to create a full biographical picture, and to enable researchers to make connections between people and events to create different narratives.

We will create a new Web interface that presents useful resources relating to individual people, and potentially organisations as well. We will explore various external data sources, assessing their viability and ease of use from both a Linked Data perspective (adding them to our Linked Data output) and a researcher’s perspective (adding them to the user interface).

We have many ideas about what we can do – the possibilities for this type of work are endless – but with limited time and resources we will have to prioritise, test out various options and see what works and what doesn’t and what each option requires to implement.

In addition to the creation of an interface, we want to think about the pressing issues for Linked Data: provenance, trust, authenticity. By creating an interface for researchers, we will be able to gain a greater appreciation of whether this type of approach is effective. We will be evaluating the work, asking researchers to feedback to us, and, of course, we will also be able to see evidence of use of the site through our Web logs.

We’ll be updating you via this blog, and we are very interested in any thoughts that you have about the work, so please do leave comments, or contact us directly.”

Explaining Linked Data to Your Pro Vice Chancellor

At the JISCEXPO Programme meeting today I led a session on ‘Explaining linked data to your Pro Vice Chancellor’, and this post is a summary of that session. The attendees were: myself (Adrian Stevenson), Rob Hawton, Alex Dutton, and Zeth, with later contributions from Chris Gutteridge.

It seemed clear to us that this is really about focussing on institutional administrative data, as it’s probably harder to sell the idea of providing research data in linked data form to the Pro VC. Linked data probably doesn’t allow you to do things that you couldn’t do by other means, but it is easier than other approaches in the long run, once you’ve got your linked data available. Linked Data can be of value without having to be open:

“Southampton’s data is used internally. You could draw a ring around the data and say ‘that’s closed’, and it would still have the same value.”

== Benefits ==

Quantifying the value of linked data efficiencies can be tricky, but providing open data allows quicker development of tools, as the data the tools hook into already exist and are standardised.

== Strategies ==

Don’t mention the term ‘linked data’ to the Pro VC, or get into discussing the technology. It’s about the outcomes and the solutions, not the technologies. Getting ‘Champions’ who have the ear of the Pro VC will help.  Some enticing prototype example mash-up demonstrators that help sell the idea are also important. Also, pointing out that other universities are deploying and using linked open data to their advantage may help. Your University will want to be part of the club.

Making it easy for others to supply data that can be utilised as part of linked data efforts is important. This can be via Google spreadsheets, or e-mailing spreadsheets for example. You need to offload the difficult jobs to the people who are motivated and know what they’re doing.

It will also help to sell the idea to other potential consumers, such as the libraries, and other data providers. Possibly sell on the idea of the “increasing prominence of holdings” for libraries. This helps bring attention and re-use.

It’s worth emphasising that linked data simplifies the Freedom of Information (FOI) process. We can say “yes, we’ve already published that FOI data”. You have a responsibility to publish this data if asked via FOI anyway. This is an example of a sheer curation approach.

Linked data may provide decreased bureaucracy. There’s no need to ask other parts of the University for their data, wasting their time, if it’s already published centrally. Examples here are estates, HR, library, student statistics.

== Targets ==

Some possible targets are: saving money, bringing in new business, funding, students.

The potential for increased business intelligence is a great sell, and Linked Data can provide the means to do this. Again, you need to sell a solution to a problem, not a technology. The University ‘implementation’ managers need to be involved and brought on board as well as the Pro VC.

It can be a problem that some institutions adopt a ‘best of breed’ policy with technology. Linked data doesn’t fit too well with this. However, it’s worth noting that Linked Data doesn’t need to change the user experience.

A lot of the arguments being made here don’t just apply to linked data. Much is about issues such as opening access to data generally. It was noted that there have been many efforts from JISC to solve the institutional data silo problem.

If we were setting a new University up from scratch, going for Linked Data from the start would be a realistic option, but it’s always hard to change currently embedded practice. Universities having Chief Technology Officers would help here, or perhaps a PVC for Technology?

Final Product Post: Archives Hub EAD to RDF XSLT Stylesheet

Archives Hub EAD to RDF XSLT Stylesheet

Please note: Although this is the ‘final’ formal post of the LOCAH JISC project, it will not be the last post. Our project is due to complete at the end of July, and we still have plenty to do, so there’ll be more blog posts to come.

User this product is for: Archives Hub contributors, EAD aware archivists, software developers, technical librarians, JISC Discovery Programme (SALDA Project), BBC Digital Space.

Description of prototype/product:

We consider the Archives Hub EAD to RDF XSLT stylesheet to be a key product of the Locah project. The stylesheet encapsulates both the Locah developed Linked Data model and provides a simple standards-based means to transform archival data to Linked Data RDF/XML. The stylesheet can straightforwardly be re-used and re-purposed by anyone wishing to transform archival data in EAD form to Linked Data ready RDF/XML.

The stylesheet is available directly from http://data.archiveshub.ac.uk/xslt/ead2rdf.xsl
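To give a feel for the kind of mapping such a transformation performs, here is a minimal sketch in Python rather than XSLT. It is not the Locah stylesheet and does not reproduce the Locah data model: the base URI, the choice of Dublin Core terms and the sample record are all assumptions made purely for illustration. It reads three fields from the `<did>` element of an EAD description and emits them as N-Triples lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical base URI, used only for this illustration.
BASE = "http://example.org/id/archivalresource/"
DCTERMS = "http://purl.org/dc/terms/"

# Invented sample record using standard EAD element names.
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>gb123-abc</unitid>
      <unittitle>Papers of an Example Person</unittitle>
      <unitdate>1900-1950</unitdate>
    </did>
  </archdesc>
</ead>"""

# Assumed mapping from EAD elements to Dublin Core properties.
FIELD_MAP = (("unitid", "identifier"), ("unittitle", "title"), ("unitdate", "date"))


def ead_to_ntriples(ead_xml):
    """Map a few <did> fields of an EAD description to N-Triples lines."""
    root = ET.fromstring(ead_xml)
    did = root.find("./archdesc/did")
    if did is None:
        return []
    unitid = (did.findtext("unitid") or "").strip()
    if not unitid:
        return []  # without an identifier we cannot mint a subject URI
    subject = "<" + BASE + unitid + ">"
    triples = []
    for tag, prop in FIELD_MAP:
        value = (did.findtext(tag) or "").strip()
        if value:
            # Escape backslashes and double quotes for the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append('%s <%s%s> "%s" .' % (subject, DCTERMS, prop, literal))
    return triples


if __name__ == "__main__":
    for line in ead_to_ntriples(SAMPLE_EAD):
        print(line)
```

A real transformation has to deal with much more than this, such as multi-level descriptions, name and subject index terms, and the modelling decisions discussed in the earlier posts.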

The stylesheet is the primary source from which we were able to develop data.archiveshub.ac.uk, our main access point to the Archives Hub Linked Data. Data.archiveshub.ac.uk provides access to both human and machine-readable views of our Linked Data, as well as access to our SPARQL endpoint for querying the Hub data and a bulk download of the entire Locah Archives Hub Linked Dataset.

The stylesheet also provided the means necessary to supply data for our first ‘Timemap’ visualisation prototype. This visualisation currently allows researchers to access the Hub data by a small range of pre-selected subjects: travel and exploration, science and politics. Having selected a subject, the researcher can then drag a time slider to view the spread of a range of archive sources through time. If a researcher then selects an archive she/he is interested in on the timeline, a pin appears on the map below showing the location of the archive, and a call-out box appears providing some simple information such as the title, size and dates of the archive. We hope to include data from other Linked Data sources, such as Wikipedia, in these information boxes.

This visualisation of the Archives Hub data and links to other data sets provides an intuitive view to the user that would be very difficult to provide by means other than exploiting the potential of Linked Data.

Please note these visualisations are currently still work in progress:


Data.archiveshub.ac.uk home page:

Screenshot of data.archiveshub.ac.uk homepage

Prototype visualisation for subject ‘science’ (work in progress):

Locah Visualisation for subject ‘science’

Working prototype/product:


There are a large number of resources available on the Web for using XSLT stylesheets, as well as our own ‘XSLT’ tagged blog posts.

Instructional documentation:

Our instructional documentation can be found in a series of posts, all tagged with ‘instructionaldocs‘. We provide instructional posts on the following main topics:

Project tag: locah

Full project name: Linked Open Copac Archives Hub

Short description: A JISC-funded project working to make data from Copac and the Archives Hub available as Linked Data.

Longer description: The Archives Hub and Copac national services provide a wealth of rich interdisciplinary information that we will expose as Linked Data. We will be working with partners who are leaders in their fields: OCLC, Talis and Eduserv. We will be investigating the creation of links between the Hub, Copac and other data sources including DBPedia, data.gov.uk and the BBC, as well as links with OCLC for name authorities and with the Library of Congress for subject headings. This project will put archival and bibliographic data at the heart of the Linked Data Web, making new links between diverse content sources, enabling the free and flexible exploration of data and enabling researchers to make new connections between subjects, people, organisations and places to reveal more about our history and society.

Key deliverables: Output of structured Linked Data for the Archives Hub and Copac services. A prototype visualisation for browsing archives by subject, time and location. Opportunities and barriers reporting via the project blog.

Lead Institution: UKOLN, University of Bath

Person responsible for documentation: Adrian Stevenson

Project Team: Adrian Stevenson, Project Manager (UKOLN); Jane Stevenson, Archives Hub Manager (Mimas); Pete Johnston, Technical Researcher (Eduserv); Bethan Ruddock, Project Officer (Mimas); Yogesh Patel, Software Developer (Mimas); Julian Cheal, Software Developer (UKOLN). Read more about the LOCAH Project team.

Project partners and roles: Talis are our technology partner on the project, providing us with access to store our data in the Talis Store. Leigh Dodds and Tim Hodson are our main contacts at the company. OCLC also partnered, mainly to help with VIAF. Our contacts at OCLC are John MacColl, Ralph LeVan and Thom Hickey. Ed Summers is also helping us out as a voluntary consultant.

The address of the LOCAH Project blog is http://archiveshub.ac.uk/locah/ . The main atom feed is http://archiveshub.ac.uk/locah/feed/atom

All reusable program code produced by the Locah project will be available as free software under the Apache License 2. You will be able to get the code from our project sourceforge repository.

The LOCAH dataset content is licensed under a Creative Commons CC0 1.0 licence.

The contents of this blog are available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

LOCAH Datasets
LOCAH Blog Content
Locah Code

Project start date: 1st Aug 2010
Project end date: 31st July 2011
Project budget: £100,000

LOCAH was funded by JISC as part of the #jiscexpo programme. See our JISC PIMS project management record.

Lifting the Lid on Linked Data at ELAG 2011

Jane and I have just given our ‘Lifting the Lid on Linked Data’ presentation at the ELAG (European Library Automation Group) Conference 2011 in Prague today. It seemed to go pretty well. There were a few comments about the licensing situation for the Copac data on the #elag2011 twitter stream, which is something we’re still working on.

[slideshare id=8082967&doc=elag2011-locah-110524105057-phpapp02]

Archives Hub Linked Data Release

We’re very pleased to announce the release of http://data.archiveshub.ac.uk, the first Linked Data set produced by the LOCAH project. The team has been working hard since the beginning of the project on modelling the complex archival data and transforming it into RDF Linked Data. This is now available in a variety of forms via the data.archiveshub.ac.uk home page. A number of previous blog posts outline the modelling and transformation process, the RDF terms used in the data, and the challenges and opportunities arising along the way. A forthcoming post will provide some example queries for accessing data from the SPARQL query endpoint. The data and content is licensed under a Creative Commons CC0 1.0 licence.
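As a rough idea of what querying a SPARQL endpoint involves, the sketch below builds a request in Python following the standard SPARQL Protocol, where the query travels in a `query` parameter of an HTTP GET. The endpoint URL shown is a placeholder and an assumption, not the real address: use the endpoint linked from the data.archiveshub.ac.uk home page. The query itself is a generic one that simply asks for ten arbitrary triples.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (an assumption): replace with the real SPARQL
# endpoint linked from the data.archiveshub.ac.uk home page.
ENDPOINT = "http://example.org/sparql"

# Generic query: fetch any ten triples from the store.
QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def sparql_headers():
    """Ask for results in the standard SPARQL JSON results format."""
    return {"Accept": "application/sparql-results+json"}


if __name__ == "__main__":
    # Printing only; sending the request is left to the reader, for example
    # with urllib.request.Request(url, headers=sparql_headers()).
    print(sparql_get_url(ENDPOINT, QUERY))
```

Whether a given endpoint returns JSON, XML or another format for that `Accept` header depends on the store behind it, so check the response before relying on a particular format.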

We’re working on a visualisation prototype that provides an example of how we link the Hub Data with other Linked Data sources on the Web using our enhanced dataset to provide a useful graphical resource for researchers.

One important point to note is that this initial release is a selected subset, representative of the Hub collection descriptions as a proof of concept, and does not contain the full Archives Hub dataset at present, although we are very keen to explore this in the future.

We still have some work to do, this being the initial release of the Hub data. Some revisions for a later release will address a few issues including reconciling our internal person and subject names, and will also contain some further enhancements to the data to include links to Library of Congress subject headings and further links to DBPedia based on subject terms. We also hope to include links for place names using Geonames and Ordnance Survey.

We encourage feedback on the data, the model and any other aspect of data.archiveshub.ac.uk, so please leave comments or contact us directly.

We are also working hard on our other main LOCAH release, the Copac Linked Data. Our first version of the model for this is now finished, and we have the data in our test triple store. We hope to release this in about a month’s time.

I’d personally like to thank the LOCAH team for all their hard work on this exciting and challenging project. I’d also like to thank our technology partner, Talis for kindly providing our Linked Data store.

LOD-LAM: International Linked Open Data in Libraries, Archives, and Museums Summit

I’m really pleased to announce that I was asked to join the organising committee for the International Linked Open Data in Libraries, Archives, and Museums Summit that will take place this June 2-3, 2011 in San Francisco, California, USA. There’s still time to apply until February 28th, and funding is available to help cover travel costs.

The International Linked Open Data in Libraries, Archives, and Museums Summit (“LOD-LAM”) will convene leaders in their respective areas of expertise from the humanities and sciences to catalyze practical, actionable approaches to publishing Linked Open Data, specifically:

  • Identify the tools and techniques for publishing and working with Linked Open Data.
  • Draft precedents and policy for licensing and copyright considerations regarding the publishing of library, archive, and museum metadata.
  • Publish definitions and promote use cases that will give LAM staff the tools they need to advocate for Linked Open Data in their institutions.

For more information see http://lod-lam.net/summit/about/.

The principal organiser/facilitator is Jon Voss (@LookBackMaps), Founder of LookBackMaps, along with Kris Carpenter Negulescu, Director of Web Group, Internet Archive, who is project managing.

I’m very chuffed to be part of the illustrious Organising Committee:

Lisa Goddard (@lisagoddard), Acting Associate University Librarian for Information Technology, Memorial University Libraries.
Martin Kalfatovic (@UDCMRK), Assistant Director, Digital Services Division at Smithsonian Institution Libraries and the Deputy Project Director of the Biodiversity Heritage Library.
Mark Matienzo (@anarchivist), Digital Archivist in Manuscripts and Archives at the Yale University Library.
Mia Ridge (@mia_out), Lead Web Developer & Technical Architect, Science Museum/NMSI (UK)
Tim Sherratt (@wragge), National Museum of Australia & University of Canberra
MacKenzie Smith, Research Director, MIT Libraries.
Adrian Stevenson (@adrianstevenson), UKOLN; Project Manager, LOCAH Linked Data Project.
John Wilbanks (@wilbanks), VP of Science, Director of Science Commons, Creative Commons.

It’ll be a great event I’m sure, so get your application in ASAP.

Locah Lightning at Dev8d

This is just a quick post to say that I’ll be giving a “lightning talk” on the Locah project at 2.45pm this Wednesday 16th February at the Dev8d developer event in London. If you’ve got any questions or would like to know more about the project, then please come along to the session. I should be at Dev8d for the full two days, so grab me anytime if you can’t make the session.

I’ll also be participating in a panel session on Linked Data, but I’m not sure when this is scheduled for yet.

Abstract for the talk:

“The Locah project is making records from the Archives Hub service and Copac service available as Linked Data. The Archives Hub is an aggregation of archival metadata from repositories across the UK; Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries. In each case the aim is to provide Linked Data according to the principles set out by Tim Berners-Lee, so that we make our data interconnected with other data and contribute to the growth of the Semantic Web. The talk will touch on data modelling, the selection of vocabularies and the design of URI patterns. It will look at the practical realities of how we are turning the Archives Hub EAD data and Copac MODS data into RDF XML, and then loading it into triple stores. The talk will conclude with a look at some of the main opportunities and barriers to the creation and use of Linked Data. There will be a panel session on linked data where delegates can ask further questions.”

I’ve also added my tune to the Dev8d playlist, the sublime ‘French Disko‘ by Stereolab.

Postscript 22nd February 2011:

I’ve now uploaded my slides from this talk to slideshare and embedded them below. The talk was primarily aimed at developers with the assumption that they knew a bit about RDF and Linked Data, so it doesn’t discuss these except in passing. I was mainly trying to give some specifics on the technicalities involved, and what platforms and tools we’re using, so people can follow the same path if they want. Please comment below with any questions.

It was another great #dev8d this year, and especially useful for me in terms of learning more about Linked Data related technologies. Top job to organiser Mahendra and the rest of the UKOLN team involved.

[slideshare id=7000641&doc=dev8d2011-110221081440-phpapp02]

LOCAH Project – Projected Timeline, Workplan & Overall Project Methodology

Project Plan

WP1:  Project Management.

  • Project management to support the project, the relationships with project partners, and with the funders.

WP2:  Data Modelling

  • Model Archives Hub EAD data and Copac data to RDF

WP3:  Technical Development – Linked Data Interface

  • Transform the modelled data to RDF/XML.
  • Enrich Hub and Copac data with data/links from sources such as DBPedia, BBC, LOC, VIAF, Musicbrainz, Freebase
  • Provide both RDF and HTML documents for Archives Hub and Copac resources with stable well designed URIs
  • Provide a SPARQL endpoint for the Hub Linked Data resources
  • Look at feasibility of providing RESTful API interface to the Hub and Copac Linked Data resources
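One common way to serve both RDF and HTML documents for the same resource is HTTP content negotiation, where the server inspects the client’s `Accept` header and points it at the matching document. The workplan does not say how this will be implemented, so the following Python sketch is only an illustration: the `/id/`, `/doc/` and `/data/` URI layout and the function names are hypothetical, and the `Accept` parsing is deliberately simplified.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an HTTP Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(resource_id, accept_header):
    """Pick the document path to redirect to for a resource named /id/<resource_id>.

    Hypothetical layout: /data/<id>.rdf is the RDF/XML document and
    /doc/<id>.html is the human-readable page.
    """
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type == "application/rdf+xml":
            return "/data/" + resource_id + ".rdf"
        if media_type in ("text/html", "application/xhtml+xml", "text/*", "*/*"):
            return "/doc/" + resource_id + ".html"
    # Fall back to the HTML page when nothing recognised was requested.
    return "/doc/" + resource_id + ".html"


if __name__ == "__main__":
    print(negotiate("example", "application/rdf+xml"))
    print(negotiate("example", "text/html,application/xhtml+xml,*/*;q=0.8"))
```

Keeping the identifier for the thing itself separate from the documents that describe it is what allows the URIs to stay stable if the document formats change later.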

WP4: Prototype Development

  • Test and refine requirements for proposed prototypes
  • Design user interfaces for prototype
  • Technical development and testing of the user interfaces

WP5: ‘Opportunities and Barriers’ Reporting

  • Design and implement procedures for logging ongoing project issues
  • Analyse and synthesise logged issues around known Linked Data issues
  • Report on opportunities and barriers using the project blog outlining methods and recommendations on how to overcome, mediate or mitigate against issues identified wherever possible.

WP6: Advocacy and Dissemination

  • Report on ongoing project progress and findings at JISC programme events
  • Demonstrate project outputs and report to communities on the findings of the opportunities and barriers reporting at relevant conferences and workshops


[Workplan timeline table: work packages (WP) against project months 1 to 12]

Project Management and Staffing

Adrian Stevenson will project manage LOCAH to ensure that the workplan is carried out to the timetable, and that effective dissemination and evaluation mechanisms are implemented according to the JISC Project Management guidelines. Consortium agreements in line with JISC guidelines will be established for the project partners. UKOLN will lead on all the workpackages. Staff who will work on LOCAH are already in post.

Support for Standards, Accessibility and Other Best Practices

LOCAH will adhere to the guidance and good practice provided by JISC in the Standards Catalogue and JISC Information Environment. The primary technology methodologies, standards and specifications adopted for this project will be:

  • Metadata standards: EAD, MODS, Dublin Core
  • Berners-Lee,T. (2006). ‘Linked Data – Design Issues’
  • Berners-Lee,T. (1998). ‘W3C Style: Cool URIs don’t change’
  • Cabinet Office, ‘Designing URI Sets for the UK Public Sector’
  • Dodds, L., Davis, I., ‘Linked Data Patterns’
  • W3C Web Accessibility Initiative (WAI)

LOCAH Project – Project Team Relationships and End User Engagement

Project Team

Adrian Stevenson

Adrian Stevenson is a project manager and researcher at UKOLN. He has managed the highly successful SWORD project since May 2008 and also manages the JISC Information Environment Technical Review project. He has extensive experience of the implementation of interoperability standards, and has a long-standing interest in Linked Data. Adrian will manage LOCAH, and will be involved in the data modelling work, testing and the opportunities and barriers reporting.

Jane Stevenson

Jane Stevenson is the Archives Hub Coordinator at Mimas. In this role, she manages the day-to-day running of the Archives Hub service. She is a registered archivist with substantial experience of cataloguing, implementation of data standards, dissemination and online service provision. She has expertise in the use of Encoded Archival Description for archives, and will be involved in the data modelling work, mapping EAD to RDF, and testing, as well as the opportunities and barriers reporting.

Pete Johnston

Pete Johnston is a Technical Researcher at Eduserv. His work has been primarily in the areas of metadata/resource description, with a particular interest in the use of Semantic Web technologies and the Linked Data approach. He participates in a number of standards development activities, and is an active contributor to the work of the Dublin Core Metadata Initiative. He was also a co-editor of the Open Archives Initiative Object Reuse and Exchange (OAI ORE) specifications.

Pete joined Eduserv in May 2006 from UKOLN, University of Bath, where he advised the UK education and cultural heritage communities on strategies for the effective exchange and reuse of information. Pete will be involved in the data modelling work, mapping EAD and MODS to RDF, software testing and the opportunities and barriers report.

Bethan Ruddock

Bethan Ruddock is involved in content development activity for both the Archives Hub and Copac. She is currently working on a year-long project to help expand the coverage of the Archives Hub through the refinement of our automated data import routines. Bethan also undertakes a range of outreach and promotional activities, collaborating with Lisa on a number of publications. Bethan will be involved in the modelling work of transforming MODS to RDF.

Julian Cheal

Julian Cheal is a software developer at UKOLN. He is currently working on the analysis and visualisation of UK open access repository metadata from the RepUK project. He has experience of writing software to process metadata at UKOLN, and has previous development experience at Aberystwyth University. Julian will be mainly involved in developing the prototype and visualisations.

Ashley Sanders

Ashley Sanders is the Senior Developer for Copac, and has been working with the service since its inception. He is currently leading the technical work involved in the Copac Re-Engineering project, which involves a complete overhaul of the service. Ashley will be involved in the development work of transforming MODS to RDF.

Shirley Cousins is a Coordinator for the Copac service. Shirley will be involved in the work of transforming MODS to RDF.

An additional Mimas developer will provide the development work for transforming the Archives Hub EAD data to RDF. This person will be allocated from existing Mimas staff in post.

Talis are our technology partner on the project, kindly providing us with access to store our data in the Talis Store. Leigh Dodds is our main contact at the company. Talis is a privately owned UK company that is amongst the first organisations to be applying leading edge Semantic Web technologies to the creation of real-world solutions. Talis has significant expertise in semantic web and Linked Data technologies, and the Talis Platform has been used by a variety of organisations including the BBC and UK Government as part of data.gov.uk.

OCLC are also partnering with us, mainly to help out with VIAF. Our contacts at OCLC are John MacColl, Ralph LeVan and Thom Hickey. OCLC is a worldwide library cooperative, owned, governed and sustained by members since 1967. Its public purpose is to work with its members to improve access to the information held in libraries around the globe, and find ways to reduce costs for libraries through collaboration. Its Research Division works with the community to identify problems and opportunities, prototype and test solutions, and share findings through publications, presentations and professional interactions.

Engagement with the Community


Several key stakeholder groups have been identified: end users, particularly historical researchers, students & educators; data providers, including RLUK and the libraries & archives that contribute data to the services; the developer community; the library community; the archival sector and more broadly, the cultural heritage sector.

End users

Copac and the Archives Hub services are heavily used by historical researchers and educators. Copac is one of JISC’s most heavily used services, averaging around one million sessions per month. Around 48% of HE research usage can be attributed to historical research. Both services can directly engage relevant end users, and have done so successfully in the past to conduct market research or solicit feedback on service developments. In addition, channels such as twitter can be used to reach end users, particularly the digital humanities community.

Data providers; Library Community; Archival Community; Cultural Heritage Sector

Through the Copac and Archives Hub Steering Committees we have the means to consult with a wide range of representatives from the library and archival sectors. The project partners have well-established links with stakeholders such as RLUK, SCONUL, and the UK Archives Discovery Network, which represents all the key UK archives networks including The National Archives and the Scottish Archives Networks. The Archives Hub delivers training and support to the UK archives community, and can effectively engage its contributors through workshops, fora, and social media. OCLC’s community engagement channels will also provide a valuable means of sharing project outputs for feedback internationally. The key project partners are also engaged in the Resource Discovery Taskforce Vision implementation planning, as well as the JISC/SCONUL Shared Services Proposal. Outputs from this project will be shared in both these contexts. In addition, we will proactively share information with bodies such as the MLA, Collections Trust and Culture24.

Developer Community

As a JISC innovation support centre, UKOLN is uniquely placed to engage the developer community through initiatives such as the DevCSI programme, which is aimed at helping developers in HE to realise their full potential by creating the conditions for them to be able to learn, to network effectively, to share ideas and to collaborate.


The primary channel for disseminating the project outputs will be the UKOLN hosted blog. End users will be primarily engaged for survey feedback via the Copac and Archives Hub services. Social media will be used to reach subject groups with active online communities (e.g. Digital Humanities). Information aimed at the library and archival community, including data providers, will be disseminated through reports to service Steering Group meetings, UKAD meetings, the Resource Discovery Taskforce Vision group, the JISC/SCONUL Shared Services Proposal Group, as well as professional listservs. Conference presentations and demonstrations will be proposed for events such as ILI, Online Information, and JISC conferences. An article will be written for Ariadne. The developer community will be engaged primarily through the project blog, twitter, developer events & the Linked Data competition.