LOCAH Project – Aims, Objectives and Final Outputs

This is the first of a number of posts outlining our project plan in line with the requirements of the call document. So here we are – our aims, objectives and intended final outputs:

The LOCAH project aims to make records from two JISC-funded services, the Archives Hub and Copac, available as Linked Data. In each case, the aim is to provide persistent URIs for the key entities described in that data, dereferencing to documents describing those entities. The information will be made available both as web pages in XHTML containing RDFa and as Linked Data in RDF/XML. SPARQL endpoints will be provided to enable the data to be queried. In addition, we will consider providing a simple query API for some common queries.
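To illustrate what "dereferencing to documents" can involve, here is a minimal sketch of HTTP content negotiation: the server inspects the client's Accept header and decides whether to serve the XHTML+RDFa page or the RDF/XML document for an entity. The function names, the list of offered media types and the default choice are all assumptions made for illustration, not the project's actual implementation, and the matching is simplified (it ranks by q-value only and ignores media-range specificity).

```python
# Hypothetical sketch: the offered media types and the default (first entry)
# are assumptions, not the LOCAH project's actual configuration.
OFFERED = ["application/xhtml+xml", "text/html", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs from an HTTP Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header, offered=OFFERED):
    """Pick the media type to serve for an entity URI, or None if nothing matches."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type in offered:
            return media_type
        if media_type == "*/*":
            return offered[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/*" becomes "text/"
            for candidate in offered:
                if candidate.startswith(prefix):
                    return candidate
    return None
```

A browser sending `text/html,application/xhtml+xml;q=0.9,*/*;q=0.8` would be given the web page, while a Linked Data client sending `application/rdf+xml` would be given the RDF/XML document.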

Making resources available as structured data

The work will involve:

  1. Analysis & modelling of the current data and the selection (or definition) of appropriate RDF vocabularies.
  2. Design of suitable URI patterns (based on the current guidelines for UK government data).
  3. Development of procedures to transform existing data formats to RDF, followed by either:
    • uploading the transformed data to an RDF store (such as a Talis Platform instance) and developing an application to serve data from that store, or
    • developing an application to serve RDF data from an existing data store.
    The former will be the case for the Hub data; the latter is likely to be used for Copac.
  4. We intend to enhance the source data with links between these two datasets and with existing Linked Data sets made available by other parties (e.g. DBpedia, Geonames, the Virtual International Authority File, Library of Congress Subject Headings). This process may include simple name lookups and also the use of services such as EDINA Unlock, OpenCalais and Muddy to identify entities from text fragments. Because Copac will be in transition to a new database during the project, we will take a more lightweight approach to structuring and enhancing Copac data. We will then be able to compare the outcomes of a lightweight, rapid approach to producing Linked Data for Copac with those of the relatively resource-intensive data enrichment approach for the Archives Hub.
  5. We will look to provide resources such as dataset-level descriptions (using vOID and/or DCat) and semantic sitemaps.
  6. The project will adopt a lightweight iterative approach to the development and testing of the exposed structured content. This will involve the rapid development of interfaces to Hub and Copac data that will be tested against existing third-party Linked Data tools and data sets. The evaluated results will feed into the further phases of development.
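The URI design and transformation steps above can be sketched in a few lines. The example below is only a rough illustration: it assumes an `/id/` path for the entity and a `/doc/` path for the document describing it, in the spirit of the UK government guidelines, and emits a handful of N-Triples using Dublin Core Terms and FOAF properties. The base domain, the `archivalresource` concept name, the record fields and the choice of vocabularies are all hypothetical; the project's own modelling work will determine the real ones.

```python
import re
from urllib.parse import quote

BASE = "http://data.example.org"  # placeholder domain, not a real project URI
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


def slug(reference):
    """Turn a record reference into a lower-case, URI-safe path segment."""
    s = re.sub(r"\s+", "-", reference.strip().lower())
    return quote(s, safe="-")


def entity_uri(concept, reference):
    """URI naming the thing itself (assumed /id/ pattern)."""
    return f"{BASE}/id/{concept}/{slug(reference)}"


def document_uri(concept, reference):
    """URI of the document describing the thing (assumed /doc/ pattern)."""
    return f"{BASE}/doc/{concept}/{slug(reference)}"


def literal(text):
    """Serialise a plain string as an N-Triples literal (minimal escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(record):
    """Map a hypothetical record dict to a list of N-Triples lines."""
    concept = "archivalresource"  # illustrative concept name
    subject = entity_uri(concept, record["reference"])
    doc = document_uri(concept, record["reference"])
    lines = [
        f"<{subject}> <{DCT}title> {literal(record['title'])} .",
        f"<{subject}> <{FOAF}isPrimaryTopicOf> <{doc}> .",
    ]
    # Links out to other Linked Data sets, e.g. URIs found by a name lookup
    for target in record.get("subjects", []):
        lines.append(f"<{subject}> <{DCT}subject> <{target}> .")
    return lines


if __name__ == "__main__":
    example = {
        "reference": "GB 133 ABC",
        "title": "Example papers",
        "subjects": ["http://dbpedia.org/resource/Manchester"],
    }
    print("\n".join(record_to_ntriples(example)))
```

Keeping the URI-minting functions separate from the serialisation step means the URI pattern can be revised during an iteration without touching the mapping of record fields to properties.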

The result will be the availability of two new quality-assured datasets which are “meshable” with other global Linked Data sources. In addition, the documents made available will be accessible to all the usual web search and indexing services such as Google, contributing to their searchability and findability, and thereby raising the profile of these Mimas JISC services to research users. In common parlance, the resources will have more “Google juice”.

Prototype Data Visualisations

We also suggest a number of end-user prototype ideas, which would provide attractive and compelling data visualisations based around a number of visualisation concepts. We intend to produce one prototype, using the suggested ideas as its basis, although given the iterative nature of the project it may end up being something quite different. We will produce additional prototypes if time and resources allow.

The project intends to hold a small developer competition, run by the UKOLN DevCSI team on behalf of the project, to gather further end use cases and prototype ideas.

Opportunities and Barriers Reporting

We will log project issues as they arise to inform our opportunities and barriers reporting, which we will deliver via posts on the LOCAH project blog. We will outline and discuss the methods and solutions we have adopted to overcome, mediate or mitigate these issues, wherever this has been possible.

The methods and solutions we establish will iteratively feed into the ongoing development process. This will mean that we are able to work out solutions to issues as they arise, and implement them in the next phase of rapid development.

We are keen to engage with the other projects funded as part of the jiscExpo call, and with any other UK HE projects working on implementing Linked Data solutions. The project team has very strong links with the Linked Data community: we will look to engage the community by stimulating debate about implementation problems via the project blog. We will also set up a project Twitter feed to generate discussion using the project's #locah tag. In addition, we will engage via relevant JISCmail lists, as well as the UK Government Data Developers and Linked Data API Google discussion groups, of which several members of the team are already part.