Explaining Linked Data to Your Pro Vice Chancellor

At the JISCEXPO Programme meeting today I led a session on ‘Explaining linked data to your Pro Vice Chancellor’, and this post is a summary of that session. The attendees were: myself (Adrian Stevenson), Rob Hawton, Alex Dutton, and Zeth, with later contributions from Chris Gutteridge.

It seemed clear to us that this is really about focussing on institutional administrative data, as it’s probably harder to sell the idea of providing research data in linked data form to the Pro VC. Linked data probably doesn’t allow you to do things that you couldn’t do by other means, but it is easier than other approaches in the long run, once you’ve got your linked data available. Linked Data can be of value without having to be open:

“Southampton’s data is used internally. You could draw a ring around the data and say ‘that’s closed’, and it would still have the same value.”

== Benefits ==

Quantifying the value of linked data efficiencies can be tricky, but providing open data allows quicker development of tools, as the data the tools hook into already exist and are standardised.

== Strategies ==

Don’t mention the term ‘linked data’ to the Pro VC, or get into discussing the technology. It’s about the outcomes and the solutions, not the technologies. Getting ‘Champions’ who have the ear of the Pro VC will help.  Some enticing prototype example mash-up demonstrators that help sell the idea are also important. Also, pointing out that other universities are deploying and using linked open data to their advantage may help. Your University will want to be part of the club.

Making it easy for others to supply data that can be utilised as part of linked data efforts is important. This can be via Google spreadsheets, or e-mailing spreadsheets for example. You need to offload the difficult jobs to the people who are motivated and know what they’re doing.

It will also help to sell the idea to other potential consumers, such as the libraries, and other data providers. Possibly sell on the idea of the “increasing prominence of holdings” for libraries. This helps bring attention and re-use.

It’s worth emphasising that linked data simplifies the Freedom of Information (FOI) process. We can say “yes, we’ve already published that FOI data”. You have a responsibility to publish this data if asked via FOI anyway. This is an example of a sheer curation approach.

Linked data may provide decreased bureaucracy. There’s no need to ask other parts of the University for their data, wasting their time, if it’s already published centrally. Examples here are estates, HR, library, student statistics.

== Targets ==

Some possible targets are: saving money, bringing in new business, funding, students.

The potential for increased business intelligence is a great sell, and Linked Data can provide the means to do this. Again, you need to sell a solution to a problem, not a technology. The University ‘implementation’ managers need to be involved and brought on board as well as the Pro VC.

It can be a problem that some institutions adopt a ‘best of breed’ policy with technology. Linked data doesn’t fit too well with this. However, it’s worth noting that Linked Data doesn’t need to change the user experience.

A lot of the arguments being made here don’t just apply to linked data. Much is about issues such as opening access to data generally. It was noted that there have been many efforts from JISC to solve the institutional data silo problem.

If we were setting a new University up from scratch, going for Linked Data from the start would be a realistic option, but it’s always hard to change currently embedded practice. Universities having Chief Technology Officers would help here, or perhaps a PVC for Technology?

LOD-LAM: International Linked Open Data in Libraries, Archives, and Museums Summit

I’m really pleased to announce that I was asked to join the organising committee for the International Linked Open Data in Libraries, Archives, and Museums Summit that will take place on June 2-3, 2011 in San Francisco, California, USA. There’s still time to apply until February 28th, and funding is available to help cover travel costs.

The International Linked Open Data in Libraries, Archives, and Museums Summit (“LOD-LAM”) will convene leaders in their respective areas of expertise from the humanities and sciences to catalyze practical, actionable approaches to publishing Linked Open Data, specifically:

  • Identify the tools and techniques for publishing and working with Linked Open Data.
  • Draft precedents and policy for licensing and copyright considerations regarding the publishing of library, archive, and museum metadata.
  • Publish definitions and promote use cases that will give LAM staff the tools they need to advocate for Linked Open Data in their institutions.

For more information see http://lod-lam.net/summit/about/.

The principal organiser/facilitator is Jon Voss (@LookBackMaps), Founder of LookBackMaps, along with Kris Carpenter Negulescu, Director of Web Group, Internet Archive, who is project managing.

I’m very chuffed to be part of the illustrious Organising Committee:

Lisa Goddard (@lisagoddard), Acting Associate University Librarian for Information Technology, Memorial University Libraries.
Martin Kalfatovic (@UDCMRK), Assistant Director, Digital Services Division at Smithsonian Institution Libraries and the Deputy Project Director of the Biodiversity Heritage Library.
Mark Matienzo (@anarchivist), Digital Archivist in Manuscripts and Archives at the Yale University Library.
Mia Ridge (@mia_out), Lead Web Developer & Technical Architect, Science Museum/NMSI (UK)
Tim Sherratt (@wragge), National Museum of Australia & University of Canberra
MacKenzie Smith, Research Director, MIT Libraries.
Adrian Stevenson (@adrianstevenson), UKOLN; Project Manager, LOCAH Linked Data Project.
John Wilbanks (@wilbanks), VP of Science, Director of Science Commons, Creative Commons.

It’ll be a great event I’m sure, so get your application in ASAP.

LOCAH Project – Aims, Objectives and Final Outputs

This is the first of a number of posts outlining our project plan in line with the requirements of the call document. So here we are – our aims, objectives and intended final outputs:

The LOCAH project aims to make records from the JISC funded Archives Hub service, and records from the JISC funded Copac service available as Linked Data. In each case, the aim is to provide persistent URIs for the key entities described in that data, dereferencing to documents describing those entities. The information will be made available as web pages in XHTML containing RDFa and also Linked Data RDF/XML. SPARQL endpoints will be provided to enable the data to be queried. In addition, consideration will be given to the provision of a simple query API for some common queries.
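As a rough illustration of what querying such an endpoint could look like, here is a minimal Python sketch that builds a request URL for a SPARQL query. The endpoint address and the query itself are hypothetical placeholders, not the project's real service; the only thing taken from the standard is that the SPARQL protocol carries the query text in a `query` parameter.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the project's real SPARQL endpoint address is not given here.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: list up to ten resources together with their rdfs:label.
QUERY = """
SELECT ?s ?label WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The sketch deliberately stops short of sending the request, so it runs without network access. Fetching the URL with an `Accept` header naming a SPARQL results format would be the natural next step.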

Making resources available as structured data

The work will involve:

  1. Analysis & modelling of the current data and the selection (or definition) of appropriate RDF vocabularies.
  2. Design of suitable URI patterns (based on the current guidelines for UK government data).
  3. Development of procedures to transform existing data formats to RDF. Either:
    • uploading of that transformed data to an RDF store (such as a Talis Platform instance) and development of an application to serve data from that store, or
    • development of an application to serve RDF data from an existing data store.
     The former will be the case for the Hub data; the latter is likely to be used for Copac.
  4. We intend to enhance the source data with links between these two datasets and with existing Linked Data sets made available by other parties (e.g. DBpedia, Geonames, the Virtual International Authority File, Library of Congress’ Subject headings). This process may include simple name lookups and also the use of services such as EDINA Unlock, OpenCalais and Muddy to identify entities from text fragments. Given that Copac is in a transition phase to a new database during the project, we will be taking a more lightweight approach to structuring and enhancing Copac data. We will then be able to make a comparison between the outcomes of a lightweight rapid approach to producing Linked Data for Copac, and the relatively resource intensive data enrichment approach for the Archives Hub.
  5. We will look to provide resources such as dataset-level descriptions (using vOID and/or DCat) and semantic sitemaps.
  6. The project will adopt a lightweight iterative approach to the development and testing of the exposed structured content. This will involve the rapid development of interfaces to Hub and Copac data that will be tested against existing third party Linked Data tools and data sets. The evaluated results will feed into the further phases of development.
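To make the URI design and transformation steps a little more concrete, here is a minimal Python sketch. It assumes an `/id/{concept}/{reference}` identifier pattern in the spirit of the UK government guidance, a made-up base address and a made-up record. None of these are the patterns or data the project will actually use. The output is written as N-Triples, one of the simplest RDF serialisations.

```python
import re

# Hypothetical base address, used for illustration only.
BASE = "http://example.org"


def slugify(text: str) -> str:
    """Lower-case the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mint_uri(base: str, concept: str, reference: str) -> str:
    """Build an identifier URI of the assumed form {base}/id/{concept}/{reference}."""
    return f"{base}/id/{slugify(concept)}/{slugify(reference)}"


def literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal (minimal escaping only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(base: str, record: dict) -> list:
    """Turn one flat record into N-Triples lines that share a minted subject URI."""
    subject = mint_uri(base, record["type"], record["id"])
    return [
        f"<{subject}> <{prop}> {literal(value)} ."
        for prop, value in record["fields"].items()
    ]


# Made-up record; the identifier and title are placeholders, not real Hub data.
record = {
    "type": "Archival Resource",
    "id": "GB 133 ABC",
    "fields": {"http://purl.org/dc/terms/title": "Papers of an example collection"},
}

lines = record_to_ntriples(BASE, record)
for line in lines:
    print(line)
```

Keeping the URI minting in one small function means the pattern can be changed in a single place if the modelling work settles on something different.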

The result will be the availability of two new quality-assured datasets which are “meshable” with other global Linked Data sources. In addition, the documents made available will be accessible to all the usual web search and indexing services such as Google, contributing to their searchability and findability, and thereby raising the profile of these Mimas JISC services to research users. In common parlance, the resources will have more “Google juice”.

Prototype Data Visualisations

We also suggest a number of end user prototype ideas. These would provide attractive and compelling data visualisations based around a number of visualisation concepts. We intend to produce one prototype. We intend to use the ideas suggested as the basis for this, but given the iterative nature of the project, it may end up being something quite different. We will produce additional prototypes if time and resources allow.

The project intends to hold a small developer competition, run by the UKOLN DevCSI team on behalf of the project, to gather further end use cases and prototype ideas.

Opportunities and Barriers Reporting

We will log ongoing project issues as they arise to inform our opportunities and barriers reporting, which we will deliver via posts on the LOCAH project blog. We will outline and discuss the methods and solutions we have adopted to overcome, mediate or mitigate these, wherever this has been possible.

The methods and solutions we establish will iteratively feed into the ongoing development process. This will mean that we are able to work out solutions to issues as they arise, and implement them in the next phase of rapid development.

We are keen to engage with the other projects funded as part of the jiscExpo call, and any additional UK HE projects working at implementing Linked Data solutions. The project team has very strong links with the Linked Data community: we will look to engage the community by stimulating debate about implementation problems via the project blog. We will also set up a project Twitter feed to generate discussion on the project #locah tag. In addition, we will engage via relevant JISCmail lists as well as the UK Government Data Developers and the Linked Data API Google discussion groups that several members of the team are already part of.

The power of connections: unlocking the Web of data

Welcome to our project blog, just set up today. Lots more to come, but here’s a news item Jane Stevenson from Mimas has written to get us going:

Mimas and UKOLN are working together on an exciting JISC funded project to make our Archives Hub and Copac data available as structured Linked Data, for the benefit of education and research. We will also be working in partnership with Eduserv, Talis and OCLC, leading experts within their fields. We want to put archival and bibliographic data at the heart of the Linked Data Web, enabling new links to be made between diverse content sources and enabling the free and flexible exploration of data so that researchers can make new connections between subjects, people, organisations and places to reveal more about our history and society.

Linked Data uses the RDF data model to identify concepts and to describe relationships between those concepts. It promotes the idea of a Web of data rather than a Web of documents. The more document-centric approach, based on Web pages, does not readily expose data within the text in a way that applications can process, so the wealth of information within a page is of limited value. Both the Archives Hub and Copac have so much rich data within them, and with Linked Data it can be brought to the fore by structuring concepts within the data in a way that identifies them and facilitates linking to them. Data can be combined in a way that results in new correlations, new perspectives and new discoveries.
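For readers who like to see the idea in code, here is a minimal Python sketch of the triple model: each statement is a (subject, predicate, object) tuple, and two datasets are combined by taking the union of their statements. All URIs and values are hypothetical placeholders, not real Hub or Copac data, and a real system would use an RDF library rather than plain tuples.

```python
# Hypothetical vocabulary and identifiers, for illustration only.
VOCAB = "http://example.org/vocab/"
PERSON = "http://example.org/id/person/1"

# Statements that might come from an archival source.
hub = {
    (PERSON, VOCAB + "name", "Example Person"),
    ("http://example.org/id/collection/7", VOCAB + "creator", PERSON),
}

# Statements that might come from a bibliographic source.
copac = {
    ("http://example.org/id/book/42", VOCAB + "author", PERSON),
}


def describe(triples, uri):
    """Return every triple in which the URI appears as subject or object."""
    return sorted(t for t in triples if t[0] == uri or t[2] == uri)


# Because both sources use the same URI for the person, merging is a set union.
merged = hub | copac
related = describe(merged, PERSON)
for triple in related:
    print(triple)
```

The shared identifier does the work here: once both datasets name the person with the same URI, the collection and the book become connected without any further matching step.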

http://www.flickr.com/photos/reedsturtevant/4288406572/


Mimas is keen to explore new ways to open up data for the benefit of our users. Providing bibliographic and archive data as Linked Data enables links with other data sources and creates new channels into the data. Researchers are more likely to discover sources that may materially affect their research outcomes. It means that we can give researchers the potential to combine data sources for themselves, so that we do not need to predict the use of the data.

We know that researchers using the Hub or Copac are sometimes looking for a particular piece of information, such as a photograph of a library, or the birth date of a writer, or the location of an event. Linked Data can be valuable here because it helps to pin down concepts. If a researcher is looking for a photograph of John Rylands Library in Manchester, for example, Linked Data can clarify the concepts – a photograph, the library, ‘John Rylands’ as the name of a library, ‘John Rylands’ as a Victorian philanthropist, ‘Manchester’ as a place in England. It enables us to link across to other sources that can provide further information about these concepts. If a researcher is gathering information around a subject area, they can benefit from the linking concept and explore the Web much more fully because the data is no longer held within silos.
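The John Rylands example can be sketched in a few lines of Python. The sketch assumes a tiny in-memory table in which each concept has its own identifier, a label and a type. The URIs are invented for illustration and are not identifiers the project has published.

```python
# Invented identifiers: two different things share the label "John Rylands".
ENTITIES = {
    "http://example.org/id/organisation/john-rylands-library": {
        "label": "John Rylands",
        "type": "Library",
    },
    "http://example.org/id/person/john-rylands": {
        "label": "John Rylands",
        "type": "Person",
    },
    "http://example.org/id/place/manchester": {
        "label": "Manchester",
        "type": "Place",
    },
}


def candidates(label, entity_type=None):
    """Return the URIs whose label matches, optionally narrowed by type."""
    return sorted(
        uri
        for uri, details in ENTITIES.items()
        if details["label"] == label
        and (entity_type is None or details["type"] == entity_type)
    )


# A plain text search cannot tell the library from the philanthropist...
print(candidates("John Rylands"))
# ...but distinct identifiers with types can.
print(candidates("John Rylands", "Person"))
```

A search on the name alone returns both identifiers, while adding the type picks out exactly one, which is the kind of pinning down of concepts described above.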

Archive data is by its nature incomplete and often potentially valuable sources are difficult to identify.  Bibliographic data is vast and it can be difficult to make useful connections. Researchers frequently search laterally through the descriptions, giving them a way to make serendipitous discoveries. Linked Data could potentially vastly expand the benefits of lateral search, helping users discover contextually related materials. Creating links just between cultural heritage collections can bring great benefits – archives, artifacts and published works relating to the same people, organisations, places and subjects are often widely dispersed. By bringing these together intellectually, new discoveries can be made about the life and work of an individual or the circumstances surrounding important historical events. New connections, new relationships, new ideas about our history and society. Put this together with other data sources, such as special collections, multimedia repositories and geographic information systems, and the opportunities for discovery are significantly increased.  A Linked Data approach offers potential to overcome differences in methodologies and standards for description and access which can hinder meaningful cross-searching and interlinking of related content.

Linked Data can enable researchers and teachers to repurpose data for their own specific use. It provides flexibility for people to create their own pathways through Archives Hub and Copac data alongside other data sources. Developers will be able to provide applications and visualisations tailored to the needs of researchers, learning environments, institutional and project goals.

This project, named LOCAH (Linked Open Copac and Archives Hub), is exploratory, and real world applications of Linked Data are still in the early stages. Whilst the benefits could be extensive, we know that there are challenges, and in particular concerns about the resources required to create Linked Data and the availability of tools to make use of it. A number of key data sources are now available as Linked Data, such as BBC data, Wikipedia and Government datasets. In addition, developers are busy creating tools to make the data easy to query and process. By getting involved in creating Linked Data, we can explore the benefits and pitfalls in exposing archival and bibliographic data in this way. This is a project that enables us to contribute to a global effort to unlock the enormous potential within our data for the benefit of researchers and society as a whole.