Putting the Case for Linked Data

This is a summary of a break-out group discussion at the JISC Expo Programme meeting, July 2011, looking at ‘Skills required for Linked Data’.

We started off by thinking about the first steps in deciding to create Linked Data. Rather than going straight to the skills required, we took a step back to consider the underlying need for Linked Data, and the importance of putting the case for it (or otherwise).

Do you have suitable data?

Firstly, do you have data that is suitable to output as Linked Data? This comes down to the question: what is suitable data? It would be useful to provide more advice in this area.

Is Linked Data worth doing?

Why do you want Linked Data? Maybe you are producing data that others will find interesting and want to link into? If you give your data identifiers, others can link into it. But is Linked Data the right approach? Is what you really want open data rather than Linked Data? Or just APIs into the data? Sometimes a simpler solution may give you the benefits you are after.

Maybe for some organisations approaches other than Linked Data are appropriate, or are a way to start off: maybe something as simple as outputting CSV. You need to think about what is appropriate for you. By putting your data out in a lower-barrier way, you may be able to find out more about who might use your data. However, there is an argument that these are very early days for Linked Data, and low levels of use right now may not reflect the potential of the data or how it will be used in the future.

Are you the authority on the data? Is someone else the authority? Do you want to link into their stuff? These are the sorts of questions you need to be thinking about.

The group agreed that use cases would be useful here. They could act as a hook to bring people in. Maybe there should be somewhere to go to look up use cases, so that people can get a better idea of how they (and their users) would benefit from creating Linked Data, see what others have done and compare it with their own situation.

We talked around the issues involved in making a case for Linked Data. It would be useful if there was more information for people on how it can bring things to the fore. For example, we talked about a set of photographs: a single photograph might be seen in a new context, with new connections made that help to explain it and what it signifies.

What next?

There does appear to be a move of Linked Data from a ‘clique’ into the mainstream, which should make it easier to understand and engage with. There are more tutorials, more support and more understanding, and new tools will be developed that make the process easier.

You need to think about different skills – data modelling and data transformation are very different things. We agreed that development is not always top down. Developers can be very self-motivated, in an environment where a continual learning process is often required. It may be that organisations will start to ask for skills around Linked Data when hiring people.

We felt that there is still a need for more support and more tutorials. We should move towards a critical mass, where questions raised are being answered and developers have more of a sense that there is help out there and that they will get those answers. Talking to other developers can really help, so providing opportunities for this is important. The JISC Expo projects were tasked with providing documentation, explaining clearly what they have done to help others. We felt that these projects have helped to progress the Linked Data agenda, and that requiring processes and results to be written up is an important way of encouraging people to acquire these skills.

Realistically, for many people, expertise needs to be brought in. Most organisations do not have in-house resources to call upon, and bringing in expertise is often going to be cheaper than up-skilling: a steep learning curve can take weeks or months to negotiate, whereas someone expert in this domain could do the work in just a few days. We talked about a role for (JISC) data centres in contributing to this kind of thing. However, we did acknowledge the important part that conferences, workshops and other events play in getting people familiar with Linked Data from a range of perspectives (as users of the data as well as providers). It can be useful to have tutorials that address your particular domain, using data that you are familiar with. Maybe we need a combination of approaches; it depends where you are starting from and what you want to know. But for many people, understanding why Linked Data is useful and worth doing is an essential starting point.

We saw the value in having someone involved who is outward facing; otherwise there is a danger of a gap opening up between the requirements of the people using your data and what you are doing, and of going off in the wrong direction.

We concluded that for many, Linked Data is still a big hill to climb. People do still need hand-ups. We also agreed that Linked Data will get good press if there are products that people can understand – they need to see the benefits.

As there may still be an element of self-doubt about Linked Data, it is essential not just to output the data but to raise its profile, and to advocate what you have done and why. Enthusiasm can start small, but it can quickly spread.

Finally, we agreed that people don’t always know when products are built around Linked Data, so you may not realise how it is benefitting you. We need to explain what we have done as well as providing the attractive interface or product, and we need to relate it to what people are familiar with.


LOD-LAM: International Linked Open Data in Libraries, Archives, and Museums Summit

I’m really pleased to announce that I was asked to join the organising committee for the International Linked Open Data in Libraries, Archives, and Museums Summit, which will take place on June 2-3, 2011 in San Francisco, California, USA. There’s still time to apply: applications close on February 28th, and funding is available to help cover travel costs.

The International Linked Open Data in Libraries, Archives, and Museums Summit (“LOD-LAM”) will convene leaders in their respective areas of expertise from the humanities and sciences to catalyze practical, actionable approaches to publishing Linked Open Data, specifically:

  • Identify the tools and techniques for publishing and working with Linked Open Data.
  • Draft precedents and policy for licensing and copyright considerations regarding the publishing of library, archive, and museum metadata.
  • Publish definitions and promote use cases that will give LAM staff the tools they need to advocate for Linked Open Data in their institutions.

For more information see http://lod-lam.net/summit/about/.

The principal organiser/facilitator is Jon Voss (@LookBackMaps), Founder of LookBackMaps, along with Kris Carpenter Negulescu, Director of Web Group, Internet Archive, who is project managing.

I’m very chuffed to be part of the illustrious Organising Committee:

Lisa Goddard (@lisagoddard), Acting Associate University Librarian for Information Technology, Memorial University Libraries.
Martin Kalfatovic (@UDCMRK), Assistant Director, Digital Services Division at Smithsonian Institution Libraries and the Deputy Project Director of the Biodiversity Heritage Library.
Mark Matienzo (@anarchivist), Digital Archivist in Manuscripts and Archives at the Yale University Library.
Mia Ridge (@mia_out), Lead Web Developer & Technical Architect, Science Museum/NMSI (UK).
Tim Sherratt (@wragge), National Museum of Australia & University of Canberra.
MacKenzie Smith, Research Director, MIT Libraries.
Adrian Stevenson (@adrianstevenson), UKOLN; Project Manager, LOCAH Linked Data Project.
John Wilbanks (@wilbanks), VP of Science, Director of Science Commons, Creative Commons.

It’ll be a great event I’m sure, so get your application in ASAP.

LOCAH Project – Wider Benefits to Sector & Achievements for Host Institution

Meeting a need

High quality research and teaching relies partly on access to a broad range of resources. Archive and library materials inform and enhance knowledge and are central to the JISC strategy. JISC invests in bibliographic and archival metadata services to enable discovery of, and access to, those materials, and we know the research, teaching and learning communities value those services.

As articulated in the Resource Discovery Taskforce Vision, that value could be increased if the data can be made to “work harder”, to be used in different ways and repurposed in different contexts.

Providing bibliographic and archive data as Linked Data creates links with other data sources, and allows the development of new channels into the data. Researchers are more likely to discover sources that may materially affect their research outcomes, and the ‘hidden’ collections of archives and special collections are more likely to be exposed and used.

Archive data is by its nature incomplete, and sources are often hidden and little known. User studies and log analyses indicate that Archives Hub users frequently search laterally through the descriptions, which gives them a way to make serendipitous discoveries. Linked Data is a way of vastly expanding the benefits of lateral search, helping users discover contextually related materials. Creating links between archival collections and other sources is crucial, because archives relating to the same people, organisations, places and subjects are often widely dispersed. By bringing these together intellectually, new discoveries can be made about the life and work of an individual or the circumstances surrounding important historical events: new connections, new relationships, new ideas about our history and society. Put this together with other data sources, such as special collections, multimedia repositories and geographic information systems, and the opportunities for discovery are significantly increased.
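To make the idea of lateral discovery more concrete, here is a minimal sketch in Python, assuming each description has already been tagged with shared URIs for the people and organisations it mentions. The record titles, the `example.org` URIs and the `related` function are all hypothetical; this is not how the Archives Hub or Copac actually implement search. It only shows that once two descriptions point at the same identifier, finding one leads you to the other, whichever service holds it.

```python
from collections import defaultdict

# Hypothetical descriptions from two services. Each lists the URIs of the
# people and organisations it relates to; titles and URIs are invented.
RECORDS = [
    ("Archives Hub", "Papers of A. Scientist",
     ["http://example.org/person/a-scientist"]),
    ("Archives Hub", "Records of a Learned Society",
     ["http://example.org/person/a-scientist",
      "http://example.org/org/learned-society"]),
    ("Copac", "Biography of A. Scientist",
     ["http://example.org/person/a-scientist"]),
    ("Archives Hub", "Diaries of B. Traveller",
     ["http://example.org/person/b-traveller"]),
]


def index_by_entity(records):
    """Map each entity URI to the (source, title) pairs that mention it."""
    index = defaultdict(list)
    for source, title, entities in records:
        for uri in entities:
            index[uri].append((source, title))
    return index


def related(records, title):
    """Return other (source, title) pairs sharing an entity URI with `title`."""
    index = index_by_entity(records)
    found = []
    for _, record_title, entities in records:
        if record_title != title:
            continue
        for uri in entities:
            for hit in index[uri]:
                # Skip the starting record itself and avoid duplicates.
                if hit[1] != title and hit not in found:
                    found.append(hit)
    return found
```

Calling `related(RECORDS, "Papers of A. Scientist")` returns the society records and the Copac biography, because all three share the same person URI; the unrelated diaries are not returned. Matching on identifiers rather than name strings is the point of the design, since the way a name is written can vary between services.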

Similarly, by making Copac bibliographic data available as Linked Data we can increase the opportunities for developers to provide contextual links to primary and secondary source material held within the UK’s research libraries and an increasing number of specialist libraries, including the British Museum, the National Trust, and the Royal Society. The provision of library and special collections content as Linked Data will allow developers to build interfaces to link contextually related historical sources that may have been curated and described using differing methodologies. The differences in these methodologies and the emerging standards for description and access have resulted in distinct challenges in providing meaningful cross-searching and interlinking of this related content – a Linked Data approach offers potential to overcome that significant hurdle.

Researchers and teachers will have the ability to repurpose data for their own specific use. Linked Data provides flexibility for people to create their own pathways through Archives Hub and Copac data alongside other data sources. Developers will be able to provide applications and visualisations tailored to the needs of researchers, learning environments, institutional and project goals.

Innovation

Archives are described hierarchically, and this presents challenges for the output of Linked Data. In addition, descriptions are a combination of structured data and semi-structured data. As part of this project, we will explore the challenges in working with semi-structured data, which can potentially provide a very rich source of information. The biographical histories for creators of archives may provide unique information that has been based on the archival source. Extracting event-based data from this can really open up the potential of the archival description to be so much more than the representation of an archive collection. It becomes a much more multi-faceted resource, providing data about people, organisations, places and events.

The library community is beginning to explore the potential of Linked Data. The Swedish and Hungarian National Libraries have exposed their catalogues as Linked Data, the Library of Congress has exposed subject authority data (LCSH), and OCLC is now involved in making the Virtual International Authority File (VIAF) available in this way.

By treating the entities (people, places, concepts etc) referred to in bibliographic data as resources in their own right, links can be made to other data referring to those same resources. Those other sources can be used to enrich the presentation of bibliographic data, and the bibliographic data can be used in conjunction with other data sources to create new applications.
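As a rough illustration of treating entities as resources, the sketch below gives a book and its author separate URIs and serialises the description as N-Triples. The Dublin Core, FOAF and OWL property URIs are real vocabulary terms; the `example.org` base, the external authority URI and the helper functions are hypothetical placeholders, not the LOCAH project's actual data model. The `owl:sameAs` triple is what allows other data about the same person to be linked in.

```python
# Real vocabulary terms: Dublin Core terms, FOAF and OWL.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical: a local URI base and an external authority record for the author.
BASE = "http://example.org/id/"
EXTERNAL_PERSON = "http://authority.example.net/person/12345"


def iri(u):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{u}>"


def literal(text):
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def describe_book(book_id, title, creator_id, creator_name, same_as=None):
    """Return (subject, predicate, object) triples for a book and its creator.

    The creator gets a URI of their own, so other datasets can refer to them.
    """
    book = BASE + "book/" + book_id
    person = BASE + "person/" + creator_id
    triples = [
        (iri(book), iri(DCT_TITLE), literal(title)),
        (iri(book), iri(DCT_CREATOR), iri(person)),
        (iri(person), iri(FOAF_NAME), literal(creator_name)),
    ]
    if same_as:
        # Assert that our person and the external record identify the same entity.
        triples.append((iri(person), iri(OWL_SAME_AS), iri(same_as)))
    return triples


def to_ntriples(triples):
    """Serialise triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(describe_book("b1", "A Sample Title", "p1", "A. Author",
                                    same_as=EXTERNAL_PERSON)))
```

Because the person has a URI of their own, another dataset can make statements about `http://example.org/id/person/p1` (or about the external record it is declared the same as) without touching the bibliographic record. In practice an RDF library would normally handle serialisation; plain strings are used here only to keep the sketch dependency-free.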

Copac is the largest union catalogue of bibliographic data in the UK, and one of the largest in the world, and its exposure as Linked Data can provide a rich data source, of particular value to the research, learning and teaching communities.

In answering the call, we will be able to report on the challenges of the project, and how we have approached them. This will be of benefit to all institutions with bibliographic and archival data looking to maximise its potential. We are very well placed within the research and teaching communities to share our experiences and findings.