02. Publish open Linked Data for unspecified use

Make data openly and freely available as Linked Data.

Description

Activity - A growing enthusiasm for open access and transparency has empowered some efforts to simply make data openly and freely available for unencumbered use by third parties. Government efforts in the USA, the UK and elsewhere provide clear examples of this trend, as do library-specific initiatives such as the Open Library. This is a variant of UC1 involving Linked Data.
This use case differs from UC1 in that it is specifically about publication as Linked Data, whereas UC1 is format neutral.
Actors - Libraries, which MAY require institutional buy-in and MAY have contractual and licensing implications with respect to suppliers, partners, etc.
Data involved - Potentially any library data, possibly combined (or linked) with other metadata (such as course titles). Typically records and fields (tags) for which the institution feels rights and licensing issues are sufficiently well understood.
Data flow - Data are placed online, either within the institution or via some third party such as the Talis Platform. Data may be periodically refreshed, and it would typically be the responsibility of any organisation consuming the data to check for any updates.
Does this require Open Data - There is no requirement for Open Data per se. However, if we presume that the rationale for publication is to ensure the widest possible dissemination then adoption of a generic open data license (see Rights and Licensing Issues) is the most effective way to make the set of potential uses unambiguous. Restrictive licenses are counter-productive, as is making the data available without some explicit statement regarding potential utilisation. Locally developed licenses and statements regarding use should be avoided where possible as, although perhaps open in spirit, these local variants complicate matters for those wishing to combine data from disparate sources.
Current Examples - Libris, National Széchényi Library, Freebase

Benefits

Institution - (1) In line with institutional goals and mission with reference to disseminating knowledge, playing a role within the community, enabling innovation, etc.; (2) Attracts publicity and status for first movers.
Library Service - (1) Creating the opportunity for third parties to develop local and wider services of value, some of which may potentially drive increased attention and traffic back to the library and its holdings; (2) Enables enrichment of library data from other Linked Data sources such as http://id.loc.gov/ (includes Library of Congress Subject Headings as Linked Data), http://viaf.org (Virtual International Authority File available as Linked Data, documented at http://outgoing.typepad.com/outgoing/2010/05/viafs-new-linked-data.html), http://dbpedia.org (Linked Data representation of Wikipedia); (3) Can increase exposure of library collection to web search engines.
Researchers - (1) A building block in opening up access to large and unique collections of data; (2) potential for third party applications to become richer and more responsive to institutional collections, making those collections easier to access whilst also making the applications more useful.
Students - (1) The possibility of Google Scholar and other external services having greater knowledge of institutional holdings; (2) local discovery becomes easier and richer due to enhanced library data (from other Linked Data sources).
Replication - Medium: Pioneers can document the necessary steps to extract data from various systems, and document decisions regarding transformation of bibliographic data into Linked Data. However, decisions made by one institution may not be immediately transferrable to others, and ‘best practice’ for representing bibliographic data as Linked Data is not yet established.
Case for not doing it - Uncertainty. Once data are released online under an open license, third parties are explicitly permitted to take and reuse those data as they see fit. Even if the institution initially responsible for releasing the data changes its policy and either withdraws the data or relicenses it with more stringent terms, anyone who downloaded the original release remains able to continue using and redistributing it in perpetuity.

Motivation

Principles - The rationale is essentially philosophical. The institution or the library believes in the importance of openness, transparency and sharing, and is unpersuaded by arguments to preserve the status quo by keeping data private. Additionally, the institution or the library believes that it is important to integrate library data into the fabric of the web, and that Linked Data is the best way of achieving this. For early adopters, publicity and status may play a not-insignificant part in the decision-making process.
Costs - Cost benefit is unlikely to be a significant motivation for this approach, especially as this approach may require more effort than UC1. However, doing this may represent an opportunity cost in diverting attention from another priority.
Services - There is the possibility that a useful service may emerge from an external or internal party.
Rationale for not doing it - (1) Uncertainty as to the legal status of the data; (2) discomfort with not being able to take the data back once it’s released; (3) concern about how the data might be used, and how that might reflect upon the institution; (4) potential disruption to existing relationships, partnerships, and commercial arrangements; (5) simply insufficient reason to make it a priority.

Consequences of doing it as Open Data

What will happen? - Library bibliographic data will be linked into the wider web of Linked Data.
Potential Risks - (1) Loss of control over institutional data; (2) The originator of elements of the bibliographic records challenges release as open data [see also UC1, UC3, UC4, UC5, UC6, UC7, UC15, UC16, UC17]; (3) Loss of future revenue [see also UC1]; (4) While there is currently some momentum behind the Linked Data movement, many of the expected benefits remain, to a large extent, unproven, and some commentators believe that the approach is too complex to gain widespread adoption.
Potential Opportunities - (1) Development of innovative / compelling third party services based on open data; (2) An ecosystem of enthusiastic developers emerges, keen and able to provide alternative means of accessing key institutional services using Linked Data representations [see also UC1, UC16, UC17]; (3) Third-party tools (LibraryThing, Mendeley) get better and better, as they gain more data and more users – and as those users largely originate inside Universities, the institutions also benefit, although in ways that may be difficult to quantify [see also UC1, UC15]; (4) Large pools of data create opportunities for the creation of regional, national and international services to drive stock management, etc. [see also UC1]; (5) Bibliographic data becomes searchable via semantic web technologies; (6) Libraries establish position as key players in the Linked Data/web of data space; (7) Libraries benefit from other Linked Data sources providing richer metadata and related exploration of the collections.
Consequences of not doing it? - (1) Libraries are seen as being at odds with moves to Open Data in the public sector; (2) Libraries become sidelined as metadata experts and providers as others expose data on the web.

Rights and Licensing Issues

Rights and licensing issues - In keeping with the principles behind this activity, the license should be explicit and as open and unencumbered as possible in order to facilitate genuine reuse. See the general guidance on Licensing Issues for further detail.

Practicalities

Data exchange formatting - Data are transformed from MARC or a local storage format to RDF, which is then serialised in a number of ways (e.g. N3, XML, JSON). Most commonly this would then be exposed via a triple store with a SPARQL endpoint, and via a RESTful web interface which would usually provide both human-readable versions of the data (i.e. HTML pages) and machine-parsable data (i.e. RDF).

Key to this process will be choosing appropriate ontologies to represent the data. While there is some previous practice in this area (see http://dcpapers.dublincore.org/ojs/pubs/article/viewArticle/927), it is probably too early to see this as ‘best practice’ for representing bibliographic data as Linked Data. Common vocabularies used in current implementations are:

As libraries tend to hold information on non-bibliographic resources as well (e.g. audio-visual material), it may be necessary to represent these using more appropriate vocabularies (e.g. http://wiki.musicbrainz.org/RDF for recorded music).

A W3C ‘incubator group’ for Library Linked Data is currently (May 2010 – May 2011) investigating “how existing building blocks of librarianship, such as metadata models, metadata schemas, standards and protocols for building interoperability and library systems and networked environments, encourage libraries to bring their content, and generally re-orient their approaches to data interoperability towards the Web” (http://www.w3.org/2005/Incubator/lld/).
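The transformation step described under Data exchange formatting can be illustrated with a minimal sketch. It assumes a record has already been extracted from MARC into a simple dictionary keyed by tag and subfield, and it writes one N-Triples line per mapped field. The mapping to Dublin Core elements, the example.org record URI and the function names are all illustrative assumptions, not a recommended ontology choice or any institution's actual practice.

```python
# Hypothetical mapping from MARC tag+subfield codes to Dublin Core properties.
# A real project would choose its own vocabularies and a far richer mapping.
DC = "http://purl.org/dc/elements/1.1/"
FIELD_TO_PROPERTY = {
    "245a": DC + "title",
    "100a": DC + "creator",
    "260c": DC + "date",
    "020a": DC + "identifier",
}


def escape_literal(value):
    """Escape characters that must not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(record_uri, fields):
    """Return one N-Triples line per mapped field of the record."""
    lines = []
    for code, value in fields.items():
        prop = FIELD_TO_PROPERTY.get(code)
        if prop is None:
            continue  # unmapped fields are simply skipped in this sketch
        lines.append('<%s> <%s> "%s" .' % (record_uri, prop, escape_literal(value)))
    return "\n".join(lines)


# Illustrative record; "999z" stands in for a local field with no mapping.
record = {"245a": 'The "open" library', "100a": "Example, Author", "999z": "local note"}
ntriples = record_to_ntriples("http://example.org/bib/1", record)
print(ntriples)
```

Plain literals are used throughout for simplicity. In practice, values such as authors and subjects would more usefully be URIs pointing at shared authority data, which is where the choice of vocabularies becomes significant.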
Lifecycle implications - Examples to date have merged the human-readable web interface to the library catalogue (OPAC) and the machine-readable RDF, with the implication that this is an up to date representation of the library catalogue, possibly refreshed daily or even more frequently.
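One common way to serve human-readable and machine-readable versions from the same address is HTTP content negotiation, where the server inspects the request's Accept header. The following is a simplified sketch of that selection logic only. The list of available media types and the function names are assumptions for illustration, and the parsing ignores some finer precedence rules of the HTTP specification.

```python
# Hypothetical media types this sketch can serve, in order of server preference.
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    choices = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not wanted"
        choices.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order
    return sorted(choices, key=lambda c: c[1], reverse=True)


def choose_representation(accept_header, available=AVAILABLE):
    """Pick a media type for the response, or None if nothing is acceptable."""
    if not accept_header:
        return available[0]
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/*" becomes "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None  # a server might answer 406 Not Acceptable here


browser_choice = choose_representation("text/html,application/xhtml+xml,*/*;q=0.8")
crawler_choice = choose_representation("application/rdf+xml")
print(browser_choice, crawler_choice)
```

With this approach a web browser and a Linked Data client can request the same record URI and each receive the form it asked for, which is consistent with presenting the RDF as a current view of the catalogue rather than as a separate export.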
Hosting requirements - There are a variety of options, but the minimum would be space to store the RDF representation of the data, and a web server to serve data on request. However, when publishing this type of data it is becoming common to provide a SPARQL endpoint as well as a web interface, which further suggests the use of a triple store to host the data. An alternative approach is to outsource hosting to a third party, such as the Talis Platform.
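To give a sense of what a SPARQL endpoint offers to consumers, the sketch below builds (but does not send) a query request in the style of the SPARQL protocol, where the query text is passed as a URL-encoded `query` parameter. The endpoint address is hypothetical, and the query assumes that titles have been published using the Dublin Core `title` property, which may not match any particular dataset.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

# Assumes titles are published with dc:title; adjust to the dataset's vocabulary.
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE { ?book dc:title ?title }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build, without sending, an HTTP GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query.strip()})
    # Ask for results in the SPARQL JSON results format
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Against a live endpoint, the request could be sent with `urllib.request.urlopen(request)` and the response body parsed as JSON. The point of the sketch is that any third party can ask ad hoc questions of the data without the library anticipating them, which is part of the case for hosting a triple store rather than static files alone.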
Existing systems impact - It is currently unlikely that existing systems will support publication of bibliographic data as Linked Data. It would be necessary to create the relevant routines to extract data from existing systems, transform into RDF, and publish either directly onto the web, or via a triple store.
Skills demands - There is likely to be a steep learning curve for those engaging in the publication of bibliographic data as Linked Data. An understanding of both bibliographic data in traditional formats (e.g. MARC) and RDF will be required, which is likely to mean a high degree of collaboration between library staff and technical staff. A good understanding of HTTP, configuration of web servers, and possibly triple store technology will be required, which implies a high level of technical expertise.

Costs

Setup - While much of the software needed to publish Linked Data is Open Source, the time needed to gain the necessary expertise and set up the necessary infrastructure could be significant. In the short term, outsourcing the provision of the necessary infrastructure could prove more cost-effective.
Ongoing - Once the investment in the initial setup has been made, the costs associated with sustaining this capability are likely to be low. If the activity has been outsourced there are likely to be higher ongoing costs in the medium to long term. However, it should be noted that as the sector's understanding of representing bibliographic data as Linked Data changes, earlier adopters may need to revisit implementations to bring their practice in line with more recent developments elsewhere.
Cost of doing nothing - No additional costs will be directly accrued through inaction.