This document discusses publishing and linking educational data as linked open data. It describes several existing educational linked data projects and datasets, including SmartLink, mEducator, and the Linked Education Graph. The Linked Education Graph integrates datasets from various sources into a single RDF dataset with over 6 million resources and 97 million triples. The document outlines challenges in linking educational data and introduces the LinkedUp project, which aims to further the adoption of linked data in education through an open data competition and an infrastructure for integrating and querying educational datasets.
Open Educational Data - Datasets and APIs (Athens Green Hackathon 2012)
1. Linked Data for Education – Datasets & APIs
Stefan Dietze
- Green Hackathon, 14 December, Athens, Greece -
2. TEL data vs Linked Open Data
Linked Data for Education
Relevant knowledge and data
Publications: ACM, PubMed, DBLP (L3S), OpenLibrary
(Cross-)domain knowledge & resources: BioPortal, historic artefacts in Europeana, Geonames, DBpedia, Freebase, …
Media resource metadata: BBC, Flickr, …
Explicit educational data
University Linked Data: e.g. The Open University UK, http://data.open.ac.uk, Southampton University, …
OER Linked Data: mEducator Linked ER (http://ckan.net/package/meducator), Open Learn LD
Schemas: LRMI (http://www.lrmi.net/), mEducator OER schema (http://purl.org/meducator/ns)
⇒ http://linkededucation.org
⇒ http://linkeduniversities.org
Linked Open Data
Vision: well connected graph of open Web data
W3C standards (RDF, SPARQL) to expose data, URIs to interlink datasets
=> vast cloud of interconnected datasets
Crossing all sorts of domains
32 billion triples (September 2011)
3. Early work: educational service integration
SmartLink: Linked Data registry of (educational) datasets / stores and their APIs
Discovery and lifting of educational data out of heterogeneous repositories
Transformation of heterogeneous data formats (XML, JSON, ...) and schemas (e.g. IEEE LOM, Dublin Core) into RDF (prerequisite for LOD compliance)
⇒ http://ckan.net/package/smartlink & http://purl.org/smartlink
Data/services integration & retrieval/search APIs
Green Hackathon 2012 Stefan Dietze 3
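The lifting step named on this slide (heterogeneous JSON/XML metadata into RDF) can be sketched as follows. This is only an illustration and not SmartLink's actual pipeline: the record layout, the `lift_record` helper and the `http://example.org/resource/` base URI are hypothetical, and the mapping onto Dublin Core terms is an assumption made for the example.

```python
import json

# Dublin Core terms namespace (a real vocabulary; its use here is an assumption).
DCTERMS = "http://purl.org/dc/terms/"


def literal(value):
    """Serialise a plain string as an N-Triples literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def lift_record(record, base_uri):
    """Turn one flat metadata record (hypothetical layout) into N-Triples lines."""
    subject = f"<{base_uri}{record['id']}>"
    triples = []
    for field in ("title", "description", "language"):
        if field in record:
            triples.append(
                f"{subject} <{DCTERMS}{field}> {literal(record[field])} ."
            )
    return triples


# Hypothetical source record, as it might arrive from a JSON repository API.
raw = '{"id": "r1", "title": "Linear equations", "language": "en"}'
for line in lift_record(json.loads(raw), "http://example.org/resource/"):
    print(line)
```

Each output line is one RDF statement, so records from different repositories end up in the same triple form regardless of their original format.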
4. Early work: educational data integration
⇒ http://linkededucation.org/meducator
Data/services integration & retrieval/search APIs Linked Educational Resources
6. Data so far: SmartLink/mEducator in LOD cloud
SmartLink: http://ckan.net/package/smartlink
> 2000 triples so far
> 300 links to iServe
APIs (=> wiki) used by several applications
mEducator: http://ckan.net/package/meducator
> 35000 triples so far
> 1000 links to DBpedia & BioPortal ontologies
APIs (=> see wiki) used by 4 applications
7. TEL data vs Linked Open Data
Challenges
Still limited take-up (applications usually focused on small set of datasets)
Key issues
Scalability and robustness (distributed data access & retrieval, Big Data integration)
Data quality (heterogeneous providers, lack of trust)
Legal and licensing issues
Lack of benchmarks and evaluation
8. “LinkedUp” Support Action
Linking Web Data for Education Project – Open Challenge in Web-scale Data Integration
EC Support Action, kickstarted in November 2012 => http://linkedup-project.eu
Goals
Push forward adoption of Web data/Linked Data in educational context
Drive technological advancement of Web data integration technologies
Approach
Open data competition (initial calls expected early 2013) incl. technical, legal and financial support
Open data curation
Partners
+ network of associated institutions (eg BBC, Commonwealth of Learning, Talis UK, …)
9. LinkedUp data curation
Linked Education Cloud & Linked Education Graph
Educational data gathering - community-approach: Linked Education cloud
“LinkedUp/Linked Education cloud” as subset of LOD cloud
CKAN – “The DataHub” (ckan.net, the most important data registry) for data collection (analogous to the Linked Open Data approach)
Dedicated group (“linked-education”) for cataloging educational datasets
Educational Data
Educational data integration & infrastructure: Linked Education graph
Linked Education cloud => Linked Education graph & dataset
Integration of (selected) datasets into coherent (RDF) dataset
Infrastructure, unified (SPARQL) endpoint & APIs => http://linkededucation.org
11. Linked Education graph & dataset(s)
Schema: http://data.linkededucation.org/ns/linked-education.rdf
[Figure: alignment of dataset-specific properties (e.g. <dc:title>, <akt:has-title>) across the OER, VideoLecture, Publication and LinkedUniversities educational video datasets]
Data: http://data.linkededucation.org/.... (details at the end)
6 million distinct (but linked) resources
97 million RDF triples
21.6 GB of data
15. Linked Education graph & dataset(s)
Enabling cross-dataset queries
Example resource:
=> http://data.linkededucation.org/resource/led/92C8A5E7-7B4D-12A6-F4F2-76A6A8DC7C0A
Example query (schema alignment & categorisation):
SELECT ?resource ?title WHERE {
  ?resource led:title ?title
  FILTER regex(?title, "linear equations", "i")
}
⇒ returns 1102 resources from different datasets: 659 DBLP items, 397 ACM publications, 10 LinkedUniversities educational videos
Example query (disambiguation & correlation):
SELECT DISTINCT ?entity WHERE {
  ?entity led:hasEnrichmentContext ?dbp_context .
  ?dbp_context rdf:type led:EnrichmentContext .
  ?dbp_context led:hasEnrichment <http://data.linkededucation.org/ontology/Enrichment/Gravitation>
}
⇒ returns 5 resources (LinkedUniversities, mEducator, BBC) enriched with DBpedia concept Gravitation (even though their descriptions refer to "gravity" or "gravitational" or "laws of gravity").
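A minimal sketch of how the title query above might be assembled and sent as an HTTP GET request. Two things are assumptions: the slides never spell out the namespace behind the `led:` prefix, so `LED_NS` is a placeholder to be replaced with the namespace declared in the linked-education schema, and the endpoint is assumed to accept the standard SPARQL protocol `query` parameter (the slides show `?query`). The helper names are hypothetical, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = (
    "http://data.linkededucation.org/openrdf-sesame/repositories/linked-learning"
)
# ASSUMPTION: placeholder only; the real led: namespace is not given in the slides.
LED_NS = "http://example.org/led-namespace#"


def title_search_query(keyword, led_ns=LED_NS):
    """Build the case-insensitive title search shown on the slide."""
    # Escape backslashes and quotes so the keyword stays inside the string literal.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX led: <{led_ns}>\n"
        "SELECT ?resource ?title WHERE {\n"
        "  ?resource led:title ?title .\n"
        f'  FILTER regex(?title, "{escaped}", "i")\n'
        "}"
    )


def request_url(query, endpoint=ENDPOINT):
    """Percent-encode the query and append it as the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = title_search_query("linear equations")
url = request_url(query)
print(url)
# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == query)
```

Note that the keyword is interpreted as a regular expression by the endpoint, so characters such as `.` or `(` keep their regex meaning unless escaped by the caller.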
16. How to access the data (1/2)
Registries and federated access to data
CKAN – The DataHub
THE public registry for open Web datasets (almost 5000 distinct datasets)
CKAN: http://thedatahub.org; LOD group: http://datahub.io/group/lodcloud
Linked Education dataset
Over 21 GB / 6 million educationally relevant resources
SPARQL endpoint: http://data.linkededucation.org/openrdf-sesame/repositories/linked-learning[-selection]?query
Schema: http://data.linkededucation.org/ns/linked-education.rdf
Example resource:
http://data.linkededucation.org/resource/led/92C8A5E7-7B4D-12A6-F4F2-76A6A8DC7C0A
SmartLink
SmartLink dataset: registry of educationally relevant APIs
=> http://ckan.net/package/smartlink, http://purl.org/smartlink
SPARQL: http://smartlink.open.ac.uk/smartlink/sparql
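Once a SPARQL endpoint such as those listed above answers a SELECT query, the response has to be unpacked. The sketch below assumes the endpoint returns the W3C SPARQL Query Results JSON format (typically requested with an `Accept: application/sparql-results+json` header). The sample response is made up for illustration and is not real output from these endpoints, and `bindings_to_rows` is a hypothetical helper name.

```python
import json


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given solution, so check before reading it.
        rows.append(
            {var: binding[var]["value"] for var in variables if var in binding}
        )
    return rows


# Hypothetical response body with a single solution.
SAMPLE = """{
  "head": {"vars": ["resource", "title"]},
  "results": {"bindings": [
    {"resource": {"type": "uri", "value": "http://example.org/resource/1"},
     "title": {"type": "literal", "value": "Solving linear equations"}}
  ]}
}"""

for row in bindings_to_rows(SAMPLE):
    print(row["resource"], "-", row["title"])
```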
17. How to access the data (2/2)
Some individual datasets
ACM Learning Analytics and Knowledge (LAK) Dataset
Corpus of extracted metadata and full-text from ACM LAK conference series papers
and related publications (expanding)
Dataset & schema description: http://www.solaresearch.org/resources/lak-dataset/
LAK Challenge: win fame, an iPad, cash rewards!
SPARQL endpoint:
http://data.linkededucation.org/openrdf-sesame/repositories/lak-conference?query=%5BQUERY%5D
mEducator Linked Educational Resources
Over 600 OER (36,000 triples) from different providers
mEducator dataset: http://ckan.net/package/meducator
SPARQL: http://meducator.open.ac.uk/resourcesrestapi/rest/meducator/sparql
Schema: http://purl.org/meducator/ns
Dedicated search & retrieval APIs available (see http://linkededucation.org/meducator/)
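The LAK endpoint URL above ends in `%5BQUERY%5D`, which is the percent-encoded form of the placeholder `[QUERY]`. A small sketch of substituting an actual query for that placeholder follows; `fill_query` is a hypothetical helper and the query used is a generic "any ten triples" example rather than one taken from the LAK dataset documentation.

```python
from urllib.parse import quote, unquote

LAK_TEMPLATE = (
    "http://data.linkededucation.org/openrdf-sesame/repositories/"
    "lak-conference?query=%5BQUERY%5D"
)
PLACEHOLDER = "%5BQUERY%5D"  # percent-encoded "[QUERY]"


def fill_query(template, sparql):
    """Replace the encoded [QUERY] placeholder with a percent-encoded SPARQL query."""
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain the [QUERY] placeholder")
    # safe="" also encodes "/", "?" and "&", so the query cannot break the URL.
    return template.replace(PLACEHOLDER, quote(sparql, safe=""))


sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = fill_query(LAK_TEMPLATE, sparql)
print(url)
# Decoding the query part gives back the original text.
print(unquote(url.split("?query=", 1)[1]) == sparql)
```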
18. Conclusions and Outlook
Summary, ongoing work & outlook
Wide range of relevant data sources & APIs available
Early cataloging (http://linkededucation.org, http://linkeduniversities.org) and
integration/federation (SmartLink, mEducator Linked Educational Resources)
LinkedUp (http://www.linkedup-project.eu): data curation, assessment and exploitation
Data cataloging: http://datahub.io/en/group/linked-education for collection of “educationally
relevant” datasets, categorisation and tagging
Data integration & infrastructure: unified endpoints and APIs at http://data.linkededucation.org
Getting involved
Submit your own data or tools: LinkedUp Challenge, LAK Challenge, LinkedUp Call for Data
Participate as LinkedUp evaluation panelist, use case or data contributor & benefit from access to
large network of organisations in Linked Data and TEL
19. Thank you!
Credits
Davide Taibi (CNR ITD, Italy)
Harry Yu & Dong Liu (The Open University, UK)
Besnik Fetahu (L3S, Germany)
mEducator and LinkedUp teams
Contact & links
http://purl.org/dietze / dietze@l3s.de
http://linkededucation.org
http://linkedup-project.eu
Editor's notes
http://meducator.open.ac.uk/resourcesrestapi/rest/meducator/sparql?query=select%20?r%20?b%20where%20{?r%20?b%20?c}
API: http://meducator.open.ac.uk/resourcesrestapi/rest/meducator/auth/propertysearch?property=mdc:title&value=Virtual (guest/guest)
http://smartlink.open.ac.uk/smartlink/sparql?query=select%20?r%20?b%20where%20{?r%20?b%20?c}
French educational service: http://smartlink.open.ac.uk/servicerestapi/restapi/searchservices?lang=French&sub=http://meducator.open.ac.uk/ontologies/open-learn-classification%23education
All services: http://smartlink.open.ac.uk/servicerestapi/restapi/searchservices
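The property search URL in the notes above follows a simple `property=…&value=…` pattern. The sketch below rebuilds such URLs from their parts; the parameter names are taken from the example URL, while `property_search_url` is a hypothetical helper. How the guest/guest credentials mentioned in the notes are supplied is not described in the slides, so the sketch does not send any request or credentials.

```python
from urllib.parse import urlencode

PROPERTY_SEARCH = (
    "http://meducator.open.ac.uk/resourcesrestapi/rest/meducator/auth/propertysearch"
)


def property_search_url(prop, value, base=PROPERTY_SEARCH):
    """Build a property search URL; ':' stays literal so prefixed names remain readable."""
    return base + "?" + urlencode({"property": prop, "value": value}, safe=":")


# Reproduces the example from the notes.
print(property_search_url("mdc:title", "Virtual"))
# Values containing spaces are encoded automatically.
print(property_search_url("mdc:title", "Virtual patient"))
```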