4. T2.1 RDFization
• Task: Translate metadata on manuscripts (author, date, …) to RDF
• Most data providers are currently not exporting their data as RDF
• Various other formats used
• WP2 Approach:
1. Collect sample data from WP1 and external data providers
2. Analyze sample data for formats and content
3. Survey existing tools from WP2 project partners
4. Create concept and example mappings to transform exported data
into RDF using tools
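As a rough illustration of step 4, the sketch below maps a small exported XML record to N-Triples. The record layout, the `BASE` URI, and the element-to-property `MAPPING` are assumptions made for this example; they are not taken from any provider's data or from the DM2E tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical exported record; element names are illustrative only.
SAMPLE_XML = """
<record id="ms-001">
  <title>Letter to a colleague</title>
  <author>Jane Example</author>
  <date>1820</date>
</record>
"""

# Hypothetical base URI for minted item resources.
BASE = "http://example.org/item/"

# Assumed mapping from source element names to Dublin Core properties.
DC = "http://purl.org/dc/elements/1.1/"
MAPPING = {"title": DC + "title", "author": DC + "creator", "date": DC + "date"}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def record_to_ntriples(xml_text):
    """Return one N-Triples line per mapped, non-empty child element."""
    root = ET.fromstring(xml_text)
    subject = "<%s%s>" % (BASE, root.get("id"))
    lines = []
    for child in root:
        prop = MAPPING.get(child.tag)
        text = (child.text or "").strip()
        if prop is None or not text:
            continue  # skip unmapped or empty elements
        lines.append('%s <%s> "%s" .' % (subject, prop, escape_literal(text)))
    return lines


if __name__ == "__main__":
    print("\n".join(record_to_ntriples(SAMPLE_XML)))
```

Keeping the mapping in a plain dictionary separates the "concept" (which source field becomes which property) from the transformation code, which is the same split a mapping tool provides.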
DM2E WP2 Report 4
5. T2.1 Sample Data Collection & Analysis
• Thanks to the data providers, we collected a large set of sample data
from all of them, in a variety of formats
• Metadata:
• Encoded Archival Description - EAD/XML (SBB)
• Maschinelles Austauschformat für Bibliotheken - MAB (ONB, SBB)
• MARC (ONB, NLI)
• Text Encoding Initiative - TEI (UIB, HUB, BBAW)
• Object data:
• Structured data, Images, Audio, Video
6. T2.1 Tool Survey
• Using sample data, we tested RDFization to EDM
• XML to RDF/EDM: MINT (NTUA)
• Dedicated MINT instance installed for DM2E
• Test mappings created for
HUB/TEI, ONB/MARC, SBB/MAB, BBAW/TEI, NLI/MARC, MPIWG
• Relational DB to RDF/EDM: D2RQ (FUB)
• Test mapping for MPIWG database created
• Other tools surveyed: jMet2Ont, DNB MAB tools
• Result: Some adaptations and extensions required, but the tools are usable
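To make the relational-database case concrete, the following sketch turns rows of a table into N-Triples using only Python's `sqlite3` module. It is not D2RQ and does not use D2RQ's mapping language; the `manuscript` table, its columns, and the `BASE` URI are hypothetical stand-ins for a provider database.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def demo_connection():
    """Build an in-memory database with a hypothetical manuscript table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE manuscript (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    conn.executemany(
        "INSERT INTO manuscript VALUES (?, ?, ?)",
        [(1, "Treatise on optics", "A. Example"), (2, "Untitled fragment", None)],
    )
    return conn


def rows_to_ntriples(conn):
    """Map each row to a subject URI and each non-NULL column to a triple."""
    lines = []
    query = "SELECT id, title, author FROM manuscript ORDER BY id"
    for row_id, title, author in conn.execute(query):
        subject = "<%smanuscript/%d>" % (BASE, row_id)
        for prop, value in ((DC + "title", title), (DC + "creator", author)):
            if value is not None:  # NULL columns produce no triple
                lines.append(
                    '%s <%s> "%s" .' % (subject, prop, escape_literal(value))
                )
    return lines


if __name__ == "__main__":
    print("\n".join(rows_to_ntriples(demo_connection())))
```

The primary key becomes part of the subject URI, so re-running the export yields stable identifiers for the same rows.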
7. T2.2 Mapping to EDM
• The Europeana Data Model (EDM) is the metadata exchange format
• between WP1 and WP3
• between DM2E and Europeana
• EDM allows several alternatives in modeling and does not capture all
requirements for digital manuscripts
• Preliminary extension and specialization performed by WP2
• Separate entities for people, places and topics
• Type specialization (Article – PhysicalThing, Depiction –
WebResource)
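The idea of separate entities can be sketched as follows: instead of attaching an author name to the item as a plain string, the name is promoted to its own resource that the item points to. The URI scheme, the `slug` helper, and the choice of `edm:Agent` with `skos:prefLabel` are assumptions for illustration, not the actual DM2E model.

```python
import re

BASE = "http://example.org/"  # hypothetical base URI
DC = "http://purl.org/dc/elements/1.1/"
EDM = "http://www.europeana.eu/schemas/edm/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def slug(name):
    """Derive a URI-safe identifier from a name (illustrative helper)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def creator_as_literal(item_uri, name):
    """Flat modeling: the creator is only a string on the item."""
    return ['<%s> <%screator> "%s" .' % (item_uri, DC, name)]


def creator_as_entity(item_uri, name):
    """Entity modeling: the creator is a resource with its own label."""
    agent_uri = "%sagent/%s" % (BASE, slug(name))
    return [
        "<%s> <%screator> <%s> ." % (item_uri, DC, agent_uri),
        "<%s> <%s> <%sAgent> ." % (agent_uri, RDF_TYPE, EDM),
        '<%s> <%sprefLabel> "%s" .' % (agent_uri, SKOS, name),
    ]


if __name__ == "__main__":
    item = BASE + "item/ms-001"
    print("\n".join(creator_as_entity(item, "Jane Example")))
```

Once the person has a URI of their own, further statements (dates, links to external authority records) can be attached to that resource rather than repeated on every item.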
9. T2.3 Contextualization
• Contextualization: Connection of internally created resources with remote
resources to enrich data and provide common identifiers
• Tool: Silk – Link Discovery Framework
• Initial experiments conducted by MPIWG
• Resources identified for contextualization
• Authors
• Places
• Subjects
• Languages
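Contextualization by label matching can be sketched in a few lines. The code below is not Silk and does not use Silk's link specification language; it is a simplified stand-in that compares normalized labels with `difflib.SequenceMatcher` and emits `owl:sameAs` link candidates. All URIs, labels, and the `0.9` threshold are assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def find_links(local, remote, threshold=0.9):
    """Link each local resource to its best-matching remote resource.

    `local` and `remote` map URIs to labels. A link is emitted only if
    the best similarity score reaches the threshold.
    """
    links = []
    for local_uri, local_label in local.items():
        best_uri, best_score = None, 0.0
        for remote_uri, remote_label in remote.items():
            score = SequenceMatcher(
                None, normalize(local_label), normalize(remote_label)
            ).ratio()
            if score > best_score:
                best_uri, best_score = remote_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((local_uri, OWL_SAME_AS, best_uri))
    return links


if __name__ == "__main__":
    local_places = {
        "http://example.org/place/1": "  Berlin",
        "http://example.org/place/2": "Vienna",
    }
    remote_places = {"http://example.net/geo/berlin": "berlin"}
    for link in find_links(local_places, remote_places):
        print(link)
```

A real setup would compare more than one property (for example name plus dates for authors), since labels alone produce false matches for common names.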
10. Provenance Tracking
• Data Providers can relax, no Proxy-Party
• Named Graphs used instead
• Feedback from Europeana: “Do what you think is right”
• Two-tier import process
• MINT/D2R/… > Plain RDF
• Central RDF repository, Named Graphs and Provenance information
are added on import
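The second tier of the import can be pictured as wrapping plain triples into a named graph and recording provenance about that graph. The sketch below works on N-Triples lines and produces N-Quads; the graph URI scheme, the separate `/provenance` graph, and the use of `dcterms:source` and `dcterms:created` are assumptions, since the slides do not name a provenance vocabulary.

```python
DCT = "http://purl.org/dc/terms/"


def import_into_named_graph(ntriples_lines, graph_uri, source_uri, imported_at):
    """Turn plain N-Triples into N-Quads and append provenance quads.

    Each input triple is placed in `graph_uri`. Two statements about the
    graph itself go into a hypothetical companion provenance graph.
    """
    quads = []
    for line in ntriples_lines:
        statement = line.rstrip()
        if not statement.endswith("."):
            raise ValueError("not a complete N-Triples statement: %r" % line)
        # Drop the final '.', then append the graph label and a new '.'.
        quads.append("%s <%s> ." % (statement[:-1].rstrip(), graph_uri))

    prov_graph = graph_uri + "/provenance"  # assumed naming convention
    quads.append(
        "<%s> <%ssource> <%s> <%s> ." % (graph_uri, DCT, source_uri, prov_graph)
    )
    quads.append(
        '<%s> <%screated> "%s" <%s> .' % (graph_uri, DCT, imported_at, prov_graph)
    )
    return quads


if __name__ == "__main__":
    plain = [
        '<http://example.org/item/1> <http://purl.org/dc/elements/1.1/title> "A" .'
    ]
    for quad in import_into_named_graph(
        plain,
        "http://example.org/graph/1",
        "http://example.org/export/1",
        "2012-01-01T00:00:00Z",
    ):
        print(quad)
```

Because provenance is added centrally at import time, the first-tier tools only need to emit plain RDF and data providers do not have to model provenance themselves.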
11. Conclusion
• Preliminary data model based on EDM available
• Sufficient tool support to translate most example metadata to RDF/EDM
• Contextualization ongoing
Open Issues:
• How to handle legacy data formats (non-XML, non-Database)?
• Incomplete information from metadata (see session on Overlaps and
synergies)