7. Data sources
• as-enacted legislation (OPSI)
• PDFs
• pre-1988 OCR'd
• post-1988 SGML/XML data sources
• revised legislation (SLD)
• February 1991 "base date"
• different XML format
12. Usable
• user experience
• oriented around personas of real users
• clear provisos so you know what you're looking at
• reuser experience
• variety of formats (HTML snippets, XML, RDF, PDF)
• feeds for access
• integration between the two
13. Open
• open standards
• non-proprietary formats
• open source
• use open source technologies where appropriate
• open source our code - https://github.com/legislation/legislation
• open licence
• all available under Open Government Licence
14. RESTful
• URIs for everything
• every item of legislation
• every level within it
• every version of them
• every view of them
• every format of them
• HTTP status codes / content negotiation
• typed link for every transition
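"URIs for everything" means a predictable path for every item, level, version and format. The sketch below builds paths following the patterns that appear on the FRBR slides (`/id/` for the work, `/scotland` or a date for a version, `data.xml` and friends for a format). The helper name `legislation_path` is hypothetical, and combining a version with a format in one path is an assumption; the slides show each separately.

```python
from typing import Optional


def legislation_path(kind: str, year: int, number: int,
                     level: Optional[str] = None,
                     version: Optional[str] = None,
                     fmt: Optional[str] = None,
                     identifier: bool = False) -> str:
    """Build a path in the style shown on the slides (hypothetical helper)."""
    segments = ["id"] if identifier else []
    segments += [kind, str(year), str(number)]
    if level:                 # a level within the item, e.g. "section/6"
        segments.append(level.strip("/"))
    if identifier:
        # /id/ paths name the abstract work, so no version or format is added
        return "/" + "/".join(segments)
    if version:               # a version, e.g. "2001-04-01" or "scotland"
        segments.append(version)
    if fmt:                   # a format, e.g. "xml", "pdf" or "htm"
        segments.append(f"data.{fmt}")
    return "/" + "/".join(segments)


print(legislation_path("ukpga", 1985, 67, level="section/6", identifier=True))
print(legislation_path("ukpga", 1985, 67, version="2001-04-01"))
print(legislation_path("ukpga", 1985, 67, fmt="pdf"))
```

The first call reproduces the work identifier on slide 17; the other two reproduce an expression and a manifestation from slide 16.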
16. Information architecture: FRBR model
• work: /id/ukpga/1985/67
• expression (the work redirects here with 303 See Other): /ukpga/1985/67, /ukpga/1985/67/scotland, /ukpga/1985/67/2001-04-01
• manifestation (the expression is served as one of these, indicated by Content-Location): /ukpga/1985/67/data.xml, /ukpga/1985/67/data.pdf, /ukpga/1985/67/data.htm
17. Information architecture: FRBR model for fragments
• work: /id/ukpga/1985/67/section/6
• expression (the work redirects here with 303 See Other): /ukpga/1985/67/section/6, /ukpga/1985/67/section/6/scotland, /ukpga/1985/67/section/6/2001-04-01
• manifestation (the expression is served as one of these, indicated by Content-Location): /ukpga/1985/67/section/6/data.xml, /ukpga/1985/67/section/6/data.pdf, /ukpga/1985/67/section/6/data.htm
• same for every fragment: parts, chapters, sections
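The HTTP behaviour on the two FRBR slides can be sketched as a pure function from a request to a status and headers: a `/id/` work answers 303 See Other pointing at its expression, and an expression is served as a manifestation whose URI goes in Content-Location. The media-type table and the function name `resolve` are assumptions for illustration, and the Accept parsing is deliberately naive (it ignores q-values and wildcards). Because it works on path prefixes, the same code covers whole items and fragments alike.

```python
from typing import Dict, Tuple

# Assumed mapping from media types to the data.* manifestations on the slides
MANIFESTATIONS = {
    "application/xml": "data.xml",
    "application/pdf": "data.pdf",
    "text/html": "data.htm",
}


def resolve(path: str, accept: str = "text/html") -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET on `path` (hypothetical sketch)."""
    if path.startswith("/id/"):
        # Work: redirect to the expression by dropping the /id prefix
        return 303, {"Location": path[len("/id"):]}
    # Expression: take the first acceptable media type we can serve,
    # ignoring q-values for brevity
    for media_type in (part.split(";")[0].strip() for part in accept.split(",")):
        if media_type in MANIFESTATIONS:
            location = path.rstrip("/") + "/" + MANIFESTATIONS[media_type]
            return 200, {"Content-Type": media_type, "Content-Location": location}
    return 406, {}  # Not Acceptable: no manifestation matches


print(resolve("/id/ukpga/1985/67/section/6"))
print(resolve("/ukpga/1985/67/section/6", "application/xml"))
```

A production server would honour q-values and `*/*`; the point here is only that each FRBR level maps onto a distinct HTTP mechanism.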
23. System architecture: native XML
• caching & delivery: CDN
• caching: cache
• static files: web server
• transformation: pipeline engine (XSLT & XSL-FO)
• storage & queries: XML database (XML & XQuery)
24. System architecture: native XML
• caching & delivery: Akamai
• caching: Squid
• static files: Apache
• transformation: Orbeon (XSLT & XSL-FO)
• storage & queries: MarkLogic (XML & XQuery)
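The layered architecture means a request falls through to a deeper, more expensive layer only when the layer above cannot answer it. The sketch below models that fall-through in plain Python. The class `LayeredPublisher` and its callables are hypothetical stand-ins: a dict for the cache, a dict for static files, and two functions for the XML database query and the XSLT transformation. The CDN sits outside the application and is not modelled.

```python
from typing import Callable, Dict, Optional


class LayeredPublisher:
    """Hypothetical model of the layers: cache, static files, transform, database."""

    def __init__(self, static_files: Dict[str, str],
                 query_db: Callable[[str], Optional[str]],
                 transform: Callable[[str], str]):
        self.cache: Dict[str, str] = {}
        self.static_files = static_files
        self.query_db = query_db
        self.transform = transform

    def get(self, path: str) -> Optional[str]:
        if path in self.cache:                # caching layer (Squid on the slide)
            return self.cache[path]
        body = self.static_files.get(path)    # static files (Apache)
        if body is None:
            source = self.query_db(path)      # storage & queries (MarkLogic, XQuery)
            if source is None:
                return None                   # nothing at any layer: a 404
            body = self.transform(source)     # transformation (Orbeon, XSLT)
        self.cache[path] = body               # later requests stop at the cache
        return body
```

For example, two requests for the same section reach the database only once; the second is answered from the cache.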
26. Data quality
• data is out of date
• 100,000 unapplied effects
• we can apply 10,000 a year; Parliament makes 15,000 a year
• help others help us (and themselves)
• open source as a model
• editorial team retains control & ensures quality
• framing participant tasks
• reviewing participant work
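The figures on this slide show why the editorial team cannot close the gap alone. A rough calculation, assuming the rates stay constant (an assumption, not a claim from the slide), shows the backlog growing by 5,000 effects a year, and how much capacity would be needed to clear it within a given period. The function names are hypothetical.

```python
import math


def backlog_after(years: int, backlog: int = 100_000,
                  applied_per_year: int = 10_000,
                  new_per_year: int = 15_000) -> int:
    """Unapplied effects left after `years`, assuming constant rates."""
    return backlog + (new_per_year - applied_per_year) * years


def rate_to_clear(years: int, backlog: int = 100_000,
                  new_per_year: int = 15_000) -> int:
    """Effects that must be applied each year to clear the backlog in `years`."""
    return new_per_year + math.ceil(backlog / years)


print(backlog_after(5))    # 125000
print(rate_to_clear(10))   # 25000
```

On these assumptions, clearing the backlog in ten years needs 25,000 effects applied a year, two and a half times current capacity.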
27. New requirements
• new types of information
• effects & research
• tasks & workflows
• participants & permissions & messages
• new levels of interactivity
• read/write platform
• dynamic, native web interface
28. HTML, JSON, XML and RDF
• HTML: lingua franca
• JSON: application-native data
• both concise and hard to get wrong
• XML: single source format; flexible
• RDF: web-native data; graph model
• other formats are better for other things
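Keeping XML as the single source format means HTML, JSON and RDF can all be generated from the same document. The sketch below derives all three from one fragment. The element names (`Section`, `Title`, `Text`) and the `uri` attribute are a simplified, hypothetical vocabulary, not the real legislation.gov.uk schema; the base URI is an assumption; the title property is Dublin Core Terms `title`. Literal escaping in the N-Triples output covers only backslashes and quotes.

```python
import html
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified source document
SOURCE = """<Section uri="/id/ukpga/1985/67/section/6">
  <Title>Example heading</Title>
  <Text>Example text.</Text>
</Section>"""

BASE = "http://www.legislation.gov.uk"  # assumed base for identifiers


def to_html(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return (f'<section data-uri="{html.escape(root.get("uri"))}">'
            f'<h2>{html.escape(root.findtext("Title"))}</h2>'
            f'<p>{html.escape(root.findtext("Text"))}</p></section>')


def to_json(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return json.dumps({"uri": root.get("uri"),
                       "title": root.findtext("Title"),
                       "text": root.findtext("Text")})


def to_ntriples(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    # Minimal escaping of the literal: backslashes first, then double quotes
    title = root.findtext("Title").replace("\\", "\\\\").replace('"', '\\"')
    return f'<{BASE}{root.get("uri")}> <http://purl.org/dc/terms/title> "{title}" .'


print(to_ntriples(SOURCE))
```

Each output plays to its format's strength while the XML remains the one place where edits are made.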
29. New architecture: XML and RDF data
• caching & delivery: CDN
• caching: cache
• static files: web server
• transformation: pipeline engine
• documents: XML database
• data: triplestore
33. User and re-user focus
• integrated API and UIs
• guarantees relevance
• help re-users understand information
• help developers debug
• URIs are key
• addressability
• shareability
• understanding of underlying resource model
34. Agility
• native XML eases development
• provides flexible access into documents
• avoids data model mismatches
• native RDF eases development too!
• ease of combining information from different sources
• querying with SPARQL
• schema-free & extensible aids agility
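The agility points combine two ideas: native XML lets you query straight into documents, and URIs let you join that with data from other sources. The sketch below uses ElementTree's limited XPath support to find sections by extent, then joins them with a set of triples keyed on the same URIs. The XML vocabulary, the `affectedBy` predicate and the `urn:example:` identifiers are hypothetical, and plain Python stands in here for what would be an XQuery and a SPARQL query in the real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only
DOC = ET.fromstring("""<Act uri="/id/ukpga/1985/67">
  <Section uri="/id/ukpga/1985/67/section/5" extent="E+W"/>
  <Section uri="/id/ukpga/1985/67/section/6" extent="S"/>
</Act>""")

# Hypothetical (subject, predicate, object) triples from another source
TRIPLES = {
    ("/id/ukpga/1985/67/section/6", "affectedBy", "urn:example:amending-act"),
    ("/id/ukpga/1985/67/section/5", "affectedBy", "urn:example:other-act"),
}


def sections_with_extent(extent: str) -> list:
    # ElementTree supports attribute predicates such as [@extent='S']
    return [s.get("uri") for s in DOC.iterfind(f".//Section[@extent='{extent}']")]


def effects_on(uri: str) -> list:
    return sorted(o for s, p, o in TRIPLES if s == uri and p == "affectedBy")


def sections_and_effects(extent: str) -> dict:
    # The URI is the join key between the document and the triples
    return {uri: effects_on(uri) for uri in sections_with_extent(extent)}


print(sections_and_effects("S"))
```

Because the triples carry no fixed schema, a new predicate can be added to `TRIPLES` without changing the document side at all.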
36. Summary
• complex documents
• added value from having them on the web
• layered architecture
• make the most of single-source publishing
• web standards
• long-term flexibility
• if we can do it with legislation ...
Each format has advantages, and so each looks jealously at the others' advantages:
HTML's ubiquity
XML's flexibility and ease of parsing
RDF's reach into the real world
JSON's practicality

One result is ghettoisation: "you should not exist! you have no point! I am all that's needed!"
Another result is self-doubt: "what am I here for? what should I be?"

URLs that address structures within formats help those formats to be used together. Each can then be used for its strengths without being compromised.