OpenFlyData: the way to go for biological data integration
Dr Jun Zhao, Image Bioinformatics Research Group, Department of Zoology, University of Oxford
http://www.fly-ted.org.uk/509/
Use cases of gene expression data
OpenFlyData Application
Barriers for accessing these data
OpenFlyData.org demonstration
The data sources
System architecture
Client side: Web browser running the FlyUI application, which is composed of FlyUI widgets
HTTP: connects the client side to the SPARQL endpoint
Server side: SPARQL server (SPARQLite, Tomcat, Apache) over an RDF cache (Jena TDB) holding data from FlyBase, BDGP, FlyTED and FlyAtlas
Creating RDF from data sources
The heterogeneous Drosophila gene names

DATA SOURCE   POSSIBLE GENE IDENTIFIERS    EXAMPLES
FlyBase       Symbol                       schuy
              Full name                    schumacher-levy
              Annotation symbol            CG17736
              Unique FlyBase id            FBgn0036925
              Curated synonyms             CG17736, schuy, etc.
BDGP          FlyBase id                   FBgn0036925
              Annotation symbol            CG17736
FlyAtlas      Affy microarray probe id     16166608_a_at
FlyTED        Uncontrolled gene name       schuy, CG17736/schuy
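The table shows why a lookup step is needed before records from the four sources can be joined: one gene appears under several identifiers. The following is a minimal Python sketch of that idea, using only the example values from the table. The names `build_index` and `resolve`, the record layout, and the rule of splitting FlyTED names on "/" are assumptions made for illustration, not the mapping OpenFlyData itself implements.

```python
# Hypothetical record layout; the values are the examples from the slide.
FLYBASE_RECORDS = [
    {
        "id": "FBgn0036925",
        "symbol": "schuy",
        "full_name": "schumacher-levy",
        "annotation_symbol": "CG17736",
        "synonyms": ["CG17736", "schuy"],
    },
]


def build_index(records):
    """Map every known name of a gene to the set of FlyBase ids using it."""
    index = {}
    for rec in records:
        names = [rec["id"], rec["symbol"], rec["full_name"], rec["annotation_symbol"]]
        names.extend(rec["synonyms"])
        for name in names:
            # Exact match only: Drosophila gene symbols are case-sensitive.
            index.setdefault(name, set()).add(rec["id"])
    return index


def resolve(name, index):
    """Resolve a possibly compound name such as 'CG17736/schuy' to FlyBase ids."""
    ids = set()
    for part in name.split("/"):
        ids |= index.get(part.strip(), set())
    return ids


INDEX = build_index(FLYBASE_RECORDS)
print(resolve("CG17736/schuy", INDEX))
```

Returning a set rather than a single id keeps ambiguous names visible instead of silently picking one gene. Affymetrix probe ids from FlyAtlas would need a separate probe-to-gene table, which this sketch does not model.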
Gene name mapping
SPARQL queries

SPARQL:

    PREFIX chado: <http://purl.org/net/chado/schema/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xs: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?flybaseID
    WHERE {
      ?feature rdf:type chado:Feature ;
               chado:name "schuy"^^xs:string ;
               chado:uniquename ?flybaseID .
    }

SQL:

    SELECT feature.uniquename AS flybaseID
    FROM feature
    WHERE feature.name = 'schuy'
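An application usually has to generate this query for whatever gene name the user types. The sketch below builds the SPARQL text in Python and escapes the name before placing it in a string literal. The function names are hypothetical. The `xs:` prefix uses the standard XML Schema namespace, and the trailing slash on the `chado:` namespace is an assumption, added so that `chado:Feature` expands to a sensible IRI.

```python
PREFIXES = (
    "PREFIX chado: <http://purl.org/net/chado/schema/>\n"  # trailing slash assumed
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX xs: <http://www.w3.org/2001/XMLSchema#>\n"
)


def escape_literal(text):
    """Escape characters that would end or break a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def flybase_id_query(gene_name):
    """Return a SPARQL query that looks up the FlyBase id for a gene name."""
    return PREFIXES + (
        "SELECT ?flybaseID\n"
        "WHERE {\n"
        "  ?feature rdf:type chado:Feature ;\n"
        f'           chado:name "{escape_literal(gene_name)}"^^xs:string ;\n'
        "           chado:uniquename ?flybaseID .\n"
        "}\n"
    )


print(flybase_id_query("schuy"))
```

Escaping the name matters because it comes from user input; an unescaped quote would otherwise close the literal early and change the query.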
SPARQL protocol

HTTP GET:

    GET /query/flybase?query=[URL encoded query] HTTP/1.1
    Host: openflydata.org
    Accept: application/sparql-results+json

HTTP POST:

    POST /query/flybase HTTP/1.1
    Host: openflydata.org
    Accept: application/sparql-results+json
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 456

    query=[URL encoded query]
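The two request forms can be assembled with Python's standard library, roughly as follows. This is a sketch under assumptions: the `http://` scheme and the helper names are not from the slides, and the code only constructs the requests without sending them, since the endpoint may not be reachable.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://openflydata.org/query/flybase"  # scheme is an assumption
ACCEPT = "application/sparql-results+json"


def get_request(query):
    """Build the GET form: the URL-encoded query travels in the query string."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": ACCEPT}, method="GET")


def post_request(query):
    """Build the POST form: the URL-encoded query travels in the request body."""
    body = urlencode({"query": query}).encode("ascii")
    headers = {
        "Accept": ACCEPT,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Content-Length is filled in by urllib when the request is actually sent.
    return Request(ENDPOINT, data=body, headers=headers, method="POST")


example = get_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(example.get_method(), example.full_url)
```

GET is convenient for short queries and caching, while POST avoids URL length limits for long queries. Passing either object to `urllib.request.urlopen` would perform the call.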
SPARQL server
SPARQLite
Benefits of SW technologies
Costs & Risks
Performance
Future directions
Acknowledgements

2010 03 Lodoxf Openflydata


Editor's notes

  1. Gene Name: CG30044; Slide Name: CG30044-mip40; Strain: wt; Apical Signals: off; Expression Location: Primary_spermatocyte, Cup-like_pattern_of_distal_end_of_elongating_spermatids
  2. Note that the thumbnail images are retrieved from the original web sites
  3. FlyUI is a library of JavaScript widgets that act as front ends to SPARQL data sources. It is built on the Yahoo User Interface (YUI) library, and widgets are composed in a browser to create the complete application. Each widget provides a Service that implements SPARQL queries, a Model encapsulating SPARQL query results, and a Renderer.
  4. We initially hoped to use D2R Server's SPARQL query rewriting, but some queries would kill the server, so we went for the SPARQLite alternative. Different techniques for generating RDF are applied to different kinds of data source. The resulting RDF is loaded into the Jena TDB triple store.
  5. This is mostly based on available off-the-shelf software. The choice of triple store is influenced significantly by the speed of loading ~10 million triples. We also performed some experiments with OpenLink Virtuoso, and its performance looks pretty good. Amazon EC2/EBS has worked well for us.
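The Service, Model and Renderer split described in note 3 can be illustrated with a short sketch. FlyUI itself is JavaScript built on YUI; the Python below only mirrors the division of responsibilities. All class and function names are hypothetical, and the transport returns a canned response in the standard SPARQL JSON results layout instead of contacting a server.

```python
import json

# The trailing slash on the chado namespace is an assumption.
QUERY_TEMPLATE = (
    "PREFIX chado: <http://purl.org/net/chado/schema/>\n"
    'SELECT ?flybaseID WHERE {{ ?feature chado:name "{name}" ; '
    "chado:uniquename ?flybaseID . }}"
)


class GeneModel:
    """Model: encapsulates the ids found in a SPARQL JSON result document."""

    def __init__(self, results):
        bindings = results["results"]["bindings"]
        self.flybase_ids = [b["flybaseID"]["value"] for b in bindings if "flybaseID" in b]


class GeneService:
    """Service: turns a gene name into a SPARQL query and runs it."""

    def __init__(self, transport):
        self.transport = transport  # callable: query text -> JSON text

    def find_by_name(self, name):
        safe = name.replace("\\", "\\\\").replace('"', '\\"')
        raw = self.transport(QUERY_TEMPLATE.format(name=safe))
        return GeneModel(json.loads(raw))


def render(model):
    """Renderer: presents the model, here as plain text, one id per line."""
    return "\n".join(model.flybase_ids)


def canned_transport(query):
    """Stand-in for an HTTP call; returns a fixed SPARQL JSON result."""
    return json.dumps({
        "head": {"vars": ["flybaseID"]},
        "results": {"bindings": [
            {"flybaseID": {"type": "literal", "value": "FBgn0036925"}},
        ]},
    })


service = GeneService(canned_transport)
print(render(service.find_by_name("schuy")))
```

Keeping the transport injectable means the same Service can be tested offline or pointed at a real endpoint, and a different Renderer can reuse the Model unchanged.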