AGRIS is an international database containing over 8 million bibliographic records related to agricultural science and technology. It was originally a bibliographic database but has been transformed into a linked open data application. This allows AGRIS to mine and index information from the wider web by crawling websites, indexing resources with an agricultural taxonomy, and linking relevant resources to records in AGRIS. This expands access to grey literature and unpublished research beyond just bibliographic metadata.
2015 11 agris-medes
1. AGRIS
From a bibliographical database to a linked open data application extending knowledge mining to the World Wide Web
Fabrizio Celli and Johannes Keizer – 04/11/2015
2. Fabrizio Celli, Johannes Keizer, http://aims.fao.org
Outline
What is AGRIS?
(S)Mash-up!
Mining and indexing the web
AGRIS
The International System for Agricultural Science and Technology
• A collection of more than 8 million multilingual bibliographic resources
• A network of more than 150 institutions from 65 countries
• A Web portal (http://agris.fao.org/)
AGRIS users
• Researchers, professors, and graduate students looking for bibliographies
• Librarians and cataloguers
• Small journal publishers, professional associations, and conference organizers
• Government officers asking for reports on a specific topic
Impact
• AGRIS supports both developed and developing countries
• Accessed from more than 200 countries and territories
[Chart: Google Analytics, October 2015]
Statistics
8,142,755 multilingual bibliographic records
• ~400,000 from Latin America
• ~150,000 from Africa
• ~760,000 from Asia, plus 400,000 links to CASDD (China)
253,286,038 triples
LOD infrastructure
AGRIS moved to the RDF world in December 2013
Generation of mashup pages
• Users looking for a specific topic can access a publication from the AGRIS database, combined with related resources extracted from other preselected datasets
• External resources are not only bibliographic metadata, but also distribution maps, statistics, germplasm accessions, and so on
The RDF-ization process
Translation of the AGRIS AP XML database to RDF
• Selection of existing vocabularies
• Data cleaning and normalization
• Index all records with the AGROVOC thesaurus
• Run the conversion and publish RDF data!
Selection of external datasets we want to interlink to AGRIS
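The conversion step can be pictured with a minimal Python sketch. It assumes a record has already been parsed from the AGRIS AP XML into a dictionary; the record URI, the concept URI, and the field names are hypothetical placeholders, and the use of Dublin Core terms (title, subject) is an illustration consistent with the dcterms:subject links in this deck, not the actual AGRIS mapping.

```python
DCT = "http://purl.org/dc/terms/"

def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def record_to_ntriples(record):
    """Serialize one bibliographic record as a list of N-Triples lines."""
    subject = f"<{record['uri']}>"
    lines = []
    # Title as a language-tagged literal.
    title = escape_literal(record["title"])
    lines.append(f'{subject} <{DCT}title> "{title}"@{record["lang"]} .')
    # One subject triple per AGROVOC concept used to index the record.
    for concept_uri in record["agrovoc_concepts"]:
        lines.append(f"{subject} <{DCT}subject> <{concept_uri}> .")
    return lines

# Hypothetical record: the URIs are placeholders, not real identifiers.
example = {
    "uri": "http://example.org/agris/record/1",
    "title": 'Rice yield under "dry" conditions',
    "lang": "en",
    "agrovoc_concepts": ["http://example.org/agrovoc/c_rice"],
}

for line in record_to_ntriples(example):
    print(line)
```

Emitting one line per triple keeps the output easy to load into a triplestore in bulk, which is one plausible reason to pick N-Triples for a sketch like this.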
AGROVOC
The FAO multilingual vocabulary containing around 32,000 concepts in up to 21 languages
Backbone: the magic that allows the interlinking to external datasets
Two ways to implement the interlinking:
• Using AGROVOC formal alignments to other thesauri
• Querying external web services with scientific names
From AGRIS to DBPedia
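[Diagram: linking path from an AGRIS record to DBPedia]
AGRIS URI → dcterms:subject → AGROVOC URI → skos:closeMatch / skos:exactMatch → DBPedia URI
DBPedia URI → dbpedia-owl:abstract → DBPedia abstract
DBPedia URI → foaf:isPrimaryTopicOf → Wikipedia URL
DBPedia URI → foaf:depiction → DBPedia picture
Callouts: "Entry point!"; "AGROVOC is the backbone"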
SPARQL in action!
1. From an AGRIS URI, get the list of AGROVOC URIs (dcterms:subject)
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?agr
WHERE {
<AGRIS_Uri> dct:subject ?agr .
}
2. For each AGROVOC URI
2.1. Get skos:closeMatch and skos:exactMatch (formal alignments to other thesauri)
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?em ?cm
WHERE {
OPTIONAL { <AGROVOC_Uri> skos:exactMatch ?em } .
OPTIONAL { <AGROVOC_Uri> skos:closeMatch ?cm } .
}
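SPARQL endpoints commonly return SELECT results in the W3C SPARQL 1.1 Query Results JSON format. A minimal Python sketch of reading the alignments from such a response follows; the sample response and the thesaurus URI in it are made up for illustration, and the slides do not show how AGRIS itself processes the results.

```python
import json

def collect_alignments(results_json):
    """Return the distinct URIs bound to ?em and ?cm in a SPARQL JSON result."""
    data = json.loads(results_json)
    exact, close = [], []
    for binding in data["results"]["bindings"]:
        # OPTIONAL patterns may leave a variable unbound, so check before reading.
        for var, bucket in (("em", exact), ("cm", close)):
            if var in binding and binding[var]["type"] == "uri":
                uri = binding[var]["value"]
                if uri not in bucket:
                    bucket.append(uri)
    return exact, close

# Made-up response in the SPARQL 1.1 Query Results JSON format.
sample = json.dumps({
    "head": {"vars": ["em", "cm"]},
    "results": {"bindings": [
        {"em": {"type": "uri", "value": "http://example.org/thesaurusA/rice"},
         "cm": {"type": "uri", "value": "http://dbpedia.org/resource/Rice"}},
        {"em": {"type": "uri", "value": "http://example.org/thesaurusA/rice"}},
    ]},
})

exact_matches, close_matches = collect_alignments(sample)
print(exact_matches)
print(close_matches)
```

Deduplicating is worthwhile here because two independent OPTIONAL patterns can repeat the same URI across several result rows.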
Get DBPedia
2.2. The Java code keeps only the DBPedia URIs, to avoid adding another FILTER to the SPARQL query (which would make the query heavier)
2.3. For each DBPedia URI, query the DBPedia SPARQL endpoint to get the information to display in an AGRIS widget
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?abs ?img ?wiki
WHERE {
OPTIONAL { <DBP_Uri> dbpedia-owl:abstract ?abs } .
OPTIONAL { <DBP_Uri> foaf:depiction ?img } .
OPTIONAL { <DBP_Uri> foaf:isPrimaryTopicOf ?wiki } .
FILTER ( (lang(?abs) = "en") || (!bound(?abs)) )
}
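Steps 2.2 and 2.3 can be sketched in a few lines. The slides say the filtering is done in Java; the Python below is only an illustration, and the namespace prefix check, the function names, and the URI validation are assumptions rather than the AGRIS implementation.

```python
# Assumed namespace of DBPedia resource URIs.
DBPEDIA_NS = "http://dbpedia.org/resource/"

# The query from step 2.3, with the prefixes declared explicitly.
QUERY_TEMPLATE = """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?abs ?img ?wiki
WHERE {{
  OPTIONAL {{ <{uri}> dbpedia-owl:abstract ?abs }} .
  OPTIONAL {{ <{uri}> foaf:depiction ?img }} .
  OPTIONAL {{ <{uri}> foaf:isPrimaryTopicOf ?wiki }} .
  FILTER ( (lang(?abs) = "en") || (!bound(?abs)) )
}}"""

def keep_dbpedia(uris):
    """Step 2.2: keep only DBPedia URIs on the client side, without a SPARQL FILTER."""
    return [u for u in uris if u.startswith(DBPEDIA_NS)]

def build_query(uri):
    """Step 2.3: fill the query template for one DBPedia URI."""
    # Reject characters that the SPARQL grammar does not allow inside <...>.
    if any(ch in '<>"{}|^`\\' or ch <= " " for ch in uri):
        raise ValueError(f"unsafe URI: {uri!r}")
    return QUERY_TEMPLATE.format(uri=uri)

# Example alignment targets; the second URI is a made-up non-DBPedia match.
matches = ["http://dbpedia.org/resource/Rice", "http://example.org/other/rice"]
for uri in keep_dbpedia(matches):
    print(build_query(uri))
```

Checking the URI before substituting it into the template guards against malformed input breaking out of the angle brackets.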
Bibliography
• «Migrating bibliographic datasets to the Semantic Web: The AGRIS case». Stefano Anibaldi, Yves Jaques, Fabrizio Celli, Armando Stellato, Johannes Keizer. Semantic Web journal
• «OpenAGRIS: using bibliographical data for linking into the agricultural knowledge web». Fabrizio Celli, Stefano Anibaldi, Maria Folch, Yves Jaques, Johannes Keizer. AOS 2011
The context
Scientists and researchers publish their results not only in journals or at conferences, but also via web 2.0 tools and other media
These channels hold corpora of ongoing research activities, unpublished material, grey literature, quick discussions, ideas, and experiments with negative results
This information is usually unstructured and not exposed through web services
Goal
Crawl the web (manually preselected websites)
Use machine learning algorithms to index the discovered web resources with AGROVOC
Select relevant resources using a recommender system
Interlink to AGRIS!
Crawling and indexing
https://github.com/fcproj/agrotagger
Recommender system
• A Java component that computes meaningful intersections between the Crawler Database and the AGRIS database
• Offline process; recommendations are stored in a triplestore
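One way to read "meaningful intersections" is as overlap between the AGROVOC concepts assigned to a crawled web resource and those assigned to an AGRIS record. The Python sketch below scores that overlap with the Jaccard index; the scoring function, the threshold, and all identifiers are assumptions for illustration and are not taken from the recommender's actual implementation.

```python
def jaccard(a, b):
    """Jaccard index of two sets: shared items over all items (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(crawled, records, threshold=0.5):
    """Return (web_resource, record, score) tuples with score >= threshold, best first."""
    links = []
    for web_id, web_concepts in crawled.items():
        for rec_id, rec_concepts in records.items():
            score = jaccard(web_concepts, rec_concepts)
            if score >= threshold:
                links.append((web_id, rec_id, score))
    return sorted(links, key=lambda link: link[2], reverse=True)

# Hypothetical identifiers and concept labels.
crawled = {"web:page1": {"rice", "irrigation", "drought"}}
records = {
    "agris:rec1": {"rice", "irrigation"},
    "agris:rec2": {"maize", "fertilizers"},
}

links = recommend(crawled, records)
print(links)
```

Because the comparison of every crawled resource with every record is quadratic, running it offline and storing only the resulting links fits the process described on this slide.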
Interlinking
https://github.com/fcproj/recommender
Bibliography
• «Discovering, Indexing and Interlinking Information Resources». Fabrizio Celli, Johannes Keizer, Yves Jaques, Stasinos Konstantopoulos, Dušan Vudragović. F1000 Research (version 2 under revision)
Editor's notes
Chinese Agricultural Sci-tech Documents Database (CASDD)
CGRIS germplasm database
The World Bank
Nature OpenSearch
Europeana
FAO Geopolitical Ontology
Global Biodiversity Information Facility
Bioversity International
FAO Fisheries and Aquaculture
DBPedia