Distributed Graph Databases and the
       Emerging Web of Data



             Marko A. Rodriguez
      T-5, Center for Nonlinear Studies
      Los Alamos National Laboratory
         http://markorodriguez.com

               April 16, 2009
Abstract

The World Wide Web is the de facto medium for publicly exposing a corpus
of interrelated documents. In its current form, the World Wide Web is the
Web of Documents. The next generation of the World Wide Web will
support the Web of Data. The Web of Data utilizes the same Uniform
Resource Identifier (URI) address space as the Web of Documents, but
instead of exposing a graph of documents, the Web of Data exposes a
graph of data. Given that the URI address space of the Web is distributed
and infinite, the Web of Data provides a single unified space by which the
world's data can be publicly exposed and interrelated. The Web of Data is
supported by both graph databases (which structure the data) and
distributed computing mechanisms (which process the data). This
presentation will discuss the Web of Data, graph databases, and models of
computing in this emerging space.


                       Computer Science Department Colloquium – University of New Mexico – April 16, 2009
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data

• Local Computing vs. Distributed Computing

• Multi-Relational Network Analysis with Grammar Walkers




The Relational Database vs. the Graph Database

• A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model
  is a collection of interlinked tables.

• A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model
  is a multi-relational graph.


[Figure: a relational database (interlinked tables) hosted at 127.0.0.1 beside a graph database hosted at 127.0.0.2, drawn as a graph labeled a, b, c, d]
Types of Graphs
• Undirected single-relational graph: homogeneous set of symmetric links.

• Directed single-relational graph: homogeneous set of links.

• Directed multi-relational graph: heterogeneous set of links.
[Figure: an undirected single-relational graph over x and z; a directed single-relational graph over x and z; a directed multi-relational graph over x, y, and z]
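The three graph types can be sketched as plain Python sets. This is only an illustration: the edge labels "a" and "b" and the helper out_neighbors are assumptions made for the example, not names from the slides.

```python
# Undirected single-relational: symmetric links, stored as unordered pairs.
undirected = {frozenset({"x", "z"})}

# Directed single-relational: ordered (tail, head) pairs, one kind of link.
directed = {("x", "z")}

# Directed multi-relational: (tail, label, head) triples, many kinds of link.
# The labels "a" and "b" are hypothetical.
multi_relational = {("x", "a", "y"), ("y", "b", "z")}


def out_neighbors(graph, vertex, label=None):
    """Heads of edges leaving `vertex`, optionally restricted to one label."""
    return {
        head
        for (tail, edge_label, head) in graph
        if tail == vertex and (label is None or edge_label == label)
    }
```

The only structural change between the last two types is the extra label position, which is what lets one graph carry heterogeneous links.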




Our Make Believe World - Phase 1

• Marko is a human and Fluffy is a dog.




Our World Modeled in a Relational Database - Phase 1

Object_Table

  ID     Name     Type     Legs    Fur
  0001   Marko    Human    2       false
  0002   Fluffy   Dog      4       true
Our World Modeled in a Graph Database - Phase 1

  0001 --type--> Human          0002 --type--> Dog
  0001 --name--> Marko          0002 --name--> Fluffy
  0001 --legs--> 2              0002 --legs--> 4
  0001 --fur--> false           0002 --fur--> true
Our Make Believe World - Phase 2

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.




Our World Modeled in a Relational Database - Phase 2

Object_Table

  ID     Name     Type     Legs    Fur
  0001   Marko    Human    2       false
  0002   Fluffy   Dog      4       true

Friendship_Table

  ID1    ID2
  0001   0002
  0002   0001
Our World Modeled in a Graph Database - Phase 2

  0001 --type--> Human          0002 --type--> Dog
  0001 --name--> Marko          0002 --name--> Fluffy
  0001 --legs--> 2              0002 --legs--> 4
  0001 --fur--> false           0002 --fur--> true
  0001 --friend--> 0002         0002 --friend--> 0001
Our Make Believe World - Phase 3

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are subclasses of mammal.
Our World Modeled in a Relational Database - Phase 3

Object_Table

  ID     Name     Type     Legs    Fur
  0001   Marko    Human    2       false
  0002   Fluffy   Dog      4       true

Friendship_Table

  ID1    ID2
  0001   0002
  0002   0001

Subclass_Table

  Type1   Type2
  Human   Mammal
  Dog     Mammal
Our World Modeled in a Graph Database - Phase 3
  Human --subclassof--> Mammal  Dog --subclassof--> Mammal
  0001 --type--> Human          0002 --type--> Dog
  0001 --name--> Marko          0002 --name--> Fluffy
  0001 --legs--> 2              0002 --legs--> 4
  0001 --fur--> false           0002 --fur--> true
  0001 --friend--> 0002         0002 --friend--> 0001
Our Make Believe World - Phase 4
• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are subclasses of mammal.

• Fluffy peed on the carpet.
Our World Modeled in a Relational Database - Phase 4

Object_Table

  ID     Name     Type     Legs    Fur
  0001   Marko    Human    2       false
  0002   Fluffy   Dog      4       true
  0003   My_Rug   Carpet   N/A     N/A

Friendship_Table

  ID1    ID2
  0001   0002
  0002   0001

Subclass_Table

  Type1   Type2
  Human   Mammal
  Dog     Mammal

Pee_Table

  ID1    ID2
  0002   0003
Our World Modeled in a Graph Database - Phase 4

  Human --subclassof--> Mammal  Dog --subclassof--> Mammal
  0001 --type--> Human          0002 --type--> Dog
  0001 --name--> Marko          0002 --name--> Fluffy
  0001 --legs--> 2              0002 --legs--> 4
  0001 --fur--> false           0002 --fur--> true
  0001 --friend--> 0002         0002 --friend--> 0001
  0003 --type--> Carpet         0003 --name--> My_Rug
  0002 --peedOn--> 0003
Our Make Believe World - Phase 5

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are subclasses of mammal.

• Fluffy peed on the carpet.

• Marko and Fluffy are both mammals.
Our World Modeled in a Relational Database - Phase 5
Object_Table

  ID     Name     Type     Legs    Fur
  0001   Marko    Human    2       false
  0002   Fluffy   Dog      4       true
  0003   My_Rug   Carpet   N/A     N/A

Friendship_Table

  ID1    ID2
  0001   0002
  0002   0001

Subclass_Table

  Type1   Type2
  Human   Mammal
  Dog     Mammal

Pee_Table

  ID1    ID2
  0002   0003

Type_Table

  ID     Type
  0001   Human
  0002   Dog
  0003   Carpet
  0001   Mammal
  0002   Mammal
Our World Modeled in a Graph Database - Phase 5

  Human --subclassof--> Mammal  Dog --subclassof--> Mammal
  0001 --type--> Human          0002 --type--> Dog
  0001 --type--> Mammal         0002 --type--> Mammal
  0001 --name--> Marko          0002 --name--> Fluffy
  0001 --legs--> 2              0002 --legs--> 4
  0001 --fur--> false           0002 --fur--> true
  0001 --friend--> 0002         0002 --friend--> 0001
  0003 --type--> Carpet         0003 --name--> My_Rug
  0002 --peedOn--> 0003
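One way to read the step from Phase 4 to Phase 5 is as a derivation: the two new type edges follow from the existing type and subclassof edges. The slides do not say how the edges were produced, so the sketch below is an assumption, and infer_types is a hypothetical helper.

```python
# A subset of the Phase 4 graph as (subject, predicate, object) edges.
edges = {
    ("Human", "subclassof", "Mammal"),
    ("Dog", "subclassof", "Mammal"),
    ("0001", "type", "Human"),
    ("0002", "type", "Dog"),
    ("0003", "type", "Carpet"),
}


def infer_types(edges):
    """Add (x, type, C2) whenever (x, type, C1) and (C1, subclassof, C2) hold.

    Repeats until no new edge appears, so chains of subclasses are followed.
    """
    inferred = set(edges)
    while True:
        new = {
            (x, "type", c2)
            for (x, p, c1) in inferred if p == "type"
            for (c, q, c2) in inferred if q == "subclassof" and c == c1
        } - inferred
        if not new:
            return inferred
        inferred |= new


phase5 = infer_types(edges)
```

In the graph model the derived facts are just two more edges in the same set, whereas the relational model needed a new Type_Table to hold them.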
The Graph as the Natural World Model

• The world is inherently (or perceived as) object-oriented.

• The world is filled with objects and relations among them.

• The multi-relational graph is a very natural representation of the world.




The Graph as the Natural Programming Model

• Many high-level programming languages are object-oriented.

• There is nearly no impedance mismatch between the multi-relational graph
  and the programming object.

• It is easy to go from graph database to in-memory object.

          // Human and Dog are assumed application classes.
          Dog fluffy = new Dog();
          Human marko = new Human();
          marko.setName("Marko");
          marko.addFriend(fluffy);
          marko.setHasFur(false);
          marko.setLegs(2);
SQL vs. SPARQL

SELECT OTY.Name
  FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
  WHERE OTX.Name = 'Marko' AND
        Friendship_Table.ID1 = OTY.ID AND
        Friendship_Table.ID2 = OTX.ID;


SELECT ?z WHERE {
  ?x name "Marko" .
  ?y friend ?x .
  ?y name ?z }

E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF, WWW Consortium,
http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/, 2004.
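The two queries can be compared side by side on the make-believe world. The sketch below is an illustration under assumptions: it runs the SQL against an in-memory SQLite database and imitates the three SPARQL triple patterns with a nested comprehension over a set of triples rather than with a real SPARQL engine.

```python
import sqlite3

# Relational side: the SQL friend-of-Marko query on an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Object_Table (ID TEXT, Name TEXT, Type TEXT);
    CREATE TABLE Friendship_Table (ID1 TEXT, ID2 TEXT);
    INSERT INTO Object_Table VALUES ('0001', 'Marko', 'Human');
    INSERT INTO Object_Table VALUES ('0002', 'Fluffy', 'Dog');
    INSERT INTO Friendship_Table VALUES ('0001', '0002');
    INSERT INTO Friendship_Table VALUES ('0002', '0001');
""")
sql_names = [row[0] for row in conn.execute("""
    SELECT OTY.Name
    FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
    WHERE OTX.Name = 'Marko'
      AND Friendship_Table.ID1 = OTY.ID
      AND Friendship_Table.ID2 = OTX.ID
""")]

# Graph side: the same facts as triples, matched pattern by pattern.
triples = {
    ("0001", "name", "Marko"), ("0002", "name", "Fluffy"),
    ("0001", "friend", "0002"), ("0002", "friend", "0001"),
}
graph_names = [
    z
    for (x, p1, n) in triples if p1 == "name" and n == "Marko"   # ?x name "Marko"
    for (y, p2, x2) in triples if p2 == "friend" and x2 == x     # ?y friend ?x
    for (y2, p3, z) in triples if p3 == "name" and y2 == y       # ?y name ?z
]
```

Both sides bind the same answer; the graph version needs no join table because the friendship is itself an edge.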
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data

• Local Computing vs. Distributed Computing

• Multi-Relational Network Analysis with Grammar Walkers




Internet Address Spaces

• The Uniform Resource Identifier (URI) is the superclass of the Uniform
  Resource Locator (URL) and Uniform Resource Name (URN).




The Uniform Resource Locator
• The set of all URLs is the address space of all resources that can be
  located and retrieved on the Web. URLs denote where a resource is.
    http://markorodriguez.com/index.html
    ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6
    ∗ http:// means GET at port 80,
    ∗ /index.html means the resource to get at that Internet location.

[Figure: the web server markorodriguez.com (216.251.43.6) hosting index.html]
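The decomposition of a URL into "where" parts can be sketched with Python's standard urllib.parse module. The helper locate and the DEFAULT_PORTS table are assumptions made for this example; the DNS lookup itself is not performed.

```python
from urllib.parse import urlsplit

# Port assumed when the URL names none (http only, for this sketch).
DEFAULT_PORTS = {"http": 80}


def locate(url):
    """Split a URL into the host to contact, the port, and the resource path."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, parts.path


host, port, path = locate("http://markorodriguez.com/index.html")

# The request line a client would send once connected to host:port.
request_line = f"GET {path} HTTP/1.1"
```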
The Uniform Resource Name

• The set of all URNs is the address space of all resources within the urn:
  namespace.
    urn:uuid:bd93def0-8026-11dd-842be54955baa12
    urn:issn:0892-3310
    urn:doi:10.1016/j.knosys.2008.03.030

• Named resources need not be retrievable through the Web.

• URNs denote what a resource is.
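A URN splits into a namespace identifier and a namespace-specific string. The sketch below assumes the simple urn:<namespace>:<name> shape of the examples above; parse_urn is a hypothetical helper, not a full validator.

```python
def parse_urn(urn):
    """Split 'urn:<namespace>:<name>' into (namespace, name)."""
    scheme, namespace, name = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return namespace, name


issn = parse_urn("urn:issn:0892-3310")
doi = parse_urn("urn:doi:10.1016/j.knosys.2008.03.030")
```

Nothing in the result says where the resource lives, which is the point: the name identifies the resource without locating it.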




The Uniform Resource Identifier
• The URI address space is an infinite space for all Internet resources.
        urn:issn:0892-3310
        ftp://markorodriguez.com/private/markos_secrets.txt
        http://www.lanl.gov#fluffy

• Important: URIs can denote concepts, instances, and data.

[Figure: resources identified by lanl:fluffy and lanl:fluffy_legs]

lanl is a namespace prefix that expands to http://www.lanl.gov#.
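Prefix expansion is a plain string substitution. A minimal sketch, where PREFIXES and expand are names assumed for the example:

```python
# Namespace prefix table; only the lanl prefix is given on the slide.
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI using the prefix table."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


fluffy_uri = expand("lanl:fluffy")
legs_uri = expand("lanl:fluffy_legs")
```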
The Web of Documents
• The Web of Documents is primarily concerned with the Hypertext
  Transfer Protocol (HTTP) and with retrievable resources in the URL
  address space.

• These retrievable resources are files: HTML documents, images, audio,
  etc. The “web” is created when HTML documents contain URLs.
[Figure: under http://markorodriguez.com/, the documents index.html, Home.html, Resume.html, and Research.html are connected by href links]
The Web of Data

• The Web of Data is primarily concerned with URIs.

• The Resource Description Framework (RDF) is the standard for
  representing the relationships between URIs and literals (e.g. float,
  string, date time, etc.).

  subject        predicate     object
  lanl:marko     foaf:knows    lanl:fluffy
  lanl:marko     foaf:name     "Marko A. Rodriguez"^^xsd:string
  lanl:fluffy    foaf:name     "Fluffy P. Everywhere"^^xsd:string




C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee. Linked Data on the Web, International World Wide Web Conference, 2008.
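The subject, predicate, object statements of this slide can be written out as N-Triples lines with a few string helpers. This is a sketch under assumptions: uri and literal are hypothetical helpers, the foaf and xsd namespaces are their standard URIs, and literal values are not escaped.

```python
PREFIXES = {
    "lanl": "http://www.lanl.gov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def uri(prefixed_name):
    """Expand 'prefix:local' and wrap it in angle brackets."""
    prefix, local = prefixed_name.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(value, datatype):
    """Typed literal; assumes `value` needs no escaping."""
    return f'"{value}"^^{uri(datatype)}'


triples = [
    (uri("lanl:marko"), uri("foaf:knows"), uri("lanl:fluffy")),
    (uri("lanl:marko"), uri("foaf:name"), literal("Marko A. Rodriguez", "xsd:string")),
    (uri("lanl:fluffy"), uri("foaf:name"), literal("Fluffy P. Everywhere", "xsd:string")),
]

# One statement per line, each terminated by " ."
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```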



Our Make Believe World in RDF
  lanl:Human    rdfs:subClassOf   lanl:Mammal
  lanl:Dog      rdfs:subClassOf   lanl:Mammal

  lanl:marko    rdf:type          lanl:Human
  lanl:marko    rdf:type          lanl:Mammal
  lanl:marko    lanl:friend       lanl:fluffy
  lanl:marko    lanl:fur          "false"^^xsd:boolean
  lanl:marko    lanl:legs         "2"^^xsd:integer
  lanl:marko    foaf:name         "Marko A. Rodriguez"^^xsd:string

  lanl:fluffy   rdf:type          lanl:Dog
  lanl:fluffy   rdf:type          lanl:Mammal
  lanl:fluffy   lanl:friend       lanl:marko
  lanl:fluffy   lanl:fur          "true"^^xsd:boolean
  lanl:fluffy   lanl:legs         "4"^^xsd:integer
  lanl:fluffy   foaf:name         "Fluffy P. Everywhere"^^xsd:string
The Web of Data is a Distributed Database

• The URI address space is distributed.

• URIs can denote data.

• RDF denotes the relationships between URIs.

• The Web of Data’s foundational standard is RDF.

• Therefore, the Web of Data is a distributed database.




The Web of Documents vs. the Web of Data

[Figure: Web of Documents: web servers at 127.0.0.1 and 127.0.0.2 serve HTML documents linked by href. Web of Data: graph databases at 127.0.0.1 and 127.0.0.2 hold vertices linked by lanl:friend.]
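Because every vertex is a URI, triples held by different hosts can be combined by simple set union. The sketch below is an illustration only: which host stores which triple is an assumption, and types_of_friends is a hypothetical helper.

```python
# Triples assumed to be held by the graph database at 127.0.0.1.
store_a = {
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:marko", "lanl:friend", "lanl:fluffy"),
}

# Triples assumed to be held by the graph database at 127.0.0.2.
store_b = {
    ("lanl:fluffy", "rdf:type", "lanl:Dog"),
}

# Shared URIs make the union a single connected graph.
web_of_data = store_a | store_b


def types_of_friends(graph, person):
    """Types of everything `person` is linked to by lanl:friend."""
    friends = {o for (s, p, o) in graph if s == person and p == "lanl:friend"}
    return {o for (s, p, o) in graph if s in friends and p == "rdf:type"}
```

The lanl:friend edge crosses the host boundary, so the question only has an answer once both stores are seen as one graph.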
The Current Web of Data - March 2009
[Figure: the Linked Data cloud as of March 2009, drawn as a graph of interlinked data sets]

M.A. Rodriguez. A Graph Analysis of the Linked Data Cloud, in review, http://arxiv.org/abs/0903.0194, 2009.
The Current Web of Data - March 2009
data set           domain           data set             domain         data set               domain
audioscrobbler     music            govtrack             government     pubguide               books
bbclatertotp       music            homologene           biology        qdos                   social
bbcplaycountdata   music            ibm                  computer       rae2001                computer
bbcprogrammes      media            ieee                 computer       rdfbookmashup          books
budapestbme        computer         interpro             biology        rdfohloh               social
chebi              biology          jamendo              music          resex                  computer
crunchbase         business         laascnrs             computer       riese                  government
dailymed           medical          libris               books          semanticweborg         computer
dblpberlin         computer         lingvoj              reference      semwebcentral          social
dblphannover       computer         linkedct             medical        siocsites              social
dblprkbexplorer    computer         linkedmdb            movie          surgeradio             music
dbpedia            general          magnatune            music          swconferencecorpus     computer
doapspace          social           musicbrainz          music          taxonomy               reference
drugbank           medical          myspacewrapper       social         umbel                  general
eurecom            computer         opencalais           reference      uniref                 biology
eurostat           government       opencyc              general        unists                 biology
flickrexporter     images           openguides           reference      uscensusdata           government
flickrwrappr       images           pdb                  biology        virtuososponger        reference
foafprofiles       social           pfam                 biology        w3cwordnet             reference
freebase           general          pisa                 computer       wikicompany            business
geneid             biology          prodom               biology        worldfactbook          government
geneontology       biology          projectgutenberg     books          yago                   general
geonames           geographic       prosite              biology        ...



Cultural Differences that are Leading to Web-Based
              Data Management - Part 1

• Relational databases tend not to maintain public access points.

• Relational database users tend not to publish their schemas.




• Web of Data graph databases maintain public access points called
  SPARQL end-points or Linked Data URLs.

• Web of Data graph database users tend to reuse and extend public
  schemas called ontologies.


Cultural Differences that are Leading to Web-Based Data Management - Part 2

[Figure: Conventional Model vs. Web of Data Model. In the conventional model, Applications 1, 2, and 3 (at 127.0.0.1, 127.0.0.2, and 127.0.0.3) each process and structure their own data on the same host. In the Web of Data model, Applications 1, 2, and 3 (at 127.0.0.1, 127.0.0.2, and 127.0.0.3) process data, while hosts 127.0.0.4, 127.0.0.5, and 127.0.0.6 structure it; the Web of Data sits between the processing and the structuring hosts.]
SPARQLing a Data Provider - Local Computing

[Figure: the client at 127.0.0.1 sends the query

    SELECT ?x WHERE {
      lanl:marko lanl:friend ?x
    }

to the SPARQL end-point of the graph database at 127.0.0.2, which returns { lanl:fluffy }.]


• The 127.0.0.1 client is querying the 127.0.0.2 server.

• The query is any read-based SPARQL query.

• The results are those resources that bind to the query variables.
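
How a single triple pattern binds its variable can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `triples` set and the `match` helper are hypothetical, and a real SPARQL end-point does far more (joins, filters, protocol handling).

```python
# A toy stand-in for a graph database: a set of (subject, predicate, object) triples.
# The triples mirror the lanl: examples in this deck; this is NOT a SPARQL engine.
triples = {
    ("lanl:marko", "lanl:friend", "lanl:fluffy"),
    ("lanl:marko", "lanl:wrote", "vub:1010"),
}


def match(pattern, data):
    """Return the variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in sorted(data):
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # the same variable would need two different values
                binding[term] = value
            elif term != value:
                break  # a constant term does not match
        else:
            results.append(binding)
    return results


# SELECT ?x WHERE { lanl:marko lanl:friend ?x }
bindings = match(("lanl:marko", "lanl:friend", "?x"), triples)
print([b["?x"] for b in bindings])  # ['lanl:fluffy']
```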



GETing Linked Data as RDF - Local Computing
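[Figure: the client at 127.0.0.1 issues an HTTP GET on http://www.lanl.gov#marko and receives the RDF subgraph around lanl:marko (lanl:marko lanl:friend lanl:fluffy; lanl:marko lanl:wrote vub:1010). A second HTTP GET on http://www.vub.edu#1010 returns the subgraph around vub:1010, including its lanl:cites link with ieee:2020. The retrieved subgraphs are pieces of the larger Web of Data graph.]
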
Problem with the Current Web of Data Infrastructure

• The only interfaces are SPARQL end-points and HTTP GETs of RDF
  subgraphs.

• For human-based document retrieval, this is fine. For machine-based
  data processing, this does not scale.

M.A. Rodriguez. A Distributed Process Infrastructure for a Distributed Data Structure. Semantic Web and Information Systems Bulletin, AIS Special Interest Group on Semantic Web and Information Systems, http://arxiv.org/abs/0807.3908, 2008.




Problem with the Current Web of Data Infrastructure

• We cannot rely on the “download and index” philosophy of the World
  Wide Web.
    As of March 2009, the Web of Data maintains 4.5 billion triples.

• The Web of Data cannot rely on a single service provider.
    Too much data.
    Too many types of algorithms that can utilize this data.
    Too many clock cycles to locally process this data.




The Open Virtual Machine Farm

[Figure: two graph databases, hosted at 127.0.0.1 and 127.0.0.2, are connected by a lanl:friend edge. Each host also runs a Virtual Machine Farm, and code/machine migrates between the two farms.]


• Distributed computing through code/machine migration between farms.

• Move the process to the data, not the data to the process.

M.A. Rodriguez. General Purpose Computing on a Semantic Network Substrate. In Emergent Web Intelligence, eds. R. Chbeir, A. Hassanien, A. Abraham and Y. Badr, Springer-Verlag, http://arxiv.org/abs/0704.3395, 2009.

M.A. Rodriguez. The RDF Virtual Machine, in review, LA-UR-08-03925, 2009.



Neno RDF Programming Language - Code Serialization

The Neno method

    xsd:int example(xsd:string a)
    {
      if(a == "marko")
        return 1;
      else
        return 2;
    }

[Figure: the method serialized as an RDF graph in which every node is identified by a urn:uuid. A resource of rdf:type demo:Human has a Method (hasMethod) named "example"^^xsd:string (hasMethodName). The Method's Block (hasBlock) leads, via nextInst edges, to an Equals instruction whose hasLeft and hasRight operands are LocalDirect nodes with hasURI values "a"^^xsd:string and "marko"^^xsd:string, and then to a Branch. The Branch's trueInst Block leads to a PushValue of "1"^^xsd:int and its falseInst Block to a PushValue of "2"^^xsd:int; each PushValue is followed by a Return.]

The Fhat RDF Virtual Machine - Machine Serialization

[Figure: the Fhat machine ontology. Fhat is a type of RVM with two xsd:boolean properties, methodReuse and halt (each with cardinality [1]), and with the pointers programLocation (to an Instruction), operandTop (to the OperandStack), returnTop (to the ReturnStack), blockTop (to the BlockStack), and currentFrame and hasFrame (to Frames). The stacks are lists built from rdf:first and rdf:rest: the OperandStack holds rdfs:Resources, the ReturnStack holds Instructions, and the BlockStack holds Blocks. A Frame contains FrameVariables (rdf:li), each with a hasSymbol (xsd:string), a hasValue (rdfs:Resource), and a fromBlock (Block). A forFrame property pointing to a Frame is also shown.]

A Collection of Interlinked Graph Databases - Currently

[Figure: graph databases hosted at 127.0.0.2 through 127.0.0.11, with links between the data sets they host.]

A Collection of Interlinked Graph Databases and Processors - Future

[Figure: the same hosts, 127.0.0.2 through 127.0.0.11, each now shown with a processor alongside its graph database.]

The Future of Web-Based Distributed Computing

• The HTTP GET approach to the Web of Data does not scale.

• The Neno/Fhat (or any general-purpose computing) environment is
  unsafe.

• The Web of Data needs an open, safe, flexible, and easy-to-adopt
  computing infrastructure.




What Type of Processing?

• Object-oriented programming: Web of Data as an object repository.

• Logic: Web of Data as a knowledge-base.

• Graph/network analysis: Web of Data as a multi-relational graph.



• The future computing environment should support at least these popular
  processing models.

• We will focus on graph/network analysis for the remainder of this
  presentation.


Introduction to Random Walkers

• Random walkers can be used in single-relational networks to calculate:
    stationary probability distribution: primary eigenvector calculation
    spreading activation: search by means of diffusion

• There is a continuous and a discrete form of the general random walk
  method.




Random Walks in a Single-Relational Network

• Suppose a single-relational network G, where

                           G = (V, E ⊆ (V × V )).


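• Let’s represent that network as a row-stochastic adjacency matrix
  A ∈ [0, 1]^{|V|×|V|}, where

\[
A_{i,j} =
\begin{cases}
  \frac{1}{\Gamma(i)} & \text{if } (i, j) \in E \\
  0 & \text{otherwise,}
\end{cases}
\]

  and Γ(i) is the number of outgoing edges of vertex i, so that each row
  sums to 1.

• Finally, assume an “energy vector” π ∈ R^{|V|}.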
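
As a minimal sketch of this definition, the following Python builds a row-stochastic matrix from an edge list. The `row_stochastic` helper is a hypothetical name, and the edge list is an assumption chosen to match the four-vertex example used later in the deck.

```python
def row_stochastic(vertices, edges):
    """Build A with A[i][j] = 1 / outdegree(i) if (i, j) is an edge, else 0.

    Assumes `edges` contains no duplicates, since E is a set of ordered pairs.
    """
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for i, _ in edges:
        out_degree[i] += 1
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[index[i]][index[j]] = 1.0 / out_degree[i]
    return A


# Illustrative network: a->b, a->d, b->c, c->a, c->d, d->b.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "b")]
A = row_stochastic(vertices, edges)
for row in A:
    print(row)
```

Every vertex in this example has at least one outgoing edge; a vertex without any would produce an all-zero row, and the matrix would no longer be row-stochastic.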


Random Walks in a Single-Relational Network

[Figure: the example network G has vertices a, b, c, and d, with one edge for each non-zero entry of A.]

               a      b      c      d
         a     0     0.5     0     0.5
   A =   b     0      0      1      0
         c    0.5     0      0     0.5
         d     0      1      0      0

   π = [ 1  0  0  0 ]

• πA can be interpreted as the continuous form of propagating random
  walkers over G.
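
A single propagation step can be written out directly. This sketch uses the slide's A and π; the function name `step` is an assumption made for illustration.

```python
# Row-stochastic adjacency matrix from the slide (vertex order: a, b, c, d).
A = [
    [0.0, 0.5, 0.0, 0.5],  # a
    [0.0, 0.0, 1.0, 0.0],  # b
    [0.5, 0.0, 0.0, 0.5],  # c
    [0.0, 1.0, 0.0, 0.0],  # d
]
pi = [1.0, 0.0, 0.0, 0.0]  # all energy starts at vertex a


def step(pi, A):
    """Row vector times matrix: the energy at i is split among i's out-neighbors."""
    n = len(A)
    return [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]


print(step(pi, A))  # [0.0, 0.5, 0.0, 0.5]
```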


Stationary Probability Distribution in a Single-Relational Network

Repeatedly multiplying π by A, once per time step, gives the following sequence (values rounded to two decimals):

             a      b      c      d
   π1        1      0      0      0
   π2        0     0.5     0     0.5
   π3        0     0.5    0.5     0
   π4      0.25     0     0.5   0.25
   π5      0.25   0.38     0    0.38
   π6        0     0.5   0.38   0.13
   ...
   π∞      0.15   0.31   0.31   0.23

Stationary Probability Distribution in a Single-Relational Network

• If G is strongly connected and aperiodic, then there exists a π such that
  π = πA.

• This stationary π∞ is the primary eigenvector of A.

• PageRank computes the stationary π by forcing G (the Web citation
  graph) to be strongly connected and aperiodic.
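
The stationary vector of the four-vertex example can be approximated by repeated multiplication (power iteration). This is a minimal sketch: the tolerance, the iteration cap, and the function names are assumptions, and it is not PageRank, which additionally modifies the graph to guarantee convergence.

```python
A = [
    [0.0, 0.5, 0.0, 0.5],  # a
    [0.0, 0.0, 1.0, 0.0],  # b
    [0.5, 0.0, 0.0, 0.5],  # c
    [0.0, 1.0, 0.0, 0.0],  # d
]


def step(pi, A):
    """One propagation step: pi <- pi * A."""
    n = len(A)
    return [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]


def stationary(A, pi, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi * A until successive vectors differ by less than tol."""
    for _ in range(max_iter):
        nxt = step(pi, A)
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("no convergence; is G strongly connected and aperiodic?")


result = stationary(A, [1.0, 0.0, 0.0, 0.0])
print([round(x, 2) for x in result])  # [0.15, 0.31, 0.31, 0.23]
```

Solving π = πA by hand for this matrix gives (2/13, 4/13, 4/13, 3/13), which the iteration approaches.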




Spreading Activation in a Single-Relational Network

• Spreading activation can be thought of as a “local rank” algorithm, while
  calculating the stationary probability provides you a “global rank”.

• With spreading activation, you iterate for only a certain number of
  timesteps.

• Also, you record how much energy has flowed through each vertex.

• Let’s demonstrate using a single discrete walker...




Spreading Activation in a Single-Relational Network

• The walker moves from vertex to vertex, with each choice dependent on the
  probability distribution of A.

• At every step, if the walker is at vertex i, then πi = πi + 1.

[Figure: a single walker on G visits a, b, c, and then a again at time steps 1 through 4.]

             a    b    c    d
   π1        1    0    0    0
   π2        1    1    0    0
   π3        1    1    1    0
   π4        2    1    1    0
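
A single discrete walker of this kind can be simulated as follows. This is a sketch under stated assumptions: the `spread` function, the seed, and the step count are illustrative, and because the walk is random its visit counts will generally differ from the path shown on the slide.

```python
import random

A = [
    [0.0, 0.5, 0.0, 0.5],  # a
    [0.0, 0.0, 1.0, 0.0],  # b
    [0.5, 0.0, 0.0, 0.5],  # c
    [0.0, 1.0, 0.0, 0.0],  # d
]


def spread(A, start, steps, rng):
    """Walk for `steps` time steps, recording one unit of energy per visit."""
    n = len(A)
    pi = [0] * n
    i = start
    for _ in range(steps):
        pi[i] += 1  # the walker is at vertex i, so pi_i = pi_i + 1
        # Choose the next vertex according to row i of A.
        i = rng.choices(range(n), weights=A[i])[0]
    return pi


rng = random.Random(42)  # arbitrary seed, fixed only for repeatability
print(spread(A, start=0, steps=4, rng=rng))
```

Stopping after a fixed number of steps is what makes the result a "local rank" around the start vertex rather than a global one.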



Random Walks in a Multi-Relational Network

• Suppose a multi-relational network M , where

                            M = (V, E = {E0, E1, . . . , Ek ⊆ (V × V )})

• Represent M as a {0, 1}-adjacency tensor A ∈ {0, 1}^{|V|×|V|×|E|}, where

\[
A^{m}_{i,j} =
\begin{cases}
  1 & \text{if } (i, j) \in E_m : 1 \le m \le k \\
  0 & \text{otherwise.}
\end{cases}
\]

• Then assume an “energy vector” π ∈ R^{|V|}.
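
One way to hold such a tensor in code is as one |V|×|V| slice per edge label. The sketch below is an assumption-laden illustration: the `adjacency_tensor` helper is hypothetical, and the edge directions in the example are chosen for illustration only.

```python
def adjacency_tensor(vertices, labels, labeled_edges):
    """Return T where T[m][i][j] = 1 if (i, j) is an edge with label labels[m]."""
    v = {name: k for k, name in enumerate(vertices)}
    m = {name: k for k, name in enumerate(labels)}
    n = len(vertices)
    T = [[[0] * n for _ in range(n)] for _ in labels]
    for source, label, target in labeled_edges:
        T[m[label]][v[source]][v[target]] = 1
    return T


vertices = ["a", "b", "c", "d"]
labels = ["authored", "cites", "contains"]
# Illustrative edges; the direction of the "contains" edge is assumed.
labeled_edges = [
    ("a", "authored", "b"),
    ("b", "cites", "c"),
    ("d", "contains", "c"),
]
T = adjacency_tensor(vertices, labels, labeled_edges)
print(T[0])  # the "authored" slice
```

Each slice T[m] is an ordinary single-relational adjacency matrix, which is what lets single-relational operations be applied label by label.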

M.A. Rodriguez and J. Shinavier. Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms, in review, http://arxiv.org/abs/0806.2274, 2009.



Random Walks in a Multi-Relational Network


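[Figure: an example multi-relational network M over the vertices a, b, c, and d, with an authored edge from a to b, a cites edge between b and c, and a contains edge between c and d. Its adjacency tensor A has one 4×4 slice per edge label (authored, cites, contains); the slice shown has a single 1 at row a, column b. The energy vector is π = [ 1  0  0  0 ].]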



The Operations of the Multi-Relational Path Algebra

• A · B: ordinary matrix multiplication determines the number of (A, B)-paths
  between vertices.
• A⊤: matrix transpose inverts path directionality.
• A ◦ B: Hadamard, entry-wise multiplication applies a filter to selectively
  exclude paths.
• n(A): not generates the complement of a {0, 1}^{n×n} matrix.
• c(A): clip generates a {0, 1}^{n×n} matrix from an R_+^{n×n} matrix.
• v±(A): vertex generates a {0, 1}^{n×n} matrix from an R_+^{n×n} matrix, where
  only certain rows or columns contain non-zero values.
• λA: scalar multiplication weights the entries of a matrix.
• A + B: matrix addition merges paths.


The Traverse Operation
• An interesting aspect of the single-relational adjacency matrix A ∈ {0, 1}^{n×n} is that when it is raised
  to the kth power, the entry A^{(k)}_{i,j} is equal to the number of paths of length k that connect vertex i to
  vertex j.
• Given, by definition, that A^{(1)}_{i,j} (i.e. A_{i,j}) represents the number of paths that go from i to j of length
  1 (i.e. a single edge) and by the rules of ordinary matrix multiplication,

\[
A^{(k)}_{i,j} = \sum_{l \in V} A^{(k-1)}_{i,l} \cdot A_{l,j} : k \ge 2.
\]

[Figure: a three-vertex chain with edges from a to b and from b to c.]

          a  b  c           a  b  c           a  b  c
      a   0  1  0       a   0  1  0       a   0  0  1
      b   0  0  1   ·   b   0  0  1   =   b   0  0  0
      c   0  0  0       c   0  0  0       c   0  0  0

There is a path of length 2 from a to c.




The Traverse Operation

A^1: authored, A^2: cites, A^3: contains

\[
Z = A^1 \cdot A^2 \cdot A^{1\top}
\]

Z_{i,j} defines the number of paths from vertex i to vertex j such that a path goes from author i to one of the
articles he or she has authored, from that article to one of the articles it cites, and finally, from that cited
article to its author j. Semantically, Z is an author-citation single-relational path matrix.

[Figure: lanl:marko is connected by lanl:authored (A^1) to vub:1010, which is connected by lanl:cites (A^2) to ieee:2020, which was authored (lanl:authored, A^1) by vub:fheyligh. The resulting Z provides a derived lanl:author-citation edge from lanl:marko to vub:fheyligh.]

* NOTE: All diagrams are with respect to a “source” vertex (the blue vertex) in order to preserve clarity. In reality, the operations operate on all vertices in parallel.
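
The author-citation traversal Z = A^1 · A^2 · A^{1⊤} can be checked on the four resources from the diagram. This is a sketch with plain Python lists; the vertex ordering and the helper names `matmul` and `transpose` are assumptions made for illustration.

```python
def matmul(A, B):
    """Ordinary matrix multiplication on lists of lists."""
    return [
        [sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    """Matrix transpose: inverts the direction of every edge."""
    return [list(col) for col in zip(*A)]


V = ["lanl:marko", "vub:fheyligh", "vub:1010", "ieee:2020"]
n = len(V)

authored = [[0] * n for _ in range(n)]
authored[0][2] = 1  # lanl:marko authored vub:1010
authored[1][3] = 1  # vub:fheyligh authored ieee:2020

cites = [[0] * n for _ in range(n)]
cites[2][3] = 1  # vub:1010 cites ieee:2020

# author -> article -> cited article -> author of the cited article
Z = matmul(matmul(authored, cites), transpose(authored))
print(Z[0][1])  # 1: one author-citation path from lanl:marko to vub:fheyligh
```

The transpose on the last factor is what turns "authored" into "was authored by", so the path can end at an author rather than at an article.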



The Filter Operation
Various path filters can be defined and applied using the entry-wise
Hadamard matrix product denoted ◦, where

\[
A \circ B =
\begin{bmatrix}
  A_{1,1} \cdot B_{1,1} & \cdots & A_{1,m} \cdot B_{1,m} \\
  \vdots & \ddots & \vdots \\
  A_{n,1} \cdot B_{n,1} & \cdots & A_{n,m} \cdot B_{n,m}
\end{bmatrix}
\]

   24   1    0    0   0        0  1  0  0  0         0   1  0  0  0
    0  72    0    4   0        0  1  0  0  0         0  72  0  0  0
   23   0    0    0   0   ◦    1  0  0  0  0    =   23   0  0  0  0
    0   0  15.3   0   0        0  0  0  0  0         0   0  0  0  0
    0   0    0    0  12        0  0  0  0  0         0   0  0  0  0

       Path Matrix               Path Filter         Filtered Path Matrix



The Filter Operation

•   A◦1=A
•   A◦0=0
•   A◦B=B◦A
•   A ◦ (B + C) = (A ◦ B) + (A ◦ C)
•   A⊤ ◦ B⊤ = (A ◦ B)⊤




The Not Filter
The not filter is useful for excluding a set of paths to or from a vertex.

    n : {0, 1}^{n×n} → {0, 1}^{n×n}

with a function rule of

\[
n(A)_{i,j} =
\begin{cases}
  1 & \text{if } A_{i,j} = 0 \\
  0 & \text{otherwise.}
\end{cases}
\]

          0  0  1  1  1         1  1  0  0  0
          1  0  1  0  1         0  1  0  1  0
     n    0  1  1  1  1    =    1  0  0  0  0
          1  1  0  1  1         0  0  1  0  0
          1  1  1  1  0         0  0  0  0  1




The Not Filter

If A ∈ {0, 1}^{n×n}, then

• n(n(A)) = A
• A ◦ n(A) = 0
• n(A) ◦ n(A) = n(A).




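The Not Filter

A^1: authored, A^2: cites, A^3: contains

A coauthorship path matrix is

\[
Z = A^1 \cdot A^{1\top} \circ n(I)
\]

[Figure: lanl:marko and lanl:jbollen are each connected by lanl:authored (A^1) to acm:0505, so Z provides a derived lanl:coauthor edge between lanl:marko and lanl:jbollen. The n(I) filter removes the lanl:coauthor self-loop from lanl:marko back to lanl:marko.]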
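
The coauthorship matrix Z = A^1 · A^{1⊤} ◦ n(I) can be computed for the three resources in the diagram. This is a sketch with plain Python lists; the vertex ordering and the helper names are assumptions made for illustration.

```python
def matmul(A, B):
    """Ordinary matrix multiplication on lists of lists."""
    return [
        [sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(col) for col in zip(*A)]


def hadamard(A, B):
    """Entry-wise product, used here as a path filter."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def n(A):
    """The not filter: 1 where A is 0, and 0 elsewhere."""
    return [[1 if x == 0 else 0 for x in row] for row in A]


def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


V = ["lanl:marko", "lanl:jbollen", "acm:0505"]
authored = [
    [0, 0, 1],  # lanl:marko authored acm:0505
    [0, 0, 1],  # lanl:jbollen authored acm:0505
    [0, 0, 0],
]

# Without the filter, every author would also be their own coauthor.
unfiltered = matmul(authored, transpose(authored))
Z = hadamard(unfiltered, n(identity(len(V))))
print(unfiltered[0][0], Z[0][0], Z[0][1])  # 1 0 1
```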



The Clip Filter
The general purpose of clip is to take a path matrix and “clip”, or
normalize, it to a {0, 1}^{n×n} matrix.

    c : R_+^{n×n} → {0, 1}^{n×n}

\[
c(Z)_{i,j} =
\begin{cases}
  1 & \text{if } Z_{i,j} > 0 \\
  0 & \text{otherwise.}
\end{cases}
\]

          24   1    0    0   0         1  1  0  0  0
           0  72    0    4   0         0  1  0  1  0
     c    23   0    0    0   0    =    1  0  0  0  0
           0   0  15.3   0   0         0  0  1  0  0
           0   0    0    0  12         0  0  0  0  1



The Clip Filter

If A, B ∈ {0, 1}^{n×n} and Y, Z ∈ R_+^{n×n}, then


•   c(A) = A
•   c(n(A)) = n(c(A)) = n(A)
•   c(Y ◦ Z) = c(Y) ◦ c(Z)
•   n(A ◦ B) = c (n(A) + n(B))
•   n(A + B) = n(A) ◦ n(B)
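
These identities are easy to spot-check numerically. The sketch below tests them on random small matrices; it is a sanity check under assumed helper names, not a proof.

```python
import random


def n(A):
    """The not filter: 1 where the entry is 0, and 0 elsewhere."""
    return [[1 if x == 0 else 0 for x in row] for row in A]


def c(Z):
    """The clip filter: 1 where the entry is positive, and 0 elsewhere."""
    return [[1 if x > 0 else 0 for x in row] for row in Z]


def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def check_identities(A, B, Y, Z):
    """Return True if all five clip/not identities hold for these matrices."""
    return all([
        c(A) == A,
        c(n(A)) == n(c(A)) == n(A),
        c(hadamard(Y, Z)) == hadamard(c(Y), c(Z)),
        n(hadamard(A, B)) == c(add(n(A), n(B))),
        n(add(A, B)) == hadamard(n(A), n(B)),
    ])


rng = random.Random(0)  # arbitrary seed


def random_binary(k):
    return [[rng.randint(0, 1) for _ in range(k)] for _ in range(k)]


def random_nonnegative(k):
    return [[rng.choice([0, 0, 1.5, 7]) for _ in range(k)] for _ in range(k)]


ok = all(
    check_identities(random_binary(4), random_binary(4),
                     random_nonnegative(4), random_nonnegative(4))
    for _ in range(100)
)
print(ok)  # True
```

The last two identities are De Morgan's laws in matrix form, with ◦ acting as "and" and clipped addition acting as "or".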




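The Clip Filter

A^1: authored, A^2: cites, A^3: contains

Suppose we want to create an author-citation path matrix that does not allow self-citations or coauthor
citations.

\[
Z = \underbrace{\left(A^1 \cdot A^2 \cdot A^{1\top}\right)}_{\text{cites}}
    \circ \underbrace{n\left(c\left(A^1 \cdot A^{1\top} \circ n(I)\right)\right)}_{\text{no coauthors}}
    \circ \underbrace{n(I)}_{\text{no self}}
\]

[Figure: lanl:marko is connected by lanl:authored (A^1) to lanl:3030, which is connected by lanl:cites (A^2) to lanl:4040. The derived lanl:author-citation edge (Z) reaches odu:nelson, an author of lanl:4040. lanl:jbollen is marked as a lanl:coauthor of lanl:marko and is excluded by n(c(A^1 · A^{1⊤} ◦ n(I))); the path from lanl:marko back to himself (self) is excluded by n(I).]
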

A1 : authored A2 : cites A3 : contains
h             ih         ih             i


                                     The Clip Filter

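However, using various theorems of the path algebra and abstract algebra in general,

\[
Z = \underbrace{\left(A^1 \cdot A^2 \cdot A^1\right)}_{\text{cites}} \circ \underbrace{n\left(c\left(A^1 \cdot A^1\right) \circ n(I)\right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self}}
\]

becomes

\[
Z = \left(A^1 \cdot A^2 \cdot A^1\right) \circ n\left(c\left(A^1 \cdot A^1\right)\right) \circ n(I).
\]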
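The two forms of Z can be checked numerically on a toy graph. The sketch below is only an illustration under several assumptions: the vertex list and edge set are hypothetical (loosely modeled on the slide's figure), the clip filter `c` is taken to map every positive entry to 1, the not filter `n` is taken to swap 0 and 1 entries, and `∘` is taken to be the entry-wise product. It also transposes the last authored matrix so that a walk can step from a cited article back to its author. That transpose is an assumption of this sketch and does not appear in the extracted slide text.

```python
import numpy as np

# Hypothetical vertex ordering for a small example graph.
V = ["lanl:marko", "lanl:jbollen", "odu:nelson", "lanl:3030", "lanl:4040"]
idx = {v: i for i, v in enumerate(V)}


def adjacency(edges):
    """Build a 0/1 adjacency matrix from (source, target) pairs."""
    A = np.zeros((len(V), len(V)), dtype=int)
    for s, t in edges:
        A[idx[s], idx[t]] = 1
    return A


# A1: authored, A2: cites (edge sets are assumptions for illustration).
A1 = adjacency([
    ("lanl:marko", "lanl:3030"),
    ("lanl:jbollen", "lanl:3030"),
    ("lanl:jbollen", "lanl:4040"),
    ("odu:nelson", "lanl:4040"),
])
A2 = adjacency([("lanl:3030", "lanl:4040")])
I = np.eye(len(V), dtype=int)


def c(M):
    """Clip filter (assumed definition): positive entries become 1."""
    return (M > 0).astype(int)


def n(M):
    """Not filter (assumed definition): 0 entries become 1, others 0."""
    return (M == 0).astype(int)


# author -> article -> cited article -> author (transpose is an assumption).
cites = A1 @ A2 @ A1.T
coauthor = c(A1 @ A1.T)

# Long form and simplified form; '*' is the entry-wise product.
Z_long = cites * n(coauthor * n(I)) * n(I)
Z_short = cites * n(coauthor) * n(I)

for i, j in zip(*np.nonzero(Z_short)):
    print(V[i], "->", V[j])
```

In this toy graph both forms agree, and the only surviving entry runs from lanl:marko to odu:nelson: the citation of coauthor lanl:jbollen and the self-citation of lanl:jbollen are filtered out.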




Other Filters and Operations...

• Please refer to the article for more information on these filters and
  operations.




Problems with the Path Algebra

• Because the path algebra is a matrix algebra, computing its operations
  over the entire Web of Data is computationally infeasible.

• However, it is possible to approximate these calculations using “random”
  walkers.
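The idea of trading an exact matrix computation for sampled walks can be sketched on a small single-relational graph. The code below is an illustration, not the method from the talk: the graph, the function names, and the number of walkers are all assumptions. It compares the exact distribution over end vertices after k uniformly random steps (the row of the k-th power of the row-normalized adjacency matrix) with the fraction of sampled walkers that end at each vertex.

```python
import random

# Hypothetical toy graph: vertex -> list of out-neighbours.
GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": ["a"],
}


def exact_distribution(graph, start, steps):
    """Propagate probability mass exactly, one step at a time."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for v, p in dist.items():
            share = p / len(graph[v])
            for w in graph[v]:
                nxt[w] = nxt.get(w, 0.0) + share
        dist = nxt
    return dist


def sampled_distribution(graph, start, steps, walkers, rng):
    """Estimate the same distribution by releasing independent walkers."""
    counts = {}
    for _ in range(walkers):
        v = start
        for _ in range(steps):
            v = rng.choice(graph[v])
        counts[v] = counts.get(v, 0) + 1
    return {v: k / walkers for v, k in counts.items()}


exact = exact_distribution(GRAPH, "a", 3)
approx = sampled_distribution(GRAPH, "a", 3, 20000, random.Random(0))

for v in sorted(exact):
    print(v, round(exact[v], 3), round(approx.get(v, 0.0), 3))
```

Each walker only touches the vertices it visits, so the cost depends on the number of walkers and steps rather than on the size of the full matrix. The estimate improves as more walkers are released.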




Mapping Paths to Grammar-Based Random Walkers

• A grammar-based random walker is a walker that obeys a path
  description.

• Able to compute “semantically rich” spreading activation and stationary
  probability distributions in a multi-relational network.

• Able to approximate path algebra calculations through the convergence
  properties of these operations.

• Provides a convenient application to the Web of Data and linked graph
  databases.

M.A. Rodriguez. Grammar-Based Random Walkers in Semantic Networks. Knowledge-Based Systems, 21(7), 727–739, 2008.
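A walker that obeys a path description can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm from the cited article: the triples are hypothetical, the grammar is written as a list of (edge label, inverse?) steps, and the ability to traverse an edge against its direction is an assumption of this sketch. A walker that finds no edge matching the next step of its grammar simply halts.

```python
import random
from collections import Counter

# Hypothetical (subject, predicate, object) triples for illustration only.
TRIPLES = [
    ("lanl:marko", "authored", "lanl:3030"),
    ("lanl:jbollen", "authored", "lanl:3030"),
    ("lanl:jbollen", "authored", "lanl:4040"),
    ("odu:nelson", "authored", "lanl:4040"),
    ("lanl:3030", "cites", "lanl:4040"),
]


def neighbours(vertex, label, inverse):
    """Vertices reachable from `vertex` over `label`, optionally backwards."""
    if inverse:
        return [s for s, p, o in TRIPLES if p == label and o == vertex]
    return [o for s, p, o in TRIPLES if p == label and s == vertex]


def walk(start, grammar, rng):
    """Follow the grammar step by step; return None if a step is impossible."""
    v = start
    for label, inverse in grammar:
        options = neighbours(v, label, inverse)
        if not options:
            return None
        v = rng.choice(options)
    return v


# Path description: author -> article -> cited article -> its author.
AUTHOR_CITATION = [("authored", False), ("cites", False), ("authored", True)]


def run(start, grammar, walkers, seed=0):
    """Release many walkers and tally where the successful ones end."""
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(walkers):
        end = walk(start, grammar, rng)
        if end is not None:
            ends[end] += 1
    return ends


counts = run("lanl:marko", AUTHOR_CITATION, 1000)
print(counts.most_common())
```

The tally of end vertices plays the role of one row of the corresponding path matrix, estimated by sampling. Filters such as "no self" or "no coauthors" could be added as extra checks on the end vertex; they are left out here to keep the sketch short.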



A Grammar Walker

[Figure: a grammar walker carrying the path description $A^1 \cdot A^1 \circ n(I)$, shown at time steps t=1, t=2, and t=3 over the Web of Data, whose structures reside at 127.0.0.4, 127.0.0.5, and 127.0.0.6.]

Grammar Walking the Web of Data

[Figure: a network of hosts, 127.0.0.1 through 127.0.0.11, with numbered steps 1 through 7 marked between them.]

Conclusion

• Graph databases will increasingly support the Web of Data.

• The Web of Data is about open, global-scale data management.

• Distributed computing is required for global-scale data processing.

• Grammar walkers can be used for distributed network analysis on the
  Web of Data.




Thank You For Your Time

My homepage: http://markorodriguez.com
Neno/Fhat: http://neno.lanl.gov
Collective Decision Making Systems: http://cdms.lanl.gov
Faith in the Algorithm: http://faithinthealgorithm.net
MESUR: http://www.mesur.org




Distributed Graph Databases and the Emerging Web of Data

  • 1. Distributed Graph Databases and the Emerging Web of Data Marko A. Rodriguez T-5, Center for Nonlinear Studies Los Alamos National Laboratory http://markorodriguez.com April 16, 2009
  • 2. Abstract The World Wide Web is the defacto medium for publicly exposing a corpus of interrelated documents. In its current form, the World Wide Web is the Web of Documents. The next generation of the World Wide Web will support the Web of Data. The Web of Data utilizes the same Uniform Resource Identifier (URI) address space as the Web of Documents, but instead of a exposing a graph of documents, the Web of Data exposes a graph of data. Given that the URI address space of the Web is distributed and infinite, the Web of Data provides a single unified space by which the worlds data can be publicly exposed and interrelated. The Web of Data is supported by both graph databases (which structure the data) and distributed computing mechanism (which process the data). This presentation will discuss the Web of Data, graph databases, and models of computing in this emerging space. Computer Science Department Colloquium – University of New Mexico – April 16, 2009
  • 3. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data • Local Computing vs. Distributed Computing • Multi-Relational Network Analysis with Grammar Walkers Computer Science Department Colloquium – University of New Mexico – April 16, 2009
  • 4. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data • Local Computing vs. Distributed Computing • Multi-Relational Network Analysis with Grammar Walkers Computer Science Department Colloquium – University of New Mexico – April 16, 2009
  • 5. The Relational Database vs. the Graph Database • A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model is a collection interlinked tables. • A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model is a multi-relational graph. Relational Database Graph Database d c a a b 127.0.0.1 127.0.0.2 Computer Science Department Colloquium – University of New Mexico – April 16, 2009
  • 6. Types of Graphs • Undirected single-relational graph: homogenous set of symmetric links. • Directed single-relational graph: homogenous set of links. • Directed multi-relational graph: heterogenous set of links. undirected single-relational graph x z directed single-relational graph x z directed multi-relational graph x y z Computer Science Department Colloquium – University of New Mexico – April 16, 2009
  • 7. Our Make Believe World - Phase 1 • Marko is a human and Fluffy is a dog.
  • 8. Our World Modeled in a Relational Database - Phase 1
      Object_Table
      ID    Name    Type   Legs  Fur
      0001  Marko   Human  2     false
      0002  Fluffy  Dog    4     true
  • 9. Our World Modeled in a Graph Database - Phase 1
      [Figure: graph with the following labeled edges]
      0001 type Human, 0001 name Marko, 0001 legs 2, 0001 fur false
      0002 type Dog, 0002 name Fluffy, 0002 legs 4, 0002 fur true
  • 10. Our Make Believe World - Phase 2 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends.
  • 11. Our World Modeled in a Relational Database - Phase 2
      Object_Table
      ID    Name    Type   Legs  Fur
      0001  Marko   Human  2     false
      0002  Fluffy  Dog    4     true
      Friendship_Table
      ID1   ID2
      0001  0002
      0002  0001
  • 12. Our World Modeled in a Graph Database - Phase 2 [Figure: the Phase 1 graph plus two friend edges: 0001 friend 0002 and 0002 friend 0001.]
  • 13. Our Make Believe World - Phase 3 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are subclasses of mammal.
  • 14. Our World Modeled in a Relational Database - Phase 3
      Object_Table
      ID    Name    Type   Legs  Fur
      0001  Marko   Human  2     false
      0002  Fluffy  Dog    4     true
      Friendship_Table
      ID1   ID2
      0001  0002
      0002  0001
      Subclass_Table
      Type1  Type2
      Human  Mammal
      Dog    Mammal
  • 15. Our World Modeled in a Graph Database - Phase 3 [Figure: the Phase 2 graph plus a Mammal vertex with edges Human subclassof Mammal and Dog subclassof Mammal.]
  • 16. Our Make Believe World - Phase 4 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are subclasses of mammal. • Fluffy peed on the carpet.
  • 17. Our World Modeled in a Relational Database - Phase 4
      Object_Table
      ID    Name    Type    Legs  Fur
      0001  Marko   Human   2     false
      0002  Fluffy  Dog     4     true
      0003  My_Rug  Carpet  N/A   N/A
      Friendship_Table
      ID1   ID2
      0001  0002
      0002  0001
      Subclass_Table
      Type1  Type2
      Human  Mammal
      Dog    Mammal
      Pee_Table
      ID1   ID2
      0002  0003
  • 18. Our World Modeled in a Graph Database - Phase 4 [Figure: the Phase 3 graph plus edges 0003 type Carpet, 0003 name My_Rug, and 0002 peedOn 0003.]
  • 19. Our Make Believe World - Phase 5 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are subclasses of mammal. • Fluffy peed on the carpet. • Marko and Fluffy are both mammals.
  • 20. Our World Modeled in a Relational Database - Phase 5
      Object_Table
      ID    Name    Type    Legs  Fur
      0001  Marko   Human   2     false
      0002  Fluffy  Dog     4     true
      0003  My_Rug  Carpet  N/A   N/A
      Friendship_Table
      ID1   ID2
      0001  0002
      0002  0001
      Subclass_Table
      Type1  Type2
      Human  Mammal
      Dog    Mammal
      Pee_Table
      ID1   ID2
      0002  0003
      Type_Table
      ID    Type
      0001  Human
      0002  Dog
      0003  Carpet
      0001  Mammal
      0002  Mammal
  • 21. Our World Modeled in a Graph Database - Phase 5 [Figure: the Phase 4 graph plus edges 0001 type Mammal and 0002 type Mammal.]
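The Phase 5 step can be sketched in code. The following is a minimal illustration, not something taken from the slides: it stores the graph as (tail, label, head) triples and derives the two new type edges from the subclassof edges, in the spirit of the RDFS subclass rule. The names EDGES and infer_types are hypothetical.

```python
# Edges of the make-believe world as (tail, label, head) triples.
EDGES = {
    ("0001", "type", "Human"),
    ("0002", "type", "Dog"),
    ("0003", "type", "Carpet"),
    ("Human", "subclassof", "Mammal"),
    ("Dog", "subclassof", "Mammal"),
    ("0001", "friend", "0002"),
    ("0002", "friend", "0001"),
    ("0002", "peedOn", "0003"),
}


def infer_types(edges):
    """Add (x, type, D) whenever (x, type, C) and (C, subclassof, D) hold."""
    closed = set(edges)
    changed = True
    while changed:  # repeat until no new edge can be derived
        changed = False
        for (x, p, c) in list(closed):
            if p != "type":
                continue
            for (c2, q, d) in list(closed):
                if q == "subclassof" and c2 == c and (x, "type", d) not in closed:
                    closed.add((x, "type", d))
                    changed = True
    return closed


# Only the edges that were not stated explicitly.
INFERRED = infer_types(EDGES) - EDGES
print(sorted(INFERRED))
```

In the graph model the new facts are just two more edges; no new table is needed.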
  • 22. The Graph as the Natural World Model • The world is inherently (or perceived as) object-oriented. • The world is filled with objects and relations among them. • The multi-relational graph is a very natural representation of the world.
  • 23. The Graph as the Natural Programming Model • High-level computer languages are object-oriented. • Nearly no impedance mismatch between the multi-relational graph and the programming object. • It is easy to go from graph database to in-memory object.
      Human marko = new Human();
      marko.name = "Marko";
      marko.addFriend(fluffy);
      marko.setHasFur(false);
      marko.setLegs(2);
  • 24. SQL vs. SPARQL
      SELECT OTY.Name
      FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
      WHERE OTX.Name = 'Marko'
        AND Friendship_Table.ID1 = OTY.ID
        AND Friendship_Table.ID2 = OTX.ID;

      SELECT ?z WHERE { ?x name "Marko" . ?y friend ?x . ?y name ?z }

      E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF, WWW Consortium, http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/, 2004.
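To make the graph-pattern reading of the SPARQL query concrete, here is a toy matcher over an in-memory set of triples. It is only a sketch of basic pattern matching, not how a SPARQL engine is implemented; TRIPLES, match, and query are hypothetical names, and the data is the Phase 2 world.

```python
# Toy triple store: names and friendships from the make-believe world.
TRIPLES = {
    ("0001", "name", "Marko"),
    ("0002", "name", "Fluffy"),
    ("0001", "friend", "0002"),
    ("0002", "friend", "0001"),
}


def match(pattern, triple, binding):
    """Unify one (s, p, o) pattern with one triple; return the extended binding or None."""
    merged = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable
            if term in merged and merged[term] != value:
                return None
            merged[term] = value
        elif term != value:  # constant that does not match
            return None
    return merged


def query(patterns, triples):
    """Join the patterns left to right, keeping every consistent variable binding."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                merged = match(pattern, triple, binding)
                if merged is not None:
                    extended.append(merged)
        bindings = extended
    return bindings


PATTERNS = [("?x", "name", "Marko"), ("?y", "friend", "?x"), ("?y", "name", "?z")]
RESULT = sorted(b["?z"] for b in query(PATTERNS, TRIPLES))
print(RESULT)
```

The shared variables ?x and ?y play the role of the join conditions in the SQL version.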
  • 25. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data • Local Computing vs. Distributed Computing • Multi-Relational Network Analysis with Grammar Walkers
  • 26. Internet Address Spaces • The Uniform Resource Identifier (URI) is the superclass of the Uniform Resource Locator (URL) and Uniform Resource Name (URN).
  • 27. The Uniform Resource Locator • The set of all URLs is the address space of all resources that can be located and retrieved on the Web. URLs denote where a resource is. http://markorodriguez.com/index.html ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6, ∗ http:// means GET at port 80, ∗ /index.html means the resource to get at that Internet location. [Figure: the Web server markorodriguez.com (216.251.43.6) serving index.html.]
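The decomposition of a URL described above can be sketched with Python's standard urllib.parse module. The helper locate and the DEFAULT_PORTS table are illustrative assumptions; no DNS lookup or network access is performed.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not name a port explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def locate(url):
    """Split a URL into (scheme, host, port, path)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


print(locate("http://markorodriguez.com/index.html"))
```

Resolving the host name to 216.251.43.6 is the job of DNS and is left out of this sketch.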
  • 28. The Uniform Resource Name • The set of all URNs is the address space of all resources within the urn: namespace. urn:uuid:bd93def0-8026-11dd-842be54955baa12 urn:issn:0892-3310 urn:doi:10.1016/j.knosys.2008.03.030 • Named resources need not be retrievable through the Web. • URNs denote what a resource is.
  • 29. The Uniform Resource Identifier • The URI address space is an infinite space for all Internet resources. urn:issn:0892-3310 ftp://markorodriguez.com/private/markos_secrets.txt http://www.lanl.gov#fluffy • Important: URIs can denote concepts, instances, and datum. lanl:fluffy lanl:fluffy_legs lanl is a namespace prefix which extends to http://www.lanl.gov#.
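Prefix expansion is a plain string substitution. A minimal sketch, assuming a hypothetical PREFIXES table that holds only the lanl prefix from the slide:

```python
# Namespace prefixes known to this sketch; only lanl comes from the slide.
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(name):
    """Expand prefix:local into a full URI; leave anything else untouched."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name


print(expand("lanl:fluffy"))
print(expand("lanl:fluffy_legs"))
```

Names whose leading part is not a registered prefix, such as urn:issn:0892-3310, pass through unchanged.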
  • 30. The Web of Documents • The Web of Documents is primarily concerned with the Hyper-Text Transfer Protocol (HTTP) and with retrievable resources in the URL address space. • These retrievable resources are files: HTML documents, images, audio, etc. The “web” is created when HTML documents contain URLs. [Figure: at http://markorodriguez.com/, index.html links via href to Resume.html, Home.html, and Research.html.]
  • 31. The Web of Data • The Web of Data is primarily concerned with URIs. • The Resource Description Framework (RDF) is the standard for representing the relationship between URIs and literals (e.g. float, string, date time, etc.).
      subject      predicate   object
      lanl:marko   foaf:knows  lanl:fluffy
      lanl:marko   foaf:name   "Marko A. Rodriguez"^^xsd:string
      lanl:fluffy  foaf:name   "Fluffy P. Everywhere"^^xsd:string
      C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee. Linked Data on the Web, International World Wide Web Conference, 2008.
  • 32. Our Make Believe World in RDF
      [Figure: RDF graph with the following triples]
      lanl:Human   rdfs:subClassOf  lanl:Mammal
      lanl:Dog     rdfs:subClassOf  lanl:Mammal
      lanl:marko   rdf:type         lanl:Human
      lanl:marko   rdf:type         lanl:Mammal
      lanl:fluffy  rdf:type         lanl:Dog
      lanl:fluffy  rdf:type         lanl:Mammal
      lanl:marko   lanl:friend      lanl:fluffy
      lanl:fluffy  lanl:friend      lanl:marko
      lanl:marko   lanl:fur         "false"^^xsd:boolean
      lanl:marko   lanl:legs        "2"^^xsd:integer
      lanl:marko   foaf:name        "Marko A. Rodriguez"^^xsd:string
      lanl:fluffy  lanl:fur         "true"^^xsd:boolean
      lanl:fluffy  lanl:legs        "4"^^xsd:integer
      lanl:fluffy  foaf:name        "Fluffy P. Everywhere"^^xsd:string
  • 33. The Web of Data is a Distributed Database • The URI address space is distributed. • URIs can denote datum. • RDF denotes the relationships between URIs. • The Web of Data’s foundational standard is RDF. • Therefore, the Web of Data is a distributed database.
  • 34. The Web of Documents vs. the Web of Data [Figure: in the Web of Documents, an HTML document on the Web server at 127.0.0.1 links via href to an HTML document on the Web server at 127.0.0.2; in the Web of Data, a resource in the graph database at 127.0.0.1 links via lanl:friend to a resource in the graph database at 127.0.0.2.]
  • 35. The Current Web of Data - March 2009 [Figure: the Linked Data cloud, a graph of interlinked data sets such as dbpedia, freebase, geonames, uniprot, and musicbrainz.] M.A. Rodriguez. A Graph Analysis of the Linked Data Cloud, in review, http://arxiv.org/abs/0903.0194, 2009.
  • 36. The Current Web of Data - March 2009
      data set          domain      data set          domain      data set            domain
      audioscrobbler    music       govtrack          government  pubguide            books
      bbclatertotp      music       homologene        biology     qdos                social
      bbcplaycountdata  music       ibm               computer    rae2001             computer
      bbcprogrammes     media       ieee              computer    rdfbookmashup       books
      budapestbme       computer    interpro          biology     rdfohloh            social
      chebi             biology     jamendo           music       resex               computer
      crunchbase        business    laascnrs          computer    riese               government
      dailymed          medical     libris            books       semanticweborg      computer
      dblpberlin        computer    lingvoj           reference   semwebcentral       social
      dblphannover      computer    linkedct          medical     siocsites           social
      dblprkbexplorer   computer    linkedmdb         movie       surgeradio          music
      dbpedia           general     magnatune         music       swconferencecorpus  computer
      doapspace         social      musicbrainz       music       taxonomy            reference
      drugbank          medical     myspacewrapper    social      umbel               general
      eurecom           computer    opencalais        reference   uniref              biology
      eurostat          government  opencyc           general     unists              biology
      flickrexporter    images      openguides        reference   uscensusdata        government
      flickrwrappr      images      pdb               biology     virtuososponger     reference
      foafprofiles      social      pfam              biology     w3cwordnet          reference
      freebase          general     pisa              computer    wikicompany         business
      geneid            biology     prodom            biology     worldfactbook       government
      geneontology      biology     projectgutenberg  books       yago                general
      geonames          geographic  prosite           biology     ...
  • 37. Cultural Differences that are Leading to Web-Based Data Management - Part 1 • Relational databases tend to not maintain public access points. • Relational database users tend to not publish their schemas. • Web of Data graph databases maintain public access points called SPARQL end-points or Linked Data URLs. • Web of Data graph database users tend to reuse and extend public schemas called ontologies.
  • 38. Cultural Differences that are Leading to Web-Based Data Management - Part 2 [Figure: in the Conventional Model, Applications 1, 2, and 3 at 127.0.0.1, 127.0.0.2, and 127.0.0.3 each hold their own processes and structures; in the Web of Data Model, the applications at 127.0.0.1, 127.0.0.2, and 127.0.0.3 hold the processes, while the structures reside in the Web of Data at 127.0.0.4, 127.0.0.5, and 127.0.0.6.]
  • 39. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data • Local Computing vs. Distributed Computing • Multi-Relational Network Analysis with Grammar Walkers
  • 40. SPARQLing a Data Provider - Local Computing
      SELECT ?x WHERE { lanl:marko lanl:friend ?x }
      [Figure: the client at 127.0.0.1 sends the query to the SPARQL end-point of the graph database at 127.0.0.2 and receives { lanl:fluffy }.]
      • The 127.0.0.1 client is querying the 127.0.0.2 server. • The query is any read-based SPARQL query. • The results are those resources that bind to the query variables.
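As a rough sketch of what the client sends, the SPARQL Protocol carries a read query in a URL-encoded query parameter of an HTTP GET. The end-point path /sparql on 127.0.0.2 is an assumption made for illustration, as is the PREFIX declaration added to make the query self-contained; nothing is sent over the network here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical end-point location; the slide only gives the host 127.0.0.2.
ENDPOINT = "http://127.0.0.2/sparql"

QUERY = (
    "PREFIX lanl: <http://www.lanl.gov#>\n"
    "SELECT ?x WHERE { lanl:marko lanl:friend ?x }"
)


def request_url(endpoint, query):
    """Build the GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


URL = request_url(ENDPOINT, QUERY)
print(URL)

# Decoding the URL again recovers the original query text.
DECODED = parse_qs(urlsplit(URL).query)["query"][0]
```

All of the processing of the returned bindings then happens locally on the client, which is the point of the slide.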
  • 41. GETing Linked Data as RDF - Local Computing [Figure: the client at 127.0.0.1 issues HTTP GETs against the Web of Data. An HTTP GET of http://www.lanl.gov#marko returns an RDF subgraph containing lanl:marko lanl:friend lanl:fluffy and lanl:marko lanl:wrote vub:1010; an HTTP GET of http://www.vub.edu#1010 returns a subgraph containing vub:1010 lanl:cites ieee:2020.]
  • 42. Problem with the Current Web of Data Infrastructure • The only interfaces are SPARQL end-points and HTTP GETs of RDF subgraphs. • For human-based document retrieval, this is fine. For machine-based data processing, this does not scale. M.A. Rodriguez. A Distributed Process Infrastructure for a Distributed Data Structure. Semantic Web and Information Systems Bulletin, AIS Special Interest Group on Semantic Web and Information Systems, http://arxiv.org/abs/0807.3908, 2008.
  • 43. Problem with the Current Web of Data Infrastructure • We cannot rely on the “download and index” philosophy of the World Wide Web. As of March 2009, the Web of Data maintains 4.5 billion triples. • The Web of Data cannot rely on a single service provider: ∗ too much data, ∗ too many types of algorithms that can utilize this data, ∗ too many clock cycles to locally process this data.
  • 44. The Open Virtual Machine Farm [Figure: the graph databases at 127.0.0.1 and 127.0.0.2 are linked by lanl:friend; each has a virtual machine farm, and code/machine migrates between the farms.] • Distributed computing through code/machine migration between farms. • Move the process to the data, not the data to the process. M.A. Rodriguez. General Purpose Computing on a Semantic Network Substrate. in Emergent Web Intelligence, eds. R. Chbeir, A. Hassanien, A. Abraham and Y. Badr, Springer-Verlag, http://arxiv.org/abs/0704.3395, 2009. M.A. Rodriguez. The RDF Virtual Machine, in review, LA-UR-08-03925, 2009.
  • 45. Neno RDF Programming Language - Code Serialization
      xsd:int example(xsd:string a) {
        if(a == "marko") return 1;
        else return 2;
      }
      [Figure: the method serialized as an RDF graph. demo:Human has a Method (hasMethod) with hasMethodName "example"^^xsd:string and a Block (hasBlock); the instructions (Equals, Branch, PushValue, LocalDirect, Return) are resources identified by urn:uuid: URIs and connected by nextInst, trueInst, falseInst, hasLeft, hasRight, hasValue, and hasURI edges, with literals "a"^^xsd:string, "marko"^^xsd:string, "1"^^xsd:int, and "2"^^xsd:int.]
  • 46. The Fhat RDF Virtual Machine - Machine Serialization [Figure: the RDF schema of the Fhat RVM. Classes include RVM, Fhat, Instruction, OperandStack, ReturnStack, BlockStack, Frame, FrameVariable, and Block; properties include halt and methodReuse (xsd:boolean), programLocation, operandTop, returnTop, blockTop, currentFrame, hasFrame, forFrame, fromBlock, hasSymbol (xsd:string), hasValue (rdfs:Resource), rdf:first, rdf:rest, and rdf:li, annotated with cardinalities such as [1], [0..1], and [0..*].]
  • 47. A Collection of Interlinked Graph Databases - Currently [Figure: interlinked graph databases at 127.0.0.2 through 127.0.0.11.]
  • 48. A Collection of Interlinked Graph Databases and Processors - Future [Figure: interlinked graph databases and processors at 127.0.0.2 through 127.0.0.11.]
  • 49. The Future of Web-Based Distributed Computing • The HTTP GET approach to the Web of Data does not scale. • The Neno/Fhat (or any general-purpose computing) environment is unsafe. • The Web of Data needs an open, safe, flexible, and easy-to-adopt computing infrastructure.
  • 50. What Type of Processing? • Object-oriented programming: Web of Data as an object repository. • Logic: Web of Data as a knowledge-base. • Graph/network analysis: Web of Data as a multi-relational graph. • The future computing environment should support at least these popular processing models. • We will focus on graph/network analysis for the remainder of this presentation.
  • 51. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data • Local Computing vs. Distributed Computing • Multi-Relational Network Analysis with Grammar Walkers
  • 52. Introduction to Random Walkers • Random walkers can be used in single-relational networks to calculate: ∗ stationary probability distribution: primary eigenvector calculation, ∗ spreading activation: search by means of diffusion. • There is a continuous and a discrete form of the general random walk method.
  • 53. Random Walks in a Single-Relational Network • Suppose a single-relational network $G$, where $G = (V, E \subseteq (V \times V))$. • Let’s represent that network as a row-stochastic adjacency matrix $A \in [0, 1]^{|V| \times |V|}$, where $A_{i,j} = \begin{cases} \frac{1}{\Gamma(i)} & \text{if } (i, j) \in E \\ 0 & \text{otherwise,} \end{cases}$ and $\Gamma(i)$ is the number of outgoing edges of vertex $i$, so that every row sums to 1. • Finally, assume an “energy vector” $\pi \in \mathbb{R}^{|V|}$.
  • 54. Random Walks in a Single-Relational Network
      [Figure: the graph G over vertices a, b, c, d, its matrix A, and the vector π.]
      $A = \begin{bmatrix} 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 1 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 1 & 0 & 0 \end{bmatrix}$ (rows and columns ordered a, b, c, d), $\pi = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}$
      • πA can be interpreted as the continuous form of propagating random walkers over G.
  • 55. Stationary Probability Distribution in a Single-Relational Network
      [Table: π over time, obtained by repeatedly multiplying by the matrix A of G]
      time  a     b     c     d
      π1    1     0     0     0
      π2    0     0.5   0     0.5
      π3    0     0.5   0.5   0
      π4    0.25  0     0.5   0.25
      π5    0.25  0.38  0     0.38
      π6    0     0.5   0.38  0.13
      ...
      π∞    0.15  0.31  0.31  0.23
  • 56. Stationary Probability Distribution in a Single-Relational Network • If G is strongly connected and aperiodic then there exists a π such that π = πA. • This stationary π∞ is the primary eigenvector of A. • PageRank computes the stationary π by forcing G (the Web citation graph) to be strongly connected and aperiodic.
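The stationary distribution can be approximated by repeating the product πA until π stops changing (power iteration). The sketch below uses the four-vertex matrix A from the slides in plain Python; the function names, tolerance, and iteration cap are illustrative choices.

```python
# Row-stochastic matrix of G, rows and columns ordered a, b, c, d.
A = [
    [0.0, 0.5, 0.0, 0.5],  # a -> b, a -> d
    [0.0, 0.0, 1.0, 0.0],  # b -> c
    [0.5, 0.0, 0.0, 0.5],  # c -> a, c -> d
    [0.0, 1.0, 0.0, 0.0],  # d -> b
]


def step(pi, matrix):
    """One propagation step: the vector-matrix product pi * A."""
    n = len(matrix)
    return [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def stationary(pi, matrix, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi * A until successive vectors differ by less than tol."""
    for _ in range(max_iter):
        nxt = step(pi, matrix)
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


PI = stationary([1.0, 0.0, 0.0, 0.0], A)
print([round(x, 2) for x in PI])
```

Solving π = πA by hand for this matrix gives π = (2/13, 4/13, 4/13, 3/13), which rounds to the π∞ row of the table.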
  • 57. Spreading Activation in a Single-Relational Network • Spreading activation can be thought of as a “local rank” algorithm, while calculating the stationary probability provides you a “global rank”. • With spreading activation, you iterate for only a certain number of timesteps. • Also, you record how much energy has flowed through each vertex. • Let’s demonstrate using a single discrete walker...
  • 58. Spreading Activation in a Single-Relational Network • The walker moves from vertex to vertex with choice dependent on the probability distribution of A. • At every step, if the walker is at vertex i then $\pi_i = \pi_i + 1$.
      [Figure: a single walker on G visiting a, b, c, and a again at times 1 to 4]
      time  a  b  c  d
      π1    1  0  0  0
      π2    1  1  0  0
      π3    1  1  1  0
      π4    2  1  1  0
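A single discrete walker can be sketched as follows, reusing the matrix A of G. The function walk and its arguments are illustrative; the walker's actual path depends on the random seed, so the slide's a, b, c, a path is only one possible outcome.

```python
import random

# Row-stochastic matrix of G, rows and columns ordered a, b, c, d.
A = [
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
]


def walk(matrix, start, steps, rng):
    """Move one walker for `steps` steps, adding 1 to pi at each visited vertex."""
    pi = [0] * len(matrix)
    pi[start] = 1  # the starting vertex counts as visited at time 1
    current = start
    path = [start]
    for _ in range(steps):
        # Pick the next vertex according to the current row of A.
        current = rng.choices(range(len(matrix)), weights=matrix[current])[0]
        pi[current] += 1
        path.append(current)
    return pi, path


PI, PATH = walk(A, start=0, steps=3, rng=random.Random(0))
print(PI, PATH)
```

Unlike the stationary computation, the walk is cut off after a fixed number of steps, so π ranks vertices relative to the starting vertex.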
  • 59. Random Walks in a Multi-Relational Network • Suppose a multi-relational network $M$, where $M = (V, \mathbb{E} = \{E_0, E_1, \ldots, E_k \subseteq (V \times V)\})$. • Represent as a {0, 1}-adjacency tensor $\mathcal{A} \in \{0, 1\}^{|V| \times |V| \times |\mathbb{E}|}$, where $\mathcal{A}^m_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E_m : 1 \leq m \leq k \\ 0 & \text{otherwise.} \end{cases}$ • Then assume an “energy vector” $\pi \in \mathbb{R}^{|V|}$. M.A. Rodriguez and J. Shinavier. Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms, in review, http://arxiv.org/abs/0806.2274, 2009.
  • 60. Random Walks in a Multi-Relational Network [Figure: a multi-relational network M over vertices a, b, c, d with authored, cites, and contains edges; its adjacency tensor A with one {0, 1} slice per edge label (authored, cites, contains); and the vector π = (1, 0, 0, 0).]
  • 61. The Operations of the Multi-Relational Path Algebra • $A \cdot B$: ordinary matrix multiplication determines the number of (A, B)-paths between vertices. • $A^\top$: matrix transpose inverts path directionality. • $A \circ B$: Hadamard, entry-wise multiplication applies a filter to selectively exclude paths. • $n(A)$: not generates the complement of a $\{0, 1\}^{n \times n}$ matrix. • $c(A)$: clip generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix. • $v^{\pm}(A)$: vertex generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix, where only certain rows or columns contain non-zero values. • $\lambda A$: scalar multiplication weights the entries of a matrix. • $A + B$: matrix addition merges paths.
  • 62. The Traverse Operation • An interesting aspect of the single-relational adjacency matrix $A \in \{0, 1\}^{n \times n}$ is that when it is raised to the kth power, the entry $A^{(k)}_{i,j}$ is equal to the number of paths of length k that connect vertex i to vertex j. • Given, by definition, that $A^{(1)}_{i,j}$ (i.e. $A_{i,j}$) represents the number of paths that go from i to j of length 1 (i.e. a single edge) and by the rules of ordinary matrix multiplication, $A^{(k)}_{i,j} = \sum_{l \in V} A^{(k-1)}_{i,l} \cdot A_{l,j} : k \geq 2$.
      With rows and columns ordered a, b, c:
      $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
      There is a path of length 2 from a to c.
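The path-counting property can be checked directly on the three-vertex example. This is a plain-Python sketch; matmul and power are illustrative helper names.

```python
def matmul(X, Y):
    """Ordinary matrix multiplication of two lists-of-rows."""
    return [
        [sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def power(A, k):
    """A raised to the kth power (k >= 1) by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


# Edges a -> b and b -> c, rows and columns ordered a, b, c.
A = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

A2 = power(A, 2)  # entry (i, j) counts the paths of length 2 from i to j
A3 = power(A, 3)  # no path of length 3 exists in this graph
print(A2)
print(A3)
```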
  • 63. The Traverse Operation ($A^1$: authored, $A^2$: cites, $A^3$: contains) • With $Z = A^1 \cdot A^2 \cdot A^{1\top}$, $Z_{i,j}$ defines the number of paths from vertex i to vertex j such that a path goes from author i to one of the articles he or she has authored, from that article to one of the articles it cites, and finally, from that cited article to its author j. Semantically, Z is an author-citation single-relational path matrix. [Figure: lanl:marko lanl:authored vub:1010 ($A^1$), vub:1010 lanl:cites ieee:2020 ($A^2$), vub:fheyligh lanl:authored ieee:2020 ($A^1$), yielding lanl:marko lanl:author-citation vub:fheyligh (Z).] * NOTE: All diagrams are with respect to a “source” vertex (the blue vertex) in order to preserve clarity. In reality, the operations operate on all vertices in parallel.
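The author-citation matrix can be computed for the four resources named in the figure. The vertex ordering and the reading of the figure's edges (marko authored vub:1010, fheyligh authored ieee:2020) are assumptions of this sketch, as are the helper names.

```python
def matmul(X, Y):
    """Ordinary matrix multiplication of two lists-of-rows."""
    return [
        [sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def transpose(X):
    """Swap rows and columns, which inverts edge direction."""
    return [list(row) for row in zip(*X)]


V = ["lanl:marko", "vub:fheyligh", "vub:1010", "ieee:2020"]
IDX = {v: i for i, v in enumerate(V)}


def slice_of(edges):
    """Build one {0, 1} slice of the adjacency tensor from (tail, head) pairs."""
    M = [[0] * len(V) for _ in V]
    for tail, head in edges:
        M[IDX[tail]][IDX[head]] = 1
    return M


A1 = slice_of([("lanl:marko", "vub:1010"), ("vub:fheyligh", "ieee:2020")])  # authored
A2 = slice_of([("vub:1010", "ieee:2020")])                                  # cites

# author -> article -> cited article -> author of the cited article
Z = matmul(matmul(A1, A2), transpose(A1))
print(Z[IDX["lanl:marko"]][IDX["vub:fheyligh"]])
```

The transpose on the last factor is what lets the walker leave the cited article against the direction of its authored edge.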
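  • 64. The Filter Operation Various path filters can be defined and applied using the entry-wise Hadamard matrix product denoted ◦, where
      $A \circ B = \begin{bmatrix} A_{1,1} \cdot B_{1,1} & \cdots & A_{1,m} \cdot B_{1,m} \\ \vdots & \ddots & \vdots \\ A_{n,1} \cdot B_{n,1} & \cdots & A_{n,m} \cdot B_{n,m} \end{bmatrix}$
      Path Matrix ◦ Path Filter = Filtered Path Matrix:
      $\begin{bmatrix} 24 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 4 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 15.3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix} \circ \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 0 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$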
  • 65. The Filter Operation • $A \circ \mathbf{1} = A$ • $A \circ \mathbf{0} = \mathbf{0}$ • $A \circ B = B \circ A$ • $A \circ (B + C) = (A \circ B) + (A \circ C)$ • $A^\top \circ B^\top = (A \circ B)^\top$.
  • 66. The Not Filter The not filter is useful for excluding a set of paths to or from a vertex. $n : \{0, 1\}^{n \times n} \to \{0, 1\}^{n \times n}$ with a function rule of $n(A)_{i,j} = \begin{cases} 1 & \text{if } A_{i,j} = 0 \\ 0 & \text{otherwise.} \end{cases}$
      $n\left( \begin{bmatrix} 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{bmatrix} \right) = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$
  • 67. The Not Filter If $A \in \{0, 1\}^{n \times n}$, then • $n(n(A)) = A$ • $A \circ n(A) = \mathbf{0}$ • $n(A) \circ n(A) = n(A)$.
  • 68. The Not Filter ($A^1$: authored, $A^2$: cites, $A^3$: contains) A coauthorship path matrix is $Z = A^1 \cdot A^{1\top} \circ n(I)$. [Figure: lanl:marko lanl:authored acm:0505 ($A^1$) and lanl:jbollen lanl:authored acm:0505 ($A^1$) yield lanl:marko lanl:coauthor lanl:jbollen (Z); the lanl:coauthor self-loop is removed by n(I).]
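  • 69. The Clip Filter The general purpose of clip is to take a path matrix and “clip”, or normalize, it to a $\{0, 1\}^{n \times n}$ matrix. $c : \mathbb{R}_+^{n \times n} \to \{0, 1\}^{n \times n}$ with $c(Z)_{i,j} = \begin{cases} 1 & \text{if } Z_{i,j} > 0 \\ 0 & \text{otherwise.} \end{cases}$
      $c\left( \begin{bmatrix} 24 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 4 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 15.3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix} \right) = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$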
  • 70. The Clip Filter If $A, B \in \{0, 1\}^{n \times n}$ and $Y, Z \in \mathbb{R}_+^{n \times n}$, then • $c(A) = A$ • $c(n(A)) = n(c(A)) = n(A)$ • $c(Y \circ Z) = c(Y) \circ c(Z)$ • $n(A \circ B) = c(n(A) + n(B))$ • $n(A + B) = n(A) \circ n(B)$
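The three filters and several of the listed identities can be checked on the example matrices from the slides. This is a plain-Python sketch; the function names (not_ has a trailing underscore because not is a Python keyword) and the variable names are illustrative.

```python
def hadamard(A, B):
    """Entry-wise product: keeps an entry of A only where B is non-zero."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def add(A, B):
    """Entry-wise sum: merges paths."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def not_(A):
    """Complement: 1 where the entry is 0, otherwise 0."""
    return [[1 if a == 0 else 0 for a in row] for row in A]


def clip(Z):
    """Normalize a non-negative path matrix to {0, 1}."""
    return [[1 if z > 0 else 0 for z in row] for row in Z]


PATHS = [
    [24, 1, 0, 0, 0],
    [0, 72, 0, 4, 0],
    [23, 0, 0, 0, 0],
    [0, 0, 15.3, 0, 0],
    [0, 0, 0, 0, 12],
]
FILTER = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

FILTERED = hadamard(PATHS, FILTER)
CLIPPED = clip(PATHS)
print(FILTERED)
print(CLIPPED)
```

Because not and clip only test entries against zero, the identities involving them reduce to Boolean logic on the non-zero pattern of each matrix.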
  • 71. The Clip Filter ($A^1$: authored, $A^2$: cites, $A^3$: contains) Suppose we want to create an author citation path matrix that does not allow self citation or coauthor citations.
      $Z = \left( A^1 \cdot A^2 \cdot A^{1\top} \right) \circ \underbrace{n\left( c\left( A^1 \cdot A^{1\top} \circ n(I) \right) \right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self cites}}$
      [Figure: lanl:marko and lanl:jbollen are coauthors; lanl:3030 lanl:cites lanl:4040 ($A^2$); the lanl:author-citation edge (Z) reaches odu:nelson, while the paths ending at a coauthor, removed by $n(c(A^1 \cdot A^{1\top} \circ n(I)))$, or at the source author itself, removed by n(I), are filtered out.]
  • 72. The Clip Filter ($A^1$: authored, $A^2$: cites, $A^3$: contains) However, using various theorems of the path algebra and abstract algebra in general, $Z = \left( A^1 \cdot A^2 \cdot A^{1\top} \right) \circ \underbrace{n\left( c\left( A^1 \cdot A^{1\top} \circ n(I) \right) \right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self cites}}$ becomes $Z = A^1 \cdot A^2 \cdot A^{1\top} \circ n\left( c\left( A^1 \cdot A^{1\top} \right) \right) \circ n(I)$.
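The simplification can be checked numerically: dropping the inner n(I) does not change Z, because the outer n(I) already zeroes the diagonal. The sketch below uses a small hypothetical network (two articles p and q, three authors); the edge assignments and helper names are assumptions made for illustration, not read from the figure.

```python
def matmul(X, Y):
    return [
        [sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def transpose(X):
    return [list(row) for row in zip(*X)]


def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def not_(A):
    return [[1 if a == 0 else 0 for a in row] for row in A]


def clip(Z):
    return [[1 if z > 0 else 0 for z in row] for row in Z]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical vertices: 0 marko, 1 jbollen, 2 nelson, 3 article p, 4 article q.
N = 5
A1 = [[0] * N for _ in range(N)]  # authored
for author, article in [(0, 3), (1, 3), (1, 4), (2, 4)]:
    A1[author][article] = 1
A2 = [[0] * N for _ in range(N)]  # cites
A2[3][4] = 1                      # p cites q

Y = matmul(matmul(A1, A2), transpose(A1))  # author-citation paths
X = matmul(A1, transpose(A1))              # coauthorship paths (incl. self)
NO_SELF = not_(identity(N))

Z_LONG = hadamard(hadamard(Y, not_(clip(hadamard(X, NO_SELF)))), NO_SELF)
Z_SHORT = hadamard(hadamard(Y, not_(clip(X))), NO_SELF)
print(Z_LONG == Z_SHORT, Z_SHORT[0][2])
```

In this example only the marko to nelson citation survives: marko to jbollen is a coauthor citation and jbollen to jbollen is a self citation.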
  • 73. Other Filters and Operations... • Please refer to the article for more information on these filters and operations.
  • 74. Problems with the Path Algebra • As a matrix algebra, it is impossible (computationally speaking) to compute matrix operations over the entire Web of Data. • However, it is possible to approximate these calculations using “random” walkers.
  • 75. Mapping Paths to Grammar-Based Random Walkers • A grammar-based random walker is a walker that obeys a path description. • Able to compute “semantically rich” spreading activation and stationary probability distributions in a multi-relational network. • Able to approximate through the convergence properties of these operations. • Provides a convenient application to the Web of Data and linked graph databases. M.A. Rodriguez. Grammar-Based Random Walkers in Semantic Networks. Knowledge-Based Systems, 21(7), 727–739, 2008.
  • 76. A Grammar Walker [Figure: a grammar walker carrying the path description $A^1 \cdot A^{1\top} \circ n(I)$ moves at t=1, t=2, and t=3 across the Web of Data structures at 127.0.0.4, 127.0.0.5, and 127.0.0.6.]
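As a loose illustration of a walker that obeys a path description, the sketch below hand-codes the coauthorship description (authored, then authored in reverse, then reject the starting author) and samples many walkers instead of multiplying matrices. It is not the grammar formalism of the cited article; the data beyond lanl:marko, lanl:jbollen, and acm:0505 is hypothetical (the hyp: names), as are all function names.

```python
import random
from collections import Counter

# authored edges: author -> articles. Only acm:0505 comes from the slides.
AUTHORED = {
    "lanl:marko": ["acm:0505", "hyp:0001"],
    "lanl:jbollen": ["acm:0505"],
    "hyp:someone": ["hyp:0001"],
}

# Reverse index (the transpose of authored): article -> authors.
AUTHORED_BY = {}
for author, articles in AUTHORED.items():
    for article in articles:
        AUTHORED_BY.setdefault(article, []).append(author)


def coauthor_walk(start, rng):
    """One walker: follow authored, then authored backwards, then apply n(I)."""
    article = rng.choice(AUTHORED[start])       # A^1
    author = rng.choice(AUTHORED_BY[article])   # A^1 transposed
    return None if author == start else author  # n(I): drop self paths


def sample(start, walkers, rng):
    """Count where accepted walkers end; frequencies approximate one row of Z."""
    hits = Counter()
    for _ in range(walkers):
        end = coauthor_walk(start, rng)
        if end is not None:
            hits[end] += 1
    return hits


HITS = sample("lanl:marko", 1000, random.Random(42))
print(HITS)
```

Each walker only touches the edges around its current vertex, which is what makes the approach attractive when the graph is spread across many hosts.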
  • 77. Grammar Walking the Web of Data [Figure: a grammar walker traverses the interlinked graph databases at 127.0.0.1 through 127.0.0.11 in seven numbered steps.]
  • 78. Conclusion • Graph databases will increasingly support the Web of Data. • The Web of Data is about open, global-scale data management. • Distributed computing is required for global-scale data processing. • Grammar walkers can be used for distributed network analysis on the Web of Data.
  • 79. Thank You For Your Time My homepage: http://markorodriguez.com Neno/Fhat: http://neno.lanl.gov Collective Decision Making Systems: http://cdms.lanl.gov Faith in the Algorithm: http://faithinthealgorithm.net MESUR: http://www.mesur.org