Graph Databases: Trends in the Web of Data

Marko A. Rodriguez
Graph Systems Architect
http://markorodriguez.com
http://twitter.com/twarko
http://slideshare.com/slidarko

KRDB Trends in the Web of Data School, Brixen/Bressanone, Italy, September 18, 2010


Abstract
Relational databases are perhaps the most commonly used data management systems. In
relational databases, data is modeled as a collection of disparate tables. In order to unify
the data within these tables, a join operation is used. This operation becomes more expensive
as the amount of data grows. For information retrieval operations that do not make use of
extensive joins, relational databases are an excellent tool. However, when many joins are
required, the relational database model breaks down. In contrast, graph databases maintain
one single data structure: a graph. A graph contains a set of vertices (i.e. nodes, dots) and
a set of edges (i.e. links, lines). These elements make direct reference to one another, and
as such, there is no notion of a join operation. The direct references between graph elements
make the joining of data explicit within the structure of the graph. The benefit of this model
is that traversing (i.e. moving between the elements of a graph in an intelligent, direct
manner) is very efficient and yields a style of problem-solving called the graph traversal
pattern. This session will discuss graph databases, the graph traversal programming pattern,
and their use in solving real-world problems.

Outline

• Graph Structures, Algorithms, and Algebras

• Graph Databases and the Property Graph

• TinkerPop Open-Source Graph Product Suite

• Real-Time, Real-World Use Cases for Graphs

[Figure: Difficulty Chart. Difficulty (y-axis) plotted against time (x-axis), labeled with the talk's topics: graphs, algebra, databases, indices, data models, software, algorithms, real-world, conclusion.]

A Vertex

There once was a vertex i ∈ V named tenderlove.

Two Vertices

And then came along another vertex j ∈ V named sixwing.
Thus, i, j ∈ V .

A Directed Edge

Our tenderlove extended a relationship to sixwing. Thus,
(i, j) ∈ E.

The Single-Relational, Directed Graph

More vertices join, create edges and, in turn, the graph grows...

The Single-Relational, Directed Graph as a Matrix

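A single-relational graph defined as

$$G = (V, E \subseteq (V \times V))$$

can be represented as the adjacency matrix $A \in \{0, 1\}^{n \times n}$, where $n = |V|$ and

$$A_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise.} \end{cases}$$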

The Single-Relational, Directed Graph as a Matrix

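$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

[Figure: the graph G (left) and its adjacency matrix A (right).]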
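To make the correspondence concrete, here is a minimal Python sketch that builds the adjacency matrix from an edge list. It assumes the vertices are numbered 1 to 4; the edge list is read off the matrix above and the function name is illustrative.

```python
def adjacency_matrix(n, edges):
    """Return the n x n {0,1} adjacency matrix for a set of directed edges.

    Vertices are assumed to be numbered 1..n; edge (i, j) sets A[i-1][j-1] = 1.
    """
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
    return a

# Edge set read off the example matrix.
E = {(1, 2), (1, 3), (2, 1), (2, 4), (3, 1), (4, 2)}
A = adjacency_matrix(4, E)
for row in A:
    print(*row)
```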

The Single-Relational, Directed Graph

• All vertices are homogeneous in meaning: all vertices denote the same
type of object (e.g. people, webpages, etc.).1

• All edges are homogeneous in meaning: all edges denote the same type
of relationship (e.g. friendship, works with, etc.).2

1
This is not completely true. All n-partite single-relational graphs allow for the division of the vertex set
into n subsets, where $V = \bigcup_{i=1}^{n} A_i$ and $A_i \cap A_j = \emptyset$ for $i \neq j$. Thus, it is possible to implicitly type the vertices.
2
This is not completely true. There exists an injective, information-preserving function that maps any
multi-relational graph to a single-relational graph, where edge types are denoted by topological structures.
Thus, at a “higher level,” it is possible to create a heterogeneous set of relationships.
Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks,” International Journal of Applied
Mathematics and Computer Sciences, 5(1), pp. 39–42, 2009. [http://arxiv.org/abs/0804.0277]

Applications of Single-Relational Graphs

• Social: define how people interact (collaborators, friends, kin).

• Biological: define how biological components interact (protein, food
chains, gene regulation).

• Transportation: define how cities are joined by air and road routes.

• Dependency: define how software modules, data sets, functions depend
on each other.

• Technology: define the connectivity of Internet routers, web pages, etc.

• Language: define the relationships between words.

The Limitations of Single-Relational Graph Modeling

[Figure: three separate graphs: Friendship Graph, Favorite Graph, Works-For Graph.]

Unfortunately, single-relational graphs are independent of each other. This
is because G = (V, E) has only a single edge set E (i.e. a single type
of relation).

Numerous Algorithms for Single-Relational Graphs
We would like a more flexible graph modeling construct, but unfortunately,
most of our graph algorithms were designed for single-relational graphs.3

• Geodesic: diameter, radius, eccentricity, closeness, betweenness, etc.

• Spectral: random walks, PageRank, eigenvector centrality, spreading
activation, etc.

• Assortativity: scalar, categorical, hierarchical, etc.

• Others: ...4
3
For a fine book on graph analysis algorithms, please see:
Brandes, U., Erlebach, T., “Network Analysis: Methodological Foundations,” edited book, Springer, 2005.
4
One of the purposes of this presentation is to advocate for local graph analysis algorithms (i.e. priors-based,
relative) over global graph analysis algorithms. Most popular graph analysis algorithms are global in that
they require an analysis of the whole graph (or a large portion of it) to yield results. Local analysis
algorithms depend only on subgraphs of the whole and, in effect, can boast faster running times.
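A minimal sketch of the global/local contrast in footnote 4, assuming a small directed graph stored as Python adjacency lists (the graph and function names are illustrative, not taken from the cited book). Computing the diameter requires a breadth-first search from every vertex, whereas a depth-limited search only touches the neighborhood of one vertex.

```python
from collections import deque

def bfs_distances(adj, source, max_depth=None):
    """Return {vertex: hop distance} for vertices reachable from source.

    With max_depth set, vertices at that depth are not expanded, so only the
    local neighborhood of source is visited.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if max_depth is not None and dist[v] >= max_depth:
            continue
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentricity(adj, v):
    # Longest shortest path from v (assumes every vertex is reachable from v).
    return max(bfs_distances(adj, v).values())

def diameter(adj):
    # Global: one full BFS per vertex, so the whole graph is touched |V| times.
    return max(eccentricity(adj, v) for v in adj)

# The example graph from the adjacency-matrix slide, as adjacency lists.
adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(diameter(adj))                        # global analysis
print(bfs_distances(adj, 1, max_depth=1))   # local analysis around vertex 1
```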

How do we solve this?

A multi-relational graph and a path algebra.

A Directed, Labeled Edge

friend

Let's specify the type of relationship that exists between
tenderlove and sixwing. Thus, $(i, j) \in E_{\text{friend}}$.

Growing a Multi-Relational Graph

friend

friend

Let's make the friendship relationship symmetric. Thus,
$(j, i) \in E_{\text{friend}}$.


friend friend

friend

friend

Let's add marko to the mix: $k \in V$. This graph is still
single-relational: there is only one type of relation.


friend friend favorite

friend

friend

Let's add an edge $(i, l) \in E_{\text{favorite}}$. Now there are multiple types of
relationships: $E_{\text{friend}}$ and $E_{\text{favorite}}$ (2 edge sets).

The Multi-Relational, Directed Graph

• At this point, there is a multi-relational, directed graph: $G = (V, E)$,
where $E = (E_1, E_2, \ldots, E_m)$ and each $E_k \subseteq (V \times V)$.5

• Vertices can denote different types of objects (e.g. people, places).6

• Edges can denote different types of relationships (e.g. friend, favorite).7

5
Another representation is $G \subseteq (V \times \Omega \times V)$, where $\Omega \subseteq \Sigma^*$ is the set of legal edge labels.
6
Vertex types can be determined by the domain and range specification of the respective edge
relation/label/predicate or, alternatively, by means of an explicit typing relation such as ⟨a, type, b⟩.
7
Edge types are determined by the label that accompanies the edge.
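The triple form in footnote 5 is easy to sketch in Python. This is a minimal sketch assuming vertices are plain strings and the graph is a set of (tail, label, head) tuples; the vertex name "l" is a placeholder for the unnamed vertex l from the running example.

```python
# G ⊆ V × Ω × V: the running example as (tail, label, head) triples.
G = {
    ("tenderlove", "friend", "sixwing"),
    ("sixwing", "friend", "tenderlove"),
    ("tenderlove", "favorite", "l"),  # "l" is a placeholder vertex name
}

def edge_set(graph, label):
    """Return E_label ⊆ V × V, the (i, j) pairs joined by an edge with this label."""
    return {(i, j) for (i, lab, j) in graph if lab == label}

def labels(graph):
    """Return the edge labels actually used in the graph (a subset of Ω)."""
    return {lab for (_, lab, _) in graph}

def vertices(graph):
    """Return V as the set of vertices that appear in some edge."""
    return {i for (i, _, _) in graph} | {j for (_, _, j) in graph}

print(sorted(edge_set(G, "friend")))
print(sorted(labels(G)))
```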

The Multi-Relational, Directed RDF Graph

• This is the data model of the Web of Data—the RDF data model.

• The RDF data model's vertex set is split into URIs ($U$), literals ($L$), and
blank/anonymous nodes ($B$), such that $G \subseteq ((U \cup B) \times U \times (U \cup B \cup L))$.8

8
Named graphs are a popular extension to the RDF data model. There are various serializations, such as
TriX and TriG. However, for the sake of brevity, this presentation will not discuss named graphs.

The Multi-Relational, Directed Graph as a Tensor

A three-way tensor can be used to represent a multi-relational graph. If

$$G = (V, E = \{E_1, E_2, \ldots, E_m\}), \quad E_k \subseteq (V \times V),$$

is a multi-relational graph, then $A \in \{0, 1\}^{n \times n \times m}$ and

$$A^k_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E_k, \; 1 \le k \le m \\ 0 & \text{otherwise.} \end{cases}$$

Thus, each edge set in $E$ represents an adjacency matrix, and the
combination of $m$ adjacency matrices forms a 3-way tensor.
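A minimal sketch of the tensor construction, assuming the triple representation of footnote 5 and a fixed ordering of vertices and labels (both orderings are arbitrary choices made here). Python lists are 0-indexed, so k runs from 0 to m - 1 rather than 1 to m.

```python
def tensor(vertex_order, label_order, triples):
    """Build A ∈ {0,1}^{n×n×m} as a list of m adjacency matrices.

    A[k][i][j] == 1 iff (vertex_order[i], label_order[k], vertex_order[j])
    is one of the triples.
    """
    index = {v: i for i, v in enumerate(vertex_order)}
    n = len(vertex_order)
    A = [[[0] * n for _ in range(n)] for _ in label_order]
    for k, label in enumerate(label_order):
        for (s, p, o) in triples:
            if p == label:
                A[k][index[s]][index[o]] = 1
    return A

triples = {
    ("tenderlove", "friend", "sixwing"),
    ("sixwing", "friend", "tenderlove"),
    ("tenderlove", "favorite", "l"),  # "l" is a placeholder vertex name
}
V = ["tenderlove", "sixwing", "marko", "l"]
A = tensor(V, ["friend", "favorite"], triples)
for k, name in enumerate(["friend", "favorite"]):
    print(name, A[k])
```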

The Multi-Relational, Directed Graph as a Tensor

[Figure: the example graph G and its tensor A, drawn as a stack of adjacency matrices, one per edge label (friend, favorite, and others).]

Multi-Relational Graph Algorithms

“Can we evaluate single-relational graph analysis algorithms
on a multi-relational graph?”

The Meaning of Edge Meanings

[Figure: tenderlove and marko, each with incoming edges labeled loves and hates.]

• Multi-relationally: tenderlove is more liked than marko.

• Single-relationally: tenderlove and marko simply have the same
in-degree.
Given, say, degree centrality, tenderlove and marko are equal, as
they have the same number of relationships. The edge labels do not
affect the output of the degree-centrality algorithm.
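A minimal sketch of the point above. The incoming edges are made up (they are not the counts in the figure), and the weights of +1 for loves and -1 for hates are an assumption chosen only to show that a label-aware score can separate two vertices that plain in-degree treats as equal.

```python
from collections import Counter

# Hypothetical incoming edges (source, label, target). These counts are
# made up for illustration; they are not read off the figure.
EDGES = [
    ("a", "loves", "tenderlove"),
    ("b", "loves", "tenderlove"),
    ("c", "hates", "tenderlove"),
    ("a", "hates", "marko"),
    ("b", "hates", "marko"),
    ("c", "loves", "marko"),
]

def in_degree(edges):
    """Single-relational view: count incoming edges and ignore labels."""
    return Counter(target for _, _, target in edges)

def labeled_score(edges, weights):
    """Multi-relational view: sum a per-label weight over incoming edges."""
    score = Counter()
    for _, label, target in edges:
        score[target] += weights.get(label, 0)
    return score

degree = in_degree(EDGES)
liked = labeled_score(EDGES, {"loves": 1, "hates": -1})  # assumed weights
print(degree["tenderlove"], degree["marko"])
print(liked["tenderlove"], liked["marko"])
```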

What Do You Mean By “Central?”
[Figure: a multi-relational graph of people connected by friend and favorite edges, together with a question (“What is your favorite bookstore?”) and its answers, connected by question_by, answer_by, and answer_for edges.]

Let's focus specifically on centrality. What is the most central vertex in a
multi-relational graph? Who is the most central friend in the graph: by friendship, by
question answering, by favorites, etc.?

Primary Eigenvector

“What does the primary eigenvector of a multi-relational
graph mean?”9,10,11

9
We will use the primary eigenvector for the following argument. Note that the same argument applies
to all known single-relational graph algorithms (e.g. geodesic, spectral, community detection, etc.).
10
Technical details, such as outgoing edge probability distributions and the irreducibility of the graph,
are left aside.
11
The popular PageRank vector is defined as the primary eigenvector of a low-probability, fully connected
graph combined with the original graph (i.e. both graphs maintain the same V).

Primary Eigenvector: Ignoring Edge Labels

• If $\pi = B\pi$, where $B \in \mathbb{N}^{|V| \times |V|}$ is the adjacency matrix formed by
merging the edge sets in $E$, then edge labels are ignored: all edges are
treated equally.

• In this “ignoring labels” model, there is only one primary eigenvector for
the graph, and thus only one definition of centrality.

• With a heterogeneous set of vertices connected by a heterogeneous set of
edges, what does this type of centrality mean?

Primary Eigenvector: Isolating Subgraphs
• Are there other primary eigenvectors in the multi-relational graph?

• You can ignore certain edge sets and calculate the primary eigenvector
(e.g. pull out the single-relational “friend” graph):
$\pi = A_{\text{friend}}\,\pi$, where $A_{\text{friend}} \in \{0, 1\}^{|V| \times |V|}$ is the adjacency matrix
formed by the edge set $E_{\text{friend}}$.

• Thus, you can isolate subgraphs (i.e. adjacency matrices) of the
multi-relational graph and calculate the primary eigenvector for those
subgraphs.

• In this “isolation” model, there are m definitions of centrality: one for
each isolated subgraph.12
12
Remember, $A \in \{0, 1\}^{n \times n \times m}$.
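Both models can be sketched with plain power iteration, following the slides' convention $\pi = B\pi$. This is a minimal sketch on a hypothetical four-vertex graph (vertices 0 to 3): the friend edges are symmetric, and two extra favorite edges out of vertex 3 are assumed. It sets aside the irreducibility details mentioned in footnote 10; the example graph is chosen so that plain iteration converges.

```python
def power_iteration(M, iterations=200):
    """Approximate the primary eigenvector of a nonnegative square matrix M.

    Repeats π ← Mπ and rescales so the entries sum to 1. Convergence assumes
    M is irreducible and aperiodic.
    """
    size = len(M)
    pi = [1.0 / size] * size
    for _ in range(iterations):
        nxt = [sum(M[i][j] * pi[j] for j in range(size)) for i in range(size)]
        total = sum(nxt)
        pi = [x / total for x in nxt]
    return pi

def merge(*matrices):
    """B = A^1 + ... + A^m: the "ignoring labels" adjacency matrix."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(size)] for i in range(size)]

# Hypothetical tensor slices over vertices 0..3.
A_friend = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
A_favorite = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

pi_merged = power_iteration(merge(A_friend, A_favorite))  # one centrality
pi_friend = power_iteration(A_friend)                      # "isolation" model
print([round(x, 3) for x in pi_merged])
print([round(x, 3) for x in pi_friend])
```

Vertex 3 ranks last in the isolated friend graph but ties for first once its favorite edges are merged in, which is exactly the ambiguity the slide asks about.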

Primary Eigenvector: Turing Completeness
• What about using paths through the graph—not simply explicit one-step
edges?

• What about determining centrality for a relation that is not explicit in $E$
(i.e. that is not some $A^k \in A$)? In general, what about $\pi = X\pi$, where $X$ is a derived
adjacency matrix of the multi-relational graph?
For example, if I know who everyone's friends are, then I know (i.e. can
infer, derive, compute) who everyone's friends-of-a-friend (FOAF) are.
What about the primary eigenvector of the derived FOAF graph?

• In the end, you want a Turing-complete framework: you want complete
control (universal computability) over how $\pi$ moves through the
multi-relational graph structure.13
13
These ideas are expounded upon at great length throughout this presentation.

A Path Algebra for Evaluating
Single-Relational Algorithms on Multi-Relational Graphs
• There exists a multi-relational graph algebra for mapping single-relational
graph analysis algorithms to the multi-relational domain.14

• The algebra works on a tensor representation of a multi-relational graph.

• In this framework and given the running example, there are as many
primary eigenvectors as there are abstract path deﬁnitions.
14
* Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network
Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, doi:10.1016/j.joi.2009.06.004, 2009.
[http://arxiv.org/abs/0806.2274]
* Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems,
21(7), pp. 727–739, doi:10.1016/j.knosys.2008.03.030, 2008. [http://arxiv.org/abs/0803.4355]
* Rodriguez, M.A., Watkins, J., “Grammar-Based Geodesics in Semantic Networks,” Knowledge-Based
Systems, in press, doi:10.1016/j.knosys.2010.05.009, 2010.

The Operations of the Multi-Relational Path Algebra

• $A \cdot B$: ordinary matrix multiplication determines the number of $(A, B)$-
paths between vertices.
• $A^\top$: matrix transpose inverts path directionality.
• $A \circ B$: Hadamard (entry-wise) multiplication applies a filter to selectively
exclude paths.
• $n(A)$: not generates the complement of a $\{0, 1\}^{n \times n}$ matrix.
• $c(A)$: clip generates a $\{0, 1\}^{n \times n}$ matrix from an $\mathbb{R}_+^{n \times n}$ matrix.
• $v^{\pm}(A)$: vertex generates a $\{0, 1\}^{n \times n}$ matrix from an $\mathbb{R}_+^{n \times n}$ matrix, where
only certain rows or columns contain non-zero values.
• $xA$: scalar multiplication weights the entries of a matrix.
• $A + B$: matrix addition merges paths.

Primary Eigenvectors in a Multi-Relational Graph
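• Friend: $A_{\text{friend}}\,\pi$
• FOAF: $(A_{\text{friend}} \cdot A_{\text{friend}})\,\pi \equiv A_{\text{friend}}^2\,\pi$
• FOAF (no self): $(A_{\text{friend}}^2 \circ n(I))\,\pi$15
• FOAF (no friends nor self): $(A_{\text{friend}}^2 \circ n(A_{\text{friend}}) \circ n(I))\,\pi$
• Co-Worker: $((A_{\text{works\_at}} \cdot A_{\text{works\_at}}^\top) \circ n(I))\,\pi$
• Friend-or-CoWorker: $(0.65\,A_{\text{friend}} + 0.35\,((A_{\text{works\_at}} \cdot A_{\text{works\_at}}^\top) \circ n(I)))\,\pi$
• ...and more.16
15
$I \in \{0, 1\}^{|V| \times |V|} : I_{i,i} = 1$ (the identity matrix).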
16
Note, again, that the examples are with respect to determining the primary eigenvector of the derived
adjacency matrix. The same argument holds for all other single-relational graph analysis algorithms. In
general, the path algebra provides a means of creating “higher-order” (i.e. semantically-rich) single-relational
graphs from a single multi-relational graph. Thus, these derived matrices can be subjected to standard
single-relational graph analysis algorithms.

Deriving “Semantically Rich” Adjacency Matrices

[Figure: deriving a new adjacency matrix from the tensor A. The friend matrix is multiplied by itself and filtered with n(I), yielding the friend-of-a-friend (no self) matrix $A_{\text{friend}}^2 \circ n(I)$, which is added (∪) to A alongside the friend and favorite matrices.]

Use the multi-relational graph to generate explicit edges that were implicitly defined as
paths. Those new explicit edges can then be memoized17 and reused (a time vs. space
tradeoff), aka path reuse.
17
Memoization Wikipedia entry: http://en.wikipedia.org/wiki/Memoization.
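Path reuse can be sketched with Python's `functools.lru_cache`: the first request for a vertex's friends-of-a-friend walks the two-step path, and later requests return the stored result. This is a minimal sketch assuming a hypothetical friend graph (a triangle over 0, 1, 2 plus the edge 2-3); a graph database would more likely write the derived edges back into the graph itself.

```python
from functools import lru_cache

# Hypothetical symmetric friend edges (tail, head).
FRIEND = frozenset({(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (2, 3), (3, 2)})

def friends(v):
    return {j for (i, j) in FRIEND if i == v}

@lru_cache(maxsize=None)
def foaf(v):
    """Friends-of-a-friend of v, excluding v itself and v's direct friends.

    The result is memoized, so the implicit two-step path is computed once
    and then reused like an explicit edge set (trading space for time).
    """
    direct = friends(v)
    two_step = {w for u in direct for w in friends(u)}
    return frozenset(two_step - direct - {v})

print(sorted(foaf(3)))
print(foaf.cache_info())
```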

Benefits, Drawbacks, and Future of the Path Algebra
• Benefit: Provides a set of theorems for deriving equivalences and thus,
provides the foundation for graph traversal engine optimizers.18 It serves a
similar purpose to that of the relational algebra for relational databases.19

• Drawback: The algebra is represented in matrix form and thus,
operationally, works globally over the graph.20

• Future: A non-matrix-based, ring theoretic model of graph traversal
that supports +, −, and · on individual vertices and edges. The Gremlin
[http://gremlin.tinkerpop.com] graph traversal engine presented
later provides an implementation ahead of a fully developed theory.
18
Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis
Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, 2009. [http://arxiv.org/abs/0806.2274]
19
Codd, E.F., “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM,
13(6), pp. 377–387, doi:10.1145/362384.362685, 1970.
20
It is possible to represent local traversals using vertex filters at the expense of clumsy notation.

The Simplicity of a Graph

• A graph is a simple data structure.

• A graph states that something is related to something else (the foundation
of any other data structure).21

• It is possible to model a graph in various types of databases.22
Relational database: MySQL, Oracle, PostgreSQL
JSON document database: MongoDB, CouchDB
XML document database: MarkLogic, eXist-db
etc.
21
A graph can be used to represent other data structures. This point becomes convenient when looking
beyond using graphs for typical, real-world domain models (e.g. friends, favorites, etc.), and seeing their
applicability in other areas such as modeling code (e.g. http://arxiv.org/abs/0802.3492), indices, etc.
22
For the sake of diagram clarity, the examples to follow are with respect to a single-relational, directed
graph. Note that it is possible to model multi-relational graphs in these types of database as well.

Representing a Graph in a Relational Database

outV | inV
-----+-----
A    | B
A    | C
C    | D
D    | A
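The edge table can be queried with Python's built-in `sqlite3` module. In this minimal sketch (the table name `edges` and the index name are assumptions), every additional step of a traversal adds another self-join, and each step is resolved through an index lookup rather than a direct reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (outV TEXT, inV TEXT)")
conn.execute("CREATE INDEX edges_outV ON edges (outV)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("A", "B"), ("A", "C"), ("C", "D"), ("D", "A")])

# One step from A: a single (indexed) lookup.
one_step = [row[0] for row in conn.execute(
    "SELECT inV FROM edges WHERE outV = ? ORDER BY inV", ("A",))]

# Two steps from A: the second step is a self-join on the same table.
two_steps = [row[0] for row in conn.execute(
    """SELECT e2.inV
       FROM edges AS e1 JOIN edges AS e2 ON e1.inV = e2.outV
       WHERE e1.outV = ?
       ORDER BY e2.inV""", ("A",))]

print(one_step, two_steps)
```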

Representing a Graph in a JSON Database

{
  "A": { "outE": ["B", "C"] },
  "B": { "outE": [] },
  "C": { "outE": ["D"] },
  "D": { "outE": ["A"] }
}

Representing a Graph in an XML Database

<graphml>
  <graph>
    <node id="A"/>
    <node id="B"/>
    <node id="C"/>
    <node id="D"/>
    <edge source="A" target="B"/>
    <edge source="A" target="C"/>
    <edge source="C" target="D"/>
    <edge source="D" target="A"/>
  </graph>
</graphml>
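A minimal sketch of reading the graph on this slide back out of GraphML with Python's standard `xml.etree.ElementTree`. It assumes the document carries no XML namespace; a real GraphML file normally declares `http://graphml.graphdrawing.org/xmlns`, in which case the tag names passed to `iter()` must be namespace-qualified.

```python
import xml.etree.ElementTree as ET

GRAPHML = """<graphml>
  <graph>
    <node id="A"/>
    <node id="B"/>
    <node id="C"/>
    <node id="D"/>
    <edge source="A" target="B"/>
    <edge source="A" target="C"/>
    <edge source="C" target="D"/>
    <edge source="D" target="A"/>
  </graph>
</graphml>"""

def adjacency_from_graphml(text):
    """Rebuild out-adjacency lists from <node> and <edge> elements."""
    root = ET.fromstring(text)
    adj = {node.get("id"): [] for node in root.iter("node")}
    for edge in root.iter("edge"):
        adj[edge.get("source")].append(edge.get("target"))
    return adj

print(adjacency_from_graphml(GRAPHML))
```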

Defining a Graph Database

“If any database can represent a graph, then what
is a graph database?”

Defining a Graph Database

A graph database is any storage system that
provides index-free adjacency.23,24

23
There is no “official” definition of what makes a database a graph database. The one provided is my
definition (respective of the influence of my collaborators in this area). However, hopefully the following
argument will convince you that this is a necessary definition. Given that any database can model a graph,
such a definition would not provide strict enough bounds to yield a formal concept (i.e. ).
24
There is adjacency between the elements of an index, but if the index is not the primary data structure
of concern (to the developer), then there is indirect/implicit adjacency, not direct/explicit adjacency. A
graph database exposes the graph as an explicit data structure (not an implicit data structure).

Defining a Graph Database by Example

[Figure: a toy graph with vertices A, B, C, D, and E, and a Gremlin (stuntman) that will traverse it.]

Graph Databases and Index-Free Adjacency
[Figure: the toy graph with vertices A, B, C, D, and E.]

• Our gremlin is at vertex A.
• In a graph database, vertex A has direct references to its adjacent vertices.
• Constant time cost (with respect to the size of the graph) to move from A to B and C:
the cost depends only on the number of edges emanating from vertex A (local).

[Figure: the toy graph, labeled The Graph (explicit).]

Non-Graph Databases and Index-Based Adjacency

[Figure: an index with the entries A → B,C; B → E; C → D,E; D; E, shown beside the toy graph.]

• Our gremlin is at vertex A.



• In a non-graph database, the gremlin needs to look at an index to determine what
is adjacent to A.
• A $\log_2(n)$ time cost to move to B and C, where n is the number of entries in the index:
the cost depends upon the total number of vertices and edges in the database (global).


[Figure: the index, labeled The Index (explicit), beside the toy graph, labeled The Graph (implicit).]
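A minimal sketch of the two access patterns, using the toy graph's edges as read off the index figure (A → B, C; B → E; C → D, E). The class names are hypothetical. In the first, each vertex holds direct references to its neighbors; in the second, adjacency is recovered by a binary search (Python's `bisect`) over a sorted edge index whose size grows with the whole database.

```python
import bisect

class Vertex:
    """Index-free adjacency: a vertex keeps direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.out = []  # references to adjacent Vertex objects

    def adjacent(self):
        # Cost depends only on this vertex's out-degree (local).
        return [v.name for v in self.out]

class EdgeIndex:
    """Index-based adjacency: a sorted list of (tail, head) pairs."""

    def __init__(self, edges):
        self.entries = sorted(edges)

    def adjacent(self, name):
        # Binary search costs about log2(n) comparisons, where n is the total
        # number of indexed edges (global), before scanning name's entries.
        i = bisect.bisect_left(self.entries, (name,))
        out = []
        while i < len(self.entries) and self.entries[i][0] == name:
            out.append(self.entries[i][1])
            i += 1
        return out

EDGES = [("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("C", "E")]

vertices = {name: Vertex(name) for name in "ABCDE"}
for tail, head in EDGES:
    vertices[tail].out.append(vertices[head])

index = EdgeIndex(EDGES)
print(vertices["A"].adjacent(), index.adjacent("A"))
```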

Index-Free Adjacency
• While any database can implicitly represent a graph, only a
graph database makes the graph structure explicit.25

• In a graph database, each vertex serves as a “mini index”
of its adjacent elements.26

• Thus, as the graph grows in size, the cost of a local step
remains the same.27
25
Please see http://markorodriguez.com/Blarko/Entries/2010/3/29_MySQL_vs._Neo4j_on_a_Large-Scale_Graph_Traversal.html
for some performance characteristics of graph traversals in a
relational database (MySQL) and a graph database (Neo4j).
26
Each vertex can be interpreted as a “parent node” in an index, with its children being its adjacent
elements. In this sense, traversing a graph is analogous in many ways to traversing an index, albeit the
graph is not, in general, an acyclic connected graph (tree). (A vision espoused by Craig Taverner.)
27
A graph, in many ways, is like a distributed index.

Graph Databases Do Make Use of Indices

[Figure: an Index of Vertices (by id) layered above The Graph itself.]

• There is more to the graph than the explicit graph structure.

• Indices index the vertices by their properties (e.g. ids, name, latitude).28
28
Graph databases can be used to create index structures. In fact, in the early days of Neo4j, Neo4j used
its own graph structure to index the properties of its vertices: a graph indexing a graph. This thought has been
reiterated many times by Craig Taverner, who is interested in graph databases for geo-spatial indexing/analysis.

The Pattern of a Relational Database

• In a relational database, operations are conceptualized set-
theoretically with the joining of tuple structures being the
means by which normalized/separated data is associated.

The Pattern of a Graph Database

• In a graph database, operations are conceptualized graph-
theoretically with paths over edges being the means by which
non-adjacent/separated vertices are associated.29

29
Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” ATTi and NeoTechnology Technical
Report, currently in review, 2010. [http://arxiv.org/abs/1004.1001]

What About Triple/Quad Stores?

• In a triple/quad store, operations are conceptualized set-
theoretically.
pattern matching (e.g. SPARQL): ?pattern
inferencing (e.g. RDFS, OWL): ?pattern =⇒ triples.

• In many implementations, the triple/quad store makes use
of indices that combine subjects (?s), predicates (?p), and
objects (?o).
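A minimal sketch of set-theoretic pattern matching backed by combined indices. The class is hypothetical and not modeled on any particular triple store: each triple is indexed three ways (SPO, POS, OSP), and `None` plays the role of a variable such as ?s, ?p, or ?o.

```python
from collections import defaultdict

class TripleStore:
    """Toy store that indexes each triple three ways: SPO, POS, and OSP."""

    def __init__(self, triples):
        self.spo = defaultdict(set)
        self.pos = defaultdict(set)
        self.osp = defaultdict(set)
        for s, p, o in triples:
            self.spo[s].add((p, o))
            self.pos[p].add((o, s))
            self.osp[o].add((s, p))

    def match(self, s=None, p=None, o=None):
        """Return the triples matching a pattern; None acts as ?s, ?p, or ?o."""
        if s is not None:
            candidates = {(s, pp, oo) for pp, oo in self.spo.get(s, ())}
        elif p is not None:
            candidates = {(ss, p, oo) for oo, ss in self.pos.get(p, ())}
        elif o is not None:
            candidates = {(ss, pp, o) for ss, pp in self.osp.get(o, ())}
        else:
            candidates = {(ss, pp, oo)
                          for ss, pairs in self.spo.items() for pp, oo in pairs}
        return {t for t in candidates
                if (p is None or t[1] == p) and (o is None or t[2] == o)}

store = TripleStore([
    ("marko", "friend", "tenderlove"),
    ("tenderlove", "friend", "sixwing"),
    ("tenderlove", "favorite", "Bryce Canyon"),
])
print(store.match(s="tenderlove", p="friend"))
```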

Triple/Quad Stores, Graph Theory, and the Web of Data

• The triple/quad store rides an interesting boundary between
a relational and a graph database, though it is seen more
set-theoretically. This is because, I believe, RDF/Web of Data
is not presented/taught in terms of graphs and
graph-theoretic operations.

Graph Databases and the Web of Data

• In theory and ignoring performance, index and index-free models have the
same expressivity and allow for the same manipulations. But such theory
does not determine intention and the mental ruts that any approach
ingrains.

• Can the graph traversal pattern become a staple in the Web of
Data?
Formulate SPARQL pattern matching in terms of traversing.
Formulate inference in terms of traversing.
Take advantage of graph theoretic models of data processing.

TinkerPop: Making Stuff for the Fun of It
• Open source software group started in 2008 focusing on graph data
structures, graph query engines, graph-based programming languages,
and, in general, tools and techniques for working with graphs.
[http://tinkerpop.com] [http://github.com/tinkerpop]
Current members: Marko A. Rodriguez (ATTi), Peter Neubauer
(NeoTechnology), Joshua Shinavier (Rensselaer Polytechnic Institute),
and Pavel Yaskevich (“I am no one from nowhere”).

TinkerPop Productions

• Blueprints: Data Models and their Implementations
[http://blueprints.tinkerpop.com]
• Pipes: A Data Flow Framework using Process Graphs
[http://pipes.tinkerpop.com]
• Gremlin: A Graph-Based Programming Language
[http://gremlin.tinkerpop.com]
• Rexster: A RESTful Graph Shell
[http://rexster.tinkerpop.com]
Wreckster: A Ruby API for Rexster
[http://github.com/tenderlove/wreckster]

There are other TinkerPop products (e.g. Ripple, LoPSideD, TwitLogic, etc.), but for the
purpose of this presentation, only the above will be discussed.

Blueprints: Data Models and their Implementations

Blueprints

• Blueprints is like the JDBC of the graph database community.

• Provides a Java-based interface API for the property graph data model.
Graph, Vertex, Edge, Index.

• Provides implementations of the interfaces for TinkerGraph, Neo4j,
OrientDB, Sails (e.g. AllegroSail, Neo4jSail), and soon (hopefully)
others such as InfiniteGraph, InfoGrid, Sones, and HyperGraphDB.30
30
HyperGraphDB makes use of an n-ary graph structure known as a hypergraph. Blueprints, in its current
form, only supports the more common binary graph.

Pipes: A Data Flow Framework using Process Graphs

Pipes

• A dataflow framework with support for Blueprints-based graph processing.

• Provides a collection of “pipes” (implement Iterable and Iterator)
that are connected together to form processing pipelines.
Filters: ComparisonFilterPipe, RandomFilterPipe, etc.
Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc.
Splitting/Merging: CopySplitPipe, RobinMergePipe, etc.
Logic: OrPipe, AndPipe, etc.
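The pipeline idea maps naturally onto Python generators, which, like the Iterator-based pipes above, pull one item at a time. This is a hypothetical sketch over a made-up graph: the function names echo the listed pipe types, but this is not the (Java) Pipes API.

```python
# Hypothetical graph: (tail, label, head) triples in a fixed order.
GRAPH = [
    ("marko", "friend", "tenderlove"),
    ("tenderlove", "friend", "marko"),
    ("tenderlove", "friend", "sixwing"),
    ("sixwing", "friend", "tenderlove"),
]

def vertex_edge_pipe(vertices, label):
    """Emit each outgoing edge with the given label (cf. VertexEdgePipe)."""
    for v in vertices:
        for edge in GRAPH:
            if edge[0] == v and edge[1] == label:
                yield edge

def edge_vertex_pipe(edges):
    """Emit the head vertex of each edge (cf. EdgeVertexPipe)."""
    for _, _, head in edges:
        yield head

def comparison_filter_pipe(items, excluded):
    """Drop items equal to `excluded` (cf. ComparisonFilterPipe)."""
    for item in items:
        if item != excluded:
            yield item

# friend -> friend, excluding the start vertex; nothing runs until list() pulls.
start = ["marko"]
pipeline = comparison_filter_pipe(
    edge_vertex_pipe(vertex_edge_pipe(
        edge_vertex_pipe(vertex_edge_pipe(start, "friend")), "friend")),
    "marko")
result = list(pipeline)
print(result)
```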

Gremlin: A Graph-Based Programming Language

Gremlin G = (V, E)

• A Turing-complete, graph-based programming language that compiles
Gremlin syntax down to Pipes (implements JSR 223).

• Supports various language constructs: :=, foreach, while, repeat,
if/else, function and path definitions, etc.
./outE[@label='friend']/inV
./outE[@label='friend']/inV/outE[@label='friend']/inV[g:except($_)]
g:key('name','Aaron Patterson')[0]/outE[@label='favorite']/inV/@name
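For readers without a Gremlin installation, the three expressions above can be approximated in plain Python. This is a hypothetical sketch: the toy graph, vertex ids, and helper names are made up, and it does not reproduce Gremlin's semantics in full (for example, it returns lists rather than lazy pipelines).

```python
# Hypothetical property graph: vertex id -> properties, plus labeled edges.
VERTICES = {
    1: {"name": "Aaron Patterson"},
    2: {"name": "sixwing"},
    3: {"name": "marko"},
    4: {"name": "Bryce Canyon"},
}
EDGES = [  # (out vertex id, label, in vertex id)
    (1, "friend", 2),
    (1, "friend", 3),
    (2, "friend", 1),
    (3, "friend", 2),
    (1, "favorite", 4),
]

def out_label_in(ids, label):
    """Roughly ./outE[@label=label]/inV applied to every vertex id in ids."""
    return [head for v in ids for (tail, lab, head) in EDGES
            if tail == v and lab == label]

def key(prop, value):
    """Roughly g:key(prop, value): ids of vertices whose property equals value."""
    return [v for v, props in VERTICES.items() if props.get(prop) == value]

start = 1
friends = out_label_in([start], "friend")
foaf = [v for v in out_label_in(friends, "friend") if v != start]  # like g:except
favorites = [VERTICES[v]["name"]
             for v in out_label_in(key("name", "Aaron Patterson")[:1], "favorite")]
print(friends, foaf, favorites)
```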

Rexster: A RESTful Graph Shell

reXster
• Allows Blueprints graphs to be exposed through a RESTful API (HTTP).

• Supports stored traversals written in raw Pipes or Gremlin.

• Supports ad hoc traversals represented in Gremlin.

• Provides “helper classes” for performing search-, score-, and rank-based
traversal algorithms—in concert, support for recommendation.

• Aaron Patterson (ATTi) maintains the Ruby connector Wreckster.

Typical TinkerPop Graph Stack
[Figure: a client request (GET http://{host}/{resource}) served by the TinkerPop stack over a graph backend such as Neo4j, NativeStore, or TinkerGraph.]

Using Graphs in Real-Time Systems
• Most popular graph algorithms require global graph analysis.
Such algorithms compute a score, a vector, etc. given the structure
of the whole graph. Moreover, many of these algorithms have large
running times: $O(|V| + |E|)$, $O(|V| \log |V|)$, $O(|V|^2)$, etc.

• Many real-world situations can make use of local graph analysis.31
Search for x starting from y.
Score x given its local neighborhood.
Rank x relative to y.
Recommend vertices to user x.
31
Many web applications are “ego-centric” in that they are with respect to a particular user (the user
logged in). In such scenarios, local graph analysis algorithms are not only prudent to use, but also beneficial
in that they are faster than global graph analysis algorithms. Many of the local analysis algorithms discussed
run in the sub-second range (for graphs with “natural” statistics).

Applications of Graph Databases and Traversal Engines:
Searching, Scoring, and Ranking
• Searching: given a multi-set of vertices (an element of $\hat{P}(V)$) and a path
description ($\Psi$), return the vertices at the end of that path.32
$$\hat{P}(V) \times \Psi \rightarrow \hat{P}(V)$$

• Scoring: given some vertices and a path description, return a score.
$$\hat{P}(V) \times \Psi \rightarrow \mathbb{R}$$

• Ranking: given some vertices and a path description, return a map of
scored vertices.
$$\hat{P}(V) \times \Psi \rightarrow (V \times \mathbb{R})$$
32
Use cases need not be with respect to vertices only. Edges can be searched, scored, and ranked as well.
However, in order to express the ideas as simply as possible, all discussion is with respect to vertices.
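A minimal sketch of the three signatures, assuming a path description Ψ is simply a list of edge labels and a multi-set is a `collections.Counter`. The graph is hypothetical, and defining a score as the number of completed paths is only one possible choice.

```python
from collections import Counter

# Hypothetical adjacency lists, keyed by edge label.
GRAPH = {
    "friend": {
        "marko": ["tenderlove", "sixwing"],
        "tenderlove": ["sixwing", "charlie"],
        "sixwing": ["charlie"],
    },
}

def search(start, path):
    """P̂(V) × Ψ → P̂(V): walk a path description (here, a list of edge labels).

    The multi-set is a Counter, so a vertex reached by k paths has count k.
    """
    current = Counter(start)
    for label in path:
        nxt = Counter()
        for v, count in current.items():
            for w in GRAPH.get(label, {}).get(v, []):
                nxt[w] += count
        current = nxt
    return current

def score(start, path):
    """P̂(V) × Ψ → R: here, simply the number of paths that complete."""
    return sum(search(start, path).values())

def rank(start, path):
    """P̂(V) × Ψ → (V × R): vertices ordered by how many paths reach them."""
    return sorted(search(start, path).items(), key=lambda kv: (-kv[1], kv[0]))

print(rank(["marko"], ["friend", "friend"]))
```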

Applications of Graph Databases and Traversal Engines:
Recommendation
• Recommendation: searching, scoring, and ranking can all be used as
components of a recommendation. Thus, recommendation is founded on
these more basic ideas.
Recommendation aids the user by allowing them to make “jumps” through
the data. Items that are not explicitly connected are connected implicitly through
recommendation (through some abstract path $\Psi$).

• The act of recommending can be seen as an attempt to increase the
density of the graph around a user’s vertex. For example, recommending
user $i \in V$ places to visit $U \subset V$ will hopefully lead to edges of the form
$\langle i, \text{visited}, j \rangle : \forall j \in U$.33
33
A standard measure of recommendation quality is how well it predicts the user's future behavior;
that is, whether it predicts a future edge.
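A minimal sketch of path-based recommendation, assuming a hypothetical set of favorite edges (all edges, and the place "Zion", are made up). The abstract path Ψ here is favorite → favorite⁻¹ → favorite: from the user to a place, back to other users who favor it, and on to their other favorites. Places the user already favors are skipped, since recommending them would not add a new edge.

```python
from collections import Counter

# Hypothetical favorite edges: user -> set of favorite places.
FAVORITES = {
    "marko": {"Bryce Canyon", "Santa Fe"},
    "tenderlove": {"Bryce Canyon", "West Hollywood"},
    "sixwing": {"Bryce Canyon", "West Hollywood", "Zion"},
    "charlie": {"Santa Fe"},
}

def recommend(user, k=3):
    """Rank places reached by favorite -> favorite^-1 -> favorite paths.

    Each path user -> place -> other user -> new place adds 1 to new place;
    places the user already favors are skipped.
    """
    mine = FAVORITES[user]
    counts = Counter()
    for place in mine:
        for other, theirs in FAVORITES.items():
            if other != user and place in theirs:
                for new_place in theirs - mine:
                    counts[new_place] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

print(recommend("marko"))
```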

There Is More Than “People Who Like X Also Like Y .”
• A system need not be limited to one type of recommendation. With graph-based
methods, there are as many recommendations as there are abstract paths.
• Use recommendation to aid the user in solving problems (i.e. computationally
derive solutions that your data set is primed for). Examples below are with respect
to problem-solving in the scholarly community.34
Recommend articles to read. (articles)
Recommend collaborators to work on an idea/article with. (people)
Recommend a venue to submit the article to. (venues)
Recommend to an editor referees to review the article. (people)35
Recommend scholars to talk to and concepts to talk to them about at the venue.
(people and tags)
34
Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the
Scholarly Communication Process,” KRS-2009-02, 2009. [http://arxiv.org/abs/0905.1594]
35
Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information
and Knowledge Management (CIKM), pp. 319–328, doi:10.1145/1458082.1458127, 2008.
[http://arxiv.org/abs/cs/0605112]

Real-Time, Domain-Specific, Graph-Based,
Problem-Solving Engine

[Figure: a Library of Path/Traversal Expressions ($\Psi_1, \Psi_2, \ldots, \Psi_n$) + a Graph Data Set = a Real-Time, Domain-Specific, Graph-Based Problem-Solving Engine.]

Your domain model (i.e. graph dataset) determines what traversals you can design,
develop, and deploy. Together, these determine which types of problems you can solve
automatically/computationally for yourself and your users.

Applicable in Various, Seemingly Diverse Areas
• Applications to a techno-social government (i.e. collective decision making systems).36

[Figure: plots from the cited work comparing direct democracy and dynamically distributed democracy, showing the proportion of correct decisions and the proportion of error against the percentage of active citizens. Caption fragment: “The relationship between k and e_vote for direct democracy (gray line) and dynamically distributed democracy (black line). The plot provides the proportion of identical, correct decisions over a simulation that was run with 1000 artificially generated networks composed of 100 citizens each.”]

36
* Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision Making Systems
Perspective,” First Monday, 14(8), 2009. [http://arxiv.org/abs/0901.3929]
* Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii
International Conference on Systems Science (HICSS), pp. 39–49, 2007. [http://arxiv.org/abs/cs/0609034]
* Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the North
American Association for Computational Social and Organizational Science Conference, 2004. [http://arxiv.org/abs/cs/0412047]

A detour into the property graph
data model...

Property Graphs and Graph Databases

• Most graph databases support a graph data model known as a property
graph.

• A property graph is a directed, attributed, multi-relational graph.
In other words, vertices and edges are equipped with a collection of
key/value pairs.37

37
Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” Bulletin of the American Society
for Information Science and Technology, American Society for Information Science and Technology, 2010.

From a Multi-Relational Graph...

[Figure: vertices joined by friend edges.]

...to a Property Graph

[Figure: the same graph with key/value pairs attached to its elements, e.g. name=marko, location=Santa Fe, gender=male; name=sixwing, location=West Hollywood, gender=male; lat=11111, long=22222; and created_at timestamps (123456, 234567).]

Why the Property Graph Model?
• Standard single-relational graphs do not provide enough modeling flexibility for use in
real-world situations.38
• Multi-relational graphs do, and the Web of Data (RDF) world demonstrates this in
practice.

• Property graphs are perhaps more practical because not every datum needs to be
“related” (e.g. age, name, etc.). Thus, the split between edges and key/value pairs is a
convenient dichotomy.39
• Property graphs provide finer granularity on the meaning of an edge, as the key/values
of an edge add information beyond the edge label.

38
This is not completely true: researchers use the single-relational graph all the time. However, in most
data-rich applications, it is limiting to work with a single edge type and a homogeneous population of vertices.
39
RDF has a similar argument in that literals can only be the object of a triple. However, in practice, when
represented in a graph database, each literal is a single vertex and is thus traversable
like any other vertex.

Graph Type Morphisms
[Figure: morphisms between graph types: property graph, weighted graph, labeled graph, semantic graph, RDF graph, directed graph, multi-graph, undirected graph, and simple graph. The transformations shown are: add weight attribute, remove attributes, remove edge labels, make labels URIs, remove directionality, and remove loops, directionality, and multiple edges; “no op” arrows join equivalent types.]
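A few of these morphisms can be sketched as plain functions over an edge list. This is my own reading of the figure, composed along one plausible path (property graph, then labeled graph, then directed graph, then simple graph), plus "make labels URIs"; the function names, the sample edges, and the `http://example.org/` base are all hypothetical.

```python
# Property graph edges as (out_id, label, in_id, properties).
# Hypothetical data, chosen to include a parallel edge and a self-loop.
property_edges = [
    (1, "friend", 2, {"created_at": 123456}),
    (2, "friend", 1, {"created_at": 234567}),
    (1, "friend", 2, {}),      # parallel edge: same endpoints and label
    (3, "knows", 3, {}),       # self-loop
]

def remove_attributes(edges):
    """property graph -> labeled (multi-relational) graph: drop key/value pairs."""
    return [(o, label, i) for o, label, i, _ in edges]

def make_labels_uris(labeled, base="http://example.org/"):
    """labeled graph -> RDF-style graph: vertices and labels become URIs."""
    return [(f"{base}{o}", f"{base}{label}", f"{base}{i}") for o, label, i in labeled]

def remove_edge_labels(labeled):
    """labeled graph -> directed multi-graph: keep only (tail, head)."""
    return [(o, i) for o, _, i in labeled]

def to_simple(directed):
    """directed graph -> simple graph: drop loops, directionality, and multiple edges."""
    return {frozenset((o, i)) for o, i in directed if o != i}

labeled = remove_attributes(property_edges)
directed = remove_edge_labels(labeled)
simple = to_simple(directed)
```

Each step only discards information, which is why the property graph can stand in for every type below it.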

Toy Graph Dataset
[Figure: the toy graph, six vertices (1 to 6) connected by friend and favorite edges. Properties shown include name=marko, location=Santa Fe, gender=male; name=sixwing, location=West Hollywood, gender=male; name=charlie; name=Bryce Canyon; lat=11111, long=22222; and the edge properties created_at=123456 and created_at=234567.]

We will use the toy graph above to demonstrate Gremlin and introduce its syntax.

Dataset Schema in Neo4j
Neo4j [http://neo4j.org] is a “schema-less” database. However, ultimately, data is
represented according to some schema whether that schema be explicit in the database, in
the code interacting with the database, or in the developer’s head.40 Please note the
schema diagrammed below is a non-standard convention.41

Person: name=string, location=string, gender=string, type=Person
Place: name=string, lat=double, long=double, type=Place

Person --friend--> Person
Person --favorite--> Place

40
A better term for “schema-less” might have been “dynamic schema.”
41
For expressive, standardized graph-based schema languages, refer to RDFS [http://www.w3.org/TR/
rdf-schema/] and OWL [http://www.w3.org/TR/owl-features/] of the Web of Data community.

Dataset Schema in MySQL
CREATE TABLE friend (
outV INT NOT NULL,
inV INT NOT NULL);
CREATE INDEX friend_outV_index USING BTREE ON friend (outV);
CREATE INDEX friend_inV_index USING BTREE ON friend (inV);

CREATE TABLE favorite (
outV INT NOT NULL,
inV INT NOT NULL);
CREATE INDEX favorite_outV_index USING BTREE ON favorite (outV);
CREATE INDEX favorite_inV_index USING BTREE ON favorite (inV);

CREATE TABLE metadata (
vertex INT NOT NULL,
_key VARCHAR(100) NOT NULL,
_value VARCHAR(100),
PRIMARY KEY (vertex, _key));
CREATE INDEX metadata_vertex_index USING BTREE ON metadata (vertex);
CREATE INDEX metadata_key_index USING BTREE ON metadata (_key);
CREATE INDEX metadata_value_index USING BTREE ON metadata (_value);
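To make the relational cost concrete, here is a sketch of the same schema in SQLite (via Python's standard `sqlite3` module, swapped in for MySQL; SQLite does not accept `USING BTREE`, so that clause is dropped). The rows are a hypothetical reconstruction of the toy graph, not data from the original. Each hop and each property lookup becomes another join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend (outV INTEGER NOT NULL, inV INTEGER NOT NULL);
CREATE INDEX friend_outV_index ON friend (outV);
CREATE INDEX friend_inV_index ON friend (inV);
CREATE TABLE favorite (outV INTEGER NOT NULL, inV INTEGER NOT NULL);
CREATE INDEX favorite_outV_index ON favorite (outV);
CREATE INDEX favorite_inV_index ON favorite (inV);
CREATE TABLE metadata (
  vertex INTEGER NOT NULL,
  _key   VARCHAR(100) NOT NULL,
  _value VARCHAR(100),
  PRIMARY KEY (vertex, _key));
""")
# Hypothetical toy-graph rows.
conn.executemany("INSERT INTO friend VALUES (?, ?)", [(1, 2), (1, 3), (2, 1), (3, 1), (3, 5)])
conn.executemany("INSERT INTO favorite VALUES (?, ?)", [(1, 4), (2, 6), (3, 6)])
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", [
    (2, "name", "sixwing"), (2, "location", "West Hollywood"),
    (3, "name", "marko"), (3, "location", "Santa Fe"),
    (5, "name", "charlie"), (6, "name", "Bryce Canyon")])

# Names of vertex 1's friends: one join from friend into metadata.
friend_names = [row[0] for row in conn.execute("""
    SELECT m._value FROM friend f
    JOIN metadata m ON m.vertex = f.inV AND m._key = 'name'
    WHERE f.outV = 1 ORDER BY f.inV""")]

# Friends living in Santa Fe: a second metadata join for the location property.
santa_fe = [row[0] for row in conn.execute("""
    SELECT n._value FROM friend f
    JOIN metadata n ON n.vertex = f.inV AND n._key = 'name'
    JOIN metadata l ON l.vertex = f.inV AND l._key = 'location'
    WHERE f.outV = 1 AND l._value = 'Santa Fe'""")]
```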

Basic Gremlin

gremlin> (1 + 2) * 4 div 5
==>2.4
gremlin> "marko" + " a. " + "rodriguez"
==>marko a. rodriguez
gremlin> func ex:add-one($x)
           $x + 1
         end
gremlin> foreach $y in g:list(1,2,3,4)
           g:print(ex:add-one($y))
         end
2
3
4
5

Searching Example: Friends

gremlin> $_g := neo4j:open('/data/mygraph')
gremlin> $_ := g:id-v(1)
==>v[1]
gremlin> .
==>v[1]
gremlin> ./outE
==>e[10][1-friend-2]
==>e[11][1-friend-3]
==>e[12][1-favorite-4]
gremlin> ./outE[@label='friend']/inV/@name
==>sixwing
==>marko
gremlin> ./outE[@label='friend']/inV/@gender
==>male
==>male
gremlin> ./outE[@label='friend']
           /inV[@location='Santa Fe']/@name
==>marko
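The same three queries can be sketched as path steps over an in-memory edge list. The graph here is a hypothetical reconstruction chosen to agree with the Gremlin outputs in this and the following sessions: the assignment 2 = sixwing, 3 = marko, 5 = charlie, 6 = Bryce Canyon and the edges out of vertices 2 and 3 (ids 13 to 15) are assumptions. `out_e` and `in_v` are hypothetical helpers mirroring `outE[@label=...]` and `inV`.

```python
# Hypothetical toy graph, consistent with the Gremlin session above.
props = {
    1: {},
    2: {"name": "sixwing", "location": "West Hollywood", "gender": "male"},
    3: {"name": "marko", "location": "Santa Fe", "gender": "male"},
    4: {},
    5: {"name": "charlie"},
    6: {"name": "Bryce Canyon"},
}
# Edges as (id, out_vertex, label, in_vertex); ids 13 to 15 are invented.
edges = [
    (10, 1, "friend", 2), (11, 1, "friend", 3), (12, 1, "favorite", 4),
    (13, 2, "friend", 1), (14, 3, "friend", 1), (15, 3, "friend", 5),
]

def out_e(vertices, label):
    """outE[@label=...]: outgoing edges with the given label, in edge-id order."""
    return [e for v in vertices for e in edges if e[1] == v and e[2] == label]

def in_v(es):
    """inV: the head vertex of each edge."""
    return [e[3] for e in es]

friends = in_v(out_e([1], "friend"))
names = [props[v]["name"] for v in friends]
genders = [props[v]["gender"] for v in friends]
santa_fe = [props[v]["name"] for v in friends if props[v].get("location") == "Santa Fe"]
```

Each step maps a list of elements to a new list, so repeats are kept, just as Gremlin emits one result per path.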

Searching Example: Friends in SPARQL
The name of tenderlove’s friends...

SELECT ?y WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:name ?y }

The gender of tenderlove’s friends...

SELECT ?y WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:gender ?y }

The name of tenderlove’s friends who live in Santa Fe...

SELECT ?y WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:livesIn ex:SantaFe .
  ?x ex:name ?y }

Searching Example: FOAF (No Friends, No Self)

gremlin> .
==>v[1]
gremlin> ./outE[@label='friend']/inV
           /outE[@label='friend']/inV
==>v[1]
==>v[1]
==>v[5]
gremlin> (./outE[@label='friend']/inV)[g:assign($x)]
           /outE[@label='friend']
           /inV[g:except($_)][g:except($x)]
           /@name
==>charlie
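A Python sketch of the same "no friends, no self" filter, using a hypothetical reconstruction of the toy graph's friend edges (which friend links to charlie is an assumption chosen to reproduce the v[1], v[1], v[5] output). The list `friends` plays the role of `$x`, and `start` plays the role of `$_`.

```python
# Hypothetical friend adjacency, consistent with the session above.
friend = {1: [2, 3], 2: [1], 3: [1, 5], 5: []}
names = {2: "sixwing", 3: "marko", 5: "charlie"}

start = 1
friends = friend[start]                                  # (...)[g:assign($x)]
foaf = [w for v in friends for w in friend.get(v, [])]   # friends of friends, repeats kept
result = [names[w] for w in foaf
          if w != start                                  # [g:except($_)]
          and w not in friends]                          # [g:except($x)]
```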

Searching Example: FOAF (No Friends, No Self)
in SPARQL

The name of tenderlove’s friends’ friends who are neither him nor his friends.

SELECT ?z WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:friend ?y .
  ?y ex:name ?z .
  FILTER (?y != ex:tenderlove)
  FILTER NOT EXISTS { ex:tenderlove ex:friend ?y } }

Searching Example: Friend’s Favorites

gremlin> .
==>v[1]
gremlin> ./outE[@label='friend']/inV
           /outE[@label='favorite']/inV
==>v[6]
==>v[6]
gremlin> ./outE[@label='friend']/inV
           /outE[@label='favorite' and @created_at > 234500]
           /inV/@name
==>Bryce Canyon
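The edge-property filter can be sketched the same way. The favorite edges and which one carries created_at=123456 versus 234567 are assumptions; the filter keeps favorite edges whose created_at is strictly greater than 234500. `favorites_of` is a hypothetical helper.

```python
# Hypothetical toy-graph fragments; edge properties are assumptions.
friend = {1: [2, 3], 2: [1], 3: [1, 5]}
favorite = [(1, 4, {}), (2, 6, {"created_at": 123456}), (3, 6, {"created_at": 234567})]
name = {6: "Bryce Canyon"}

def favorites_of(vertices, created_after=None):
    """Heads of favorite edges leaving `vertices`, optionally filtered on created_at."""
    result = []
    for v in vertices:
        for out_v, in_v, edge_props in favorite:
            if out_v != v:
                continue
            # Edge predicate: like [@label='favorite' and @created_at > N].
            if created_after is not None and edge_props.get("created_at", 0) <= created_after:
                continue
            result.append(in_v)
    return result

all_favorites = favorites_of(friend[1])
recent = [name[v] for v in favorites_of(friend[1], created_after=234500)]
```

The filter runs on the edge before stepping to its head vertex, so the edge's key/value pairs refine the meaning of "favorite" without touching either vertex.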

Loading Identical Data into MySQL and Neo4j

On my laptop, 10,000,000 edges are created between 100,000 vertices,
randomly assigned as 50% favorite edges and 50% friend edges.
This is a dense, relatively unnatural graph: everyone is heavily
connected.42

42
The largest Neo4j instance that I know of contained 100,030,002 (100 million) vertices, 3,041,030,000
(3 billion) edges, and 140,120,000 (140 million) properties. This was deployed on Amazon EC2 and was
yielding FOAF traversals, on average, in ∼50ms (again, index-free traversal). Figures provided by Todd
Stavish (Stav.ish Consulting [http://blog.stavi.sh/]).

Play Query

“What do my friends’ friends
favorite?”

Querying Random Vertices with Repeats
mysql> SELECT count(favorite.inV) FROM friend as fa, friend as fb, favorite
       WHERE fa.outV=XXX AND fa.inV=fb.outV AND fb.inV=favorite.outV;
29.72 sec -- vertex 110752
0.330 sec -- vertex 110752 REPEAT
10.10 sec -- vertex 145893
11.64 sec -- vertex 126993
14.37 sec -- vertex 136442
6.990 sec -- vertex 154837

gremlin> g:count(g:id(XXX)/outE[@label='friend']/inV
           /outE[@label='friend']/inV/outE[@label='favorite']/inV)
3.646 sec -- vertex 110752
0.756 sec -- vertex 145893
3.251 sec -- vertex 126993
1.462 sec -- vertex 136442
1.875 sec -- vertex 154837
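The two formulations compute the same number: the count of friend, friend, favorite paths out of one vertex. The sketch below checks that on a scaled-down random graph, using SQLite (Python's `sqlite3`, standing in for MySQL) for the three-way join and plain adjacency lists for the traversal. The sizes and seed are arbitrary, and no timings are claimed; it only shows that both sides count the same paths.

```python
import random
import sqlite3
from collections import defaultdict

random.seed(42)
N_VERTICES, N_EDGES = 1_000, 20_000    # far smaller than the benchmark above

friend_rows, favorite_rows = [], []
for _ in range(N_EDGES):
    row = (random.randrange(N_VERTICES), random.randrange(N_VERTICES))
    (friend_rows if random.random() < 0.5 else favorite_rows).append(row)   # 50/50 split

# Relational side: join tables on every hop.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend (outV INTEGER NOT NULL, inV INTEGER NOT NULL);
CREATE TABLE favorite (outV INTEGER NOT NULL, inV INTEGER NOT NULL);
CREATE INDEX friend_outV_index ON friend (outV);
CREATE INDEX favorite_outV_index ON favorite (outV);
""")
conn.executemany("INSERT INTO friend VALUES (?, ?)", friend_rows)
conn.executemany("INSERT INTO favorite VALUES (?, ?)", favorite_rows)

def count_sql(v):
    (n,) = conn.execute("""
        SELECT count(favorite.inV) FROM friend AS fa, friend AS fb, favorite
        WHERE fa.outV = ? AND fa.inV = fb.outV AND fb.inV = favorite.outV""", (v,)).fetchone()
    return n

# Traversal side: each hop is a direct adjacency-list lookup.
friend_adj, favorite_adj = defaultdict(list), defaultdict(list)
for o, i in friend_rows:
    friend_adj[o].append(i)
for o, i in favorite_rows:
    favorite_adj[o].append(i)

def count_traversal(v):
    return sum(len(favorite_adj[w]) for u in friend_adj[v] for w in friend_adj[u])
```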

A Traversal Detour Through the Web of Data
[Figure: the Linking Open Data cloud diagram, as of July 2009, showing datasets such as DBpedia, Freebase, GeoNames, MusicBrainz, DBLP, and UniProt and the links between them.]

Image produced by Richard Cyganiak and Anja Jentzsch. [http://linkeddata.org/]

Defining the Web of Data

• The Web of Data is similar to the Web of Documents (the familiar Web), but
instead of using the URI address space to reference documents (e.g. HTML, images,
etc.), it references individual data.43,44
http://markorodriguez.com, foaf:fundedBy, http://atti.com
http://markorodriguez.com, foaf:name, Marko Rodriguez
http://markorodriguez.com, foaf:age, 30
http://markorodriguez.com, foaf:knows, http://tenderlovemaking.com
• In graph-theoretic terms, the Web of Data is a multi-relational graph defined as
G ⊆ (U ∪ B) × U × (U ∪ B ∪ L), where U is the set of all URIs, B is the set of
all blank/anonymous nodes, and L is the set of all literals.
43
The Web of Data is also known as the Linked Data Web, the Giant Global Graph, the Semantic Web,
the RDF graph, etc.
44
* Rodriguez, M.A., “Interpretations of the Web of Data,” Data Management in the Semantic Web, eds.
H. Jin and Z. Lv, Nova Publishing, in press, 2010. [http://arxiv.org/abs/0905.3378]
* Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” Technical Report, KRS-2009-01, 2009.
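The definition G ⊆ (U ∪ B) × U × (U ∪ B ∪ L) constrains which kind of term may sit in each position of a triple. Here is a minimal Python sketch of that constraint. The `URI`, `BNode`, and `Literal` classes are hypothetical stand-ins rather than any RDF library's types, and `foaf:` is expanded to the FOAF namespace `http://xmlns.com/foaf/0.1/`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:          # element of U
    value: str

@dataclass(frozen=True)
class BNode:        # element of B (blank/anonymous node)
    label: str

@dataclass(frozen=True)
class Literal:      # element of L
    value: str

def is_valid_triple(s, p, o):
    """Check (s, p, o) against G ⊆ (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (isinstance(s, (URI, BNode))             # subject: URI or blank node
            and isinstance(p, URI)                  # predicate: URI only
            and isinstance(o, (URI, BNode, Literal)))  # object: anything

FOAF = "http://xmlns.com/foaf/0.1/"
me = URI("http://markorodriguez.com")
graph = {
    (me, URI(FOAF + "name"), Literal("Marko Rodriguez")),
    (me, URI(FOAF + "knows"), URI("http://tenderlovemaking.com")),
}
```

Literals can appear only as objects, which is the asymmetry the earlier footnote on RDF literals refers to.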

Some of the Datasets on the Web of Data
data set domain data set domain data set domain
audioscrobbler music govtrack government pubguide books
bbclatertotp music homologene biology qdos social
bbcplaycountdata music ibm computer rae2001 computer
bbcprogrammes media ieee computer rdfbookmashup books
budapestbme computer interpro biology rdfohloh social
chebi biology jamendo music resex computer
crunchbase business laascnrs computer riese government
dailymed medical libris books semanticweborg computer
dblpberlin computer lingvoj reference semwebcentral social
dblphannover computer linkedct medical siocsites social
dblprkbexplorer computer linkedmdb movie surgeradio music
dbpedia general magnatune music swconferencecorpus computer
doapspace social musicbrainz music taxonomy reference
drugbank medical myspacewrapper social umbel general
eurecom computer opencalais reference uniref biology
eurostat government opencyc general unists biology
flickrexporter images openguides reference uscensusdata government
flickrwrappr images pdb biology virtuososponger reference
foafprofiles social pfam biology w3cwordnet reference
freebase general pisa computer wikicompany business
geneid biology prodom biology worldfactbook government
geneontology biology projectgutenberg books yago general
geonames geographic prosite biology ...

Web of Data Dataset Dependencies
[Figure: a graph of the links between the Web of Data datasets listed above, e.g. dbpedia, freebase, geonames, musicbrainz, uniprot, and dblpberlin.]

Web of Data Transforms Development Paradigm
A new application development paradigm emerges. Data and application providers no
longer need to be the same entity (left). With the Web of Data, it is possible for
developers to write applications that use data they do not maintain (right).45

[Figure: left, Applications 1, 2, and 3 each run their processes over their own structures on separate hosts (127.0.0.1, 127.0.0.2, 127.0.0.3); right, the same applications run their processes over structures linked together as the Web of Data.]

45
Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,”
Bulletin of the American Society for Information Science and Technology, 35(6), pp. 38–43,
doi:10.1002/bult.2009.1720350611, 2009. [http://arxiv.org/abs/0908.0373]

Extending our Knowledge of Bryce Canyon National Park
gremlin> $h := lds:open()
gremlin> $_ := g:id-v($h, 'http://dbpedia.org/resource/Bryce_Canyon_National_Park')
==>v[http://dbpedia.org/resource/Bryce_Canyon_National_Park]
gremlin> ./outE
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:reference - http://www.nps.gov/brca/]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:iucnCategory - II@en]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:numberOfVisitors - 1012563^^xsd:integer]
==>e[dbpedia:Bryce_Canyon_National_Park - skos:subject - dbpedia:Category:Colorado_Plateau]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:visitationNum - 1012563^^xsd:int]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:abstract - Bryce Canyon National Park is a national park located in southwestern Utah in the United States...@en]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:area - 35835.0^^http://dbpedia.org/datatype/acre]
==>e[dbpedia:Bryce_Canyon_National_Park - rdf:type - dbpedia-owl:ProtectedArea]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:location - dbpedia:Garfield_County%2C_Utah]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:nearestCity - dbpedia:Panguitch%2C_Utah]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:established - 1928-09-15^^xsd:date]
...

46

46
Linked Data Sail (LDS) was developed by Joshua Shinavier (RPI and TinkerPop) and connects to
Gremlin through Gremlin’s native support for Sail (i.e. for RDF graphs). LDS caches the traversed aspects
of the Web of Data into any quad-store (e.g. MemoryStore, AllegroGraph, HyperGraphSail, Neo4jSail, etc.).

Augmenting Traversals with the Web of Data
Let’s extend our query over the Web of Data and perhaps incorporate the results into our
searching, scoring, ranking, and recommendation.

gremlin> $visits := ./outE[@label='dbpprop:visitationNum']/inV/@value
==>1012563
gremlin> $acreage := ./outE[@label='dbpprop:area']/inV/@value
==>35835.0

### imagine wrapping traversals in Gremlin functions:
### func lds:acreage($h, $v) and func lds:visitors($h, $v)

gremlin> ./outE[@label='friend']/inV/outE[@label='favorite']
           /inV[lds:acreage($h, .) < 1000000 and lds:visitors($h, .) < 2000000]/@name
==>Bryce Canyon

That is: which of tenderlove’s friends’ favorites are small in both acreage and visitation?47
47
In Gremlin, it is possible to have multiple graphs open in parallel and thus to mix and match data from
each graph as desired. Hence, as the example above demonstrates, it is possible to mix Web of Data RDF
graph data with Blueprints property graph data.
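The idea of filtering a local traversal with facts fetched from another graph can be sketched in Python. Everything except the two DBpedia values (35835.0 acres and 1012563 visitors, from the session above) is hypothetical: the local toy data, the `same_as` mapping from a local place vertex to its DBpedia URI, and the `acreage`/`visitors` helpers that mirror the imagined `lds:acreage` and `lds:visitors` functions. The result is deduplicated, which is a choice made here rather than Gremlin's behavior.

```python
# Local property-graph side (hypothetical toy data).
friend = {1: [2, 3]}
favorite = {2: [6], 3: [6]}
name = {6: "Bryce Canyon"}
same_as = {6: "http://dbpedia.org/resource/Bryce_Canyon_National_Park"}

# Web of Data side: a stand-in cache holding the two values retrieved above.
web_of_data = {
    "http://dbpedia.org/resource/Bryce_Canyon_National_Park": {
        "dbpprop:area": 35835.0,
        "dbpprop:visitationNum": 1012563,
    }
}

def acreage(v):
    return web_of_data[same_as[v]]["dbpprop:area"]

def visitors(v):
    return web_of_data[same_as[v]]["dbpprop:visitationNum"]

matches = [
    name[p]
    for f in friend[1] for p in favorite.get(f, [])      # friends' favorites
    if acreage(p) < 1_000_000 and visitors(p) < 2_000_000
]
small_and_quiet = list(dict.fromkeys(matches))            # dedupe, keep first-seen order
```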

Using the Web of Data for Music Recommendation

Yet another aside: using only Web of Data data to recommend musicians/bands
with a simplistic, edge-boolean spreading activation algorithm.48

gremlin> $_ := g:id('http://dbpedia.../Grateful_Dead')
==>v[http://dbpedia.../Grateful_Dead]
gremlin> lds:spreading-activation(.)
==>Jerry Garcia Acoustic Band
==>BK3
==>Phil Lesh and Friends
==>Old and In the Way
==>RatDog
==>The Dead
==>Heart of Gold Band
==>Legion of Mary
==>The Tubes
==>Bob Dylan
==>New Riders of the Purple Sage
==>Bruce Hornsby
==>Donna Jean Godchaux
==>Kingfish
==>Jerry Garcia Band
==>Donna Jean Godchaux Band
==>The Other Ones
==>Bobby and the Midnites
==>Furthur
==>Rhythm Devils

48
Please read the following for interesting, deeper ideas in this space: Clark, A., “Associative Engines:
Connectionism, Concepts, and Representational Change,” MIT Press, 1993.
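For readers unfamiliar with the technique, here is a generic spreading activation sketch in Python. It is not the LDS implementation behind `lds:spreading-activation`. "Edge-boolean" is read here as "every edge counts equally, whatever its label". The decay factor, the step count, and the small graph with vertices `s`, `a`, `b`, `c`, `d` are all hypothetical.

```python
from collections import defaultdict

def spreading_activation(adj, source, steps=3, decay=0.5):
    """Rank vertices by the activation energy that reaches them from source.

    adj   -- vertex -> list of neighbours; labels and weights are ignored
    steps -- number of hops the energy is pushed
    decay -- fraction of a vertex's energy passed on at each hop
    """
    energy = {source: 1.0}
    total = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for v, e in energy.items():
            nbrs = adj.get(v, [])
            if not nbrs:
                continue
            share = e * decay / len(nbrs)      # split decayed energy evenly
            for w in nbrs:
                nxt[w] += share
        for w, e in nxt.items():
            total[w] += e                      # accumulate over all hops
        energy = nxt
    total.pop(source, None)                    # do not recommend the source itself
    return sorted(total, key=lambda w: (-total[w], w))

# Hypothetical graph: s is the seed artist; edges are undirected, listed both ways.
adj = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"],
       "c": ["a", "b", "d"], "d": ["c"]}
ranking = spreading_activation(adj, "s")
```

Vertices near the seed and reachable by many short paths collect the most energy, which is the intuition behind recommending closely associated bands.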

Another View of the TinkerPop Stack

[Figure: a local dataset linked to the Web of Data through owl:sameAs, with resources accessed via GET http://{host}/{resource}.]

Extending the Schema for Some Richer Examples
For the last part of this presentation on recommendation, we will extend
the data schema to include tags (a place can be tagged with a tag). This
will allow for some richer examples.49,50

Person: name=string, location=string, gender=string, type=Person
Place: name=string, lat=double, long=double, type=Place
Tag: name=string, type=Tag

Person --friend--> Person
Person --favorite--> Place
Place --tagged--> Tag

49
Please note that 1.) “place” can be item/thing/book/music/etc. 2.) “favorite” can be
likes/purchased/visited/etc. 3.) “tag” can be category/etc. A particular use case is presented, but with
a little imagination, application to other schemas is, of course, plausible.
50
The following examples use experimental syntax that may differ slightly from the official Gremlin 0.5 release.

Recommendation Example: Friend Finder
• Open Friendship Triangles: (V × Ψ) → (V × N+)51 (people)
1. Create return map (i.e. V × N+).
2. Determine who my friends are.
3. Determine who my friends’ friends are...
4. ...that are not already my friends or me. (weighted by the number of overlapping
friends—more overlaps, more traversers at that user vertex)
5. Sort return map by number of traversers at those user/people vertices.

$m := g:map()
(./outE[@label='friend']/inV)[g:assign($x)]
/outE[@label='friend']/inV
/.[g:except($x)][g:except($_)][g:op-value('+',$m,.,1)]
g:sort($m,‘value’,true)
51
$R_x \circ A_{\text{friend}} \cdot A_{\text{friend}} \circ n(A_{\text{friend}}) \circ n(I)$, where x is the user/person vertex. The in-degree
centrality vector of the derived adjacency matrix determines the resultant V rank.
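Steps 1 to 5 can be sketched in Python with a `Counter` standing in for the return map `$m`. This is an illustration of the open-triangle idea, not the Gremlin implementation; the function name `friend_suggestions` and the friend data are hypothetical.

```python
from collections import Counter

def friend_suggestions(friend, me):
    """Rank candidate friends by how many of my friends already know them."""
    m = Counter()                                  # 1. return map: vertex -> traverser count
    my_friends = friend.get(me, [])                # 2. who my friends are ($x)
    already = set(my_friends)
    for f in my_friends:
        for ff in friend.get(f, []):               # 3. my friends' friends...
            if ff != me and ff not in already:     # 4. ...not already my friends or me
                m[ff] += 1                         #    one traverser per overlapping friend
    return m.most_common()                         # 5. sort by traverser count, descending

# Hypothetical data: vertex 5 is reachable through three of my friends.
friend = {1: [2, 3, 4], 2: [1, 5, 6], 3: [1, 5], 4: [6, 5, 7]}
suggestions = friend_suggestions(friend, 1)
```

Each extra shared friend adds one traverser to a candidate, so candidates that would close the most open triangles rank first.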
