Graph Representation Learning
Jure Leskovec, Stanford University

Networks
Why Networks?
Networks are a universal language for describing complex data
 Networks from science, nature, and technology are more similar than one would expect
Data availability (+ computational challenges)
 Web/mobile, bio, health, and medical
Shared vocabulary between fields
 Computer Science, Social Science, Physics, Statistics, Biology
Impact!
 Social networking, social media, drug design
Many Data are Networks
Examples: economic networks, social networks, networks of neurons, information networks (the Web & citations), biomedical networks, the Internet
Networks: Common Language
The same graph formalism covers, e.g., a social network (Peter, Mary, Albert, and Tom linked by friend, brother, and co-worker ties), a protein interaction network (Proteins 1, 2, 5, 9), and a movie-actor network (Movies 1-3, Actors 1-4). Each example is a graph with |N| = 4 nodes and |E| = 4 edges.
Tasks on Networks
Classical ML tasks in networks:
 Node classification: predict the type of a given node
 Link prediction: predict whether two nodes are linked
 Community detection: identify densely linked clusters of nodes
 Network similarity: how similar are two (sub)networks?
Example: Node Classification
Many possible ways to create node features:
 Node degree, PageRank score, motifs, …
(Figure: a network where only some nodes are labeled; machine learning infers the "?" labels.)
Machine Learning Lifecycle
(Supervised) machine learning lifecycle: this feature, that feature. Every single time!
Pipeline: Network Data → Node Features → Learning Algorithm → Model → Downstream prediction task
The Node Features step has traditionally meant manual feature engineering; the goal is to automatically learn the features instead.
Feature Learning in Graphs
This talk: feature learning for networks!
Map each node u to a vector, its feature representation (embedding): f: u → ℝ^d
Why Learn Embeddings?
The goal is to map each node into a low-dimensional space:
 Distributed representations for nodes
 Similarity between nodes indicates link strength
 Encodes network information and generates node representations
Example
 Zachary's Karate Club network:
DeepWalk: Online Learning of Social Representations. B. Perozzi, R. Al-Rfou, S. Skiena, KDD 2014.
Why Is It Hard?
Images have a fixed 2D structure
 Can define convolutions (CNNs)
Why Is It Hard?
Text and speech have a linear 1D structure
 Can define sliding windows
But graphs are non-Euclidean!
 Graphs have arbitrary size
 Node numbering is arbitrary (node isomorphism problem)
This Talk: Outline
Feature learning for networks:
1) "Linearizing" the graph
 Create a "sentence" for each node using random walks
 node2vec
2) Graph convolutional networks
 Propagate information between the nodes of the graph
 GraphSAGE
node2vec: Unsupervised Feature Learning

node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.
Predicting multicellular function through multi-layer tissue networks. M. Zitnik, J. Leskovec. Bioinformatics, 33 (14): i190-i198, 2017.
Unsupervised Feature Learning
 Intuition: find an embedding of nodes into d dimensions that preserves similarity
 Idea: learn node embeddings such that nearby nodes are close together
 Given a node u, how do we define nearby nodes?
 N_S(u) … the neighbourhood of u obtained by some strategy S
Feature Learning as Optimization
 Given G = (V, E)
 Goal: learn f: u → ℝ^d, where f is a table lookup
 We directly "learn" the coordinates f(u) of each node u
 Given node u, we want f(u) to be predictive of the nodes in u's neighborhood N_S(u):

$$\max_f \sum_{u \in V} \log \Pr\big(N_S(u) \mid f(u)\big)$$
Unsupervised Feature Learning
Goal: find an embedding f(u) that predicts nearby nodes N_S(u).
Assume the conditional likelihood factorizes over the neighborhood:

$$\Pr\big(N_S(u) \mid f(u)\big) = \prod_{v \in N_S(u)} \Pr\big(v \mid f(u)\big)$$

Then parameterize each factor with a softmax over embedding dot products:

$$\Pr\big(v \mid f(u)\big) = \frac{\exp\big(f(v) \cdot f(u)\big)}{\sum_{w \in V} \exp\big(f(w) \cdot f(u)\big)}$$

Estimate f(u) using stochastic gradient descent; a toy sketch of this objective follows below.
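To make this concrete, here is a minimal, illustrative sketch of the objective (not the paper's reference implementation; the toy neighborhoods are made up): the embedding f is a plain lookup table, and Pr(v | f(u)) is the softmax above.

```python
import numpy as np

# Minimal sketch of the node2vec objective: f is a lookup table of
# d-dimensional coordinates, one row per node.
rng = np.random.default_rng(0)
num_nodes, d = 10, 4
f = rng.normal(scale=0.1, size=(num_nodes, d))

def log_pr(v, u):
    """log Pr(v | f(u)) under the softmax parameterization."""
    scores = f @ f[u]                       # dot product with every node w
    return f[v] @ f[u] - np.log(np.exp(scores).sum())

def objective(neighborhoods):
    """sum_u log Pr(N_S(u) | f(u)), using the factorized likelihood."""
    return sum(log_pr(v, u)
               for u, nbrs in neighborhoods.items()
               for v in nbrs)

# Toy neighborhoods N_S(u); in node2vec these come from random walks.
print(objective({0: [1, 2], 3: [4]}))
```

In practice the softmax denominator (a sum over all nodes) is too expensive, so node2vec approximates it with negative sampling.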
How to Determine N_S(u)
Two classic strategies to define a neighborhood N_S(u) of a given node u, illustrated by Figure 1 of the node2vec paper ("BFS and DFS search strategies from node u (k = 3)"):
 N_BFS(u) = {s1, s2, s3}: a local, microscopic view
 N_DFS(u) = {s4, s5, s6}: a global, macroscopic view
BFS vs. DFS
 BFS: micro-view of the neighbourhood
 DFS: macro-view of the neighbourhood
(A small sketch of both strategies follows below.)
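A toy sketch of the two strategies on an adjacency-list graph (the function names and the example graph are ours):

```python
from collections import deque

def bfs_neighborhood(adj, u, k):
    """Collect the k nodes nearest to u in breadth-first order."""
    seen, order, queue = {u}, [], deque(adj[u])
    while queue and len(order) < k:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            order.append(v)
            queue.extend(adj[v])
    return order

def dfs_neighborhood(adj, u, k):
    """Follow a single path of up to k unvisited nodes away from u."""
    path, seen, current = [], {u}, u
    while len(path) < k:
        unvisited = [v for v in adj[current] if v not in seen]
        if not unvisited:
            break
        current = unvisited[0]
        seen.add(current)
        path.append(current)
    return path

adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1, 5], 5: [4]}
print(bfs_neighborhood(adj, 0, 3))  # micro view: [1, 2, 3]
print(dfs_neighborhood(adj, 0, 3))  # macro view: [1, 4, 5]
```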
Interpolating BFS and DFS
A biased random walk S that, given a node u, generates the neighborhood N_S(u)
 Two parameters:
 Return parameter p: return back to the previous node
 In-out parameter q: moving outwards (DFS) vs. inwards (BFS)
 Intuitively, q is the "ratio" of BFS vs. DFS
Biased Random Walks
Biased 2nd-order random walks explore network neighborhoods:
 The random walk started at u and is now at w
 Insight: each neighbor of w can only be closer to u, at the same distance from u, or farther from u
Idea: remember where the walk came from
Biased Random Walks
 Walker is at w. Where to go next?
 p and q model the transition probabilities:
 p … return parameter
 q … "walk away" parameter
 1/p, 1/q, and 1 are unnormalized probabilities (for stepping back toward u, moving farther from u, and staying at the same distance, respectively)
Biased Random Walks
 Walker is at w. Where to go next?
 BFS-like walk: low value of p
 DFS-like walk: low value of q
 N_S(u) are the nodes visited by the biased walk
From w, the unnormalized transition probabilities are 1/p back to the previous node, 1 to a neighbor at the same distance from u, and 1/q to a neighbor farther from u; a sketch of this weighting follows below.
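A sketch of the second-order weighting on an unweighted adjacency-list graph (the helper name and toy graph are ours):

```python
def transition_weights(adj, prev, curr, p, q):
    """Unnormalized 2nd-order weights for the walk step prev -> curr -> x."""
    weights = {}
    for x in adj[curr]:
        if x == prev:              # distance 0 from prev: return, weight 1/p
            weights[x] = 1.0 / p
        elif x in adj[prev]:       # distance 1 from prev: weight 1
            weights[x] = 1.0
        else:                      # distance 2 from prev: move out, weight 1/q
            weights[x] = 1.0 / q
    return weights

adj = {"u": ["s1", "w"], "s1": ["u", "w"], "w": ["u", "s1", "s2", "s3"],
       "s2": ["w"], "s3": ["w"]}
print(transition_weights(adj, "u", "w", p=2.0, q=0.5))
# {'u': 0.5, 's1': 1.0, 's2': 2.0, 's3': 2.0}
```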
node2vec Algorithm
1) Compute the random-walk transition probabilities
2) Simulate r random walks of length l starting from each node u
3) Optimize the node2vec objective using stochastic gradient descent
Linear-time complexity; all 3 steps are individually parallelizable. A toy version of the walk simulation follows below.
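A compact, illustrative version of the walk simulation on a toy graph (a real implementation would precompute the transition probabilities, e.g. with alias tables, rather than reweighting on the fly):

```python
import random

def node2vec_walks(adj, r, l, p, q, seed=0):
    """Simulate r biased walks of length l from every node (toy version)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(r):
        for start in adj:
            walk = [start, rng.choice(adj[start])]
            while len(walk) < l:
                prev, curr = walk[-2], walk[-1]
                nbrs = adj[curr]
                # unnormalized 2nd-order weights: 1/p back, 1 across, 1/q out
                w = [1 / p if x == prev else 1.0 if x in adj[prev] else 1 / q
                     for x in nbrs]
                walk.append(rng.choices(nbrs, weights=w, k=1)[0])
            walks.append(walk)
    return walks

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(node2vec_walks(adj, r=2, l=4, p=1.0, q=0.5)[:3])
```

The walks then play the role of "sentences": step 3 fits the skip-gram objective on them, exactly as in the earlier sketch.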
Experiments: Micro vs. Macro
Network of character interactions in a novel. (Figure 3 of the node2vec paper: complementary visualizations of the Les Misérables co-appearance network generated by node2vec, with label colors reflecting homophily and structural equivalence.)
 p = 1, q = 2: microscopic view of the network neighbourhood
 p = 1, q = 0.5: macroscopic view of the network neighbourhood
Node Classification
node2vec outperforms in all cases, beating the closest benchmark by up to 22%.

Macro-F1 score        BlogCatalog   Wiki-POS
Spectral Clustering   0.0405        0.0395
DeepWalk              0.2110        0.1274
LINE                  0.0784        0.1164
node2vec              0.2581        0.1552
(p, q)                0.25, 0.25    4, 0.5
% gain                22.3          21.8
Incomplete Network Data
(Two plots of predictive performance on networks with incomplete data; figures omitted.)
Multi-Layer Networks
Extending node2vec to multi-layer networks
Predicting multicellular function through multi-layer tissue networks. M. Zitnik, J. Leskovec. Bioinformatics, 33 (14): i190-i198, 2017.
Multi-Layer Networks
 Given: network layers G_i and a hierarchy M
 Output: features of nodes in the layers and at the internal levels of the hierarchy
 Aim: capture the multilevel hierarchical structure encoded by M
Multi-Layer Networks
 For nodes in the leaf layers G_i, use the per-layer node2vec objective
 For the internal hierarchy, add a hierarchical dependency: f_i(u) in layer i should be close to f_π(i)(u) in the parent π(i); a sketch of this penalty follows below
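A sketch of the hierarchical dependency as a squared-distance penalty between a node's embedding in a layer and in that layer's parent (the penalty form and all names here are illustrative, not the paper's exact formulation):

```python
import numpy as np

def hierarchy_penalty(f, parent):
    """Sum of 0.5 * ||f_i(u) - f_parent(i)(u)||^2 over layers i and nodes u.

    f: {level: {node: vector}}; parent: {level: its parent level}.
    """
    total = 0.0
    for i, pi in parent.items():
        for u, f_iu in f[i].items():
            if u in f[pi]:                    # u also appears at the parent
                diff = f_iu - f[pi][u]
                total += 0.5 * float(diff @ diff)
    return total

# Toy two-level hierarchy: two leaf layers under one parent level.
f = {"midbrain": {"u": np.array([0.5, 0.5])},
     "pons":     {"u": np.array([1.0, 0.0])},
     "brain":    {"u": np.array([0.8, 0.2])}}
parent = {"midbrain": "brain", "pons": "brain"}
print(hierarchy_penalty(f, parent))  # ~0.13
```

This term is added to the sum of per-layer node2vec objectives, tying each node's embeddings together across the hierarchy.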
Implications
 Nodes in different layers that represent the same entity share features through their hierarchy ancestors
 We learn feature representations at multiple scales:
 features of nodes in the layers
 features of nodes at the non-leaf levels of the hierarchy
Application: Protein Function
 Proteins are worker molecules
 Understanding protein function has great biomedical and pharmaceutical implications
 The function of proteins depends on their tissue context [Greene et al., Nat Genet '15]
(Figure: tissue-specific networks G1-G4.)
Experiments: Biological Nets
107 genome-wide tissue-specific protein interaction networks (female reproductive system, eye, nervous system, placenta, retina, spinal cord, pancreas, and many others)
 584 tissue-specific cellular functions
 Examples (tissue, cellular function):
 (renal cortex, cortex development)
 (artery, pulmonary artery morphogenesis)
Tissue-Specific Prediction
42% improvement over the state-of-the-art baseline (bar chart across tissues omitted)
Brain Tissues
9 brain tissue PPI networks in a two-level hierarchy: the frontal, parietal, occipital, and temporal lobes, the medulla oblongata, pons, substantia nigra, midbrain, and cerebellum, grouped under brainstem and brain.
Embedding Brain Networks
 Do the embeddings match anatomy?
node2vec: Summary
Task-independent feature learning in networks:
 An explicit locality-preserving objective for feature learning
 Biased random walks capture the diversity of network patterns
 A scalable and robust algorithm
A Different Setting
 So far: node2vec
 Unsupervised (task-agnostic)
 Nodes have no attributes
 Next: GraphSAGE
 Supervised (task-specific)
 Nodes have attributes: text, images, etc.
GraphSAGE: Supervised Feature Learning

Inductive Representation Learning on Large Graphs. W. Hamilton, R. Ying, J. Leskovec. Neural Information Processing Systems (NIPS), 2017.
Representation Learning on Graphs: Methods and Applications. W. Hamilton, R. Ying, J. Leskovec. IEEE Data Engineering Bulletin, 2017.
Idea: Convolutional Networks
CNN on an image:
The goal is to generalize convolutions beyond simple lattices and to leverage node features/attributes (e.g., text, images)
From Images to Networks
A single CNN layer with a 3×3 filter (animation by Vincent Dumoulin) transforms and combines information from a pixel's neighbors. The graph analogue: transform the information at the neighbors and combine it
 Transform the "messages" h_i from the neighbors: W_i h_i
 Add them up: Σ_i W_i h_i
(A toy sketch of this step follows below.)
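A minimal sketch of this transform-and-sum step with random toy features (the sizes, graph, and shared-weight simplification are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 3, 2
W = rng.normal(size=(d_out, d_in))   # transform applied to each message
H = rng.normal(size=(5, d_in))       # current features, one row per node
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2, 4], 4: [3]}

def combine(i):
    """Sum of transformed neighbor messages for node i."""
    return sum(W @ H[j] for j in adj[i])

print(combine(0))
```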
Real-World Graphs
But what if your data looks not like a grid but like this: arbitrary graph-structured data?
 Examples: social networks, information networks, knowledge graphs, communication networks, the Web graph, …
A Naïve Approach
 Take the adjacency matrix A and the feature matrix X, concatenate them into [A, X]
 Feed them into a deep (fully connected) neural net
 Done?
Issues with this idea:
 O(N) parameters
 Not applicable to graphs of different sizes; no inductive learning possible
 Not invariant to node ordering
(A sketch of these problems follows below.)
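A small sketch that makes the problems concrete (toy sizes; the fixed permutation is chosen just for illustration): the dense layer over [A, X] needs O(N) weights per unit, and relabeling the nodes of the same graph changes the output.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric, no self-loops
X = rng.normal(size=(N, d))
inputs = np.concatenate([A, X], axis=1)        # [A, X], shape (N, N + d)

W = rng.normal(size=(N + d, 8))                # O(N) parameters per unit
hidden = np.maximum(inputs @ W, 0)             # one ReLU layer

perm = np.array([1, 0, 3, 4, 2])               # relabel the same graph
inputs_p = np.concatenate([A[perm][:, perm], X[perm]], axis=1)
hidden_p = np.maximum(inputs_p @ W, 0)
print(np.allclose(hidden[perm], hidden_p))     # False: not order-invariant
```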
Graph Convolutional Networks
Problem: for a given subgraph, how do we come up with a canonical node ordering?
Learning convolutional neural networks for graphs. M. Niepert, M. Ahmed, K. Kutzkov. ICML 2016.
Our Approach: GraphSAGE
Learn how to propagate information across the graph to compute node features
 Idea: a node's neighborhood defines a computation graph
 1) Determine the node's computation graph, then 2) propagate and transform information along it
Semi-Supervised Classification with Graph Convolutional Networks. T. N. Kipf, M. Welling, ICLR 2017.
Our Approach: GraphSAGE
Update for node i: the (k+1)-th level features combine a transform of i's own level-k features with the transformed, aggregated features of its neighbors n:

$$h_i^{(k+1)} = \mathrm{ReLU}\Big(W_k\, h_i^{(k)},\ \textstyle\sum_{n \in \mathcal{N}(i)} \mathrm{ReLU}\big(Q_k\, h_n^{(k)}\big)\Big)$$

 h_i^{(0)} = the attributes of node i
 Σ(⋅): the aggregator function (e.g., average, LSTM, max-pooling)
(A toy sketch of this update follows below.)
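A toy sketch of one such update with a mean aggregator. The slide leaves the combination of the two ReLU terms implicit; this sketch simply sums them (concatenation is another common choice), so treat it as one plausible reading, not the definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(scale=0.5, size=(d, d))   # transform for the node's own features
Q = rng.normal(scale=0.5, size=(d, d))   # transform for neighbor messages
H = rng.normal(size=(6, d))              # level-k features; h_i^(0) = attributes
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # toy ring graph

def relu(x):
    return np.maximum(x, 0)

def graphsage_update(i):
    """h_i^(k+1): own transform combined with aggregated neighbor messages."""
    agg = np.mean([relu(Q @ H[n]) for n in adj[i]], axis=0)
    return relu(W @ H[i] + agg)

print(graphsage_update(0))
```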
GraphSAGE: Example
(Computation-graph figure: every node at depth k applies the same shared parameters W(k), Q(k).)
Supervised training identifies the parameters W(k), Q(k)
GraphSAGE: Benefits
 Can use different aggregators: mean (simple element-wise mean), LSTM (applied to a random order of the neighbors), max-pooling (element-wise max)
 Can use different loss functions: cross-entropy, hinge loss, ranking loss
 The model has a constant number of parameters
 Fast, scalable inference
 Can be applied to any node in any network
Application: Pinterest
Human-curated collections of pins
 Pin: a visual bookmark someone has saved from the internet to a board they've created; a pin has an image, text, and a link
 Board: a collection of ideas (pins that have something in common)
Pinterest Graph
Graph: 2B pins, 1B boards, 17B edges
 The graph is dynamic: need to apply to new nodes without model retraining
 Rich node features: content, images
Task: Item-Item Recs
Related-pin recommendations
 Given that a user is looking at pin Q, which pin X are they going to save next?
(Training examples per query: the query pin, a positive, a hard negative, and a random negative.)
GraphSAGE Training
 Leverage the inductive capability and train on individual subgraphs
 300 million nodes, 1 billion edges, 1.2 billion pin pairs (Q, X)
 Large batch size: 2048 examples per minibatch
GraphSAGE: Inference
 Use MapReduce for model inference
 Avoids repeated computation
Experiments
Related-pin recommendations
 Given that a user is looking at pin Q, predict which pin X they are going to save next
 Baselines for comparison:
 Visual: VGG-16 visual features
 Annotation: Word2Vec model
 Combined: combined visual and annotation features
 RW: random-walk based algorithm
 GraphSAGE
 Setup: embed 2B pins, then perform nearest-neighbor search to generate recommendations
Results: Ranking
Task: given Q, rank X as high as possible among 2B pins
 Hit-rate: percentage of queries where the positive pin was among the top k
 MRR: mean reciprocal rank
(Both metrics are sketched in code below.)

Method      Hit-rate   MRR
Visual      17%        0.23
Annotation  14%        0.19
Combined    27%        0.37
GraphSAGE   46%        0.56
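Both metrics are simple to state in code, assuming we know the 1-based rank of the positive pin for each query (this sketch and its names are ours):

```python
def hit_rate(ranks, k):
    """Fraction of queries whose positive pin landed in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank of the positive pin."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 10, 2, 500]    # 1-based rank of the positive, one per query
print(hit_rate(ranks, k=10))  # 0.8
print(mrr(ranks))             # ~0.39
```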
Results: User Study
 User study: which recommendation do you prefer?

Method                     Win     Lose    Draw    Fraction of wins
GraphSAGE vs. Visual       26.7%   18.6%   54.7%   58.9%
GraphSAGE vs. Annotation   28.4%   16.1%   55.5%   63.8%
GraphSAGE vs. RW           32.2%   21.4%   46.4%   60.1%
Example Recommendations
(Example queries with GraphSAGE recommendations; figures omitted.)
GraphSAGE: Summary
 Graph convolutional networks
 Generalize beyond simple convolutions
 Fuse node features & graph information
 State-of-the-art accuracy for node classification and link prediction
 Model size is independent of graph size; can scale to billions of nodes
 Largest embedding to date (3B nodes, 17B edges)
 Leads to significant performance gains
Conclusion
Feature learning for networks: map each node u to a vector, its feature representation (embedding): f: u → ℝ^d
Conclusion
Results from the past 1-2 years have shown:
 The representation-learning paradigm can be extended to graphs
 No feature engineering is necessary
 Node attribute data can be effectively combined with the network information
 State-of-the-art results in a number of domains/tasks
 Use end-to-end training instead of multi-stage approaches for better performance
Conclusion
Next steps:
 Multimodal & dynamic/evolving settings
 Domain-specific adaptations (e.g., for recommender systems)
 Graph generation
 Prediction beyond simple pairwise edges
 Multi-hop edge prediction
 Theory
PhD Students, Post-Doctoral Fellows, and Research Staff: Claire Donnat, Mitchell Gordon, David Hallac, Emma Pierson, Himabindu Lakkaraju, Rex Ying, Tim Althoff, Will Hamilton, David Jurgens, Marinka Zitnik, Michele Catasta, Srijan Kumar, Stephen Bach, Rok Sosic, Peter Kacin, Geet Sethi
Collaborators: Dan Jurafsky (Linguistics, Stanford University); Christian Danescu-Niculescu-Mizil (Information Science, Cornell University); Stephen Boyd (Electrical Engineering, Stanford University); David Gleich (Computer Science, Purdue University); VS Subrahmanian (Computer Science, University of Maryland); Sarah Kunz (Medicine, Harvard University); Russ Altman (Medicine, Stanford University); Jochen Profit (Medicine, Stanford University); Eric Horvitz (Microsoft Research); Jon Kleinberg (Computer Science, Cornell University); Sendhil Mullainathan (Economics, Harvard University); Scott Delp (Bioengineering, Stanford University); Jens Ludwig (Harris Public Policy, University of Chicago)
Funding and industry partnerships (logos omitted)
Post-doc positions open! Email us at jure@cs.stanford.edu
References
 node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.
 Predicting multicellular function through multi-layer tissue networks. M. Zitnik, J. Leskovec. Bioinformatics, 2017.
 Inductive Representation Learning on Large Graphs. W. Hamilton, R. Ying, J. Leskovec. NIPS 2017.
 Representation Learning on Graphs: Methods and Applications. W. Hamilton, R. Ying, J. Leskovec. IEEE Data Engineering Bulletin, 2017.
 Code: http://snap.stanford.edu/node2vec