Large-Scale Graph Processing: Introduction (Complete Edition)
http://www.catehuston.com/blog/2009/11/02/touchgraph/
Hadoop MapReduce Design Patterns: Large-Scale Text Data Processing with MapReduce

1 By Jimmy Lin and Chris Dyer; supervised by 神林 飛志 and 野村 直之; translated by 玉川 竜司
2 Scheduled for release on October 1, 2011
3 210 pages
4 List price: 2,940 yen
[Figure: MapReduce iterations i and i+1, annotated with "Shuffle & barrier" and "job start/shutdown"]
[Figure: weighted graph with vertices A to G, shown at iteration i and at iteration i+1; annotations in the figure: "5 → 4" and "min(6,4)"]
[Figure: a sequence of stages, each labeled "a super step"; source: http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel]
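The superstep structure in the figure can be sketched with plain Python threads. This is only an illustration under assumptions: the ring-shaped communication pattern, the function name `bsp_ring_max` and the use of `threading.Barrier` are not from the slides. In every superstep each worker sends its value to its right neighbour, waits at the barrier, and then computes on the message it received.

```python
import threading


def bsp_ring_max(values):
    """Propagate the global maximum around a ring of workers.

    Hypothetical example: after len(values) - 1 supersteps every worker
    holds the maximum of all initial values.
    """
    n = len(values)
    local = list(values)            # one local value per worker
    inbox = [None] * n              # one message slot per worker
    barrier = threading.Barrier(n)  # synchronisation point shared by all workers

    def worker(rank):
        for _ in range(n - 1):      # one loop pass = one superstep
            # communication: send the local value to the right neighbour
            inbox[(rank + 1) % n] = local[rank]
            # barrier: every message of this superstep has been delivered
            barrier.wait()
            # local computation on the received message
            local[rank] = max(local[rank], inbox[rank])
            # second wait keeps inboxes stable until every worker has read
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local


print(bsp_ring_max([3, 9, 1, 4]))
```

No worker starts the next superstep before all workers have finished the current one, which is the property the barrier provides.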
[Figure sequence: single-source shortest paths from A on the weighted graph with vertices A to G, computed with iterated MapReduce. Distance label of each vertex after each iteration:]

Iteration    A    B    C    D    E    F    G
initialize   0    +∞   +∞   +∞   +∞   +∞   +∞
1            0    5    +∞   3    +∞   +∞   +∞
2            0    4    6    3    6    5    +∞
3            0    4    6    3    5    5    9
end
# Mapper, Reducer, emit() and is_node() are assumed to be provided by the MapReduce framework.
class ShortestPathMapper(Mapper):
  def map(self, node_id, node):
    # send graph structure
    emit(node_id, node)
    # get node value and add it to edge distance
    dist = node.get_value()
    for neighbour_node_id in node.get_adjacency_list():
      dist_to_nbr = node.get_distance(node_id, neighbour_node_id)
      emit(neighbour_node_id, dist + dist_to_nbr)

class ShortestPathReducer(Reducer):
  def reduce(self, node_id, dist_list):
    node = None
    min_dist = float("inf")
    for dist in dist_list:
      # dist_list contains the Node itself among the candidate distances
      if is_node(dist):
        node = dist
      elif dist < min_dist:
        min_dist = dist
    # keep the current value when no shorter distance arrived (e.g. the source node)
    node.set_value(min(node.get_value(), min_dist))
    emit(node_id, node)
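The mapper and reducer can be tried out without a cluster. The sketch below is a minimal in-memory simulation, not the deck's own code: the edge list in `GRAPH` is inferred from the slide figures and should be treated as an assumption, and `map_phase`, `reduce_phase`, `run_iteration` and `shortest_paths` are hypothetical names. Each call to `run_iteration` stands for one MapReduce job (map, shuffle, reduce).

```python
import math
from collections import defaultdict

# Assumption: directed edges and weights inferred from the slide figures.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}


def map_phase(node_id, node):
    yield node_id, node                    # pass the graph structure along
    for nbr, weight in node["adj"].items():
        yield nbr, node["dist"] + weight   # tentative distance via this node


def reduce_phase(node_id, values):
    node, best = None, math.inf
    for value in values:
        if isinstance(value, dict):        # the node record itself
            node = value
        else:                              # a candidate distance
            best = min(best, value)
    return node_id, {"adj": node["adj"], "dist": min(node["dist"], best)}


def run_iteration(nodes):
    groups = defaultdict(list)             # shuffle: group values by key
    for node_id, node in nodes.items():
        for key, value in map_phase(node_id, node):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


def shortest_paths(graph, source):
    nodes = {n: {"adj": adj, "dist": 0 if n == source else math.inf}
             for n, adj in graph.items()}
    jobs = 0
    while True:
        new_nodes = run_iteration(nodes)
        jobs += 1
        if all(new_nodes[n]["dist"] == nodes[n]["dist"] for n in nodes):
            return {n: v["dist"] for n, v in new_nodes.items()}, jobs
        nodes = new_nodes


distances, jobs = shortest_paths(GRAPH, "A")
print(distances, jobs)
```

With the assumed edge list, three iterations change distances and a fourth job is needed only to detect that nothing changed, so the driver runs four jobs in total. Every one of them pays for job start/shutdown and a full shuffle.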
# In-Mapper Combiner
# Mapper, emit() and is_exceed_limit_buffer_size() are assumed to be provided elsewhere.
class ShortestPathMapper(Mapper):
  def __init__(self):
    # smallest distance seen so far for each neighbour
    self.buffer = {}

  def check_and_put(self, key, value):
    if key not in self.buffer or value < self.buffer[key]:
      self.buffer[key] = value

  def check_and_emit(self):
    # flush early when the buffer grows past its size limit
    if is_exceed_limit_buffer_size(self.buffer):
      for key, value in self.buffer.items():
        emit(key, value)
      self.buffer = {}

  def close(self):
    # flush whatever is still buffered once the input is exhausted
    for key, value in self.buffer.items():
      emit(key, value)

  def map(self, node_id, node):
    # send graph structure
    emit(node_id, node)
    # get node value and add it to edge distance
    dist = node.get_value()
    for nbr_node_id in node.get_adjacency_list():
      dist_to_nbr = node.get_distance(node_id, nbr_node_id)
      dist_nbr = dist + dist_to_nbr
      self.check_and_put(nbr_node_id, dist_nbr)
      self.check_and_emit()
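The effect of the buffer can be seen on a small input split. The following sketch is illustrative only: the split contents (vertices A and D with distances 0 and 3) and the function names `map_plain` and `map_with_in_mapper_combining` are assumptions, and the early flush on a full buffer is left out for brevity.

```python
def map_plain(split):
    """Emit one (neighbour, distance) pair per edge."""
    out = []
    for dist, adj in split.values():
        for nbr, weight in adj.items():
            out.append((nbr, dist + weight))
    return out


def map_with_in_mapper_combining(split):
    """Keep only the smallest distance per neighbour and emit once at the end."""
    buffer = {}
    for dist, adj in split.values():
        for nbr, weight in adj.items():
            candidate = dist + weight
            if nbr not in buffer or candidate < buffer[nbr]:
                buffer[nbr] = candidate
    return sorted(buffer.items())


# Hypothetical split: node id -> (current distance, adjacency list with weights)
split = {
    "A": (0, {"B": 5, "D": 3}),
    "D": (3, {"B": 1, "C": 3, "F": 2}),
}

print(len(map_plain(split)))                 # five messages, two of them for B
print(map_with_in_mapper_combining(split))   # one message per neighbour
```

Both A and D propose a distance for B here; only the smaller one leaves the mapper, so fewer records go through the shuffle.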
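# Schimmy trick
# P (a handle on this reducer's graph partition), Reducer and emit() are assumed to be
# provided elsewhere; P.read() is assumed to resume where the previous call stopped.
class ShortestPathReducer(Reducer):
  def __init__(self):
    P.open_graph_partition()

  def emit_precede_node(self, node_id):
    # pass through the nodes that precede node_id, then return node_id's own Node
    for pre_node_id, node in P.read():
      if node_id == pre_node_id:
        return node
      else:
        emit(pre_node_id, node)

  def reduce(self, node_id, dist_list):
    node = self.emit_precede_node(node_id)
    min_dist = float("inf")
    for dist in dist_list:
      if dist < min_dist:
        min_dist = dist
    # keep the current value when no shorter distance arrived
    node.set_value(min(node.get_value(), min_dist))
    emit(node_id, node)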
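The reducer above amounts to a merge join between two inputs sorted by node id: the graph partition read from storage and the shuffled distance messages. A minimal in-memory sketch, assuming both inputs are plain sorted lists (the function name `schimmy_reduce` and the sample data are hypothetical):

```python
import math


def schimmy_reduce(partition, messages):
    """Merge a sorted graph partition with sorted reducer input.

    partition: [(node_id, dist), ...] sorted by node_id
    messages:  [(node_id, [dist, ...]), ...] sorted by node_id
    """
    out = []
    nodes = iter(partition)     # shared iterator: each scan resumes where the last stopped
    for node_id, dists in messages:
        for pid, pdist in nodes:
            if pid == node_id:
                # matching node: take the smallest known distance
                out.append((pid, min(pdist, min(dists, default=math.inf))))
                break
            out.append((pid, pdist))   # node without messages passes through
    out.extend(nodes)           # flush nodes that follow the last key
    return out


partition = [("A", 0), ("B", 5), ("C", math.inf), ("D", 3),
             ("E", math.inf), ("F", math.inf), ("G", math.inf)]
messages = [("B", [5, 4]), ("C", [6]), ("D", [3]), ("E", [6]), ("F", [5])]

print(schimmy_reduce(partition, messages))
```

Because the node records come from the partition, the mapper no longer has to send the graph structure through the shuffle; only distances travel over the network.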
[Figure sequence: the same shortest-path computation from A in the Pregel model. Distance label of each vertex in each superstep:]

Superstep    A    B    C    D    E    F    G
1            0    +∞   +∞   +∞   +∞   +∞   +∞
2            0    5    +∞   3    +∞   +∞   +∞
3            0    4    6    3    6    5    +∞
4            0    4    6    3    5    5    9
5            0    4    6    3    5    5    9
end
# The vertex API used here (is_source, get_value, send_message, ...) is assumed
# to be provided by the Pregel-style framework.
class ShortestPathVertex:
  def compute(self, msgs):
    min_dist = 0 if self.is_source() else float("inf")
    # get values from all incoming edges.
    for msg in msgs:
      min_dist = min(min_dist, msg.get_value())
    if min_dist < self.get_value():
      # update current value (state).
      self.set_current_value(min_dist)
      # send new value to outgoing edges.
      out_edge_iterator = self.get_out_edge_iterator()
      for out_edge in out_edge_iterator:
        recipient = out_edge.get_other_element(self.get_id())
        self.send_message(recipient.get_id(),
                          min_dist + out_edge.get_distance())
    self.vote_to_halt()
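The vertex program can be simulated with a small superstep loop. This is a sketch under assumptions rather than a real Pregel API: the edge list in `GRAPH` is inferred from the slide figures, and `pregel_sssp` is a hypothetical name. A vertex that has voted to halt becomes active again only when a message arrives, and the run ends when no vertex is active.

```python
import math

# Assumption: directed edges and weights inferred from the slide figures.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}


def pregel_sssp(graph, source):
    value = {v: math.inf for v in graph}
    inbox = {v: [] for v in graph}
    active = set(graph)              # every vertex is active in the first superstep
    supersteps = 0
    while active:
        supersteps += 1
        outbox = {v: [] for v in graph}
        for v in active:
            # compute(): smallest distance among the incoming messages
            min_dist = 0 if v == source else math.inf
            min_dist = min([min_dist] + inbox[v])
            if min_dist < value[v]:
                value[v] = min_dist
                for nbr, weight in graph[v].items():
                    outbox[nbr].append(min_dist + weight)
            # the vertex then votes to halt (modeled by rebuilding `active` below)
        inbox = outbox               # barrier: messages become visible next superstep
        active = {v for v in graph if inbox[v]}   # messages reactivate vertices
    return value, supersteps


distances, supersteps = pregel_sssp(GRAPH, "A")
print(distances, supersteps)
```

With the assumed edge list the run takes five supersteps, matching the frames labeled 1 to 5 in the figure sequence. Unlike the MapReduce version, the graph structure stays in place and only distance messages move between supersteps.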
Pregel
[Excerpt from the first page of the HAMA paper shown on the slide; the top of the author block is cut off]

Authors (as far as visible):
  ... Science and Technology), South Korea, swseo@calab.kaist.ac.kr
  ... edwardyoon@apache.org
  ... Science and Technology), South Korea, jaehong@calab.kaist.ac.kr
  Seongwook Jin, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, swjin@calab.kaist.ac.kr
  Jin-Soo Kim, School of Information and Communication, Sungkyunkwan University, South Korea, jinsookim@skku.edu
  Seungryoul Maeng, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, maeng@calab.kaist.ac.kr

Abstract: APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

I. INTRODUCTION

As cloud computing environment emerges, Google has introduced the MapReduce framework to accelerate parallel and distributed computing on more than a thousand of inexpensive machines. Google has shown that the MapReduce framework is easy to use and provides massive scalability with extensive fault tolerance [2]. Especially, MapReduce fits well with complex data-intensive computations such as high-dimensional scientific simulation, machine learning, and data mining. Google and Yahoo! are known to operate dedicated clusters for MapReduce applications, each cluster consisting of several thousands of nodes. One of typical MapReduce

HAMA is a distributed framework on Hadoop for massive matrix and graph computations. HAMA aims at a powerful tool for various scientific applications, providing basic primitives for developers and researchers with simple APIs. HAMA is currently being incubated as one of the subprojects of Hadoop by the Apache Software Foundation [10].

Figure 1 illustrates the overall architecture of HAMA.

Fig. 1. The overall architecture of HAMA. (http://wiki.apache.org/hama/Articles)
  HAMA API
  HAMA Core, HAMA Shell
  Computation Engine (Plugged In/Out): MapReduce, BSP, Dryad
  Distributed Locking: Zookeeper
  Storage Systems: HBase, HDFS, RDBMS, File

Large-Scale Graph Processing: Introduction (Complete Edition)

  • 4. Hadoop MapReduce Design Patterns: Large-Scale Text Data Processing with MapReduce. By Jimmy Lin and Chris Dyer; supervised by 神林 飛志 and 野村 直之; translated by 玉川 竜司. Scheduled for release on October 1, 2011. 210 pages. List price: 2,940 yen.
  • 8. [Figure: MapReduce iterations i and i+1; labels: "Shuffle & barrier", "job start/shutdown"]
  • 10. [Figure: example weighted graph with vertices A to G and edge weights between 1 and 5]
  • 11. [Figure: the example graph at iteration i and at iteration i+1; annotations on the updated vertices: "5→4" and "min(6,4)"]
  • 12. a super step http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel
  • 24–35. [Figure sequence: step-by-step shortest-path computation from A on the example graph (iteration counter 1 to 3, then end). Initialization: A = 0, every other vertex +∞. After iteration 1: B = 5, D = 3. After iteration 2: B = 4, C = 6, E = 6, F = 5. After iteration 3: E = 5, G = 9. Final distances: A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9.]
  • 36.
      class ShortestPathMapper(Mapper):
          def map(self, node_id, Node):
              # emit() stands for the framework's output call (pseudocode)
              # send graph structure
              emit(node_id, Node)
              # get node value and add it to edge distance
              dist = Node.get_value()
              for neighbour_node_id in Node.get_adjacency_list():
                  dist_to_nbr = Node.get_distance(node_id, neighbour_node_id)
                  emit(neighbour_node_id, dist + dist_to_nbr)
  • 37.
      class ShortestPathReducer(Reducer):
          def reduce(self, node_id, dist_list):
              min_dist = float("inf")  # replaces sys.maxint, which Python 3 dropped
              for dist in dist_list:
                  # dist_list contains a Node
                  if is_node(dist):
                      Node = dist
                  elif dist < min_dist:
                      min_dist = dist
              # keep the current value when no shorter distance arrived
              # (the source node, for example, receives no distances)
              Node.set_value(min(Node.get_value(), min_dist))
              emit(node_id, Node)
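The mapper and reducer on slides 36 and 37 can be tried out without a cluster. The following is a minimal sketch, not the author's implementation: an in-memory dictionary stands in for the shuffle phase, and the names `GRAPH`, `map_node`, `reduce_node`, `run_iteration` and `shortest_paths` are hypothetical. The edge directions in `GRAPH` are an assumption, reconstructed so that they reproduce the distances shown on slides 24 to 35.

```python
import math
from collections import defaultdict

# Assumed example graph: edge directions are reconstructed from the
# distances on the slides, not read from the original figure.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}

def map_node(node_id, node):
    """Emit the node itself, then one candidate distance per neighbour."""
    yield node_id, node  # pass the graph structure to the reducer
    for nbr, weight in node["adj"].items():
        yield nbr, node["dist"] + weight

def reduce_node(node_id, values):
    """Pick the smallest candidate distance, keeping the current one if better."""
    node, min_dist = None, math.inf
    for value in values:
        if isinstance(value, dict):  # the node structure
            node = value
        else:                        # a candidate distance
            min_dist = min(min_dist, value)
    return node_id, {"adj": node["adj"], "dist": min(node["dist"], min_dist)}

def run_iteration(nodes):
    """One MapReduce job: map every node, group by key, reduce every key."""
    shuffled = defaultdict(list)  # in-memory stand-in for shuffle and sort
    for node_id, node in nodes.items():
        for key, value in map_node(node_id, node):
            shuffled[key].append(value)
    return dict(reduce_node(key, values) for key, values in shuffled.items())

def shortest_paths(graph, source):
    """Repeat jobs until an iteration changes no distance."""
    nodes = {n: {"adj": adj, "dist": 0 if n == source else math.inf}
             for n, adj in graph.items()}
    iterations = 0
    while True:
        new_nodes = run_iteration(nodes)
        iterations += 1
        if all(new_nodes[n]["dist"] == nodes[n]["dist"] for n in nodes):
            return {n: new_nodes[n]["dist"] for n in sorted(new_nodes)}, iterations
        nodes = new_nodes
```

Calling `shortest_paths(GRAPH, "A")` returns the distances A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9 after four jobs; the fourth job changes nothing and only confirms convergence. Every job re-reads and re-shuffles the whole graph, which is the overhead the later slides try to reduce.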
  • 42.
      # In-Mapper Combiner
      class ShortestPathMapper(Mapper):
          def __init__(self):
              self.buffer = {}

          def check_and_put(self, key, value):
              # keep only the smallest distance seen for each key
              if key not in self.buffer or value < self.buffer[key]:
                  self.buffer[key] = value

          def check_and_emit(self):
              # flush the buffer once it grows past the size limit
              if is_exceed_limit_buffer_size(self.buffer):
                  for key, value in self.buffer.items():
                      emit(key, value)
                  self.buffer = {}

          def close(self):
              # flush whatever is left when the map task ends
              for key, value in self.buffer.items():
                  emit(key, value)
  • 43.
      # ...continued
          def map(self, node_id, Node):
              # send graph structure
              emit(node_id, Node)
              # get node value and add it to edge distance
              dist = Node.get_value()
              for nbr_node_id in Node.get_adjacency_list():
                  dist_to_nbr = Node.get_distance(node_id, nbr_node_id)
                  dist_nbr = dist + dist_to_nbr
                  self.check_and_put(nbr_node_id, dist_nbr)
              self.check_and_emit()
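The effect of the in-mapper combiner on slides 42 and 43 can be seen on a small input. This is a sketch under assumptions: the class `InMapperCombiner`, the function `map_split`, the `emit` callback and the `max_buffer_size` parameter are hypothetical names, and the two-node split reuses distances and edge weights assumed from the example graph. Combining is safe here because taking a minimum is associative and commutative.

```python
class InMapperCombiner:
    """Buffer the smallest value per key and emit only when flushed."""

    def __init__(self, emit, max_buffer_size=1000):
        self.emit = emit                      # callback receiving (key, value)
        self.max_buffer_size = max_buffer_size
        self.buffer = {}

    def put(self, key, value):
        # Keep only the minimum distance seen so far for this key.
        if key not in self.buffer or value < self.buffer[key]:
            self.buffer[key] = value
        # Bound memory use: flush when the buffer holds too many keys.
        if len(self.buffer) >= self.max_buffer_size:
            self.flush()

    def flush(self):
        for key, value in self.buffer.items():
            self.emit(key, value)
        self.buffer = {}

def map_split(split, combiner):
    """Process every node of one input split, then flush (the close() step)."""
    for node_id, (dist, adjacency) in split.items():
        for nbr, weight in adjacency.items():
            combiner.put(nbr, dist + weight)
    combiner.flush()

# One split holding two nodes that both point at B (assumed values).
split = {
    "A": (0, {"B": 5, "D": 3}),
    "D": (3, {"B": 1, "C": 3, "F": 2}),
}
emitted = []
map_split(split, InMapperCombiner(lambda key, value: emitted.append((key, value))))
```

Without the buffer this split would emit five distance pairs, two of them for B. With it, `emitted` holds four pairs and B appears once with the smaller value 4. An early flush caused by a small `max_buffer_size` may emit a key twice, which is still correct because the reducer takes the minimum again.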
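  • 48.
      # Shimmy trick
      class ShortestPathReducer(Reducer):
          def __init__(self):
              P.open_graph_partition()

          def emit_precede_node(self, node_id):
              # read the graph partition in key order; pass through every
              # node that precedes node_id, then return the matching node
              for pre_node_id, Node in P.read():
                  if node_id == pre_node_id:
                      return Node
                  else:
                      emit(pre_node_id, Node)
  • 49.
      # (...continued)
          def reduce(self, node_id, dist_list):
              Node = self.emit_precede_node(node_id)
              min_dist = float("inf")  # replaces sys.maxint, which Python 3 dropped
              for dist in dist_list:
                  if dist < min_dist:
                      min_dist = dist
              # keep the current value when no shorter distance arrived
              Node.set_value(min(Node.get_value(), min_dist))
              emit(node_id, Node)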
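The reducer on slides 48 and 49 reads the graph structure from a partition file instead of receiving it through the shuffle, which amounts to a merge between two key-sorted streams. The sketch below illustrates that merge under assumptions: `shimmy_reduce` is a hypothetical name, both inputs are plain lists already sorted by node id, and every message key is assumed to exist in the partition. Unlike the slide code, it also emits the nodes that follow the last message key.

```python
import math

def shimmy_reduce(partition, messages):
    """Merge a sorted graph partition with sorted (node_id, distances) groups.

    partition: list of (node_id, node) sorted by node_id
    messages:  list of (node_id, [distance, ...]) sorted by node_id
    Yields every node of the partition, updated where messages arrived.
    """
    stored = iter(partition)  # shared iterator, so each node is read once
    for node_id, dists in messages:
        for stored_id, node in stored:
            if stored_id == node_id:
                # Matching node: keep the smaller of old and new distance.
                yield stored_id, dict(node, dist=min([node["dist"], *dists]))
                break
            # Node without messages: pass it through unchanged.
            yield stored_id, node
    # Nodes after the last message key are passed through as well.
    yield from stored

partition = [
    ("A", {"dist": 0}),
    ("B", {"dist": 5}),
    ("C", {"dist": math.inf}),
    ("D", {"dist": 3}),
]
messages = [("B", [4]), ("C", [6])]
result = list(shimmy_reduce(partition, messages))
```

Here `result` contains all four nodes in key order, with B lowered to 4 and C set to 6. The merge only works if the partition file and the reducer input share the same partitioning and sort order.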
  • 51–62. [Figure sequence: second walkthrough of the same shortest-path computation, with a step counter running 1 to 5, then end. Step 1: A = 0, every other vertex +∞. Step 2: B = 5, D = 3. Step 3: B = 4, C = 6, E = 6, F = 5. Step 4: E = 5, G = 9. Step 5: no further changes. Final distances: A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9.]
  • 63.
      class ShortestPathVertex:
          def compute(self, msgs):
              min_dist = 0 if self.is_source() else float("inf")
              # get values from all incoming edges.
              for msg in msgs:
                  min_dist = min(min_dist, msg.get_value())
              if min_dist < self.get_value():
                  # update current value (state).
                  self.set_current_value(min_dist)
                  # send new value to outgoing edge.
                  out_edge_iterator = self.get_out_edge_iterator()
                  for out_edge in out_edge_iterator:
                      recipient = out_edge.get_other_element(self.get_id())
                      self.send_message(recipient.get_id(),
                                        min_dist + out_edge.get_distance())
              self.vote_to_halt()
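The vertex program on slide 63 can be simulated in a single process to see how supersteps, messages and halting interact. This is a sketch under assumptions, not a real Pregel or Hama API: `pregel_sssp` is a hypothetical name, dictionaries stand in for message passing, and the edge directions in `GRAPH` are reconstructed from the distances on the slides.

```python
import math
from collections import defaultdict

# Assumed example graph: edge directions are reconstructed from the
# distances on the slides, not read from the original figure.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}

def pregel_sssp(graph, source):
    """Run the vertex program superstep by superstep until all vertices halt."""
    value = {v: math.inf for v in graph}
    active = set(graph)   # every vertex is active in the first superstep
    inbox = {}            # messages delivered at the last barrier
    supersteps = 0
    while active:
        outbox = defaultdict(list)
        for v in active:
            # compute(): smallest incoming distance, 0 for the source
            min_dist = 0 if v == source else math.inf
            for msg in inbox.get(v, ()):
                min_dist = min(min_dist, msg)
            if min_dist < value[v]:
                value[v] = min_dist
                for nbr, weight in graph[v].items():
                    outbox[nbr].append(min_dist + weight)
            # vote_to_halt(): v stays inactive until a message arrives
        # Barrier: deliver messages; only their recipients wake up again.
        inbox = outbox
        active = set(outbox)
        supersteps += 1
    return value, supersteps
```

On this graph `pregel_sssp(GRAPH, "A")` reaches the distances A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9 in five supersteps, in line with the step counter on slides 51 to 62. Only vertices that received a message do any work in a superstep, and the graph structure never moves between steps.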
  • 72. Source: http://wiki.apache.org/hama/Articles

      Authors (the top of the author block is cut off):
      [...] Science and Technology), South Korea; edwardyoon@apache.org, swseo@calab.kaist.ac.kr, jaehong@calab.kaist.ac.kr
      Seongwook Jin, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, swjin@calab.kaist.ac.kr
      Jin-Soo Kim, School of Information and Communication, Sungkyunkwan University, South Korea, jinsookim@skku.edu
      Seungryoul Maeng, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, maeng@calab.kaist.ac.kr

      Abstract: APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

      I. INTRODUCTION
      As cloud computing environment emerges, Google has introduced the MapReduce framework to accelerate parallel and distributed computing on more than a thousand of inexpensive machines. Google has shown that the MapReduce framework is easy to use and provides massive scalability with extensive fault tolerance [2]. Especially, MapReduce fits well with complex data-intensive computations such as high-dimensional scientific simulation, machine learning, and data mining. Google and Yahoo! are known to operate dedicated clusters for MapReduce applications, each cluster consisting of several thousands of nodes. One of typical MapReduce [...]

      HAMA is a distributed framework on Hadoop for massive matrix and graph computations. HAMA aims at a powerful tool for various scientific applications, providing basic primitives for developers and researchers with simple APIs. HAMA is currently being incubated as one of the subprojects of Hadoop by the Apache Software Foundation [10]. Figure 1 illustrates the overall architecture of HAMA.

      [Fig. 1. The overall architecture of HAMA. Layers, top to bottom: HAMA API; HAMA Core and HAMA Shell; Computation Engine (plugged in/out): MapReduce, BSP, Dryad; Zookeeper (distributed locking); Storage Systems: HBase, HDFS, RDBMS, File.]