Hadoop

Preferred Infrastructure

August 25, 2008

• NTT Resonant / Preferred Infrastructure

• Preferred Infrastructure
  NTT Resonant

• Contact:
  Preferred Infrastructure: E-mail: info@preferred.jp
  NTT Resonant: E-mail: pr@nttr.co.jp

Copyright © NTT Resonant Inc. 2008
Contents

1
1.1

2 Hadoop

3 GFS and HDFS
3.1 GFS
      3.1.1
      3.1.2
      3.1.3 HDFS
3.2
3.3
      3.3.1
      3.3.2
      3.3.3
      3.3.4
      3.3.5
      3.3.6
      3.3.7
      3.3.8
      3.3.9
      3.3.10
      3.3.11
      3.3.12
      3.3.13
      3.3.14
      3.3.15
      3.3.16
3.4
      3.4.1
      3.4.2
      3.4.3
      3.4.4
3.5
      3.5.1
3.6
      3.6.1
      3.6.2
      3.6.3
      3.6.4
      3.6.5
      3.6.6
      3.6.7
      3.6.8
      3.6.9
      3.6.10 (Read-Only)
3.7

4 Google MapReduce and Hadoop MapReduce
4.1 Google MapReduce
      4.1.1
      4.1.2
      4.1.3 Hadoop MapReduce
4.2
4.3
      4.3.1 MapReduce
      4.3.2
      4.3.3 Shuffle
      4.3.4 Map, Reduce
      4.3.5 Map
      4.3.6
      4.3.7
      4.3.8
      4.3.9
4.4
      4.4.1
      4.4.2
      4.4.3
4.5
      4.5.1 Combine
      4.5.2
      4.5.3 Map, Shuffle
      4.5.4
      4.5.5 Map
4.6
      4.6.1
      4.6.2
      4.6.3
      4.6.4
      4.6.5
4.7

5
5.1
5.2 org.apache.hadoop.util
      5.2.1 MergeSort
      5.2.2 PriorityQueue
      5.2.3 ReflectionUtils
      5.2.4 RunJar
      5.2.5 Tool
5.3 org.apache.hadoop.io
      5.3.1 Writable
      5.3.2 SequenceFile
      5.3.3 compress
5.4 org.apache.hadoop.ipc
      5.4.1 VersionedProtocol
      5.4.2 RPC, Server, Client
5.5 org.apache.hadoop.net
      5.5.1 DNS
      5.5.2 Node, NodeBase
      5.5.3 NetworkTopology
5.6 org.apache.hadoop.fs
      5.6.1 FileSystem
      5.6.2 LocalFileSystem
      5.6.3 InMemoryFileSystem
      5.6.4 FSOutputSummer, FSInputStream
      5.6.5 Path
      5.6.6 Trash
      5.6.7 FileUtil
      5.6.8 FsShell
      5.6.9 DU, DF
5.7 org.apache.hadoop.dfs
      5.7.1 ClientProtocol
      5.7.2 DatanodeProtocol
      5.7.3 NamenodeProtocol
      5.7.4 DistributedFileSystem
      5.7.5 DFSClient
      5.7.6 DataNode
      5.7.7 NameNode
      5.7.8 FSNamesystem
      5.7.9 FSImage, FSEditLog
      5.7.10 ReplicationTargetChooser
      5.7.11 SecondaryNameNode
      5.7.12 Balancer
      5.7.13 NamenodeFsck
5.8 org.apache.hadoop.mapred
      5.8.1 JobConf
      5.8.2 InputFormat
      5.8.3 OutputFormat
      5.8.4 JobClient
      5.8.5 JobTracker
      5.8.6 TaskTracker
      5.8.7 StatusHttpServer
5.9

6
6.1
      6.1.1
6.2 HDFS
      6.2.1
      6.2.2
6.3
      6.3.1
      6.3.2
6.4

7




2.1 Google, OSS
5.1
5.2
5.3
5.4
5.5
5.6 JobConf
5.7 JobConf
6.1 bonnie++
6.2 1G * 100
6.3 1G * 100 ((MB) / )
6.4 1G * 100
6.5 1G * 100 ((MB) / )
6.6 100G (randomwriter.conf)
6.7 100G
6.8 100G ( / )
6.9 100G ((MB) / )
6.10
6.11 100G ( / )
6.12 100G ((MB) / )

3.1 Google File System and Hadoop
4.1 Google MapReduce and Hadoop
5.1
6.1
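1

1.1
This report examines Hadoop[4]. It is organized as follows:

   • Chapter 2: Hadoop
   • Chapters 3 and 4: Google's Google File System[10] and MapReduce[9], compared with their Hadoop counterparts
   • Chapter 5: Hadoop
   • Chapter 6: Hadoop
   • Chapter 7

The version of Hadoop examined is 0.16.4.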




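2 Hadoop

Hadoop is developed mainly by Doug Cutting of Yahoo! Inc., the creator of Lucene[8], and grew out of the Lucene project. It is modeled on Google's Google File System (GFS) and MapReduce.

Hadoop consists of HDFS (Hadoop Distributed File System) and the Hadoop MapReduce Framework, which correspond to Google's GFS and MapReduce respectively (2.1). Likewise, hBase corresponds to Google's BigTable.




                                       2.1 Google, OSS



Hadoop is implemented in Java, and MapReduce programs are normally written in Java as well. With Hadoop Streaming[5], MapReduce programs can also be written in other languages such as C/C++, Ruby, and Python.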




3 GFS and HDFS

This chapter compares GFS and HDFS: each section first describes a GFS feature and then how Hadoop implements it.



3.1 GFS
 GFS                   [10]


3.1.1

 GFS              PC

   •

             TB
   •

        PC

   •




3.1.2

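 GFS divides each file into fixed-size chunks of 64MB and stores the chunks on the PCs that make up the cluster.



 GFS stores each chunk on multiple machines, 3 by default.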


   •
   •

   •
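The offset-to-chunk arithmetic implied by fixed 64MB chunks can be sketched as below. In the GFS design a client first turns a byte offset into a chunk index and then asks the master where that chunk is stored; this sketch covers only the first step. `ChunkLocator` and its methods are hypothetical names, not part of GFS or Hadoop.

```java
/**
 * Hypothetical helper (not part of GFS or Hadoop) showing how a client can
 * translate a byte offset into a chunk index, assuming fixed 64MB chunks.
 */
final class ChunkLocator {
    /** Fixed chunk size: 64MB. */
    static final long CHUNK_SIZE = 64L * 1024 * 1024;

    private ChunkLocator() {
    }

    /** Index of the chunk that holds the byte at the given offset. */
    static long chunkIndex(long offset) {
        checkNonNegative(offset);
        return offset / CHUNK_SIZE;
    }

    /** Position of that byte inside its chunk. */
    static long offsetInChunk(long offset) {
        checkNonNegative(offset);
        return offset % CHUNK_SIZE;
    }

    /** Number of chunks needed for a file of the given length (the last chunk may be partial). */
    static long chunkCount(long length) {
        checkNonNegative(length);
        return (length + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    private static void checkNonNegative(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("must be non-negative: " + value);
        }
    }
}
```

For example, a 1GB file occupies 16 chunks, and byte offset 64MB + 5 falls at position 5 of chunk 1.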





  GFS




  GFS                                                                             GFS




3.1.3 HDFS

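  HDFS is designed after GFS. In HDFS, the roles of the GFS master and chunkservers are played by the NameNode and the DataNodes, respectively.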




3.2
       3.1 compares the features of GFS with those of Hadoop's HDFS.




                         3.1: Google File System and Hadoop


                                                                        Hadoop






                                                   Hadoop




                           (Read-Only          )




3.3

3.3.1




Hadoop

                NameNode            NameNode




  DFSClient::mkdirs







3.3.2




Hadoop

                NameNode   NameNode




  DFSClient::delete


3.3.3




Hadoop

                NameNode   NameNode




  DFSClient::create


3.3.4




Hadoop

                NameNode   NameNode




  DFSClient::delete







3.3.5



                                                         (5.6.6)




Hadoop

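  A file removed with delete is moved to /trash instead of being erased immediately; files in /trash are deleted permanently later.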




  NameNode.emptier
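The delete-to-trash behavior can be sketched with an in-memory map. This is a hypothetical model, not Hadoop's Trash class (5.6.6): `TrashSketch`, the way paths are prefixed with `/trash`, and the retention parameter are assumptions chosen for illustration, and the emptier is modeled as a method call rather than a background thread.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical model of "delete moves the file to /trash; an emptier purges it later".
 * Not Hadoop's Trash implementation; names and the retention rule are assumptions.
 */
final class TrashSketch {
    static final String TRASH_DIR = "/trash";

    // path -> file contents, for every file that still exists (including trashed ones)
    final Map<String, String> files = new HashMap<>();
    // trash path -> time (ms) at which the file was moved into the trash
    final Map<String, Long> trashed = new HashMap<>();

    /** "Delete" a file by renaming it under /trash; returns false if it does not exist. */
    boolean delete(String path, long nowMillis) {
        String contents = files.remove(path);
        if (contents == null) {
            return false;
        }
        String trashPath = TRASH_DIR + path;
        files.put(trashPath, contents);   // the data survives under a new name
        trashed.put(trashPath, nowMillis);
        return true;
    }

    /** Emptier pass: permanently remove trash entries at least retentionMillis old. */
    int empty(long nowMillis, long retentionMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = trashed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= retentionMillis) {
                files.remove(e.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Keeping deleted data around until the emptier runs is what makes an accidental delete recoverable: until then, restoring a file is just a rename back out of the trash.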


3.3.6




Hadoop

  DFSInputStream                          (5.7.5)           NameNode
                            DataNode                                              1




  DFSInputStream::read


3.3.7




Hadoop

  DFSOutputStream                           (5.7.5) NameNode

                                                               DataNode








  DFSOutputStream::writeChunk


3.3.8




Hadoop

  DFSInputStream                  (5.7.5) NameNode

         DataNode                    NameNode




  DFSInputStream::read


3.3.9




Hadoop




3.3.10




Hadoop

  DFSClient   NameNode          NameNode








    DFSClient::rename


3.3.11




Hadoop

    DFSClient       NameNode           NameNode




    DFSClient::listPaths


3.3.12




Hadoop



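                        The client determines the user name by running whoami and the user's groups by running bash -c groups.

Permissions are managed through the shell command (bin/hadoop dfs).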




    DFSClient::getFileInfo




    http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html







3.3.13




Hadoop

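  The client determines the user name by running whoami and the user's groups by running bash -c groups.

  Permissions are managed through the shell command (bin/hadoop dfs).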




  DFSClient::getFileInfo




  http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html


3.3.14




Hadoop




  HADOOP-1700




  http://issues.apache.org/jira/browse/HADOOP-1700







3.3.15



                                     (                             )


Hadoop

  FSImage         NameNode




  FSImage


3.3.16




Hadoop

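  HeartBeat
  Each DataNode periodically sends a HeartBeat to the NameNode, and the NameNode replies with instructions for that DataNode.

  The HeartBeat interval is set by "dfs.heartbeat.interval" (3 seconds by default).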




  DataNode::offerService → NameNode::sendHeartbeat
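How a NameNode can turn periodic HeartBeats into a list of dead DataNodes is sketched below. The class, its methods and the single expiry parameter are hypothetical; the expiry rule is an illustrative assumption, not Hadoop's exact timeout computation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical NameNode-side liveness tracking driven by DataNode HeartBeats.
 * A node counts as dead once no HeartBeat has arrived for longer than expiryMillis.
 */
final class HeartbeatMonitor {
    private final long expiryMillis;
    // DataNode name -> arrival time (ms) of its most recent HeartBeat
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    HeartbeatMonitor(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Record a HeartBeat from a DataNode. */
    void onHeartbeat(String dataNode, long nowMillis) {
        lastHeartbeat.put(dataNode, nowMillis);
    }

    /** DataNodes whose last HeartBeat is older than the expiry interval. */
    List<String> deadNodes(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > expiryMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```

With a 3-second HeartBeat interval, an expiry of several intervals tolerates a few lost HeartBeats before a node is declared dead.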



3.4

3.4.1




Hadoop

  DataNode                NameNode       HeartBeat                                 NameNode        DataNode
                                                                 DataNode






  DataNode::offerService → NameNode::sendHeartbeat


3.4.2




Hadoop



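  ./bin/start-balancer.sh

  Disk usage is measured by DF::run (5.6.9), which runs the df command: each DataNode obtains its capacity and usage through DF and reports them to the NameNode in its HeartBeat.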




  Balancer




  https://issues.apache.org/jira/browse/HADOOP-1652
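The classification step a balancer needs can be sketched from the per-DataNode capacity and usage figures that DF supplies. The rule below (compare each node's utilization with the cluster-wide average, plus or minus a threshold in percentage points) is a simplified assumption in the spirit of HADOOP-1652; `BalancerSketch` and its methods are hypothetical, and actual block movement is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: classify DataNodes as over- or under-utilized relative
 * to the cluster-wide average disk utilization.
 */
final class BalancerSketch {
    /** Capacity and usage reported by one DataNode (e.g. obtained through df). */
    static final class NodeUsage {
        final String name;
        final long capacity;
        final long used;

        NodeUsage(String name, long capacity, long used) {
            this.name = name;
            this.capacity = capacity;
            this.used = used;
        }

        /** Utilization in percent. */
        double utilization() {
            return 100.0 * used / capacity;
        }
    }

    /** Cluster-wide utilization in percent: total used / total capacity. */
    static double averageUtilization(List<NodeUsage> nodes) {
        long used = 0;
        long capacity = 0;
        for (NodeUsage n : nodes) {
            used += n.used;
            capacity += n.capacity;
        }
        return 100.0 * used / capacity;
    }

    /** Nodes above average + threshold: blocks should move away from these. */
    static List<String> overUtilized(List<NodeUsage> nodes, double thresholdPercent) {
        double avg = averageUtilization(nodes);
        List<String> result = new ArrayList<>();
        for (NodeUsage n : nodes) {
            if (n.utilization() > avg + thresholdPercent) {
                result.add(n.name);
            }
        }
        return result;
    }

    /** Nodes below average - threshold: blocks should move onto these. */
    static List<String> underUtilized(List<NodeUsage> nodes, double thresholdPercent) {
        double avg = averageUtilization(nodes);
        List<String> result = new ArrayList<>();
        for (NodeUsage n : nodes) {
            if (n.utilization() < avg - thresholdPercent) {
                result.add(n.name);
            }
        }
        return result;
    }
}
```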


3.4.3




Hadoop




  FSNamesystem::checkPermission → PermissionChecker::checkPermission
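HDFS permissions follow the POSIX owner/group/other model, and the core of such a check can be sketched as follows. `PermissionSketch` is hypothetical and is not the PermissionChecker class: it takes the user name and group set as inputs (on the client these come from whoami and bash -c groups) and omits special cases such as the superuser and checks on parent directories.

```java
import java.util.Set;

/**
 * Hypothetical POSIX-style permission check (owner / group / other).
 * Only the bit class matching the caller is consulted, as in Unix.
 */
final class PermissionSketch {
    // Access bits within one class, as in a Unix mode.
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /**
     * @param fileOwner owner of the file
     * @param fileGroup group of the file
     * @param mode      permission bits, e.g. 0750
     * @param user      requesting user
     * @param groups    groups the requesting user belongs to
     * @param access    requested access, e.g. READ | WRITE
     */
    static boolean isPermitted(String fileOwner, String fileGroup, int mode,
                               String user, Set<String> groups, int access) {
        int bits;
        if (user.equals(fileOwner)) {
            bits = (mode >> 6) & 7;   // owner bits
        } else if (groups.contains(fileGroup)) {
            bits = (mode >> 3) & 7;   // group bits
        } else {
            bits = mode & 7;          // other bits
        }
        return (bits & access) == access;
    }
}
```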







3.4.4



                                      RPC


Hadoop

  Hadoop writes its logs through Commons-Logging[2], with log4j as the underlying logging implementation.




           (org.apache.commons.logging.*)



3.5

3.5.1




Hadoop



                                             (5.5.2, 5.5.3)                   ”dfs.network.scripts”




  FSNamesystem::getBlockLocations → NetworkTopology::pseudoSortByDistance




  http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
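The idea of ordering a block's replicas by distance from the reader can be sketched with a two-level /rack/host topology. The distances 0, 2 and 4 count hops up to the nearest common ancestor, in the style of NetworkTopology. The `RackSort` and `Replica` types are hypothetical, and the full stable sort is a simplification: the real pseudoSortByDistance may only move the nearest replicas to the front.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: order replicas so the reader's own node comes first,
 * then nodes on the reader's rack, then nodes on other racks.
 */
final class RackSort {
    static final class Replica {
        final String host;
        final String rack;   // network location, e.g. "/rack1"

        Replica(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /** 0 = same node, 2 = same rack, 4 = different rack (two-level tree). */
    static int distance(Replica reader, Replica r) {
        if (reader.host.equals(r.host)) {
            return 0;
        }
        if (reader.rack.equals(r.rack)) {
            return 2;
        }
        return 4;
    }

    /** Stable sort by distance; replicas at equal distance keep their order. */
    static List<Replica> sortByDistance(Replica reader, List<Replica> replicas) {
        List<Replica> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt(r -> distance(reader, r)));
        return sorted;
    }
}
```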








3.6

3.6.1




Hadoop

                     Each block is identified by an id, and block ids are assigned by the NameNode.




  Block
  FSNamesystem::allocateBlock
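Block ID assignment on the NameNode can be sketched as follows, assuming IDs are random 64-bit values that are re-drawn when they collide with an ID already in use. That scheme is an assumption for illustration; `BlockIdAllocator` is hypothetical and is not FSNamesystem::allocateBlock itself.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical block ID allocator: draw a random 64-bit ID and retry
 * until it does not collide with an ID that is already in use.
 */
final class BlockIdAllocator {
    private final Random random;
    private final Set<Long> usedIds = new HashSet<>();

    BlockIdAllocator(Random random) {
        this.random = random;
    }

    long allocate() {
        long id = random.nextLong();
        // Set.add returns false when the ID is already taken: draw again.
        while (!usedIds.add(id)) {
            id = random.nextLong();
        }
        return id;
    }
}
```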


3.6.2




Hadoop

  A checksum is computed for every "io.bytes.per.checksum" bytes of data (512 by default).

                                                                NameNode




  DFSOutputStream
  DFSInputStream
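Per-chunk checksumming can be sketched with java.util.zip.CRC32: one checksum for every 512 bytes, matching the io.bytes.per.checksum default. Using CRC32 as the checksum function and the `ChunkChecksums` helper are assumptions for illustration; the sketch only finds which chunk is corrupt and does not model where checksums are stored or how errors are reported.

```java
import java.util.zip.CRC32;

/**
 * Hypothetical per-chunk checksums: one CRC32 value for every
 * BYTES_PER_CHECKSUM bytes of data (the last chunk may be shorter).
 */
final class ChunkChecksums {
    static final int BYTES_PER_CHECKSUM = 512;

    /** Compute one CRC32 value per chunk. */
    static long[] compute(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Index of the first chunk whose checksum differs from the expected one, or -1. */
    static int firstCorruptChunk(byte[] data, long[] expected) {
        long[] actual = compute(data);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("checksum count does not match data length");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

Checksumming small chunks rather than whole blocks means a reader only has to verify (and, on error, re-fetch) the 512-byte pieces it actually reads.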







3.6.3




Hadoop




3.6.4




Hadoop



                                      write

                      5.7.10




  FSNamesystem::pendingTransfers
  ReplicationTargetChooser(5.7.10)




  http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
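A rack-aware choice of targets for three replicas can be sketched as: the writer's own node, then a node on a different rack, then another node on that same remote rack. Treat this policy as an assumption about the 0.16.4 chooser rather than a transcription of ReplicationTargetChooser; `PlacementSketch` is hypothetical and picks the first matching node deterministically, whereas a production chooser would also randomize and consider free space and load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical rack-aware placement of three replicas:
 * 1st on the writer's node, 2nd on a different rack, 3rd on the 2nd's rack.
 */
final class PlacementSketch {
    /** rackOf maps each DataNode to its rack; iteration order decides ties. */
    static List<String> chooseTargets(String writer, Map<String, String> rackOf) {
        String localRack = rackOf.get(writer);
        if (localRack == null) {
            throw new IllegalArgumentException("writer must be a known DataNode");
        }
        List<String> targets = new ArrayList<>();
        targets.add(writer);                       // 1st replica: local node

        String second = null;                      // 2nd replica: any off-rack node
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            if (!e.getValue().equals(localRack)) {
                second = e.getKey();
                break;
            }
        }
        if (second == null) {
            return targets;                        // single-rack cluster: not handled here
        }
        targets.add(second);

        String remoteRack = rackOf.get(second);    // 3rd replica: same rack as the 2nd
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            if (e.getValue().equals(remoteRack) && !e.getKey().equals(second)) {
                targets.add(e.getKey());
                break;
            }
        }
        return targets;
    }
}
```

Placing two of the three replicas on one remote rack survives the loss of a whole rack while keeping only one cross-rack transfer in the write pipeline.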


3.6.5



              2







Hadoop

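  As in GFS, data is transferred along a pipeline of DataNodes: the client sends the data to the first DataNode, which forwards it to the next DataNode, and so on.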




  DFSOutputStream
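The pipelined write can be sketched as a chain of nodes, each storing a packet and forwarding it downstream, with the acknowledgement counting stored replicas on the way back. The forwarding here is sequential and in-memory, whereas a real pipeline forwards data over the network concurrently; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical write pipeline: the client talks only to the head node,
 * and every node stores the packet and forwards it to its successor.
 */
final class PipelineSketch {
    static final class DataNode {
        final String name;
        final List<String> stored = new ArrayList<>();
        DataNode next;   // downstream neighbour, or null for the last node

        DataNode(String name) {
            this.name = name;
        }

        /**
         * Store the packet, forward it downstream, and return how many nodes
         * (this one plus all downstream ones) stored it; this acts as the ack.
         */
        int receive(String packet) {
            stored.add(packet);
            int downstream = (next == null) ? 0 : next.receive(packet);
            return 1 + downstream;
        }
    }

    /** Chain the nodes in order and return the head of the pipeline. */
    static DataNode buildPipeline(List<DataNode> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("pipeline needs at least one DataNode");
        }
        for (int i = 0; i + 1 < nodes.size(); i++) {
            nodes.get(i).next = nodes.get(i + 1);
        }
        return nodes.get(0);
    }
}
```

The point of the chain is that the client uploads each packet once, regardless of the replication factor; each link in the chain carries one copy.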


3.6.6




Hadoop

  RPC                         (5.4.2)

                    (5.7.5)




  RPC
  DFSOutputStream

  DFSInputStream


3.6.7




Hadoop

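  Changes to the namespace are recorded in the edit log by FSEditLog (5.7.9); the log is written to the directories specified by dfs.name.dir.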




  FSImage
  FSEditLog




3.6.8




Hadoop

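  Checkpoints are created by the SecondaryNameNode, a process separate from the NameNode (5.7.11).

  The SecondaryNameNode periodically downloads the image and the edit log from the NameNode, merges them into a new image, and sends the result back to the NameNode.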




  SecondaryNameNode::run


3.6.9




Hadoop

  At startup, the NameNode restores its namespace by reading the FSImage with loadFSImage and then applying the edit log (5.7.9).




  FSImage::loadFSImage → FSEditLog::loadFSEdits
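Restoring the namespace from an image plus an edit log can be sketched with a sorted map. The namespace is reduced to path → replication factor and the log to create/delete records, a deliberately simplified, hypothetical model that is not Hadoop's FSImage or FSEditLog format.

```java
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical model of namespace recovery: start from the checkpointed
 * image and replay every logged edit in order.
 */
final class NamespaceRecovery {
    /** One logged namespace change. */
    static final class Edit {
        final String op;          // "create" or "delete"
        final String path;
        final int replication;    // used by "create" only

        Edit(String op, String path, int replication) {
            this.op = op;
            this.path = path;
            this.replication = replication;
        }
    }

    /** Replay the edits, in log order, on a copy of the image. */
    static TreeMap<String, Integer> load(TreeMap<String, Integer> image, List<Edit> edits) {
        TreeMap<String, Integer> namespace = new TreeMap<>(image);  // leave the image untouched
        for (Edit e : edits) {
            switch (e.op) {
                case "create":
                    namespace.put(e.path, e.replication);
                    break;
                case "delete":
                    namespace.remove(e.path);
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + e.op);
            }
        }
        return namespace;
    }
}
```

A checkpoint, as performed by the SecondaryNameNode, amounts to the same computation: the result of `load(image, edits)` becomes the new image, and the edit log can start over empty, which keeps the next restart short.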


3.6.10 (Read-Only)







Hadoop

          SecondaryNameNode




3.7
  HDFS     GFS

       0.16.4                               0.19




  HDFS          DataNode
NameNode                         NameNode                    NameNode


                      NameNode




4 Google MapReduce and Hadoop MapReduce



4.1 Google MapReduce
 Google MapReduce              [9]


4.1.1

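 Google MapReduce is designed to run on a large cluster of PCs. Its main characteristics are:


   • MapReduce
        MapReduce jobs are executed in parallel across many PCs, in the following 3 phases:

           – Map
           – Shuffle
           – Reduce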


                                                      MapReduce


   •
        GFS

   •







4.1.2

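  Google MapReduce splits the input data into M pieces, which are processed by M Map tasks; the master assigns each Map task to an idle PC.

  The output of the Map tasks is partitioned into R pieces, which are processed by R Reduce tasks, each likewise assigned to a PC.

  Each of the M Map tasks writes one intermediate file per Reduce task, so M×R files are produced in total, and each Reduce task reads its M input files from the Map outputs.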

                                                                                           Reduce
  Google MapReduce                                                                      MapReduce

                                             MapReduce                    Map           Reduce
                                                     Map              Reduce


4.1.3 Hadoop MapReduce

  Hadoop MapReduce            Google MapReduce                                                   Hadoop MapReduce

                                             JobTracker, TaskTracker
                           Hadoop                                                                     HadoopStreaming

                                                                       MapReduce
                                        HadoopStreaming[5]                                             MapReduce




4.2
       4.1       Google MapReduce                                      Hadoop

Hadoop MapReduce


                              4.1: Google MapReduce                        Hadoop


                                                                                             Hadoop

                    MapReduce


                    Shuffle

                    Map          Reduce
                    Map




                                                                      Hadoop




                Combine


                Map                Shuffle


                Map




4.3

4.3.1 MapReduce



  MapReduce


Hadoop

  Hadoop        Java   MapReduce                                   HadoopStreaming
                                   MapReduce




  JobClient

  JobTracker
  TaskTracker






4.3.2




Hadoop

  HeartBeat

  TaskTracker                 JobTracker      HeartBeat                    HeartBeat
                          (        /50 + 1)     (JobTracker::getNextHeartbeatInterval)                    5
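A minimal sketch of the interval calculation follows. It assumes, as the figures in the text suggest, that the interval is (number of TaskTrackers / 50 + 1) seconds but never less than 5 seconds. The constant names are illustrative and their values are assumptions for this Hadoop version; this is not the JobTracker source.

```java
// Sketch of an adaptive heartbeat interval (values assumed, see text).
final class HeartbeatIntervalSketch {
    static final int CLUSTER_INCREMENT = 50;            // TaskTrackers per extra second (assumed)
    static final int HEARTBEAT_INTERVAL_MIN = 5 * 1000; // lower bound in milliseconds (assumed)

    // Interval, in milliseconds, a TaskTracker should wait before its next
    // heartbeat, given the current number of TaskTrackers in the cluster.
    static int nextHeartbeatIntervalMillis(int numTaskTrackers) {
        int scaled = 1000 * (numTaskTrackers / CLUSTER_INCREMENT + 1);
        return Math.max(scaled, HEARTBEAT_INTERVAL_MIN);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 200, 1000}) {
            System.out.println(n + " trackers -> " + nextHeartbeatIntervalMillis(n) + " ms");
        }
    }
}
```

The apparent design intent is to keep the rate at which heartbeats reach the JobTracker roughly bounded as the cluster grows.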




  TaskTracker::transmitHeartBeat → JobTracker::heartbeat


4.3.3 Shuffle



                                 Reducer         (Shuffle)           Hash        Shuffle

                  Shuffle                                                         Shuffle


Hadoop

  JobConf.setPartitionerClass                              HashPartitioner, KeyFieldBasedPartitioner
        HashPartitioner                                  Reducer               KeyFieldBasedPartitioner
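The rule `HashPartitioner` applies fits in a few lines of plain Java. The sketch below assumes the formula of Hadoop's `HashPartitioner.getPartition`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and leaves out Hadoop types; the class name is hypothetical. Masking with `Integer.MAX_VALUE` clears the sign bit, so a negative `hashCode()` still yields an index in `[0, numReduceTasks)`.

```java
// Plain-Java version of hash partitioning as done by HashPartitioner.
final class HashPartitionSketch {

    // Maps a key to a partition (Reducer index) in [0, numReduceTasks).
    static int getPartition(Object key, int numReduceTasks) {
        // Clear the sign bit so negative hash codes still give a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String k : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(k + " -> reducer " + getPartition(k, 4));
        }
    }
}
```

Because equal keys have equal hash codes, every value for a given key reaches the same Reducer, which is what the Reduce phase relies on.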




  HashPartitioner
  KeyFieldBasedPartitioner


4.3.4 Map              Reduce



  Map         Reduce







Hadoop

  MapReduce                      JobConf.setNumReduceTasks, JobConf.setNumMapTasks

(5.8.1)




  JobConf::setNumMapTasks
  JobConf::setNumReduceTasks


4.3.5 Map



                               Map


Hadoop

  InputSplit                         Map                     (5.8.5) InputSplit




  InputSplit


4.3.6



  MapReduce


Hadoop




  JobConf::setJobPriority


4.3.7



  MapReduce




Hadoop

    JobConf.setInputFormat, JobConf.setOutputFormat         MapReduce                        (5.8.1)

                                                                 TextInputFormat key-value     1       1
              SequenceFileAsTextInputFormat                                          key-value         1

1                  TextOutputFormat
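With `TextInputFormat`, each record's key is the byte offset of a line within the file and the value is the line itself; `TextOutputFormat` writes each pair as the key, a tab, and the value on one line. The sketch below models both on an in-memory string. It assumes single-byte characters, so character offsets equal byte offsets, and it ignores `\r` and split boundaries; it is not the `LineRecordReader` implementation, and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Models TextInputFormat records (offset, line) and TextOutputFormat lines.
final class TextFormatSketch {

    static final class LineRecord {
        final long offset; // byte offset of the line (for single-byte text)
        final String line; // line contents without the newline

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Split text into (offset, line) records, one per newline-terminated line.
    static List<LineRecord> readRecords(String text) {
        List<LineRecord> records = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int nl = text.indexOf('\n', start);
            int end = (nl < 0) ? text.length() : nl;
            records.add(new LineRecord(start, text.substring(start, end)));
            start = end + 1; // continue after the newline
        }
        return records;
    }

    // Format one output pair the way TextOutputFormat does: key TAB value.
    static String formatOutput(Object key, Object value) {
        return key + "\t" + value + "\n";
    }

    public static void main(String[] args) {
        for (LineRecord r : readRecords("hello world\nfoo\n")) {
            System.out.print(formatOutput(r.offset, r.line));
        }
    }
}
```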




    InputFormat
    OutputFormat

    TextInputFormat
    TextOutputFormat

    SequenceFile


4.3.8




Hadoop

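    Task.Counter enumeration                                        1    key-value



       • MAP_INPUT_RECORDS - number of Map input records

       • MAP_OUTPUT_RECORDS - number of Map output records

       • MAP_INPUT_BYTES - number of Map input bytes
       • MAP_OUTPUT_BYTES - number of Map output bytes

       • COMBINE_INPUT_RECORDS - number of Combine input records
       • COMBINE_OUTPUT_RECORDS - number of Combine output records

       • REDUCE_INPUT_GROUPS - number of Reduce input groups (distinct keys)
       • REDUCE_INPUT_RECORDS - number of Reduce input records

       • REDUCE_OUTPUT_RECORDS - number of Reduce output records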




    Task






4.3.9




Hadoop

                        Reporter::incrCounter
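User-defined counters are identified by an enum constant and incremented through `Reporter.incrCounter(Enum, long)` inside `map` or `reduce`; the framework then sums each counter over the job's tasks. The plain-Java sketch below imitates that aggregation with an `EnumMap`. The enum `MyCounters` and both classes are hypothetical, and Hadoop's own `Counters` class is not used.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Imitates per-task enum counters and their job-wide sum (not Hadoop's Counters).
final class CounterSketch {

    enum MyCounters { MALFORMED_LINES, EMPTY_LINES } // hypothetical user counters

    // Per-task counter set, incremented like Reporter.incrCounter(key, amount).
    static final class TaskCounters {
        final EnumMap<MyCounters, Long> values = new EnumMap<>(MyCounters.class);

        void incrCounter(MyCounters key, long amount) {
            values.merge(key, amount, Long::sum);
        }
    }

    // Job-level view: sum every counter over all tasks.
    static EnumMap<MyCounters, Long> aggregate(List<TaskCounters> tasks) {
        EnumMap<MyCounters, Long> total = new EnumMap<>(MyCounters.class);
        for (TaskCounters t : tasks) {
            for (Map.Entry<MyCounters, Long> e : t.values.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        TaskCounters map1 = new TaskCounters();
        map1.incrCounter(MyCounters.EMPTY_LINES, 1);
        TaskCounters map2 = new TaskCounters();
        map2.incrCounter(MyCounters.EMPTY_LINES, 2);
        map2.incrCounter(MyCounters.MALFORMED_LINES, 1);
        System.out.println(aggregate(List.of(map1, map2)));
    }
}
```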




  Reporter




  http://www.jakobhoman.com/2007/11/quick-tour-of-hadoops-reporter-object.html



4.4

4.4.1



                       MapReduce


Hadoop

  HTTP                                           JobTracker   HTTP
                                      (5.8.7)

         CUI            %




  JobClient

  StatusHttpServer


4.4.2



                     MapReduce







Hadoop

                                     JobConf       ”mapred.job.tracker”    local

Map               Shuffle Reduce                 1                          (5.8.1)




  JobClient::init → LocalJobRunner


4.4.3



            MapReduce


Hadoop

  ./bin/hadoop job -kill-task                                                              ./bin/hadoop job -list




  JobClient



4.5

4.5.1 Combine



  Map                                                   Combine




Hadoop

  JobConf.setCombinerClass           (5.8.1)
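A Combiner runs a reduce-like function over each Map task's output before the shuffle, so fewer records cross the network. This is only safe when the operation is associative and commutative, as summing counts is, because the framework may apply the combiner any number of times. The sketch below assumes word-count style (word, count) pairs and uses hypothetical names; it shows how combining shrinks one task's output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local pre-aggregation of one Map task's output, as a Combiner would do.
final class CombinerSketch {

    // Collapse (word, count) pairs into one pair per word by summing counts.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : mapOutput) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out = List.of(
            Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1), Map.entry("a", 1));
        List<Map.Entry<String, Integer>> combined = combine(out);
        // prints "4 -> 2: [a=3, b=1]"
        System.out.println(out.size() + " -> " + combined.size() + ": " + combined);
    }
}
```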




  JobConf







4.5.2



                                 Map              Reduce




Hadoop



   3.5.1                            (               )                      Task

                              Map
  Reducer                   Shuffle




  JobInProgress::createCache → InputFormat::getLocations → DistributedFileSystem::getFileBlockLocations




  http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf


4.5.3 Map                                    Shuffle



                        Map                              Shuffle                       Map
        Shuffle


Hadoop

                Shuffle                                   Reduce                      Map

                        Reduce              Map                              TaskTracker
            Shuffle

                    Fetch               1    TaskTracker




  ReduceTask.ReduceCopier::fetchOutputs







4.5.4



                                             I/O                      MapReduce




Hadoop

  SequenceFile       key-value                                                    gzip   lzo
                      ”mapred.output.compress”          true                        (5.8.1)




  OutputFormatBase


4.5.5 Map



  Map                                               Shuffle


Hadoop

  JobConf.setCompressMapOutput               (5.8.1)           MapTask
       true




  JobConf

  MapTask.MapOutputBuffer::MapOutputBuffer



4.6

4.6.1




Hadoop







  JobInProgress::failedTask


4.6.2




Hadoop

  SpeculativeTask                                                     JobConf.setMapSpeculativeExecution,

JobConf.setReduceSpeculativeExecution    true                enable      TaskInProgress::hasSpeculativeTask
                                                          (5.8.5)




  TaskInProgress::hasSpeculativeTask


4.6.3




Hadoop

                                                                                     KILL
                     JobInProgress::completedTask

                              alreadyCompletedTask        KILL                                  completed
        SUCCEEDED
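The rule described above, in which the first attempt of a task to finish is recorded as SUCCEEDED and any other attempt of the same task finishing later is treated as KILLED, can be sketched as follows. The class, the enum, and the task and attempt IDs are hypothetical simplifications of what `JobInProgress::completedTask` and `TaskInProgress::alreadyCompletedTask` do.

```java
import java.util.HashMap;
import java.util.Map;

// First-finisher-wins bookkeeping for duplicate (e.g. speculative) attempts.
final class AttemptCompletionSketch {

    enum AttemptState { SUCCEEDED, KILLED }

    // taskId -> attemptId that completed first
    private final Map<String, String> completedBy = new HashMap<>();

    // Called when an attempt reports completion; returns the state it is given.
    AttemptState completedTask(String taskId, String attemptId) {
        String winner = completedBy.putIfAbsent(taskId, attemptId);
        if (winner == null || winner.equals(attemptId)) {
            return AttemptState.SUCCEEDED; // first finisher (or a repeated report from it)
        }
        return AttemptState.KILLED; // another attempt of this task already finished
    }

    public static void main(String[] args) {
        AttemptCompletionSketch job = new AttemptCompletionSketch();
        System.out.println(job.completedTask("map_0001", "attempt_0")); // SUCCEEDED
        System.out.println(job.completedTask("map_0001", "attempt_1")); // KILLED
    }
}
```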




  JobInProgress::completedTask → TaskInProgress::alreadyCompletedTask







4.6.4




Hadoop




4.6.5




                                     HTML


Hadoop




  HADOOP-153




  http://issues.apache.org/jira/browse/HADOOP-153



4.7
  Hadoop MapReduce   Google MapReduce


  Hadoop MapReduce




5.1.                                                                 5




       5




           Hadoop



5.1
                src/




                                      5.1:




                       conf
                       dfs                           HDFS

                       filecache
                       fs

                       io
                       ipc        IPC(Inter Process Communication)

                       log
                       mapred     MapReduce

                       metrics
                       net

                       record

                       security
                       tools

                       util








5.2 org.apache.hadoop.util
                        Hadoop


5.2.1 MergeSort

                                 Map


5.2.2 PriorityQueue




5.2.3 ReflectionUtils

  Java                                                   ReflectionUtils::newInstance




5.2.4 RunJar

  Jar


5.2.5 Tool

  MapReduce                                                                                 ToolRunner::run




5.3 org.apache.hadoop.io

5.3.1 Writable

                                               MapReduce                   key, value
                                   java.io.DataInput, java.io.DataOutput

                                  IntWritable, LongWritable, FloatWritable, BytesWritable, ArrayWritable,
TwoDArrayWritable, MapWritable
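A `Writable` serializes itself with `write(DataOutput)` and restores its fields with `readFields(DataInput)`. To stay self-contained, the sketch below declares its own interface with the same two methods instead of importing `org.apache.hadoop.io.Writable`; `WritableLike` and `PointWritable` are hypothetical names, and the value is round-tripped through a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Same two methods as org.apache.hadoop.io.Writable, declared locally.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: two ints written in a fixed order.
final class PointWritable implements WritableLike {
    int x;
    int y;

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt(); // must read fields in the same order write() wrote them
        y = in.readInt();
    }

    // Serialize to bytes and read back into a fresh instance.
    static PointWritable roundTrip(PointWritable p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            p.write(new DataOutputStream(bytes));
            PointWritable copy = new PointWritable(); // Writables need a no-arg constructor
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PointWritable p = new PointWritable();
        p.x = 3;
        p.y = -7;
        PointWritable q = roundTrip(p);
        System.out.println(q.x + "," + q.y);
    }
}
```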


5.3.2 SequenceFile

  Key-Value                                                   Key-Value







5.3.3 compress

 compress
          BlockCompressorStream                                       GzipCodec   LzoCodec



5.4 org.apache.hadoop.ipc

5.4.1 VersionedProtocol




5.4.2 RPC, Server, Client

 RPC(Remote Procedure Call)


                                                                5.1
                                                                                                         
  Configuration conf = new Configuration();
  Server server = RPC.getServer(this, "localhost", 8000, conf);          // localhost:8000
  server.start();
                                                                                                         
                                                 5.1



                                     5.2                        ClientProtocol


                                                                                                         
  Configuration conf = new Configuration();
  InetSocketAddress addr = new InetSocketAddress("localhost", 8000); // same host and port as the server in 5.1
  ClientProtocol client = (ClientProtocol)RPC.waitForProxy(ClientProtocol.class,
    ClientProtocol.versionID, addr, conf);
                                                                                                         
                                                 5.2



        ClientProtocol                     5.3


                  ClientProtocol                                                  ClientProtocol
                              Writable

                              Java
                         (    5.4)

                             ”ipc.client.connect.max.retries”                                    (   10

                                                                                                             
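  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.UTF8;

  interface ClientProtocol extends org.apache.hadoop.ipc.VersionedProtocol {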
    public static final long versionID = 1L;
    HeartbeatResponse heartbeat();
  }
  public class HeartbeatResponse implements org.apache.hadoop.io.Writable {
    String status;
    public void write(DataOutput out) throws IOException {
      UTF8.writeString(out, status);
    }
    public void readFields(DataInput in) throws IOException {
      this.status = UTF8.readString(in);
    }
  }
                                                                                                             
                                                   5.3

                                                                                                             
  client.heartbeat();
                                                                                                             
                                 5.4



 )                          60   (FSConstants.READ_TIMEOUT)                           1
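The connection behaviour described above amounts to a bounded retry loop: try to connect, and on failure pause and try again until the retry limit (by default 10, per `ipc.client.connect.max.retries` above) is used up. The sketch below is not the code of Hadoop's IPC `Client`; the method name is hypothetical, the connection attempt is modelled as a `BooleanSupplier`, and the pause length is a parameter (the one-second value in the usage example is an assumption).

```java
import java.util.function.BooleanSupplier;

// Bounded connect-retry loop (illustrative, not Hadoop's ipc.Client).
final class RetrySketch {

    // Tries to connect; on failure retries up to maxRetries more times,
    // pausing between attempts. Returns the number of attempts made when a
    // connection succeeds, or -1 if every attempt failed.
    static int connectWithRetries(BooleanSupplier tryConnect, int maxRetries, long pauseMillis) {
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return attempt;
            }
            if (attempt <= maxRetries) {
                try {
                    Thread.sleep(pauseMillis); // back off before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1; // retry budget exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake server that accepts the third connection attempt.
        int attempts = connectWithRetries(() -> ++calls[0] >= 3, 10, 1000);
        System.out.println("connected after " + attempts + " attempts");
    }
}
```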



5.5 org.apache.hadoop.net

5.5.1 DNS

 DNS             (reverseDns      )                                                           IP
  (getIPs    )


5.5.2 Node, NodeBase

                                                                                      ”dfs.network.scripts”

                                          (3.5.1         )


5.5.3 NetworkTopology

 Hadoop                                                              Node

                        /
                                                                  (isOnSameRack   )

                            getDistance                                                       1
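`getDistance` counts the hops from each node up to their closest common ancestor in the topology tree, with each node-to-parent edge counting as 1; two hosts on the same rack are therefore 2 apart and hosts on different racks are 4 apart. The sketch below computes this from network locations written as paths such as `/rack1/host1`. The path format and the class name are assumptions for illustration; Hadoop's `Node` classes are not used.

```java
// Tree distance between two network locations like "/rack1/host1".
final class TopologyDistanceSketch {

    static int getDistance(String a, String b) {
        String[] pa = a.substring(1).split("/"); // drop the leading "/"
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;
        }
        // Hops from each node up to the closest common ancestor.
        return (pa.length - common) + (pb.length - common);
    }

    static boolean isOnSameRack(String a, String b) {
        return parent(a).equals(parent(b));
    }

    private static String parent(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    public static void main(String[] args) {
        System.out.println(getDistance("/r1/h1", "/r1/h1")); // 0: same node
        System.out.println(getDistance("/r1/h1", "/r1/h2")); // 2: same rack
        System.out.println(getDistance("/r1/h1", "/r2/h3")); // 4: different racks
    }
}
```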








5.6 org.apache.hadoop.fs

5.6.1 FileSystem


                                              Amazon S3                 (                     s3

  )
  Hadoop                               hdfs://                                           file://

                     Amazon S3          s3:// Kosmos              [7]           kfs://
  createFileSystem                            (URI)                         ”fs.[scheme].impl”

                                                                    ”fs.hdfs.impl”

org.apache.hadoop.dfs.DistributedFileSystem
                                                                                                   
  Configuration conf = new Configuration();

  FileSystem fs1 = FileSystem.getNamed("hdfs:///", conf);
  Path inFile = new Path("hdfs:///user/kzk/infile");
  FSDataInputStream in = fs1.open(inFile);

  FileSystem fs2 = FileSystem.getNamed("s3:///", conf);
  Path outFile = new Path("s3:///user/kzk/outfile");

  FSDataOutputStream out = fs2.create(outFile);
  byte[] buffer = new byte[4096];   // copy buffer
  int bytesRead;
  while ((bytesRead = in.read(buffer)) > 0) {
    out.write(buffer, 0, bytesRead);
  }
  in.close();
  out.close();
                                                                                                   
                                 5.5




                                                      5.5


5.6.2 LocalFileSystem

                                               FileSystem


5.6.3 InMemoryFileSystem


                                                   reserveSpace     reserveSpaceWithCheckSum

                                                                               InMemoryFileSystem
ReduceTask            Key     Value



5.6.4 FSOutputSummer, FSInputStream

  FileSystem


5.6.5 Path

             Path


5.6.6 Trash

                              HDFS
                       (3.3.5) Emptier


5.6.7 FileUtil

  copy


5.6.8 FsShell




5.6.9 DU, DF

  UNIX       du          df                             DataNode




5.7 org.apache.hadoop.dfs

5.7.1 ClientProtocol

                     NameNode            RPC


5.7.2 DatanodeProtocol

  DataNode         NameNode          RPC


5.7.3 NamenodeProtocol

  Balancer        NameNode        RPC







5.7.4 DistributedFileSystem

  FileSystem(5.6.1)                                                                 ”hdfs”
                                    DFSClient


5.7.5 DFSClient

  DFSClient     HDFS                                                                                   open(), create(),
exists(), listPaths(), mkdir()

                                 createNamenode           NameNode                                         ClientProtocol

                                            DFSInputStream          DFSOutputStream                             HDFS




DFSInputStream

  DFSInputStream            NameNode                                         DataNode
                                                BlockReader                       BlockReader        DFSInputStream

blockSeekTo
  BlockReader         RPC                                       Socket


                                                                                         DataNode

                                                       DataNode
(DFSInputStream::readBuffer          )


DFSOutputStream

  DFSOutputStream                                       64K          ”Packet”

                   512K
DFSOutputStream                           Socket

                  dataQueue                 DataStreamer                  dataQueue
              DataNode                                                   ackQueue                    DataNode

                ack                                     ResponseProcessor                   DataNode         ack


         DataNode           ack                                   ackQueue

       ackQueue        dataQueue                                                 Datanode
  (DataStreamer::processDatanodeError           )


                                                    (DataStreamer::run       )
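The interplay of the two queues can be modelled single-threaded: the streamer moves a packet from `dataQueue` to `ackQueue` when it sends it, an ack from the DataNodes removes the oldest packet from `ackQueue`, and on a DataNode error every unacknowledged packet is moved back to the front of `dataQueue` so it is resent through the rebuilt pipeline. This is a simplified sketch: the method names are hypothetical, packets are plain sequence numbers, and the real `DataStreamer` and `ResponseProcessor` run as separate threads.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Single-threaded model of DFSOutputStream's dataQueue / ackQueue handling.
final class PacketQueueSketch {
    final Deque<Long> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<Long> ackQueue = new ArrayDeque<>();  // packets sent, not yet acked

    void enqueue(long seqno) {
        dataQueue.addLast(seqno);
    }

    // DataStreamer step: send the next packet and remember it until acked.
    Long sendNext() {
        Long seqno = dataQueue.pollFirst();
        if (seqno != null) {
            ackQueue.addLast(seqno);
        }
        return seqno;
    }

    // ResponseProcessor step: an ack for the oldest outstanding packet arrived.
    void receiveAck(long seqno) {
        Long head = ackQueue.peekFirst();
        if (head == null || head != seqno) {
            throw new IllegalStateException("unexpected ack " + seqno);
        }
        ackQueue.pollFirst();
    }

    // Error recovery: requeue unacked packets ahead of unsent ones, in order.
    void processDatanodeError() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }

    public static void main(String[] args) {
        PacketQueueSketch s = new PacketQueueSketch();
        for (long i = 0; i < 4; i++) {
            s.enqueue(i);
        }
        s.sendNext();
        s.sendNext();
        s.sendNext();              // packets 0, 1, 2 are in flight
        s.receiveAck(0);           // packet 0 acknowledged
        s.processDatanodeError();  // 1 and 2 go back in front of 3
        System.out.println(s.dataQueue); // [1, 2, 3]
    }
}
```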



5.7.6 DataNode

  DataNode
                        NameNode

  DataNode                NameNode           HeartBeat                                (DataNode::offerService        )
HeartBeat                   DataNode

            RPC                             DatanodeProtocol                         HeartBeat
   DatanodeCommand                                                                            NameNode

             NameNode                     HeartBeat
  DataNode                                                                NameNode

NameNode


5.7.7 NameNode

  NameNode                                                                                           NameNode   1



  NameNode        ClientProtocol
DatanodeProtocol              DataNode        HeartBeat                   (               )


5.7.8 FSNamesystem

                                             NameNode               ClientProtocol                    FSNamesystem
                                         NameNode                                      RPC


  FSNamesystem


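   • (1) file name → list of blocks (persisted on disk and in the edit log)

   • (2) set of all valid blocks (the inverse of (1))
   • (3) block → list of DataNodes holding it (kept only in memory, rebuilt from DataNode block reports)

   • (4) DataNode → list of blocks it holds (the inverse of (3))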
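FSNamesystem keeps both a block → DataNodes map and its inverse, DataNode → blocks, so every update has to touch both directions. The sketch below keeps such a pair consistent when a DataNode reports a replica or is declared dead. Block IDs and DataNode names are plain strings, and the class is a hypothetical simplification rather than `FSNamesystem` or `BlocksMap`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Keeps block -> DataNodes and DataNode -> blocks maps mutually consistent.
final class BlockLocationsSketch {
    final Map<String, Set<String>> blockToNodes = new HashMap<>();
    final Map<String, Set<String>> nodeToBlocks = new HashMap<>();

    // A DataNode reports that it holds a replica of a block.
    void addReplica(String blockId, String node) {
        blockToNodes.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
        nodeToBlocks.computeIfAbsent(node, k -> new HashSet<>()).add(blockId);
    }

    // A DataNode is declared dead: drop it from both maps and return the
    // blocks that lost a replica (candidates for re-replication).
    Set<String> removeNode(String node) {
        Set<String> blocks = nodeToBlocks.remove(node);
        if (blocks == null) {
            return new HashSet<>();
        }
        for (String blockId : blocks) {
            blockToNodes.get(blockId).remove(node);
        }
        return blocks;
    }

    public static void main(String[] args) {
        BlockLocationsSketch ns = new BlockLocationsSketch();
        ns.addReplica("blk_1", "dn1");
        ns.addReplica("blk_1", "dn2");
        ns.addReplica("blk_2", "dn1");
        System.out.println("affected by losing dn1: " + ns.removeNode("dn1"));
        System.out.println("blk_1 now on: " + ns.blockToNodes.get("blk_1"));
    }
}
```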

                     HDFS


FSDirectory

  FSNameSystem


INode







BlocksMap

                                                                                           INode




5.7.9 FSImage, FSEditLog

  FSImage

  FSImage                         FSEditLog
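At startup the NameNode loads the namespace snapshot stored in the FSImage and then replays the operations recorded by FSEditLog on top of it (`FSImage::loadFSImage → FSEditLog::loadFSEdits`). The sketch below shows the same snapshot-plus-journal idea on an in-memory map from path to replication factor. The operation names and the `Edit` class are hypothetical and do not match the real edit-log opcodes.

```java
import java.util.List;
import java.util.TreeMap;

// Snapshot + edit-log replay, modelled on how the namespace is rebuilt at startup.
final class EditLogReplaySketch {

    // Hypothetical journal entry: an operation on a path.
    static final class Edit {
        final String op; // "create", "delete" or "setReplication"
        final String path;
        final int replication;

        Edit(String op, String path, int replication) {
            this.op = op;
            this.path = path;
            this.replication = replication;
        }
    }

    // Start from the checkpointed image and apply each logged edit in order.
    static TreeMap<String, Integer> load(TreeMap<String, Integer> image, List<Edit> edits) {
        TreeMap<String, Integer> ns = new TreeMap<>(image); // leave the snapshot untouched
        for (Edit e : edits) {
            switch (e.op) {
                case "create":
                case "setReplication":
                    ns.put(e.path, e.replication);
                    break;
                case "delete":
                    ns.remove(e.path);
                    break;
                default:
                    throw new IllegalArgumentException("unknown op " + e.op);
            }
        }
        return ns;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> image = new TreeMap<>();
        image.put("/user/kzk/a", 3);
        List<Edit> edits = List.of(
            new Edit("create", "/user/kzk/b", 3),
            new Edit("setReplication", "/user/kzk/a", 2));
        System.out.println(load(image, edits)); // {/user/kzk/a=2, /user/kzk/b=3}
    }
}
```

Writing the result of such a replay back out as a new image and starting an empty edit log is what keeps the log short; that merge is the job of the checkpoint.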


5.7.10 ReplicationTargetChooser



                             DataNode
DataNode                                                                       2                     1

                                        3                1
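A rack-aware choice of three replica targets can be sketched as follows: the first replica goes on the writer's own node, the second on a node in a different rack, and the third on a different node in the same rack as the second. This follows the policy documented for Hadoop's `ReplicationTargetChooser`, but the sketch simplifies heavily. It takes the first eligible node rather than a random one, ignores load and free space, assumes the writer is itself a DataNode, and uses a hypothetical `Node` type.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified rack-aware choice of three replica targets.
final class ReplicaPlacementSketch {

    static final class Node {
        final String host;
        final String rack;

        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return rack + "/" + host;
        }
    }

    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer); // 1st replica: the writer's own node
        Node second = null;
        for (Node n : cluster) { // 2nd replica: any node on another rack
            if (!n.rack.equals(writer.rack)) {
                second = n;
                break;
            }
        }
        if (second == null) {
            return targets; // single-rack cluster: outside this sketch's scope
        }
        targets.add(second);
        for (Node n : cluster) { // 3rd replica: same rack as the 2nd, different node
            if (n.rack.equals(second.rack) && !n.host.equals(second.host)) {
                targets.add(n);
                break;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Node writer = new Node("h1", "r1");
        List<Node> cluster = List.of(writer, new Node("h2", "r1"),
            new Node("h3", "r2"), new Node("h4", "r2"));
        System.out.println(chooseTargets(writer, cluster));
    }
}
```

Placing two of the three replicas on one remote rack limits cross-rack write traffic while still surviving the loss of a whole rack.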




5.7.11 SecondaryNameNode

  SecondaryNameNode                 NameNode                                       NameNode

      ”fs.checkpoint.size”
         NameNode

           ”fs.checkpoint.dir”                                      SecondaryNameNode          NameNode
             ClientProtocol


5.7.12 Balancer

  Balancer    DataNode                                                                             DataNode
                      HDFS                                              DataNode

                 Balancer                      (3.4.2)                                 3.4.2


5.7.13 NamenodeFsck

  HDFS                                                                               [3]
      DataNode


                                                             NameNode








5.8 org.apache.hadoop.mapred

5.8.1 JobConf

 JobConf                 MapReduce                                                          JobConf



   •           (setJobName)

   • Mapper           (setMapperClass)
   • Combiner           (setCombinerClass)

   • Reducer           (setReducerClass)
   • InputFormat           (setInputFormat)

   • OutputFormat               (setOutputFormat)
   •           (setInputPath)

   •           (setOutputPath)


           JobConf                  5.6
                                                                                                 
  // Create a new JobConf
  JobConf job = new JobConf(new Configuration(), MyJob.class);

  // Specify various job-specific parameters
  job.setJobName("myjob");

  job.setMapperClass(MyJob.MyMapper.class);
  job.setCombinerClass(MyJob.MyReducer.class);
  job.setReducerClass(MyJob.MyReducer.class);

  job.setInputFormat(SequenceFileInputFormat.class);
  job.setOutputFormat(SequenceFileOutputFormat.class);

  job.setInputPath(new Path("in"));
  job.setOutputPath(new Path("out"));
                                                                                                 
                                                  5.6 JobConf



                                                     (   5.7)


5.8.2 InputFormat

 InputFormat     MapReduce                                                         InputFormat




   •                             (validateInput      )
   •    Mapper                                                (getSplits   )

                                                                                                                  
  // number of Map tasks
  conf.setNumMapTasks(100);
  // number of Reduce tasks
  conf.setNumReduceTasks(40);
  // script to run when a Map task fails
  conf.setMapDebugScript("/home/kzk/debug/map-fail.sh");
  // script to run when a Reduce task fails
  conf.setReduceDebugScript("/home/kzk/debug/reduce-fail.sh");
  // compress intermediate Map output
  conf.setCompressMapOutput(true);
  // compress the final job output
  conf.setBoolean("mapred.output.compress", true);
  // run MapReduce locally, without a JobTracker
  conf.set("mapred.job.tracker", "local");
  conf.set("fs.default.name", "local");
                                                                                                                  
                                        5.7 JobConf



   • InputSplit(            )                    RecordReader         (getRecordReader       )


 getSplits                        InputSplit

                    FileSplit
 getRecordReader                               Key-Value         (       )                   RecordReader

                                      RecordReader::next
                                   InputFormat
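For file-based input, `getSplits` cuts each file into byte ranges. The sketch below assumes the sizing rule of Hadoop's `FileInputFormat`: the split size is `max(minSize, min(goalSize, blockSize))`, and a trailing piece of up to 10% more than one split (a slop factor of 1.1) is kept in the last split instead of becoming a tiny extra split. Treat the exact formula as an assumption for the version analysed here; the sketch also ignores block locations and unsplittable compressed files.

```java
import java.util.ArrayList;
import java.util.List;

// Byte-range splitting of one file, following FileInputFormat's size rule.
final class SplitSketch {
    static final double SPLIT_SLOP = 1.1; // last split may be up to 10% larger

    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Returns {offset, length} pairs covering [0, fileLength).
    static List<long[]> getSplits(long fileLength, long goalSize, long minSize, long blockSize) {
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 250 MB file, 64 MB blocks, large goal size -> 64 + 64 + 64 + 58 MB
        for (long[] s : getSplits(250 * mb, 1000 * mb, 1, 64 * mb)) {
            System.out.println("offset=" + s[0] / mb + "MB length=" + s[1] / mb + "MB");
        }
    }
}
```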


TextInputFormat

 TextInputFormat                                                InputFormat

                getRecordReader                LineRecordReader


                                InputFormat


KeyValueTextInputFormat

 KeyValueTextInputFormat                            Key-Value                                             InputFormat
        KeyValueTextInputFormat                                                                  getRecordReader

            KeyValueLineRecordReader




5.8.3 OutputFormat

 OutputFormat       MapReduce                                                                    OutputFormat





   •                                 (checkOutputSpecs          )
   •                                 RecordWriter          (getRecordWriter         )


                   TextOutputFormat        OutputFormat                                              key\tvalue
                                                  OutputFormatBase::setCompressOutput




5.8.4 JobClient

 Job    JobTracker


 JobClient.runJob             Job                                                            JobTracker
            Job


5.8.5 JobTracker

 JobTracker        Job              TaskTracker         Task                     JobClient        JobTracker   submitJob
       RPC                      Job

 Job          jobInitQueue      add           JobInitThread                         JobInProgress::initTasks            Job
                            InputSplit

 TaskTracker             HeartBeat                                          (heartbeat        )
    TaskTracker                                                 TaskTrackerAction

LaunchJobAction, KillJobAction, KillTaskAction, ReinitTrackerAction                          TaskTracker                Task
                     LaunchTaskAction

              Task       TaskTracker                       getNewTaskForTaskTracker

       Map                                    Reduce                                                    JobInProgress
    obtainNewMapTask                obtainNewReduceTask

                              findNewTask
                                                                            TaskInProgress::hasSpeculativeTask

                                                               SpeculativeTask

   • Task

   •                     SpeculativeTask
   •                                                                SPECULATIVE_GAP (20%)

   •                 SPECULATIVE_LAG (60 seconds)

   •        Task
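The conditions listed above amount to a predicate evaluated for each running task. The sketch below assumes `SPECULATIVE_GAP = 0.2` (the task's progress is at least 0.2 behind the average progress of its sibling tasks) and `SPECULATIVE_LAG = 60` seconds of running time. It omits other checks the real `TaskInProgress::hasSpeculativeTask` may make, such as a limit on concurrent attempts, and its parameter names are hypothetical.

```java
// Per-task test for launching a speculative (backup) attempt.
final class SpeculationSketch {
    static final double SPECULATIVE_GAP = 0.2;     // assumed: 20% behind the average
    static final long SPECULATIVE_LAG = 60 * 1000; // assumed: 60 seconds, in ms

    static boolean hasSpeculativeTask(boolean speculativeEnabled,
                                      boolean alreadyCompleted,
                                      double progress,        // this task, 0.0 to 1.0
                                      double averageProgress, // average over sibling tasks
                                      long startTimeMillis,
                                      long nowMillis) {
        return speculativeEnabled
            && !alreadyCompleted
            && averageProgress - progress >= SPECULATIVE_GAP
            && nowMillis - startTimeMillis >= SPECULATIVE_LAG;
    }

    public static void main(String[] args) {
        // Task at 30% while its siblings average 80%, running for 90 seconds.
        System.out.println(hasSpeculativeTask(true, false, 0.3, 0.8, 0, 90_000));
    }
}
```

Requiring both a progress gap and a minimum running time avoids launching backups for tasks that have only just started.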







5.8.6 TaskTracker

  TaskTracker               Task                                     offerService                   JobTracker
HeartBeat                                                                                    LaunchTaskAction

                                     (startNewTask          )
  startNewTask                   localizeJob                                          jar       HDFS

                       launchTaskForJob                                                     (TaskInProgress::launchTask
    )

  launchTask                       localizeTask
createRunner                      TaskRunner


TaskRunner

  TaskRunner                                                                                             java




MapTask

  MapTask             Map

  run                Map                            run              Map                           collector
MapRunner::run                                                           MapRunner::run            RecordReader

                                                          map
  collector        Reduce                         DirectMapOutputCollector                             MapOutputBuffer

                                       MapOutputBuffer
  MapOutputBuffer::collect                                                map

                                                                                            ReduceTask
MergeSorter                                                                    MergeSorter::addKeyValue


              MergeSorter                                                             (maxBufferSize)

                                                         (bufferWriter)              sortAndSpillToDisk


  sortAndSpillToDisk                 MergeSorter                                             (pendingSortImpl[i].sort())
     Combiner                                  combine                                                 RecordWriter

                            (spill      ) startPartition             RecordWriter                endPartition

                                                                 Partition
                                                                ReduceTask                      Partition

         run                 collector::flush                                                  mergeParts
Partition                    1                                                                   SequenceFile::Sorter




                      map                                               ReduceTask




ReduceTask

  ReduceTask            Reduce
  run             ReduceTask                             ReduceCopier                                 fetchOutputs

             Map                                                                  Reducer
                                     Map                                          1

  run                       reduce               ReduceValuesIterator    Reduce                          collector
   reduce              collector     collect            RecordWriter
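Because each Map output segment arrives already sorted by key, the Reduce side only needs a k-way merge to produce one sorted stream, which is then grouped by key for `reduce`. The sketch below merges sorted lists with `java.util.PriorityQueue` as a min-heap of cursors. It is a plain-Java illustration of the technique, not the code of `ReduceTask` or of Hadoop's own `org.apache.hadoop.util.PriorityQueue`, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// k-way merge of sorted segments using a min-heap of segment cursors.
final class KWayMergeSketch {

    static List<String> merge(List<List<String>> segments) {
        // Heap entries are {segment index, position}, ordered by the key at that position.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> segments.get(a[0]).get(a[1]).compareTo(segments.get(b[0]).get(b[1])));
        for (int i = 0; i < segments.size(); i++) {
            if (!segments.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll(); // smallest current key across all segments
            List<String> seg = segments.get(top[0]);
            out.add(seg.get(top[1]));
            if (top[1] + 1 < seg.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance this segment
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [apple, banana, cherry, date, kiwi]
        System.out.println(merge(List.of(
            List.of("apple", "kiwi"), List.of("banana", "cherry"), List.of("date"))));
    }
}
```

With k segments and n records in total, the heap keeps the merge at O(n log k) comparisons.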


Mapper

  map                                                                                           map




Reducer

  reduce                                                                                        reduce




5.8.7 StatusHttpServer

  JobTracker, TaskTracker       StatusHttpServer                          HTTP

       (4.4.1) HTTP                  Jetty[6]




5.9


  Map         Reduce


                                                UNIX




6




      6




                                                     HDFS       Hadoop MapReduce



6.1
                 12                                                    DataNode        TaskTracker

1                              NameNode            JobTracker

                                         6.1                         100MBps      Ethernet

                                                      6.1

                              CPU            Intel Xeon E5430 2.66 GHz Quad Core
                            Memory           16 GB
                              Disk           SAS
                              OS             Linux 2.6.18-53.1.14.el5PAE
                              NIC            Broadcom NetXtreme II BCM5708 Gigabit Ethernet
                      I/O Scheduler          CFQ (Completely Fair Queuing)




6.1.1

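Disk performance was measured with bonnie++ [1], which benchmarks read/write throughput. Figure 6.1 shows the bonnie++ results. Sequential block write reached 80.2MB/sec, sequential block read reached 94.2MB/sec, and random seeks ran at 347.7 per second.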
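The MB/sec figures quoted in this section come from three columns of the bonnie++ result row in Figure 6.1: Sequential Output Block (write), Sequential Input Block (read) and Random Seeks. The small Python sketch below pulls those columns out of a bonnie++ 1.03 result row. The column names are our own labels, taken from the header in Figure 6.1. Dividing K/sec by 1000 is an assumption about units. It gives 80.263 and 94.257, which is consistent with the 80.2 and 94.2MB/sec quoted.

```python
from typing import Dict

# bonnie++ 1.03 result-row columns, in the order of the header in Figure 6.1
# (the names are our own labels)
FIELDS = [
    "machine", "size",
    "out_char_ksec", "out_char_cpu",    # Sequential Output, Per Chr
    "out_block_ksec", "out_block_cpu",  # Sequential Output, Block (write)
    "rewrite_ksec", "rewrite_cpu",      # Sequential Output, Rewrite
    "in_char_ksec", "in_char_cpu",      # Sequential Input, Per Chr
    "in_block_ksec", "in_block_cpu",    # Sequential Input, Block (read)
    "seeks_per_sec", "seeks_cpu",       # Random Seeks
]

# The result row printed in Figure 6.1
ROW = "tmr001 32G 44896 66 80263 14 39105 7 66683 94 94257 11 347.7 0"


def parse_bonnie_row(row: str) -> Dict[str, str]:
    tokens = row.split()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))


def summarize(row: str) -> Dict[str, float]:
    r = parse_bonnie_row(row)
    return {
        # Assumption: MB/sec = K/sec / 1000
        "block_write_mb_s": int(r["out_block_ksec"]) / 1000,
        "block_read_mb_s": int(r["in_block_ksec"]) / 1000,
        "random_seeks_per_s": float(r["seeks_per_sec"]),
    }


if __name__ == "__main__":
    print(summarize(ROW))
```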



6.2 HDFS
HDFS performance was measured with TestDFSIO, a MapReduce-based benchmark included with Hadoop (in hadoop-0.16.4-test.jar).
                    1                                                     3


                                                                                                               
  $ tar vzxf bonnie++-1.03c.tar.gz
  $ cd bonnie++-1.03c
  $ ./configure
  $ make
  $ ./bonnie++
  Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
  tmr001          32G 44896 66 80263 14 39105     7 66683 94 94257 11 347.7     0
                      ------Sequential Create------ --------Random Create--------
                      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
                   16 99920 98 585100 96 116893 100 101574 99 917100 100 121861 100
                                                                                                               
Figure 6.1: bonnie++ results




6.2.1

Using the command in Figure 6.2, 100 files of 1G each were written (see 5.7.10 for how replica targets are chosen).
                                                                                                               
  $ ./bin/hadoop jar hadoop-0.16.4-test.jar TestDFSIO -write -nrFiles 100 -fileSize 1000
                                                                                                               
Figure 6.2: Writing 1G * 100 files
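TestDFSIO condenses per-file results into aggregate MB/sec figures. The sketch below shows two such summaries, under the assumption that each map task reports the bytes it wrote or read and its elapsed seconds. "Throughput" is total MB over total task time. "Average I/O rate" is the mean of the per-file rates. We believe TestDFSIO reports figures of this kind, but the function `dfsio_summary` and the 2**20-byte MB unit are illustrative assumptions, not the tool's own code.

```python
from typing import List, Tuple

# Assumption: -fileSize is counted in units of 2**20 bytes
MB = 1 << 20


def dfsio_summary(tasks: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Summarize per-file results, one (bytes, seconds) pair per map task.

    Returns (throughput, average_io_rate), both in MB/sec:
      throughput      = total MB / total task seconds
      average_io_rate = mean over tasks of (MB / seconds)
    """
    if not tasks:
        raise ValueError("no task results")
    total_mb = sum(b for b, _ in tasks) / MB
    total_s = sum(s for _, s in tasks)
    throughput = total_mb / total_s
    average_io_rate = sum(b / MB / s for b, s in tasks) / len(tasks)
    return throughput, average_io_rate
```

For the run in Figure 6.2, each of the 100 tuples would be `(1000 * MB, seconds)`. Throughput weights each file by how long it took, so a few slow tasks pull it below the average I/O rate.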



Figure 6.3 shows the results.

[Figure 6.3: Writing 1G * 100 files, throughput in MB/sec versus number of machines. Plot: x-axis Machines (5 to 12), y-axis MegaBytes/Sec (35 to 80).]







6.2.2

The 100G written above was then read back using the command in Figure 6.4.


                                                                                                                                    
  $ ./bin/hadoop jar hadoop-0.16.4-test.jar TestDFSIO -read -nrFiles 100 -fileSize 1000
                                                                                                                                    
Figure 6.4: Reading 1G * 100 files



Figure 6.5 shows the results.

[Figure 6.5: Reading 1G * 100 files, throughput in MB/sec versus number of machines. Plot: x-axis Machines (5 to 12), y-axis MegaBytes/Sec (130 to 200).]




6.3
This section measures MapReduce performance on 100G of data.




6.3.1

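Hadoop's randomwriter example was used to generate 100G of random data. With the configuration in Figure 6.6, keys and values are each between 10 and 1000 bytes long.

The 100G is written as 1G per Map. Each key-value pair averages about 1KB, giving about 100M pairs.

The command is shown in Figure 6.7.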
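The numbers above follow from the configuration in Figure 6.6 by simple arithmetic: 100G in total, 1G per Map, about 1KB per key-value pair, and about 100M pairs. The sketch below reproduces that arithmetic. `randomwriter_plan` is a hypothetical helper, not part of Hadoop. Treating key and value lengths as spread evenly between their minimum and maximum is our assumption.

```python
from typing import Dict


def randomwriter_plan(total_bytes: int, bytes_per_map: int,
                      min_key: int, max_key: int,
                      min_value: int, max_value: int) -> Dict[str, float]:
    """Back-of-the-envelope sizing for a randomwriter run (hypothetical helper)."""
    maps = -(-total_bytes // bytes_per_map)  # ceiling division: Maps needed
    # Assumption: key and value lengths are spread roughly evenly between min and max
    mean_record = (min_key + max_key) / 2 + (min_value + max_value) / 2
    return {
        "maps": maps,
        "mean_record_bytes": mean_record,
        "approx_records": total_bytes / mean_record,
    }


# Values from the configuration in Figure 6.6
plan = randomwriter_plan(total_bytes=100_000_000_000, bytes_per_map=1_000_000_000,
                         min_key=10, max_key=1000, min_value=10, max_value=1000)
print(plan)
```

The result is 100 Maps and a mean record of about 1010 bytes, which gives roughly 99M records. That matches the "about 1KB" and "about 100M" in the text.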

                                                                                             
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
    <property>
      <name>test.randomwrite.min_key</name>
      <value>10</value>
    </property>
    <property>
      <name>test.randomwrite.max_key</name>
      <value>1000</value>
    </property>
    <property>
      <name>test.randomwrite.min_value</name>
      <value>10</value>
    </property>
    <property>
      <name>test.randomwrite.max_value</name>
      <value>1000</value>
    </property>
    <property>
      <name>test.randomwriter.bytes_per_map</name>
      <value>1000000000</value>
    </property>
    <property>
      <name>test.randomwrite.total_bytes</name>
      <value>100000000000</value>
    </property>
  </configuration>
                                                                                             
Figure 6.6: Configuration for generating 100G of data (randomwriter.conf)

                                                                                             
  $ ./bin/hadoop jar hadoop-0.16.4-examples.jar randomwriter -conf randomwriter.conf random
                                                                                             
Figure 6.7: Generating 100G of data



Figure 6.8 shows the time taken to generate 100G, and Figure 6.9 shows the corresponding throughput.
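Figures 6.8 and 6.9 present the same runs as elapsed time and as throughput. If throughput is simply data volume divided by elapsed time, either plot can be derived from the other. The time plot also yields a speedup and a scaling efficiency relative to the smallest cluster. This is a minimal sketch. The function names and the decimal convention (100G = 100,000 MB) are our assumptions.

```python
from typing import Tuple

DATA_MB = 100_000  # 100G expressed in MB; assumes decimal units (1G = 1000 MB)


def throughput_mb_per_s(data_mb: float, elapsed_s: float) -> float:
    """Throughput implied by an elapsed time: volume / time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return data_mb / elapsed_s


def scaling(base_machines: int, base_s: float, machines: int, elapsed_s: float) -> Tuple[float, float]:
    """Return (speedup, efficiency) of a run relative to a smaller baseline run.

    speedup    = base_s / elapsed_s
    efficiency = speedup / (machines / base_machines); 1.0 means perfectly linear scaling
    """
    speedup = base_s / elapsed_s
    return speedup, speedup / (machines / base_machines)
```

To use it, take the 3-machine time read off Figure 6.8 as the baseline and pass in the time for n machines. An efficiency well below 1.0 means that adding machines pays off less than linearly.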








[Figure 6.8: Generating 100G, elapsed time in seconds versus number of machines. Plot: x-axis Machines (3 to 12), y-axis Sec (1000 to 9000).]




[Figure 6.9: Generating 100G, throughput in MB/sec versus number of machines. Plot: x-axis Machines (3 to 12), y-axis MegaBytes/Sec (10 to 60).]







6.3.2

The generated data was then sorted with Hadoop's sort example, using the command in Figure 6.10.
                                                                                                                   
  $ ./bin/hadoop jar hadoop-0.16.4-examples.jar sort random random-sort
                                                                                                                   
Figure 6.10: Sorting the 100G of random data
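As we understand the bundled sort example, it runs identity Map and Reduce functions. The benchmark therefore measures the framework's own work: reading the input, partitioning records by key, sorting each partition, and writing the output back. The Python sketch below illustrates that pipeline. A deterministic CRC32-based `partition` function stands in for Hadoop's hash partitioner. It is an illustration under those assumptions, not the example's code, and its output is sorted within each partition but not globally.

```python
import zlib
from typing import List, Tuple

Record = Tuple[bytes, bytes]


def partition(key: bytes, num_reduces: int) -> int:
    # Stand-in for a hash partitioner; CRC32 keeps the result stable across runs
    return zlib.crc32(key) % num_reduces


def shuffle_sort(records: List[Record], num_reduces: int) -> List[List[Record]]:
    """Identity map -> partition by key -> sort each partition -> identity reduce."""
    parts: List[List[Record]] = [[] for _ in range(num_reduces)]
    for key, value in records:  # identity map: every record is emitted unchanged
        parts[partition(key, num_reduces)].append((key, value))
    for part in parts:  # the framework sorts each partition by key before reduce
        part.sort(key=lambda kv: kv[0])
    return parts  # identity reduce: each sorted partition becomes one output file


if __name__ == "__main__":
    data = [(b"d", b"4"), (b"a", b"1"), (b"c", b"3"), (b"b", b"2")]
    for i, part in enumerate(shuffle_sort(data, 2)):
        print(i, part)
```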



[Figure 6.11: Sorting 100G, elapsed time in seconds versus number of machines. Plot: x-axis Machines (3 to 12), y-axis Sec (100 to 900).]



[Figure 6.12: Sorting 100G, throughput in MB/sec versus number of machines. Plot: x-axis Machines (3 to 12), y-axis MegaBytes/Sec (100 to 550).]



Figure 6.11 shows the time taken by the sort, and Figure 6.12 shows the corresponding throughput.




6.4
                Hadoop   12
                                       1

       Hadoop
                 3

Hadoop




7




This report surveyed Hadoop, an open-source implementation of GFS and Google MapReduce.

Hadoop                           Hadoop




    Hadoop                                                  Hadoop


              Hadoop   12

                                  Hadoop




[1] Bonnie++ project homepage. http://www.coker.com.au/bonnie++/.
[2] Commons Logging. http://commons.apache.org/logging/.
[3] Hadoop DFS user guide. http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html.
[4] Hadoop project homepage. http://hadoop.apache.org/core/.
[5] Hadoop Streaming documentation. http://hadoop.apache.org/core/docs/current/streaming.html.
[6] Jetty. http://www.mortbay.org/jetty-6/.
[7] Kosmos filesystem. http://kosmosfs.sourceforge.net/.
[8] Lucene project homepage. http://lucene.apache.org/.
[9] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[10] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. SIGOPS Oper. Syst. Rev., 37(5):29–43, 2003.




Hadoop Survey Report (Hadoop 調査報告書)

  • 1. Hadoop Preferred Infrastructure 20 8 25
  • 2. ( NTT Preferred Infras- tructure( Preferred Infrastructure ) NTT Preferred Infrastructure NTT • Preferred Infrastructure NTT • Preferred Infrastructure: E-mail: info@preferred.jp NTT E-mail: pr@nttr.co.jp Copyright c NTT Resonant Inc. 2008 i
  • 3. 2008 8 25 ii
  • 4. 1 1 8 1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2 Hadoop 9 3 GFS HDFS 10 3.1 GFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1.3 HDFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.3.5 . . . . . . . . . . . . . . . . . . . . . . 14 3.3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.3.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.3.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.3.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.3.11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.3.12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.3.13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.3.14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.3.15 . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . 18 3.3.16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
  • 5. 3.4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.4.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.6.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.6.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.6.4 . . . . . . . . . . . . . . . . . . . . 22 3.6.5 . . . . . . . . . . . . . . . . . . . . . . . . 22 3.6.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.6.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.6.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.6.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.6.10 (Read-Only ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4 Google MapReduce Hadoop MapReduce 26 4.1 Google MapReduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.1.3 Hadoop MapReduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
28 4.3.1 MapReduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.3.3 Shuffle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.3.4 Map Reduce . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.3.5 Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.3.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.3.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2
  • 6. 4.4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.5.1 Combine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.5.2 . . . . . . . . . . . . . . . . . . . . . . . 34 4.5.3 Map Shuffle . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.5.5 Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.6.1 . . . . . . . . . . . . . . . . . . . . . . . 35 4.6.2 . . . . . . . . . . . . 36 4.6.3 . . . . . . . . . . . . . . 36 4.6.4 . . . . . . . . . . . . . . . . . . . . . . . 37 4.6.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5 38 5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 5.2 org.apache.hadoop.util . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2.1 MergeSort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2.2 PriorityQueue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2.3 ReflectionUtils . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2.4 RunJar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2.5 Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.3 org.apache.hadoop.io . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.3.1 Writable . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . 39 5.3.2 SequenceFile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.3.3 compress . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.4 org.apache.hadoop.ipc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.4.1 VersionedProtocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.4.2 RPC, Server, Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.5 org.apache.hadoop.net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.5.1 DNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.5.2 Node, NodeBase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.5.3 NetworkTopology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.6 org.apache.hadoop.fs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.6.1 FileSystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.6.2 LocalFileSystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3
      5.6.3  InMemoryFileSystem
      5.6.4  FSOutputSummer, FSInputStream
      5.6.5  Path
      5.6.6  Trash
      5.6.7  FileUtil
      5.6.8  FsShell
      5.6.9  DU, DF
5.7    org.apache.hadoop.dfs
      5.7.1  ClientProtocol
      5.7.2  DatanodeProtocol
      5.7.3  NamenodeProtocol
      5.7.4  DistributedFileSystem
      5.7.5  DFSClient
      5.7.6  DataNode
      5.7.7  NameNode
      5.7.8  FSNamesystem
      5.7.9  FSImage, FSEditLog
      5.7.10 ReplicationTargetChooser
      5.7.11 SecondaryNameNode
      5.7.12 Balancer
      5.7.13 NamenodeFsck
5.8    org.apache.hadoop.mapred
      5.8.1  JobConf
      5.8.2  InputFormat
      5.8.3  OutputFormat
      5.8.4  JobClient
      5.8.5  JobTracker
      5.8.6  TaskTracker
      5.8.7  StatusHttpServer
5.9
6
6.1
      6.1.1
6.2    HDFS
      6.2.1
      6.2.2
6.3
      6.3.1
      6.3.2
6.4
7
Figures
2.1  Google, OSS
5.1
5.2
5.3
5.4
5.5
5.6  JobConf
5.7  JobConf
6.1  bonnie++
6.2  1G * 100
6.3  1G * 100 (MB/sec)
6.4  1G * 100
6.5  1G * 100 (MB/sec)
6.6  100G (randomwriter.conf)
6.7  100G
6.8  100G (sec)
6.9  100G (MB/sec)
6.10
6.11 100G (sec)
6.12 100G (MB/sec)
Tables
3.1  Google File System and Hadoop
4.1  Google MapReduce and Hadoop
5.1
6.1
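1
1.1
This document examines Hadoop [4]. Chapter 2 introduces Hadoop. Chapters 3 and 4 compare Hadoop with Google's Google File System [10] and MapReduce [9], respectively. Chapter 5 walks through the Hadoop source code, Chapter 6 reports performance measurements, and Chapter 7 concludes. The Hadoop version examined is 0.16.4.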
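2      Hadoop
Hadoop is developed mainly by Doug Cutting of Yahoo! Inc., the creator of Lucene [8], and started as a Lucene subproject. Hadoop is an open-source implementation of Google's Google File System (GFS) and MapReduce: HDFS (Hadoop Distributed File System) corresponds to GFS, and the Hadoop MapReduce Framework corresponds to Google's MapReduce. Besides GFS and MapReduce, Google's BigTable also has an open-source counterpart, hBase (Figure 2.1).

Figure 2.1: Google, OSS

Hadoop is written in Java, and MapReduce programs are normally written in Java too. With Hadoop Streaming [5], MapReduce programs can also be written in languages such as C/C++, Ruby, and Python.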
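3      GFS and HDFS
This chapter compares GFS with HDFS, its counterpart in Hadoop.

3.1    GFS
This section summarizes GFS [10].

3.1.1
GFS is designed for:
• data on the TB scale
• clusters of commodity PCs

3.1.2
GFS divides files into 64MB chunks stored on the PCs of the cluster and keeps 3 replicas of each chunk.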
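To make the 64MB chunk layout concrete, here is a minimal sketch in plain Java (no GFS or HDFS APIs; the class and method names are hypothetical) of how a byte offset in a file maps to a chunk index and how many chunks a file occupies, assuming a fixed 64MB chunk size.

```java
/** Hypothetical helper: maps file offsets onto fixed-size chunks (64MB, as in GFS). */
final class ChunkMath {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // 64MB

    /** Index of the chunk that holds the given byte offset. */
    static long chunkIndex(long offset) {
        if (offset < 0) throw new IllegalArgumentException("negative offset");
        return offset / CHUNK_SIZE;
    }

    /** Number of chunks needed for a file of the given size (the last one may be partial). */
    static long chunkCount(long fileSize) {
        if (fileSize < 0) throw new IllegalArgumentException("negative size");
        return (fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    /** Position of the byte inside its chunk. */
    static long offsetInChunk(long offset) {
        return offset % CHUNK_SIZE;
    }
}
```

For example, a 200MB file needs `chunkCount(200L * 1024 * 1024)` = 4 chunks. In GFS the client does this translation itself and then asks the master only for the location of the chunk with that index.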
3.1.3  HDFS
HDFS follows the design of GFS. In HDFS the master is called the NameNode and the chunk servers are called DataNodes.

3.2
Table 3.1 compares GFS with Hadoop's HDFS.

Table 3.1: Google File System and Hadoop
3.3
3.3.1  Directory creation
Hadoop: the client sends the request to the NameNode, which performs the operation.
Code: DFSClient::mkdirs
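3.3.2  Directory deletion
Hadoop: the client sends the request to the NameNode, which performs the operation.
Code: DFSClient::delete

3.3.3  File creation
Hadoop: the client sends the request to the NameNode, which performs the operation.
Code: DFSClient::create

3.3.4  File deletion
Hadoop: the client sends the request to the NameNode, which performs the operation.
Code: DFSClient::delete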
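3.3.5  Trash (5.6.6)
Hadoop: delete moves files into /trash, and files in /trash are later removed by NameNode.emptier.

3.3.6  File reads
Hadoop: DFSInputStream (5.7.5) gets the block locations from the NameNode and reads the data from the DataNodes.
Code: DFSInputStream::read

3.3.7  File writes
Hadoop: DFSOutputStream (5.7.5) gets target DataNodes from the NameNode and writes the data to those DataNodes.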
Code: DFSOutputStream::writeChunk

3.3.8
Hadoop: DFSInputStream (5.7.5) asks the NameNode for the block locations and reads from the DataNodes.
Code: DFSInputStream::read

3.3.9

3.3.10  Rename
Hadoop: DFSClient sends the request to the NameNode, which performs the operation.
Code: DFSClient::rename

3.3.11  Directory listing
Hadoop: DFSClient sends the request to the NameNode, which performs the operation.
Code: DFSClient::listPaths

3.3.12
Hadoop: the client (bin/hadoop dfs) takes the user name from whoami and the groups from bash -c groups.
Code: DFSClient::getFileInfo
See http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html
3.3.13
Hadoop: the client (bin/hadoop dfs) takes the user name from whoami and the groups from bash -c groups.
Code: DFSClient::getFileInfo
See http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html

3.3.14  Appending to files
Hadoop: see HADOOP-1700 (http://issues.apache.org/jira/browse/HADOOP-1700).
3.3.15
Hadoop: handled through the NameNode's FSImage.
Code: FSImage

3.3.16  HeartBeat
Hadoop: each DataNode periodically sends a HeartBeat to the NameNode, which uses it to track the state of the DataNode. The interval is set by "dfs.heartbeat.interval" (default 3 seconds).
Code: DataNode::offerService → NameNode::sendHeartbeat

3.4
3.4.1
Hadoop: when the NameNode stops receiving HeartBeats from a DataNode, it treats that DataNode as failed.
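Code: DataNode::offerService → NameNode::sendHeartbeat

3.4.2  Rebalancing
Hadoop: rebalancing is started with ./bin/start-balancer.sh. Each DataNode measures its disk usage with DF::run (5.6.9), which runs df, and reports the result to the NameNode in its HeartBeat. The Balancer uses this information.
See https://issues.apache.org/jira/browse/HADOOP-1652

3.4.3  Permission checks
Hadoop: permissions are checked on the NameNode.
Code: FSNamesystem::checkPermission → PermissionChecker::checkPermission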
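The failure detection in 3.4.1 relies on the NameNode noticing that a DataNode has stopped sending HeartBeats. The sketch below uses a hypothetical class that records the time of the last HeartBeat per node and reports the nodes that have been silent longer than an expiry window. The window is a constructor parameter. The usage below assumes ten missed intervals of 3 seconds, which is an illustrative choice and not Hadoop's exact rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of HeartBeat-based dead-node detection on the NameNode side. */
final class HeartbeatMonitor {
    private final long expireMillis;                      // silence after which a node is dead
    private final Map<String, Long> lastBeat = new HashMap<>();

    HeartbeatMonitor(long expireMillis) {
        this.expireMillis = expireMillis;
    }

    /** Records the arrival time of a DataNode's HeartBeat. */
    void onHeartbeat(String datanode, long nowMillis) {
        lastBeat.put(datanode, nowMillis);
    }

    /** Returns (and forgets) the nodes whose last HeartBeat is older than the window. */
    List<String> collectDeadNodes(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeat.entrySet()) {
            if (nowMillis - e.getValue() > expireMillis) {
                dead.add(e.getKey());
            }
        }
        for (String d : dead) {
            lastBeat.remove(d);
        }
        return dead;
    }
}
```

For example, with `new HeartbeatMonitor(10 * 3000)`, a node last heard at t = 0 is reported at t = 31000 ms, while a node heard at t = 25000 ms is not. In HDFS, the blocks held by a dead DataNode then become candidates for re-replication.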
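3.4.4  RPC
Hadoop: Hadoop logs through Commons-Logging [2] (org.apache.commons.logging.*), with log4j as the backend.

3.5
3.5.1  Rack awareness
Hadoop: the network topology (5.5.2, 5.5.3) is obtained from the script configured by "dfs.network.scripts", and block locations are sorted by their distance from the client.
Code: FSNamesystem::getBlockLocations → NetworkTopology::pseudoSortByDistance
See http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf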
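3.6
3.6.1  Block IDs
Hadoop: the NameNode assigns an id to each block.
Code: FSNamesystem::allocateBlock

3.6.2  Checksums
Hadoop: a checksum is kept for every "io.bytes.per.checksum" bytes of data (default 512). Checksums are computed by DFSOutputStream on write and verified by DFSInputStream on read, not by the NameNode.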
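3.6.2 says that a checksum is kept for every "io.bytes.per.checksum" bytes (512 by default). The minimal sketch below uses java.util.zip.CRC32 to show that idea: one CRC-32 per fixed-size chunk, and verification that pinpoints the first damaged chunk. The class and method names are hypothetical, and the sketch does not model the on-disk layout of Hadoop's checksum data.

```java
import java.util.zip.CRC32;

/** Hypothetical sketch: one CRC-32 per fixed-size chunk, as with io.bytes.per.checksum. */
final class ChunkChecksums {
    /** One checksum per bytesPerChecksum-sized chunk (the last chunk may be shorter). */
    static long[] compute(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Index of the first chunk whose checksum differs, or -1 if all chunks match. */
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = compute(data, bytesPerChecksum);
        int n = Math.min(actual.length, expected.length);
        for (int i = 0; i < n; i++) {
            if (actual[i] != expected[i]) return i;
        }
        return actual.length == expected.length ? -1 : n; // length mismatch: first missing chunk
    }
}
```

Because each 512-byte chunk carries its own checksum, a reader only needs to verify the chunks that cover the bytes it actually reads.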
3.6.3

3.6.4  Replication
Hadoop: replicas are placed when the data is written (write), and the target DataNodes are chosen by ReplicationTargetChooser (5.7.10).
Code: FSNamesystem::pendingTransfers
See http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf

3.6.5
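Hadoop: as in GFS, data is pipelined from DataNode to DataNode. DFSOutputStream sends the data to the first DataNode, which forwards it to the next DataNode, and so on.
Code: DFSOutputStream

3.6.6
Hadoop: control messages use RPC (5.4.2), but data transfer (5.7.5) does not go through RPC. DFSOutputStream and DFSInputStream exchange data with the DataNodes directly.

3.6.7
Hadoop: changes to the namespace are logged by FSEditLog (5.7.9). FSImage and the FSEditLog are stored under dfs.name.dir.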
3.6.8  Checkpoints
Hadoop: the SecondaryNameNode (5.7.11) periodically takes the NameNode's image and edit log, merges them into a new checkpoint, and returns it to the NameNode.
Code: SecondaryNameNode::run

3.6.9  Recovery
Hadoop: at startup the NameNode loads the FSImage with loadFSImage and then replays the edit log (5.7.9).
Code: FSImage::loadFSImage → FSEditLog::loadFSEdits

3.6.10  (Read-Only)
Hadoop: SecondaryNameNode.

3.7
This section summarizes how HDFS in 0.16.4 compares with GFS, with reference to 0.19, in terms of the DataNode and the NameNode.
4      Google MapReduce and Hadoop MapReduce
This chapter compares Google MapReduce with Hadoop MapReduce.

4.1    Google MapReduce
This section summarizes Google MapReduce [9].

4.1.1
Google MapReduce assumes:
• a cluster of PCs
• MapReduce processing on those PCs in 3 phases:
  – Map
  – Shuffle
  – Reduce
• data stored in GFS
4.1.2
The input is divided into M pieces, each processed by a Map task, and the Map tasks run in parallel on the PCs. The Map output is divided into R pieces, each processed by a Reduce task, and the Reduce tasks also run in parallel. Each Reduce task collects its piece from all M Map tasks, so M × R transfers take place between Map and Reduce.

4.1.3  Hadoop MapReduce
Hadoop MapReduce is modeled on Google MapReduce. Its master is called the JobTracker and its workers are called TaskTrackers. Hadoop also provides HadoopStreaming [5], which allows MapReduce programs to be written in languages other than Java.

4.2
Table 4.1 compares Google MapReduce with Hadoop MapReduce.

Table 4.1: Google MapReduce and Hadoop
4.3
4.3.1
Hadoop: MapReduce programs are written in Java, or in other languages through HadoopStreaming. JobClient submits the job to the JobTracker, which assigns its tasks to the TaskTrackers.
4.3.2  HeartBeat
Hadoop: each TaskTracker sends HeartBeats to the JobTracker. The interval grows with the cluster size: (number of TaskTrackers / 50 + 1) seconds, with a minimum of 5 seconds (JobTracker::getNextHeartbeatInterval).
Code: TaskTracker::transmitHeartBeat → JobTracker::heartbeat

4.3.3  Shuffle partitioning
Hadoop: during Shuffle, the Reducer that receives each record is chosen by hashing its key. The partitioning function is set with JobConf.setPartitionerClass. Hadoop provides HashPartitioner, the default, which hashes the whole key, and KeyFieldBasedPartitioner, which partitions on fields of the key.

4.3.4  Number of Map and Reduce tasks
Hadoop: the numbers of Map and Reduce tasks are set with JobConf.setNumMapTasks and JobConf.setNumReduceTasks (5.8.1).
Code: JobConf::setNumMapTasks, JobConf::setNumReduceTasks

4.3.5  Input splitting
Hadoop: the input is divided into InputSplits, and each InputSplit is processed by one Map task (5.8.5).

4.3.6  Job priority
Hadoop: set with JobConf::setJobPriority.

4.3.7  Input and output formats
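Hadoop: the input and output formats of a MapReduce job are set with JobConf.setInputFormat and JobConf.setOutputFormat (5.8.1). TextInputFormat reads each line as one key-value record, SequenceFileAsTextInputFormat reads each SequenceFile record as one key-value record, and TextOutputFormat writes text. Other formats are implemented through InputFormat and OutputFormat. Besides TextInputFormat and TextOutputFormat, SequenceFile-based formats are also available.

4.3.8  Counters
Hadoop: each Task keeps the counters of the Task Counter enumeration, which count key-value records and bytes:
• MAP_INPUT_RECORDS: Map input records
• MAP_OUTPUT_RECORDS: Map output records
• MAP_INPUT_BYTES: Map input bytes
• MAP_OUTPUT_BYTES: Map output bytes
• COMBINE_INPUT_RECORDS: Combine input records
• COMBINE_OUTPUT_RECORDS: Combine output records
• REDUCE_INPUT_GROUPS: Reduce input groups
• REDUCE_INPUT_RECORDS: Reduce input records
• REDUCE_OUTPUT_RECORDS: Reduce output records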
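The hash partitioning described in 4.3.3 decides which Reducer receives each key. Hadoop's HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The standalone sketch below reproduces that formula without depending on Hadoop classes. The class name and the grouping helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch of hash partitioning (same formula as Hadoop's HashPartitioner). */
final class HashPartitionSketch {
    /** Masking the sign bit keeps the result non-negative even for negative hash codes. */
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Groups keys by the Reducer they would be shuffled to. */
    static List<List<String>> partition(List<String> keys, int numReduceTasks) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(getPartition(k, numReduceTasks)).add(k);
        }
        return parts;
    }
}
```

The partition depends only on the key, so all values for one key reach the same Reducer. A plain `hashCode() % n` could return a negative index, which is why the sign bit is masked first.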
4.3.9  User-defined counters
Hadoop: user-defined counters are incremented with Reporter::incrCounter.
See http://www.jakobhoman.com/2007/11/quick-tour-of-hadoops-reporter-object.html

4.4
4.4.1  Monitoring job progress
Hadoop: the JobTracker serves job status over HTTP through StatusHttpServer (5.8.7). On the command line (CUI), JobClient shows the progress as a percentage.

4.4.2  Local execution
Hadoop: setting "mapred.job.tracker" to local in the JobConf runs Map, Shuffle, and Reduce in a single process (5.8.1).
Code: JobClient::init → LocalJobRunner

4.4.3  Killing tasks
Hadoop: ./bin/hadoop job -kill-task kills a task, and ./bin/hadoop job -list lists the jobs.
Code: JobClient

4.5
4.5.1  Combine
Hadoop: a Combiner, which aggregates Map output locally before Shuffle, is set with JobConf.setCombinerClass (5.8.1).
Code: JobConf
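4.5.1 describes Combine: applying reduce logic to each Map task's output before Shuffle so that fewer records cross the network. The sketch below uses a word-count job, chosen here only for illustration, and plain Java with hypothetical names and no Hadoop API. It compares the records a single Map task would emit with and without local combining.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Word-count sketch of what a Combiner saves before Shuffle (hypothetical, no Hadoop API). */
final class CombinerSketch {
    /** Map phase: one (word, 1) pair per word occurrence. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(w, 1));
            }
        }
        return out;
    }

    /** Combine (and Reduce) logic: sum the counts per word. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }
}
```

For the line "a b a c a b", the Map output shrinks from 6 records to 3 before Shuffle. Summation is associative and commutative, so the same function can serve as both Combiner and Reducer. This matches the JobConf in Figure 5.6, which registers the Reducer class as the Combiner.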
4.5.2  Locality of Map and Reduce
Hadoop: Tasks are scheduled close to their data using the rack awareness of 3.5.1. Map tasks are preferably assigned to nodes, or failing that racks, that hold their input blocks.
Code: JobInProgress::createCache → InputFormat::getLocations → DistributedFileSystem::getFileBlockLocations
See http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf

4.5.3  Overlapping Map and Shuffle
Hadoop: Shuffle starts while Map tasks are still running. When a Map task finishes, each Reduce task fetches its portion of that task's output from the TaskTracker.
Code: ReduceTask.ReduceCopier::fetchOutputs
4.5.4  Compressing I/O
Hadoop: SequenceFile key-value data can be compressed with gzip or lzo. Setting "mapred.output.compress" to true compresses the job output (5.8.1).
Code: OutputFormatBase

4.5.5  Compressing Map output
Hadoop: passing true to JobConf.setCompressMapOutput (5.8.1) makes MapTask compress the Map output before Shuffle.
Code: MapTask.MapOutputBuffer::MapOutputBuffer

4.6
4.6.1  Task failures
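Code: JobInProgress::failedTask

4.6.2  Speculative execution
Hadoop: a backup attempt of a slow task (a SpeculativeTask) is launched when speculative execution is enabled by setting JobConf.setMapSpeculativeExecution or JobConf.setReduceSpeculativeExecution to true. The decision is made in TaskInProgress::hasSpeculativeTask (5.8.5).
Code: TaskInProgress::hasSpeculativeTask

4.6.3  Duplicate completions
Hadoop: if another attempt of the task has already completed, JobInProgress::completedTask detects this through alreadyCompletedTask and marks the attempt KILLED instead of SUCCEEDED.
Code: JobInProgress::completedTask → TaskInProgress::alreadyCompletedTask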
4.6.4

4.6.5
Hadoop: see HADOOP-153 (http://issues.apache.org/jira/browse/HADOOP-153).

4.7
This chapter compared Hadoop MapReduce with Google MapReduce.
5      Hadoop
This chapter walks through the Hadoop source code, package by package.

5.1
Table 5.1 lists the packages under src/.

Table 5.1
conf
dfs        HDFS
filecache
fs
io
ipc        IPC (Inter Process Communication)
log
mapred     MapReduce
metrics
net
record
security
tools
util
5.2    org.apache.hadoop.util
Utilities used throughout Hadoop.

5.2.1  MergeSort
Merge sort, used for Map output.
5.2.2  PriorityQueue
5.2.3  ReflectionUtils
Java reflection helpers, such as ReflectionUtils::newInstance.
5.2.4  RunJar
Runs a Jar file.
5.2.5  Tool
Interface for MapReduce programs, run through ToolRunner::run.

5.3    org.apache.hadoop.io
5.3.1  Writable
The interface for MapReduce keys and values, which serialize themselves through java.io.DataInput and java.io.DataOutput. Implementations include IntWritable, LongWritable, FloatWritable, BytesWritable, ArrayWritable, TwoDArrayWritable, and MapWritable.
5.3.2  SequenceFile
A file format that stores Key-Value pairs.
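According to 5.3.1, keys and values implement Writable and serialize themselves through java.io.DataOutput and java.io.DataInput. The sketch below mirrors that pattern with a hypothetical SimpleWritable interface. It is not org.apache.hadoop.io.Writable itself, so it runs without Hadoop on the classpath. It round-trips an illustrative record through a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical stand-in for Hadoop's Writable: the same two methods, no Hadoop dependency. */
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Example record: a word and its count, serialized field by field. */
final class WordCount implements SimpleWritable {
    String word = "";
    long count;

    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);    // 2-byte length followed by the (modified UTF-8) characters
        out.writeLong(count);  // 8 bytes
    }

    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();   // fields must be read back in the order they were written
        count = in.readLong();
    }

    static byte[] toBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        }
        return buf.toByteArray();
    }

    static void fromBytes(byte[] bytes, SimpleWritable w) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }
}
```

Serializing ("hadoop", 42) takes 2 + 6 + 8 = 16 bytes. Hadoop creates Writable instances reflectively (ReflectionUtils::newInstance, 5.2.3) before calling readFields, so such classes keep a no-argument constructor.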
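5.3.3  compress
Compression: BlockCompressorStream, GzipCodec, and LzoCodec.

5.4    org.apache.hadoop.ipc
5.4.1  VersionedProtocol
5.4.2  RPC, Server, Client
Hadoop's RPC (Remote Procedure Call) layer. A server is started as in Figure 5.1, and a client obtains a proxy for a protocol, here ClientProtocol, as in Figure 5.2. A protocol is a Java interface (Figure 5.3) whose arguments and return values are Writable or Java primitive types. A remote call is an ordinary method call on the proxy (Figure 5.4). The client retries a connection up to "ipc.client.connect.max.retries" times (default 10), and the timeout is 60 seconds (FSConstants.READ_TIMEOUT).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.ipc.RPC;
    import org.apache.hadoop.ipc.Server;

    Configuration conf = new Configuration();
    // Serve the protocol implemented by this object on localhost:8000
    Server server = RPC.getServer(this, "localhost", 8000, conf);
    server.start();

Figure 5.1

    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.ipc.RPC;

    Configuration conf = new Configuration();
    InetSocketAddress addr = new InetSocketAddress("localhost", 8000);
    // Wait until the server at addr answers, then return a ClientProtocol proxy
    ClientProtocol client = (ClientProtocol) RPC.waitForProxy(
        ClientProtocol.class, ClientProtocol.versionID, addr, conf);

Figure 5.2: ClientProtocol

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.UTF8;

    interface ClientProtocol extends org.apache.hadoop.ipc.VersionedProtocol {
      public static final long versionID = 1L;
      HeartbeatResponse heartbeat() throws IOException;
    }

    // Return values must be Writable so that RPC can serialize them
    public class HeartbeatResponse implements org.apache.hadoop.io.Writable {
      String status;
      public void write(DataOutput out) throws IOException {
        UTF8.writeString(out, status);
      }
      public void readFields(DataInput in) throws IOException {
        this.status = UTF8.readString(in);
      }
    }

Figure 5.3

    client.heartbeat();

Figure 5.4

5.5    org.apache.hadoop.net
5.5.1  DNS
DNS lookups, such as reverse lookups (reverseDns) and looking up IP addresses (getIPs).
5.5.2  Node, NodeBase
Nodes of the network topology, configured through "dfs.network.scripts" (3.5.1).
5.5.3  NetworkTopology
Hadoop's network topology: a tree of Nodes grouped into racks. It answers whether two Nodes are on the same rack (isOnSameRack) and how far apart they are (getDistance).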
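5.5.3 mentions NetworkTopology::getDistance. A common way to define this distance, and the one assumed in the sketch below, counts the hops from each node up to their closest common ancestor in a /rack/host tree. That gives 0 for the same node, 2 for two hosts on one rack, and 4 across racks. The class name and the "/rack/host" path strings are hypothetical.

```java
/** Sketch of tree distance between nodes named like "/rack1/host3" (hypothetical helper). */
final class TopologyDistance {
    static int getDistance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;
        }
        // hops from each node up to the deepest shared ancestor
        return (pa.length - common) + (pb.length - common);
    }

    static boolean isOnSameRack(String a, String b) {
        return parent(a).equals(parent(b));
    }

    private static String parent(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }
}
```

Under this assumption, sorting block locations by distance from the reader, as NetworkTopology::pseudoSortByDistance does in 3.5.1, prefers a local replica first, then one on the same rack, then one on another rack.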
5.6    org.apache.hadoop.fs
5.6.1  FileSystem
The abstract file system. Besides HDFS (hdfs://) and the local file system (file://), Hadoop supports Amazon S3 (s3://) and Kosmos [7] (kfs://). createFileSystem selects the implementation from the URI scheme by reading the configuration key "fs.[scheme].impl". For example, "fs.hdfs.impl" is org.apache.hadoop.dfs.DistributedFileSystem. Figure 5.5 copies a file from HDFS to S3.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    Configuration conf = new Configuration();
    FileSystem fs1 = FileSystem.getNamed("hdfs:///", conf);
    Path inFile = new Path("hdfs:///user/kzk/infile");
    FSDataInputStream in = fs1.open(inFile);
    FileSystem fs2 = FileSystem.getNamed("s3:///", conf);
    Path outFile = new Path("s3:///user/kzk/outfile");
    FSDataOutputStream out = fs2.create(outFile);
    // Copy in fixed-size blocks until read() reports end of stream
    byte[] buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) > 0) {
      out.write(buffer, 0, bytesRead);
    }
    in.close();
    out.close();

Figure 5.5

5.6.2  LocalFileSystem
The FileSystem for the local disk.
5.6.3  InMemoryFileSystem
A file system kept in memory. Space is reserved with reserveSpace and reserveSpaceWithCheckSum. ReduceTask uses InMemoryFileSystem to hold Key-Value data.
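According to 5.6.1, the FileSystem implementation is chosen from the URI scheme through the "fs.[scheme].impl" configuration key. The sketch below reproduces only that lookup, using java.net.URI and a plain Map in place of Hadoop's Configuration. It returns a class name and does not load any class. The class name and map contents are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Sketch of resolving a FileSystem implementation class name from a URI scheme. */
final class SchemeResolver {
    private final Map<String, String> conf = new HashMap<>(); // stand-in for Configuration

    void set(String key, String value) {
        conf.put(key, value);
    }

    /** Looks up "fs.[scheme].impl"; fails if the URI has no scheme or none is configured. */
    String implFor(String uri) {
        String scheme = URI.create(uri).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("no scheme in " + uri);
        }
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}
```

After `set("fs.hdfs.impl", "org.apache.hadoop.dfs.DistributedFileSystem")`, the URI "hdfs:///user/kzk/infile" resolves to DistributedFileSystem. Because the key is built from the scheme, a new file system (s3, kfs) can be plugged in through configuration without changing the code that opens the paths.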
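5.6.4  FSOutputSummer, FSInputStream
Stream classes used by FileSystem implementations.
5.6.5  Path
Represents a path in a file system.
5.6.6  Trash
The HDFS trash (3.3.5), including the Emptier.
5.6.7  FileUtil
File utilities such as copy.
5.6.8  FsShell
5.6.9  DU, DF
Wrappers around the UNIX du and df commands, used by the DataNode.

5.7    org.apache.hadoop.dfs
5.7.1  ClientProtocol
The RPC protocol between clients and the NameNode.
5.7.2  DatanodeProtocol
The RPC protocol between DataNodes and the NameNode.
5.7.3  NamenodeProtocol
The RPC protocol between the Balancer and the NameNode.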
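5.7.4  DistributedFileSystem
The FileSystem (5.6.1) for the "hdfs" scheme. It delegates its work to DFSClient.

5.7.5  DFSClient
DFSClient is the HDFS client and provides open(), create(), exists(), listPaths(), mkdirs(), and so on. createNamenode creates a ClientProtocol proxy to the NameNode. Reads and writes go through DFSInputStream and DFSOutputStream.

DFSInputStream gets the block locations from the NameNode and reads each block from a DataNode through a BlockReader. The BlockReader is created in DFSInputStream::blockSeekTo and talks to the DataNode over a plain Socket rather than RPC. If reading from one DataNode fails, another DataNode holding the block is tried (DFSInputStream::readBuffer).

DFSOutputStream packs the written data into 64K "Packets" (512K) and sends them over a Socket. Packets are placed on dataQueue. The DataStreamer thread takes packets from dataQueue, sends them to the DataNode, and moves them to ackQueue to wait for the DataNode's ack. The ResponseProcessor thread receives the acks from the DataNodes and removes the acknowledged packets from ackQueue. If a Datanode fails, the packets still in ackQueue are moved back to dataQueue and sent again (DataStreamer::processDatanodeError, DataStreamer::run).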
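The dataQueue/ackQueue handling of DFSOutputStream in 5.7.5 can be illustrated with a single-threaded simulation. The sketch below uses a hypothetical class with no networking and no threads. It models only the queue discipline: packets are numbered by sequence, sent packets wait on ackQueue, acknowledged packets are released, and a DataNode error pushes the unacknowledged packets back to the front of dataQueue in their original order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Single-threaded simulation of the dataQueue / ackQueue discipline (hypothetical names). */
final class PacketPipelineSketch {
    static final int PACKET_SIZE = 64 * 1024;              // 64K packets, as in the text

    final Deque<Integer> dataQueue = new ArrayDeque<>();    // packets waiting to be sent (seqno)
    final Deque<Integer> ackQueue = new ArrayDeque<>();     // packets sent but not yet acked
    final List<Integer> acked = new ArrayList<>();          // packets confirmed by the pipeline

    /** Splits a write of n bytes into packets appended to dataQueue. */
    void write(long n) {
        long packets = (n + PACKET_SIZE - 1) / PACKET_SIZE;
        int next = dataQueue.size() + ackQueue.size() + acked.size();
        for (long i = 0; i < packets; i++) {
            dataQueue.addLast(next++);
        }
    }

    /** DataStreamer role: send the head of dataQueue and keep it on ackQueue. */
    boolean sendOne() {
        Integer p = dataQueue.pollFirst();
        if (p == null) return false;
        ackQueue.addLast(p);
        return true;
    }

    /** ResponseProcessor role: an ack releases the oldest outstanding packet. */
    void ackOne() {
        Integer p = ackQueue.pollFirst();
        if (p != null) acked.add(p);
    }

    /** DataNode error: unacknowledged packets return to the front of dataQueue, in order. */
    void onDatanodeError() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }
}
```

For example, writing 200K produces packets 0 to 3. After three sends and one ack, an error leaves dataQueue as [1, 2, 3], so nothing the DataNodes never confirmed is lost.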
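5.7.6  DataNode
The DataNode stores blocks. After registering with the NameNode, it keeps sending HeartBeats to the NameNode (DataNode::offerService). HeartBeats are RPC calls on DatanodeProtocol. The NameNode answers each HeartBeat with DatanodeCommands, which is how the NameNode gives instructions to the DataNode.

5.7.7  NameNode
The NameNode manages the file system. There is one NameNode per cluster. It implements ClientProtocol and DatanodeProtocol and receives HeartBeats from the DataNodes.

5.7.8  FSNamesystem
The NameNode passes the ClientProtocol requests it receives over RPC to FSNamesystem, which maintains the following tables:
• (1) file name → list of blocks
• (2) the set of all valid blocks (the inverse of (1))
• (3) block → list of machines
• (4) machine → list of blocks (the inverse of (3))
The HDFS directory tree is managed by FSDirectory inside FSNamesystem, as INodes.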
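BlocksMap maps blocks to their INodes.

5.7.9  FSImage, FSEditLog
FSImage stores the namespace image on disk, and FSEditLog records the changes made since that image.

5.7.10 ReplicationTargetChooser
Chooses the DataNodes that receive each replica of a block, taking racks into account (3.6.4).

5.7.11 SecondaryNameNode
The SecondaryNameNode creates checkpoints for the NameNode. When the NameNode's edit log grows beyond "fs.checkpoint.size", it fetches the image and the edit log from the NameNode, merges them, stores the checkpoint in "fs.checkpoint.dir", and returns it to the NameNode. It talks to the NameNode through ClientProtocol.

5.7.12 Balancer
The Balancer moves blocks from heavily used DataNodes to lightly used ones so that HDFS disk usage is even across the DataNodes (3.4.2).

5.7.13 NamenodeFsck
Checks the health of HDFS [3], using the NameNode's information about the blocks on the DataNodes.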
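5.7.10 names ReplicationTargetChooser as the component that picks replica targets with rack awareness. The sketch below assumes the three-replica placement described in later HDFS documentation: the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. It is a simplified illustration with hypothetical names and deterministic choices, not the ReplicationTargetChooser code of 0.16.4.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified rack-aware placement for 3 replicas (hypothetical; nodes are "/rack/host"). */
final class ReplicaPlacementSketch {
    static String rackOf(String node) {
        return node.substring(0, node.lastIndexOf('/'));
    }

    static List<String> chooseTargets(String writer, List<String> cluster) {
        List<String> targets = new ArrayList<>();
        targets.add(writer);                                   // 1st: the writer's own node
        String second = null;
        for (String n : cluster) {                             // 2nd: a node on another rack
            if (!rackOf(n).equals(rackOf(writer))) {
                second = n;
                break;
            }
        }
        if (second == null) {
            return targets;                                    // no second rack: this sketch stops here
        }
        targets.add(second);
        for (String n : cluster) {                             // 3rd: same rack as the 2nd, other node
            if (rackOf(n).equals(rackOf(second)) && !n.equals(second)) {
                targets.add(n);
                break;
            }
        }
        return targets;
    }
}
```

With the cluster /r1/a, /r1/b, /r2/c, /r2/d and the writer /r1/a, the targets are /r1/a, /r2/c, and /r2/d. This layout survives the loss of a whole rack and keeps only one copy crossing between racks during the write.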
5.8    org.apache.hadoop.mapred
5.8.1  JobConf
JobConf holds the configuration of a MapReduce job. It sets:
• the job name (setJobName)
• the Mapper (setMapperClass)
• the Combiner (setCombinerClass)
• the Reducer (setReducerClass)
• the InputFormat (setInputFormat)
• the OutputFormat (setOutputFormat)
• the input path (setInputPath)
• the output path (setOutputPath)
Figure 5.6 shows a typical JobConf setup, and Figure 5.7 shows other settings.

    // Create a new JobConf
    JobConf job = new JobConf(new Configuration(), MyJob.class);
    // Specify various job-specific parameters
    job.setJobName("myjob");
    job.setMapperClass(MyJob.MyMapper.class);
    job.setCombinerClass(MyJob.MyReducer.class);
    job.setReducerClass(MyJob.MyReducer.class);
    job.setInputFormat(SequenceFileInputFormat.class);
    job.setOutputFormat(SequenceFileOutputFormat.class);
    job.setInputPath(new Path("in"));
    job.setOutputPath(new Path("out"));

Figure 5.6: JobConf

    // Number of Map tasks
    conf.setNumMapTasks(100);
    // Number of Reduce tasks
    conf.setNumReduceTasks(40);
    // Script run when a Map task fails
    conf.setMapDebugScript("/home/kzk/debug/map-fail.sh");
    // Script run when a Reduce task fails
    conf.setReduceDebugScript("/home/kzk/debug/reduce-fail.sh");
    // Compress the Map output
    conf.setCompressMapOutput(true);
    // Compress the job output
    conf.setBoolean("mapred.output.compress", true);
    // Run MapReduce locally
    conf.set("mapred.job.tracker", "local");
    conf.set("fs.default.name", "local");

Figure 5.7: JobConf

5.8.2  InputFormat
InputFormat describes the input of a MapReduce job. It is responsible for:
• validating the input (validateInput)
• splitting the input into InputSplits, one per Mapper (getSplits)
• creating a RecordReader for an InputSplit (getRecordReader)
getSplits returns InputSplits, such as FileSplit for files. getRecordReader returns a RecordReader that turns an InputSplit into Key-Value records, which are read one at a time with RecordReader::next. The standard InputFormat is TextInputFormat, whose getRecordReader returns a LineRecordReader. KeyValueTextInputFormat is an InputFormat for text files whose lines are Key-Value pairs, and its getRecordReader returns a KeyValueLineRecordReader.

5.8.3  OutputFormat
OutputFormat describes the output of a MapReduce job. It is responsible for:
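• checking the output specification (checkOutputSpecs)
• creating a RecordWriter (getRecordWriter)
The standard OutputFormat is TextOutputFormat, which writes each record as "key\tvalue" (key, tab, value). Output compression is enabled with OutputFormatBase::setCompressOutput.

5.8.4  JobClient
JobClient submits Jobs to the JobTracker. JobClient.runJob submits a Job and waits for it to finish.

5.8.5  JobTracker
The JobTracker manages Jobs and assigns their Tasks to TaskTrackers. JobClient submits a Job to the JobTracker with the submitJob RPC, and the Job is added to jobInitQueue. JobInitThread then calls JobInProgress::initTasks, which creates the Job's Tasks from its InputSplits. Each TaskTracker calls heartbeat periodically, and the JobTracker answers with TaskTrackerActions: LaunchTaskAction, KillJobAction, KillTaskAction, or ReinitTrackerAction. A Task is handed to a TaskTracker through a LaunchTaskAction. The Task is chosen by getNewTaskForTaskTracker, which calls JobInProgress's obtainNewMapTask or obtainNewReduceTask, and these in turn call findNewTask. TaskInProgress::hasSpeculativeTask decides whether to launch a SpeculativeTask, based on:
• Task
• SpeculativeTask
• SPECULATIVE_GAP (20%), how far the Task's progress lags behind the average
• SPECULATIVE_LAG (60 seconds), how long the Task has been running
• Task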
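5.8.5 lists SPECULATIVE_GAP and SPECULATIVE_LAG among the conditions for a SpeculativeTask. The predicate below sketches only three conditions: a progress gap of at least 0.2 (20%) below the average, at least 60 seconds of running time, and no attempt finished yet. The method and parameter names are hypothetical. TaskInProgress::hasSpeculativeTask also checks conditions that are not modeled here.

```java
/** Sketch of the speculative-execution trigger (hypothetical names; thresholds from the text). */
final class SpeculationSketch {
    static final double SPECULATIVE_GAP = 0.2;        // at least 20% behind the average progress
    static final long SPECULATIVE_LAG_MS = 60_000;    // running for at least 60 seconds

    /**
     * @param progress        this task's progress in [0, 1]
     * @param averageProgress average progress of the job's tasks of the same kind
     * @param runningMillis   how long this task has been running
     * @param completed       whether an attempt of this task has already succeeded
     */
    static boolean shouldSpeculate(double progress, double averageProgress,
                                   long runningMillis, boolean completed) {
        return !completed
            && averageProgress - progress >= SPECULATIVE_GAP
            && runningMillis >= SPECULATIVE_LAG_MS;
    }
}
```

The time lag keeps a task that merely started late from being duplicated immediately. The progress gap limits backup attempts to tasks that are clearly behind their peers.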
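5.8.6  TaskTracker
The TaskTracker runs Tasks. In offerService it sends HeartBeats to the JobTracker, and when it receives a LaunchTaskAction it starts the Task (startNewTask). startNewTask calls localizeJob, which copies the job's jar from HDFS, and then launchTaskForJob (TaskInProgress::launchTask). launchTask calls localizeTask and then createRunner to create a TaskRunner. The TaskRunner starts a new java process for the Task.

For a MapTask, that process calls MapTask's run. run creates the collector for the Map output and calls MapRunner::run, which reads records through the RecordReader and calls map with the collector. If there are no Reduce tasks, the collector is a DirectMapOutputCollector; otherwise it is a MapOutputBuffer. MapOutputBuffer::collect receives the output of map and adds it, per ReduceTask partition, to a MergeSorter (MergeSorter::addKeyValue). When the buffered data (bufferWriter) exceeds maxBufferSize, sortAndSpillToDisk is called. It sorts each MergeSorter (pendingSortImpl[i].sort()), runs the Combiner (combine) if one is set, and writes the result to disk with a RecordWriter (spill), calling startPartition and endPartition around each partition. When run finishes, it calls collector::flush, and mergeParts uses SequenceFile::Sorter to merge the spilled data of each Partition into one file.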
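For a ReduceTask, the child process calls ReduceTask's run. ReduceCopier's fetchOutputs copies the Map outputs destined for this Reducer from the Map side and merges them into one. run then calls reduce once per key, iterating over its values with ReduceValuesIterator. reduce receives a collector whose collect writes through the RecordWriter.

Mapper's map and Reducer's reduce are the user-defined functions called from these loops.

5.8.7  StatusHttpServer
StatusHttpServer provides the HTTP status pages of the JobTracker and the TaskTracker (4.4.1). It is built on Jetty [6].

5.9
Map, Reduce, UNIX.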
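6      Performance evaluation
This chapter measures the performance of HDFS and Hadoop MapReduce.

6.1    Environment
12 machines each run a DataNode and a TaskTracker, and 1 machine runs the NameNode and the JobTracker. Table 6.1 lists the machine specifications. The machines are connected by 100MBps Ethernet.

Table 6.1
CPU            Intel Xeon E5430 2.66 GHz Quad Core
Memory         16G
Disk           SAS
OS             Linux 2.6.18-53.1.14.el5PAE
NIC            Broadcom NetXtreme II BCM5708 Gigabit Ethernet
I/O Scheduler  CFQ (Completely Fair Queuing)

6.1.1  Disk performance
The read/write performance of the local disk was measured with bonnie++ [1] (Figure 6.1). Sequential block writes ran at 80.2MB/sec, sequential block reads at 94.2MB/sec, and random seeks at 347.7/sec.

6.2    HDFS
HDFS performance was measured with TestDFSIO from hadoop-0.16.4-test.jar, which runs as a Hadoop MapReduce job.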
    $ tar vzxf bonnie++-1.03c.tar.gz
    $ cd bonnie++-1.03c
    $ ./configure
    $ make
    $ ./bonnie++
    Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                        -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    tmr001          32G 44896  66 80263  14 39105   7 66683  94 94257  11 347.7   0
                        ------Sequential Create------ --------Random Create--------
                        -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                     16 99920  98 585100 96 116893 100 101574 99 917100 100 121861 100

Figure 6.1: bonnie++

6.2.1  Write
100 files of 1G each are written with the command in Figure 6.2. Replicas are placed as described in 5.7.10.

    $ ./bin/hadoop jar hadoop-0.16.4-test.jar TestDFSIO -write -nrFiles 100 -fileSize 1000

Figure 6.2: 1G * 100

[Figure 6.3: 1G * 100 (MB/sec); y-axis: MegaBytes/Sec, x-axis: Machines (5 to 12)]
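6.2.2  Read
The 100G written in 6.2.1 are read back with the command in Figure 6.4.

    $ ./bin/hadoop jar hadoop-0.16.4-test.jar TestDFSIO -read -nrFiles 100 -fileSize 1000

Figure 6.4: 1G * 100

[Figure 6.5: 1G * 100 (MB/sec); y-axis: MegaBytes/Sec, x-axis: Machines (5 to 12)]

6.3    MapReduce
MapReduce performance was measured on 100G of data.

6.3.1  Random write
Hadoop's randomwriter example generates 100G of random data with the settings in Figure 6.6: keys and values are 10 to 1000 bytes long, 100G are written in total, and each Map writes 1G. At about 1KB per Key-Value record, this is about 100M records. Figure 6.7 shows the command.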
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>test.randomwrite.min_key</name>
        <value>10</value>
      </property>
      <property>
        <name>test.randomwrite.max_key</name>
        <value>1000</value>
      </property>
      <property>
        <name>test.randomwrite.min_value</name>
        <value>10</value>
      </property>
      <property>
        <name>test.randomwrite.max_value</name>
        <value>1000</value>
      </property>
      <property>
        <name>test.randomwriter.bytes_per_map</name>
        <value>1000000000</value>
      </property>
      <property>
        <name>test.randomwrite.total_bytes</name>
        <value>100000000000</value>
      </property>
    </configuration>

Figure 6.6: 100G (randomwriter.conf)

    $ ./bin/hadoop jar hadoop-0.16.4-examples.jar randomwriter -conf randomwriter.conf random

Figure 6.7: 100G

Figure 6.8 shows the time taken to write the 100G, and Figure 6.9 shows the throughput.
[Figure 6.8: 100G; y-axis: Sec, x-axis: Machines (3 to 12)]

[Figure 6.9: 100G; y-axis: MegaBytes/Sec, x-axis: Machines (3 to 12)]
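The setup in 6.3.1 (100G in total, 1G per Map, keys and values of 10 to 1000 bytes) implies about 100 Map tasks and roughly 100M records of about 1KB. The arithmetic below checks that estimate. It assumes that key and value lengths are uniformly distributed, so each averages (10 + 1000) / 2 = 505 bytes, and it ignores SequenceFile framing overhead. The class name is hypothetical.

```java
/** Back-of-the-envelope check of the randomwriter job size in 6.3.1 (assumptions in comments). */
final class RandomWriterEstimate {
    static final long TOTAL_BYTES = 100_000_000_000L;   // 100G written in total
    static final long BYTES_PER_MAP = 1_000_000_000L;   // 1G written by each Map
    static final int MIN_LEN = 10, MAX_LEN = 1000;      // min/max key and value length in bytes

    static long numMaps() {
        return TOTAL_BYTES / BYTES_PER_MAP;
    }

    /** Assumes uniformly distributed key and value lengths; ignores framing overhead. */
    static double avgRecordBytes() {
        double avgLen = (MIN_LEN + MAX_LEN) / 2.0;
        return 2 * avgLen;                               // key + value
    }

    static long approxRecords() {
        return Math.round(TOTAL_BYTES / avgRecordBytes());
    }
}
```

This gives 100 Maps, an average record of 1010 bytes, and about 99M records. That is consistent with the "about 1KB per record, about 100M records" estimate in 6.3.1.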
6.3.2  Sort
The random data generated in 6.3.1 is sorted with the sort example (Figure 6.10). Figure 6.11 shows the time taken, and Figure 6.12 shows the throughput.

    $ ./bin/hadoop jar hadoop-0.16.4-examples.jar sort random random-sort

Figure 6.10

[Figure 6.11: 100G; y-axis: Sec, x-axis: Machines (3 to 12)]

[Figure 6.12: 100G; y-axis: MegaBytes/Sec, x-axis: Machines (3 to 12)]
6.4
This section summarizes the Hadoop measurements on clusters of up to 12 machines.
7
This document examined Hadoop. It compared Hadoop with GFS and Google MapReduce, walked through the Hadoop source code, and measured Hadoop's performance on up to 12 machines.
References
[1] Bonnie++ project homepage. http://www.coker.com.au/bonnie++/.
[2] Commons Logging. http://commons.apache.org/logging/.
[3] Hadoop DFS user guide. http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html.
[4] Hadoop project homepage. http://hadoop.apache.org/core/.
[5] Hadoop Streaming documentation. http://hadoop.apache.org/core/docs/current/streaming.html.
[6] Jetty. http://www.mortbay.org/jetty-6/.
[7] Kosmos filesystem. http://kosmosfs.sourceforge.net/.
[8] Lucene project homepage. http://lucene.apache.org/.
[9] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[10] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. SIGOPS Oper. Syst. Rev., 37(5):29–43, 2003.