SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
Hadoop
 lessons  learned
@tcurdt
 github.com/tcurdt
yourdailygeekery.com
Data
Hadoop - Lessons Learned
hiring
Agenda

· hadoop?  really?  cloud?
· integration
· mapreduce
· operations
· community  and  outlook
Why  Hadoop?
“It is a new and improved
version of enterprise tape
           drive”
20  machines
                                       Map  Reduce
20  files,  1.5  GB  each



                                hadoop job grep.jar


                                grep “needle” file


                                               ir
0   17.5   35.0   52.5   70.0



                                           f a
                                      u  n
Hadoop - Lessons Learned
Hadoop - Lessons Learned
Run  your  own?




http://bit.ly/elastic-mr-pig
Integration
black  box
Engineers

·   hadoop-cat

·   hadoop-grep

·   hadoop-range
     --prefix /logs
     --from 2012-05-15 --until 2012-05-22
     --postfix /*play*.seq | xargs hadoop jar

·   streaming  jobs
Non-Engineering  Folks



·   mount  hdfs

·   pig  /  hive

·   data  dumps
Map  Reduce
                                  HDFS files


                                 InputFormat


  Split                 Split                     Split                 Split


  Map                   Map                       Map                   Map


Combiner              Combiner                  Combiner              Combiner


  Sort                  Sort                      Sort                  Sort


                                  Partitioner


                                Copy and Merge


           Combiner                                        Combiner


           Reducer                                         Reducer


                                 OutputFormat
Job  Counters

12/05/25   01:27:38 INFO mapred.JobClient:        Reduce input records=106
..
12/05/25   01:27:38   INFO   mapred.JobClient:    Combine output records=409
12/05/25   01:27:38   INFO   mapred.JobClient:    Map input records=112705844
12/05/25   01:27:38   INFO   mapred.JobClient:    Reduce output records=4
12/05/25   01:27:38   INFO   mapred.JobClient:    Combine input records=64842079
..
12/05/25   01:27:38 INFO mapred.JobClient:        Map output records=64841776

map in                :   112705844   *********************************
map out               :    64841776   *****************
combine in            :    64842079   *****************
combine out           :         409   |
reduce in             :         106   |
reduce out            :           4   |


                                                 MAPREDUCE-346  (since  2009)
Job  Counters




map in        :   20000   **************
map out       :   40000   ******************************
combine in    :   40000   ******************************
combine out   :   10001   ********
reduce in     :   10001   ********
reduce out    :   10001   ********
Map-only




mapred.reduce.tasks = 0
EOF  on  append
public class EofSafeSequenceFileInputFormat<K,V>
  extends SequenceFileInputFormat<K,V> {
  ...
}

public class EofSafeRecordReader<K,V>
  extends RecordReader<K,V> {
  ...
  public boolean nextKeyValue()
    throws IOException, InterruptedException {
    try {
      return this.delegate.nextKeyValue();
    } catch(EOFException e) {
      return false;
    }
  }
  ...
}
Serialization


before

 ASN1, custom java serialization, Thrift


now


 protobuf
Custom  Writables
public static class Play extends CustomWritable {

    public final LongWritable time
      = new LongWritable();

    public final LongWritable owner_id
      = new LongWritable();

    public final LongWritable track_id
      = new LongWritable();

    public Play() {
      fields = new WritableComparable[] {
        owner_id, track_id, time };
    }
}
Fear  the  State




BytesWritable bytes = new BytesWritable();
...
byte[] buffer = bytes.getBytes();
Re-Iterate

public void reduce(
  LongTriple key,
  Iterable<LongWritable> values,
  Context ctx) {

    for(LongWritable v : values) { }
    for(LongWritable v : values) { }
}

public void reduce(
  LongTriple key,
  Iterable<LongWritable> values,
  Context ctx) {
    buffer.clear();
    for(LongWritable v : values) { buffer.add(v); }
    for(LongWritable v : buffer.values()) { }
}
                             HADOOP-5266  (applied  to  0.21.0)
BitSets



long min = 1;
long max = 10000000;

FastBitSet set = new FastBitSet(min, max);

for(long i = min; i<max; i++) {
    set.set(i);
}



                         org.apache.lucene.util.*BitSet
Data  Structures




http://bit.ly/data-structures
http://bit.ly/bloom-filters
http://bit.ly/stream-lib
General  Tips

·   test  on  small  datasets,  test  on  your  machine

·   many  reducers

·   always  consider  a  combiner  and  partitioner

·   pig  /  streaming  for  one-time  jobs,
    java/scala  for  recurring



     http://bit.ly/map-reduce-book
Operations

use  chef  /  puppet




runit  /  init.d



pdsh  /  dsh

    pdsh -w "hdd[001-019]" 
    "sudo sv restart /etc/sv/hadoop-tasktracker"
Hardware



·   2x  name  nodes  raid  1

·   12  cores,  48GB  RAM,  xfs,  2x1TB

·   n  x  data  nodes  no  raid

·   12  cores,  16GB  RAM,  xfs,  4x2TB
Monitoring

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=...

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=...

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=...

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=...

# ignore
ugi.class=org.apache.hadoop.metrics.spi.NullContext
Monitoring




total  capacity   capacity  used
Compression

#  of  64MB  blocks
#  of  bytes  needed
#  of  bytes  used
#  bytes  reclaimed



    bzip2  /  gzip  /  lzo  /     snappy

                        io.seqfile.compression.type = BLOCK
                        io.seqfile.compression.blocksize = 512000
Janitor


hadoop-expire
 -url namenode.here
 -path /tmp
 -mtime 7d
 -delete
The last block of an HDFS block only
occupies the required space. So a 4k
file only consumes 4k on disk.
-- Owen

                             E D
                        S T
                  B U
Logfiles

find 
 -wholename "/var/log/hadoop/hadoop-*" 
 -wholename "/var/log/hadoop/job_*.xml" 
 -wholename "/var/log/hadoop/history/*" 
 -wholename "/var/log/hadoop/history/.*.crc" 
 -wholename "/var/log/hadoop/history/done/*" 
 -wholename "/var/log/hadoop/history/done/.*.crc" 
 -wholename "/var/log/hadoop/userlogs/attempt_*" 
 -mtime +7 
 -daystart 
 -delete
Limits

sysctl.conf


     fs.file-max = 128000


limits.conf


    hdfs hard nofile 128000
    hdfs soft nofile 64000
    mapred hard nofile 128000
    mapred soft nofile 64000
Localhost

before


  127.0.0.1   localhost localhost.localdomain
  127.0.1.1   hdd01.some.net hdd01



hadoop

 127.0.0.1    localhost localhost.localdomain
 127.0.1.1    hdd01
Rackaware
site  config

    <property>
      <name>topology.script.file.name</name>
      <value>/path/to/script/location-from-ip</value>
      <final>true</final>
    </property>




topology  script

    #!/usr/bin/ruby
    location = {
      'hdd001.some.net'   =>   '/ams/1',
      '10.20.2.1'         =>   '/ams/1',
      'hdd002.some.net'   =>   '/ams/2',
      '10.20.2.2'         =>   '/ams/2',
    }

    puts ARGV.map { |ip| location[ARGV.first] || '/default-rack' }.join(' ')
Fix  the  Policy



for f in `hdfs hadoop fsck / | grep "Replica
placement policy is violated" | awk -F: '{print $1}'
| sort | uniq | head -n1000`; do
  hadoop fs -setrep -w 4 $f
  hadoop fs -setrep 3 $f
done
Fsck



hadoop fsck / -openforwrite -files | grep -i
"OPENFORWRITE: MISSING 1 blocks of total size" | awk
'{print $1}' | xargs -L 1 -i hadoop dfs -mv {} /lost
+notfound
Community



hadoop




              *  from  markmail.org
Community



     The  Enterprise  Effect


“The  Community  Effect”  (in  2011)
Hadoop - Lessons Learned
Hadoop - Lessons Learned
Community


core




        mapreduce


             *  from  markmail.org
The  Future


incremental
                real  time

                             refined  API
 flexible  pipelines


         refined  implementation
Real  Time  Datamining  and  Aggregation  at  Scale  (Ted  Dunning)


        Eventually  Consistent  Data  Structures  (Sean  Cribbs)


            Real-time  Analytics  with  HBase  (Alex  Baranau)


Profiling  and  performance-tuning  your  Hadoop  pipelines  (Aaron  Beppu)


         From  Batch  to  Realtime  with  Hadoop  (Lars  George)


         Event-Stream  Processing  with  Kafka  (Tim  Lossen)


    Real-/Neartime  analysis  with  Hadoop  &  VoltDB  (Ralf  Neeb)
Take  Aways



· use  hadoop  only  if  you  must
· really  understand  the  pipeline
· unbox  the  black  box
That’s  it  
    folks!
      @tcurdt
 github.com/tcurdt
yourdailygeekery.com

Weitere ähnliche Inhalte

Was ist angesagt?

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationDataWorks Summit
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Improving HDFS Availability with Hadoop RPC Quality of Service
Improving HDFS Availability with Hadoop RPC Quality of ServiceImproving HDFS Availability with Hadoop RPC Quality of Service
Improving HDFS Availability with Hadoop RPC Quality of ServiceMing Ma
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopDataWorks Summit
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayDataWorks Summit
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Chris Nauroth
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Postgres in Amazon RDS
Postgres in Amazon RDSPostgres in Amazon RDS
Postgres in Amazon RDSDenish Patel
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementDataWorks Summit/Hadoop Summit
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Chris Nauroth
 
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated HadoopHadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated HadoopYafang Chang
 

Was ist angesagt? (20)

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Improving HDFS Availability with Hadoop RPC Quality of Service
Improving HDFS Availability with Hadoop RPC Quality of ServiceImproving HDFS Availability with Hadoop RPC Quality of Service
Improving HDFS Availability with Hadoop RPC Quality of Service
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native way
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Postgres in Amazon RDS
Postgres in Amazon RDSPostgres in Amazon RDS
Postgres in Amazon RDS
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
Hadoop 24/7
Hadoop 24/7Hadoop 24/7
Hadoop 24/7
 
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated HadoopHadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
 

Andere mochten auch

Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 

Andere mochten auch (11)

Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Hadoop
HadoopHadoop
Hadoop
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 

Ähnlich wie Hadoop - Lessons Learned

ちょっとHadoopについて語ってみるか(仮題)
ちょっとHadoopについて語ってみるか(仮題)ちょっとHadoopについて語ってみるか(仮題)
ちょっとHadoopについて語ってみるか(仮題)moai kids
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataDataWorks Summit
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍Tae Young Lee
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msJodok Batlogg
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache SparkIndicThreads
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationSchubert Zhang
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Toshihiro Suzuki
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosHeiko Loewe
 
Puppet at Opera Sofware - PuppetCamp Oslo 2013
Puppet at Opera Sofware - PuppetCamp Oslo 2013Puppet at Opera Sofware - PuppetCamp Oslo 2013
Puppet at Opera Sofware - PuppetCamp Oslo 2013Cosimo Streppone
 
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)Jyotirmoy Sundi
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterDataWorks Summit
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterAndrey Kudryavtsev
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Alex Levenson
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014soujavajug
 

Ähnlich wie Hadoop - Lessons Learned (20)

ちょっとHadoopについて語ってみるか(仮題)
ちょっとHadoopについて語ってみるか(仮題)ちょっとHadoopについて語ってみるか(仮題)
ちょっとHadoopについて語ってみるか(仮題)
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big Data
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900ms
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance Evaluation
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
Hadoop
HadoopHadoop
Hadoop
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 
Puppet at Opera Sofware - PuppetCamp Oslo 2013
Puppet at Opera Sofware - PuppetCamp Oslo 2013Puppet at Opera Sofware - PuppetCamp Oslo 2013
Puppet at Opera Sofware - PuppetCamp Oslo 2013
 
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
 

Kürzlich hochgeladen

Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 

Kürzlich hochgeladen (20)

Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 

Hadoop - Lessons Learned