Apache Hadoop & the Cloud


Jim Walker
Dir. Product Marketing, Hortonworks
Twitter @jaymce

July 10, 2012




© Hortonworks Inc. 2012
1941


               2012
Big data market segments

Hardware
•  Storage
•  Servers
•  Networking

Software Distributions
•  OSS Apache Hadoop
•  Enterprise Distributions
•  Non-Hadoop big data frameworks

ETL & Mgmnt
•  Distributed file stores
•  NoSQL databases
•  Data integration
•  Data quality & governance

Analytics
•  Analytic application development platforms
•  Advanced analytics applications

Applications
•  Data visualization tools
•  Business intelligence applications

Services
•  Consulting
•  Training
•  Tech support
•  Software maintenance
•  Hardware maintenance
•  Hosting




         Next Generation Data Warehouse

•  MPP columnar data warehouse appliances
•  In-memory analytics engines
•  Fast data loading




Big data market segments

Hardware
•  Storage
•  Servers
•  Networking

Software Distributions (cloud)
•  OSS Apache Hadoop
•  Enterprise Distributions
•  Non-Hadoop big data frameworks

ETL & Mgmnt (cloud)
•  Distributed file stores
•  NoSQL databases
•  Data integration
•  Data quality & governance

Analytics (cloud)
•  Analytic application development platforms
•  Advanced analytics applications

Applications (cloud)
•  Data visualization tools
•  Business intelligence applications

Services
•  Consulting
•  Training
•  Tech support
•  Software maintenance
•  Hardware maintenance
•  Hosting


         Next Generation Data Warehouse

•  MPP columnar data warehouse appliances
•  In-memory analytics engines
•  Fast data loading




Analytics started with basic purchase history…




 Megabytes
                ERP
                 Purchase detail
                 Purchase record
                 Payment record



                                       Increasing Data Variety and Complexity

                                                                  Source: Created in conjunction with Teradata, Inc.


then we added customer information…




 Gigabytes      CRM
                 Segmentation
                 Customer Touches
                 Support Contacts
                 Offer details

 Megabytes      ERP
                 Purchase detail
                 Purchase record
                 Payment record



                                       Increasing Data Variety and Complexity

                                                                  Source: Created in conjunction with Teradata, Inc.


and the web started to impact…




 Terabytes      WEB
                 Web logs
                 A/B testing
                 Behavioral Targeting
                 Dynamic Pricing
                 Search Marketing
                 Affiliate Networks
                 Dynamic Funnels
                 Offer history

 Gigabytes      CRM
                 Segmentation
                 Customer Touches
                 Support Contacts
                 Offer details

 Megabytes      ERP
                 Purchase detail
                 Purchase record
                 Payment record



                                       Increasing Data Variety and Complexity

                                                                  Source: Created in conjunction with Teradata, Inc.


Big data changes the game

Transactions + Interactions + Observations = BIG DATA

 Petabytes      BIG DATA
                 Mobile Web
                 Sentiment
                 User Click Stream
                 SMS/MMS
                 Speech to Text
                 Social Interactions & Feeds
                 Spatial & GPS Coordinates
                 Sensors / RFID / Devices
                 Business Data Feeds
                 External Demographics
                 User Generated Content
                 HD Video, Audio, Images
                 Product/Service Logs

 Terabytes      WEB
                 Web logs
                 A/B testing
                 Behavioral Targeting
                 Dynamic Pricing
                 Search Marketing
                 Affiliate Networks
                 Dynamic Funnels
                 Offer history

 Gigabytes      CRM
                 Segmentation
                 Customer Touches
                 Support Contacts
                 Offer details

 Megabytes      ERP
                 Purchase detail
                 Purchase record
                 Payment record



                                       Increasing Data Variety and Complexity

                                                                  Source: Created in conjunction with Teradata, Inc.


Next-gen data architecture drivers


Business Drivers
•  Enable new business models & drive faster growth (20%+)
•  Find insights for competitive advantage & optimal returns

Technical Drivers
•  Data continues to grow exponentially
•  Data is increasingly everywhere and in many formats
•  Legacy solutions unfit for new requirements growth

 cloud

Financial Drivers
•  Cost of data systems, as % of IT spend, continues to grow
•  Cost advantages of commodity hardware & open source




Apache Hadoop
                          Open Source Data Management Software



                          One of the best examples of open source
                          driving innovation and creating a market
                           •  Foundation for big data solutions
                           •  Enables a rational economics model
                           •  Powers data-driven business
                           •  Commodity hardware
                           •  Loosely coupled, ship early/ship often
                           •  Consists of many specialized sub-projects

Apache Hadoop & Cloud Makes Sense

                             •  Broader access to Hadoop for end users, IT
                                professionals, and developers
                             •  Easy installation and configuration and
                                simplified programming
                             •  Enterprise-ready distribution with greater
                                security, performance, ease of management
                                and options for Hybrid IT usage
                             •  Integrate with everything via RESTful API
                             •  Spin up a cluster on demand
                             •  Ease management




5 Reasons for Hadoop in the Cloud


                                              People say "should
                                              you run Hadoop in
                                              the cloud?"


                                              I say "it depends".




 http://steveloughran.blogspot.com/2012/03/hadoop-in-cloud-infrastructures.html

5 Reasons for Hadoop in the Cloud


                             1        If your data is stored in a cloud, local analysis
                                      may make more sense… "work near the data"


                             2        For periodic processing (nightly, etc…)
                                      it might make sense to just rent.


                             3        No upfront capital expense,
                                      fund from success


                             4        Easier to expand a cluster;
                                      no need to buy just find


                             5        Eliminate networking concerns

                             http://steveloughran.blogspot.com/2012/03/hadoop-in-cloud-infrastructures.html
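
The rent-versus-buy reasoning in reasons 2 and 3 can be sketched as simple arithmetic. The Python below is only an illustration: the function names and every price, node count, and lifetime in it are hypothetical placeholders, not figures from the talk or from any provider.

```python
def monthly_rent_cost(nodes, hours_per_day, price_per_node_hour, days=30):
    """Cost of renting a cluster only for the hours a job actually runs."""
    return nodes * hours_per_day * days * price_per_node_hour


def monthly_own_cost(nodes, purchase_price_per_node, lifetime_months,
                     monthly_opex_per_node):
    """Amortized cost of owned nodes, paid whether or not they are busy."""
    amortized = purchase_price_per_node / lifetime_months
    return nodes * (amortized + monthly_opex_per_node)


if __name__ == "__main__":
    # Hypothetical nightly job: 20 nodes for 3 hours a day.
    rent = monthly_rent_cost(nodes=20, hours_per_day=3,
                             price_per_node_hour=0.5)
    own = monthly_own_cost(nodes=20, purchase_price_per_node=3600,
                           lifetime_months=36, monthly_opex_per_node=50)
    print(f"rent: {rent:.2f} per month, own: {own:.2f} per month")
```

With these made-up inputs renting comes out cheaper because the cluster is idle most of the day; a cluster that is busy around the clock would shift the comparison the other way.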

What is Apache Hadoop?

1 PROCESSING – Map/Reduce
                              •    Splits a task across processors “near”
                                   the data & assembles results
                              •    2004 white paper
                                   MapReduce: Simplified Data Processing on Large Clusters

                              •    Base of much new tech
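
As a rough illustration of the split-then-assemble idea above, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are assumptions chosen for illustration, and a real cluster would run the map step on the nodes that hold each split.

```python
from collections import defaultdict


def map_phase(split):
    """Emit (word, 1) pairs for one input split (hypothetical helper)."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Assemble the final result by summing the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # On a real cluster each split is processed "near" its data;
    # here the splits are simply processed one after another.
    pairs = []
    for split in splits:
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data big", "data refinery"]))
```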




2 STORAGE – Hadoop Distributed File System
                              •    Distributed across “nodes”
                              •    Natively redundant
                              •    Name node tracks locations
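
The storage bullets above can be pictured with a toy model: a file is cut into blocks, each block is copied to several nodes, and a name-node-style table records where the copies live. This Python sketch is an assumption-laden illustration, not HDFS code: the round-robin placement, the tiny block size, and both function names are hypothetical (HDFS's real placement policy is rack-aware), though a replication factor of 3 matches the HDFS default.

```python
from itertools import cycle


def place_blocks(file_size, block_size, nodes, replication=3):
    """Return a toy 'name node' table: block index -> list of nodes.

    Placement is plain round-robin, an assumption for illustration;
    it is not HDFS's actual rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = -(-file_size // block_size)  # ceiling division
    ring = cycle(nodes)
    table = {}
    for block in range(n_blocks):
        table[block] = [next(ring) for _ in range(replication)]
    return table


def readable_after_failure(table, failed_node):
    """A file stays readable if every block has a replica elsewhere."""
    return all(any(node != failed_node for node in replicas)
               for replicas in table.values())


if __name__ == "__main__":
    table = place_blocks(250, 100, ["n1", "n2", "n3", "n4"])
    print(table)
    print(readable_after_failure(table, "n1"))
```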



Apache Hadoop related projects

3    Hive
4    HBase
5    HCatalog
6    Pig
7    Oozie
8    Ambari
9    Sqoop
10   Zookeeper

Apache Hive is a data warehouse infrastructure built on top of
Hadoop (originally by Facebook) for providing data summarization,
ad-hoc query, and analysis of large datasets. It provides a
mechanism to project structure onto this data and query the data
using a SQL-like language called HiveQL (HQL).
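
To make "project structure onto this data" concrete, here is a small Python sketch of the schema-on-read idea: the raw delimited text stays untouched, and a schema is applied only when a query runs. It does not use Hive or HiveQL; the column names, the sample rows, and both helper functions are hypothetical.

```python
import csv
import io

# Hypothetical raw data, as it might sit in a file with no declared schema.
RAW = "alice,books,12.50\nbob,music,7.00\nalice,music,3.25\n"

# Hypothetical schema: column name and the type each field is cast to.
SCHEMA = [("customer", str), ("category", str), ("amount", float)]


def project(raw_text, schema):
    """Apply the schema to raw text at read time, yielding dict rows."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value)
               for (name, cast), value in zip(schema, fields)}


def total_by(rows, key, measure):
    """Roughly what a SQL-style SUM(measure) ... GROUP BY key computes."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0.0) + row[measure]
    return totals


if __name__ == "__main__":
    print(total_by(project(RAW, SCHEMA), "customer", "amount"))
```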

Apache Hadoop related projects

4    HBase

HBase is a non-relational database. It is column-oriented and
provides fault-tolerant storage and quick access to large
quantities of sparse data. It also adds transactional
capabilities to Hadoop, allowing users to conduct updates,
inserts and deletes.

Apache Hadoop related projects

5    HCatalog

HCatalog is a metadata management service for Apache Hadoop. It
opens up the platform and allows interoperability across data
processing tools such as Pig, MapReduce and Hive. It also
provides a table abstraction so that users need not be concerned
with where or how their data is stored.

Aster SQL-H interfaces with HCatalog

Apache Hadoop related projects

6    Pig

Apache Pig allows you to write complex MapReduce transformations
using a simple scripting language. Pig Latin (the language)
defines a set of transformations on a data set such as
aggregate, join and sort, among others. Pig Latin is sometimes
extended using UDFs (User Defined Functions), which the user can
write in Java and then call directly from the language.
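
The kind of pipeline Pig Latin expresses (join, then aggregate, then sort) can be imitated in a few lines of plain Python. This is only a sketch of the data flow, not Pig Latin and not a Pig UDF; the helper functions, field names, and sample records are all hypothetical.

```python
from collections import defaultdict


def join(left, right, key):
    """Inner join two lists of dicts on a shared key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate(rows, key, measure):
    """Group rows by key and sum one numeric field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return [{key: k, measure: v} for k, v in totals.items()]


def sort_desc(rows, field):
    """Order rows from largest to smallest value of one field."""
    return sorted(rows, key=lambda row: row[field], reverse=True)


# Hypothetical sample data.
orders = [{"user_id": 1, "amount": 30},
          {"user_id": 2, "amount": 45},
          {"user_id": 1, "amount": 25}]
users = [{"user_id": 1, "region": "east"},
         {"user_id": 2, "region": "west"}]

result = sort_desc(
    aggregate(join(orders, users, "user_id"), "region", "amount"),
    "amount")

if __name__ == "__main__":
    print(result)
```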

Apache Hadoop related projects

7    Oozie

Oozie coordinates jobs written in multiple languages such as
MapReduce, Pig and Hive. It is a workflow system that links
these jobs and allows specification of order and dependencies
between them.
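
The "order and dependencies" idea can be shown with a dependency graph and a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) rather than Oozie itself, which defines workflows in XML; the job names and the shape of the workflow are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "load_logs": set(),
    "clean_logs_pig": {"load_logs"},
    "load_orders": set(),
    "join_hive": {"clean_logs_pig", "load_orders"},
    "report_mapreduce": {"join_hive"},
}


def run_order(workflow):
    """Return one valid execution order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if jobs depend on
    each other in a loop.
    """
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for job in run_order(WORKFLOW):
        print("run", job)
```

Several orders can satisfy the same dependencies, so the sketch guarantees only that every job appears after the jobs it depends on.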

Apache Hadoop related projects

8    Ambari

Apache Ambari operationalizes Hadoop. It provides a mechanism to
monitor and manage a cluster. It also provisions nodes.

Ambari is a monitoring, administration and lifecycle management
project for Apache Hadoop clusters.

Apache Hadoop related projects

9    Sqoop

Sqoop is a set of tools that allow Hadoop to exchange data with
non-Hadoop data stores such as traditional relational databases
and data warehouses.

Apache Hadoop related projects

10   Zookeeper

ZooKeeper is a centralized service for maintaining configuration
information, naming, providing distributed synchronization, and
providing group services.

Hadoop in Action
Data sources feeding the Big Data Refinery:
   Website Interactions -> Web Logs
   Order Data -> DB
   Customer Data -> DB

  1     Web Log files via WebHDFS APIs
  2     Customer & Order data via Talend & HCatalog for schema
  3     Pre-processes, refines, and joins data via Talend, Pig, &
        HCatalog
  4     Interfaces with HCatalog to analyze website visits by the
        type of end results


Hortonworks Vision & Role

                                We believe that by the end of 2015,
                                more than half the world's data will be
                                processed by Apache Hadoop.



  1       Be diligent stewards of the open source core

  2       Be tireless innovators beyond the core

  3       Provide robust data platform services & open APIs

  4       Enable the ecosystem at each layer of the stack

  5       Make the platform enterprise-ready & easy to use


Balancing Innovation & Stability
Technology adoption life cycle (customers, relative %, over time)

  Innovators, technology enthusiasts
  Early adopters, visionaries
      The CHASM
  Early majority, pragmatists
  Late majority, conservatives
  Laggards, skeptics

  Before the chasm: customers want technology & performance
  After the chasm: customers want solutions & convenience

  Source: Geoffrey Moore - Crossing the Chasm



Enabling Hadoop as Enterprise Big Data Platform



  Applications,                                                              Installation & Configuration,
  Business Tools,                                                            Administration,
  Development Tools,                                                         Monitoring,
  Open APIs and access                                                       High Availability,
  Data Movement & Integration,                                               Replication,
  Data Management Systems,                                                   Multi-tenancy, ..
  Systems Management
                                             Hortonworks
                                             Data Platform

                                         DEVELOPER
                                  Data Platform Services & Open APIs

                                     Metadata, Indexing, Search, Security,
                                    Management, Data Extract & Load, APIs




Hortonworks Data Platform


                             The ONLY 100% open source data
                             platform for Hadoop

                    •  Tightly aligned with core Apache code line
                    •  All code committed back to open source
                    •  Most complete Apache Hadoop platform
                    •  Comprehensive management and monitoring
                    •  Intuitive graphical data integration tools
                    •  Centralized metadata services for easy data sharing



Hortonworks Data Platform

Hortonworks Data Platform delivers enterprise grade functionality on a
proven Apache Hadoop distribution to ease management, simplify use and
ease integration into the enterprise

•  Simplify deployment to get started quickly and easily
•  Monitor, manage any size cluster with familiar console and tools
•  Only platform to include data integration services to interact
   with any data source
•  Metadata services open the platform for integration with
   existing applications
•  Dependable high availability architecture

The only 100% open source data platform for Apache Hadoop

Apache Distribution Stack

Built on Hadoop 1.0 (a.k.a. 0.20.205)
 •  Proven at large scale enterprise implementations
 •  Most stable and reliable version of Hadoop to date
 •  First Apache line supporting security, HBase, WebHDFS
 •  Driven by core committers and architects at Hortonworks

Includes necessary components already integrated and tested together

Most stable versions of all components are chosen

Hortonworks Distribution

Component    Version
Core         1.0.3
HCatalog     0.4.0
Pig          0.9.2
Hive         0.9.0+
HBase        0.92.1+
Sqoop        0.9.0+
Oozie        3.1.3
Zookeeper    3.3.4
Ambari       beta
Talend       5.1.1

Tested, Hardened & Proven Distribution Reduces Risk

Management & Monitoring Svcs

Hortonworks Management Center
   – View the health of cluster operations,
     server utilization and performance levels
   – Customizable dashboards
   – APIs for integration into 3rd party
     monitoring tools
   – 100% open source management &
     monitoring, powered by Apache Ambari,
     Puppet, Nagios and Ganglia
   – Simple wizard-based installation,
     configuration & provisioning of any size
     Hadoop cluster

Optimize performance for your Hadoop cluster
Simplify Installation and provisioning

Data Integration Services

•  Intuitive graphical data
   integration tools for HDFS,
   Hive, HBase, HCatalog and Pig

•  Oozie scheduling allows you to
   manage and stage jobs

•  Connectors for any database,
   business application or system

•  Integrated HCatalog storage

 Bridge the gap between
 legacy data & Hadoop

 Simplify and speed development

Which is best for the cloud?



                              vs.




Metadata Services
Apache HCatalog provides flexible metadata
services across tools and external access
 •  Consistency of metadata and data models across tools
    (MapReduce, Pig, HBase and Hive)
 •  Accessibility: share data as tables in and out of HDFS
 •  Availability: enables flexible, thin-client access via REST API




   Before HCatalog
   •  Raw Hadoop data
   •  Inconsistent, unknown
   •  Tool specific access

   With HCatalog
   •  Table access
   •  Aligned metadata
   •  REST API

   Shared table and schema management opens the platform



Services Integration

Provides RESTful API as “front door” for Hadoop

•    Opens the door to languages other than Java
•    Thin clients via web services vs. fat-clients in gateway
•    Insulation from interface changes release to release

Diagram layers, top to bottom:
     Existing & New Applications
     WebHDFS, HCatalog RESTful Web Services
     MapReduce, Pig, Hive (over HCatalog)
     HDFS, HBase, External Store

     Opens Hadoop to integration with existing and new applications
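
As a hedged illustration of the REST "front door", the Python sketch below builds WebHDFS request URLs of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OP>`. It only constructs the URL and sends nothing over the network. The helper name `webhdfs_url` and the host name are hypothetical, and port 50070 is assumed as the name node's HTTP port (the Hadoop 1.x default); a real cluster may use a different port and may require authentication parameters.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL (hypothetical helper, no network access).

    path must be an absolute HDFS path; op is a WebHDFS operation name
    such as LISTSTATUS or OPEN; extra keyword arguments become query
    parameters (for example offset=0).
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Any HTTP client, in any language, could issue a GET on these URLs.
    print(webhdfs_url("namenode.example.com", "/logs/web", "LISTSTATUS"))
    print(webhdfs_url("namenode.example.com", "/logs/web/day1.log", "OPEN",
                      offset=0))
```

Because the interface is plain HTTP, a client needs no Hadoop libraries, which is the point behind "languages other than Java" above.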


Use cases: optimize outcomes at scale
                      Media     optimize                 Content
        Intelligence            optimize                 Detection
         Investment             optimize                 Algorithms
        Advertising             optimize                 Performance
                      Fraud     optimize                 Prevention
          Regulation            optimize                 Compliance
 Retail / Wholesale             optimize                 Inventory turns
    Manufacturing               optimize                 Supply chains
          Healthcare            optimize                 Patient outcomes
            Education           optimize                 Learning outcomes
      Government                optimize                 Citizen services
                                      Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.

Connecting Transactions + Interactions + Observations
Data sources
•  Audio, Video, Images
•  Docs, Text, XML
•  Web Logs, Clicks
•  Social, Graph, Feeds
•  Sensors, Devices, RFID
•  Spatial, GPS
•  Events, Other

Diagram labels
•  Big Data Refinery
•  Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, …
•  Data Discovery & Investigative Analytics
•  Business Intelligence & Analytics: Dashboards, Reports, Visualization, …

Numbered flows
1  Classic ETL processing
2  Interactive data exploration
3  Store, aggregate, and transform multi-structured data to unlock value
4  Share refined data & runtime models
5  Retain runtime models and historical data for ongoing refinement & analysis
6  Retain historical data to unlock additional value


5 Reasons for Hadoop in the Cloud


                             1        If your data is stored in a cloud, local analysis
                                      may make more sense… "work near the data"


                             2        For periodic processing (nightly, etc…)
                                      it might make sense to just rent.


                             3        No upfront capital expense,
                                      fund from success


                             4        Easier to expand a cluster;
                                      no need to buy just find


                             5        Eliminate networking concerns

                             http://steveloughran.blogspot.com/2012/03/hadoop-in-cloud-infrastructures.html

THANK YOU

                                          Jim Walker
                                          jim@hortonworks.com
                                          @jaymce




1                                 Get Hortonworks Data Platform
                                  hortonworks.com/download




2   Use the getting started guide
    hortonworks.com/get-started



3   Learn more… get support
     hortonworks.com/training           hortonworks.com/support




vBACD July 2012 - Apache Hadoop, Now and Beyond

  • 1. Apache Hadoop & the Cloud Jim Walker Dir. Product Marketing, Hortonworks Twitter @jaymce July 10, 2012 © Hortonworks Inc. 2012
  • 2. 1941 2012
  • 3. Big data market segments. Hardware: storage, servers, networking. Software distributions: OSS Apache Hadoop, enterprise distributions, non-Hadoop big data frameworks. ETL & Mgmnt: distributed file stores, NoSQL databases, data integration, data quality & governance. Analytics: analytic application development platforms, advanced analytics applications. Applications: data visualization tools, business intelligence applications. Services: consulting, training, tech support, software maintenance, hardware maintenance, hosting. Next Generation Data Warehouse: MPP columnar data warehouse appliances, in-memory analytics engines, fast data loading.
  • 4. Big data market segments, with four "cloud" callouts overlaid on the chart. Hardware: storage, servers, networking. Software distributions: OSS Apache Hadoop, enterprise distributions, non-Hadoop big data frameworks. ETL & Mgmnt: distributed file stores, NoSQL databases, data integration, data quality & governance. Analytics: analytic application development platforms, advanced analytics applications. Applications: data visualization tools, business intelligence applications. Services: consulting, training, tech support, software maintenance, hardware maintenance, hosting. Next Generation Data Warehouse: MPP columnar data warehouse appliances, in-memory analytics engines, fast data loading.
  • 5. Analytics started with basic purchase history… Megabytes (ERP): purchase detail, purchase record, payment record. Axis: increasing data variety and complexity. Source: Created in conjunction with Teradata, Inc.
  • 6. then we added customer information… Gigabytes (CRM): segmentation, customer touches, support contacts, offer details. Megabytes (ERP): purchase detail, purchase record, payment record. Axis: increasing data variety and complexity. Source: Created in conjunction with Teradata, Inc.
  • 7. and the web started to impact… Terabytes (WEB): web logs, A/B testing, behavioral targeting, dynamic pricing, search marketing, affiliate networks, dynamic funnels, offer history. Gigabytes (CRM): segmentation, customer touches, support contacts, offer details. Megabytes (ERP): purchase detail, purchase record, payment record. Axis: increasing data variety and complexity. Source: Created in conjunction with Teradata, Inc.
  • 8. Big data changes the game. Transactions + Interactions + Observations = BIG DATA. Petabytes (BIG DATA): mobile web, sentiment, user click stream, SMS/MMS, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors / RFID / devices, business data feeds, external demographics, user generated content, HD video, audio, images, product/service logs. Terabytes (WEB): web logs, A/B testing, behavioral targeting, dynamic pricing, search marketing, affiliate networks, dynamic funnels, offer history. Gigabytes (CRM): segmentation, customer touches, support contacts, offer details. Megabytes (ERP): purchase detail, purchase record, payment record. Axis: increasing data variety and complexity. Source: Created in conjunction with Teradata, Inc.
  • 9. Next-gen data architecture drivers. Business drivers: enable new business models & drive faster growth (20%+); find insights for competitive advantage & optimal returns. Technical drivers: data continues to grow exponentially; data is increasingly everywhere and in many formats; legacy solutions unfit for new requirements growth. Financial drivers: cost of data systems, as % of IT spend, continues to grow; cost advantages of commodity hardware & open source. Callout: "cloud".
  • 10. Apache Hadoop: Open Source Data Management Software. One of the best examples of open source driving innovation and creating a market. Foundation for big data solutions; enables a rational economics model; powers data-driven business; commodity hardware; loosely coupled, ship early/ship often; consists of many specialized sub-projects.
  • 11. Apache Hadoop & Cloud Makes Sense. Broader access of Hadoop to end users, IT professionals, and developers; easy installation and configuration and simplified programming; enterprise-ready distribution with greater security, performance, ease of management and options for hybrid IT usage; integrate with everything via RESTful API; spin up a cluster on demand; ease management.
  • 12. 5 Reasons for Hadoop in the Cloud. People say "should you run Hadoop in the cloud?" I say "it depends". http://steveloughran.blogspot.com/2012/03/hadoop-in-cloud-infrastructures.html
  • 13. 5 Reasons for Hadoop in the Cloud. (1) If your data is stored in a cloud, local analysis may make more sense… "work near the data". (2) For periodic processing (nightly, etc…) it might make sense to just rent. (3) No upfront capital expense, fund from success. (4) Easier to expand a cluster; no need to buy, just find. (5) Eliminate networking concerns. http://steveloughran.blogspot.com/2012/03/hadoop-in-cloud-infrastructures.html
  • 14. What is Apache Hadoop? (1) PROCESSING: Map/Reduce. Splits a task across processors "near" the data & assembles results; 2004 white paper "MapReduce: Simplified Data Processing on Large Clusters"; base of much new tech. (2) STORAGE: Hadoop Distributed File System. Distributed across "nodes"; natively redundant; name node tracks locations.
  • 15. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop (originally by Facebook) for providing data summarization, ad-hoc query, and analysis of large datasets. It provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL (HQL).
  • 16. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. HBase: HBase is a non-relational database. It is columnar and provides fault-tolerant storage and quick access to large quantities of sparse data. It also adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes.
  • 17. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. HCatalog: HCatalog is a metadata management service for Apache Hadoop. It opens up the platform and allows interoperability across data processing tools such as Pig, MapReduce and Hive. It also provides a table abstraction so that users need not be concerned with where or how their data is stored. Aster SQL-H interfaces with HCatalog.
  • 18. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. Pig: Apache Pig allows you to write complex MapReduce transformations using a simple scripting language. Pig Latin (the language) defines a set of transformations on a data set such as aggregate, join and sort, among others. Pig Latin is sometimes extended using UDFs (User Defined Functions), which the user can write in Java and then call directly from the language.
  • 19. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. Oozie: Oozie coordinates jobs written in multiple languages such as MapReduce, Pig and Hive. It is a workflow system that links these jobs and allows specification of order and dependencies between them.
  • 20. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. Ambari: Apache Ambari operationalizes Hadoop. It provides a mechanism to monitor and manage a cluster. It also provisions nodes. Ambari is a monitoring, administration and lifecycle management project for Apache Hadoop clusters.
  • 21. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. Sqoop: Sqoop is a set of tools for transferring data between Hadoop and non-Hadoop data stores such as traditional relational databases and data warehouses.
  • 22. Apache Hadoop related projects: 3 Hive, 4 HBase, 5 HCatalog, 6 Pig, 7 Oozie, 8 Ambari, 9 Sqoop, 10 Zookeeper. ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
  • 23. Hadoop in Action. (1) Web log files via WebHDFS APIs. (2) Customer & order data via Talend & HCatalog for schema. (3) Pre-processes, refines, and joins data via Talend, Pig, & HCatalog. (4) Interfaces with HCatalog to analyze website visits by the type of end results. Diagram labels: Website Interactions, Web Logs, Order DB Data, Customer DB Data, Big Data Refinery.
  • 24. Hortonworks Vision & Role. We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop. (1) Be diligent stewards of the open source core. (2) Be tireless innovators beyond the core. (3) Provide robust data platform services & open APIs. (4) Enable the ecosystem at each layer of the stack. (5) Make the platform enterprise-ready & easy to use.
  • 25. Balancing Innovation & Stability. Adoption curve (relative % of customers over time): innovators, technology enthusiasts; early adopters, visionaries; The CHASM; early majority, pragmatists; late majority, conservatives; laggards, skeptics. Before the chasm, customers want technology & performance; after it, customers want solutions & convenience. Source: Geoffrey Moore - Crossing the Chasm
  • 26. Enabling Hadoop as Enterprise Big Data Platform. Applications, business tools, development tools, open APIs and access, data movement & integration, data management systems, systems management. Installation & configuration, administration, monitoring, high availability, replication, multi-tenancy, .. Hortonworks Data Platform. DEVELOPER. Data Platform Services & Open APIs: metadata, indexing, search, security, management, data extract & load, APIs.
  • 27. Hortonworks Data Platform. The ONLY 100% open source data platform for Hadoop. Tightly aligned with core Apache code line; all code committed back to open source; most complete Apache Hadoop platform; comprehensive management and monitoring; intuitive graphical data integration tools; centralized metadata services for easy data sharing.
  • 28. Hortonworks Data Platform. Simplify deployment to get started quickly and easily; monitor and manage any size cluster with familiar console and tools; only platform to include data integration services to interact with any data source; metadata services open the platform for integration with existing applications; dependable high availability. Hortonworks Data Platform delivers enterprise grade functionality on a proven Apache Hadoop distribution to ease management, simplify use and ease integration into the enterprise architecture. The only 100% open source data platform for Apache Hadoop.
  • 29. Apache Distribution Stack. Built on Hadoop 1.0 (a.k.a. 0.20.205): proven at large scale enterprise implementations; most stable and reliable version of Hadoop to date; first Apache line supporting security, HBase, WebHDFS; driven by core committers and architects at Hortonworks. Includes necessary components already integrated and tested together; most stable versions of all components are chosen. Components in the stack diagram: Core, Pig, Hive, HCatalog, HBase, Sqoop, Oozie, Zookeeper, Ambari, Talend. Version labels in the stack diagram: 1.0.3, 0.4.0, 0.9.2, 0.9.0+, 0.92.1+, 0.9.0+, 3.1.3, 3.3.4, beta, 5.1.1. Hortonworks Distribution: Tested, Hardened & Proven Distribution Reduces Risk.
  • 30. Management & Monitoring Svcs. Hortonworks Management Center: view the health of cluster operations, server utilization and performance levels; customizable dashboards; APIs for integration into 3rd party monitoring tools; 100% open source management & monitoring, powered by Apache Ambari, Puppet, Nagios and Ganglia; simple wizard-based installation, configuration & provisioning of any size Hadoop cluster. Optimize performance for your Hadoop cluster. Simplify installation and provisioning.
  • 31. Data Integration Services. Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig; Oozie scheduling allows you to manage and stage jobs; connectors for any database, business application or system; integrated HCatalog storage. Bridge the gap between legacy data & Hadoop. Simplify and speed development.
  • 32. Which is best for the cloud? vs.
  • 33. Metadata Services. Apache HCatalog provides flexible metadata services across tools and external access. Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive); accessibility: share data as tables in and out of HDFS; availability: enables flexible, thin-client access via REST API. Without HCatalog: raw Hadoop data; inconsistent, unknown; tool specific access. With HCatalog: table access, aligned metadata, REST API. Shared table and schema management opens the platform.
  • 34. Services Integration. Provides RESTful API as "front door" for Hadoop: opens the door to languages other than Java; thin clients via web services vs. fat-clients in gateway; insulation from interface changes release to release. Diagram: Existing & New Applications; RESTful Web Services (WebHDFS, HCatalog); MapReduce, Pig, Hive, HCatalog; HDFS, HBase, External Store. Opens Hadoop to integration with existing and new applications.
  • 35. Use cases: optimize outcomes at scale. Media: optimize content. Intelligence: optimize detection. Investment: optimize algorithms. Advertising: optimize performance. Fraud: optimize prevention. Regulation: optimize compliance. Retail / Wholesale: optimize inventory turns. Manufacturing: optimize supply chains. Healthcare: optimize patient outcomes. Education: optimize learning outcomes. Government: optimize citizen services. Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.
  • 36. Connecting Transactions + Interactions + Observations. Sources: audio, video, images; docs, text, XML; web logs, clicks; social, graph, feeds; sensors, devices, RFID; spatial, GPS; events, other. Big Data Refinery steps: (1) Classic ETL processing. (2) Store, aggregate, and transform multi-structured data to unlock value. (3) Share refined data & runtime models. (4) Data discovery & investigative analytics. (5) Retain runtime models and historical data for ongoing refinement & analysis. (6) Retain historical data to unlock additional value. Other diagram labels: Business Transactions & Interactions (Web, Mobile, CRM, ERP, SCM, …); Business Intelligence & Analytics (Dashboards, Reports, Visualization, …); Interactive data exploration.
  • 37. 5 Reasons for Hadoop in the Cloud. (1) If your data is stored in a cloud, local analysis may make more sense… "work near the data". (2) For periodic processing (nightly, etc…) it might make sense to just rent. (3) No upfront capital expense, fund from success. (4) Easier to expand a cluster; no need to buy, just find. (5) Eliminate networking concerns. http://steveloughran.blogspot.com/2012/03/hadoop-in-cloud-infrastructures.html
  • 38. THANK YOU. Jim Walker, jim@hortonworks.com, @jaymce. (1) Get Hortonworks Data Platform: hortonworks.com/download. (2) Use the getting started guide: hortonworks.com/get-started. (3) Learn more… get support: hortonworks.com/training, hortonworks.com/support.
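
Slide 14 describes Map/Reduce as splitting a task across processors and assembling the results. A minimal single-process sketch of that idea, using word counting as the task, might look like the following. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map step on many nodes near the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run map over every split, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Each string stands in for one block of a file stored on a different node.
    splits = ["hadoop stores data", "hadoop processes data near the data"]
    print(word_count(splits))
```

Each element of `splits` plays the role of a block held by a different node; only the small `(word, count)` pairs need to move between the map and reduce steps, which is the point of working "near" the data.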
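
Slide 19 says Oozie links jobs and lets you specify order and dependencies between them. The sketch below shows the underlying idea, a dependency graph resolved into a valid run order, using Python's standard `graphlib` module. It does not use Oozie itself (Oozie workflows are defined in XML), and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order for {job: set of jobs it depends on}.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical workflow: a Pig job cleans web logs, then a Hive job joins
# the cleaned logs with order data that was loaded separately.
workflow = {
    "load_logs": set(),
    "load_orders": set(),
    "clean_logs_pig": {"load_logs"},
    "join_hive": {"clean_logs_pig", "load_orders"},
}

if __name__ == "__main__":
    for job in execution_order(workflow):
        print(job)
```

Several orders can satisfy the same dependencies (the two load jobs are independent), so a scheduler is free to run them in parallel; the only guarantee is that a job never starts before the jobs it depends on.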
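
Slides 23 and 34 mention WebHDFS as a RESTful "front door" that thin clients in any language can use. WebHDFS requests are plain HTTP URLs of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`, so a client mostly needs to build that URL. The helper below is a sketch under stated assumptions: the host name and file path are made up, the function name `webhdfs_url` is hypothetical, and port 50070 is assumed as the NameNode HTTP port (check your cluster's configuration). It only constructs the URL and makes no network call.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS REST URL for an HDFS path and operation.

    host, port: NameNode HTTP address (port 50070 is an assumption here).
    path: absolute HDFS path, e.g. "/logs/2012/07".
    op: a WebHDFS operation name such as "LISTSTATUS" or "OPEN".
    params: optional extra query parameters, e.g. {"user.name": "alice"}.
    """
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical NameNode host and directory of web logs.
    print(webhdfs_url("namenode.example.com", "/logs/2012/07", "LISTSTATUS"))
```

Because the interface is just HTTP, the same URL works from a browser, `curl`, or any language with an HTTP client, which is what the deck means by opening the door to languages other than Java.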