APACHE HADOOP
            ON AZURE AND WINDOWS
                 MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE




ELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDS
Brad Sarsfield
Engineering Architect
Microsoft Big Data | Hadoop
March 2012 | revision 1.02
ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD


       “The next frontier is all about uniting the power of the cloud
       with the power of data to gain insights that simply weren’t
       possible even just a few years ago”
                                                    Ted Kummert, CVP Business Platforms
                                                    SQL PASS, October 2011
BIG DATA IS HERE AND HADOOP IS CENTER STAGE
15 out of 17 sectors in the US have more data stored per company than the US Library of Congress
140,000-190,000 more deep analytical talent positions
1.5 million more data savvy managers in the US alone
50-60% increase in the number of Hadoop developers within organizations already using Hadoop within a year
€250 billion potential annual value to Europe’s public sector
$300 billion potential annual value to US healthcare

 ECONOMIC CONTEXT AND EXEMPLAR

                                   Special Report: The CEO’s Guide to Hadoop
                                    Learn how large corporations are coping with the increasing flow of
                                    unstructured data by using a free software program called Hadoop


                    http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETY
Isotope is designed to enable solution building with all key dimensions in mind
Deep integration and coordination with existing Microsoft enterprise, cloud, and BI tools
Cassandra             Hadoop                 BackType                MR/GFS                  SimpleDB
      Hive                  Oozie                  Hadoop                  Bigtable                Dynamo
      Scribe                PigLatin               Pig HBase               Dremel                  EC2/EMR/S3
      Hadoop                …                      Cassandra               …                       …




                         Internal [ Dryad | Cosmos] and External [ Isotope | Azure | Excel | BI | SQL DW | LTH ]




VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFT
Scalable machine learning and data mining [Mahout]
Statistical modeling and analysis [R]
Coordination and workflow [Oozie, Cascading]
Data integration and transformation [SQOOP, Flume]
Social network analytics and petascale graph learning [Pegasus]
Real-time stream analytics and business intelligence merged with petascale computation [HStreaming]
Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3]
Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]
ENTER ISOTOPE
Isotope is the internal codename for Microsoft’s suite of products to support Hadoop on Windows and Azure
Un- and Semi-Structured: Sensors | Crawlers | Devices | Bots | Apps
Structured: EIS | ERP | CRM | LOB
HADOOP | SQL REPORTING | SQL ANALYSIS | SQL DATA WAREHOUSING
Interactive Reports with Crescent | Excel with PowerPivot | Embedded BI Apps
Business Users

OUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISE
Self-service business intelligence at any scale, on premises or in the cloud
Complete integration of information assets from log files to collaboration artifacts to enterprise data stores
Familiar and integrated tools for analytics, insight, exploration, modeling, and strategic decision making
Transparent, federated identity and security management for all big data services
High availability data protection and recovery services for enterprises through cloud
Enterprise-grade support for all services, frameworks, and tools
HADOOP [Azure and Enterprise]
Java OM | Streaming OM | HiveQL | PigLatin | .NET/C#/F# | (T)SQL
NOSQL | OCEAN OF DATA [unstructured, semi-structured, structured] | ETL
HDFS
A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS
EIS / ERP | RDBMS | File System | OData [RSS] | Azure Storage
PROJECT ISOTOPE OFFERINGS
•   Bi-directional connectors between Hadoop and SQL and PDW
•   ODBC driver for Hadoop
•   Hive plug-in for Excel
•   Hosted elastic Hadoop service on Azure
•   Microsoft’s Apache Hadoop-based solution for Windows Azure
•   Microsoft’s Apache Hadoop-based solution for Windows Server
•   JavaScript support for Hadoop, with web-based interactive environment
•   Contributions back to the open source community via the Apache Foundation
HIVE PLUG-IN FOR EXCEL
•   Connect Excel directly to Hive
•   Browse Hive objects – tables, columns, etc.
•   Construct and issue queries
HOSTED ELASTIC HADOOP SERVICE ON AZURE
•   Elastic MapReduce, Hive, PigLatin, .NET, JavaScript, and integration with BI, DW, and Office Collaboration tools
•   Simple management UI
•   Full Hadoop compatibility
•   Native support for Azure Blob Storage from HDFS
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE

•   One-click deployment of Hadoop on Azure cluster
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS
•   All standard Hadoop modules supported:
             Hadoop | HDFS | Pig | Hive | Monitoring Pages
•   One-click installer
•   Simplified cluster configuration
•   Integration with Microsoft ecosystem
           System Center | Active Directory | etc.
// MapReduce functions in JavaScript (word count)
// ------------------------------------------------------------------

// map: split each input line into words and emit (word, 1)
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// reduce: sum the counts emitted for each word
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};




 ISOTOPE.JS: OUR VB MOMENT FOR BIG DATA
 •   Write MapReduce jobs in JavaScript
 •   Interactive development environment
 •   Interactive data query and analytics of petascale datasets
 •   HIVE command line for interactive HIVE
 •   Charting and graphing for insight and analytics visualization
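
To see how the word-count functions on this slide behave, the map, shuffle, and reduce steps can be simulated locally in plain JavaScript. This is only a sketch: `makeIterator` and `runLocally` are hypothetical helpers written for illustration, not part of Isotope.js or Hadoop, and the stand-in context objects assume nothing beyond the `write`, `hasNext`, and `next` calls that appear in the slide's code.

```javascript
// Word-count map and reduce, in the same shape as the slide's functions.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical stand-in for the iterator that reduce receives.
function makeIterator(items) {
    var index = 0;
    return {
        hasNext: function () { return index < items.length; },
        next: function () { return items[index++]; }
    };
}

// Hypothetical local driver: map every line, group values by key,
// then reduce each group. A real cluster distributes these steps.
function runLocally(lines) {
    var groups = {};
    var mapContext = {
        write: function (k, v) {
            if (!Object.prototype.hasOwnProperty.call(groups, k)) {
                groups[k] = [];
            }
            groups[k].push(v);
        }
    };
    lines.forEach(function (line, offset) {
        map(offset, line, mapContext);
    });

    var results = {};
    var reduceContext = {
        write: function (k, v) { results[k] = v; }
    };
    Object.keys(groups).sort().forEach(function (k) {
        reduce(k, makeIterator(groups[k]), reduceContext);
    });
    return results;
}

var counts = runLocally(["Hadoop on Azure", "hadoop on Windows"]);
console.log(counts); // hadoop: 2, on: 2, azure: 1, windows: 1
```

Because map lowercases every word, "Hadoop" and "hadoop" land in the same group and are summed together by reduce.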
“We are excited to work with Microsoft to help make Apache
      Hadoop a compelling platform for storing and processing data.
      Hortonworks welcomes Microsoft to the Hadoop ecosystem
      and looks forward to lending our deep domain expertise to
      help accelerate the delivery of Microsoft’s Apache Hadoop-
      based solution for Windows Server and service for Windows
      Azure.”
                                                  Eric Baldeschwieler
                                                  CEO

GIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITY
Microsoft will be working with the community to contribute back significant code to the Apache Foundation
Microsoft has announced a partnership with Hortonworks to help accelerate our open source support
SUMMARY
Please visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache Hadoop
Please visit www.microsoft.com/bigdata to learn more about project codename “Isotope” and the broader ecosystem of
products and services Microsoft is delivering in 2012 and beyond

Apache Hadoop on Azure and Windows

  • 1. APACHE HADOOP ON AZURE AND WINDOWS MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE ELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDS Brad Sarsfield Engineering Architect Microsoft Big Data | Haodoop March 2012 | revision 1.02
  • 2. ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD “The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago” Ted Kummert, CVP Business Platforms SQL PASS, October 2011
  • 3. BIG DATA IS HERE AND HADOOP IS CENTER STAGE
  • 4. 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress 140,000-190,000 more deep analytical talent positions 1.5 million 50-60% more data savvy managers increase in the number of Hadoop developers in the US alone within organizations already using Hadoop within a year €250 billion Potential annual value to Europe’s public sector $300 billion Potential annual value to US healthcare ECONOMIC CONTEXT AND EXEMPLAR Special Report: The CEO’s Guide to Hadoop Learn how large corporations are coping with the increasing flow of unstructured data by using a free software program called Hadoop http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
  • 5. THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETY Isotope is designed to enable solution building with all key dimensions in mind Deep integration and coordination with existing Microsoft enterprise, cloud, and BI tools
  • 6. Cassandra Hadoop BackType MR/GFS SimpleDB Hive Oozie Hadoop Bigtable Dynamo Scribe PigLatin Pig HBase Dremel EC2/EMR/S3 Hadoop … Cassandra … … Internal [ Dryad | Cosmos] and External [ Isotope | Azure | Excel | BI | SQL DW | LTH ] VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFT Scalable machine learning and data mining [Mahout] Statistical modeling and analysis [R] Coordination and workflow [Oozie, Cascading] Data integration and transformation [SQOOP, Flume] Social network analytics and petascale graph learning [Pegasus] Real-time stream analytics and business intelligence merged with petascale computation[HStreamming] Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3] Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]
  • 7. ENTER ISOTOPE Isotope is the internal codename for Microsoft’s suite of products to support Hadoop in Windows and Azure
  • 8. Un- and Semi-Structured Sensors Crawlers SQL REPORTING Devices Interactive Reports with Crescent Bots Apps Business HADOOP SQL ANALYSIS Users Excel with PowerPivot EIS ERP SQL DATA WAREHOUSING CRM LOB Embedded BI Apps Structured OUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISE Self-service business intelligence at any scale on premise or cloud Complete integration of information assets from log files to collaboration artifacts to enterprise data stores Familiar and integrated tools for analytics, insight, exploration, modeling, and strategic decision making Transparent, federated identity and security management for all big data services High availability data protection and recovery services for enterprises through cloud Enterprise-grade support for all service, frameworks, and tools
  • 9. HADOOP [Azure and Enterprise] Java OM Streaming OM HiveQL PigLatin .NET/C#/F# (T)SQL OCEAN OF DATA NOSQL [unstructured, semi-structured, structured] ETL HDFS A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS EIS / ERP RDBMS File System OData [RSS] Azure Storage
  • 10. PROJECT ISOTOPE OFFERINGS • Bi-directional connectors between Hadoop and SQL and PDW • ODBC driver for Hadoop • Hive plug-in for Excel • Hosted elastic Hadoop service on Azure • Microsoft’s Apache Hadoop-based solution for Windows Azure • Microsoft’s Apache Hadoop-based solution for Windows Server • JavaScript support for Hadoop, with web-based interactive environment • Contributions back to the open source community via the Apache Foundation
  • 11. HIVE PLUG-IN FOR EXCEL • Connect Excel directly to Hive • Browse Hive objects – tables, columns, etc. • Construct and issue queries
  • 12. HOSTED ELASTIC HADOOP SERVICE ON AZURE • Elastic MapReduce, Hive, PigLatin, .NET, JavaScript, and integration with BI, DW, and Office Collaboration tools • Simple management UI • Full Hadoop compatibility • Native support for Azure Blob Storage from HDFS
  • 27. MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE • One-click deployment of Hadoop on Azure cluster
  • 28. MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS • All standard Hadoop modules supported: Hadoop | HDFS | Pig | Hive | Monitoring Pages • One-click installer • Simplified cluster configuration • Integration with Microsoft ecosystem: System Center | Active Directory | etc.
  • 29. ISOTOPE.JS: OUR VB MOMENT FOR BIG DATA • Write MapReduce jobs in JavaScript • Interactive development environment • Interactive data query and analytics of petascale datasets • HIVE command line for interactive HIVE • Charting and graphing for insight and analytics visualization
        // Map Reduce function in JavaScript
        // ------------------------------------------------------------------
        var map = function (key, value, context) {
            var words = value.split(/[^a-zA-Z]/);
            for (var i = 0; i < words.length; i++) {
                if (words[i] !== "") {
                    context.write(words[i].toLowerCase(), 1);
                }
            }
        };
        var reduce = function (key, values, context) {
            var sum = 0;
            while (values.hasNext()) {
                sum += parseInt(values.next());
            }
            context.write(key, sum);
        };
  • 30. GIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITY • Microsoft will be working with the community to contribute back significant code to the Apache Foundation • Microsoft has announced a partnership with Hortonworks to help accelerate our open source support • “We are excited to work with Microsoft to help make Apache Hadoop a compelling platform for storing and processing data. Hortonworks welcomes Microsoft to the Hadoop ecosystem and looks forward to lending our deep domain expertise to help accelerate the delivery of Microsoft’s Apache Hadoop-based solution for Windows Server and service for Windows Azure.” Eric Baldeschwieler, CEO
  • 31. APACHE HADOOP ON AZURE AND WINDOWS: MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE. SUMMARY • Please visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache Hadoop • Please visit www.microsoft.com/bigdata to learn more about project codename “Isotope” and the broader ecosystem of products and services Microsoft is delivering in 2012 and beyond
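
The word-count `map` and `reduce` functions from slide 29 can be tried without a cluster. The sketch below is a minimal local harness, assuming only the shapes the slide itself shows: a `context` object with `write(key, value)` and a `values` iterator with `hasNext()` and `next()`. The helpers `makeContext`, `makeIterator`, and `runLocally` are hypothetical names invented for this illustration and are not part of Isotope.js or Hadoop.

```javascript
// map and reduce as shown on slide 29 (radix 10 added to parseInt).
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical stand-in for the job context: collects written pairs.
function makeContext() {
    var pairs = [];
    return {
        pairs: pairs,
        write: function (key, value) { pairs.push([key, value]); }
    };
}

// Hypothetical stand-in for the values iterator handed to reduce.
function makeIterator(items) {
    var index = 0;
    return {
        hasNext: function () { return index < items.length; },
        next: function () { return items[index++]; }
    };
}

// Runs map over each input line, groups the emitted values by key
// (the "shuffle" step), then runs reduce once per key in sorted order.
function runLocally(lines) {
    var mapOutput = makeContext();
    lines.forEach(function (line, offset) {
        map(offset, line, mapOutput);
    });

    var groups = new Map();
    mapOutput.pairs.forEach(function (pair) {
        if (!groups.has(pair[0])) {
            groups.set(pair[0], []);
        }
        groups.get(pair[0]).push(pair[1]);
    });

    var reduceOutput = makeContext();
    Array.from(groups.keys()).sort().forEach(function (key) {
        reduce(key, makeIterator(groups.get(key)), reduceOutput);
    });
    return reduceOutput.pairs;
}

console.log(runLocally(["Hadoop on Azure", "hadoop on Windows"]));
```

A real job distributes the map and reduce calls across the cluster and sorts and groups keys between the two phases; this harness imitates that flow in a single process so the two functions can be checked on small inputs.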

Editor's notes

  1. Key Message: Big Data is a real problem, and Hadoop’s star is rising. It is economically transformative in the way LAMP (Linux, Apache, MySQL, PHP/Python) was in the previous decade. Reference numbers come from the McKinsey Global Institute report “Big data: The next frontier for innovation, competition, and productivity” (http://www.mckinsey.com/mgi/publications/big_data/index.asp) and from Karmasphere (http://www.karmasphere.com/images/documents/Karmasphere-HadoopDeveloperResearch.pdf). Hadoop is moving into mainstream consciousness now. Businessweek recently had a special report dedicated to Hadoop, with half a dozen articles (http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html).
  2. http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data KEY POINT: Hadoop is part of the solution -
  3. Hadoop is an AND, not an OR. But it requires a certain philosophy that MSFT has not historically embraced. A key benefit of Hadoop is the large, vibrant open source community around it. To succeed, Microsoft needs not only to acknowledge this community but to thrive in it.
  4. • BIG self-service BI: Billions+ of data items; unstructured, semi-structured, and log data; real-time feeds; new analysis types leveraging large server clusters; leverage the Hadoop ecosystem and ride its momentum
     • IW-centric design: Give business users direct access to the Big Data store; deliver IW-centric experiences optimized for unstructured and semi-structured queries; create, enrich, visualize and share big data sets through fun and immersive experiences; do it all in the tool they already use, Excel; increase the number of questions, reduce the cost of exploratory mining to zero
     • Leverage new class of analytics and visualization: Enable new types of questions with new types of data and visualizations; leverage analysis of text, sentiment, clickstream, time windows, classification, clustering; visualize big data in impactful ways: tag clouds, graphs, timelines, tree maps, etc.
     • Natural extension of our BI platform: Maintain a consistent semantic model, consistent expression language; provide an iterative, experimental, business-driven workflow from the desktop to the Big Data cluster; build on existing IW skills with the Microsoft BI platform (Excel, PowerPivot, Crescent)
     • Optimized for cloud: Integrate with Azure DataMarket to connect to Bing and other public data sources; host big data sets on Azure, integrated with MyData; leverage Isotope to run analytics clusters
  5. Isotope is the all-up effort around Microsoft and Hadoop. It includes several components: • A full distribution of Apache Hadoop that runs on standard Windows hardware • A full version of Apache Hadoop that runs on the Azure cloud • Connectors from Hadoop (any Hadoop, not just Microsoft’s) to Microsoft’s key products – SQL, Excel, PDW, etc. • JScript shell for live scripting of Hadoop from the browser • Admin, monitoring, and authoring tools to make Microsoft Hadoop best-in-class