SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Tackling Big Data with Hadoop and
Open Source Integration




                         Ciaran Dynes
                         Remy Dubois
Agenda



 1. Talend’s Goal: Democratizing Integration

 2. What is Big Data (integration)?

 3. Big Data for the Masses: Talend’s strategy and vision




© Talend 2011                                               2
Our goal
Talend – The Market Leading Unified Integration Platform

                                     Talend Enterprise


                 Data            Data
                                              MDM     ESB         BPM
                Quality       Integration

                                                                          ¾  Commercial license
                                                                          ¾  Subscription model

         Studio            Repository Deployment Execution   Monitoring



                                                                          ¾  Open source license

                           Talend Open Studio          for
                                                                          ¾  Free of charge
                                                                          ¾  Optional support

                  Data             Data
                 Quality        Integration   MDM     ESB




Recognized as the open source leader in each of its market
            category by all industry analysts
© Talend 2011                                                                                       4
Who uses Talend?

 A high adoption rate

  § 20 million downloads
  § 950,000 users
  § 3,500 customers


                1 product download   150 new customers
                 every 30 seconds        per month

© Talend 2011                                            5
Trying to get from this…




 © Talend 2011 – Stri2y Private & Confidential
 © Talend 2011                                   6
to this…




 Why Talend…

 ONLY Talend generates code that is executed within map reduce. This
 open approach removes the limitation of a proprietary “engine” to
 provide a truly unique and powerful set of tools for big data.
Big data is….



                                          Hans Rosling – uses big data to analyze world health trends




     Key Takeaway #1
    transactions, interactions, observations

© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                           8
Big Data = Transactions + Interactions + Observations


                                                       Sensors/RFID/Devices                                   User Generated Content
                                                                                       Big Data
                Mega, Giga, Tera, Peta bytes


                                                            Sentiment                                        Social Interactions & Feeds
                                                            Mobile Web
                                                                                                             Spatial & GPS coordinates
                                                            User Clicks
                                                                                                               External Demographics

                                                   Web logs                WEB                                  Business Data Feeds
                                                 Offer history                             A/B testing          Video, Audio, Images
                                                                                         Dynamic pricing             SMS/MMS
                                                             CRM Segmentation           Affiliate Networks
                                                                                        Search Marketing
                                                    ERP              Offer details
                                               Purchase detail   Customer Touchpoints Behavioral Targeting
                                               Purchase record     Support Contacts     Dynamic Funnels
                                               Payment record




                                                             Increasing Data Variety and Complexity



                                                                                                                Source: Hortonworks

© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                                                              9
What is Big Data integration?
Traditional Data Flows


          CRM


                                                 ETL
                                                               Normalized   Traditional Data
          ERP                                    Data             Data
                                                                              Warehouse
                                                Quality

       Finance




 •  Scheduled–daily or weekly,
    sometimes more frequently.                                               Business           Business
                                                                             Analyst            User
 •  Volumes rarely exceed
    terabytes                                           Warehouse
                                                      Administrator
                                                                                               Executives
© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                          11
The new world of big data

                                                             Social
                                                           Networking
          CRM




          ERP
                                                Big Data


       Finance




© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                           12
The new world of big data

                                                              Social
                                                            Networking
          CRM


                                                           Mobile Devices

          ERP



                                                Big Data
       Finance




© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                               13
The new world of big data

                                                              Social
                                                            Networking
          CRM


                                                           Mobile Devices

          ERP

                                                            Transactions


       Finance

                                                Big Data




© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                               14
The new world of big data

                                                               Social
                                                             Networking
          CRM


                                                           Mobile Devices

          ERP

                                                            Transactions


       Finance
                                                           Network Devices



                                                Big Data       Sensors




© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                15
Key Takeaway #2

                 Forces us to think
© Talend 2011
                 differently
© Talend 2011 – Stri2y Private & Confidential   16
But for Talend…. Big data is…




                …everything that is old, is new again!

© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                            17
Data driven business


                            enables
          data            governance




                                                         supports
                                  information                                       decisions


                                                                                          drives
  Information provides
  value to the business
  If you can't rely on your information then                                           Your
  the result can be missed opportunities, or                                         business
  higher costs.
      Matthew West and Julian Fowler (1999). Developing High Quality Data Models.
      The European Process Industries STEP Technical Liaison Executive (EPISTLE).
© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                      18
BIG data driven business

                            enables
     BIG data             governance




                                                         supports
                                      BIG                                            BIG
                                  information                                       decisions

                                                                                          drives
  Information provides
  value to the business
  If you can't rely on your information then
  the result can be missed opportunities, or                                         BIG
  higher costs.                                                                      business

      Matthew West and Julian Fowler (1999). Developing High Quality Data Models.
      The European Process Industries STEP Technical Liaison Executive (EPISTLE).
© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                      19
“Big Data for the Masses”
Goal: Democratize Big Data


                                                 Talend Open Studio for Big Data
                                                 ¾  “Big Data for the Masses”
                                                   ¾  Improves efficiency of big data job
                                                      design with graphic interface
                                                   ¾  Abstracts and generates code
                                                   ¾  Run transforms inside Hadoop

                                          Pig
                                                   ¾  Native support for HDFS, Pig, HBase,
                                                      Sqoop and Hive
                                                   ¾  Apache License 2.0
                                                   ¾  Embedded in Hortonworks Data
         …an open source                              Platform
           ecosystem
© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                 21
Let us show you…




© Talend 2012
Where to next?




© Talend 2012
How is big data integration being used?

 Use Cases
 •     Recommendation Engine
 •     Sentiment Analysis
 •     Risk Modeling
 •     Fraud Detection
 •     Marketing Campaign Analysis
 •     Customer Churn Analysis
 •     Social Graph Analysis
 •     Customer Experience Analytics
 •     Network Monitoring
 •     Research And Development

 BUT: to what level is DQ required for your use
 case?
© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                     24
Poor Data Quality + Big Data = Big Problems
Poor Data Quality * Big Data = Big Problems^2




           Key Takeaway #3
           In big data…
           poor data quality can be magnified at huge scale

© Talend 2011                                                 25
Two methods for inserting data quality into a big data job




 1.  Pipelining: as part of the load process


 2.  Load the cluster than implement and execute
     a data quality map reduce job




© Talend 2011                                                 26
E-T-L - Load
      Extract – Transform

© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                   27
E- DQ -L
      Extract – Improve/Cleanse - Load
© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                   28
Pipelining: data quality with big data



               CRM
                                                DQ


               ERP



                                                DQ
            Finance
                                                            Big Data

           Social
         Networking
                                                     •  Use traditional data quality tools
                                                     •  No new programming, no PHDs
                                                     •  Once and done
      Mobile Devices



© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                29
Big data alternative: Load and improve within the cluster



               CRM

                                                      DQ

               ERP
                                                            DQ

            Finance
                                                         Big Data

           Social
         Networking
                                                •    Load first, improve later
                                                •    Really complex to build, limited tools
                                                •    Constant on, increments
      Mobile Devices
                                                •    Insane performance


© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                                                 30
big
2012
         data                                   now   Q4   2013


Talend Open Studio for Big Data
¾ Packaged within Hortonworks Data Platform
     …Eclipse tools for HIVE, HDFS, PIG, SCOOP

     …supports Oozie, Hcatalog, Kerberos


¾ Free to download and use under the Apache license
   …democratizing big data through intuitive tools




© Talend 2011 – Stri2y Private & Confidential
© Talend 2011                                                     31
Thanks for attending
Sessions will resume at 11:25am




                             Page 33

Weitere ähnliche Inhalte

Was ist angesagt?

BI Forum 2009 - BI Mega Trends
BI Forum 2009 - BI Mega TrendsBI Forum 2009 - BI Mega Trends
BI Forum 2009 - BI Mega TrendsOKsystem
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase
 
Mike Stolz Dramatic Scalability
Mike Stolz Dramatic ScalabilityMike Stolz Dramatic Scalability
Mike Stolz Dramatic Scalabilitydeimos
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBigDataCloud
 
The Best Analytics Tools
The Best Analytics ToolsThe Best Analytics Tools
The Best Analytics ToolsDatalicious
 
Big Data for Everyman
Big Data for EverymanBig Data for Everyman
Big Data for EverymanMichael Wilde
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotModern Data Stack France
 
MWG Big Data & Media - Nick North (GfK UK)
MWG Big Data & Media - Nick North (GfK UK)MWG Big Data & Media - Nick North (GfK UK)
MWG Big Data & Media - Nick North (GfK UK)MWG verbindt media
 
Vision - The Agile Data Center
Vision - The Agile Data CenterVision - The Agile Data Center
Vision - The Agile Data Centerincommoninc
 
Module 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalModule 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalVivastream
 
HCLT Brochure: E-Discovery and Document Review Solutions
HCLT Brochure: E-Discovery and Document Review SolutionsHCLT Brochure: E-Discovery and Document Review Solutions
HCLT Brochure: E-Discovery and Document Review SolutionsHCL Technologies
 
Le Cloud de proximité by Monaco Telecom et Interxion
Le Cloud de proximité by Monaco Telecom et Interxion Le Cloud de proximité by Monaco Telecom et Interxion
Le Cloud de proximité by Monaco Telecom et Interxion Yannick Quentel
 
Open Video Customer Presentation
Open Video Customer PresentationOpen Video Customer Presentation
Open Video Customer PresentationMetroFiber
 
2012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum22012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum2Wilfried Hoge
 
Enterprise Security Architecture: From Access to Audit
Enterprise Security Architecture: From Access to AuditEnterprise Security Architecture: From Access to Audit
Enterprise Security Architecture: From Access to AuditBob Rhubart
 
Striving for an Outstanding IT Organization
Striving for an Outstanding IT OrganizationStriving for an Outstanding IT Organization
Striving for an Outstanding IT OrganizationHuberto Garza
 

Was ist angesagt? (20)

Search2012 ibm vf
Search2012 ibm vfSearch2012 ibm vf
Search2012 ibm vf
 
BI Forum 2009 - BI Mega Trends
BI Forum 2009 - BI Mega TrendsBI Forum 2009 - BI Mega Trends
BI Forum 2009 - BI Mega Trends
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
 
OWF12/Java Michael hirt
OWF12/Java Michael hirtOWF12/Java Michael hirt
OWF12/Java Michael hirt
 
IBM Stream au Hadoop User Group
IBM Stream au Hadoop User GroupIBM Stream au Hadoop User Group
IBM Stream au Hadoop User Group
 
Mike Stolz Dramatic Scalability
Mike Stolz Dramatic ScalabilityMike Stolz Dramatic Scalability
Mike Stolz Dramatic Scalability
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
 
The Best Analytics Tools
The Best Analytics ToolsThe Best Analytics Tools
The Best Analytics Tools
 
Big Data for Everyman
Big Data for EverymanBig Data for Everyman
Big Data for Everyman
 
Katrina marques presentation
Katrina marques   presentationKatrina marques   presentation
Katrina marques presentation
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien Cabot
 
MWG Big Data & Media - Nick North (GfK UK)
MWG Big Data & Media - Nick North (GfK UK)MWG Big Data & Media - Nick North (GfK UK)
MWG Big Data & Media - Nick North (GfK UK)
 
Vision - The Agile Data Center
Vision - The Agile Data CenterVision - The Agile Data Center
Vision - The Agile Data Center
 
Module 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalModule 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience Final
 
HCLT Brochure: E-Discovery and Document Review Solutions
HCLT Brochure: E-Discovery and Document Review SolutionsHCLT Brochure: E-Discovery and Document Review Solutions
HCLT Brochure: E-Discovery and Document Review Solutions
 
Le Cloud de proximité by Monaco Telecom et Interxion
Le Cloud de proximité by Monaco Telecom et Interxion Le Cloud de proximité by Monaco Telecom et Interxion
Le Cloud de proximité by Monaco Telecom et Interxion
 
Open Video Customer Presentation
Open Video Customer PresentationOpen Video Customer Presentation
Open Video Customer Presentation
 
2012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum22012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum2
 
Enterprise Security Architecture: From Access to Audit
Enterprise Security Architecture: From Access to AuditEnterprise Security Architecture: From Access to Audit
Enterprise Security Architecture: From Access to Audit
 
Striving for an Outstanding IT Organization
Striving for an Outstanding IT OrganizationStriving for an Outstanding IT Organization
Striving for an Outstanding IT Organization
 

Andere mochten auch

Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightHortonworks
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsHortonworks
 
Spark Streaming
Spark StreamingSpark Streaming
Spark StreamingEdureka!
 
Practical Kerberos with Apache HBase
Practical Kerberos with Apache HBasePractical Kerberos with Apache HBase
Practical Kerberos with Apache HBaseJosh Elser
 
Apache Phoenix Query Server
Apache Phoenix Query ServerApache Phoenix Query Server
Apache Phoenix Query ServerJosh Elser
 
啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道Etu Solution
 
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖Etu Solution
 
Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動Etu Solution
 
那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景Etu Solution
 
Interface fonctionnelle, Lambda expression, méthode par défaut, référence de...
Interface fonctionnelle, Lambda expression, méthode par défaut,  référence de...Interface fonctionnelle, Lambda expression, méthode par défaut,  référence de...
Interface fonctionnelle, Lambda expression, méthode par défaut, référence de...MICHRAFY MUSTAFA
 
Scala: Pattern matching, Concepts and Implementations
Scala: Pattern matching, Concepts and ImplementationsScala: Pattern matching, Concepts and Implementations
Scala: Pattern matching, Concepts and ImplementationsMICHRAFY MUSTAFA
 
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013Kai Wähner
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
 
Scala : programmation fonctionnelle
Scala : programmation fonctionnelleScala : programmation fonctionnelle
Scala : programmation fonctionnelleMICHRAFY MUSTAFA
 
Mobile to Mainframe - the Challenges of Enterprise DevOps Adoption
Mobile to Mainframe - the Challenges of Enterprise DevOps AdoptionMobile to Mainframe - the Challenges of Enterprise DevOps Adoption
Mobile to Mainframe - the Challenges of Enterprise DevOps AdoptionSanjeev Sharma
 
Spark RDD : Transformations & Actions
Spark RDD : Transformations & ActionsSpark RDD : Transformations & Actions
Spark RDD : Transformations & ActionsMICHRAFY MUSTAFA
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例Fred Chiang
 

Andere mochten auch (20)

Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data Analytics
 
vBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and BeyondvBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and Beyond
 
Spark Streaming
Spark StreamingSpark Streaming
Spark Streaming
 
Practical Kerberos with Apache HBase
Practical Kerberos with Apache HBasePractical Kerberos with Apache HBase
Practical Kerberos with Apache HBase
 
Apache Phoenix Query Server
Apache Phoenix Query ServerApache Phoenix Query Server
Apache Phoenix Query Server
 
啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道
 
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
 
Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動
 
那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景
 
Interface fonctionnelle, Lambda expression, méthode par défaut, référence de...
Interface fonctionnelle, Lambda expression, méthode par défaut,  référence de...Interface fonctionnelle, Lambda expression, méthode par défaut,  référence de...
Interface fonctionnelle, Lambda expression, méthode par défaut, référence de...
 
Scala: Pattern matching, Concepts and Implementations
Scala: Pattern matching, Concepts and ImplementationsScala: Pattern matching, Concepts and Implementations
Scala: Pattern matching, Concepts and Implementations
 
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL Database
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
Scala : programmation fonctionnelle
Scala : programmation fonctionnelleScala : programmation fonctionnelle
Scala : programmation fonctionnelle
 
Mobile to Mainframe - the Challenges of Enterprise DevOps Adoption
Mobile to Mainframe - the Challenges of Enterprise DevOps AdoptionMobile to Mainframe - the Challenges of Enterprise DevOps Adoption
Mobile to Mainframe - the Challenges of Enterprise DevOps Adoption
 
Spark RDD : Transformations & Actions
Spark RDD : Transformations & ActionsSpark RDD : Transformations & Actions
Spark RDD : Transformations & Actions
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例
 

Ähnlich wie Tackling big data with hadoop and open source integration

Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformHortonworks
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopHortonworks
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshowAccenture
 
Hadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotHadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotInside Analysis
 
Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...Hortonworks
 
The Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureThe Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureInside Analysis
 
Unified big data architecture
Unified big data architectureUnified big data architecture
Unified big data architectureDataWorks Summit
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsHortonworks
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisOW2
 
Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Cana Ko
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks
 
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)Will Gardella
 
Tera stream for datastreams
Tera stream for datastreamsTera stream for datastreams
Tera stream for datastreams치민 최
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London SeminarHortonworks
 
Informatica World 2006 - MDM Data Quality
Informatica World 2006 - MDM Data QualityInformatica World 2006 - MDM Data Quality
Informatica World 2006 - MDM Data QualityDatabase Architechs
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data European Data Forum
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forumbigdatawf
 
Metadata Use Cases
Metadata Use CasesMetadata Use Cases
Metadata Use Casesdmurph4
 

Ähnlich wie Tackling big data with hadoop and open source integration (20)

Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data Platform
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache Hadoop
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshow
 
Hadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotHadoop: What It Is and What It's Not
Hadoop: What It Is and What It's Not
 
2012 06 hortonworks paris hug
2012 06 hortonworks paris hug2012 06 hortonworks paris hug
2012 06 hortonworks paris hug
 
Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...
 
The Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureThe Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information Architecture
 
Unified big data architecture
Unified big data architectureUnified big data architecture
Unified big data architecture
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
 
Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
 
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
 
Tera stream for datastreams
Tera stream for datastreamsTera stream for datastreams
Tera stream for datastreams
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London Seminar
 
Informatica World 2006 - MDM Data Quality
Informatica World 2006 - MDM Data QualityInformatica World 2006 - MDM Data Quality
Informatica World 2006 - MDM Data Quality
 
Enterprise Services Solutions
Enterprise Services SolutionsEnterprise Services Solutions
Enterprise Services Solutions
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forum
 
Metadata Use Cases
Metadata Use CasesMetadata Use Cases
Metadata Use Cases
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Tackling big data with hadoop and open source integration

  • 1. Tackling Big Data with Hadoop and Open Source Integration Ciaran Dynes Remy Dubois
  • 2. Agenda 1. Talend’s Goal: Democratizing Integration 2. What is Big Data (integration)? 3. Big Data for the Masses: Talend’s strategy and vision © Talend 2011 2
  • 4. Talend – The Market Leading Unified Integration Platform Talend Enterprise Data Data MDM ESB BPM Quality Integration ¾  Commercial license ¾  Subscription model Studio Repository Deployment Execution Monitoring ¾  Open source license Talend Open Studio for ¾  Free of charge ¾  Optional support Data Data Quality Integration MDM ESB Recognized as the open source leader in each of its market category by all industry analysts © Talend 2011 4
  • 5. Who uses Talend? A high adoption rate § 20 million downloads § 950,000 users § 3,500 customers 1 product download 150 new customers every 30 seconds per month © Talend 2011 5
  • 6. Trying to get from this… © Talend 2011 – Stri2y Private & Confidential © Talend 2011 6
  • 7. to this… Why Talend… ONLY Talend generates code that is executed within map reduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.
  • 8. Big data is…. Hans Rosling – uses big data to analyze world health trends Key Takeaway #1 transactions, interactions, observations © Talend 2011 – Stri2y Private & Confidential © Talend 2011 8
  • 9. Big Data = Transactions + Interactions + Observations Sensors/RFID/Devices User Generated Content Big Data Mega, Giga, Tera, Peta bytes Sentiment Social Interactions & Feeds Mobile Web Spatial & GPS coordinates User Clicks External Demographics Web logs WEB Business Data Feeds Offer history A/B testing Video, Audio, Images Dynamic pricing SMS/MMS CRM Segmentation Affiliate Networks Search Marketing ERP Offer details Purchase detail Customer Touchpoints Behavioral Targeting Purchase record Support Contacts Dynamic Funnels Payment record Increasing Data Variety and Complexity Source: Hortonworks © Talend 2011 – Stri2y Private & Confidential © Talend 2011 9
  • 10. What is Big Data integration?
  • 11. Traditional Data Flows CRM ETL Normalized Traditional Data ERP Data Data Warehouse Quality Finance •  Scheduled–daily or weekly, sometimes more frequently. Business Business Analyst User •  Volumes rarely exceed terabytes Warehouse Administrator Executives © Talend 2011 – Stri2y Private & Confidential © Talend 2011 11
  • 12. The new world of big data Social Networking CRM ERP Big Data Finance © Talend 2011 – Stri2y Private & Confidential © Talend 2011 12
  • 13. The new world of big data Social Networking CRM Mobile Devices ERP Big Data Finance © Talend 2011 – Stri2y Private & Confidential © Talend 2011 13
  • 14. The new world of big data Social Networking CRM Mobile Devices ERP Transactions Finance Big Data © Talend 2011 – Stri2y Private & Confidential © Talend 2011 14
  • 15. The new world of big data Social Networking CRM Mobile Devices ERP Transactions Finance Network Devices Big Data Sensors © Talend 2011 – Stri2y Private & Confidential © Talend 2011 15
  • 16. Key Takeaway #2 Forces us to think © Talend 2011 differently © Talend 2011 – Stri2y Private & Confidential 16
  • 17. But for Talend…. Big data is… …everything that is old, is new again! © Talend 2011 – Stri2y Private & Confidential © Talend 2011 17
  • 18. Data driven business enables data governance supports information decisions drives Information provides value to the business If you can't rely on your information then Your the result can be missed opportunities, or business higher costs. Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE). © Talend 2011 – Stri2y Private & Confidential © Talend 2011 18
  • 19. BIG data driven business enables BIG data governance supports BIG BIG information decisions drives Information provides value to the business If you can't rely on your information then the result can be missed opportunities, or BIG higher costs. business Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE). © Talend 2011 – Stri2y Private & Confidential © Talend 2011 19
  • 20. “Big Data for the Masses”
  • 21. Goal: Democratize Big Data Talend Open Studio for Big Data ¾  “Big Data for the Masses” ¾  Improves efficiency of big data job design with graphic interface ¾  Abstracts and generates code ¾  Run transforms inside Hadoop Pig ¾  Native support for HDFS, Pig, HBase, Sqoop and Hive ¾  Apache License 2.0 ¾  Embedded in Hortonworks Data …an open source Platform ecosystem © Talend 2011 – Stri2y Private & Confidential © Talend 2011 21
  • 22. Let us show you… © Talend 2012
  • 23. Where to next? © Talend 2012
  • 24. How is big data integration being used? Use Cases •  Recommendation Engine •  Sentiment Analysis •  Risk Modeling •  Fraud Detection •  Marketing Campaign Analysis •  Customer Churn Analysis •  Social Graph Analysis •  Customer Experience Analytics •  Network Monitoring •  Research And Development BUT: to what level is DQ required for your use case? © Talend 2011 – Stri2y Private & Confidential © Talend 2011 24
  • 25. Poor Data Quality + Big Data = Big Problems Poor Data Quality * Big Data = Big Problems^2 Key Takeaway #3 In big data… poor data quality can be magnified at huge scale © Talend 2011 25
  • 26. Two methods for inserting data quality into a big data job 1.  Pipelining: as part of the load process 2.  Load the cluster than implement and execute a data quality map reduce job © Talend 2011 26
  • 27. E-T-L - Load Extract – Transform © Talend 2011 – Stri2y Private & Confidential © Talend 2011 27
  • 28. E- DQ -L Extract – Improve/Cleanse - Load © Talend 2011 – Stri2y Private & Confidential © Talend 2011 28
  • 29. Pipelining: data quality with big data CRM DQ ERP DQ Finance Big Data Social Networking •  Use traditional data quality tools •  No new programming, no PHDs •  Once and done Mobile Devices © Talend 2011 – Stri2y Private & Confidential © Talend 2011 29
  • 30. Big data alternative: Load and improve within the cluster CRM DQ ERP DQ Finance Big Data Social Networking •  Load first, improve later •  Really complex to build, limited tools •  Constant on, increments Mobile Devices •  Insane performance © Talend 2011 – Stri2y Private & Confidential © Talend 2011 30
  • 31. big 2012 data now Q4 2013 Talend Open Studio for Big Data ¾ Packaged within Hortonworks Data Platform …Eclipse tools for HIVE, HDFS, PIG, SCOOP …supports Oozie, Hcatalog, Kerberos ¾ Free to download and use under the Apache license …democratizing big data through intuitive tools © Talend 2011 – Stri2y Private & Confidential © Talend 2011 31
  • 33. Sessions will resume at 11:25am Page 33