SlideShare a Scribd company logo
1 of 35
Download to read offline
(Efficient) Data Exchange with
"Foreign" Ecosystems
Uwe Korn – QuantCo – 2nd July 2019
About me
• Engineering at QuantCo

• Apache {Arrow, Parquet} PMC

• Focus on Python but interact with
R, Java, SAS, …
@xhochy
@xhochy
mail@uwekorn.com
https://uwekorn.com
Python vs R
👊
Python vs R
👊
Python & R
Python & R
… & Java & Rust &
Javascript & C# & Matlab
& …
Do we have a problem?
Do we have a problem?
• Yes, there are different ecosystems!
Do we have a problem?
• Yes, there are different ecosystems!
• PyData

• Python / R

• Pandas / NumPy / PySpark/sparklyr / Docker
Do we have a problem?
• Yes, there are different ecosystems!
• PyData

• Python / R

• Pandas / NumPy / PySpark/sparklyr / Docker
• Two weeks ago: Berlin Buzzwords

• Java / Scala

• Flink / ElasticSearch / Kafka

• Scala-Spark / Kubernetes
Do we have a problem?
• Yes, there are different ecosystems!
• PyData

• Python / R

• Pandas / NumPy / PySpark/sparklyr / Docker
• Two weeks ago: Berlin Buzzwords

• Java / Scala

• Flink / ElasticSearch / Kafka

• Scala-Spark / Kubernetes
• SQL-based databases

• ODBC / JDBC

• Custom protocols (e.g. Postgres)
Why solve this?
• We build pipelines to move data

• Goal: end-to-end data products

Somewhere along the path we need to talk

• Avoid duplicate work / work on converters

• We don’t want Python vs R but use each of them where they’re best.
Apache Arrow at its core
• Main idea: common columnar representation of data in memory
• Provide libraries to access the data structures

• Broad support for many languages

• Create building blocks to form an ecosystem around it

• Implement adaptors for existing structures
Columnar Data
Previous Work
• CSV works really everywhere 

• Slow, untyped and row-wise

• Parquet is gaining traction in all ecosystems

• one of the major features and interaction points of Arrow

• Still, this serializes data

• RAM-Copy: 10GB/s on a Laptop

• DataFrame implementations look similar but still are incompatible
Languages
• C++, C(glib), Python, Ruby, R, Matlab

• C#

• Go

• Java

• JavaScript

• Rust
There’s a social component
• It’s not only APIs you need to bring together

• Communities are also quite distinct

• Get them talking!
Shipped with batteries
• There is more than just data structures

• Batteries in Arrow

• Vectorized Parquet reader: C++, Rust, Java(WIP)

C++ also supports ORC

• Gandiva: LLVM-based expression kernels

• Plasma: Shared-memory object store

• DataFusion: Rust-based query engine

• Flight: RPC protocol built on top of gRPC with zero-copy optimizations
Ecosystem
• RAPIDS: Analytics on the GPU

• Dremio: Data platform

• Turbodbc: columnar ODBC access in C++/Python

• Spark: fast Python and R bridge

• fletcher (pandas): Use Arrow instead of NumPy as backing storage

• fletcher (FPGA): Use Arrow on FPGAs

• Many more … https://arrow.apache.org/powered_by/
Ecosystem
Kartothek: 

• Heavily relies on Parquet adapter

• Uses Arrow’s type system which is more sophisticated than pandas’

• Using Arrow instead of building some components on their own allows
us to provide Kartothek access in other languages easily in the future
Does it work?
Does it work?
Everything is amazing on slides …
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
• ERP System in Java with JDBC access (no non-Java client)
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
• ERP System in Java with JDBC access (no non-Java client)
• ETL and Data Cleaning in Python
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
• ERP System in Java with JDBC access (no non-Java client)
• ETL and Data Cleaning in Python
• Analysis in R
Does it work?
Does it work?
Does it work?
Does it work?
WIP
Get started easily?
Up Next
• Build more adaptors, e.g. Postgres

• Building blocks for query engines on top of Arrow

• Datasets

• Analytical kernels

• DataFrame implementations directly on top of Arrow
Thanks
Slides at https://twitter.com/xhochy

Question here!

More Related Content

What's hot

C# - Raise the bar with functional & immutable constructs (Dutch)
C# - Raise the bar with functional & immutable constructs (Dutch)C# - Raise the bar with functional & immutable constructs (Dutch)
C# - Raise the bar with functional & immutable constructs (Dutch)Rick Beerendonk
 
Challenges in Building NLP Applications in Nepali Language
Challenges in Building NLP Applications in Nepali LanguageChallenges in Building NLP Applications in Nepali Language
Challenges in Building NLP Applications in Nepali LanguageChandan Goopta
 
Not Everything is an Object - Rocksolid Tour 2013
Not Everything is an Object  - Rocksolid Tour 2013Not Everything is an Object  - Rocksolid Tour 2013
Not Everything is an Object - Rocksolid Tour 2013Gary Short
 
PharoDAYS 2015: On Relational Databases by Guille Polito
PharoDAYS 2015: On Relational Databases by Guille PolitoPharoDAYS 2015: On Relational Databases by Guille Polito
PharoDAYS 2015: On Relational Databases by Guille PolitoPharo
 
Where Node.JS Meets iOS
Where Node.JS Meets iOSWhere Node.JS Meets iOS
Where Node.JS Meets iOSSam Rijs
 
Flink Forward SF 2017: Tzu-Li (Gordon) Tai - Joining the Scurry of Squirrels...
Flink Forward SF 2017: Tzu-Li (Gordon) Tai -  Joining the Scurry of Squirrels...Flink Forward SF 2017: Tzu-Li (Gordon) Tai -  Joining the Scurry of Squirrels...
Flink Forward SF 2017: Tzu-Li (Gordon) Tai - Joining the Scurry of Squirrels...Flink Forward
 
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...Flink Forward
 
sparklyr - Jeff Allen
sparklyr - Jeff Allensparklyr - Jeff Allen
sparklyr - Jeff AllenSri Ambati
 
IWMW 1998: Dataweb: the Horror Stories
IWMW 1998: Dataweb: the Horror StoriesIWMW 1998: Dataweb: the Horror Stories
IWMW 1998: Dataweb: the Horror StoriesIWMW
 
파이콘한국2017 - Years with Python
파이콘한국2017 - Years with Python파이콘한국2017 - Years with Python
파이콘한국2017 - Years with PythonYounggun Kim
 
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...InfinIT - Innovationsnetværket for it
 
Road to Dynamic LINQ - Part 2
 Road to Dynamic LINQ - Part 2 Road to Dynamic LINQ - Part 2
Road to Dynamic LINQ - Part 2Axilis
 

What's hot (13)

Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
 
C# - Raise the bar with functional & immutable constructs (Dutch)
C# - Raise the bar with functional & immutable constructs (Dutch)C# - Raise the bar with functional & immutable constructs (Dutch)
C# - Raise the bar with functional & immutable constructs (Dutch)
 
Challenges in Building NLP Applications in Nepali Language
Challenges in Building NLP Applications in Nepali LanguageChallenges in Building NLP Applications in Nepali Language
Challenges in Building NLP Applications in Nepali Language
 
Not Everything is an Object - Rocksolid Tour 2013
Not Everything is an Object  - Rocksolid Tour 2013Not Everything is an Object  - Rocksolid Tour 2013
Not Everything is an Object - Rocksolid Tour 2013
 
PharoDAYS 2015: On Relational Databases by Guille Polito
PharoDAYS 2015: On Relational Databases by Guille PolitoPharoDAYS 2015: On Relational Databases by Guille Polito
PharoDAYS 2015: On Relational Databases by Guille Polito
 
Where Node.JS Meets iOS
Where Node.JS Meets iOSWhere Node.JS Meets iOS
Where Node.JS Meets iOS
 
Flink Forward SF 2017: Tzu-Li (Gordon) Tai - Joining the Scurry of Squirrels...
Flink Forward SF 2017: Tzu-Li (Gordon) Tai -  Joining the Scurry of Squirrels...Flink Forward SF 2017: Tzu-Li (Gordon) Tai -  Joining the Scurry of Squirrels...
Flink Forward SF 2017: Tzu-Li (Gordon) Tai - Joining the Scurry of Squirrels...
 
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
 
sparklyr - Jeff Allen
sparklyr - Jeff Allensparklyr - Jeff Allen
sparklyr - Jeff Allen
 
IWMW 1998: Dataweb: the Horror Stories
IWMW 1998: Dataweb: the Horror StoriesIWMW 1998: Dataweb: the Horror Stories
IWMW 1998: Dataweb: the Horror Stories
 
파이콘한국2017 - Years with Python
파이콘한국2017 - Years with Python파이콘한국2017 - Years with Python
파이콘한국2017 - Years with Python
 
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
 
Road to Dynamic LINQ - Part 2
 Road to Dynamic LINQ - Part 2 Road to Dynamic LINQ - Part 2
Road to Dynamic LINQ - Part 2
 

Similar to PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems

PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 
Hunting for anglerfish in datalakes
Hunting for anglerfish in datalakesHunting for anglerfish in datalakes
Hunting for anglerfish in datalakesDominic Egger
 
The Final Frontier
The Final FrontierThe Final Frontier
The Final FrontierjClarity
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdbjixuan1989
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlySarah Guido
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jKevin Watters
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...Lucidworks
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato ReviewHang Li
 
Rust is for "Big Data"
Rust is for "Big Data"Rust is for "Big Data"
Rust is for "Big Data"Andy Grove
 
Scaling with swagger
Scaling with swaggerScaling with swagger
Scaling with swaggerTony Tam
 
C# .NET - Um overview da linguagem
C# .NET - Um overview da linguagem C# .NET - Um overview da linguagem
C# .NET - Um overview da linguagem Claudson Oliveira
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...ScyllaDB
 
Introduction to Go
Introduction to GoIntroduction to Go
Introduction to Gozhubert
 

Similar to PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems (20)

PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Hunting for anglerfish in datalakes
Hunting for anglerfish in datalakesHunting for anglerfish in datalakes
Hunting for anglerfish in datalakes
 
The Final Frontier
The Final FrontierThe Final Frontier
The Final Frontier
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review
 
Rust is for "Big Data"
Rust is for "Big Data"Rust is for "Big Data"
Rust is for "Big Data"
 
Frontend as a first class citizen
Frontend as a first class citizenFrontend as a first class citizen
Frontend as a first class citizen
 
Scaling with swagger
Scaling with swaggerScaling with swagger
Scaling with swagger
 
C# .NET - Um overview da linguagem
C# .NET - Um overview da linguagem C# .NET - Um overview da linguagem
C# .NET - Um overview da linguagem
 
Ruby - The Hard Bits
Ruby - The Hard BitsRuby - The Hard Bits
Ruby - The Hard Bits
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
cadec-2017-golang
cadec-2017-golangcadec-2017-golang
cadec-2017-golang
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
 
Introduction to Go
Introduction to GoIntroduction to Go
Introduction to Go
 
groovy & grails - lecture 1
groovy & grails - lecture 1groovy & grails - lecture 1
groovy & grails - lecture 1
 

More from Uwe Korn

Going beyond Apache Parquet's default settings
Going beyond Apache Parquet's default settingsGoing beyond Apache Parquet's default settings
Going beyond Apache Parquet's default settingsUwe Korn
 
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fastpandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fastUwe Korn
 
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...Uwe Korn
 
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detailApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detailUwe Korn
 
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copyFulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copyUwe Korn
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with DaskUwe Korn
 
Extending Pandas using Apache Arrow and Numba
Extending Pandas using Apache Arrow and NumbaExtending Pandas using Apache Arrow and Numba
Extending Pandas using Apache Arrow and NumbaUwe Korn
 
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...Uwe Korn
 
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...Uwe Korn
 
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperabilityHow Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperabilityUwe Korn
 

More from Uwe Korn (10)

Going beyond Apache Parquet's default settings
Going beyond Apache Parquet's default settingsGoing beyond Apache Parquet's default settings
Going beyond Apache Parquet's default settings
 
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fastpandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fast
 
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
 
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detailApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
 
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copyFulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
 
Extending Pandas using Apache Arrow and Numba
Extending Pandas using Apache Arrow and NumbaExtending Pandas using Apache Arrow and Numba
Extending Pandas using Apache Arrow and Numba
 
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
 
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
 
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperabilityHow Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 

PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems

  • 1. (Efficient) Data Exchange with "Foreign" Ecosystems Uwe Korn – QuantCo – 2nd July 2019
  • 2. About me • Engineering at QuantCo • Apache {Arrow, Parquet} PMC • Focus on Python but interact with R, Java, SAS, … @xhochy @xhochy mail@uwekorn.com https://uwekorn.com
  • 6. Python & R … & Java & Rust & Javascript & C# & Matlab & …
  • 7. Do we have a problem?
  • 8. Do we have a problem? • Yes, there are different ecosystems!
  • 9. Do we have a problem? • Yes, there are different ecosystems! • PyData • Python / R • Pandas / NumPy / PySpark/sparklyr / Docker
  • 10. Do we have a problem? • Yes, there are different ecosystems! • PyData • Python / R • Pandas / NumPy / PySpark/sparklyr / Docker • Two weeks ago: Berlin Buzzwords • Java / Scala • Flink / ElasticSearch / Kafka • Scala-Spark / Kubernetes
  • 11. Do we have a problem? • Yes, there are different ecosystems! • PyData • Python / R • Pandas / NumPy / PySpark/sparklyr / Docker • Two weeks ago: Berlin Buzzwords • Java / Scala • Flink / ElasticSearch / Kafka • Scala-Spark / Kubernetes • SQL-based databases • ODBC / JDBC • Custom protocols (e.g. Postgres)
  • 12. Why solve this? • We build pipelines to move data • Goal: end-to-end data products
 Somewhere along the path we need to talk • Avoid duplicate work / work on converters • We don’t want Python vs R but use each of them where they’re best.
  • 13.
  • 14. Apache Arrow at its core • Main idea: common columnar representation of data in memory • Provide libraries to access the data structures • Broad support for many languages • Create building blocks to form an ecosystem around it • Implement adaptors for existing structures
  • 16. Previous Work • CSV works really everywhere • Slow, untyped and row-wise • Parquet is gaining traction in all ecosystems • one of the major features and interaction points of Arrow • Still, this serializes data • RAM-Copy: 10GB/s on a Laptop • DataFrame implementations look similar but still are incompatible
  • 17. Languages • C++, C(glib), Python, Ruby, R, Matlab • C# • Go • Java • JavaScript • Rust
  • 18. There’s a social component • It’s not only APIs you need to bring together • Communities are also quite distinct • Get them talking!
  • 19. Shipped with batteries • There is more than just data structures • Batteries in Arrow • Vectorized Parquet reader: C++, Rust, Java(WIP)
 C++ also supports ORC • Gandiva: LLVM-based expression kernels • Plasma: Shared-memory object store • DataFusion: Rust-based query engine • Flight: RPC protocol built on top of gRPC with zero-copy optimizations
  • 20. Ecosystem • RAPIDS: Analytics on the GPU • Dremio: Data platform • Turbodbc: columnar ODBC access in C++/Python • Spark: fast Python and R bridge • fletcher (pandas): Use Arrow instead of NumPy as backing storage • fletcher (FPGA): Use Arrow on FPGAs • Many more … https://arrow.apache.org/powered_by/
  • 21. Ecosystem Kartothek: • Heavily relies on Parquet adapter • Uses Arrow’s type system which is more sophisticated than pandas’ • Using Arrow instead of building some components on their own allows us to provide Kartothek access in other languages easily in the future
  • 23. Does it work? Everything is amazing on slides …
  • 24. Does it work? Everything is amazing on slides … … so does this Arrow actually work?
  • 25. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with:
  • 26. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with: • ERP System in Java with JDBC access (no non-Java client)
  • 27. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with: • ERP System in Java with JDBC access (no non-Java client) • ETL and Data Cleaning in Python
  • 28. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with: • ERP System in Java with JDBC access (no non-Java client) • ETL and Data Cleaning in Python • Analysis in R
  • 34. Up Next • Build more adaptors, e.g. Postgres • Building blocks for query engines on top of Arrow • Datasets • Analytical kernels • DataFrame implementations directly on top of Arrow