Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.
This presentation covers how you can use TensorFrames with TensorFlow to distribute computing on GPUs.
2. How familiar are you with Spark?
1. What is Apache Spark?
2. I have used Spark
3. I am using Spark in production or I contribute to its development
3. How familiar are you with TensorFlow?
1. What is TensorFlow?
2. I have heard about it
3. I am training my own neural networks
4. About Databricks
Founded by the team who created Apache Spark
Offers a hosted service:
- Apache Spark in the cloud
- Notebooks
- Cluster management
- Production environment
5. About me
Software engineer at Databricks
Apache Spark contributor
Ph.D. in Machine Learning, UC Berkeley
(and Spark user since Spark 0.5)
6. Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
7. Numerical computing for Data Science
• Queries are data-heavy
• However, algorithms are computation-heavy
• They operate on simple data types: integers, floats, doubles, vectors, matrices
8. The case for speed
• Numerical bottlenecks are good targets for optimization
• Let data scientists get faster results
• Faster turnaround for experiments
• How can we run these numerical algorithms faster?
9. Evolution of computing power
[Figure: computing options, with labels "Scale up", "Scale out", "GPGPU", "When you can afford your dedicated chip", and "Failure is not an option: it is a fact"]
11. Evolution of computing power
• Processor speed cannot keep up with memory and network improvements
• Access to the processor is the new bottleneck
• Project Tungsten in Spark: leverage the processor’s heuristics for executing code and fetching memory
• Does not account for the fact that the problem is numerical
12. Asynchronous vs. synchronous
• Asynchronous algorithms perform updates concurrently
• Spark is a synchronous model; deep learning frameworks are usually asynchronous
• A large number of ML computations are synchronous
• Even deep learning may benefit from synchronous updates
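A minimal sketch of what a synchronous update looks like, assuming a toy one-parameter least-squares problem. The helper names `partition_gradient` and `synchronous_step` are hypothetical, and this is plain Python, not Spark's or any framework's implementation. Every partition computes its gradient against the same parameter value, and the parameter is updated only once all partitions have reported.

```python
def partition_gradient(w, shard):
    # Gradient of 0.5 * (w - v)^2, averaged over the values in this shard.
    return sum(w - v for v in shard) / len(shard)


def synchronous_step(w, shards, lr):
    # Every partition computes its gradient against the same w ...
    grads = [partition_gradient(w, shard) for shard in shards]
    # ... and a single update is applied after all of them have reported.
    return w - lr * sum(grads) / len(grads)


# Two equally sized "partitions" of the data (illustrative values).
shards = [[1.0, 2.0], [3.0, 4.0]]
w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards, lr=0.1)

print(w)  # approaches the mean of all the values
```

An asynchronous variant would let each partition apply its update as soon as it finishes, without waiting for the others.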
13. Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
14. GPGPUs
• Graphics Processing Units for General Purpose computations
[Charts: theoretical peak throughput, GPU vs. CPU; theoretical peak bandwidth, GPU vs. CPU]
15. Google TensorFlow
• Library for writing “machine intelligence” algorithms
• Very popular for deep learning and neural networks
• Can also be used for general purpose numerical computations
• Interface in C++ and Python
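16. Numerical dataflow with TensorFlow
import tensorflow as tf  # TensorFlow 1.x API
# Build the graph: z = x + 3 * y
x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
# Run the graph with concrete inputs
session = tf.Session()
output_value = session.run(output, {x: 3, y: 5})
[Diagram: dataflow graph with int32 inputs x and y; y is multiplied by 3 and added to x to give z]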
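The key idea on this slide is that the graph is described first and evaluated later, once values are fed in. A rough sketch of that idea in plain Python follows. The classes `Placeholder`, `Const`, and `Op` are hypothetical names invented for illustration and are not TensorFlow's API or implementation.

```python
# Toy dataflow graph: nodes are built first and evaluated later.
# Hypothetical classes for illustration only; not TensorFlow's API.

class Placeholder:
    def __init__(self, name):
        self.name = name

    def eval(self, feed):
        # Look up the value supplied for this input at run time.
        return feed[self.name]


class Const:
    def __init__(self, value):
        self.value = value

    def eval(self, feed):
        return self.value


class Op:
    def __init__(self, fn, left, right, name):
        self.fn, self.left, self.right, self.name = fn, left, right, name

    def eval(self, feed):
        # Evaluate both inputs, then apply this node's function.
        return self.fn(self.left.eval(feed), self.right.eval(feed))


# Same graph as on the slide: z = x + 3 * y
x = Placeholder("x")
y = Placeholder("y")
mul = Op(lambda a, b: a * b, Const(3), y, name="mul")
z = Op(lambda a, b: a + b, x, mul, name="z")

# Nothing has been computed until values are fed in here.
output_value = z.eval({"x": 3, "y": 5})
print(output_value)
```

Because the graph is a data structure rather than executed code, a runtime is free to decide where and how to run it, which is what makes placing it on a GPU or shipping it to a cluster possible.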
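17. Numerical dataflow with Spark
import tensorflow as tf  # TensorFlow 1.x API
import tensorframes as tfs
# df: DataFrame[x: int, y: int]
df = sqlContext.createDataFrame(…)
x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
# Apply the graph to every row; the result gains a column z
output_df = tfs.map_rows(output, df)
output_df.collect()
# output_df: DataFrame[x: int, y: int, z: int]
[Diagram: dataflow graph with int32 inputs x and y; y is multiplied by 3 and added to x to give z]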
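To make the row-wise semantics concrete, here is a small sketch in plain Python of what the slide describes: each row keeps its input columns and gains one computed column. The function `map_rows_sketch` is a hypothetical stand-in that works on a list of dicts; it only mimics the shape of the result and says nothing about how TensorFrames executes the graph.

```python
def map_rows_sketch(fn, rows, out_name):
    """Apply fn to each row (a dict) and append the result as a new column."""
    result = []
    for row in rows:
        new_row = dict(row)            # keep the input columns
        new_row[out_name] = fn(**row)  # add the computed column
        result.append(new_row)
    return result


# Stand-in for DataFrame[x: int, y: int] (illustrative values).
rows = [{"x": 1, "y": 2}, {"x": 3, "y": 5}]

# Same computation as the graph on the slide: z = x + 3 * y
output_rows = map_rows_sketch(lambda x, y: x + 3 * y, rows, "z")
print(output_rows)
```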
19. Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
20. It is a communication problem
[Diagram: data exchanged between the Spark worker process and the worker Python process, with stages labeled "Tungsten binary format", "Java object", "Python pickle", "Python pickle", and "C++ buffer"]
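As a rough illustration of the two kinds of representation named in the diagram, the sketch below moves the same numbers through a generic pickle round trip and through a packed buffer of doubles. It uses only the Python standard library and assumes nothing about Spark's or TensorFrames' actual code paths; the variable names are illustrative.

```python
import pickle
from array import array

values = [float(i) for i in range(1000)]

# Path 1: generic object serialization, as with pickled Python rows.
pickled = pickle.dumps(values)
restored = pickle.loads(pickled)

# Path 2: a packed buffer of fixed-width doubles, the layout numerical code expects.
packed = array("d", values)
raw = packed.tobytes()
unpacked = array("d")
unpacked.frombytes(raw)

print(len(pickled), len(raw))
```

Each hop in the diagram implies a conversion of this kind, so the fewer format changes between Spark's storage and the numerical library's buffers, the less time is spent on communication.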
22. An example: kernel density scoring
• Estimation of distribution from samples
• Non-parametric
• Unknown bandwidth parameter
• Can be evaluated with goodness of fit
23. An example: kernel density scoring
• In practice, compute: [equation not recovered]
with: [equation not recovered]
• In a nutshell: a complex numerical function
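The equations on this slide did not survive extraction. As a stand-in, here is a sketch of the textbook Gaussian kernel density log-score, computed with the log-sum-exp trick for numerical stability. This is an assumption about the kind of function meant, not the formula from the slide, and `kde_log_score` is a hypothetical name.

```python
import math


def kde_log_score(x, samples, bandwidth):
    """Log-density of x under a Gaussian kernel density estimate."""
    # Exponent of each Gaussian kernel centered on a sample.
    exponents = [-((x - s) ** 2) / (2.0 * bandwidth ** 2) for s in samples]
    # Log-sum-exp: subtract the largest exponent to avoid underflow.
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    # Normalization: N kernels, each with constant bandwidth * sqrt(2 * pi).
    log_norm = math.log(len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return log_sum - log_norm


samples = [0.0, 1.0, 2.0]  # illustrative data
score = kde_log_score(1.0, samples, bandwidth=0.5)
print(score)
```

Scoring a dataset repeats this for every point and for every candidate bandwidth, which is why it is a good example of a computation-heavy function on simple data types.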
32. The future
• Integration with Tungsten:
- Direct memory copy
- Columnar storage
• Better integration with MLlib data types
• GPU instances in Databricks: official support coming this fall
33. Recap
• Spark: an efficient framework for running computations on thousands of computers
• TensorFlow: high-performance numerical framework
• Get the best of both with TensorFrames:
- Simple API for distributed numerical computing
- Can leverage the hardware of the cluster
34. Try these demos yourself
• TensorFrames source code and documentation:
github.com/databricks/tensorframes
spark-packages.org/package/databricks/tensorframes
• Demo notebooks available on Databricks
• The official TensorFlow website:
www.tensorflow.org
Explain that TensorFlow is a library for deep learning
list a few algorithms: deep learning, clustering, classification, etc.
business logic and analysis are usually more concerned with complex structures: text, lists, associations like dictionaries
The bread and butter of data science can be told in 3 words: integers, floats and doubles.
Slicing and dicing data: matrices, vectors, reals
not everybody is a Fortran or C++ programmer.
There is considerable friction in writing optimized algorithms.
How can we lower the barrier?
scale up or scale out
The Holy Grail: a large number of specialized processors
you have 2 options: better computers or more computers
For all these configurations of hardware, there are even more frameworks and libraries to access them, and each of them has strengths and weaknesses
the classics for single machine use
the distributed frameworks: Spark, Mahout, MapReduce
the libraries to access specialized hardware: CUDA and OpenCL for parallel programming
in the middle, MPI: it is hard to program and it is not very resilient to hardware failures
Then frameworks built on top of these in recent years for deep learning and computer vision
The trend is to have multiple graphic cards communicate
MLlib has KDE, but how about making it work for other data types like floats, or other kernels?
my Ph.D. adviser used to tell me that you always have to include one equation to show that you mean serious business
do not talk about UDF, simply say you can wrap a Scala function inside the SQL engine
UDF: it is a Scala function and you can run it inside a SQL query
start from login, homepage
disable debug menu
go more slowly for demo