What are neural networks? How do you use the neural network algorithm in Apache Spark MLlib? What is Deep Learning? Presented at the Data Science Meetup at Galvanize on 2/17/2016.
For code, see the IPython/Jupyter/Toree notebook at http://nbviewer.jupyter.org/gist/asimjalis/4f911882a1ab963859ce
3. ASIM JALIS
Galvanize/Zipfian, Data Engineering
Cloudera, Microsoft, Salesforce
MS in Computer Science from University of Virginia
https://www.linkedin.com/in/asimjalis
6. DO YOU WANT TO . . .
Play with terabytes of data
Build data applications using Spark, Hadoop, Hive, Kafka, Storm, HBase
Use Data Science algorithms at scale
7. WHAT IS INVOLVED?
Learn concepts in interactive lectures
Develop skills in hands-on labs
Design and build your Capstone Project
Show project to SF tech companies at Hiring Day
10. WHAT IS THIS TALK ABOUT?
What are Neural Networks and how do they work?
What is Deep Learning?
What is the difference?
How can we build neural networks in Apache Spark?
19. A neuron is a mathematical function
It adds up its (weighted) inputs
It applies the sigmoid function to the sum
The result determines whether it fires or not
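Here is a minimal sketch of a single neuron in plain Python, with no libraries. The slide only mentions weighted inputs and the sigmoid. The bias term and the 0.5 firing threshold are assumptions added for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (assumed), passed through the sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical weights and bias, chosen so the weighted sum is exactly 0.
activation = neuron([1.0, 0.0, 1.0], [0.5, -1.0, 0.5], bias=-1.0)
print(activation)          # 0.5
print(activation >= 0.5)   # True: treat >= 0.5 as "fires" (assumed threshold)
```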
20. WHAT ARE NEURAL NETWORKS?
Biologically inspired machine learning algorithm
Mathematical neurons arranged in layers
Accumulate signals from the previous layer
Fire when signal reaches threshold
30. WHAT ARE THE DOWNSIDES OF NO HIDDEN LAYERS?
It only works if the data is linearly separable.
It is equivalent to logistic regression.
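To see the linear-separability limit concretely, the sketch below trains a single sigmoid neuron with no hidden layer. This is logistic regression, fitted with full-batch gradient descent. The zero initialization, learning rate and epoch count are illustrative assumptions.

The neuron can fit AND, which is linearly separable. It cannot fit XOR, because any straight-line boundary misclassifies at least one of the four points.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, epochs=2000, lr=0.5):
    """Single sigmoid neuron on two inputs (logistic regression), trained with
    full-batch gradient descent on the log loss."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            # Gradient of the log loss with respect to each parameter.
            g1 += (p - y) * x1
            g2 += (p - y) * x2
            gb += p - y
        w1, w2, b = w1 - lr * g1, w2 - lr * g2, b - lr * gb
    return w1, w2, b

def accuracy(params, data):
    """Fraction of points classified correctly with a 0.5 threshold."""
    w1, w2, b = params
    hits = sum((sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5) == bool(y)
               for (x1, x2), y in data)
    return hits / len(data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(train_logistic(AND), AND))  # linearly separable
print("XOR accuracy:", accuracy(train_logistic(XOR), XOR))  # not linearly separable
```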
31. MULTILAYER PERCEPTRON
For most realistic classification tasks you will need a hidden layer.
Rule of thumb:
Use one hidden layer
Make the number of neurons in the hidden layer the mean of the input and output layer sizes.
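The rule of thumb can be written as a small helper. The list it returns has the same shape as the `layers` parameter of Spark ML's `MultilayerPerceptronClassifier`: input size, hidden layer sizes, then output size. The function name and the round-half-up choice for odd sums are assumptions for illustration.

```python
def rule_of_thumb_layers(n_inputs, n_outputs):
    """[input, hidden, output] sizes: one hidden layer whose size is the mean of
    the input and output sizes (rounded half up when the sum is odd)."""
    n_hidden = (n_inputs + n_outputs + 1) // 2
    return [n_inputs, n_hidden, n_outputs]

# Hypothetical example: 4 input features and 3 output classes.
print(rule_of_thumb_layers(4, 3))   # [4, 4, 3]
```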
35. FEED FORWARD
Also called forward propagation or forward prop
Initialize the inputs
Weight the inputs into the hidden layer, sum, apply sigmoid
This gives the activation of the hidden layer
Weight the hidden activations into the output layer, sum, apply sigmoid
This gives the activation of the output layer
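The steps above can be sketched in plain Python for a network with one hidden layer. The weights and biases are hypothetical hand-picked values; a real network learns them. The bias terms are an assumption the slide does not mention.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: for each neuron, weight the inputs, sum, apply sigmoid.
    weights[j][i] is the weight from input i to neuron j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = layer(x, w_hidden, b_hidden)   # activation of the hidden layer
    return layer(hidden, w_out, b_out)      # activation of the output layer

# Hypothetical 2-2-1 network with hand-picked weights.
w_hidden = [[0.5, -0.5], [0.3, 0.8]]
b_hidden = [0.0, -0.1]
w_out = [[1.0, -1.0]]
b_out = [0.2]
print(feed_forward([1.0, 0.0], w_hidden, b_hidden, w_out, b_out))
```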
36. BACK PROPAGATION
Use forward prop to calculate the error
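The error is a function of all the network weights
Propagate the error backwards, layer by layer, to get the gradient of the error with respect to each weight
Adjust the weights using gradient descent
Repeat with the next record
Keep going over the training set until convergence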
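Here is a minimal sketch of these steps for a network with one hidden layer and a single sigmoid output. It assumes squared error as the error function, since the slide does not name one.

- `gradients` applies the chain rule from the output back to the first layer.
- `sgd_step` updates the weights for one record.
- `train` loops over the training set. A fixed number of epochs stands in for a real convergence test.

All names and starting weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward prop: hidden activations h and the single output y_hat."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y_hat

def error(x, y, params):
    """Squared error for one record (an assumed choice of error function)."""
    _, y_hat = forward(x, *params)
    return 0.5 * (y_hat - y) ** 2

def gradients(x, y, W1, b1, W2, b2):
    """Back propagation: apply the chain rule from the output back to every weight."""
    h, y_hat = forward(x, W1, b1, W2, b2)
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)   # dE/d(output pre-activation)
    gW2 = [delta_out * hj for hj in h]
    delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    gW1 = [[d * xi for xi in x] for d in delta_hid]
    return gW1, delta_hid, gW2, delta_out           # bias gradients equal the deltas

def sgd_step(x, y, params, lr=0.5):
    """Move every weight a small step against its gradient, using one record."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = gradients(x, y, W1, b1, W2, b2)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - lr * gb2

def train(data, params, epochs=1000, lr=0.5):
    """Sweep the training set record by record; a fixed epoch count stands in
    for a real convergence check."""
    for _ in range(epochs):
        for x, y in data:
            params = sgd_step(x, y, params, lr)
    return params

# Hypothetical 2-2-1 network and one training record.
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.2, -0.1], 0.0)
x, y = [1.0, 0.5], 1.0
print("error before:", error(x, y, params))
print("error after one step:", error(x, y, sgd_step(x, y, params)))
```

You can check hand-written back propagation by comparing its gradients with finite differences of `error`. This is cheap for small networks.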
43. DOWNSIDE OF MULTIPLE LAYERS
The number of weights between two layers is the product of their sizes
Training quickly becomes computationally intractable
Particularly when your input is an image with tens of thousands of pixels
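To make the growth concrete, the helper below counts the trainable parameters of a fully connected network. The example assumes:

- a hypothetical 100x100 grayscale image (10,000 inputs);
- 10 output classes;
- one hidden layer sized by the rule of thumb (the mean of 10,000 and 10, i.e. 5,005 neurons).

```python
def count_parameters(layer_sizes, include_biases=True):
    """Trainable parameters in a fully connected network: each pair of adjacent
    layers contributes size_in * size_out weights (plus size_out biases)."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out
        if include_biases:
            total += n_out
    return total

# Hypothetical 100x100 grayscale image, rule-of-thumb hidden layer, 10 classes.
print(count_parameters([10_000, 5_005, 10], include_biases=False))  # 50100050
```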
47. Apache Spark is a framework for processing data across a cluster
It sends the code to the data
And executes the code where the data lives
48. WHAT IS MLLIB?
Spark's library for Machine Learning.
Builds on top of Spark RDDs.
Provides Machine Learning data types, such as vectors and labeled points, on top of RDDs.
Implements common Machine Learning algorithms.
51. WHAT IS APACHE TOREE?
Like IPython Notebook but for Spark/Scala.
Jupyter kernel for Spark/Scala.
52. HOW CAN I INSTALL TOREE?
Use pip to install IPython or Jupyter.
Install Apache Spark by downloading the .tgz file and extracting it.
SPARK_HOME=$HOME/spark-1.6.0
pip install toree
jupyter toree install --spark_home=$SPARK_HOME
53. HOW CAN I RUN A TOREE NOTEBOOK?
jupyter notebook
Visit http://localhost:8888
Create a new notebook.
Set the kernel to Toree.
Typing sc in the notebook should print the Spark Context.
55. HOW CAN I FIGURE OUT HOW MANY LAYERS?
To choose how many layers and what topology to use, rely on standard machine learning techniques.
Use cross-validation.
In general, use k-fold cross-validation.
10-fold cross-validation is popular.
56. WHAT IS 10-FOLD CROSS-VALIDATION OR K-FOLD CROSS-VALIDATION?
57. Split your data into 10 (or in general k) equal-sized subsets.
Train the model on 9 of them and set one aside for validation.
Validate the model on the 10th subset and record its error rate.
Repeat, setting aside each of the 10 subsets in turn.
Average the 10 error rates.
Then repeat for the next model.
Choose the model with the lowest average error rate.
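The procedure above can be sketched in plain Python. To keep it self-contained, it uses stand-ins:

- Two trivial regressors (predict the training mean, or fit a line through the origin) stand in for candidate network topologies.
- Mean squared error stands in for the error rate.

These stand-ins and all function names are assumptions. The sketch also assumes at least k records, so that no fold is empty.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the record indices and deal them into k (nearly) equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, train_fn, error_fn, k=10, seed=0):
    """Hold out each fold once, train on the rest, and average the k errors."""
    errors = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [rec for i, rec in enumerate(data) if i not in held]
        valid = [data[i] for i in held_out]
        errors.append(error_fn(train_fn(train), valid))
    return sum(errors) / k

def choose_model(data, candidates, error_fn, k=10):
    """Cross-validate every candidate and return the one with the lowest error."""
    scores = {name: cross_validate(data, fn, error_fn, k) for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores

# Stand-in "models": each train function returns a predict function.
def train_mean(train):
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def train_slope(train):
    a = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: a * x

def mse(predict, valid):
    return sum((predict(x) - y) ** 2 for x, y in valid) / len(valid)

data = [(x, 2.0 * x) for x in range(1, 21)]   # hypothetical data with y = 2x
best, scores = choose_model(data, {"mean": train_mean, "slope": train_slope}, mse)
print(best, scores)
```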
58. HOW DO I DEPLOY MY NEURAL NETWORK INTO PRODUCTION?
There are two phases: training and prediction.
The training phase runs on the back-end servers.
Cross-validate your model and its hyper-parameters on the back-end.
Then deploy the trained model to front-end servers, browsers, or devices.
The front-end only runs forward prop, which is fast.
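Here is one way to separate the two phases, sketched under assumptions. The back-end exports trained weights as JSON, which is an arbitrary choice of format here. The front-end loads them and runs only forward prop. The 2-2-1 shape and the weight values are hypothetical.

```python
import json
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def export_model(W1, b1, W2, b2):
    """Back-end: serialize trained weights so they can be shipped to front-ends."""
    return json.dumps({"W1": W1, "b1": b1, "W2": W2, "b2": b2})

def predict(model_json, x):
    """Front-end: load the weights and run forward prop only (no training)."""
    m = json.loads(model_json)
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(m["W1"], m["b1"])]
    return sigmoid(sum(w * hj for w, hj in zip(m["W2"], h)) + m["b2"])

# Hypothetical trained weights for a 2-2-1 network.
blob = export_model([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.2, -0.1], 0.0)
print(predict(blob, [1.0, 0.5]))
```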
60. WHAT IS DEEP LEARNING?
Deep Learning is a learning method that can train a system with more than 2 or 3 non-linear hidden layers.
61. WHAT IS DEEP LEARNING?
Machine learning techniques which enable unsupervised feature learning and pattern analysis/classification.
The essence of deep learning is to compute representations of the data.
Higher-level features are defined from lower-level ones.
62. HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS?
Training neural networks requires applying gradient descent over millions of dimensions.
This is intractable for large networks.
Deep learning places constraints on neural networks.
These constraints make them solvable iteratively.
The constraints are generic.
63. WHAT IS THE BIG DEAL ABOUT IT?
AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance of neural networks.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used Deep Learning techniques.
They combined these with GPU training and several other techniques.
The result was a neural network that could classify images into 1,000 categories, including many kinds of cats and dogs.
It had a top-5 error rate of about 16%, compared to 26% for the runner-up.
66. WHAT ARE THE DIFFERENT KINDS OF DEEP ARCHITECTURES?
Generative
Discriminative
Hybrid
67. WHAT ARE GENERATIVE ARCHITECTURES?
Extract features from data
Find common features in unlabelled data
Like Principal Component Analysis
Unsupervised: no labels required
69. WHAT ARE HYBRID ARCHITECTURES?
STEP 1
Combination of generative and discriminative
Extract features using generative network
Use unsupervised learning
STEP 2
Train discriminative network on extracted features
Use supervised learning
70. WHAT ARE AUTO-ENCODERS?
An auto-encoder is a learning algorithm.
It applies backpropagation and sets the target values equal to its inputs.
In other words, it trains itself to perform the identity transformation.
72. WHY DOES IT DO THIS?
By placing constraints on it, such as restricting the number of hidden neurons, we can make it find a good, compact representation of the data.
74. It is unsupervised.
The data is unlabeled.
Auto-encoders are similar to PCA (Principal Component Analysis).
PCA is a technique for reducing the dimensionality of data.
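To illustrate the PCA connection, here is a minimal sketch of a linear auto-encoder with a 2-1-2 bottleneck. It is trained by gradient descent to reproduce its input. The unlabeled data, initial weights, learning rate and epoch count are illustrative assumptions. Real auto-encoders usually use non-linear activations.

Every point lies on the line x2 = 2 * x1, so a single hidden unit is enough to reconstruct the data. The decoder learns the same direction that PCA would pick as its first principal component.

```python
def train_linear_autoencoder(data, epochs=2000, lr=0.05):
    """Linear auto-encoder with a 2-1-2 bottleneck; the target equals the input.
    encoder: h = e . x    decoder: x_hat = d * h    loss: mean of 0.5 * |x_hat - x|^2"""
    e = [0.1, 0.1]   # hypothetical small initial weights
    d = [0.1, 0.1]
    n = len(data)
    for _ in range(epochs):
        ge = [0.0, 0.0]
        gd = [0.0, 0.0]
        for x in data:
            h = e[0] * x[0] + e[1] * x[1]              # encode to one number
            r = [d[0] * h - x[0], d[1] * h - x[1]]     # reconstruction residual
            dh = d[0] * r[0] + d[1] * r[1]             # dLoss/dh
            for k in range(2):
                gd[k] += r[k] * h / n
                ge[k] += dh * x[k] / n
        e = [e[k] - lr * ge[k] for k in range(2)]
        d = [d[k] - lr * gd[k] for k in range(2)]
    return e, d

def reconstruction_error(data, e, d):
    total = 0.0
    for x in data:
        h = e[0] * x[0] + e[1] * x[1]
        total += 0.5 * ((d[0] * h - x[0]) ** 2 + (d[1] * h - x[1]) ** 2)
    return total / len(data)

# Unlabeled 2-D points that all lie on the line x2 = 2 * x1.
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
e, d = train_linear_autoencoder(data)
print("reconstruction error:", reconstruction_error(data, e, d))
print("decoder direction d2/d1:", d[1] / d[0])  # the principal direction has ratio 2
```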
75. WHAT ARE CONVOLUTIONAL NEURAL NETWORKS?
Feedforward neural networks.
Their connection pattern is inspired by the visual cortex.
77. CONVOLUTIONAL NEURAL NETWORKS
The convolution layer's parameters are a set of learnable filters.
Every filter is small along width and height.
During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map.
At each position we compute the dot product between the filter and the patch of input it covers.
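Here is a minimal sketch of the sliding dot product, assuming a single-channel input, stride 1 and no padding. As in most CNN libraries, the filter is not flipped, so strictly this is cross-correlation. A real convolution layer would also add a bias and apply a non-linearity. The image and filter values are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take the
    dot product at each position, producing a 2-D activation map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Hypothetical 4x4 input containing a vertical edge, and a 2x2 vertical-edge filter.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_filter = [[-1, 1],
               [-1, 1]]
for row in convolve2d(image, edge_filter):
    print(row)   # [0, 2, 0]: the filter responds only where the edge is
```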
78. CONVOLUTIONAL NEURAL NETWORKS
Intuitively, the network learns filters that activate when they see a specific type of feature anywhere in the input.
In this way it achieves translation invariance.
79. WHAT IS A POOLING LAYER?
The pooling layer further reduces the resolution of the image.
Max pooling tiles the activation map with 2x2 windows and keeps the maximum activation value in each window.
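Here is a minimal sketch of 2x2 max pooling with stride 2 on a single activation map. The input values are hypothetical. Dropping an odd trailing row or column is one choice among several.

```python
def max_pool_2x2(activations):
    """Tile the activation map with non-overlapping 2x2 windows and keep the
    maximum of each window, halving width and height (odd edges are dropped)."""
    return [[max(activations[i][j], activations[i][j + 1],
                 activations[i + 1][j], activations[i + 1][j + 1])
             for j in range(0, len(activations[0]) - 1, 2)]
            for i in range(0, len(activations) - 1, 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
print(max_pool_2x2(fmap))  # [[4, 2], [2, 8]]
```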
81. DOES SPARK SUPPORT DEEP LEARNING?
Not directly yet
https://issues.apache.org/jira/browse/SPARK-2352
83. Theano: Low-level GPU-enabled tensor library.
Lasagne, Blocks: NN libraries that make Theano easier to use.
Torch7: NN library. Uses Lua for binding. Used by Facebook and Google DeepMind.
Caffe: NN library by the Berkeley Vision and Learning Center (BVLC).
Pylearn2: ML library based on Theano, from the LISA lab at the Université de Montréal.
cuDNN: NN library by Nvidia based on CUDA. Can be used with Torch7 and Caffe.
Chainer: NN library that uses CUDA.
TensorFlow: NN library from Google.
84. WHAT LANGUAGE ARE THESE IN?
Most of these frameworks support Python.
The exception is Torch7, which uses Lua as its binding language.
85. WHAT CAN I DO ON SPARK?
SparkNet: Runs Caffe training on Spark.
Sparkling Water: Integrates H2O with Spark.
DeepLearning4J: JVM deep learning library that can run on Spark.
TensorFlow on Spark (experimental)