Reproducibility in
Artificial
Intelligence
By Carlos Toxtli
Introduction
Some AI projects that I've done
● Hum2Song: Composes the musical accompaniment of a melody produced by a human voice.
● MultiAffect: Reproducible Research Framework for Multimodal Affect and Action Recognition
● AutomEditor: AutomEditor is an AI-based video editor.
● DeepStab: Real-time Video Object Stabilization tool by using Deep Learning
● DeepPiracy: Video piracy detection system by using Longest Common Subsequence and DL
● VR-360-musi: Transforms a YouTube video into five stems using AI and places them in a room.
● ReputationAgent: System that detects inaccurate and unfair reviews given to gig workers.
● TaskBot: Research and development of a bot that helps teams to delegate tasks
● ExpertTwin: Enhanced workspace by an AI agent that provides content to knowledge workers
● LivenessDetection: Design and development of machine vision algorithms to validate identity
● QuantumDrugDiscovery: Drug discovery by using Quantum Computing.
● Awesome Machine Learning Jupyter Notebooks for Colab: Curated list of notebooks
● Awesome Robotic Process Automation: Curated list of notebooks
● Artificial Intelligence By Example Second Edition, Book
● Explainable AI, Book
● Among others ...
Index
● Overview
● Reproducibility problems
● Solutions for reproducibility
● Understanding techniques
● Conclusions
What is Reproducibility?
Reproducibility means obtaining consistent computational
results using the same input data, computational steps,
methods, code, and conditions of analysis.
Replicability means obtaining consistent results across
studies aimed at answering the same scientific question,
each of which has obtained its own data.
Causes
Researchers over the years have investigated the factors that affect reproducibility in data-science-related studies. Common findings point to factors such as:
● Lack of information or access to the dataset in its original form and order
● The software environment used
● Randomization control
● The actual implementation of the proposed techniques
● The large amount of computational resources that some studies require, which not everybody can afford
Looking for solutions ...
During my work in academia, I have explored three different solutions:
● Reproducibility framework
● Reproducible benchmarking
● Reproducible standalone methods
Reproducible Framework for Multimodal Tasks
http://bit.ly/multiaffect
Text Classification Benchmark
http://bit.ly/ai-text-workshop
Machine Learning notebooks (~100)
http://bit.ly/awesome-ai
My journey
I will explain what is needed
to produce and use any of
these approaches.
Reproducibility framework
A reproducible research framework standardizes:
● Data processing
● Feature engineering
● Training methods
● Evaluation methods
● Research document formatting
● Administration interface
Inclusiveness
Additionally, it should be accessible in order to have a broader impact. Some of the desired features are:
● No client requirements (online)
● No special hardware requirements
● No extra configuration
● Free of charge
MultiAffect: Reproducible Research
Framework for Multimodal Video
Classification and Regression Tasks at
utterance-level with spatio-temporal
feature fusion by using Face, Body,
Audio, Text, and Emotion features
So with this in mind, I created MultiAffect
MultiAffect
framework
The main goal of MultiAffect is to give guidance on how to
reproduce research experiments in a fixed setting.
These are the 5 main components:
● Platform Setup: Ensures that the machine is
properly configured
● Feature Extractor: Monitors the feature extraction
and manages the extracted features
● Model Trainer: Defines, trains, and fine-tunes the
model
● Evaluator: Calculates and reports the performance
metrics.
● Research Paper Template: Defines the minimum set
of sections and mandatory citations
Platform Setup
Preparing a host machine to replicate machine learning research is usually challenging, time-consuming, and expensive. One of the reasons is that most of the models available today require large-scale datasets for training, and multimedia datasets have high storage requirements. In machine learning tasks, the feature extraction step reduces the dimensionality of the data and helps the model focus on the most significant or discriminative parameters. However, extracting features from multimedia samples is computationally demanding.
Dealing with faulty code and
compiled libraries
Some of the tools that are required to perform the data extraction
need to be compiled for the host operating system. Scientific tools
are commonly built from multiple libraries and sometimes depend on
specific versions of certain libraries for certain operating systems;
this makes them prone to throw compilation errors. Sometimes the
code is not given, and there is an extra effort to code the
instructions described in the publication. Even if the code is available, it is sometimes not ready to reproduce, and significant effort is needed to make it work, when it works at all.
The solution is a
virtual machine
The software challenges can be mitigated by using virtual machines or
containers. Virtual machines and containers give a base operating system
that can contain the proper configuration built-in. These approaches can
run in the top of the host operating system or in online infrastructure. The
hardware challenges can be overcome by investing in powerful enough
infrastructure in-site or by using online on-demand infrastructure.
Conventional research paper replication depends on multiple factors as we
have explored.
MultiAffect over
Google
Colaboratory
The MultiAffect framework uses Google
Colaboratory to publish the Jupyter interactive
notebook and to perform the computation in
the attached virtual machine. Google
Colaboratory is a free research tool that enables
users with a Google account to host and run code
over Google's infrastructure. Google
Colaboratory offers users the ability to execute
their code segments in CPUs, GPUs, and TPUs
(an AI accelerator application-specific integrated
circuit). At the time this work was published, Google
Colaboratory offers a virtual machine with a Tesla
K80 GPU, 12 GB of RAM, and 350 GB of storage.
This platform provides enough resources to
perform video action recognition.
Ubuntu as
Operating
System
This platform provides a Debian-based operating system, so the provided instructions are platform-specific. Local replication of our framework requires an Ubuntu 18.04 operating system in order to install all the libraries successfully. Our platform is agnostic to the Python version: all the code executed in the notebook is written in Python and can be executed with version 2 or 3 of the interpreter. Our framework is able
to set up and run the experiment from the online platform, enabling
users to deploy and execute the code in a free of charge environment
and without special requirements in the client-side.
Fine tuning the setup
process
The definition of the setup was an incremental
process of three main steps: (1) Initial setup:
The first functional version; (2) Packing
components: Uploading components in
batches to cloud storage; and (3) Optimal
setup: A version that loads faster.
Initial setup
In this step, the libraries were downloaded and compiled
directly from the notebook by running shell commands
from the notebook cells. Pre-requisites, missing
dependencies, and additional packages were installed in
the same notebook. The dataset and the pre-trained
models were downloaded from their original sources to
the virtual machine. The feature extraction, training,
and evaluation code were directly inserted into the
notebook in separate cells. The first version was tested
until it successfully extracted the features, trained, and
evaluated the models from the notebook. A backup of this
notebook was documented and set as the initial version.
Packing components
Each individual compiled library was packaged into a zip file that contains
the binary files as well as the configuration files. The pre-trained models
that were individually downloaded from their original sources were packed
together into a single file. Latency is usually lower when a single large file is downloaded from a high-speed source than when multiple large files are downloaded from sources with different bandwidths. The outcome of this task is a collection of zip files that were uploaded to a Google Drive account. The files were shared with public access so that they can be downloaded from Google Colaboratory notebooks logged in with different accounts.
Optimal
setup
After packaging and storing the files from the initial setup
to the cloud, we started a branch of the initial setup that
loads these files. The optimal setup notebook is a simplified version of the initial notebook: instead of a long section documenting the setup process, it has a download-prerequisites section. The files are downloaded with a Python tool called GDown that is already installed in Google Colaboratory. It is important to mention that the virtual machine attached to Google Colaboratory notebooks already comes with an Ubuntu distribution and the most common machine learning tools and libraries installed. This optimal version is tailored to Google Colaboratory only.
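To illustrate the download step, here is a minimal sketch of fetching one of the pre-packed zip files with GDown from a notebook cell; the Drive file id and file name below are placeholders, not the actual MultiAffect assets.

import gdown

# Placeholder id of a publicly shared pre-compiled package on Google Drive
url = 'https://drive.google.com/uc?id=YOUR_FILE_ID'
gdown.download(url, 'precompiled_library.zip', quiet=False)

# In a Colab cell, the archive can then be extracted with a shell command:
# !unzip -q precompiled_library.zip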
Optimizing the loading time
For each of the libraries installed, we measured the time it takes to install the prerequisites plus the compilation time. On average, the overall setup of each library was five times slower than downloading
and extracting a previously compiled and zipped version of the library.
The total setup time for the Google Colaboratory environment was
reduced from 43 minutes to 6 minutes after implementing the pre-
compiled tools strategy and by downloading the files from the same
Google infrastructure.
Feature extractors
MultiAffect includes a feature extraction module as an independent component.
Multimodal feature extraction is often a highly demanding task, as it requires a
certain pre-processing of the videos before being able to extract features. Some
common pre-processing tasks are: separating the audio, extracting frames,
identifying faces, cropping faces, removing the background, skeleton detection (pose), and emotion detection, among many other procedures. Our feature extraction methodology is based on the common ground found in submissions. Our feature extraction process aims to keep factors such as the person descriptors (i.e., gender, age, race), scale, position, background, and language invariant. Our
approach considers ten features from five different modalities: face, body, audio,
text, and emotions.
Audio features
OpenSMILE (1582 features): The audio is extracted from the videos and processed by OpenSMILE, which extracts audio features such as loudness, pitch, jitter, etc.
Features were computed over the whole video clip (general) and over 20 fragments (temporal).
Text features
Opinion Lexicon (6 features): based on the ratio of sentiment words (adjectives, adverbs, verbs, and nouns) that express positive or negative sentiments.
Subjective Lexicon (4 features): uses the subjectivity lexicon from MPQA (Multi-Perspective Question Answering), which models the sentiment by its type and intensity.
Word vectors: GloVe and BERT embeddings.
Face features
OpenFace (709 features): Facial behavior analysis tool
that provides accurate facial landmark detection,
head pose estimation, facial action unit recognition,
and eye-gaze estimation. We get points that represent the face.
VGG16 FC6 (4096 features): The faces are cropped (224×224×3), aligned, the background is zeroed out, and they are passed through a pretrained VGG16 to obtain a 4096-dimensional feature vector from the FC6 layer.
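As an illustration (not the exact MultiAffect code), the FC6 activations can be read from a pretrained Keras VGG16 by taking the output of its first fully connected layer, named 'fc1' in Keras; the random array below is only a stand-in for a cropped, aligned face.

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

base = VGG16(weights='imagenet', include_top=True)
# 'fc1' is the first 4096-unit fully connected layer (FC6 in the original VGG nomenclature)
fc6_model = Model(inputs=base.input, outputs=base.get_layer('fc1').output)

face = np.random.rand(1, 224, 224, 3) * 255.0   # stand-in for a cropped, aligned face
features = fc6_model.predict(preprocess_input(face))
print(features.shape)   # (1, 4096)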
Body Features
OpenPose (BODY_25) (11 features): The normalized angles between the joints. I did not use the raw calculated features because they were 25×224×224.
VGG16 FC6 skeleton image (4096 features): I drew the skeleton (neck in the center) on a black background, fed it to a VGG16, and extracted the feature vector of the FC6 layer.
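A minimal sketch of how a normalized joint angle can be computed from keypoints; the keypoint coordinates below are made up purely for illustration.

import numpy as np

def joint_angle(a, b, c):
    # Angle at joint b formed by segments b->a and b->c, normalized to [0, 1]
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

shoulder, elbow, wrist = np.array([0.4, 0.3]), np.array([0.5, 0.5]), np.array([0.7, 0.55])
print(joint_angle(shoulder, elbow, wrist))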
Emotion features
EmoPy (7 features): A deep neural net toolkit for
emotion analysis via Facial Expression Recognition
(FER).
Other (28 features): Four additional models from different FER contest participants.
With 7 categories per model, there are 35 features in total.
20 samples per video clip were predicted (temporal), and from there I computed their normalized sum (general).
Model trainer
The MultiAffect models use different deep learning architectures to recognize affect. Among them we find RNNs (Recurrent Neural Networks), CNNs (Convolutional Neural Networks), and simple DNNs (Deep Neural Networks) such as MLPs (Multilayer Perceptrons).
Evaluator
The MultiAffect framework is designed to perform classification and regression tasks.
Depending on the performed task, the platform is adjusted to display meaningful
evaluations. The classification task gives accuracy, F1-score, recall, precision, AUC
and other metrics for the training, validation, and testing sets. In the case of a
regression task, the framework computes the MSE (Mean Square Error) and CCC
(Concordance Correlation Coefficient) that describes how well a new test or
measurement reproduces a gold standard test.
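For reference, the CCC can be computed from predictions and gold labels as in the sketch below; the toy values are illustrative only, not framework output.

import numpy as np

def ccc(y_true, y_pred):
    # Concordance Correlation Coefficient between predictions and gold labels
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

y_true = np.array([0.10, 0.40, 0.35, 0.80])
y_pred = np.array([0.12, 0.38, 0.30, 0.75])
print(ccc(y_true, y_pred))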
Plotting the
results
For the classification task, our reproducible framework produces two plots, one to visualize the accuracy while training and one for the training and testing loss, plus a confusion matrix obtained while evaluating the model on the test data. On the other hand, for the
regression tasks the results are displayed in a scatter plot
that shows the correlation between the predicted and gold
standard labels.
Experimentation
In order to test its generalizability, we performed experiments on
two main tasks: affect recognition and video action recognition.
The video action and affect recognition tasks are addressed through the training and testing of classification and regression models, respectively. One of the main goals of the proposed framework is to be able to perform both by only configuring a new set of variables, without any change to the code. Another goal was to deliver results comparable to existing work.
Video Action Recognition for Automatic Video Editing
All (Quadmodal): BodyTF+FaceTF+AudioG+EmoT
Model  acc_val  acc_train  acc_test  f1_score  f1_test  Loss
All    1.00     1.00       0.90      1.00      0.90     0.01
Confusion matrices of the Quadmodal model (Train / Validation / Test)
Affect recognition experiment
Metrics
Arousal: 0.3730994852
Valence: 0.2109641637
Emotion: "Surprise"
Results: the predicted vs. gold-standard scatter plot shows an almost 45-degree line.
Let's switch
approaches to
Benchmarking
You can use MultiAffect as a tool for any video categorization or regression task. You can try it out at this URL: http://bit.ly/multiaffect
Now, if you want to compare which of the existing techniques works best for your problem, you will need a tool that benchmarks all the methods. This is why I adapted an existing Text Classification Benchmarking tool to run in the cloud; you can find it here: http://bit.ly/ai-text-workshop
Text Classification
Benchmarking tool
This is a Google Colaboratory notebook with instructions that includes these methods (a sketch of the first one follows the list):
● Word ngram + LR (Logistic
regression)
● Char ngram + LR
● (Word + Char ngram) + LR
● RNN no embedding
● RNN + GloVe embedding
● CNN (multi-channel)
● RNN + CNN
● Google BERT
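A rough sketch of the first method in the list (word n-grams + logistic regression) using scikit-learn; the toy corpus and labels below are placeholders for the real dataset.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product", "terrible service", "works as expected"]   # toy corpus
labels = [1, 0, 1]

clf = Pipeline([
    ('tfidf', TfidfVectorizer(analyzer='word', ngram_range=(1, 2))),   # word 1- and 2-grams
    ('lr', LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)
print(clf.predict(["awful product"]))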
Experiment: learning fairness from reviews
It was applied to help improve fairness in reviews
The last approach: independent ML/DL methods
Sometimes you may already know the best algorithm for your requirements. For that case I adapted >100 notebooks so they can be used as tools to train models from the cloud by only uploading your data. You can find them here: http://bit.ly/awesome-ai
The process of adapting a notebook is: 1) open it in Colab from GitHub, 2) add the extra libraries, 3) download the data from Drive.
Let's do a quick recap of all the ML/DL/RL methods to identify which one fits your problem best.
Frameworks
● Sklearn
● Weka
● Matlab
Linear Regression
When to use it?
● Simple regression problems
○ How much the rent should cost in a certain area
○ How much to charge for a specific amount of work
● Problems where we want to define a rule that separates two similar categories, e.g., a Premium or Basic price for customers based on certain parameters (number of rooms vs. number of cars)
https://colab.research.google.com/drive/1-dTb2vCiZHa-DnyqlVFGOnMSNjvkIOTP
https://colab.research.google.com/drive/1Z20iJspQm2Y_wLI51wgE6nXGOSu1kG4W
https://colab.research.google.com/drive/1-yk3m6p3ylNLtTaEf3nya6exO_wv8f_L
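A minimal scikit-learn sketch of fitting a linear regression; the apartment sizes and rent figures are made up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

area_m2 = np.array([[30], [50], [80], [120]])      # feature: apartment size
rent = np.array([500, 750, 1100, 1600])            # target: monthly rent

reg = LinearRegression().fit(area_m2, rent)
print(reg.coef_, reg.intercept_)                   # learned slope and intercept
print(reg.predict([[100]]))                        # predicted rent for a 100 m2 apartment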
XOR problem - Not linear
Decision Tree
Code
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import pydotplus

iris = load_iris()
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
# Export the learned tree; iris.target_names keeps the class labels in the correct order
dot_data = export_graphviz(clf, out_file=None, filled=True, rounded=True,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('iris_tree.png')  # save the decision map as an image
Decisions
map
When to use it?
● When we need to know what decisions the machine is taking
● When we need to explain to others how the features are evaluated
● When there are not many features
https://colab.research.google.com/drive/1Fc8qs1fwdcpoZ_-tTj32OBl-tCGlAe5c
Random Forest
Code
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('accuracy is',accuracy_score(y_pred, y_test))
When to use it?
● When we want to know alternative ways of evaluating a problem.
● When we want to manually discard flows that are biased
● When we want to manage ensembles from one single method.
https://colab.research.google.com/drive/1WMOOtaHAMZPi-enVM8RRM_CC-grEtm9P
https://colab.research.google.com/drive/1jDdWp-CJybMJDX17jBmG5qoPPg9qj1sm
https://colab.research.google.com/drive/1-uDIRl1aYqmJX59rAJumHY1T20QqBJiQ
Naive Bayes
When to use it?
● When we want to know the probabilities of the different cases.
● When we need a probabilistic model.
● When we need a model that is easy to verify on paper
https://colab.research.google.com/drive/1qOCllKsBBrLeUnP-XAXHefXCtbuBWl69
https://colab.research.google.com/drive/11FiWH00vzygQp1T_pD0MCfMFg6FYsd01
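A minimal sketch with scikit-learn's GaussianNB on the iris data, showing the per-class probabilities mentioned above; the dataset choice is only an example.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

clf = GaussianNB().fit(X_train, y_train)
print(clf.predict_proba(X_test[:3]))   # class probabilities for the first test samples
print(clf.score(X_test, y_test))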
k-Nearest Neighbor
When to use it?
● When intuition says that the problem can be solved by finding the most similar option.
● When the information is not exhaustive.
● When we want to justify the algorithm's decision with common human reasoning.
https://colab.research.google.com/drive/1GeUVjDW74SxFxz2Nh3rqOlte-S2dblYv
https://colab.research.google.com/drive/1X12qds10ZfN7QCrmpRR2OXxa--PTyS5e
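A minimal k-NN sketch with scikit-learn; kneighbors() returns the most similar training samples, which supports the human-style justification mentioned above. The dataset and k=5 are arbitrary choices for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(clf.score(X_test, y_test))
distances, indices = clf.kneighbors(X_test[:1])   # the 5 most similar training samples
print(indices)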
k-Means
When to use it?
● When we don’t know how to understand the data
● When we want to optimize resources by grouping related elements.
● When we want the computer to create the labels for us.
https://colab.research.google.com/drive/1RL3oZm6LgnEChI1aOQZoMn1WDk-DQJiV
https://colab.research.google.com/drive/1yvy1scktjcDyydG2fZz2OJfRFAer0SEO
https://colab.research.google.com/drive/1CzEf6giBXPSQI5UJOhZrZfYKAJcH68wg
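A minimal k-means sketch with scikit-learn; the choice of 3 clusters for the iris data is an assumption made for illustration.

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])        # cluster assigned to each sample (the machine-made labels)
print(km.cluster_centers_)    # centroid of each cluster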
Support Vector Machine
When to use it?
● It was the most effective technique before neural networks; it can achieve excellent results with less processing.
● Mathematically speaking, it is based on very strong principles: it creates complex multidimensional hyperplanes that separate the classes precisely.
● It is not a white-box technique, but it may be the best option for problems where we want to get the most out of a machine learning approach without dealing with neural networks.
https://colab.research.google.com/drive/13PRk-GKeSivp4R-FIdjmYBQS7xWUco9C
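A minimal SVM sketch with scikit-learn's SVC; the RBF kernel and default hyperparameters are chosen only for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)   # non-linear decision surface
print(clf.score(X_test, y_test))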
Logistic Regression
No regularization
L2 regularization
When to use it?
● When we want to optimize a regression
● When we want to binarize the output
● As a preliminary analysis before implementing neural networks
https://colab.research.google.com/drive/1PWmvsZRaj3JQ8rtj6vlwhJhJpOrIAamT
https://colab.research.google.com/drive/1p8rcrSQB-thLSakUmCHjSbqI6vd-NkCq
https://colab.research.google.com/drive/1jhrAtmPgg6Uu0WzMzV-VakWlncQAvk-D
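A minimal sketch contrasting the two regularization settings shown above, using scikit-learn; the C values are arbitrary, and a very large C is used here as an approximation of "no regularization".

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

l2_clf = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)       # L2-regularized (default)
weak_reg_clf = LogisticRegression(C=1e6, max_iter=1000).fit(X_train, y_train) # effectively unregularized
print(l2_clf.score(X_test, y_test), weak_reg_clf.score(X_test, y_test))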
Perceptron
When to use it?
● When we have very few features and there are no extra details that could be extracted by hidden layers.
● Perceptrons are in fact neural networks, and we do not always need to use them for deep learning; they can be used as machine learning models when we benchmark against other machine learning techniques.
● When we want the power of neural networks but do not have much computational power.
https://colab.research.google.com/drive/10PvUh-8ZsVqQADqXSmRIDHGiCH9iypyO
ML & DL frameworks
It’s time for Deep Learning
Deep Learning
Artificial Neural Networks = Multi-Layer Perceptron
Frameworks
● Tensorflow
● Keras
● PyTorch
● Sklearn
When to use it?
● Classifiers where common machine learning algorithms perform poorly.
● Models with many features.
● Multi-class problems.
https://colab.research.google.com/drive/1GAYf5yMNBkVrag0z2Q4MPSwuqfRN1Wz
https://colab.research.google.com/drive/12YBDQFYXN8VruxKTfzDpbPsYFAEQceQP
https://colab.research.google.com/drive/1pyRqGmMG4-Mj8Wis5XrQ_a4dUJvYln1
https://colab.research.google.com/drive/1wHjugM56k0ay5QCmRVMBfAMF96EY7A5k
https://colab.research.google.com/drive/1Ly0BtKBphUdeqMQBO8Xjweku62Vq3UAX
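A minimal Keras sketch of a small multilayer perceptron for a multi-class problem; the input size (20 features) and class count (3) are placeholders.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),   # 20 input features (placeholder)
    Dense(64, activation='relu'),
    Dense(3, activation='softmax'),                    # 3 classes (placeholder)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
# model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)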
Convolutional Neural Networks (CNN)
MNIST
Frameworks
● Tensorflow
● Keras
● Pytorch
● Caffe
When to use it?
● When we want to process images
● When we want to process videos
● When we have highly dimensional data
https://colab.research.google.com/drive/1jN8oswBOds4XuRbnQMxxDXDssmDD_rD9
https://colab.research.google.com/drive/1iEYJs75hat_URxshmCBMGzHQo5VgdRvN
https://colab.research.google.com/drive/1YHKZgpJuriGYjEzFDNGz2Hf0widu-exx
https://colab.research.google.com/drive/1gi2_Or0rDz5Gg9FkGJjFDxgeiwt5-lXm
https://colab.research.google.com/drive/1QcnY-LOZU9c7Sp2DsDVeYxLNBx87VNhn
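A minimal Keras CNN sketch for MNIST, the dataset shown above; a single epoch is used just to illustrate the flow.

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0   # add channel dim, scale to [0, 1]

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))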
Recurrent Neural Networks (RNN)
Frameworks
● Tensorflow
● Keras
● Pytorch
When to use it?
● When sequences are provided
○ Text sequences
○ Image sequences (videos)
○ Time series
● When we need to provide an ordered output
https://colab.research.google.com/drive/1twc5dBjgFLFuv8p-gPfnrscTPcBlkx5q
https://colab.research.google.com/drive/10-ou-Za75bFgwArvgP3QfNJ4cWuwY-eF
https://colab.research.google.com/drive/1PEOqq8mBcmc-FMj8lpbVF93cQI4RLgVJ
https://colab.research.google.com/drive/1XUEAFxxKVmdgC7oPOzVpGInXfUeTcgIQ
https://colab.research.google.com/drive/1tfDDriSDUh_J9OHwjt-NzT8xRiEDQF7x
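A minimal Keras sketch of an LSTM for binary text-sequence classification; the vocabulary size, embedding dimension, and the commented fit() inputs are placeholders.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=128),   # 10k-token vocabulary (placeholder)
    LSTM(64),                                     # processes the token sequence in order
    Dense(1, activation='sigmoid'),               # binary output, e.g. sentiment
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(padded_sequences, labels, epochs=5, batch_size=64)  # padded integer sequences as input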
Mixed approaches
Ensembles
Mixed Deep learning features
When to use it?
● When we want to benchmark models
● When different models are stronger when evaluated together
● When the individual processing is not exhaustive
https://colab.research.google.com/drive/1Kg_nHBmUGQ1zepU-wZlDwMyM-YrlMTUX
https://colab.research.google.com/drive/1U86EVD-6ulYMxTzDX8-m6nEptYq0yaej
AutoML
Frameworks
● TPOT
● MLBox
● H2O
● Google AutoML
When to use it?
● On every new model
● When we have enough time to train multiple models
● When we don't know which hyperparameters are best.
https://colab.research.google.com/drive/1gTBDfbJy9SsgbUPRhL_mrujw6HC2BjxN
https://colab.research.google.com/drive/17Ii6Nw89gZT8l_XrvSQhNWaa_VfcdLBn
https://colab.research.google.com/drive/1xe4G_dqsPMq0n3w_Mqlm-39j5TMUqHJR
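A minimal sketch with TPOT, one of the frameworks listed above, using the classic TPOT 0.x API; the small generations/population values are chosen only so the search finishes quickly.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)                 # searches over pipelines and hyperparameters
print(tpot.score(X_test, y_test))
tpot.export('best_pipeline.py')            # exports the winning pipeline as plain sklearn code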
Reinforcement Learning
Frameworks
● OpenAI Gym
● Google Dopamine
● RLLib
● Keras-RL
● Tensorforce
● Facebook Horizon
When to use it?
● When a robot explores a place and needs to learn from the environment.
● When we can try things as many times as we want in a simulator.
● When we want to find the optimal path
https://colab.research.google.com/drive/1fgv5UWhHR7xSwZfwwltF4OFDYqtWdlQD
https://colab.research.google.com/drive/14aYmND2LKtaPTW3JWS7scKGwU9baxHeE
https://colab.research.google.com/drive/16Scl43smvcXGZFEGITs15_SN_7-EidZd
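A minimal OpenAI Gym sketch of the agent-environment loop with a random policy; it assumes the classic Gym API (pre-0.26), where reset() returns an observation and step() returns a 4-tuple.

import gym

env = gym.make('CartPole-v1')
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()             # random policy, just to show the loop
    obs, reward, done, info = env.step(action)     # classic 4-tuple API
    total_reward += reward
env.close()
print('episode return:', total_reward)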
Techniques to improve the learning process
Principal Component Analysis (PCA)
Feature selection
When to use it?
● When we have too many features and we do not know which of them are useful.
● When we want to reduce the dimensionality of our model.
● When we want to plot our decision boundaries.
https://colab.research.google.com/drive/1CO6BACds6J8hGPYlEU2INnSTpT0EmS74
https://colab.research.google.com/drive/1VU2SO3IfklPkK1EPMnwiO7trJslt79OZ
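A minimal PCA sketch with scikit-learn, reducing the iris data to two components for plotting; the dataset is only an example.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                 # project onto the 2 main components
print(X_2d.shape)                           # (150, 2)
print(pca.explained_variance_ratio_)        # variance captured by each component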
Data Augmentation
When to use it?
● When we have limited data
● When we want to help our model to generalize more
● When our unseen data comes in very different formats.
https://colab.research.google.com/drive/1ANIc7tXrggPT2I9JzpBlZQ3BBhCpbJUJ
https://colab.research.google.com/drive/1cQRVdiDc9xraHZYLu3VrXxX4FKXoaS8U
https://colab.research.google.com/drive/1O5far2FC4GlAc9pkLPZqsjKreCpI4S_-
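A minimal Keras sketch of on-the-fly image augmentation with ImageDataGenerator; the transform ranges are arbitrary examples and the random array stands in for a real image batch.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
)

x = np.random.rand(8, 64, 64, 3)            # stand-in for a small image batch
y = np.zeros(8)
batches = datagen.flow(x, y, batch_size=4)  # yields randomly transformed copies
x_aug, y_aug = next(batches)
print(x_aug.shape)                          # (4, 64, 64, 3)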
Generative models
Discriminative: Predicts from Data
Generative: Generates from data distribution
Generative models
● Autoencoders
● Adversarial Networks
● Sequence Models
● Transformers
Frameworks
● Tensorflow
● Keras
● Pytorch
Autoencoders
When to use it?
● When we want to compress data.
● When we need to change one type of input into another type of output.
● When we don’t need much variability in the generated data.
https://colab.research.google.com/drive/1QxXqnhyqIZrrGtor2tVa4jY63adS4yc0
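A minimal Keras sketch of a dense autoencoder that compresses 784-pixel images into a 32-dimensional code; the sizes and the commented training data are placeholders.

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

inputs = Input(shape=(784,))
encoded = Dense(32, activation='relu')(inputs)       # bottleneck: compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)  # reconstruction of the input

autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)                     # reusable encoder for the compressed codes
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)  # trained to reproduce its own input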
Generative Adversarial Networks
When to use it?
● When we need to transfer a style
● When we need more variability in the generated output
● When we need to keep context in the generation.
https://colab.research.google.com/drive/1YOYH78YQAgPBRIpUPhh_e0cFLNu-BPVo
https://colab.research.google.com/drive/1POZpWN-2M5hy3D2ATWzJs2LC5sk7hpts
https://colab.research.google.com/drive/1aKywiJ5p0eCwDIIWKe8Q205rcKqmR_VX
https://colab.research.google.com/drive/1QxXqnhyqIZrrGtor2tVa4jY63adS4yc0
https://colab.research.google.com/drive/1Lw7BqKABvtiSyUHg9DeM5f90_WFGB7uz
Sequence models
When to use it?
● When we generate text
● When we generate the next element of a series
● When the order in the generated output matters.
https://colab.research.google.com/drive/1ZB-oueLvBgltXshb1lDV2EpqbqV6FC5x
Transformers
When to use it?
● When context is an essential part of the generated output
● When we need to keep consistency in the frequency space.
● When we have enough computational resources.
https://colab.research.google.com/drive/1jWaRkii6xLkxxAPyfudeGJsHf_jokqXG
Put notebooks into production
It seems that running code from a notebook in the cloud is just for testing purposes, but you can actually run it as a service by running it from a local Docker container.
I created a script that automatically prepares a container and executes it whenever you need, as a command-line application.
Example:
docker run psykohack/google-colab https://colab.research.google.com/drive/133DIr7lvkuaNU_X2JN5id3XmtSXQspy9
Code: https://github.com/toxtli/google-colab-docker
Resources
More and more AI research is nowadays distributed in a redistributable format. Some valuable resources can be found at:
https://www.paperswithcode.com/
https://www.kaggle.com/
Conclusions
● Nowadays we can reproduce state-of-the-art AI algorithms from a web-based platform.
● Complex tasks can be executed in notebooks structured as frameworks.
● Our main job is to prepare the data to feed the algorithm that best fits our needs.
● AI prototyping is drastically accelerated by using these technologies.
● Since these technologies sit between pure-code and pure-tool approaches, they give us the flexibility to iterate faster.
Thanks
@ctoxtli
http://www.carlostoxtli.com
http://facebook.com/carlos.toxtli

  • 4. Index ● Overview ● Reproducibility problems ● Solutions for reproducibility ● Understanding techniques ● Conclusions
  • 5. What is Reproducibility? Reproducibility means obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis. Replicability means obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
  • 6.
  • 7. Causes Researchers over the years have investigated the factors that affect reproducibility in data science related studies. Some common findings point that non-reproducible studies: ● Lack information or access to the dataset in its original form and order ● The software environment used ● Randomization control ● The actual implementation of the proposed techniques ● Some studies require a large number of computational resources that not everybody can afford.
  • 8. Looking for solutions ... During my work on academia I have explored three different solutions ● Reproducibility framework ● Reproducible benchmarking ● Reproducible standalone methods
  • 9. Reproducible Framework for Multimodal Tasks http://bit.ly/multiaffect
  • 11. Machine Learning notebooks (~100) http://bit.ly/awesome-ai
  • 12. My journey I will explain what is needed to produce and use any of these approaches.
  • 13. Reproducibility framework A reproducible research framework standarizes: ● Data processing ● Feature engineering ● Training methods ● Evaluation methods ● Research document formatting ● Administration interface
  • 14. Inclusiveness Additionally it should be accessible to have a broader impact, some of the desired features may be: ● No client requirements (online) ● No special hardware requirements ● No extra configuration ● Free of charge
  • 15. MultiAffect: Reproducible Research Framework for Multimodal Video Classification and Regression Tasks at utterance-level with spatio-temporal feature fusion by using Face, Body, Audio, Text, and Emotion features So with this in mind, I created MultiAffect
  • 16. MultiAffect framework The main goal of MultiAffect is to give guidance on how to reproduce research experiments in a fixed setting. These are the 5 main components: ● Platform Setup: Ensures that the machine is properly configured ● Feature Extractor: Monitors the feature extraction and manages the extracted features ● Model Trainer: Defines, trains, and fine-tunes the model ● Evaluator: Calculates and reports the performance metrics ● Research Paper Template: Defines the minimum set of sections and mandatory citations
  • 17. Platform Setup Preparing a host machine to replicate machine learning research is usually challenging, time-consuming, and expensive. One of the reasons is that most of the models available today require large-scale datasets for training, and multimedia datasets have high storage requirements. In machine learning tasks, the feature extraction step helps algorithms reduce the dimensionality of the data and aids the model in focusing on its most significant or discriminative parameters. However, extracting features from multimedia samples is a highly demanding task in terms of computation.
  • 18. Dealing with faulty code and compiled libraries Some of the tools that are required to perform the data extraction need to be compiled for the host operating system. Scientific tools are commonly built from multiple libraries and sometimes depend on specific versions of certain libraries for certain operating systems; this makes them prone to compilation errors. Sometimes the code is not provided, and extra effort is needed to implement the instructions described in the publication. Even when the code is available, it is sometimes not ready to reproduce, and significant effort is required to make it work, when it works at all.
  • 19. The solution is a virtual machine The software challenges can be mitigated by using virtual machines or containers. Virtual machines and containers provide a base operating system that can include the proper configuration built-in. These approaches can run on top of the host operating system or on online infrastructure. The hardware challenges can be overcome by investing in sufficiently powerful on-site infrastructure or by using online on-demand infrastructure. Conventional research paper replication depends on multiple factors, as we have explored.
  • 20. MultiAffect over Google Colaboratory The MultiAffect framework uses Google Colaboratory to publish the Jupyter interactive notebook and to perform the computation in the attached virtual machine. Google Colaboratory is a free research tool that enables users with a Google account to host and run code on Google's infrastructure. Google Colaboratory offers users the ability to execute their code on CPUs, GPUs, and TPUs (an AI accelerator application-specific integrated circuit). At the time this work was published, Google Colaboratory offered a virtual machine with a Tesla K80 GPU, 12 GB of RAM, and 350 GB of storage. This platform provides enough resources to perform video action recognition.
  • 21.
  • 22. Ubuntu as Operating System This platform includes a Debian-based operating system, so the provided instructions are platform-specific. Local replication of our framework requires an Ubuntu 18.04 operating system in order to install all the libraries successfully. Our platform is agnostic to the Python version: all the code executed in the notebook is written in Python and can be executed with version 2 or 3 of the interpreter. Our framework is able to set up and run the experiment from the online platform, enabling users to deploy and execute the code in a free-of-charge environment and without special requirements on the client side.
  • 23. Fine tuning the setup process The definition of the setup was an incremental process of three main steps: (1) Initial setup: The first functional version; (2) Packing components: Uploading components in batches to cloud storage; and (3) Optimal setup: A version that loads faster.
  • 24. Initial setup In this step, the libraries were downloaded and compiled directly from the notebook by running shell commands from the notebook cells. Pre-requisites, missing dependencies, and additional packages were installed in the same notebook. The dataset and the pre-trained models were downloaded from their original sources to the virtual machine. The feature extraction, training, and evaluation code was inserted directly into the notebook in separate cells. The first version was tested until it successfully extracted the features, trained, and evaluated the models from the notebook. A backup of this notebook was documented and set as the initial version.
  • 25. Packing components Each individually compiled library was packaged into a zip file that contains the binary files as well as the configuration files. The pre-trained models that were individually downloaded from their original sources were packed together into a single file. Latency is reduced by downloading a single large file from a high-speed source rather than multiple large files from sources with different bandwidths. The outcome of this task is a collection of zip files that were uploaded to a Google Drive account. The files were shared with public access so that they can be downloaded from Google Colaboratory notebooks logged in with different accounts.
  • 26. Optimal setup After packaging and storing the files from the initial setup in the cloud, we started a branch of the initial setup that loads these files. The optimal setup notebook was a simplified version of the initial notebook: instead of having a long section documenting the setup process, it was replaced with a download pre-requisites section. The files were downloaded by using a Python tool called GDown that is already installed in Google Colaboratory. It is important to mention that the virtual machine attached to the Google Colaboratory notebooks already has an Ubuntu distribution with the most common machine learning tools and libraries installed. This optimal version is tailored to Google Colaboratory only.
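As a rough sketch of this download step (not the exact MultiAffect cell), a packed, pre-compiled library shared publicly on Google Drive could be fetched and unpacked with GDown like this; the file ID and archive name below are hypothetical placeholders:

import zipfile
import gdown

# Hypothetical Drive file ID of a zipped, pre-compiled tool (placeholder)
file_id = "DRIVE_FILE_ID_PLACEHOLDER"
gdown.download(f"https://drive.google.com/uc?id={file_id}", "prebuilt_tool.zip", quiet=False)

# Unpack the binaries and configuration files into the Colab virtual machine
with zipfile.ZipFile("prebuilt_tool.zip") as archive:
    archive.extractall("/content/tools")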
  • 27. Optimizing the loading time For each of the installed libraries, we measured the time it takes to install the prerequisites plus the compilation time. On average, the overall setup of each library was five times slower than downloading and extracting a previously compiled and zipped version of the library. The total setup time for the Google Colaboratory environment was reduced from 43 minutes to 6 minutes after implementing the pre-compiled tools strategy and by downloading the files from the same Google infrastructure.
  • 28. Feature extractors MultiAffect includes a feature extraction module as an independent component. Multimodal feature extraction is often a highly demanding task, as it requires certain pre-processing of the videos before features can be extracted. Some common pre-processing tasks are: separating the audio, extracting frames, identifying faces, cropping faces, removing the background, skeleton detection (pose), and emotion detection, among many other procedures. Our feature extraction methodology is based on the common ground found in submissions. Our feature extraction process aims to keep factors such as the person descriptors (i.e., gender, age, race), scale, position, background, and language invariant. Our approach considers ten features from five different modalities: face, body, audio, text, and emotions.
  • 29. Audio features OpenSMILE (1582 features): The audio is extracted from the videos and processed by OpenSMILE, which extracts audio features such as loudness, pitch, jitter, etc. It was tested on video-clip length (general) and 20 fragments (temporal).
  • 30. Text features Opinion Lexicon (6 features): Depends on the ratio of sentiment words (adjectives, adverbs, verbs, and nouns), which express positive or negative sentiments. Subjective Lexicon (4 features): Uses the subjectivity lexicon from MPQA (Multi-Perspective Question Answering) that models the sentiment by its type and intensity. Word vectors: GloVe and BERT embeddings.
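As an illustrative sketch of the embedding part (not the framework's exact extraction code), an utterance-level BERT vector can be obtained with the Hugging Face transformers library by keeping the [CLS] token representation:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Encode one utterance and keep its 768-dimensional [CLS] embedding
inputs = tokenizer("I really enjoyed this movie", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)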
  • 31. Face features OpenFace (709 features): Facial behavior analysis tool that provides accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. We get points that represent the face. VGG16 FC6 (4096 features): The faces are cropped (224×224×3), aligned, the background is zeroed out, and they are passed through a pretrained VGG16 to extract a 4096-dimensional feature vector from the FC6 layer.
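A minimal Keras sketch of this idea (assuming the keras.applications VGG16, where the layer corresponding to FC6 is named 'fc1'); the framework's own extraction code may differ:

import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model

# Truncate VGG16 at its first fully connected layer (FC6, named 'fc1' in Keras)
base = VGG16(weights="imagenet", include_top=True)
fc6_extractor = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

# One aligned 224x224x3 face crop with the background zeroed out (dummy array here)
face = np.zeros((1, 224, 224, 3), dtype="float32")
features = fc6_extractor.predict(preprocess_input(face))  # shape: (1, 4096)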
  • 32. Body Features OpenPose (BODY_25) (11 features): The normalized angles between the joints. I did not use the calculated features because they were 25x224x224. VGG16 FC6 Skeleton image (4096 features): I drew the skeleton (neck in the center) on a black background, fed it to a VGG16, and extracted a feature vector from the FC6 layer.
  • 33. Emotion features EmoPy (7 features): A deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER). Other (28 features): Four other models from different FER contest participants, 7 categories per model (35 features in total). 20 samples per video clip were predicted (temporal), and from these I computed their normalized sum (general).
  • 34. Model trainer The MultiAffect models use different deep learning models to recognize affect. Among them we find RNNs (Recurrent Neural Networks), CNNs (Convolutional Neural Networks), and simple DNNs (Deep Neural Networks) such as MLPs (Multilayer Perceptrons).
  • 35.
  • 36. Evaluator The MultiAffect framework is designed to perform classification and regression tasks. Depending on the performed task, the platform is adjusted to display meaningful evaluations. The classification task gives accuracy, F1-score, recall, precision, AUC and other metrics for the training, validation, and testing sets. In the case of a regression task, the framework computes the MSE (Mean Square Error) and CCC (Concordance Correlation Coefficient) that describes how well a new test or measurement reproduces a gold standard test.
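For reference, CCC follows directly from its standard definition; a minimal NumPy version (not necessarily the framework's exact implementation) is:

import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    # CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    return 2 * covariance / (var_true + var_pred + (mean_true - mean_pred) ** 2)

print(concordance_correlation_coefficient([0.1, 0.4, 0.8], [0.2, 0.5, 0.7]))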
  • 37. Plotting the results The results obtained from our reproducible framework for the classification task are two plots, one to visualize the accuracy while training and one for the training and testing loss; and a confusion matrix obtained while evaluating the model on the test data. On the other hand, for the regression tasks the results are displayed in a scatter plot that shows the correlation between the predicted and gold standard labels.
  • 38. Experimentation In order to test its generalizability, we performed experiments on two main tasks: affect recognition and video action recognition. The video action and affect recognition tasks are addressed through the training and testing of classification and regression models, respectively. One of the main goals of the proposed framework is to be able to perform both tasks by only configuring a new set of variables, without making any change to the code. Another goal was to deliver results comparable to existing work.
  • 39. Video Action Recognition for Automatic Video Editing
  • 40. All (Quadmodal): BodyTF+FaceTF+AudioG+EmoT | acc_val: 1.00 | acc_train: 1.00 | acc_test: 0.90 | f1_score: 1.00 | f1_test: 0.90 | loss: 0.01
  • 41. Confusion matrices of the Quadmodal model (Train / Validation / Test)
  • 44. Results (the scatter plot shows an almost 45-degree line between predictions and gold-standard labels)
  • 45. Let's switch approaches to Benchmarking You can use MultiAffect as a tool for any video categorization and regression task. You can try it out at this URL: http://bit.ly/multiaffect Now, if you want to compare which of the existing techniques works better for your problem, you will need a tool that benchmarks all the methods. This is why I adapted an existing Text Classification Benchmarking tool to be used as a tool in the cloud; you can find it here: http://bit.ly/ai-text-workshop
  • 46. Text Classification Benchmarking tool This is a Google Colaboratory notebook with instructions that includes these methods: ● Word ngram + LR (Logistic regression) ● Char ngram + LR ● (Word + Char ngram) + LR ● RNN no embedding ● RNN + GloVe embedding ● CNN (multi-channel) ● RNN + CNN ● Google BERT
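To illustrate the simplest entry in that list, a word n-gram plus logistic regression baseline can be built with scikit-learn roughly as follows (a toy sketch with made-up sentences; the benchmark notebook itself may differ in details):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product, works well", "terrible service, very slow",
         "excellent support team", "awful experience, would not recommend"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Word unigrams and bigrams weighted by TF-IDF, classified with logistic regression
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["the support was great"]))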
  • 48. It helped to improve fairness in reviews
  • 49. The last approach: independent ML/DL methods Sometimes you may already know the best algorithm for your requirements. For that case, I adapted >100 notebooks so they can be used as tools to train models in the cloud by only uploading your data. You can find them here: http://bit.ly/awesome-ai The process of adapting a notebook is: 1) open it in Colab from GitHub, 2) add the extra libraries, 3) download the data from Drive. Let's do a quick recap of all the ML/DL/RL methods to identify which method fits your problem best.
  • 50.
  • 53. When to use it? ● Simple regression problems ○ How much the rent should cost in a certain area ○ How much I should charge for a specific amount of work ● Problems where we want to define a rule that separates two similar categories, e.g., Premium or Basic pricing for customers given certain parameters (number of rooms vs. number of cars) https://colab.research.google.com/drive/1-dTb2vCiZHa-DnyqlVFGOnMSNjvkIOTP https://colab.research.google.com/drive/1Z20iJspQm2Y_wLI51wgE6nXGOSu1kG4W https://colab.research.google.com/drive/1-yk3m6p3ylNLtTaEf3nya6exO_wv8f_L
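A minimal scikit-learn sketch of the rent example (the numbers are made up, purely for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: apartment size in square meters vs. monthly rent
area = np.array([[30], [45], [60], [80], [100]])
rent = np.array([400, 550, 700, 900, 1100])

model = LinearRegression().fit(area, rent)
print(model.predict([[70]]))  # estimated rent for a 70 m2 apartment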
  • 54. XOR problem - Not linear
  • 56. Code
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import pydotplus

# Train a decision tree on the Iris dataset
iris = load_iris()
clf = DecisionTreeClassifier().fit(iris.data, iris.target)

# Export the learned tree and render it as an image
dot_data = export_graphviz(clf, out_file=None, filled=True, rounded=True,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names)  # setosa, versicolor, virginica
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('iris_tree.png')
  • 58. When to use it? ● When we need to know what decisions the machine is taking ● When we need to explain to others how the features are evaluated ● When there are not many features https://colab.research.google.com/drive/1Fc8qs1fwdcpoZ_-tTj32OBl-tCGlAe5c
  • 60. Code
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Split the Iris dataset into train and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

# Train a random forest with 100 trees and evaluate it on the held-out set
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('accuracy is', accuracy_score(y_pred, y_test))
  • 61. When to use it? ● When we want to know alternative ways of evaluating a problem. ● When we want to manually discard flows that are biased ● When we want to manage ensembles from one single method. https://colab.research.google.com/drive/1WMOOtaHAMZPi-enVM8RRM_CC-grEtm9P https://colab.research.google.com/drive/1jDdWp-CJybMJDX17jBmG5qoPPg9qj1sm https://colab.research.google.com/drive/1-uDIRl1aYqmJX59rAJumHY1T20QqBJiQ
  • 63. When to use it? ● When we want to know the probabilities of the different cases. ● When we need a probabilistic model. ● When we need an easy way to prove it on paper https://colab.research.google.com/drive/1qOCllKsBBrLeUnP-XAXHefXCtbuBWl69 https://colab.research.google.com/drive/11FiWH00vzygQp1T_pD0MCfMFg6FYsd01
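A minimal Gaussian Naive Bayes sketch with scikit-learn; the class probabilities mentioned above come directly from predict_proba:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2)

clf = GaussianNB().fit(X_train, y_train)
print(clf.predict_proba(X_test[:1]))  # probability of each class for the first test sample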
  • 65. When to use it? ● When intuition says that the problem can be solved by finding the most similar option. ● When the information is not exhaustive. ● When we want to justify the algorithm's decision with common human reasoning. https://colab.research.google.com/drive/1GeUVjDW74SxFxz2Nh3rqOlte-S2dblYv https://colab.research.google.com/drive/1X12qds10ZfN7QCrmpRR2OXxa--PTyS5e
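A minimal k-nearest-neighbors sketch with scikit-learn; the prediction can be justified by inspecting which training samples were the closest neighbors:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
clf = KNeighborsClassifier(n_neighbors=3).fit(iris.data, iris.target)

sample = iris.data[:1]
print(clf.predict(sample))                            # predicted class
print(clf.kneighbors(sample, return_distance=False))  # indices of the 3 nearest neighbors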
  • 67. When to use it? ● When we don't know how to understand the data ● When we want to optimize resources by grouping related elements. ● When we want the computer to create the labels for us. https://colab.research.google.com/drive/1RL3oZm6LgnEChI1aOQZoMn1WDk-DQJiV https://colab.research.google.com/drive/1yvy1scktjcDyydG2fZz2OJfRFAer0SEO https://colab.research.google.com/drive/1CzEf6giBXPSQI5UJOhZrZfYKAJcH68wg
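A minimal k-means sketch with scikit-learn, letting the algorithm create the group labels for us:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

# Group the samples into 3 clusters without using the true labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(iris.data)
print(kmeans.labels_[:10])      # cluster assigned to each of the first samples
print(kmeans.cluster_centers_)  # centroid of each cluster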
  • 69. When to use it? ● It was the most effective technique before neural networks, and it can achieve excellent results with less processing. ● Mathematically speaking, it is based on very strong principles: it creates complex multidimensional hyperplanes that separate the classes precisely. ● It is not a white-box technique, but it may be the best option for problems where we want to get the best of the machine learning approach without dealing with neural networks. https://colab.research.google.com/drive/13PRk-GKeSivp4R-FIdjmYBQS7xWUco9C
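A minimal support vector machine sketch with scikit-learn, using an RBF kernel to draw a non-linear decision boundary:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out set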
  • 71. When to use it? ● When we want to optimize a regression ● When we want to binarize the output ● As a preliminary analysis before implementing neural networks https://colab.research.google.com/drive/1PWmvsZRaj3JQ8rtj6vlwhJhJpOrIAamT https://colab.research.google.com/drive/1p8rcrSQB-thLSakUmCHjSbqI6vd-NkCq https://colab.research.google.com/drive/1jhrAtmPgg6Uu0WzMzV-VakWlncQAvk-D
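A minimal logistic regression sketch with scikit-learn, binarizing the output with the default 0.5 probability threshold (the data is made up, purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied vs. pass (1) / fail (0)
hours = np.array([[1], [2], [3], [4], [5], [6]])
passed = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(hours, passed)
print(clf.predict_proba([[3.5]]))  # probability of fail / pass
print(clf.predict([[3.5]]))        # binarized output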
  • 73. When to use it? ● When we have very few features and there are no extra details that can be extracted from hidden layers. ● These are in fact neural networks, and we do not always need to use them for deep learning; they can be used for machine learning when we benchmark against other machine learning techniques. ● When we want the power of neural networks but don't have much computational power. https://colab.research.google.com/drive/10PvUh-8ZsVqQADqXSmRIDHGiCH9iypyO
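A minimal shallow neural network sketch with scikit-learn's MLPClassifier, which is often enough when computational power is limited:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2)

# One small hidden layer is usually sufficient for low-dimensional tabular data
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))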
  • 74. ML & DL frameworks
  • 75. It’s time for Deep Learning
  • 77. Artificial Neural Networks = Multi-Layer Perceptron
  • 79. When to use it? ● Classifiers where common machine learning algorithms perform poorly. ● Models with many features. ● Multi-class projects. https://colab.research.google.com/drive/1GAYf5yMNBkVrag0z2Q4MPSwuqfRN1Wz https://colab.research.google.com/drive/12YBDQFYXN8VruxKTfzDpbPsYFAEQceQP https://colab.research.google.com/drive/1pyRqGmMG4-Mj8Wis5XrQ_a4dUJvYln1 https://colab.research.google.com/drive/1wHjugM56k0ay5QCmRVMBfAMF96EY7A5k https://colab.research.google.com/drive/1Ly0BtKBphUdeqMQBO8Xjweku62Vq3UAX
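A minimal Keras sketch of a fully connected multi-class classifier (the shapes, layer sizes, and random data are illustrative, not taken from the linked notebooks):

import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Hypothetical data: 200 samples, 50 features, 4 classes
X = np.random.rand(200, 50)
y = np.random.randint(0, 4, size=200)

model = Sequential([
    Dense(128, activation="relu", input_shape=(50,)),
    Dense(64, activation="relu"),
    Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)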
  • 81. MNIST
  • 82.
  • 84. When to use it? ● When we want to process images ● When we want to process videos ● When we have high-dimensional data https://colab.research.google.com/drive/1jN8oswBOds4XuRbnQMxxDXDssmDD_rD9 https://colab.research.google.com/drive/1iEYJs75hat_URxshmCBMGzHQo5VgdRvN https://colab.research.google.com/drive/1YHKZgpJuriGYjEzFDNGz2Hf0widu-exx https://colab.research.google.com/drive/1gi2_Or0rDz5Gg9FkGJjFDxgeiwt5-lXm https://colab.research.google.com/drive/1QcnY-LOZU9c7Sp2DsDVeYxLNBx87VNhn
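A minimal Keras CNN sketch for 28x28 grayscale images (MNIST); the linked notebooks may use different architectures:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1) / 255.0
X_test = X_test.reshape(-1, 28, 28, 1) / 255.0

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=1, validation_data=(X_test, y_test))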
  • 87. When to use it? ● When sequences are provided ○ Text sequences ○ Image sequences (videos) ○ Time series ● When we need to provide an ordered output https://colab.research.google.com/drive/1twc5dBjgFLFuv8p-gPfnrscTPcBlkx5q https://colab.research.google.com/drive/10-ou-Za75bFgwArvgP3QfNJ4cWuwY-eF https://colab.research.google.com/drive/1PEOqq8mBcmc-FMj8lpbVF93cQI4RLgVJ https://colab.research.google.com/drive/1XUEAFxxKVmdgC7oPOzVpGInXfUeTcgIQ https://colab.research.google.com/drive/1tfDDriSDUh_J9OHwjt-NzT8xRiEDQF7x
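A minimal Keras LSTM sketch for sequence classification (IMDB sentiment); again, the linked notebooks may differ:

from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Keep the 10,000 most frequent words and pad every review to 200 tokens
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=10000)
X_train = pad_sequences(X_train, maxlen=200)
X_test = pad_sequences(X_test, maxlen=200)

model = Sequential([
    Embedding(10000, 32),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=1, batch_size=128, validation_data=(X_test, y_test))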
  • 91. When to use it? ● When we want to benchmark models ● When different models are stronger evaluated together than individually ● When the individual processing is not exhaustive https://colab.research.google.com/drive/1Kg_nHBmUGQ1zepU-wZlDwMyM-YrlMTUX https://colab.research.google.com/drive/1U86EVD-6ulYMxTzDX8-m6nEptYq0yaej
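A minimal ensemble sketch with scikit-learn's VotingClassifier, combining three models that may be stronger together than individually:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

# Soft voting averages the predicted class probabilities of the three models
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=100)),
                ("svc", SVC(probability=True))],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))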
  • 93. Frameworks ● TPOT ● MLBox ● H2O ● Google AutoML
  • 94. When to use it? ● On every new model ● When we have enough time to train multiple models ● When we don't know which hyperparameters are better. https://colab.research.google.com/drive/1gTBDfbJy9SsgbUPRhL_mrujw6HC2BjxN https://colab.research.google.com/drive/17Ii6Nw89gZT8l_XrvSQhNWaa_VfcdLBn https://colab.research.google.com/drive/1xe4G_dqsPMq0n3w_Mqlm-39j5TMUqHJR
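As a hedged sketch of the AutoML idea using TPOT (one of the frameworks listed above); the generations, population size, and dataset are illustrative only:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2)

# Let TPOT search over pipelines and hyperparameters automatically
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # writes the winning pipeline as plain Python code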
  • 96. Frameworks ● OpenAI Gym ● Google Dopamine ● RLLib ● Keras-RL ● Tensorforce ● Facebook Horizon
  • 97. When to use it? ● When a robot explores a place and needs to learn from the environment. ● When we can try as many times as we want in a simulator. ● When we want to find the optimal path https://colab.research.google.com/drive/1fgv5UWhHR7xSwZfwwltF4OFDYqtWdlQD https://colab.research.google.com/drive/14aYmND2LKtaPTW3JWS7scKGwU9baxHeE https://colab.research.google.com/drive/16Scl43smvcXGZFEGITs15_SN_7-EidZd
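A minimal OpenAI Gym sketch (a random agent on CartPole) just to show the simulator loop; a real agent would replace the random action with a learned policy. This follows the classic Gym API, where step() returns four values; newer versions changed that signature:

import gym

env = gym.make("CartPole-v1")
observation = env.reset()

# Run one episode with random actions (a learned policy would go here)
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    total_reward += reward

env.close()
print("episode reward:", total_reward)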
  • 98. Techniques to improve the learning process
  • 99. Principal Component Analysis (PCA) Feature selection
  • 100. When to use it? ● When we have too many features and we do not know which of them are useful. ● When we want to reduce the dimensionality of our model. ● When we want to plot our decision boundaries. https://colab.research.google.com/drive/1CO6BACds6J8hGPYlEU2INnSTpT0EmS74 https://colab.research.google.com/drive/1VU2SO3IfklPkK1EPMnwiO7trJslt79OZ
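A minimal PCA sketch with scikit-learn, reducing the Iris data to two components so the decision boundaries can be plotted:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2)
X_2d = pca.fit_transform(iris.data)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # variance kept by each component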
  • 102. When to use it? ● When we have limited data ● When we want to help our model to generalize more ● When our unseen data comes in very different formats. https://colab.research.google.com/drive/1ANIc7tXrggPT2I9JzpBlZQ3BBhCpbJUJ https://colab.research.google.com/drive/1cQRVdiDc9xraHZYLu3VrXxX4FKXoaS8U https://colab.research.google.com/drive/1O5far2FC4GlAc9pkLPZqsjKreCpI4S_-
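A minimal Keras image augmentation sketch using ImageDataGenerator; the specific transformations and the random batch are illustrative:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations, shifts, and flips applied on the fly while training
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# One hypothetical batch of 32 RGB images of size 64x64
images = np.random.rand(32, 64, 64, 3)
augmented = next(datagen.flow(images, batch_size=32))
print(augmented.shape)  # (32, 64, 64, 3)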
  • 104. Discriminative: Predicts from Data Generative: Generates from data distribution
  • 105. Generative models ● Autoencoders ● Adversarial Networks ● Sequence Models ● Transformers
  • 108. When to use it? ● When we want to compress data. ● When we need to change one type of input to other type of output. ● When we don’t need much variability in the generated data. https://colab.research.google.com/drive/1QxXqnhyqIZrrGtor2tVa4jY63adS4yc0
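A minimal Keras autoencoder sketch that compresses 784-dimensional inputs into a 32-dimensional code and reconstructs them; the sizes are illustrative:

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

# Encoder: 784 -> 32, Decoder: 32 -> 784
inputs = Input(shape=(784,))
encoded = Dense(32, activation="relu")(inputs)
decoded = Dense(784, activation="sigmoid")(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(X, X, epochs=10)  # trained to reconstruct its own input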
  • 110. When to use it? ● When we need to transfer a style ● When we need more variability in the generated output ● When we need to keep context in the generation. https://colab.research.google.com/drive/1YOYH78YQAgPBRIpUPhh_e0cFLNu-BPVo https://colab.research.google.com/drive/1POZpWN-2M5hy3D2ATWzJs2LC5sk7hpts https://colab.research.google.com/drive/1aKywiJ5p0eCwDIIWKe8Q205rcKqmR_VX https://colab.research.google.com/drive/1QxXqnhyqIZrrGtor2tVa4jY63adS4yc0 https://colab.research.google.com/drive/1Lw7BqKABvtiSyUHg9DeM5f90_WFGB7uz
  • 112. When to use it? ● When we generate text ● When we generate the next element of a series ● When the order in the generated output matters. https://colab.research.google.com/drive/1ZB-oueLvBgltXshb1lDV2EpqbqV6FC5x
  • 113.
  • 114. When to use it? ● When context is an essential part of the generated output ● When we need to keep consistency in the frequency space. ● When we have enough computational resources. https://colab.research.google.com/drive/1jWaRkii6xLkxxAPyfudeGJsHf_jokqXG
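A minimal text-generation sketch with the Hugging Face transformers pipeline, using GPT-2 as an example of a transformer-based generator (not necessarily the model used in the linked notebook):

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Reproducibility in AI matters because", max_length=30, num_return_sequences=1))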
  • 115. Put notebooks into production It may seem that running code from a notebook in the cloud is just for testing purposes, but you can actually run it as a service from a local Docker container. I created a script that automatically prepares a container and executes it whenever you need it, as a command-line application. Example: docker run psykohack/google-colab https://colab.research.google.com/drive/133DIr7lvkuaNU_X2JN5id3XmtSXQspy9 Code: https://github.com/toxtli/google-colab-docker
  • 116. Resources More and more AI research is now being distributed in reproducible, redistributable formats. Some valuable resources can be found at: https://www.paperswithcode.com/ https://www.kaggle.com/
  • 117. Conclusions ● Nowadays we can reproduce state-of-the-art AI algorithms from a web-based platform. ● Complex tasks can be executed in notebooks structured as frameworks ● Our main job is to prepare the data to feed the algorithm that best fits our needs. ● AI prototyping is drastically accelerated by using these technologies. ● Since these technologies sit between pure-code and pure-tool approaches, they give us the flexibility to iterate faster.

Editor's Notes

  1. This is how neural networks process the images to predict an output