Quick Tour of Machine Learning
(機器學習速遊)  
Hsuan-Tien Lin (林軒田)
Chief Data Scientist @ Appier (沛星互動科技)
資料科學愛好者年會 (Data Science Enthusiast Annual Conference) series event, 2017/03/11
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 0/128
Learning from Data
Disclaimer
• just a super-condensed and shuffled version of
• my co-authored textbook “Learning from Data: A Short Course”
• my two NTU-Coursera Mandarin-teaching ML Massive Open Online Courses:
• Machine Learning Foundations
• Machine Learning Techniques
http://www.csie.ntu.edu.tw/~htlin/mooc/
—impossible to be complete, with most math details removed
• decorated with my first-hand experience at Appier, a rapidly-growing AI
company
• audience interaction is important
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 1/128
Learning from Data
Roadmap
Learning from Data
What is Machine Learning
Components of Machine Learning
Types of Machine Learning
Step-by-step Machine Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 2/128
Learning from Data What is Machine Learning
Learning from Data ::
What is Machine Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 3/128
Learning from Data What is Machine Learning
From Learning to Machine Learning
learning: acquiring skill
with experience accumulated from observations
observations → learning → skill
machine learning: acquiring skill
with experience accumulated/computed from data
data → ML → skill
What is skill?
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 4/128
Learning from Data What is Machine Learning
A More Concrete Definition
skill
⇔ improve some performance measure (e.g. prediction accuracy)
machine learning: improving some performance measure
with experience computed from data
data → ML → improved performance measure
An Application in Computational Finance
stock data → ML → more investment gain
Why use machine learning?
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 5/128
Learning from Data What is Machine Learning
Yet Another Application: Tree Recognition
• ‘define’ trees and hand-program: difficult
• learn from data (observations) and
recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be
easier to build than hand-programmed
system
ML: an alternative route to
build complicated systems
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 6/128
Learning from Data What is Machine Learning
The Machine Learning Route
ML: an alternative route to build complicated systems
Some Use Scenarios
• when humans cannot program the system manually
—navigating on Mars
• when humans cannot ‘define the solution’ easily
—speech/visual recognition
• when needing large-scale decisions that humans cannot do
—cross-device unification Appier
• when needing rapid decisions that humans cannot do
—real-time bidding advertisement Appier
Give a computer a fish, you feed it for a day;
teach it how to fish, you feed it for a lifetime. :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 7/128
Learning from Data What is Machine Learning
Machine Learning and Artificial Intelligence
Machine Learning
use data to compute something
that improves performance
Artificial Intelligence
compute something
that shows intelligent behavior
• improving performance is something that shows intelligent
behavior
—ML can realize AI, among other routes
• e.g. chess playing
• traditional AI: game tree
• ML for AI: ‘learning from board data’
ML is one possible
and arguably most popular route to realize AI
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 8/128
Learning from Data Components of Machine Learning
Learning from Data ::
Components of Machine Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 9/128
Learning from Data Components of Machine Learning
Components of Learning:
Metaphor Using Credit Approval
Applicant Information
age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000
what to learn? (for improving performance):
‘approve credit card good for bank?’
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 10/128
Learning from Data Components of Machine Learning
Formalize the Learning Problem
Basic Notations
• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown underlying pattern to be learned ⇔ target function:
f : X → Y (ideal credit approval formula)
• data ⇔ training examples: D = {(x1, y1), (x2, y2), · · · , (xN, yN)}
(historical records in bank)
• hypothesis ⇔ skill with hopefully good performance:
g : X → Y (‘learned’ formula to be used), i.e. approve if
• h1: annual salary > NTD 800,000
• h2: debt > NTD 100,000 (really?)
• h3: year in job ≤ 2 (really?)
—all candidate formula being considered: hypothesis set H
—procedure to learn best formula: algorithm A
{(xn, yn)} from f → ML (A, H) → g
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 11/128
Learning from Data Components of Machine Learning
Practical Definition of Machine Learning
unknown target function
f : X → Y
(ideal credit approval formula)
training examples
D: (x1, y1), · · · , (xN , yN )
(historical records in bank)
learning
algorithm
A
final hypothesis
g ≈ f
(‘learned’ formula to be used)
hypothesis set
H
(set of candidate formula)
machine learning (A and H):
use data to compute hypothesis g
that approximates target f
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 12/128
Learning from Data Components of Machine Learning
Key Essence of Machine Learning
machine learning:
use data to compute hypothesis g that approximates target f
data → ML → improved performance measure
1 exists some ‘underlying pattern’ to be learned
—so ‘performance measure’ can be improved
2 but no programmable (easy) definition
—so ‘ML’ is needed
3 somehow there is data about the pattern
—so ML has some ‘inputs’ to learn from
key essence: help decide whether to use ML
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 13/128
Learning from Data Types of Machine Learning
Learning from Data ::
Types of Machine Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 14/128
Learning from Data Types of Machine Learning
Visualizing Credit Card Problem
• customer features x: points on the plane (or points in R^d)
• labels y: ◦ (+1), × (-1)
called binary classification
• hypothesis h: lines here, but possibly other curves
• different curve classifies customers differently
binary classification algorithm:
find good decision boundary curve g
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 15/128
Learning from Data Types of Machine Learning
More Binary Classification Problems
• credit approve/disapprove
• email spam/non-spam
• patient sick/not sick
• ad profitable/not profitable
core and important problem with
many tools as building block of other tools
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 16/128
Learning from Data Types of Machine Learning
Binary Classification for Education
data → ML → skill
• data: students’ records on quizzes on a Math tutoring system
• skill: predict whether a student can give a correct answer to
another quiz question
A Possible ML Solution
answer correctly ≈ recent strength of student > difficulty of question
• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty
automatically
key part of the world-champion system from
National Taiwan Univ. in KDDCup 2010
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 17/128
Learning from Data Types of Machine Learning
Multiclass Classification: Coin Recognition Problem
(figure: US coins plotted by size and mass)
• classify US coins (1c, 5c, 10c, 25c)
by (size, mass)
• Y = {1c, 5c, 10c, 25c}, or
Y = {1, 2, · · · , K} (abstractly)
• binary classification: special case
with K = 2
Other Multiclass Classification Problems
• written digits ⇒ 0, 1, · · · , 9
• pictures ⇒ apple, orange, strawberry
• emails ⇒ spam, primary, social, promotion, update (Google)
many applications in practice,
especially for ‘recognition’
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 18/128
Learning from Data Types of Machine Learning
Regression: Patient Recovery Prediction Problem
• binary classification: patient features ⇒ sick or not
• multiclass classification: patient features ⇒ which type of cancer
• regression: patient features ⇒ how many days before recovery
• Y = R or Y = [lower, upper] ⊂ R (bounded regression)
—deeply studied in statistics
Other Regression Problems
• company data ⇒ stock price
• climate data ⇒ temperature
also core and important with many ‘statistical’
tools as building block of other tools
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 19/128
Learning from Data Types of Machine Learning
Regression for Recommender System (1/2)
data → ML → skill
• data: how many users have rated some movies
• skill: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup
2011
• 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
How can machines learn our preferences?
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 20/128
Learning from Data Types of Machine Learning
Regression for Recommender System (2/2)
(figure: match movie factors—comedy content, action content, blockbuster?, Tom Cruise in it?—with viewer factors—likes Tom Cruise? prefers blockbusters? likes action? likes comedy?—and add contributions from each factor to get the predicted rating)
A Possible ML Solution
• pattern:
rating ← viewer/movie factors
• learning:
known rating
→ learned factors
→ unknown rating prediction
key part of the world-champion (again!)
system from National Taiwan Univ.
in KDDCup 2011
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 21/128
Learning from Data Types of Machine Learning
Supervised versus Unsupervised
coin recognition with yn: supervised multiclass classification
(figures: labeled coins by size and mass versus the same points unlabeled)
coin recognition without yn: unsupervised multiclass classification ⇐⇒ ‘clustering’
Other Clustering Problems
• articles ⇒ topics
• consumer profiles ⇒ consumer groups
clustering: a challenging but useful problem
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 22/128
Learning from Data Types of Machine Learning
Semi-supervised: Coin Recognition with Some yn
(figures: coin data fully labeled ⇒ supervised; only a few labeled ⇒ semi-supervised; unlabeled ⇒ unsupervised (clustering))
Other Semi-supervised Learning Problems
• face images with a few labeled ⇒ face identifier (Facebook)
• medicine data with a few labeled ⇒ medicine effect predictor
semi-supervised learning: leverage
unlabeled data to avoid ‘expensive’ labeling
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 23/128
Learning from Data Types of Machine Learning
Reinforcement Learning
a ‘very different’ but natural way of learning
Teach Your Dog: Say ‘Sit Down’
The dog pees on the ground. BAD DOG. THAT’S A VERY WRONG ACTION.
The dog sits down. Good Dog. Let me give you some cookies.
• cannot easily show the dog that yn = sit when xn = ‘sit down’
• but can ‘punish’ to say ỹn = pee is wrong, or ‘reward’ to say ỹn = sit is good
Other Reinforcement Learning Problems Using (x, ỹ, goodness)
• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ blackjack agent
reinforcement: learn with ‘partial/implicit information’ (often sequentially)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 24/128
Learning from Data Step-by-step Machine Learning
Learning from Data ::
Step-by-step Machine Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 25/128
Learning from Data Step-by-step Machine Learning
Step-by-step Machine Learning
unknown target function
f : X → Y
(ideal credit approval formula)
training examples
D: (x1, y1), · · · , (xN , yN )
(historical records in bank)
learning
algorithm
A
final hypothesis
g ≈ f
(‘learned’ formula to be used)
hypothesis set
H
(set of candidate formula)
1 choose error measure: how g(x) ≈ f(x)
2 decide hypothesis set H
3 optimize error and more on D as A
4 pray for generalization:
whether g(x) ≈ f(x) for unseen x
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 26/128
Learning from Data Step-by-step Machine Learning
Choose Error Measure
g ≈ f can often be evaluated by an averaged err(g(x), f(x)), called a pointwise error measure
• in-sample (within data): Ein(g) = (1/N) Σ_{n=1}^{N} err(g(xn), f(xn) = yn)
• out-of-sample (future data): Eout(g) = E_{future x} err(g(x), f(x))
will start from 0/1 error err(ỹ, y) = [[ỹ ≠ y]] for classification
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 27/128
Learning from Data Step-by-step Machine Learning
Choose Hypothesis Set (for Credit Approval)
age 23 years
annual salary NTD 1,000,000
year in job 0.5 year
current debt 200,000
• For x = (x1, x2, · · · , xd) ‘features of customer’, compute a weighted ‘score’ and
approve credit if Σ_{i=1}^{d} wi xi > threshold
deny credit if Σ_{i=1}^{d} wi xi < threshold
• Y: {+1 (good), −1 (bad)}, 0 ignored—linear formulas h ∈ H are
h(x) = sign( Σ_{i=1}^{d} wi xi − threshold )
linear (binary) classifier, called ‘perceptron’ historically
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 28/128
Learning from Data Step-by-step Machine Learning
Optimize Error (and More) on Data
H = all possible perceptrons, g =?
• want: g ≈ f (hard when f unknown)
• almost necessary: g ≈ f on D, ideally
g(xn) = f(xn) = yn
• difficult: H is of infinite size
• idea: start from some g0, and ‘correct’ its
mistakes on D
let’s visualize without math
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 29/128
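To make the ‘start from some g0 and correct its mistakes’ idea concrete, here is a minimal NumPy sketch of the perceptron learning algorithm (PLA); the toy data and variable names are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def pla(X, y, max_iter=1000):
    """Perceptron learning algorithm: start from w = 0 and
    repeatedly correct one misclassified example."""
    N, d = X.shape
    X = np.column_stack([np.ones(N), X])  # prepend x0 = 1 for the threshold
    w = np.zeros(d + 1)                   # initial hypothesis g0
    for _ in range(max_iter):
        mistakes = np.where(np.sign(X @ w) != y)[0]
        if len(mistakes) == 0:            # no mistakes left: done
            break
        n = mistakes[0]                   # correct one mistake:
        w += y[n] * X[n]                  # w(t+1) <- w(t) + y_n * x_n
    return w

# toy linearly separable data (illustrative)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = np.sign(X[:, 0] + X[:, 1])           # true boundary: x1 + x2 = 0
print(pla(X, y))
```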
Learning from Data Step-by-step Machine Learning
Seeing is Believing
(figures: starting from an initial line, PLA updates w(t) → w(t+1) on one mistake at a time—x1, x9, x14, x3, x9, x14, . . .—and finally returns wPLA, which separates the data)
worked like a charm with < 20 lines!!
—A fault confessed is half redressed. :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
Learning from Data Step-by-step Machine Learning
Pray for Generalization
(pictures from Google Image Search)
(figure: a parent shows (picture, label) pairs, and the kid’s brain—with its own alternatives—forms a good hypothesis; analogously, a target f(x) + noise generates examples (picture xn, label yn), and a learning algorithm with hypothesis set H forms a good hypothesis g(x) ≈ f(x))
challenge: see only {(xn, yn)} without knowing f nor noise
=⇒ generalize to unseen (x, y) w.r.t. f(x)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 31/128
Learning from Data Step-by-step Machine Learning
Generalization Is Non-trivial
Bob impresses Alice by memorizing every given (movie, rank);
but too nervous about a new movie and guesses randomly
(pictures from Google Image Search)
memorize ≠ generalize
perfect from Bob’s view ≠ good for Alice
perfect during training ≠ good when testing
take-home message: if H is simple (like lines),
generalization is usually possible
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 32/128
Learning from Data Step-by-step Machine Learning
Mini-Summary
Learning from Data
What is Machine Learning
use data to approximate target
Components of Machine Learning
algorithm A takes data D and hypotheses H to get hypothesis g
Types of Machine Learning
variety of problems almost everywhere
Step-by-step Machine Learning
error, hypotheses, optimize, generalize
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 33/128
Fundamental Machine Learning Models
Roadmap
Fundamental Machine Learning Models
Linear Regression
Logistic Regression
Nonlinear Transform
Decision Tree
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 34/128
Fundamental Machine Learning Models Linear Regression
Fundamental Machine Learning Models ::
Linear Regression
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 35/128
Fundamental Machine Learning Models Linear Regression
Credit Limit Problem
age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000
credit limit? 100,000
unknown target function
f : X → Y
(ideal credit limit formula)
training examples
D: (x1, y1), · · · , (xN , yN )
(historical records in bank)
learning
algorithm
A
final hypothesis
g ≈ f
(‘learned’ formula to be used)
hypothesis set
H
(set of candidate formula)
Y = R: regression
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 36/128
Fundamental Machine Learning Models Linear Regression
Linear Regression Hypothesis
age 23 years
annual salary NTD 1,000,000
year in job 0.5 year
current debt 200,000
• For x = (x0, x1, x2, · · · , xd) ‘features of customer’, approximate the desired credit limit with a weighted sum: y ≈ Σ_{i=0}^{d} wi xi
• linear regression hypothesis: h(x) = wT x
h(x): like perceptron, but without the sign
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 37/128
Fundamental Machine Learning Models Linear Regression
Illustration of Linear Regression
(figures: for x = (x) ∈ R, a regression line in the x–y plane; for x = (x1, x2) ∈ R², a regression hyperplane)
linear regression: find lines/hyperplanes with small residuals
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 38/128
Fundamental Machine Learning Models Linear Regression
The Error Measure
popular/historical error measure: squared error err(ŷ, y) = (ŷ − y)²
• in-sample: Ein(hw) = (1/N) Σ_{n=1}^{N} (h(xn) − yn)², with h(xn) = wT xn
• out-of-sample: Eout(w) = E_{(x,y)∼P} (wT x − y)²
next: how to minimize Ein(w)?
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 39/128
Fundamental Machine Learning Models Linear Regression
Minimize Ein
min_w Ein(w) = (1/N) Σ_{n=1}^{N} (wT xn − yn)²
(figure: Ein(w) is a convex ‘bowl’ over w)
• Ein(w): continuous, differentiable, convex
• necessary condition of ‘best’ w:
∇Ein(w) ≡ ( ∂Ein/∂w0 (w), ∂Ein/∂w1 (w), . . . , ∂Ein/∂wd (w) ) = ( 0, 0, . . . , 0 )
—not possible to ‘roll down’ any further
task: find wLIN such that ∇Ein(wLIN) = 0
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 40/128
Fundamental Machine Learning Models Linear Regression
Linear Regression Algorithm
1 from D, construct input matrix X and output vector y by
X = [ xT1 ; xT2 ; · · · ; xTN ] (N × (d+1)),  y = [ y1 ; y2 ; · · · ; yN ] (N × 1)
2 calculate pseudo-inverse X† ((d+1) × N)
3 return wLIN = X†y ((d+1) × 1)
simple and efficient with a good † routine
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 41/128
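As a concrete companion to the three steps above, here is a minimal NumPy sketch (the toy data is an illustrative assumption); np.linalg.pinv plays the role of the ‘good † routine’.

```python
import numpy as np

def linear_regression(X, y):
    """Return w_LIN = X† y, the squared-error-optimal weights."""
    N = X.shape[0]
    X = np.column_stack([np.ones(N), X])  # step 1: build X with x0 = 1
    return np.linalg.pinv(X) @ y          # steps 2+3: pseudo-inverse times y

# toy data: y = 3 + 2x + small noise (illustrative)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3 + 2 * x + rng.normal(scale=0.1, size=100)
print(linear_regression(x.reshape(-1, 1), y))  # approximately [3, 2]
```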
Fundamental Machine Learning Models Linear Regression
Is Linear Regression a ‘Learning Algorithm’?
wLIN = X†y
No!
• analytic (closed-form)
solution, ‘instantaneous’
• not improving Ein nor
Eout iteratively
Yes!
• good Ein?
yes, optimal!
• good Eout?
yes, ‘simple’ like perceptrons
• improving iteratively?
somewhat, within an iterative
pseudo-inverse routine
if Eout(wLIN) is good, learning ‘happened’!
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 42/128
Fundamental Machine Learning Models Logistic Regression
Fundamental Machine Learning Models ::
Logistic Regression
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 43/128
Fundamental Machine Learning Models Logistic Regression
Heart Attack Prediction Problem (1/2)
age 40 years
gender male
blood pressure 130/85
cholesterol level 240
weight 70
heart disease? yes
unknown target
distribution P(y|x)
containing f(x) + noise
training examples
D: (x1, y1), · · · , (xN , yN )
learning
algorithm
A
final hypothesis
g ≈ f
hypothesis set
H
error measure err
binary classification: ideal f(x) = sign( P(+1|x) − 1/2 ) ∈ {−1, +1}, because of classification err
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 44/128
Fundamental Machine Learning Models Logistic Regression
Heart Attack Prediction Problem (2/2)
age 40 years
gender male
blood pressure 130/85
cholesterol level 240
weight 70
heart attack? 80% risk
unknown target
distribution P(y|x)
containing f(x) + noise
training examples
D: (x1, y1), · · · , (xN , yN )
learning
algorithm
A
final hypothesis
g ≈ f
hypothesis set
H
error measure err
‘soft’ binary classification:
f(x) = P(+1|x) ∈ [0, 1]
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 45/128
Fundamental Machine Learning Models Logistic Regression
Soft Binary Classification
target function f(x) = P(+1|x) ∈ [0, 1]
ideal (noiseless) data
x1, y1 = 0.9 = P(+1|x1)
x2, y2 = 0.2 = P(+1|x2)
...
xN, yN = 0.6 = P(+1|xN)
actual (noisy) data
x1, y1 = ◦ ∼ P(y|x1)
x2, y2 = × ∼ P(y|x2)
...
xN, yN = × ∼ P(y|xN)
same data as hard binary classification,
different target function
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 46/128
Fundamental Machine Learning Models Logistic Regression
Logistic Hypothesis
age 40 years
gender male
blood pressure 130/85
cholesterol level 240
• For x = (x0, x1, x2, · · · , xd) ‘features of patient’, calculate a weighted ‘risk score’: s = Σ_{i=0}^{d} wi xi
• convert the score to an estimated probability by the logistic function θ(s) (figure: S-shaped curve rising from 0 to 1)
logistic hypothesis: h(x) = θ(wT x) = 1 / (1 + exp(−wT x))
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 47/128
Fundamental Machine Learning Models Logistic Regression
Minimizing Ein(w)
a popular error: Ein(w) = (1/N) Σ_{n=1}^{N} ln( 1 + exp(−yn wT xn) ), called cross-entropy, derived from maximum likelihood
• Ein(w): continuous, differentiable, twice-differentiable, convex
• how to minimize? locate valley: want ∇Ein(w) = 0
most basic algorithm: gradient descent (roll downhill)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 48/128
Fundamental Machine Learning Models Logistic Regression
Gradient Descent
For t = 0, 1, . . .
wt+1 ← wt + ηv
when stop, return last w as g
• PLA: v comes from mistake correction
• smooth Ein(w) for logistic regression: choose v to make the ball roll ‘downhill’
• direction v: (assumed) of unit length
• step size η: (assumed) positive
gradient descent: v ∝ −∇Ein(wt)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 49/128
Fundamental Machine Learning Models Logistic Regression
Putting Everything Together
Logistic Regression Algorithm
initialize w0
For t = 0, 1, · · ·
1 compute ∇Ein(wt) = (1/N) Σ_{n=1}^{N} θ(−yn wTt xn)(−yn xn)
2 update by wt+1 ← wt − η ∇Ein(wt)
...until ∇Ein(wt+1) ≈ 0 or enough iterations
return last wt+1 as g
can use more sophisticated tools to speed up
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 50/128
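A minimal NumPy sketch of the algorithm above, assuming labels yn ∈ {−1, +1}; the step size, iteration count, and toy data are illustrative assumptions.

```python
import numpy as np

def theta(s):
    """Logistic function theta(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def logistic_regression(X, y, eta=0.1, T=2000):
    """Minimize the cross-entropy error by batch gradient descent."""
    N = X.shape[0]
    X = np.column_stack([np.ones(N), X])   # prepend x0 = 1
    w = np.zeros(X.shape[1])               # initialize w0
    for _ in range(T):
        # gradient: (1/N) sum_n theta(-y_n w.x_n) * (-y_n x_n)
        grad = np.mean(theta(-y * (X @ w))[:, None] * (-y[:, None] * X), axis=0)
        w -= eta * grad                     # update: roll downhill
    return w

# illustrative data with labels in {-1, +1}
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)
print(logistic_regression(X, y))
```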
Fundamental Machine Learning Models Logistic Regression
Linear Models Summarized
linear scoring function: s = wT x
• linear classification: h(x) = sign(s); plausible err = 0/1; discrete Ein(w), solvable in special case
• linear regression: h(x) = s; friendly err = squared; quadratic convex Ein(w), closed-form solution
• logistic regression: h(x) = θ(s); plausible err = cross-entropy; smooth convex Ein(w), gradient descent
my ‘secret’: linear first!!
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 51/128
Fundamental Machine Learning Models Nonlinear Transform
Fundamental Machine Learning Models ::
Nonlinear Transform
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 52/128
Fundamental Machine Learning Models Nonlinear Transform
Linear Hypotheses
up to now: linear hypotheses
• visually: ‘line’-like boundary
• mathematically: linear scores s = wT x
but limited . . . (figure: a data set no line can separate well)
• theoretically: complexity under control :-)
• practically: on some D, large Ein for every line :-(
how to break the limit of linear hypotheses
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 53/128
Fundamental Machine Learning Models Nonlinear Transform
Circular Separable
(figures: the same data in [−1, 1]², not separable by any line, but separable by a circle)
• D not linearly separable
• but circularly separable by a circle of radius √0.6 centered at the origin: hSEP(x) = sign(−x1² − x2² + 0.6)
re-derive Circular-PLA, Circular-Regression, blahblah . . . all over again? :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 54/128
Fundamental Machine Learning Models Nonlinear Transform
Circular Separable and Linear Separable
h(x) = sign( 0.6 · 1 + (−1) · x1² + (−1) · x2² ) = sign( w̃T z ), with w̃ = (w̃0, w̃1, w̃2) = (0.6, −1, −1) and z = (z0, z1, z2) = (1, x1², x2²)
• {(xn, yn)} circularly separable =⇒ {(zn, yn)} linearly separable
• x ∈ X −→ z ∈ Z by the (nonlinear) feature transform Φ
(figures: circles in X-space ⇐⇒ lines in Z-space)
circularly separable in X =⇒ linearly separable in Z
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 55/128
Fundamental Machine Learning Models Nonlinear Transform
General Quadratic Hypothesis Set
a ‘bigger’ Z-space with Φ2(x) = (1, x1, x2, x1², x1x2, x2²)
perceptrons in Z-space ⇐⇒ quadratic hypotheses in X-space
HΦ2 = { h(x) : h(x) = h̃(Φ2(x)) for some linear h̃ on Z }
• can implement all possible quadratic curve boundaries: circle, ellipse, rotated ellipse, hyperbola, parabola, . . .
e.g. ellipse 2(x1 + x2 − 3)² + (x1 − x2 − 4)² = 1 ⇐= w̃T = [33, −20, −4, 3, 2, 3]
include lines and constants as degenerate cases
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 56/128
Fundamental Machine Learning Models Nonlinear Transform
Good Quadratic Hypothesis
Z-space X-space
perceptrons ⇐⇒ quadratic hypotheses
good perceptron ⇐⇒ good quadratic hypothesis
separating perceptron ⇐⇒ separating quadratic hypothesis
(figures: a separating perceptron in Z-space ⇐⇒ a separating quadratic boundary in X-space)
• want: get good perceptron in Z-space
• known: get good perceptron in X-space with data {(xn, yn)}
solution: get good perceptron in Z-space with data
{(zn = Φ2(xn), yn)}
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 57/128
Fundamental Machine Learning Models Nonlinear Transform
The Nonlinear Transform Steps
(figures: X-space data → Φ → Z-space data; run A in Z-space; map the separator back through Φ to a quadratic boundary in X-space)
1 transform original data {(xn, yn)} to {(zn = Φ(xn), yn)} by Φ
2 get a good perceptron w̃ using {(zn, yn)} and your favorite linear algorithm A
3 return g(x) = sign( w̃T Φ(x) )
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 58/128
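A minimal sketch of the three steps, using the Φ2 transform and, as a stand-in for ‘your favorite linear algorithm A’, the pseudo-inverse linear regression from earlier applied to ±1 labels; the toy data mirrors the circularly separable example and is otherwise an illustrative assumption.

```python
import numpy as np

def phi2(X):
    """Quadratic transform Phi_2(x) = (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

# circularly separable toy data: positive inside radius sqrt(0.6)
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sign(0.6 - X[:, 0]**2 - X[:, 1]**2)

Z = phi2(X)                                 # step 1: transform data to Z-space
w_tilde = np.linalg.pinv(Z) @ y             # step 2: run a linear algorithm A
g = lambda Xn: np.sign(phi2(Xn) @ w_tilde)  # step 3: g(x) = sign(w~ . Phi(x))
print("training accuracy:", np.mean(g(X) == y))
```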
Fundamental Machine Learning Models Nonlinear Transform
Nonlinear Model via Nonlinear Φ + Linear Models
(figures: the same X-space ↔ Z-space transform pipeline as above)
two choices:
• feature transform
Φ
• linear model A,
not just binary
classification
Pandora’s box :-):
can now freely do quadratic PLA, quadratic regression,
cubic regression, . . ., polynomial regression
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 59/128
Fundamental Machine Learning Models Nonlinear Transform
Feature Transform Φ
(figures: raw digit pixels → Φ → (average intensity, symmetry) features, where a linear model separates ‘1’ from ‘not 1’)
more generally, not just polynomial:
raw (pixels)
domain knowledge
−→ concrete (intensity, symmetry)
the force, too good to be true? :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 60/128
Fundamental Machine Learning Models Nonlinear Transform
Computation/Storage Price
Q-th order polynomial transform: ΦQ(x) = ( 1, x1, x2, . . . , xd, x1², x1x2, . . . , xd², . . . , x1^Q, x1^{Q−1}x2, . . . , xd^Q )
= 1 (for w̃0) + d̃ other dimensions, where
d̃ = # ways of ≤ Q-combination from d kinds with repetitions = C(Q+d, Q) = C(Q+d, d) = O(Q^d)
= efforts needed for computing/storing z = ΦQ(x) and w̃
Q large =⇒ difficult to compute/store AND curve too complicated
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 61/128
Fundamental Machine Learning Models Nonlinear Transform
Generalization Issue
(figures: fits with Φ1 (original x) and with Φ4)
which one do you prefer? :-)
• Φ1: ‘visually’ preferred
• Φ4: Ein(g) = 0 but overkill
how to pick Q?
model selection (to be discussed) important
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 62/128
Fundamental Machine Learning Models Decision Tree
Fundamental Machine Learning Models ::
Decision Tree
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 63/128
Fundamental Machine Learning Models Decision Tree
Decision Tree for Watching MOOC Lectures
G(x) = Σ_{t=1}^{T} qt(x) · gt(x)
• base hypothesis gt(x): leaf at end of path t, a constant here
• condition qt(x): is x on path t?
• usually with simple internal nodes
(figure: a tree that branches on ‘quitting time?’ (< 18:30, between, > 21:30), ‘has a date?’ (true/false), and ‘deadline?’ (> 2 days, between, < −2 days), with Y/N leaves)
decision tree: arguably one of the most human-mimicking models
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 64/128
Fundamental Machine Learning Models Decision Tree
Recursive View of Decision Tree
Path View: G(x) = Σ_{t=1}^{T} [[x on path t]] · leaft(x)
(figure: the same MOOC-watching tree as above)
Recursive View: G(x) = Σ_{c=1}^{C} [[b(x) = c]] · Gc(x)
• G(x): full-tree hypothesis
• b(x): branching criteria
• Gc(x): sub-tree hypothesis at the c-th branch
tree = (root, sub-trees), just like what
your data structure instructor would say :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 65/128
Fundamental Machine Learning Models Decision Tree
A Basic Decision Tree Algorithm
G(x) = Σ_{c=1}^{C} [[b(x) = c]] · Gc(x)
function DecisionTree(data D = {(xn, yn)}_{n=1}^{N})
if termination criteria met: return base hypothesis gt(x)
else
1 learn branching criteria b(x)
2 split D to C parts Dc = {(xn, yn) : b(xn) = c}
3 build sub-tree Gc ← DecisionTree(Dc)
4 return G(x) = Σ_{c=1}^{C} [[b(x) = c]] · Gc(x)
four choices: number of branches, branching criteria, termination criteria, & base hypothesis
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 66/128
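A compact Python sketch of the recursive procedure, making one set of the ‘four choices’: C = 2 branches, threshold branching on one dimension, depth/purity termination, and majority-vote constant leaves; labels are assumed to be small non-negative integers (e.g. 0/1), and the impurity measure and pruning are simplified for illustration.

```python
import numpy as np

def decision_tree(X, y, depth=0, max_depth=4):
    """Recursive binary decision tree with decision-stump branching."""
    if depth == max_depth or len(np.unique(y)) == 1:  # termination criteria
        values, counts = np.unique(y, return_counts=True)
        return {'leaf': values[np.argmax(counts)]}    # base hypothesis g_t(x)
    best = None
    for i in range(X.shape[1]):                       # learn branching b(x):
        for thr in np.unique(X[:, i]):                # threshold one dimension
            left = X[:, i] <= thr
            if left.all() or not left.any():
                continue
            # simple impurity proxy: mistakes of majority vote on each side
            err = sum(np.sum(y[p] != np.bincount(y[p]).argmax())
                      for p in (left, ~left))
            if best is None or err < best[0]:
                best = (err, i, thr, left)
    if best is None:                                  # no useful split found
        values, counts = np.unique(y, return_counts=True)
        return {'leaf': values[np.argmax(counts)]}
    _, i, thr, left = best
    return {'dim': i, 'thr': thr,                     # split D and recurse
            'le': decision_tree(X[left], y[left], depth + 1, max_depth),
            'gt': decision_tree(X[~left], y[~left], depth + 1, max_depth)}

def tree_predict(node, x):
    """Follow the branching criteria down to a leaf."""
    if 'leaf' in node:
        return node['leaf']
    child = 'le' if x[node['dim']] <= node['thr'] else 'gt'
    return tree_predict(node[child], x)
```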
Fundamental Machine Learning Models Decision Tree
Classification and Regression Tree (C&RT)
function DecisionTree(data D = {(xn, yn)}_{n=1}^{N})
if termination criteria met: return base hypothesis gt(x)
else . . .
2 split D to C parts Dc = {(xn, yn) : b(xn) = c}
choices
• C = 2 (binary tree)
• gt (x) = Ein-optimal constant
• binary/multiclass classification (0/1 error): majority of {yn}
• regression (squared error): average of {yn}
• branching: threshold some selected dimension
• termination: fully-grown, or better pruned
disclaimer: C&RT here is based on selected components of CART™ of California Statistical Software
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 67/128
Fundamental Machine Learning Models Decision Tree
A Simple Data Set
(figures: C&RT recursively splits the plane, region by region, until the data is fit)
C&RT: ‘divide-and-conquer’
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 68/128
Fundamental Machine Learning Models Decision Tree
Practical Specialties of C&RT
• human-explainable
• multiclass easily
• categorical features easily
• missing features easily
• efficient non-linear training (and testing)
—almost no other learning model shares all these specialties, except for other decision trees
another popular decision tree algorithm:
C4.5, with different choices of heuristics
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 69/128
Fundamental Machine Learning Models Decision Tree
Mini-Summary
Fundamental Machine Learning Models
Linear Regression
analytic solution by pseudo inverse
Logistic Regression
minimize cross-entropy error with gradient descent
Nonlinear Transform
the secret ‘force’ to enrich your model
Decision Tree
human-like explainable model learned recursively
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 70/128
Hazard of Overfitting
Roadmap
Hazard of Overfitting
Overfitting
Data Manipulation and Regularization
Validation
Principles of Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 71/128
Hazard of Overfitting Overfitting
Hazard of Overfitting ::
Overfitting
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 72/128
Hazard of Overfitting Overfitting
Theoretical Foundation of Statistical Learning
if training and testing come from the same distribution, with high probability,
Eout(g) ≤ Ein(g) + √( (8/N) ln( 4(2N)^{dVC(H)} / δ ) )
test error ≤ training error + Ω, the price of using H
(figure: as dVC grows, in-sample error falls and model complexity rises, so out-of-sample error is U-shaped with its minimum at some d*VC)
• dVC(H): complexity of H, ≈ # of parameters to describe H
• dVC ↑: Ein ↓ but Ω ↑
• dVC ↓: Ω ↓ but Ein ↑
• best d*VC in the middle
powerful H not always good!
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 73/128
Hazard of Overfitting Overfitting
Bad Generalization
• regression for x ∈ R with N = 5 examples
• target f(x) = 2nd order polynomial
• label yn = f(xn) + very small noise
• linear regression in Z-space + Φ = 4th order polynomial
• unique solution passing all examples =⇒ Ein(g) = 0
• Eout(g) huge
(figure: the 4th-order fit passes every data point yet swings far from the target curve)
bad generalization: low Ein, high Eout
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 74/128
Hazard of Overfitting Overfitting
Bad Generalization and Overfitting
• take dVC = 1126 for learning: bad generalization—(Eout − Ein) large
• switch from dVC = d*VC to dVC = 1126: overfitting—Ein ↓, Eout ↑
• switch from dVC = d*VC to dVC = 1: underfitting—Ein ↑, Eout ↑
(figure: the error-versus-dVC curve again, with d*VC at the Eout minimum)
bad generalization: low Ein, high Eout;
overfitting: lower Ein, higher Eout
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 75/128
Hazard of Overfitting Overfitting
Cause of Overfitting: A Driving Analogy
(figures: a ‘good fit’ versus an overfit of the same data)
learning ↔ driving
overfit ↔ commit a car accident
use excessive dVC ↔ ‘drive too fast’
noise ↔ bumpy road
limited data size N ↔ limited observations about road condition
let’s ‘visualize’ overfitting
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 76/128
Hazard of Overfitting Overfitting
Impact of Noise and Data Size
impact of stochastic noise σ² versus data size N:
(figure: overfit level as a function of N and σ²)
reasons of serious overfitting:
data size N ↓ =⇒ overfit ↑
stochastic noise ↑ =⇒ overfit ↑
overfitting ‘easily’ happens
(more on ML Foundations, Lecture 13)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 77/128
Hazard of Overfitting Overfitting
Linear Model First
(figure: the Ein/Ω/Eout versus dVC curve again)
• tempting sin: use H1126, low Ein(g1126) to fool your boss
—really? :-( a dangerous path of no return
• safe route: H1 first
• if Ein(g1) good enough, live happily thereafter :-)
• otherwise, move right of the curve
with nothing lost except ‘wasted’ computation
linear model first:
simple, efficient, safe, and workable!
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 78/128
Hazard of Overfitting Overfitting
Driving Analogy Revisited
learning ↔ driving
overfit ↔ commit a car accident
use excessive dVC ↔ ‘drive too fast’
noise ↔ bumpy road
limited data size N ↔ limited observations about road condition
start from simple model ↔ drive slowly
data cleaning/pruning ↔ use more accurate road information
data hinting ↔ exploit more road information
regularization ↔ put on the brakes
validation ↔ monitor the dashboard
all very practical techniques
to combat overfitting
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 79/128
Hazard of Overfitting Data Manipulation and Regularization
Hazard of Overfitting ::
Data Manipulation and Regularization
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 80/128
Hazard of Overfitting Data Manipulation and Regularization
Data Cleaning/Pruning
• if ‘detect’ the outlier 5 at the top by
• too close to other ◦, or too far from other ×
• wrong by current classifier
• . . .
• possible action 1: correct the label (data cleaning)
• possible action 2: remove the example (data pruning)
possibly helps, but effect varies
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 81/128
Hazard of Overfitting Data Manipulation and Regularization
Data Hinting
• slightly shifted/rotated digits carry the same meaning
• possible action: add virtual examples by shifting/rotating the
given digits (data hinting)
possibly helps, but watch out
not to steal the thunder
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 82/128
Hazard of Overfitting Data Manipulation and Regularization
Regularization: The Magic
(figures: an overfit ‘stepped back’ to a regularized fit)
• idea: ‘step back’ from 10-th order polynomials to 2-nd order ones: H0 ⊂ H1 ⊂ H2 ⊂ H3 ⊂ · · ·
• name history: function approximation for ill-posed problems
how to step back?
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 83/128
Hazard of Overfitting Data Manipulation and Regularization
Step Back by Minimizing the Augmented Error
Augmented Error: Eaug(w) = Ein(w) + (λ/N) wT w
VC Bound: Eout(w) ≤ Ein(w) + Ω(H)
• regularizer wT w: complexity of a single hypothesis
• generalization price Ω(H): complexity of a hypothesis set
• if (λ/N) wT w ‘represents’ Ω(H) well, Eaug is a better proxy of Eout than Ein
minimizing Eaug: (heuristically) operating with the better proxy; (technically) enjoying flexibility of whole H
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 84/128
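For linear regression with the wT w regularizer, minimizing Eaug even keeps a closed-form solution; a minimal NumPy sketch, assuming squared error (this is the standard ridge-regression result, and for simplicity the bias term is regularized too):

```python
import numpy as np

def regularized_linear_regression(X, y, lam=1.0):
    """Minimize E_aug(w) = E_in(w) + (lambda/N) w.w for squared error.
    Setting the gradient to zero gives w_REG = (X^T X + lambda I)^-1 X^T y."""
    N = X.shape[0]
    X = np.column_stack([np.ones(N), X])      # prepend x0 = 1
    d1 = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d1), X.T @ y)
```

With λ = 0 this reduces to the pseudo-inverse solution wLIN; larger λ ‘steps back’ toward smaller weights.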
Hazard of Overfitting Data Manipulation and Regularization
The Optimal λ
(figure: expected Eout versus regularization parameter λ for stochastic noise σ² = 0, 0.25, 0.5; the best λ grows with the noise level)
• more noise ⇐⇒ more regularization needed
—more bumpy road ⇐⇒ putting brakes more
• noise unknown—important to make proper choices
how to choose?
validation!
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 85/128
Hazard of Overfitting Validation
Hazard of Overfitting ::
Validation
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 86/128
Hazard of Overfitting Validation
Model Selection Problem
(figures: fits from two models H1 and H2—which one do you prefer? :-))
• given: M models H1, H2, . . . , HM, each with corresponding
algorithm A1, A2, . . . , AM
• goal: select Hm∗ such that gm∗ = Am∗ (D) is of low Eout(gm∗ )
• unknown Eout, as always :-)
• arguably the most important practical problem of ML
how to select? visually?
—no, can you really visualize R^1126? :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 87/128
Hazard of Overfitting Validation
Validation Set Dval
D (size N) → Dtrain (size N − K) ∪ Dval (size K)
Ein(h) comes from D, giving gm = Am(D); Eval(h) comes from Dval, giving g−m = Am(Dtrain)
• Dval ⊂ D: called validation set—‘on-hand’ simulation of test set
• to connect Eval with Eout: select the K examples from D at random
• to make sure Dval is ‘clean’: feed only Dtrain to Am for model selection
Eout(g−m) ≤ Eval(g−m) + ‘small’
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 88/128
Hazard of Overfitting Validation
Model Selection by Best Eval
m* = argmin_{1≤m≤M} ( Em = Eval(Am(Dtrain)) )
• generalization guarantee for all m: Eout(g−m) ≤ Eval(g−m) + ‘small’
• heuristic gain from N − K to N: Eout( gm* = Am*(D) ) ≤ Eout( g−m* = Am*(Dtrain) )
(figure: each (Hm, Am) trained on Dtrain yields g−m, evaluated as Em on Dval; pick the best (Hm*, Em*), then re-run Am* on the full D to get gm*)
Eout(gm*) ≤ Eout(g−m*) ≤ Eval(g−m*) + ‘small’
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 89/128
Hazard of Overfitting Validation
V-fold Cross Validation
making validation more stable
• V-fold cross-validation: random-partition of D into V equal parts D1, D2, . . . , DV; take V − 1 parts for training and the remaining 1 for validation, in turn
Ecv(H, A) = (1/V) Σ_{v=1}^{V} E(v)val(g−v)
• selection by Ecv: m* = argmin_{1≤m≤M} ( Em = Ecv(Hm, Am) )
practical rule of thumb: V = 10
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 90/128
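A minimal sketch of V-fold cross validation for model selection; the training/error callbacks and the names in the commented usage are illustrative assumptions.

```python
import numpy as np

def cross_validation_error(X, y, train_fn, error_fn, V=10, seed=0):
    """E_cv: average validation error over V train/validate splits
    of one random partition of the data."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, V)               # random partition into V parts
    errs = []
    for v in range(V):
        val = folds[v]                           # one part validates ...
        train = np.concatenate([folds[u] for u in range(V) if u != v])
        g_minus = train_fn(X[train], y[train])   # ... the other V-1 train
        errs.append(error_fn(g_minus, X[val], y[val]))
    return float(np.mean(errs))

# model selection: pick the candidate with the smallest E_cv, e.g.
# best_lam = min(lams, key=lambda l: cross_validation_error(
#     X, y, lambda Xt, yt: regularized_linear_regression(Xt, yt, l), sq_err))
```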
Hazard of Overfitting Validation
Final Words on Validation
‘Selecting’ Validation Tool
• V-Fold generally preferred over single validation if computation
allows
• 5-Fold or 10-Fold generally works well
Nature of Validation
• all training models: select among hypotheses
• all validation schemes: select among finalists
• all testing methods: just evaluate
validation still more optimistic than testing
do not fool yourself and others :-),
report test result, not best validation result
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 91/128
Hazard of Overfitting Principles of Learning
Hazard of Overfitting ::
Principles of Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 92/128
Hazard of Overfitting Principles of Learning
Occam’s Razor for Learning
The simplest model that fits the data is also the most
plausible.
which one do you prefer? :-)
My KISS Principle:
Keep It Simple, Stupid → Safe
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 93/128
Hazard of Overfitting Principles of Learning
Sampling Bias
If the data is sampled in a biased way, learning will produce a similarly biased outcome.
philosophical explanation:
study Math hard but test English: no strong test guarantee
A True Personal Story
• Netflix competition for movie recommender system:
10% improvement = 1M US dollars
• on my own validation data, first shot, showed 13% improvement
• why am I still teaching here? :-)
validation: random examples within data;
test: “last” user records “after” data
practical rule of thumb: match test scenario
as much as possible
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 94/128
Hazard of Overfitting Principles of Learning
Visual Data Snooping
If a data set has affected any step in the learning process, its ability to assess the outcome has been compromised.
Visualize X = R²
• full Φ2: z = (1, x1, x2, x1², x1x2, x2²), dVC = 6
• or z = (1, x1², x2²), dVC = 3, after visualizing?
• or better z = (1, x1² + x2²), dVC = 2?
• or even better z = sign(0.6 − x1² − x2²)?
—careful about your brain’s ‘model complexity’
(figure: the circularly separable data again)
if you torture the data long enough, it will
confess :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 95/128
Hazard of Overfitting Principles of Learning
Dealing with Data Snooping
• truth—very hard to avoid, unless being extremely honest
• extremely honest: lock your test data in safe
• less honest: reserve validation and use cautiously
• be blind: avoid making modeling decision by data
• be suspicious: interpret research results (including your own) by
proper feeling of contamination
one secret to winning KDDCups:
careful balance between
data-driven modeling (snooping) and
validation (no-snooping)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 96/128
Hazard of Overfitting Principles of Learning
Mini-Summary
Hazard of Overfitting
Overfitting
the ‘accident’ that is everywhere in learning
Data Manipulation and Regularization
clean data, synthetic data, or augmented error
Validation
honestly simulate testing procedure for proper model selection
Principles of Learning
simple model, matching test scenario, and no snooping
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 97/128
Modern Machine Learning Models
Roadmap
Modern Machine Learning Models
Support Vector Machine
Random Forest
Adaptive (or Gradient) Boosting
Deep Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 98/128
Modern Machine Learning Models Support Vector Machine
Modern Machine Learning Models ::
Support Vector Machine
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 99/128
Modern Machine Learning Models Support Vector Machine
Motivation: Large-Margin Separating Hyperplane
max_w fatness(w) subject to w classifies every (xn, yn) correctly, i.e. every yn wT xn > 0
fatness(w) = min_{n=1,...,N} distance(xn, w)
• fatness: formally called margin
• correctness: yn = sign(wT xn)
initial goal: find largest-margin separating hyperplane
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 100/128
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 100/128
Modern Machine Learning Models Support Vector Machine
Soft-Margin Support Vector Machine
initial goal: find largest-margin separating hyperplane
• soft-margin (practical) SVM: not insisting on separating:
• minimize large-margin regularizer + C · separation error,
• just like regularization with augmented error: min Eaug(w) = Ein(w) + (λ/N) wT w
• two forms:
• finding hyperplane in original space (linear first!!): LIBLINEAR www.csie.ntu.edu.tw/~cjlin/liblinear
• or in mysterious transformed space hidden in ‘kernels’: LIBSVM www.csie.ntu.edu.tw/~cjlin/libsvm
linear: ‘best’ linear classification model;
non-linear: ‘leading’ non-linear classification model for mid-sized data
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 101/128
Modern Machine Learning Models Support Vector Machine
Hypothesis of Gaussian SVM
Gaussian kernel K(x, x′) = exp( −γ ‖x − x′‖² )
gSVM(x) = sign( Σ_{SV} αn yn K(xn, x) + b ) = sign( Σ_{SV} αn yn exp( −γ ‖x − xn‖² ) + b )
• linear combination of Gaussians centered at SVs xn
• also called Radial Basis Function (RBF) kernel
Gaussian SVM: find αn to combine Gaussians centered at xn & achieve large margin in infinite-dim. space
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 102/128
Modern Machine Learning Models Support Vector Machine
Support Vector Mechanism
large-margin hyperplanes: # of hypotheses not many
+ higher-order transforms with kernel trick: boundary sophisticated
+ noise tolerance of soft-margin
• transformed vector z = Φ(x) =⇒ efficient kernel K(x, x′)
• store optimal w =⇒ store a few SVs and αn
new possibility by Gaussian SVM:
infinite-dimensional linear classification, with
generalization ‘guarded by’ large-margin :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 103/128
Modern Machine Learning Models Support Vector Machine
Practical Need: Model Selection
• large γ =⇒ sharp Gaussians =⇒ ‘overfit’?
• complicated even for (C, γ) of Gaussian SVM
• more combinations if including other kernels or parameters
how to select? validation :-)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 104/128
Modern Machine Learning Models Support Vector Machine
Step-by-step Use of SVM
strongly recommended: ‘A Practical Guide to Support Vector
Classification’
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
1 scale each feature of your data to a suitable range (say, [−1, 1])
2 use a Gaussian RBF kernel
3 use cross validation and grid search to choose good (γ, C)
4 use the best (γ, C) on your data
5 do testing with the learned SVM classifier
all included in easy.py of LIBSVM
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 105/128
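LIBSVM’s easy.py automates exactly these steps; an equivalent sketch with scikit-learn (whose SVC wraps LIBSVM) is below. The data set and grid values are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# illustrative data: replace with your own
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.where(X[:, 0]**2 + X[:, 1] > 0.5, 1, -1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# step 1: scale each feature to [-1, 1], fitting the scaling on training data
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_tr)

# steps 2-3: Gaussian RBF kernel + cross validation + grid search on (gamma, C)
grid = GridSearchCV(SVC(kernel='rbf'),
                    param_grid={'gamma': [2.0**k for k in range(-15, 4, 2)],
                                'C': [2.0**k for k in range(-5, 16, 2)]},
                    cv=5)
grid.fit(scaler.transform(X_tr), y_tr)   # step 4: refits the best (gamma, C)

# step 5: test with the learned classifier
print(grid.best_params_)
print("test accuracy:", grid.score(scaler.transform(X_te), y_te))
```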
Modern Machine Learning Models Random Forest
Modern Machine Learning Models ::
Random Forest
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 106/128
Modern Machine Learning Models Random Forest
Random Forest (RF)
random forest (RF) = bagging (random sampling) + fully-grown C&RT decision tree
function RandomForest(D)
For t = 1, 2, . . . , T
1 request size-N data D̃t by bootstrapping with D
2 obtain tree gt by DTree(D̃t)
return G = Uniform({gt})
function DTree(D)
if termination: return base gt
else
1 learn b(x) and split D to Dc by b(x)
2 build Gc ← DTree(Dc)
3 return G(x) = Σ_{c=1}^{C} [[b(x) = c]] Gc(x)
• highly parallel/efficient to learn
• inherit pros of C&RT
• eliminate cons of fully-grown tree
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 107/128
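A minimal sketch of the two functions above, with scikit-learn’s DecisionTreeClassifier standing in for a fully-grown C&RT tree (an illustrative substitution; the recursive tree sketch from earlier could be plugged in instead).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest(X, y, T=100, seed=0):
    """Bagging: bootstrap a size-N sample per round, grow a full tree on it."""
    rng = np.random.default_rng(seed)
    N = len(y)
    trees = []
    for _ in range(T):
        boot = rng.integers(0, N, size=N)   # sample N indices with replacement
        trees.append(DecisionTreeClassifier().fit(X[boot], y[boot]))
    return trees                            # G = Uniform({g_t})

def forest_predict(trees, X):
    """Uniform (majority) vote of the T trees, for labels in {-1, +1}."""
    votes = np.stack([t.predict(X) for t in trees])
    return np.sign(votes.sum(axis=0))
```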
Modern Machine Learning Models Random Forest
Feature Selection
for x = (x1, x2, . . . , xd), want to remove
• redundant features: like keeping one of ‘age’ and ‘full birthday’
• irrelevant features: like insurance type for cancer prediction
and only ‘learn’ the subset-transform Φ(x) = (x_{i1}, x_{i2}, . . . , x_{id′}) with d′ ≪ d for g(Φ(x))
advantages:
• efficiency: simpler
hypothesis and shorter
prediction time
• generalization: ‘feature
noise’ removed
• interpretability
disadvantages:
• computation:
‘combinatorial’ optimization
in training
• overfit: ‘combinatorial’
selection
• mis-interpretability
decision tree: a rare model
with built-in feature selection
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 108/128
Modern Machine Learning Models Random Forest
Feature Selection by Importance
idea: if possible to calculate importance(i) for i = 1, 2, . . . , d, then can select the i1, i2, . . . , id′ of top-d′ importance
importance by linear model: score = wT x = Σ_{i=1}^{d} wi xi
• intuitive estimate: importance(i) = |wi| with some ‘good’ w
• getting ‘good’ w: learned from data
• non-linear models? often much harder
but ‘easy’ feature selection in RF
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 109/128
Modern Machine Learning Models Random Forest
Feature Importance by Permutation Test
idea: random test—if feature i is needed, ‘random’ values of xn,i degrade performance
permutation test: importance(i) = performance(D) − performance(D(p))
where D(p) is D with {xn,i}_{n=1}^{N} replaced by permuted values
permutation test: a general statistical tool that
can be easily coupled with RF
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 110/128
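A minimal sketch of permutation-test importance; the prediction callback is an illustrative assumption, and any predictor (such as the random forest above) can be plugged in.

```python
import numpy as np

def permutation_importance(predict_fn, X, y, seed=0):
    """importance(i) = performance(D) - performance(D^(p)), measured here
    as the rise in error after permuting the values of feature i."""
    rng = np.random.default_rng(seed)
    base_err = np.mean(predict_fn(X) != y)        # error on untouched D
    imp = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] = rng.permutation(Xp[:, i])      # permute {x_{n,i}} only
        imp[i] = np.mean(predict_fn(Xp) != y) - base_err
    return imp

# usage with the random_forest sketch above (illustrative):
# imp = permutation_importance(lambda X: forest_predict(trees, X), X, y)
```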
Modern Machine Learning Models Random Forest
A Complicated Data Set
(figures: single trees gt, each trained on N′ = N/2 bootstrapped examples, versus the ensemble G built from the first t trees; the forest boundary smooths out)
‘easy yet robust’ nonlinear model
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 111/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Modern Machine Learning Models ::
Adaptive (or Gradient) Boosting
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 112/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Apple Recognition Problem
• is this a picture of an apple?
• say, want to teach a class of 6-year-olds
• gather photos under CC-BY-2.0 license on Flickr (thanks to the authors below!)
(APAL stands for Apple and Pear Australia Ltd)
(apple photos by Dan Foy, APAL, adrianbartel, ANdrzej cH., Stuart Webster, nachans, APAL, Jo Jakeman, APAL, APAL: https://flic.kr/p/jNQ55, https://flic.kr/p/jzP1VB, https://flic.kr/p/bdy2hZ, https://flic.kr/p/51DKA8, https://flic.kr/p/9C3Ybd, https://flic.kr/p/9XD7Ag, https://flic.kr/p/jzRe4u, https://flic.kr/p/7jwtGp, https://flic.kr/p/jzPYNr, https://flic.kr/p/jzScif)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 113/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Apple Recognition Problem
• and pictures of non-apples, also gathered under CC-BY-2.0 license on Flickr (thanks to the authors below!)
(non-apple photos by Mr. Roboto., Richard North, Richard North, Emilian Robert Vicol, Nathaniel McQueen, Crystal, jfh686, skyseeker, Janet Hudson, Rennett Stowe: https://flic.kr/p/i5BN85, https://flic.kr/p/bHhPkB, https://flic.kr/p/d8tGou, https://flic.kr/p/bpmGXW, https://flic.kr/p/pZv1Mf, https://flic.kr/p/kaPYp, https://flic.kr/p/6vjRFH, https://flic.kr/p/2MynV, https://flic.kr/p/7QDBbm, https://flic.kr/p/agmnrk)
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 113/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Our Fruit Class Begins
• Teacher: Please look at the pictures of apples and non-apples
below. Based on those pictures, how would you describe an
apple? Michael?
• Michael: I think apples are circular.
(Class): Apples are circular.
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 114/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Our Fruit Class Continues
• Teacher: Being circular is a good feature for the apples. However,
if you only say circular, you could make several mistakes. What
else can we say for an apple? Tina?
• Tina: It looks like apples are red.
(Class): Apples are somewhat circular and
somewhat red.
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 115/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Our Fruit Class Continues More
• Teacher: Yes. Many apples are red. However, you could still make
mistakes based on circular and red. Do you have any other
suggestions, Joey?
• Joey: Apples could also be green.
(Class): Apples are somewhat circular and
somewhat red and possibly green.
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 116/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Our Fruit Class Ends
• Teacher: Yes. It seems that apples might be circular, red, green.
But you may confuse them with tomatoes or peaches, right? Any
more suggestions, Jessica?
• Jessica: Apples have stems at the top.
(Class): Apples are somewhat circular, somewhat red, possibly green,
and may have stems at the top.
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 117/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Motivation
• students: simple hypotheses gt (like vertical/horizontal lines)
• (Class): sophisticated hypothesis G (like black curve)
• Teacher: a tactical learning algorithm that directs the students to
focus on key examples
next: demo of such an algorithm
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 118/128
Modern Machine Learning Models Adaptive (or Gradient) Boosting
A Simple Data Set
[demo figures: over several iterations, simple hypotheses gt are added one by one and combined into an increasingly accurate G]
‘Teacher’-like algorithm works!
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 119/128
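A minimal sketch of the ‘teacher’-like algorithm behind the demo, under stated assumptions: labels in {−1, +1}, axis-aligned decision stumps as the students gt, and the standard AdaBoost re-weighting with scaling factor √((1 − εt)/εt). None of this code is prescribed by the slides.

```python
# Minimal AdaBoost sketch with decision stumps as the simple hypotheses.
import numpy as np

def stump_predict(X, d, thr, s):
    return s * np.where(X[:, d] > thr, 1.0, -1.0)

def best_stump(X, y, u):
    # pick the (feature, threshold, sign) with smallest weighted error
    best_err, best = np.inf, None
    for d in range(X.shape[1]):
        for thr in X[:, d]:
            for s in (1.0, -1.0):
                err = u @ (stump_predict(X, d, thr, s) != y)
                if err < best_err:
                    best_err, best = err, (d, thr, s)
    return best_err, best

def adaboost(X, y, T=10):
    u = np.ones(len(y)) / len(y)          # uniform example weights
    G = []
    for _ in range(T):
        err, (d, thr, s) = best_stump(X, y, u)
        eps = max(err / u.sum(), 1e-12)   # weighted error rate
        scale = np.sqrt((1 - eps) / eps)  # emphasize the mistakes
        wrong = stump_predict(X, d, thr, s) != y
        u = np.where(wrong, u * scale, u / scale)
        G.append((np.log(scale), (d, thr, s)))  # alpha_t = ln(scale)
    return G

def G_predict(X, G):
    return np.sign(sum(a * stump_predict(X, d, thr, s)
                       for a, (d, thr, s) in G))
```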
Modern Machine Learning Models Adaptive (or Gradient) Boosting
Putting Everything Together
Gradient Boosted Decision Tree (GBDT)
s1 = s2 = . . . = sN = 0
for t = 1, 2, . . . , T
1 obtain gt by A({(xn, yn − sn)}) where A is a (squared-error)
regression algorithm
—such as ‘weak’ C&RT?
2 compute αt = OneVarLinearRegression({(gt(xn), yn − sn)})
3 update sn ← sn + αt gt(xn)
return G(x) = Σ_{t=1}^{T} αt gt(x)
GBDT: ‘regression sibling’ of AdaBoost +
decision tree
—very popular in practice
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 120/128
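A minimal sketch of the pseudocode above, assuming sklearn’s shallow DecisionTreeRegressor as the ‘weak’ regression algorithm A and one-variable linear regression through the origin for αt; the helper names are hypothetical, not from the slides.

```python
# Minimal GBDT sketch: fit trees to residuals, scale each by alpha_t.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, T=100, max_depth=3):
    s = np.zeros(len(y))                      # s_1 = ... = s_N = 0
    trees, alphas = [], []
    for _ in range(T):
        g = DecisionTreeRegressor(max_depth=max_depth).fit(X, y - s)
        gx = g.predict(X)
        # one-variable linear regression of residuals on g_t(x_n)
        alpha = gx @ (y - s) / (gx @ gx + 1e-12)
        s += alpha * gx                       # update the scores
        trees.append(g); alphas.append(alpha)
    return trees, alphas

def gbdt_predict(X, trees, alphas):
    # G(x) = sum_t alpha_t g_t(x)
    return sum(a * g.predict(X) for g, a in zip(trees, alphas))
```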
Modern Machine Learning Models Deep Learning
Modern Machine Learning Models ::
Deep Learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 121/128
Modern Machine Learning Models Deep Learning
Physical Interpretation of Neural Network
[network diagram: inputs x0 = 1, x1, . . . , xd; hidden tanh layers with weights w(1)_ij, w(2)_jk, w(3)_kq; e.g. hidden score s(2)_3 transformed into x(2)_3]
• each layer: pattern feature extracted from data
• how many neurons? how many layers?
—more generally, what structure?
• subjectively, your design!
• objectively, validation, maybe?
structural decisions:
key issue for applying NNet
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 122/128
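The layered transform is easy to state in code. A minimal sketch under assumed shapes (a d–3–2–1 tanh network, where each weight matrix W(l) includes a row for the constant +1 feature), purely for illustration:

```python
# Forward pass of a tanh neural network: s(l) = W(l)^T x(l-1),
# x(l) = tanh(s(l)), with the constant +1 prepended at every layer.
import numpy as np

def forward(x, weights):
    for W in weights:
        x = np.concatenate(([1.0], x))  # constant +1 feature
        x = np.tanh(W.T @ x)            # score, then transform
    return x

rng = np.random.default_rng(0)
d = 4
weights = [rng.normal(size=(d + 1, 3)),  # w(1)_ij
           rng.normal(size=(3 + 1, 2)),  # w(2)_jk
           rng.normal(size=(2 + 1, 1))]  # w(3)_kq
print(forward(rng.normal(size=d), weights))
```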
Modern Machine Learning Models Deep Learning
Shallow versus Deep Neural Networks
shallow: few (hidden) layers; deep: many layers
Shallow NNet
• more efficient to train (✓)
• simpler structural decisions (✓)
• theoretically powerful enough (✓)
Deep NNet
• challenging to train (×)
• sophisticated structural decisions (×)
• ‘arbitrarily’ powerful (✓)
• more ‘meaningful’? (see next slide)
deep NNet (deep learning)
became popular in recent years
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 123/128
Modern Machine Learning Models Deep Learning
Meaningfulness of Deep Learning
[figure: digit-recognition network — outputs z1 (‘is it a 1?’) and z5 (‘is it a 5?’) built from learned features φ1, . . . , φ6, combined with positive and negative weights]
• ‘less burden’ for each layer: simple to complex features/patterns
• natural for difficult learning task with raw features, like vision
deep NNet: currently popular in
vision/speech/. . .
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 124/128
Modern Machine Learning Models Deep Learning
Challenges and Key Techniques for Deep Learning
• difficult structural decisions:
• sometimes subjective with domain knowledge
• high model complexity:
• no big worries if big enough data
• regularization towards noise tolerance, like
• dropout (tolerant when network corrupted; see the sketch after this slide)
• denoising (tolerant when input corrupted)
• hard optimization problem:
• careful initialization to avoid bad local minimum, e.g. pre-trained
models
• huge computational complexity (worsened with big data):
• novel hardware/architecture: like mini-batch with GPU
structural and regularization techniques are
consistently being developed
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 125/128
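As one concrete instance of the regularization ideas above, a minimal inverted-dropout sketch (assumed keep probability p; the rescaling by 1/p keeps the expected activation unchanged):

```python
# Inverted dropout: randomly zero hidden units during training so the
# network learns to tolerate corruption, then rescale by 1/p.
import numpy as np

def dropout(x, p=0.8, rng=np.random.default_rng(0)):
    mask = rng.random(x.shape) < p  # keep each unit with probability p
    return x * mask / p

print(dropout(np.ones(10)))
```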
Modern Machine Learning Models Deep Learning
Structural Decision in Convolutional Neural Network
[diagram: NNet whose special nodes connect only to local patches of the input]
• special nodes consisting of
• weighted sum on local inputs (called convolution)
• nonlinear transform function
• down-sampling (called pooling)
(see the sketch after this slide)
• mimics ‘domain knowledge’ about human vision
arguably the earliest success of deep learning
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 126/128
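A minimal sketch of the two special operations named above, assuming a 1-D input, ‘valid’ convolution, and non-overlapping max pooling (the 2-D image case is analogous):

```python
# Convolution = weighted sum on local inputs; pooling = down-sampling.
import numpy as np

def conv1d(x, w):
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

def max_pool(x, k=2):
    # keep the max of each non-overlapping length-k window
    return x[:len(x) // k * k].reshape(-1, k).max(axis=1)

x = np.array([0., 1., 3., 2., 5., 4.])
h = np.tanh(conv1d(x, np.array([1., -1.])))  # nonlinear transform
print(max_pool(h))
```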
Modern Machine Learning Models Deep Learning
Mini-Summary
Modern Machine Learning Models
Support Vector Machine
large-margin boundary ranging from linear to non-linear
Random Forest
uniform blending of many many decision trees
Adaptive (or Gradient) Boosting
keep adding simple hypotheses to the gang
Deep Learning
neural network with deep architecture and careful design
Thank you!!
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 127/128
Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 128/128
[系列活動] 機器學習速遊

  • 1. Quick Tour of Machine Learning (機器學習速遊)   Hsuan-Tien Lin (林軒田) Chief Data Scientist @ Appier (沛星互動科技) 資料科學愛好者年會系列活動, 2017/03/11 Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 0/128
  • 2. Learning from Data Disclaimer • just super-condensed and shuffled version of • my co-authored textbook “Learning from Data: A Short Course” • my two NTU-Coursera Mandarin-teaching ML Massive Open Online Courses • Machine Learning Foundations • Machine Learning Techniques http://www.csie.ntu.edu.tw/~htlin/mooc/ —impossible to be complete, with most math details removed • decorated with my live experience at Appier, a rapidly-growing AI company • audience interaction is important Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 1/128
  • 3. Learning from Data Roadmap Learning from Data What is Machine Learning Components of Machine Learning Types of Machine Learning Step-by-step Machine Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 2/128
  • 4. Learning from Data What is Machine Learning Learning from Data :: What is Machine Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 3/128
  • 5. Learning from Data What is Machine Learning From Learning to Machine Learning learning: acquiring skill with experience accumulated from observations observations learning skill machine learning: acquiring skill with experience accumulated/computed from data data ML skill What is skill? Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 4/128
  • 6. Learning from Data What is Machine Learning A More Concrete Definition skill ⇔ improve some performance measure (e.g. prediction accuracy) machine learning: improving some performance measure with experience computed from data data ML improved performance measure An Application in Computational Finance stock data ML more investment gain Why use machine learning? Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 5/128
  • 7. Learning from Data What is Machine Learning Yet Another Application: Tree Recognition • ‘define’ trees and hand-program: difficult • learn from data (observations) and recognize: a 3-year-old can do so • ‘ML-based tree recognition system’ can be easier to build than hand-programmed system ML: an alternative route to build complicated systems Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 6/128
  • 8. Learning from Data What is Machine Learning The Machine Learning Route ML: an alternative route to build complicated systems Some Use Scenarios • when human cannot program the system manually —navigating on Mars • when human cannot ‘define the solution’ easily —speech/visual recognition • when needing large-scale decisions that humans cannot do —cross-device unification Appier • when needing rapid decisions that humans cannot do —real-time bidding advertisement Appier Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 7/128
  • 9. Learning from Data What is Machine Learning Machine Learning and Artificial Intelligence Machine Learning use data to compute something that improves performance Artificial Intelligence compute something that shows intelligent behavior • improving performance is something that shows intelligent behavior —ML can realize AI, among other routes • e.g. chess playing • traditional AI: game tree • ML for AI: ‘learning from board data’ ML is one possible and arguably most popular route to realize AI Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 8/128
  • 10. Learning from Data Components of Machine Learning Learning from Data :: Components of Machine Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 9/128
  • 11. Learning from Data Components of Machine Learning Components of Learning: Metaphor Using Credit Approval Applicant Information age 23 years gender female annual salary NTD 1,000,000 year in residence 1 year year in job 0.5 year current debt 200,000 what to learn? (for improving performance): ‘approve credit card good for bank?’ Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 10/128
  • 12. Learning from Data Components of Machine Learning Formalize the Learning Problem Basic Notations • input: x ∈ X (customer application) • output: y ∈ Y (good/bad after approving credit card) • unknown underlying pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula) • data ⇔ training examples: D = {(x1, y1), (x2, y2), · · · , (xN, yN)} (historical records in bank) • hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used), i.e. approve if • h1: annual salary > NTD 800,000 • h2: debt > NTD 100,000 (really?) • h3: year in job ≤ 2 (really?) —all candidate formula being considered: hypothesis set H —procedure to learn best formula: algorithm A {(xn, yn)} from f ML (A, H) g Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 11/128
  • 13. Learning from Data Components of Machine Learning Practical Definition of Machine Learning unknown target function f : X → Y (ideal credit approval formula) training examples D: (x1, y1), · · · , (xN , yN ) (historical records in bank) learning algorithm A final hypothesis g ≈ f (‘learned’ formula to be used) hypothesis set H (set of candidate formula) machine learning (A and H): use data to compute hypothesis g that approximates target f Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 12/128
  • 14. Learning from Data Components of Machine Learning Key Essence of Machine Learning machine learning: use data to compute hypothesis g that approximates target f data ML improved performance measure 1 exists some ‘underlying pattern’ to be learned —so ‘performance measure’ can be improved 2 but no programmable (easy) definition —so ‘ML’ is needed 3 somehow there is data about the pattern —so ML has some ‘inputs’ to learn from key essence: help decide whether to use ML Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 13/128
  • 15. Learning from Data Types of Machine Learning Learning from Data :: Types of Machine Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 14/128
  • 16. Learning from Data Types of Machine Learning Visualizing Credit Card Problem • customer features x: points on the plane (or points in Rd ) • labels y: ◦ (+1), × (-1) called binary classification • hypothesis h: lines here, but possibly other curves • different curve classifies customers differently binary classification algorithm: find good decision boundary curve g Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 15/128
  • 17. Learning from Data Types of Machine Learning More Binary Classification Problems • credit approve/disapprove • email spam/non-spam • patient sick/not sick • ad profitable/not profitable core and important problem with many tools as building block of other tools Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 16/128
  • 18. Learning from Data Types of Machine Learning Binary Classification for Education data ML skill • data: students’ records on quizzes on a Math tutoring system • skill: predict whether a student can give a correct answer to another quiz question A Possible ML Solution answer correctly ≈ recent strength of student > difficulty of question • give ML 9 million records from 3000 students • ML determines (reverse-engineers) strength and difficulty automatically key part of the world-champion system from National Taiwan Univ. in KDDCup 2010 Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 17/128
  • 19. Learning from Data Types of Machine Learning Multiclass Classification: Coin Recognition Problem 25 5 1 Mass Size 10 • classify US coins (1c, 5c, 10c, 25c) by (size, mass) • Y = {1c, 5c, 10c, 25c}, or Y = {1, 2, · · · , K} (abstractly) • binary classification: special case with K = 2 Other Multiclass Classification Problems • written digits ⇒ 0, 1, · · · , 9 • pictures ⇒ apple, orange, strawberry • emails ⇒ spam, primary, social, promotion, update (Google) many applications in practice, especially for ‘recognition’ Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 18/128
  • 20. Learning from Data Types of Machine Learning Regression: Patient Recovery Prediction Problem • binary classification: patient features ⇒ sick or not • multiclass classification: patient features ⇒ which type of cancer • regression: patient features ⇒ how many days before recovery • Y = R or Y = [lower, upper] ⊂ R (bounded regression) —deeply studied in statistics Other Regression Problems • company data ⇒ stock price • climate data ⇒ temperature also core and important with many ‘statistical’ tools as building block of other tools Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 19/128
  • 21. Learning from Data Types of Machine Learning Regression for Recommender System (1/2) data ML skill • data: how many users have rated some movies • skill: predict how a user would rate an unrated movie A Hot Problem • competition held by Netflix in 2006 • 100,480,507 ratings that 480,189 users gave to 17,770 movies • 10% improvement = 1 million dollar prize • similar competition (movies → songs) held by Yahoo! in KDDCup 2011 • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs How can machines learn our preferences? Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 20/128
  • 22. Learning from Data Types of Machine Learning Regression for Recommender System (2/2) Match movie and viewer factors predicted rating comedy content action content blockbuster? Tom Cruisein it? likes Tom Cruise? prefers blockbusters? likes action? likes comedy? movie viewer add contributions from each factor A Possible ML Solution • pattern: rating ← viewer/movie factors • learning: known rating → learned factors → unknown rating prediction key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011 Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 21/128
  • 23. Learning from Data Types of Machine Learning Supervised versus Unsupervised coin recognition with yn 25 5 1 Mass Size 10 supervised multiclass classification coin recognition without yn Mass Size unsupervised multiclass classification ⇐⇒ ‘clustering’ Other Clustering Problems • articles ⇒ topics • consumer profiles ⇒ consumer groups clustering: a challenging but useful problem Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 22/128
  • 24. Learning from Data Types of Machine Learning Supervised versus Unsupervised coin recognition with yn 25 5 1 Mass Size 10 supervised multiclass classification coin recognition without yn Mass Size unsupervised multiclass classification ⇐⇒ ‘clustering’ Other Clustering Problems • articles ⇒ topics • consumer profiles ⇒ consumer groups clustering: a challenging but useful problem Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 22/128
  • 25. Learning from Data Types of Machine Learning Semi-supervised: Coin Recognition with Some yn 25 5 1 Mass Size 10 supervised 25 5 1 Mass Size 10 semi-supervised Mass Size unsupervised (clustering) Other Semi-supervised Learning Problems • face images with a few labeled ⇒ face identifier (Facebook) • medicine data with a few labeled ⇒ medicine effect predictor semi-supervised learning: leverage unlabeled data to avoid ‘expensive’ labeling Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 23/128
  • 26. Learning from Data Types of Machine Learning Reinforcement Learning a ‘very different’ but natural way of learning Teach Your Dog: Say ‘Sit Down’ The dog pees on the ground. BAD DOG. THAT’S A VERY WRONG ACTION. • cannot easily show the dog that yn = sit when xn = ‘sit down’ • but can ‘punish’ to say ˜yn = pee is wrong Other Reinforcement Learning Problems Using (x, ˜y, goodness) • (customer, ad choice, ad click earning) ⇒ ad system • (cards, strategy, winning amount) ⇒ black jack agent reinforcement: learn with ‘partial/implicit information’ (often sequentially) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 24/128
  • 27. Learning from Data Types of Machine Learning Reinforcement Learning a ‘very different’ but natural way of learning Teach Your Dog: Say ‘Sit Down’ The dog sits down. Good Dog. Let me give you some cookies. • still cannot show yn = sit when xn = ‘sit down’ • but can ‘reward’ to say ˜yn = sit is good Other Reinforcement Learning Problems Using (x, ˜y, goodness) • (customer, ad choice, ad click earning) ⇒ ad system • (cards, strategy, winning amount) ⇒ black jack agent reinforcement: learn with ‘partial/implicit information’ (often sequentially) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 24/128
  • 28. Learning from Data Step-by-step Machine Learning Learning from Data :: Step-by-step Machine Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 25/128
  • 29. Learning from Data Step-by-step Machine Learning Step-by-step Machine Learning unknown target function f : X → Y (ideal credit approval formula) training examples D: (x1, y1), · · · , (xN , yN ) (historical records in bank) learning algorithm A final hypothesis g ≈ f (‘learned’ formula to be used) hypothesis set H (set of candidate formula) 1 choose error measure: how g(x) ≈ f(x) 2 decide hypothesis set H 3 optimize error and more on D as A 4 pray for generalization: whether g(x) ≈ f(x) for unseen x Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 26/128
  • 30. Learning from Data Step-by-step Machine Learning Choose Error Measure g ≈ f can often evaluate by averaged err (g(x), f(x)), called pointwise error measure in-sample (within data) Ein(g) = 1 N N n=1 err(g(xn), f(xn) yn ) out-of-sample (future data) Eout(g) = E future x err(g(x), f(x)) will start from 0/1 error err(˜y, y) = ˜y = y for classification Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 27/128
  • 31. Learning from Data Step-by-step Machine Learning Choose Hypothesis Set (for Credit Approval) age 23 years annual salary NTD 1,000,000 year in job 0.5 year current debt 200,000 • For x = (x1, x2, · · · , xd ) ‘features of customer’, compute a weighted ‘score’ and approve credit if d i=1 wixi > threshold deny credit if d i=1 wixi < threshold • Y: +1(good), −1(bad) , 0 ignored—linear formula h ∈ H are h(x) = sign d i=1 wixi − threshold linear (binary) classifier, called ‘perceptron’ historically Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 28/128
  • 32. Learning from Data Step-by-step Machine Learning Optimize Error (and More) on Data H = all possible perceptrons, g =? • want: g ≈ f (hard when f unknown) • almost necessary: g ≈ f on D, ideally g(xn) = f(xn) = yn • difficult: H is of infinite size • idea: start from some g0, and ‘correct’ its mistakes on D let’s visualize without math Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 29/128
  • 33. Learning from Data Step-by-step Machine Learning Seeing is Believing initially worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 34. Learning from Data Step-by-step Machine Learning Seeing is Believing x 1 w(t+1) update: 1 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 35. Learning from Data Step-by-step Machine Learning Seeing is Believing x9 w(t) w(t+1) update: 2 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 36. Learning from Data Step-by-step Machine Learning Seeing is Believing x 14 w(t) w(t+1) update: 3 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 37. Learning from Data Step-by-step Machine Learning Seeing is Believing x3 w(t) w(t+1) update: 4 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 38. Learning from Data Step-by-step Machine Learning Seeing is Believing x9 w(t) w(t+1) update: 5 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 39. Learning from Data Step-by-step Machine Learning Seeing is Believing x 14 w(t) w(t+1) update: 6 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 40. Learning from Data Step-by-step Machine Learning Seeing is Believing x9 w(t) w(t+1) update: 7 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 41. Learning from Data Step-by-step Machine Learning Seeing is Believing x 14 w(t) w(t+1) update: 8 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 42. Learning from Data Step-by-step Machine Learning Seeing is Believing x9 w(t) w(t+1) update: 9 worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 43. Learning from Data Step-by-step Machine Learning Seeing is Believing wPLA finally worked like a charm with < 20 lines!! —A fault confessed is half redressed. :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 30/128
  • 44. Learning from Data Step-by-step Machine Learning Pray for Generalization (pictures from Google Image Search) Parent ? (picture, label) pairs ? Kid’s good hypothesisbrain ' & $ % - 6 alternatives Target f(x) + noise ? examples (picture xn, label yn) ? learning good hypothesis g(x) ≈ f(x) algorithm ' & $ % - 6 hypothesis set H challenge: see only {(xn, yn)} without knowing f nor noise ? =⇒ generalize to unseen (x, y) w.r.t. f(x) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 31/128
  • 45. Learning from Data Step-by-step Machine Learning Generalization Is Non-trivial Bob impresses Alice by memorizing every given (movie, rank); but too nervous about a new movie and guesses randomly (pictures from Google Image Search) memorize = generalize perfect from Bob’s view = good for Alice perfect during training = good when testing take-home message: if H is simple (like lines), generalization is usually possible Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 32/128
  • 46. Learning from Data Step-by-step Machine Learning Mini-Summary Learning from Data What is Machine Learning use data to approximate target Components of Machine Learning algorithm A takes data D and hypotheses H to get hypothesis g Types of Machine Learning variety of problems almost everywhere Step-by-step Machine Learning error, hypotheses, optimize, generalize Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 33/128
  • 47. Fundamental Machine Learning Models Roadmap Fundamental Machine Learning Models Linear Regression Logistic Regression Nonlinear Transform Decision Tree Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 34/128
  • 48. Fundamental Machine Learning Models Linear Regression Fundamental Machine Learning Models :: Linear Regression Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 35/128
  • 49. Fundamental Machine Learning Models Linear Regression Credit Limit Problem age 23 years gender female annual salary NTD 1,000,000 year in residence 1 year year in job 0.5 year current debt 200,000 credit limit? 100,000 unknown target function f : X → Y (ideal credit limit formula) training examples D: (x1, y1), · · · , (xN , yN ) (historical records in bank) learning algorithm A final hypothesis g ≈ f (‘learned’ formula to be used) hypothesis set H (set of candidate formula) Y = R: regression Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 36/128
  • 50. Fundamental Machine Learning Models Linear Regression Linear Regression Hypothesis age 23 years annual salary NTD 1,000,000 year in job 0.5 year current debt 200,000 • For x = (x0, x1, x2, · · · , xd ) ‘features of customer’, approximate the desired credit limit with a weighted sum: y ≈ d i=0 wixi • linear regression hypothesis: h(x) = wT x h(x): like perceptron, but without the sign Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 37/128
  • 51. Fundamental Machine Learning Models Linear Regression Illustration of Linear Regression x = (x) ∈ R x y x = (x1, x2) ∈ R2 x1 x2 y x1 x2 y linear regression: find lines/hyperplanes with small residuals Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 38/128
  • 52. Fundamental Machine Learning Models Linear Regression The Error Measure popular/historical error measure: squared error err(ˆy, y) = (ˆy − y)2 in-sample Ein(hw) = 1 N N n=1 (h(xn) wT xn − yn)2 out-of-sample Eout(w) = E (x,y)∼P (wT x − y)2 next: how to minimize Ein(w)? Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 39/128
  • 53. Fundamental Machine Learning Models Linear Regression Minimize Ein min w Ein(w) = 1 N N n=1 (wT xn − yn)2 w Ein • Ein(w): continuous, differentiable, convex • necessary condition of ‘best’ w Ein(w) ≡      ∂Ein ∂w0 (w) ∂Ein ∂w1 (w) . . . ∂Ein ∂wd (w)      =      0 0 . . . 0      —not possible to ‘roll down’ task: find wLIN such that Ein(wLIN) = 0 Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 40/128
  • 54. Fundamental Machine Learning Models Linear Regression Linear Regression Algorithm 1 from D, construct input matrix X and output vector y by X =     − − xT 1 − − − − xT 2 − − · · · − − xT N − −     N×(d+1) y =     y1 y2 · · · yN     N×1 2 calculate pseudo-inverse X† (d+1)×N 3 return wLIN (d+1)×1 = X†y simple and efficient with good † routine Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 41/128
  • 55. Fundamental Machine Learning Models Linear Regression Is Linear Regression a ‘Learning Algorithm’? wLIN = X† y No! • analytic (closed-form) solution, ‘instantaneous’ • not improving Ein nor Eout iteratively Yes! • good Ein? yes, optimal! • good Eout? yes, ‘simple’ like perceptrons • improving iteratively? somewhat, within an iterative pseudo-inverse routine if Eout(wLIN) is good, learning ‘happened’! Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 42/128
  • 56. Fundamental Machine Learning Models Logistic Regression Fundamental Machine Learning Models :: Logistic Regression Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 43/128
  • 57. Fundamental Machine Learning Models Logistic Regression Heart Attack Prediction Problem (1/2) age 40 years gender male blood pressure 130/85 cholesterol level 240 weight 70 heart disease? yes unknown target distribution P(y|x) containing f(x) + noise training examples D: (x1, y1), · · · , (xN , yN ) learning algorithm A final hypothesis g ≈ f hypothesis set H error measure err err binary classification: ideal f(x) = sign P(+1|x) − 1 2 ∈ {−1, +1} because of classification err Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 44/128
  • 58. Fundamental Machine Learning Models Logistic Regression Heart Attack Prediction Problem (2/2) age 40 years gender male blood pressure 130/85 cholesterol level 240 weight 70 heart attack? 80% risk unknown target distribution P(y|x) containing f(x) + noise training examples D: (x1, y1), · · · , (xN , yN ) learning algorithm A final hypothesis g ≈ f hypothesis set H error measure err err ‘soft’ binary classification: f(x) = P(+1|x) ∈ [0, 1] Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 45/128
  • 60. Fundamental Machine Learning Models Logistic Regression Soft Binary Classification target function f(x) = P(+1|x) ∈ [0, 1] ideal (noiseless) data: (x_1, y_1 = 0.9 = P(+1|x_1)), (x_2, y_2 = 0.2 = P(+1|x_2)), . . . , (x_N, y_N = 0.6 = P(+1|x_N)) actual (noisy) data: (x_1, y_1 = 1 ∼ P(y|x_1)), (x_2, y_2 = 0 ∼ P(y|x_2)), . . . , (x_N, y_N = 0 ∼ P(y|x_N)) — same data as hard binary classification, different target function Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 46/128
  • 61. Fundamental Machine Learning Models Logistic Regression Logistic Hypothesis age 40 years, gender male, blood pressure 130/85, cholesterol level 240 • For x = (x_0, x_1, x_2, · · · , x_d) 'features of patient', calculate a weighted 'risk score': s = Σ_{i=0}^{d} w_i x_i • convert the score to an estimated probability by the logistic function θ(s), which rises smoothly from 0 to 1 logistic hypothesis: h(x) = θ(w^T x) = 1/(1 + exp(−w^T x)) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 47/128
  • 62. Fundamental Machine Learning Models Logistic Regression Minimizing Ein a popular error: E_in(w) = (1/N) Σ_{n=1}^{N} ln(1 + exp(−y_n w^T x_n)), called cross-entropy, derived from maximum likelihood • E_in(w): continuous, differentiable, twice-differentiable, convex • how to minimize? locate the valley: want ∇E_in(w) = 0 most basic algorithm: gradient descent (roll downhill) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 48/128
  • 63. Fundamental Machine Learning Models Logistic Regression Gradient Descent For t = 0, 1, . . . : w_{t+1} ← w_t + ηv; when stopped, return the last w as g • PLA: v comes from mistake correction • smooth E_in(w) for logistic regression: choose v to get the ball rolling 'downhill' • direction v: (assumed) of unit length • step size η: (assumed) positive gradient descent: v ∝ −∇E_in(w_t) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 49/128
  • 64. Fundamental Machine Learning Models Logistic Regression Putting Everything Together Logistic Regression Algorithm initialize w_0 For t = 0, 1, · · · 1 compute ∇E_in(w_t) = (1/N) Σ_{n=1}^{N} θ(−y_n w_t^T x_n)(−y_n x_n) 2 update by w_{t+1} ← w_t − η ∇E_in(w_t) ...until ∇E_in(w_{t+1}) ≈ 0 or enough iterations; return the last w_{t+1} as g — can use more sophisticated tools to speed up Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 50/128
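A minimal numpy sketch of the algorithm above, assuming labels y_n ∈ {−1, +1}; the step size and iteration count are illustrative defaults, not from the slides:

    import numpy as np

    def theta(s):
        # logistic function: theta(s) = 1 / (1 + exp(-s))
        return 1.0 / (1.0 + np.exp(-s))

    def logistic_regression(X, y, eta=0.1, T=1000):
        # gradient descent on the cross-entropy error
        w = np.zeros(X.shape[1])
        for _ in range(T):
            # gradient = (1/N) sum_n theta(-y_n w^T x_n) (-y_n x_n)
            grad = np.mean((theta(-y * (X @ w)) * -y)[:, None] * X, axis=0)
            w -= eta * grad
        return w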
  • 65. Fundamental Machine Learning Models Logistic Regression Linear Models Summarized linear scoring function: s = w^T x • linear classification: h(x) = sign(s); plausible err = 0/1; discrete E_in(w): solvable in special case • linear regression: h(x) = s; friendly err = squared; quadratic convex E_in(w): closed-form solution • logistic regression: h(x) = θ(s); plausible err = cross-entropy; smooth convex E_in(w): gradient descent my 'secret': linear first!! Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 51/128
  • 66. Fundamental Machine Learning Models Nonlinear Transform Fundamental Machine Learning Models :: Nonlinear Transform Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 52/128
  • 67. Fundamental Machine Learning Models Nonlinear Transform Linear Hypotheses up to now: linear hypotheses • visually: 'line'-like boundary • mathematically: linear scores s = w^T x but limited . . . • theoretically: complexity under control :-) • practically: on some D, large E_in for every line :-( how to break the limit of linear hypotheses? Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 53/128
  • 68. Fundamental Machine Learning Models Nonlinear Transform Circular Separable • D not linearly separable • but circularly separable by a circle of radius √0.6 centered at the origin: h_SEP(x) = sign(−x_1² − x_2² + 0.6) re-derive Circular-PLA, Circular-Regression, blahblah . . . all over again? :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 54/128
  • 69. Fundamental Machine Learning Models Nonlinear Transform Circular Separable and Linear Separable h(x) = sign(0.6 · 1 + (−1) · x_1² + (−1) · x_2²) = sign(w̃^T z), with z = (z_0, z_1, z_2) = (1, x_1², x_2²) and w̃ = (0.6, −1, −1) • {(x_n, y_n)} circular separable =⇒ {(z_n, y_n)} linear separable • x ∈ X —Φ→ z ∈ Z: (nonlinear) feature transform Φ circular separable in X =⇒ linear separable in Z Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 55/128
  • 70. Fundamental Machine Learning Models Nonlinear Transform General Quadratic Hypothesis Set a 'bigger' Z-space with Φ_2(x) = (1, x_1, x_2, x_1², x_1x_2, x_2²) perceptrons in Z-space ⇐⇒ quadratic hypotheses in X-space H_{Φ_2} = {h(x): h(x) = h̃(Φ_2(x)) for some linear h̃ on Z} • can implement all possible quadratic curve boundaries: circle, ellipse, rotated ellipse, hyperbola, parabola, . . . e.g. the ellipse 2(x_1 + x_2 − 3)² + (x_1 − x_2 − 4)² = 1 ⇐= w̃^T = [33, −20, −4, 3, 2, 3] — includes lines and constants as degenerate cases Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 56/128
  • 71. Fundamental Machine Learning Models Nonlinear Transform Good Quadratic Hypothesis Z-space perceptrons ⇐⇒ X-space quadratic hypotheses; good perceptron ⇐⇒ good quadratic hypothesis; separating perceptron ⇐⇒ separating quadratic hypothesis • want: get good perceptron in Z-space • known: get good perceptron in X-space with data {(x_n, y_n)} solution: get good perceptron in Z-space with data {(z_n = Φ_2(x_n), y_n)} Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 57/128
  • 72. Fundamental Machine Learning Models Nonlinear Transform The Nonlinear Transform Steps 1 transform original data {(x_n, y_n)} to {(z_n = Φ(x_n), y_n)} by Φ 2 get a good perceptron w̃ using {(z_n, y_n)} and your favorite linear algorithm A 3 return g(x) = sign(w̃^T Φ(x)) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 58/128
  • 73. Fundamental Machine Learning Models Nonlinear Transform Nonlinear Model via Nonlinear Φ + Linear Models two choices: • feature transform Φ • linear model A, not just binary classification Pandora's box :-): can now freely do quadratic PLA, quadratic regression, cubic regression, . . ., polynomial regression Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 59/128
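A small sketch of this recipe: compute Φ_2 explicitly, then hand the transformed data to any linear algorithm (the usage comment reuses the linear_regression sketch from the linear regression slides; both names are illustrative):

    import numpy as np

    def phi2(X):
        # quadratic transform: (x1, x2) -> z = (1, x1, x2, x1^2, x1*x2, x2^2)
        x1, x2 = X[:, 0], X[:, 1]
        return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

    # quadratic 'regression/classification': learn in Z-space, predict via
    # sign(w_tilde^T phi2(x)), e.g.
    #   w_tilde = linear_regression(phi2(X), y)
    #   y_hat = np.sign(phi2(X_new) @ w_tilde)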
  • 74. Fundamental Machine Learning Models Nonlinear Transform Feature Transform Φ more generally, not just polynomial: raw features (pixels) —domain knowledge→ concrete features (average intensity, symmetry), e.g. for recognizing '1' versus 'not 1' the force, too good to be true? :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 60/128
  • 75. Fundamental Machine Learning Models Nonlinear Transform Computation/Storage Price Q-th order polynomial transform: Φ_Q(x) = (1, x_1, x_2, . . . , x_d, x_1², x_1x_2, . . . , x_d², . . . , x_1^Q, x_1^{Q−1}x_2, . . . , x_d^Q) 1 + d̃ dimensions (w̃_0 plus d̃ others), where d̃ = # ways of choosing ≤ Q-combinations from d kinds with repetitions = C(Q+d, Q) = C(Q+d, d) = O(Q^d) = effort needed for computing/storing z = Φ_Q(x) and w̃ Q large =⇒ difficult to compute/store AND curve too complicated Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 61/128
  • 76. Fundamental Machine Learning Models Nonlinear Transform Generalization Issue which one do you prefer? :-) • Φ_1 (original x): 'visually' preferred • Φ_4: E_in(g) = 0 but overkill how to pick Q? model selection (to be discussed) important Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 62/128
  • 77. Fundamental Machine Learning Models Decision Tree Fundamental Machine Learning Models :: Decision Tree Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 63/128
  • 78. Fundamental Machine Learning Models Decision Tree Decision Tree for Watching MOOC Lectures G(x) = Σ_{t=1}^{T} q_t(x) · g_t(x) • base hypothesis g_t(x): leaf at end of path t, a constant here • condition q_t(x): is x on path t? • usually with simple internal nodes [figure: tree branching on quitting time, has a date?, deadline within 2 days?] decision tree: arguably one of the most human-mimicking models Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 64/128
  • 79. Fundamental Machine Learning Models Decision Tree Recursive View of Decision Tree Path View: G(x) = Σ_{t=1}^{T} [x on path t] · leaf_t(x) Recursive View: G(x) = Σ_{c=1}^{C} [b(x) = c] · G_c(x) • G(x): full-tree hypothesis • b(x): branching criteria • G_c(x): sub-tree hypothesis at the c-th branch tree = (root, sub-trees), just like what your data structure instructor would say :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 65/128
  • 80. Fundamental Machine Learning Models Decision Tree A Basic Decision Tree Algorithm G(x) = Σ_{c=1}^{C} [b(x) = c] G_c(x) function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N}): if termination criteria met, return base hypothesis g_t(x); else 1 learn branching criteria b(x) 2 split D into C parts D_c = {(x_n, y_n): b(x_n) = c} 3 build sub-tree G_c ← DecisionTree(D_c) 4 return G(x) = Σ_{c=1}^{C} [b(x) = c] G_c(x) four choices: number of branches, branching criteria, termination criteria, & base hypothesis Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 66/128
  • 81. Fundamental Machine Learning Models Decision Tree Classification and Regression Tree (C&RT) choices within the basic algorithm: • C = 2 (binary tree) • g_t(x) = E_in-optimal constant: majority of {y_n} for binary/multiclass classification (0/1 error), average of {y_n} for regression (squared error) • branching: threshold some selected dimension • termination: fully-grown, or better, pruned disclaimer: C&RT here is based on selected components of CART™ of California Statistical Software Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 67/128
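A rough sketch of a C&RT-style classification tree under the choices above; for brevity it branches on plain 0/1 error rather than the Gini-style impurity that CART proper uses, and all names are illustrative:

    import numpy as np

    def majority(y):
        vals, counts = np.unique(y, return_counts=True)
        return vals[np.argmax(counts)]            # E_in-optimal constant (0/1 error)

    def build_tree(X, y, depth=0, max_depth=5):
        # C = 2 branches; branching = threshold one selected dimension
        if depth == max_depth or len(np.unique(y)) == 1:
            return {"leaf": majority(y)}
        best = None
        for i in range(X.shape[1]):
            for t in np.unique(X[:, i]):
                m = X[:, i] < t
                if m.all() or not m.any():
                    continue
                # split quality: total misclassified points on both sides
                err = np.sum(y[m] != majority(y[m])) + np.sum(y[~m] != majority(y[~m]))
                if best is None or err < best[0]:
                    best = (err, i, t, m)
        if best is None:
            return {"leaf": majority(y)}
        _, i, t, m = best
        return {"dim": i, "thr": t,
                "left": build_tree(X[m], y[m], depth + 1, max_depth),
                "right": build_tree(X[~m], y[~m], depth + 1, max_depth)}

    def predict(tree, x):
        while "leaf" not in tree:
            tree = tree["left"] if x[tree["dim"]] < tree["thr"] else tree["right"]
        return tree["leaf"]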
  • 92. Fundamental Machine Learning Models Decision Tree A Simple Data Set [animation: C&RT recursively splitting a simple data set] C&RT: 'divide-and-conquer' Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 68/128
  • 93. Fundamental Machine Learning Models Decision Tree Practical Specialties of C&RT • human-explainable • handles multiclass easily • handles categorical features easily • handles missing features easily • efficient non-linear training (and testing) —almost no other learning model shares all these specialties, except for other decision trees another popular decision tree algorithm: C4.5, with different choices of heuristics Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 69/128
  • 94. Fundamental Machine Learning Models Decision Tree Mini-Summary Fundamental Machine Learning Models Linear Regression analytic solution by pseudo-inverse Logistic Regression minimize cross-entropy error with gradient descent Nonlinear Transform the secret 'force' to enrich your model Decision Tree human-like explainable model learned recursively Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 70/128
  • 95. Hazard of Overfitting Roadmap Hazard of Overfitting Overfitting Data Manipulation and Regularization Validation Principles of Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 71/128
  • 96. Hazard of Overfitting Overfitting Hazard of Overfitting :: Overfitting Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 72/128
  • 97. Hazard of Overfitting Overfitting Theoretical Foundation of Statistical Learning if training and testing come from the same distribution, then with high probability, E_out(g) [test error] ≤ E_in(g) [training error] + Ω, where Ω = √((8/N) ln(4(2N)^{d_VC(H)}/δ)) is the price of using H • d_VC(H): complexity of H, ≈ # of parameters to describe H • d_VC ↑: E_in ↓ but Ω ↑ • d_VC ↓: Ω ↓ but E_in ↑ • best d_VC* in the middle powerful H not always good! Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 73/128
  • 98. Hazard of Overfitting Overfitting Bad Generalization • regression for x ∈ ℝ with N = 5 examples • target f(x) = 2nd order polynomial • label y_n = f(x_n) + very small noise • linear regression in Z-space with Φ = 4th order polynomial transform • unique solution passing all examples =⇒ E_in(g) = 0 • E_out(g) huge bad generalization: low E_in, high E_out Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 74/128
  • 99. Hazard of Overfitting Overfitting Bad Generalization and Overfitting • take d_VC = 1126 for learning: bad generalization —(E_out − E_in) large • switch from d_VC = d_VC* to d_VC = 1126: overfitting —E_in ↓, E_out ↑ • switch from d_VC = d_VC* to d_VC = 1: underfitting —E_in ↑, E_out ↑ bad generalization: low E_in, high E_out; overfitting: lower E_in, higher E_out Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 75/128
  • 100. Hazard of Overfitting Overfitting Cause of Overfitting: A Driving Analogy learning ↔ driving: overfit ↔ commit a car accident; use excessive d_VC ↔ 'drive too fast'; noise ↔ bumpy road; limited data size N ↔ limited observations about road condition let's 'visualize' overfitting Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 76/128
  • 101. Hazard of Overfitting Overfitting Impact of Noise and Data Size [figure: overfit level as stochastic noise σ² versus data size N] reasons for serious overfitting: data size N ↓ =⇒ overfit ↑; stochastic noise ↑ =⇒ overfit ↑ overfitting 'easily' happens (more on ML Foundations, Lecture 13) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 77/128
  • 102. Hazard of Overfitting Overfitting Linear Model First • tempting sin: use H_1126, low E_in(g_1126) to fool your boss —really? :-( a dangerous path of no return • safe route: H_1 first • if E_in(g_1) good enough, live happily thereafter :-) • otherwise, move right of the curve with nothing lost except 'wasted' computation linear model first: simple, efficient, safe, and workable! Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 78/128
  • 103. Hazard of Overfitting Overfitting Driving Analogy Revisited learning ↔ driving: overfit ↔ commit a car accident; use excessive d_VC ↔ 'drive too fast'; noise ↔ bumpy road; limited data size N ↔ limited observations about road condition; start from simple model ↔ drive slowly; data cleaning/pruning ↔ use more accurate road information; data hinting ↔ exploit more road information; regularization ↔ put on the brakes; validation ↔ monitor the dashboard all very practical techniques to combat overfitting Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 79/128
  • 104. Hazard of Overfitting Data Manipulation and Regularization Hazard of Overfitting :: Data Manipulation and Regularization Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 80/128
  • 105. Hazard of Overfitting Data Manipulation and Regularization Data Cleaning/Pruning • if we 'detect' the outlier '5' at the top of the figure by being • too close to other ◦, or too far from other × • wrong by the current classifier • . . . • possible action 1: correct the label (data cleaning) • possible action 2: remove the example (data pruning) possibly helps, but the effect varies Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 81/128
  • 106. Hazard of Overfitting Data Manipulation and Regularization Data Hinting • slightly shifted/rotated digits carry the same meaning • possible action: add virtual examples by shifting/rotating the given digits (data hinting) possibly helps, but watch out not to steal the thunder Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 82/128
  • 107. Hazard of Overfitting Data Manipulation and Regularization Regularization: The Magic • idea: 'step back' from 10th-order polynomials to 2nd-order ones, within the nested hypothesis sets H_0 ⊂ H_1 ⊂ H_2 ⊂ H_3 ⊂ · · · • name history: function approximation for ill-posed problems how to step back? Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 83/128
  • 108. Hazard of Overfitting Data Manipulation and Regularization Step Back by Minimizing the Augmented Error Augmented Error: E_aug(w) = E_in(w) + (λ/N) w^T w VC Bound: E_out(w) ≤ E_in(w) + Ω(H) • regularizer w^T w: complexity of a single hypothesis • generalization price Ω(H): complexity of a hypothesis set • if (λ/N) w^T w 'represents' Ω(H) well, E_aug is a better proxy of E_out than E_in minimizing E_aug: (heuristically) operating with the better proxy; (technically) enjoying flexibility of whole H Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 84/128
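For squared error, minimizing E_aug has a closed form analogous to the pseudo-inverse solution, w_REG = (X^T X + λI)^{−1} X^T y (the standard ridge-regression formula); a minimal sketch:

    import numpy as np

    def ridge_regression(X, y, lam):
        # minimizer of E_aug(w) = E_in(w) + (lambda/N) w^T w for squared error:
        # w_REG = (X^T X + lambda * I)^{-1} X^T y
        d1 = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d1), X.T @ y)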
  • 109. Hazard of Overfitting Data Manipulation and Regularization The Optimal λ [figure: expected E_out versus regularization parameter λ for stochastic noise levels σ² = 0, 0.25, 0.5] • more noise ⇐⇒ more regularization needed —more bumpy road ⇐⇒ braking more • noise unknown—important to make proper choices how to choose? validation! Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 85/128
  • 110. Hazard of Overfitting Validation Hazard of Overfitting :: Validation Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 86/128
  • 111. Hazard of Overfitting Validation Model Selection Problem [figure: boundaries from H_1 and H_2—which one do you prefer? :-)] • given: M models H_1, H_2, . . . , H_M, each with corresponding algorithm A_1, A_2, . . . , A_M • goal: select H_{m*} such that g_{m*} = A_{m*}(D) is of low E_out(g_{m*}) • unknown E_out, as always :-) • arguably the most important practical problem of ML how to select? visually? —no, can you really visualize ℝ^1126? :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 87/128
  • 112. Hazard of Overfitting Validation Validation Set Dval split D (size N) into D_train (size N−K) ∪ D_val (size K); g_m = A_m(D) comes with E_in(h), while g_m^− = A_m(D_train) is evaluated by E_val(h) • D_val ⊂ D: called validation set—'on-hand' simulation of test set • to connect E_val with E_out: select the K examples from D at random • to keep D_val 'clean': feed only D_train to A_m for model selection E_out(g_m^−) ≤ E_val(g_m^−) + 'small' Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 88/128
  • 113. Hazard of Overfitting Validation Model Selection by Best Eval m* = argmin_{1≤m≤M} (E_m = E_val(A_m(D_train))) • generalization guarantee for all m: E_out(g_m^−) ≤ E_val(g_m^−) + 'small' • heuristic gain from N − K to N: re-run the chosen model on all of D, so E_out(g_{m*} = A_{m*}(D)) ≤ E_out(g_{m*}^− = A_{m*}(D_train)) overall: E_out(g_{m*}) ≤ E_out(g_{m*}^−) ≤ E_val(g_{m*}^−) + 'small' Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 89/128
  • 114. Hazard of Overfitting Validation V-fold Cross Validation making validation more stable • V-fold cross-validation: random partition of D into V equal parts D_1, . . . , D_V; take V − 1 parts for training and 1 for validation, orderly: E_cv(H, A) = (1/V) Σ_{v=1}^{V} E_val^{(v)}(g_v^−) • selection by E_cv: m* = argmin_{1≤m≤M} (E_m = E_cv(H_m, A_m)) practical rule of thumb: V = 10 Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 90/128
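A minimal sketch of E_cv, assuming train_fn returns a hypothesis and error_fn evaluates it; both are placeholders to be supplied by the caller:

    import numpy as np

    def cross_validation_error(X, y, train_fn, error_fn, V=10, seed=0):
        # E_cv = (1/V) * sum of validation errors over V random folds
        idx = np.random.default_rng(seed).permutation(len(y))
        folds = np.array_split(idx, V)
        errs = []
        for v in range(V):
            trn = np.concatenate([folds[u] for u in range(V) if u != v])
            g = train_fn(X[trn], y[trn])                        # train on V-1 parts
            errs.append(error_fn(g, X[folds[v]], y[folds[v]]))  # validate on 1 part
        return float(np.mean(errs))

    # model selection: m_star = argmin over models of cross_validation_error(...)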
  • 115. Hazard of Overfitting Validation Final Words on Validation ‘Selecting’ Validation Tool • V-Fold generally preferred over single validation if computation allows • 5-Fold or 10-Fold generally works well Nature of Validation • all training models: select among hypotheses • all validation schemes: select among finalists • all testing methods: just evaluate validation still more optimistic than testing do not fool yourself and others :-), report test result, not best validation result Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 91/128
  • 116. Hazard of Overfitting Principles of Learning Hazard of Overfitting :: Principles of Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 92/128
  • 117. Hazard of Overfitting Principles of Learning Occam's Razor for Learning The simplest model that fits the data is also the most plausible. which one do you prefer? :-) My KISS Principle: Keep It Simple, Safe (with 'Stupid' crossed out) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 93/128
  • 118. Hazard of Overfitting Principles of Learning Sampling Bias If the data is sampled in a biased way, learning will produce a similarly biased outcome. philosophical explanation: studying Math hard but being tested on English: no strong test guarantee A True Personal Story • Netflix competition for movie recommender system: 10% improvement = 1M US dollars • on my own validation data, first shot, showed 13% improvement • why am I still teaching here? :-) validation: random examples within data; test: 'last' user records 'after' data practical rule of thumb: match test scenario as much as possible Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 94/128
  • 119. Hazard of Overfitting Principles of Learning Visual Data Snooping If a data set has affected any step in the learning process, its ability to assess the outcome has been compromised. Visualize X = ℝ² • full Φ_2: z = (1, x_1, x_2, x_1², x_1x_2, x_2²), d_VC = 6 • or z = (1, x_1², x_2²), d_VC = 3, after visualizing? • or better z = (1, x_1² + x_2²), d_VC = 2? • or even better z = (sign(0.6 − x_1² − x_2²))? —careful about your brain's 'model complexity' if you torture the data long enough, it will confess :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 95/128
  • 120. Hazard of Overfitting Principles of Learning Dealing with Data Snooping • truth—very hard to avoid, unless being extremely honest • extremely honest: lock your test data in a safe • less honest: reserve validation data and use it cautiously • be blind: avoid making modeling decisions by looking at data • be suspicious: interpret research results (including your own) with a proper feeling of contamination one secret to winning KDDCups: careful balance between data-driven modeling (snooping) and validation (no-snooping) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 96/128
  • 121. Hazard of Overfitting Principles of Learning Mini-Summary Hazard of Overfitting Overfitting the ‘accident’ that is everywhere in learning Data Manipulation and Regularization clean data, synthetic data, or augmented error Validation honestly simulate testing procedure for proper model selection Principles of Learning simple model, matching test scenario, and no snooping Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 97/128
  • 122. Modern Machine Learning Models Roadmap Modern Machine Learning Models Support Vector Machine Random Forest Adaptive (or Gradient) Boosting Deep Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 98/128
  • 123. Modern Machine Learning Models Support Vector Machine Modern Machine Learning Models :: Support Vector Machine Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 99/128
  • 125. Modern Machine Learning Models Support Vector Machine Motivation: Large-Margin Separating Hyperplane max_w margin(w) subject to every y_n w^T x_n > 0, where margin(w) = min_{n=1,...,N} distance(x_n, w) • fatness: formally called margin • correctness: y_n = sign(w^T x_n) initial goal: find largest-margin separating hyperplane Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 100/128
  • 126. Modern Machine Learning Models Support Vector Machine Soft-Margin Support Vector Machine initial goal: find largest-margin separating hyperplane • soft-margin (practical) SVM: not insisting on separating: • minimize large-margin regularizer + C · separation error, • just like regularization with augmented error min E_aug(w) = E_in(w) + (λ/N) w^T w • two forms: • finding hyperplane in original space (linear first!!) LIBLINEAR www.csie.ntu.edu.tw/~cjlin/liblinear • or in mysterious transformed space hidden in 'kernels' LIBSVM www.csie.ntu.edu.tw/~cjlin/libsvm linear: 'best' linear classification model; non-linear: 'leading' non-linear classification model for mid-sized data Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 101/128
  • 127. Modern Machine Learning Models Support Vector Machine Hypothesis of Gaussian SVM Gaussian kernel K(x, x′) = exp(−γ‖x − x′‖²) g_SVM(x) = sign(Σ_{SV} α_n y_n K(x_n, x) + b) = sign(Σ_{SV} α_n y_n exp(−γ‖x − x_n‖²) + b) • linear combination of Gaussians centered at SVs x_n • also called Radial Basis Function (RBF) kernel Gaussian SVM: find α_n to combine Gaussians centered at x_n & achieve large margin in infinite-dim. space Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 102/128
  • 128. Modern Machine Learning Models Support Vector Machine Support Vector Mechanism large-margin hyperplanes + higher-order transforms with kernel trick + noise tolerance of soft-margin =⇒ sophisticated boundaries built from not many support vectors • transformed vector z = Φ(x) =⇒ efficient kernel K(x, x′) • storing optimal w =⇒ storing a few SVs and α_n new possibility by Gaussian SVM: infinite-dimensional linear classification, with generalization 'guarded by' large margin :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 103/128
  • 129. Modern Machine Learning Models Support Vector Machine Practical Need: Model Selection • large γ =⇒ sharp Gaussians =⇒ 'overfit'? • complicated even for (C, γ) of Gaussian SVM • more combinations if including other kernels or parameters how to select? validation :-) Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 104/128
  • 130. Modern Machine Learning Models Support Vector Machine Step-by-step Use of SVM strongly recommended: 'A Practical Guide to Support Vector Classification' http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf 1 scale each feature of your data to a suitable range (say, [−1, 1]) 2 use a Gaussian RBF kernel 3 use cross validation and grid search to choose good (γ, C) 4 train on your data with the best (γ, C) 5 do testing with the learned SVM classifier all included in easy.py of LIBSVM Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 105/128
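One possible rendering of these five steps using scikit-learn (whose SVC wraps LIBSVM); the grid values echo the coarse grid suggested in the guide, and the variable names are illustrative:

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC

    # step 1: scale to [-1, 1]; step 2: Gaussian (RBF) kernel;
    # steps 3-4: cross validation + grid search over (gamma, C)
    pipe = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))
    grid = {"svc__C": [2.0**k for k in range(-5, 16, 2)],
            "svc__gamma": [2.0**k for k in range(-15, 4, 2)]}
    search = GridSearchCV(pipe, grid, cv=5)
    # search.fit(X_train, y_train); then step 5: search.score(X_test, y_test)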
  • 131. Modern Machine Learning Models Random Forest Modern Machine Learning Models :: Random Forest Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 106/128
  • 132. Modern Machine Learning Models Random Forest Random Forest (RF) random forest (RF) = bagging (random sampling) + fully-grown C&RT decision trees function RandomForest(D): For t = 1, 2, . . . , T 1 request size-N data D̃_t by bootstrapping with D 2 obtain tree g_t by DTree(D̃_t), the recursive tree-building routine from before return G = Uniform({g_t}) • highly parallel/efficient to learn • inherits pros of C&RT • eliminates cons of fully-grown tree Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 107/128
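A rough bagging sketch reusing build_tree, predict, and majority from the C&RT sketch earlier; note it omits the per-split random feature subsampling that full RF implementations typically add, and the parameters are illustrative:

    import numpy as np

    def random_forest(X, y, T=100, seed=0):
        # bagging: T bootstrap samples of size N, one deep tree per sample
        rng = np.random.default_rng(seed)
        N, trees = len(y), []
        for _ in range(T):
            idx = rng.integers(0, N, size=N)   # bootstrap: sample N with replacement
            trees.append(build_tree(X[idx], y[idx], max_depth=20))
        return trees

    def forest_predict(trees, x):
        # G = Uniform({g_t}): majority vote over the T trees
        return majority(np.array([predict(t, x) for t in trees]))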
  • 133. Modern Machine Learning Models Random Forest Feature Selection for x = (x_1, x_2, . . . , x_d), want to remove • redundant features: like keeping only one of 'age' and 'full birthday' • irrelevant features: like insurance type for cancer prediction and only 'learn' the subset-transform Φ(x) = (x_{i_1}, x_{i_2}, . . . , x_{i_{d′}}) with d′ ≪ d for g(Φ(x)) advantages: • efficiency: simpler hypothesis and shorter prediction time • generalization: 'feature noise' removed • interpretability disadvantages: • computation: 'combinatorial' optimization in training • overfit: 'combinatorial' selection • mis-interpretability decision tree: a rare model with built-in feature selection Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 108/128
  • 134. Modern Machine Learning Models Random Forest Feature Selection by Importance idea: if possible to calculate importance(i) for i = 1, 2, . . . , d, then can select the i_1, i_2, . . . , i_{d′} of top-d′ importance importance by linear model: score = w^T x = Σ_{i=1}^{d} w_i x_i • intuitive estimate: importance(i) = |w_i| with some 'good' w • getting 'good' w: learned from data • non-linear models? often much harder but 'easy' feature selection in RF Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 109/128
  • 135. Modern Machine Learning Models Random Forest Feature Importance by Permutation Test idea: random test —if feature i is needed, 'random' values of x_{n,i} degrade performance permutation test: importance(i) = performance(D) − performance(D^{(p)}), where D^{(p)} is D with the values {x_{n,i}}_{n=1}^{N} permuted permutation test: a general statistical tool that can be easily coupled with RF Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 110/128
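A minimal sketch of the permutation test, assuming score_fn(X, y) is a caller-supplied performance function where higher is better:

    import numpy as np

    def permutation_importance(score_fn, X, y, seed=0):
        # importance(i) = performance(D) - performance(D with column i permuted)
        rng = np.random.default_rng(seed)
        base = score_fn(X, y)
        imp = np.zeros(X.shape[1])
        for i in range(X.shape[1]):
            Xp = X.copy()
            Xp[:, i] = rng.permutation(Xp[:, i])   # scramble feature i only
            imp[i] = base - score_fn(Xp, y)
        return imp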
  • 140. Modern Machine Learning Models Random Forest A Complicated Data Set [animation: individual trees g_t (each on N′ = N/2 examples) and the ensemble G with the first t trees] 'easy yet robust' nonlinear model Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 111/128
  • 141. Modern Machine Learning Models Adaptive (or Gradient) Boosting Modern Machine Learning Models :: Adaptive (or Gradient) Boosting Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 112/128
  • 142. Modern Machine Learning Models Adaptive (or Gradient) Boosting Apple Recognition Problem • is this a picture of an apple? • say, we want to teach a class of six-year-olds • gather photos under CC-BY-2.0 license on Flickr (thanks to the authors below!) (APAL stands for Apple and Pear Australia Ltd) Dan Foy https://flic.kr/p/jNQ55, APAL https://flic.kr/p/jzP1VB, adrianbartel https://flic.kr/p/bdy2hZ, ANdrzej cH. https://flic.kr/p/51DKA8, Stuart Webster https://flic.kr/p/9C3Ybd, nachans https://flic.kr/p/9XD7Ag, APAL https://flic.kr/p/jzRe4u, Jo Jakeman https://flic.kr/p/7jwtGp, APAL https://flic.kr/p/jzPYNr, APAL https://flic.kr/p/jzScif Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 113/128
  • 143. Modern Machine Learning Models Adaptive (or Gradient) Boosting Apple Recognition Problem • is this a picture of an apple? • say, we want to teach a class of six-year-olds • gather photos under CC-BY-2.0 license on Flickr (thanks to the authors below!) Mr. Roboto. https://flic.kr/p/i5BN85, Richard North https://flic.kr/p/bHhPkB, Richard North https://flic.kr/p/d8tGou, Emilian Robert Vicol https://flic.kr/p/bpmGXW, Nathaniel McQueen https://flic.kr/p/pZv1Mf, Crystal https://flic.kr/p/kaPYp, jfh686 https://flic.kr/p/6vjRFH, skyseeker https://flic.kr/p/2MynV, Janet Hudson https://flic.kr/p/7QDBbm, Rennett Stowe https://flic.kr/p/agmnrk Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 113/128
  • 144. Modern Machine Learning Models Adaptive (or Gradient) Boosting Our Fruit Class Begins • Teacher: Please look at the pictures of apples and non-apples below. Based on those pictures, how would you describe an apple? Michael? • Michael: I think apples are circular. (Class): Apples are circular. Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 114/128
  • 145. Modern Machine Learning Models Adaptive (or Gradient) Boosting Our Fruit Class Continues • Teacher: Being circular is a good feature for the apples. However, if you only say circular, you could make several mistakes. What else can we say for an apple? Tina? • Tina: It looks like apples are red. (Class): Apples are somewhat circular and somewhat red. Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 115/128
  • 146. Modern Machine Learning Models Adaptive (or Gradient) Boosting Our Fruit Class Continues More • Teacher: Yes. Many apples are red. However, you could still make mistakes based on circular and red. Do you have any other suggestions, Joey? • Joey: Apples could also be green. (Class): Apples are somewhat circular and somewhat red and possibly green. Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 116/128
  • 147. Modern Machine Learning Models Adaptive (or Gradient) Boosting Our Fruit Class Ends • Teacher: Yes. It seems that apples might be circular, red, green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica? • Jessica: Apples have stems at the top. (Class): Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top. Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 117/128
  • 148. Modern Machine Learning Models Adaptive (or Gradient) Boosting Motivation • students: simple hypotheses g_t (like vertical/horizontal lines) • (Class): sophisticated hypothesis G (like the black curve) • Teacher: a tactical learning algorithm that directs the students to focus on key examples next: demo of such an algorithm Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 118/128
  • 155. Modern Machine Learning Models Adaptive (or Gradient) Boosting A Simple Data Set [animation: simple hypotheses accumulated over iterations on a simple data set] 'Teacher'-like algorithm works! Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 119/128
  • 156. Modern Machine Learning Models Adaptive (or Gradient) Boosting Putting Everything Together Gradient Boosted Decision Tree (GBDT) initialize s_1 = s_2 = . . . = s_N = 0 for t = 1, 2, . . . , T 1 obtain g_t by A({(x_n, y_n − s_n)}), where A is a (squared-error) regression algorithm —such as a 'weak' C&RT? 2 compute α_t = OneVarLinearRegression({(g_t(x_n), y_n − s_n)}) 3 update s_n ← s_n + α_t g_t(x_n) return G(x) = Σ_{t=1}^{T} α_t g_t(x) GBDT: 'regression sibling' of AdaBoost + decision tree —very popular in practice Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 120/128
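A compact sketch of this loop, with decision stumps standing in for 'weak' C&RT regression trees; all names and defaults are illustrative:

    import numpy as np

    def fit_stump(X, r):
        # weak squared-error regressor: threshold one dimension,
        # predict the mean residual on each side
        best = None
        for i in range(X.shape[1]):
            for t in np.unique(X[:, i]):
                m = X[:, i] < t
                if m.all() or not m.any():
                    continue
                pred = np.where(m, r[m].mean(), r[~m].mean())
                sse = np.sum((r - pred) ** 2)
                if best is None or sse < best[0]:
                    best = (sse, i, t, r[m].mean(), r[~m].mean())
        _, i, t, lo, hi = best
        return lambda Z: np.where(Z[:, i] < t, lo, hi)

    def gbdt(X, y, T=50):
        s, G = np.zeros(len(y)), []
        for _ in range(T):
            g = fit_stump(X, y - s)              # step 1: fit the residuals
            gx = g(X)
            alpha = (gx @ (y - s)) / (gx @ gx)   # step 2: one-variable regression
            s += alpha * gx                      # step 3: update the scores
            G.append((alpha, g))
        return lambda Z: sum(a * g(Z) for a, g in G)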
  • 157. Modern Machine Learning Models Deep Learning Modern Machine Learning Models :: Deep Learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 121/128
  • 158. Modern Machine Learning Models Deep Learning Physical Interpretation of Neural Network [figure: inputs x_0 = 1, x_1, . . . , x_d feeding layers of tanh neurons through weights w_{ij}^{(1)}, w_{jk}^{(2)}, w_{kq}^{(3)}] • each layer: pattern feature extracted from data • how many neurons? how many layers? —more generally, what structure? • subjectively, your design! • objectively, validation, maybe? structural decisions: key issue for applying NNet Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 122/128
  • 159. Modern Machine Learning Models Deep Learning Shallow versus Deep Neural Networks shallow: few (hidden) layers; deep: many layers Shallow NNet • more efficient to train (✓) • simpler structural decisions (✓) • theoretically powerful enough (✓) Deep NNet • challenging to train (×) • sophisticated structural decisions (×) • 'arbitrarily' powerful (✓) • more 'meaningful'? (see next slide) deep NNet (deep learning) became popular in recent years Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 123/128
  • 160. Modern Machine Learning Models Deep Learning Meaningfulness of Deep Learning [figure: recognizing '1' versus '5' by composing simple stroke features φ_1, . . . , φ_6 with positive/negative weights] • 'less burden' for each layer: simple to complex features/patterns • natural for difficult learning tasks with raw features, like vision deep NNet: currently popular in vision/speech/. . . Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 124/128
  • 161. Modern Machine Learning Models Deep Learning Challenges and Key Techniques for Deep Learning • difficult structural decisions: • sometimes subjective, aided by domain knowledge • high model complexity: • no big worries given big enough data • regularization towards noise tolerance, like • dropout (tolerant when network corrupted) • denoising (tolerant when input corrupted) • hard optimization problem: • careful initialization to avoid bad local minima, e.g. pre-trained models • huge computational complexity (worsened by big data): • novel hardware/architecture, like mini-batch with GPU structural and regularization techniques are consistently being developed Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 125/128
  • 162. Modern Machine Learning Models Deep Learning Structural Decision in Convolutional Neural Network • special nodes consisting of • weighted sum on local inputs (called convolution) • nonlinear transform function • down-sampling (called pooling) • mimics 'domain knowledge' about human vision arguably the earliest success of deep learning Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 126/128
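A toy numpy sketch of one convolution + pooling stage, just to make the two operations concrete; the kernel and shapes are arbitrary choices, not from the slides:

    import numpy as np

    def conv2d(img, kernel):
        # weighted sum on local inputs ('valid' 2-D convolution, no flipping)
        k = kernel.shape[0]
        H, W = img.shape
        out = np.empty((H - k + 1, W - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
        return out

    def max_pool(x, p=2):
        # down-sampling: keep the max of each p-by-p patch
        H, W = x.shape
        x = x[:H - H % p, :W - W % p]
        return x.reshape(H // p, p, W // p, p).max(axis=(1, 3))

    # one 'CNN layer': convolution -> nonlinearity -> pooling
    img = np.random.rand(8, 8)
    feat = max_pool(np.tanh(conv2d(img, np.ones((3, 3)) / 9.0)))  # shape (3, 3)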
  • 163. Modern Machine Learning Models Deep Learning Mini-Summary Modern Machine Learning Models Support Vector Machine large-margin boundary ranging from linear to non-linear Random Forest uniform blending of many many decision trees Adaptive (or Gradient) Boosting keep adding simple hypotheses to gang Deep Learning neural network with deep architecture and careful design Thank you!! Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 127/128
  • 164. Hsuan-Tien Lin (Appier) Quick Tour of Machine Learning 128/128