This document discusses the changing role of human scientists in an era where metahuman science has advanced far beyond human comprehension. It outlines how human scientists have shifted from conducting original research to interpreting and analyzing the work of metahumans through hermeneutic approaches like textual analysis of publications, reverse engineering of technological artifacts, and remote sensing of research facilities. While some see these as a waste of time, the document argues they are worthwhile pursuits that continue scientific inquiry and increase human knowledge, and may even uncover applications not considered by metahumans.
1. Machine Learning: Introduction
Book reading: 2014 summer
Jinseob Kim
GSPH, SNU
October 18, 2014
Jinseob Kim (GSPH, SNU) Machine Learning: Introduction October 18, 2014 1 / 55
2. What is Machine Learning?
A branch of artificial intelligence in which machines learn from data in order to make predictions.
Computer science + Statistics?
Amazon, Google, Facebook...
3. Machine learning and AI in the Korean press
http://www.dt.co.kr/contents.html?article_no=2014062002010960718002
http://vip.mk.co.kr/news/view/21/20/1178659.html
http://www.bloter.net/archives/196341
http://www.wikitree.co.kr/main/news_view.php?id=157174
http://weekly.chosun.com/client/news/viw.asp?nNewsNumb=002311100009ctcd=C02
4. Overview
Contents
1 Overview
Interpretation vs Prediction
Types of Machine Learning
Techniques
2 Book Reading Plan
5. Overview Interpretation vs Prediction
Objectives of statistics
1 Expanding knowledge: causal inference (the statistician Pearson)
2 Prediction (the statistician R. A. Fisher: choose the best-performing model)
6. Overview Interpretation vs Prediction
Statistics in Epidemiology
Causal inference: what is the cause?
Interpretable models are what matter; causal relationships matter.
Simple models are preferred.
The units of the covariates matter (Kilometer VS meter, the centering issue).
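The kilometer-vs-meter point can be made concrete with a small sketch (my own illustration, not from the slides): rescaling a covariate rescales its coefficient by the same factor while leaving the fit itself unchanged.

```python
# Illustration: the same regression fit with the predictor in meters vs
# kilometers only rescales the slope; the model's fit does not change.
import numpy as np

rng = np.random.default_rng(0)
distance_m = rng.uniform(100, 5000, size=200)        # predictor in meters
y = 0.002 * distance_m + rng.normal(0, 1, size=200)  # simulated outcome

def ols_slope(x, y):
    """Slope from an intercept + slope least-squares fit."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

slope_m = ols_slope(distance_m, y)
slope_km = ols_slope(distance_m / 1000, y)  # same data, units in km

# The km slope is exactly 1000x the meter slope: the interpretation of
# the coefficient changes with the unit, the fitted values do not.
print(slope_km / slope_m)
```

The ratio comes out as 1000 up to floating-point error, which is why centering and unit choices matter for interpretation but not for prediction.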
30. Hazard Ratio (HR)
31. Overview Interpretation vs Prediction
Hazard Ratio
Easy to interpret; similar to the Odds Ratio.
But it relies on many assumptions.
The likelihood is complicated, so computation is hard.
Conditional Logistic Regression...
For prediction, there is no need to insist on the Cox model.
32. Overview Interpretation vs Prediction
Alternatives
$Y_i$: time of event

Not censored:
$p(y_i \mid \mu_i, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\{-(y_i - \mu_i)^2 / (2\sigma^2)\}$

Censored:
$p(y_i \ge t_i \mid \mu_i, \sigma^2) = \int_{t_i}^{\infty} (2\pi\sigma^2)^{-1/2} \exp\{-(y_i - \mu_i)^2 / (2\sigma^2)\}\, dy_i = \Phi\!\left(\frac{\mu_i - t_i}{\sigma}\right)$

Only the CDF of the normal distribution is needed: computation is easy!
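The censored-likelihood identity can be checked numerically. The sketch below (an illustration with hypothetical parameter values, using SciPy) confirms that the survival probability of a censored observation reduces to a single normal-CDF evaluation.

```python
# For a censored observation under a normal model, the likelihood
# contribution P(Y_i >= t_i) is just the standard normal CDF Phi((mu-t)/sigma),
# with no integration required. Verify against brute-force integration.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma, t = 2.0, 1.5, 3.0  # hypothetical mean, sd, censoring time

# Closed form via the normal CDF
p_cdf = norm.cdf((mu - t) / sigma)

# Brute-force numerical integration of the normal density over [t, inf)
p_int, _ = quad(lambda y: norm.pdf(y, loc=mu, scale=sigma), t, np.inf)

print(p_cdf, p_int)  # the two values agree
```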
33. Overview Interpretation vs Prediction
Example 3: Correlation Structure
Example: should the pedigree structure be taken into account?
1 Genome-Wide Association Study (GWAS): Important
The standard errors change, and therefore the p-values change.
2 Prediction model: Not important
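As a hypothetical illustration of where the per-SNP beta, SE, and p-value in a GWAS table come from, the sketch below fits a simple OLS model to simulated genotypes. It deliberately ignores any pedigree correlation, which is precisely what would alter the SE and p-value (but barely the beta) in the comparison on the next slide.

```python
# Hypothetical single-SNP association test (a sketch, not the actual
# analysis pipeline behind the table): OLS of phenotype on allele count
# yields the beta, SE, and p-value reported per SNP.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1800
genotype = rng.integers(0, 3, size=n).astype(float)    # 0/1/2 allele counts
phenotype = -3.0 * genotype + rng.normal(0, 40, size=n)

X = np.column_stack([np.ones(n), genotype])
beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
resid = phenotype - X @ beta
df = n - 2
sigma2 = resid @ resid / df                             # residual variance
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])     # SE of the SNP effect
t_stat = beta[1] / se
p_value = 2 * stats.t.sf(abs(t_stat), df)               # two-sided p-value

print(f"beta={beta[1]:.2f}  SE={se:.2f}  p={p_value:.3g}")
```

Modeling the pedigree (e.g., with a mixed model) changes `se` and hence `p_value`, which matters for inference but not much for a prediction model built from the same fit.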
36. Overview Interpretation vs Prediction
Our data
SNP  Chromosome  Position  A1  A2  N  Beta  SE  P  Beta(FASTA)  SE(FASTA)  P(FASTA)
rs2801233 21 13525448 T C 1799 -2.78 2.45 0.258 -3.05 2.62 0.244
rs2801294 21 13557024 C G 1830 -2.12 2.78 0.447 -1.94 2.95 0.510
rs2260895 21 13564335 C T 1815 -3.04 2.77 0.273 -2.79 2.94 0.343
rs2821796 21 13571669 A C 1833 -6.13 2.45 0.012 -6.29 2.59 0.015
rs2742182 21 13587844 T C 1819 -2.29 2.77 0.407 -2.18 2.93 0.458
rs2259207 21 13598778 T C 1804 -3.35 3.03 0.269 -4.45 3.17 0.160
rs2259403 21 13615252 G A 1818 -6.07 2.48 0.014 -6.08 2.60 0.020
rs2821847 21 13689440 A G 1817 -2.10 2.87 0.463 -2.10 2.98 0.482
rs2821849 21 13691411 T C 1816 -1.74 2.72 0.522 -1.13 2.82 0.688
rs2747265 21 13696956 C G 1819 -18.96 10.75 0.078 -14.97 11.27 0.184
Table. No pedigree VS pedigree (FASTA): TG-GWAS
37. Overview Interpretation vs Prediction
Figure. A representation of the tradeoff between flexibility and interpretability, using different statistical learning methods. In general, as the flexibility of a method increases, its interpretability decreases [3].
39. Overview Interpretation vs Prediction
Human VS metahuman [1]
Ted Chiang: science-fiction writer.
The overwhelming knowledge-producing ability of the metahumans (artificial intelligence).
Human science: work at the level of understanding what the metahumans have created.
Interpreting the metahumans' papers is what human science becomes...
40. Overview Types of Machine Learning
Types of machine learning
Supervised learning
Labeled data
Regression, classification...
Unsupervised learning
Unlabeled data
Semi-supervised learning
Labeled + unlabeled data (ex: censored data, missing data)
Reinforcement learning
Reward
Etc...
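The contrast between the first two types can be sketched as follows (an illustration with scikit-learn and simulated data, not part of the slides): the supervised model consumes labels, while the unsupervised one sees only the features.

```python
# Supervised vs unsupervised learning on the same data: the classifier
# is fit on (X, y); the clustering algorithm is fit on X alone.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)   # labels exist only for the supervised model

clf = LogisticRegression().fit(X, y)                          # supervised
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)   # unsupervised

print(clf.score(X, y))  # near-perfect accuracy on separable blobs
```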
42. Overview Types of Machine Learning
http://www.astroml.org/sklearn_tutorial/general_concepts.html
43. Overview Types of Machine Learning
44. Overview Types of Machine Learning
Shah A R et al. Bioinformatics 2008;24:783-790[8]
45. Overview Types of Machine Learning
http://www.cns.atr.jp/cnb/crp/
46. Overview Types of Machine Learning
http://www2.hawaii.edu/~chenx/ics699rl/grid/rl.html
47. Overview Techniques
Techniques
k-Nearest Neighbors (kNN)
Neural Network
K-Means Clustering
Principal Component Analysis
Trees (Bagging, Boosting, Ensemble)
Support Vector Machine
Naive Bayes
Etc...
48. Overview Techniques
k-Nearest Neighbors (kNN)
useR 2014 tutorial: Applied Predictive Modeling
http://appliedpredictivemodeling.com/s/Applied_Predictive_Modeling_in_R.pdf
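The linked tutorial is in R; a minimal Python equivalent of a kNN fit (with made-up one-dimensional data) might look like:

```python
# Toy k-nearest-neighbors classification: predict by majority vote among
# the k closest training points.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D training data: class 0 clustered near 0, class 1 near 10
X_train = np.array([[0.0], [1.0], [2.0], [9.0], [10.0], [11.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# A query at 1.5 has nearest neighbors {1.0, 2.0, 0.0}, all class 0;
# a query at 9.5 has nearest neighbors {9.0, 10.0, 11.0}, all class 1.
print(knn.predict([[1.5], [9.5]]))  # [0 1]
```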
49. Overview Techniques
Neural Network
Human brain VS computer
3431 x 3324 = ??
Recognizing pictures and shapes, speech recognition, character recognition
Sequential VS parallel
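A single artificial neuron, the brain-inspired building block behind this slide, can be sketched from scratch (an illustration with arbitrary weights): each unit computes a weighted sum of its inputs followed by a nonlinearity, and networks stack many such units in parallel.

```python
# One artificial neuron: weighted sum of inputs plus bias, passed
# through a sigmoid activation.
import numpy as np

def neuron(x, w, b):
    """Weighted sum + sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([1.0, 0.0])   # two inputs
w = np.array([2.0, -1.0])  # hypothetical weights
b = -1.0                   # bias
print(neuron(x, w, b))     # sigmoid(2*1 - 1*0 - 1) = sigmoid(1) ~ 0.731
```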
56. Overview Techniques
Papers [5, 2]
Building High-level Features
Using Large Scale Unsupervised Learning
Quoc V. Le quocle@cs.stanford.edu
Marc’Aurelio Ranzato ranzato@google.com
Rajat Monga rajatmonga@google.com
Matthieu Devin mdevin@google.com
Kai Chen kaichen@google.com
Greg S. Corrado gcorrado@google.com
Jeff Dean jeff@google.com
Andrew Y. Ng ang@cs.stanford.edu
Abstract

We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200x200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 22,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

1. Introduction

The focus of this work is to build high-level, class-specific feature detectors from unlabeled images. For instance, we would like to understand if it is possible to build a face detector from only unlabeled images. This approach is inspired by the neuroscientific conjecture that there exist highly class-specific neurons in the human brain, generally and informally known as "grandmother neurons." The extent of class-specificity of neurons in the brain is an area of active investigation, but current experimental evidence suggests the possibility that some neurons in the temporal cortex are highly selective for object categories such as faces or hands (Desimone et al., 1984), and perhaps even specific people (Quiroga et al., 2005).

Contemporary computer vision methodology typically emphasizes the role of labeled data to obtain these class-specific feature detectors. For example, to build a face detector, one needs a large collection of images labeled as containing faces, often with a bounding box around the face. The need for large labeled sets poses a significant challenge for problems where labeled data are rare. Although approaches that make use of inexpensive unlabeled data are often preferred, they have not been shown to work well for building high-level features.

This work investigates the feasibility of building high-level features from only unlabeled data. A positive answer to this question will give rise to two significant results. Practically, this provides an inexpensive way to develop features from unlabeled data. But perhaps more importantly, it answers an intriguing question as to whether the specificity of the "grandmother neuron" could possibly be learned from unlabeled data. Informally, this would suggest that it is at least in principle possible that a baby learns to group faces into one class
Deep learning with COTS HPC systems
Adam Coates acoates@cs.stanford.edu
Brody Huval brodyh@stanford.edu
Tao Wang twangcat@stanford.edu
David J. Wu dwu4@cs.stanford.edu
Andrew Y. Ng ang@cs.stanford.edu
Stanford University Computer Science Dept., 353 Serra Mall, Stanford, CA 94305 USA
Bryan Catanzaro bcatanzaro@nvidia.com
NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, CA 95050
Abstract

Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: WCP volume 28. Copyright 2013 by the author(s).

1. Introduction

A significant amount of effort has been put into developing deep learning systems that can scale to very large models and large training sets. With each leap in scale new results proliferate: large models in the literature are now top performers in supervised visual recognition tasks (Krizhevsky et al., 2012; Ciresan et al., 2012; Le et al., 2012), and can even learn to detect objects when trained from unlabeled images alone (Coates et al., 2012; Le et al., 2012). The very largest of these systems has been constructed by Le et al. (Le et al., 2012) and Dean et al. (Dean et al., 2012), which is able to train neural networks with over 1 billion trainable parameters. While such extremely large networks are potentially valuable objects of AI research, the expense to train them is overwhelming: the distributed computing infrastructure (known as "DistBelief") used for the experiments in (Le et al., 2012) manages to train a neural network using 16000 CPU cores (in 1000 machines) in just a few days, yet this level of resource is likely beyond those available to most deep learning researchers. Less clear still is how to continue scaling significantly beyond this size of network. In this paper we present an alternative approach to training such networks that leverages inexpensive computing power in the form of GPUs and introduces the use of high-speed communications infrastructure to tightly coordinate distributed gradient computations. Our system trains neural networks at scales comparable to DistBelief with just 3 machines. We demonstrate the ability to train a network with more than 11 billion parameters, 6.5 times larger than the model in (Dean et al., 2012), in only a few days with 2% as many machines.

Buoyed by many empirical successes (Uetz & Behnke, 2009; Raina et al., 2009; Ciresan et al., 2012; Krizhevsky, 2010; Coates et al., 2011) much deep learning research has focused on the goal of building larger models with more parameters. Though some techniques (such as locally connected networks (LeCun et al., 1989; Raina et al., 2009; Krizhevsky, 2010), and improved optimizers (Martens, 2010; Le et al., 2011)) have enabled scaling by algorithmic advantage, another main approach has been to achieve scale
71. Book Reading Plan
Contents
1 Overview
Interpretation vs Prediction
Types of Machine Learning
Techniques
2 Book Reading Plan
77. Book Reading Plan
Reference I
[1] Chiang, T. (2000). Catching crumbs from the table. Nature, 405(6786):517.
[2] Coates, A., Huval, B., Wang, T., Wu, D., Catanzaro, B., and Andrew, N. (2013). Deep learning with COTS HPC systems. In Proceedings of The 30th International Conference on Machine Learning, pages 1337-1345.
[3] James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer.
[4] Kuhn, M. and Johnson, K. (2013). Applied Predictive Modeling. Springer.
[5] Le, Q. V. (2013). Building high-level features using large scale unsupervised learning. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8595-8598. IEEE.
[6] Lu, D. and Xu, S. (2013). Principal component analysis reveals the 1000 Genomes Project does not sufficiently cover the human genetic diversity in Asia. Frontiers in Genetics, 4.
[7] Maltarollo, V. G., Honorio, K. M., and da Silva, A. B. F. (2013). Applications of artificial neural networks in chemical problems.
[8] Shah, A. R., Oehmen, C. S., and Webb-Robertson, B.-J. (2008). SVM-HUSTLE: an iterative semi-supervised machine learning approach for pairwise protein remote homology detection. Bioinformatics, 24(6):783-790.
79. Book Reading Plan
END
Email : secondmath85@gmail.com
Office: (02)880-2743
H.P: 010-9192-5385