SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
Human-in-the-loop: a design pattern for managing teams which leverage ML by Paco Nathan at Big Data Spain 2017
Human-­‐in-­‐a-­‐loop:	
  
design	
  pattern	
  for	
  managing	
  teams	
  that	
  leverage	
  ML
Paco	
  Nathan	
  	
  @pacoid	
  
Director,	
  Learning	
  Group	
  @	
  O’Reilly	
  Media	
  
Big	
  Data	
  Spain,	
  Madrid	
  	
  2017-­‐11-­‐16
slides:	
  goo.gl/ba85nF
Framing
Imagine	
  having	
  a	
  mostly-­‐automated	
  system	
  where	
  

people	
  and	
  machines	
  collaborate	
  together…	
  
May	
  sound	
  a	
  bit	
  Sci-­‐Fi,	
  though	
  arguably	
  commonplace.	
  

One	
  challenge	
  is	
  whether	
  we	
  can	
  advance	
  beyond	
  just	
  
handling	
  rote	
  tasks.	
  	
  
Instead	
  of	
  simply	
  running	
  code	
  libraries,	
  can	
  machines	
  

make	
  difficult	
  decisions,	
  exercise	
  judgement	
  in	
  complex	
  
situations?	
  	
  
Can	
  we	
  build	
  systems	
  in	
  which	
  people	
  who	
  aren’t	
  

AI	
  experts	
  can	
  “teach”	
  machines	
  to	
  perform	
  complex	
  

work	
  –	
  based	
  on	
  examples,	
  not	
  code?
Human-in-the-loop: a design pattern for managing teams which leverage ML by Paco Nathan at Big Data Spain 2017
UX	
  for	
  content	
  discovery:	
  
▪ partly	
  generated	
  +	
  curated	
  by	
  people	
  
▪ partly	
  generated	
  +	
  curated	
  by	
  AI	
  apps
AI	
  in	
  Media
▪ content	
  which	
  can	
  represented	
  as	
  

text	
  can	
  be	
  parsed	
  by	
  NLP,	
  then	
  
manipulated	
  by	
  available	
  AI	
  tooling	
  	
  
▪ labeled	
  images	
  get	
  really	
  interesting	
  
▪ assumption:	
  text	
  or	
  images	
  –	
  within	
  

a	
  context	
  –	
  have	
  inherent	
  structure	
  
▪ representation	
  of	
  that	
  kind	
  of	
  structure	
  
is	
  rare	
  in	
  the	
  Media	
  vertical	
  –	
  so	
  far
5
{"graf": [[21, "let", "let", "VB", 1, 48], [0, "'s", "'s", "PRP", 0, 49],
"take", "take", "VB", 1, 50], [0, "a", "a", "DT", 0, 51], [23, "look", "l
"NN", 1, 52], [0, "at", "at", "IN", 0, 53], [0, "a", "a", "DT", 0, 54], [
"few", "few", "JJ", 1, 55], [25, "examples", "example", "NNS", 1, 56], [0
"often", "often", "RB", 0, 57], [0, "when", "when", "WRB", 0, 58], [11,
"people", "people", "NNS", 1, 59], [2, "are", "be", "VBP", 1, 60], [26, "
"first", "JJ", 1, 61], [27, "learning", "learn", "VBG", 1, 62], [0, "abou
"about", "IN", 0, 63], [28, "Docker", "docker", "NNP", 1, 64], [0, "they"
"they", "PRP", 0, 65], [29, "try", "try", "VBP", 1, 66], [0, "and", "and"
0, 67], [30, "put", "put", "VBP", 1, 68], [0, "it", "it", "PRP", 0, 69],
"in", "in", "IN", 0, 70], [0, "one", "one", "CD", 0, 71], [0, "of", "of",
0, 72], [0, "a", "a", "DT", 0, 73], [24, "few", "few", "JJ", 1, 74], [31,
"existing", "existing", "JJ", 1, 75], [18, "categories", "category", "NNS
76], [0, "sometimes", "sometimes", "RB", 0, 77], [11, "people", "people",
1, 78], [9, "think", "think", "VBP", 1, 79], [0, "it", "it", "PRP", 0, 80
"'s", "be", "VBZ", 1, 81], [0, "a", "a", "DT", 0, 82], [32, "virtualizati
"virtualization", "NN", 1, 83], [19, "tool", "tool", "NN", 1, 84], [0, "l
"like", "IN", 0, 85], [33, "VMware", "vmware", "NNP", 1, 86], [0, "or", "
"CC", 0, 87], [34, "virtualbox", "virtualbox", "NNP", 1, 88], [0, "also",
"also", "RB", 0, 89], [35, "known", "know", "VBN", 1, 90], [0, "as", "as"
0, 91], [0, "a", "a", "DT", 0, 92], [36, "hypervisor", "hypervisor", "NN"
93], [0, "these", "these", "DT", 0, 94], [2, "are", "be", "VBP", 1, 95],
"tools", "tool", "NNS", 1, 96], [0, "which", "which", "WDT", 0, 97], [2,
"be", "VBP", 1, 98], [37, "emulating", "emulate", "VBG", 1, 99], [38,
"hardware", "hardware", "NN", 1, 100], [0, "for", "for", "IN", 0, 101], [
"virtual", "virtual", "JJ", 1, 102], [40, "software", "software", "NN", 1
103]], "id": "001.video197359", "sha1":
"4b69cf60f0497887e3776619b922514f2e5b70a8"}
AI	
  in	
  Media
6
{"count": 2, "ids": [32, 19], "pos": "np", "rank": 0.0194, "text": "virtualization tool"}
{"count": 2, "ids": [40, 69], "pos": "np", "rank": 0.0117, "text": "software applications"}
{"count": 4, "ids": [38], "pos": "np", "rank": 0.0114, "text": "hardware"}
{"count": 2, "ids": [33, 36], "pos": "np", "rank": 0.0099, "text": "vmware hypervisor"}
{"count": 4, "ids": [28], "pos": "np", "rank": 0.0096, "text": "docker"}
{"count": 4, "ids": [34], "pos": "np", "rank": 0.0094, "text": "virtualbox"}
{"count": 10, "ids": [11], "pos": "np", "rank": 0.0049, "text": "people"}
{"count": 4, "ids": [37], "pos": "vbg", "rank": 0.0026, "text": "emulating"}
{"count": 2, "ids": [27], "pos": "vbg", "rank": 0.0016, "text": "learning"}
Transcript: let's take a look at a few examples often when
people are first learning about Docker they try and put it in
one of a few existing categories sometimes people think it's
a virtualization tool like VMware or virtualbox also known as
a hypervisor these are tools which are emulating hardware for
virtual software
Confidence: 0.973419129848
39 KUBERNETES
0.8747 coreos
0.8624 etcd
0.8478 DOCKER CONTAINERS
0.8458 mesos
0.8406 DOCKER
0.8354 DOCKER CONTAINER
0.8260 KUBERNETES CLUSTER
0.8258 docker image
0.8252 EC2
0.8210 docker hub
0.8138 OPENSTACK
orm:Docker a orm:Vendor;
a orm:Container;
a orm:Open_Source;
a orm:Commercial_Software;
owl:sameAs dbr:Docker_%28software%29;
skos:prefLabel "Docker"@en;
Knowledge	
  Graph
▪ used	
  to	
  construct	
  an	
  ontology	
  about	
  
technology,	
  based	
  on	
  learning	
  
materials	
  from	
  200+	
  publishers	
  
▪ uses	
  SKOS	
  as	
  a	
  foundation,	
  ties	
  into	
  

US	
  Library	
  of	
  Congress	
  and	
  DBpedia	
  

as	
  upper	
  ontologies	
  
▪ primary	
  structure	
  is	
  “human	
  scale”,	
  

used	
  as	
  control	
  points	
  
▪ majority	
  (>90%)	
  of	
  the	
  graph	
  

comes	
  from	
  machine	
  generated	
  

data	
  products
7
AI	
  is	
  real,	
  but	
  why	
  now?
▪ Big	
  Data:	
  machine	
  data	
  (1997-­‐ish)	
  
▪ Big	
  Compute:	
  cloud	
  computing	
  (2006-­‐ish)	
  
▪ Big	
  Models:	
  deep	
  learning	
  (2009-­‐ish)	
  
The	
  confluence	
  of	
  three	
  factors	
  created	
  a	
  business	
  

environment	
  where	
  AI	
  could	
  become	
  mainstream	
  
What	
  else	
  is	
  needed?
8
Background:	
  

helping	
  machines	
  learn
Machine	
  learning
supervised	
  ML:	
  
▪ take	
  a	
  dataset	
  where	
  each	
  element	
  
has	
  a	
  label	
  
▪ train	
  models	
  on	
  a	
  portion	
  of	
  the	
  
data	
  to	
  predict	
  the	
  labels,	
  then	
  

evaluate	
  on	
  the	
  holdout	
  
▪ deep	
  learning	
  is	
  a	
  popular	
  example,	
  

but	
  only	
  if	
  you	
  have	
  lots	
  of	
  labeled	
  
training	
  data	
  available
Machine	
  learning
unsupervised	
  ML:	
  
▪ run	
  lots	
  of	
  unlabeled	
  data	
  through	
  
an	
  algorithm	
  to	
  detect	
  “structure”	
  
or	
  embedding	
  
▪ for	
  example,	
  clustering	
  algorithms	
  
such	
  as	
  K-­‐means	
  
▪ unsupervised	
  approaches	
  for	
  AI	
  

are	
  an	
  open	
  research	
  question
Active	
  learning
special	
  case	
  of	
  semi-­‐supervised	
  ML:	
  
▪ send	
  difficult	
  decisions/edge	
  cases	
  

to	
  experts;	
  let	
  algorithms	
  handle	
  
routine	
  decisions	
  (automation)	
  
▪ works	
  well	
  in	
  use	
  cases	
  which	
  have	
  
lots	
  of	
  inexpensive,	
  unlabeled	
  data	
  
▪ e.g.,	
  abundance	
  of	
  content	
  to	
  be	
  
classified,	
  where	
  the	
  cost	
  of	
  
labeling	
  is	
  the	
  expense
The	
  reality	
  of	
  data	
  rates
“If	
  you	
  only	
  have	
  10	
  examples	
  of	
  something,	
  it’s	
  going

	
  	
  to	
  be	
  hard	
  to	
  make	
  deep	
  learning	
  work.	
  If	
  you	
  have

	
  	
  100,000	
  things	
  you	
  care	
  about,	
  records	
  or	
  whatever,

	
  	
  that’s	
  the	
  kind	
  of	
  scale	
  where	
  you	
  should	
  really	
  start

	
  	
  thinking	
  about	
  these	
  kinds	
  of	
  techniques.”	
  
Jeff	
  Dean	
  	
  Google

VB	
  Summit	
  2017-­‐10-­‐23	
  
venturebeat.com/2017/10/23/google-­‐brain-­‐chief-­‐says-­‐100000-­‐
examples-­‐is-­‐enough-­‐data-­‐for-­‐deep-­‐learning/
The	
  reality	
  of	
  data	
  rates
Use	
  cases	
  for	
  deep	
  learning	
  must	
  have	
  large,	
  carefully	
  
labeled	
  data	
  sets,	
  while	
  reinforcement	
  learning	
  needs	
  
much	
  more	
  data	
  than	
  that.	
  
Active	
  learning	
  can	
  yield	
  good	
  results	
  with	
  substantially	
  
smaller	
  data	
  rates,	
  while	
  leveraging	
  an	
  organization’s	
  
expertise	
  to	
  bootstrap	
  toward	
  larger	
  labeled	
  data	
  sets,	
  
e.g.,	
  as	
  preparation	
  for	
  deep	
  learning,	
  etc.
reinforcement
learning
supervised
learning
active
learning
deep
learning
data rates
(log scale)
Case	
  studies:	
  

practices	
  in	
  industry
On-­‐demand	
  humans
16
Active	
  learning
Real-­‐World	
  Active	
  Learning:	
  Applications	
  and	
  
Strategies	
  for	
  Human-­‐in-­‐the-­‐Loop	
  Machine	
  Learning

radar.oreilly.com/2015/02/human-­‐in-­‐the-­‐loop-­‐
machine-­‐learning.html

Ted	
  Cuzzillo

O’Reilly	
  Media,	
  2015-­‐02-­‐05	
  
Develop	
  a	
  policy	
  for	
  how	
  human	
  experts	
  select	
  exemplars:	
  
▪ bias	
  toward	
  labels	
  most	
  likely	
  to	
  influence	
  the	
  classifier	
  
▪ bias	
  toward	
  ensemble	
  disagreement	
  
▪ bias	
  toward	
  denser	
  regions	
  of	
  training	
  data
17
Active	
  learning
Active	
  learning	
  and	
  transfer	
  learning

safaribooksonline.com/library/view/oreilly-­‐
artificial-­‐intelligence/9781491985250/
video314919.html

Luke	
  Biewald	
  	
  CrowdFlower

The	
  AI	
  Conf,	
  2017-­‐09-­‐17	
  
breakthroughs	
  lag	
  algorithm	
  invention,	
  waiting	
  for	
  
“killer	
  data	
  set”	
  to	
  emerge,	
  often	
  decade+
18
Design	
  pattern:	
  Human-­‐in-­‐the-­‐loop
Building	
  a	
  business	
  that	
  combines	
  human	
  experts	
  
and	
  data	
  science

oreilly.com/ideas/building-­‐a-­‐business-­‐that-­‐
combines-­‐human-­‐experts-­‐and-­‐data-­‐science-­‐2

Eric	
  Colson	
  	
  StitchFix

O’Reilly	
  Data	
  Show,	
  2016-­‐01-­‐28	
  
“what	
  machines	
  can’t	
  do	
  are	
  things	
  around	
  cognition,

	
  	
  things	
  that	
  have	
  to	
  do	
  with	
  ambient	
  information,	
  or

	
  	
  appreciation	
  of	
  aesthetics,	
  or	
  even	
  the	
  ability	
  to

	
  	
  relate	
  to	
  another	
  human”



19
Design	
  pattern:	
  Human-­‐in-­‐the-­‐loop
Strategies	
  for	
  integrating	
  people	
  and	
  machine	
  
learning	
  in	
  online	
  systems

safaribooksonline.com/library/view/oreilly-­‐
artificial-­‐intelligence/9781491976289/
video311857.html

Jason	
  Laska	
  	
  Clara	
  Labs

The	
  AI	
  Conf,	
  2017-­‐06-­‐29	
  
how	
  to	
  create	
  a	
  two-­‐sided	
  marketplace	
  where	
  machines	
  
and	
  people	
  compete	
  on	
  a	
  spectrum	
  of	
  relative	
  expertise	
  
and	
  capabilities



20
Design	
  pattern:	
  Human-­‐in-­‐the-­‐loop
Building	
  human-­‐assisted	
  AI	
  applications

oreilly.com/ideas/building-­‐human-­‐
assisted-­‐ai-­‐applications

Adam	
  Marcus	
  	
  B12

O’Reilly	
  Data	
  Show,	
  2016-­‐08-­‐25	
  
Orchestra:	
  a	
  platform	
  for	
  building	
  human-­‐
assisted	
  AI	
  applications,	
  e.g.,	
  to	
  create	
  
business	
  websites

https://github.com/b12io/orchestra	
  
example	
  http://www.coloradopicked.com/
21
Design	
  pattern:	
  Flash	
  teams
Expert	
  Crowdsourcing	
  with	
  Flash	
  Teams

hci.stanford.edu/publications/2014/
flashteams/flashteams-­‐uist2014.pdf

Daniela	
  Retelny,	
  et	
  al.	
  

Stanford	
  HCI	
  
“A	
  flash	
  team	
  is	
  a	
  linked	
  set	
  of	
  modular	
  tasks	
  

	
  	
  that	
  draw	
  upon	
  paid	
  experts	
  from	
  the	
  crowd,	
  

	
  	
  often	
  three	
  to	
  six	
  at	
  a	
  time,	
  on	
  demand”	
  
http://stanfordhci.github.io/flash-­‐teams/
22
Weak	
  supervision	
  /	
  Data	
  programming
Creating	
  large	
  training	
  data	
  sets	
  quickly

oreilly.com/ideas/creating-­‐large-­‐training-­‐
data-­‐sets-­‐quickly

Alex	
  Ratner	
  	
  Stanford

O’Reilly	
  Data	
  Show,	
  2017-­‐06-­‐08	
  
Snorkel:	
  “weak	
  supervision”	
  and	
  “data	
  
programming”	
  as	
  another	
  instance	
  of	
  

human-­‐in-­‐the-­‐loop

github.com/HazyResearch/snorkel	
  
conferences.oreilly.com/strata/strata-­‐ny/public/
schedule/detail/61849
23
Prodigy	
  by	
  Explosion.ai
https://explosion.ai/blog/prodigy-­‐
annotation-­‐tool-­‐active-­‐learning
24
Problem:	
  
disambiguating	
  contexts
Disambiguating	
  contexts
Overlapping	
  contexts	
  pose	
  hard	
  problems	
  in	
  natural	
  language	
  understanding.	
  
That	
  runs	
  counter	
  to	
  the	
  correlation	
  emphasis	
  of	
  big	
  data.

NLP	
  libraries	
  lack	
  features	
  for	
  disambiguation.
Disambiguating	
  contexts
27
Suppose	
  someone	
  publishes	
  a	
  book	
  which	
  uses	
  the	
  term	
  
`IOS`:	
  are	
  they	
  talking	
  about	
  an	
  operating	
  system	
  for	
  an	
  
Apple	
  iPhone,	
  or	
  about	
  an	
  operating	
  system	
  for	
  a	
  Cisco	
  
router?	
  	
  
We	
  handle	
  lots	
  of	
  content	
  about	
  both.	
  Disambiguating	
  those	
  
contexts	
  is	
  important	
  for	
  good	
  UX	
  in	
  personalized	
  learning.	
  
In	
  other	
  words,	
  how	
  do	
  machines	
  help	
  people	
  

distinguish	
  that	
  content	
  within	
  search?	
  
Potentially	
  a	
  good	
  case	
  for	
  deep	
  learning,	
  

except	
  for	
  the	
  lack	
  of	
  labeled	
  data	
  at	
  scale.
Active	
  learning	
  through	
  Jupyter
28
Jupyter	
  notebooks	
  are	
  used	
  to	
  manage	
  ML	
  

pipelines	
  for	
  disambiguation,	
  where	
  machines	
  

and	
  people	
  collaborate:	
  
▪ ML	
  based	
  on	
  examples	
  –	
  most	
  all	
  of	
  the	
  feature	
  
engineering,	
  model	
  parameters,	
  etc.,	
  has	
  been	
  
automated	
  
▪ https://github.com/ceteri/nbtransom	
  
▪ based	
  on	
  use	
  of	
  nbformat,	
  pandas,	
  scikit-­‐learn
Active	
  learning	
  through	
  Jupyter
29
Jupyter	
  notebooks	
  are	
  used	
  to	
  manage	
  ML	
  
pipelines
and	
  people	
  collaborate:	
  
▪ ML	
  based	
  on	
  examples	
  –	
  most	
  all	
  of	
  the	
  feature	
  
engineering,	
  model	
  parameters,	
  etc.,	
  has	
  been	
  
automated	
  
▪ https://github.com/ceteri/nbtransom
▪ based	
  on	
  use	
  of	
  
Jupyter	
  notebook	
  as…	
  
▪ one	
  part	
  configuration	
  file	
  
▪ one	
  part	
  data	
  sample	
  
▪ one	
  part	
  structured	
  log	
  
▪ one	
  part	
  data	
  visualization	
  tool	
  
plus,	
  subsequent	
  data	
  mining	
  of	
  these	
  

notebooks	
  helps	
  augment	
  our	
  ontology
Active	
  learning	
  through	
  Jupyter
30
ML#Pipelines
Jupyter#kernel
Browser
SSH#tunnel
Active	
  learning	
  through	
  Jupyter
▪ Notebooks	
  allow	
  the	
  human	
  experts	
  to	
  access	
  the	
  
internals	
  of	
  a	
  mostly	
  automated	
  ML	
  pipeline,	
  rapidly	
  
▪ Stated	
  another	
  way,	
  both	
  the	
  machines	
  and	
  the	
  people	
  
become	
  collaborators	
  on	
  shared	
  documents	
  
▪ Anticipates	
  upcoming	
  collaborative	
  document	
  features	
  
in	
  JupyterLab
Active	
  learning	
  through	
  Jupyter
1. Experts	
  use	
  notebooks	
  to	
  provide	
  examples	
  of	
  book	
  chapters,	
  video	
  
segments,	
  etc.,	
  for	
  each	
  key	
  phrase	
  that	
  has	
  overlapping	
  contexts	
  
2. Machines	
  build	
  ensemble	
  ML	
  models	
  based	
  on	
  those	
  examples,	
  
updating	
  notebooks	
  with	
  model	
  evaluation	
  
3. Machines	
  attempt	
  to	
  annotate	
  labels	
  for	
  millions	
  of	
  pieces	
  of	
  content,	
  

e.g.,	
  `AlphaGo`,	
  `Golang`,	
  versus	
  a	
  mundane	
  use	
  of	
  the	
  verb	
  `go`	
  
4. Disambiguation	
  can	
  run	
  mostly	
  automated,	
  in	
  parallel	
  at	
  scale	
  –	
  

through	
  integration	
  with	
  Apache	
  Spark	
  
5. In	
  cases	
  where	
  ensembles	
  disagree,	
  ML	
  pipelines	
  defer	
  to	
  human	
  
experts	
  who	
  make	
  judgement	
  calls,	
  providing	
  further	
  examples	
  
6. New	
  examples	
  go	
  into	
  training	
  ML	
  pipelines	
  to	
  build	
  better	
  models	
  
7. Rinse,	
  lather,	
  repeat
Nuances
▪ No	
  Free	
  Lunch	
  theorem:	
  it	
  is	
  better	
  to	
  err	
  on	
  the	
  
side	
  of	
  less	
  false	
  positives	
  /	
  more	
  false	
  negatives	
  
in	
  use	
  cases	
  about	
  learning	
  materials	
  
▪ Employ	
  a	
  bias	
  toward	
  exemplars	
  policy,	
  i.e.,	
  those	
  
most	
  likely	
  to	
  influence	
  the	
  classifier	
  
▪ Potentially,	
  “AI	
  experts”	
  may	
  be	
  Customer	
  Service	
  
staff	
  who	
  review	
  edge	
  cases	
  within	
  search	
  results	
  
or	
  recommended	
  content	
  –	
  as	
  an	
  integral	
  part	
  of	
  
our	
  UX	
  –	
  then	
  re-­‐train	
  the	
  ML	
  pipelines	
  through	
  
examples	
  
Management	
  strategy	
  –	
  before
Generally	
  with	
  Big	
  Data,	
  we	
  are	
  considering:	
  
▪ DAG	
  workflow	
  execution	
  –	
  which	
  is	
  linear	
  
▪ data-­‐driven	
  organizations	
  
▪ ML	
  based	
  on	
  optimizing	
  for	
  

objective	
  functions	
  
▪ questions	
  of	
  correlation	
  

versus	
  causation	
  
▪ avoiding	
  “garbage	
  in,	
  garbage	
  out”
Scrub
token
Document
Collection
Tokenize
Word
Count
GroupBy
token
Count
Stop Word
List
Regex
token
HashJoin
Left
RHS
M
R
34
Management	
  strategy	
  –	
  after
HITL	
  introduces	
  circularities:	
  
▪ aka,	
  second-­‐order	
  cybernetics	
  
▪ leverage	
  feedback	
  loops	
  

as	
  conversations	
  
▪ focus	
  on	
  human	
  scale,	
  

design	
  thinking	
  
▪ people	
  and	
  machines	
  

work	
  together	
  on	
  teams	
  
▪ budget	
  experts’	
  time	
  on	
  

handling	
  the	
  exceptions
AI team
content
ontology
ML models attempt
to label the data
automatically
Expert judgement
about edge cases,
provides examples
ML models trained
using examples
Expert decisions
to extend vocabulary
ML models
have consensus,
confidence
labels
35
Essential	
  takeaway	
  idea:	
  
Depending	
  on	
  the	
  organization,	
  key	
  ingredients	
  
needed	
  to	
  enable	
  effective	
  AI	
  apps	
  may	
  come	
  
from	
  non-­‐traditional	
  “tech”	
  sources	
  …	
  
In	
  other	
  words,	
  based	
  on	
  human-­‐in-­‐the-­‐loop	
  
design	
  pattern,	
  AI	
  expertise	
  may	
  emerge	
  from	
  
your	
  Sales,	
  Marketing,	
  and	
  Customer	
  Service	
  
teams	
  –	
  which	
  have	
  crucial	
  insights	
  about	
  your	
  
customers’	
  needs.
Looking	
  ahead:	
  
some	
  trends	
  at	
  work
Looking	
  ahead	
  2018:	
  hardware	
  trends
Indications:	
  	
  progressively	
  more	
  advanced	
  mathematics	
  
moves	
  into	
  hardware	
  and	
  low-­‐level	
  software,	
  as	
  use	
  
cases	
  and	
  ROI	
  become	
  established	
  over	
  time	
  –	
  optimizing	
  
for	
  the	
  speed	
  of	
  calculations	
  and	
  capacity	
  of	
  data	
  storage	
  
Contra:	
  	
  programming	
  languages	
  which	
  use	
  abstraction	
  
layers	
  that	
  obscure	
  access	
  to	
  hardware	
  features,	
  aka	
  Java
38
… … … … …
Indications:
moves	
  into	
  hardware	
  and	
  low-­‐level	
  software,	
  as	
  use	
  
cases	
  and	
  ROI	
  become	
  established	
  over	
  time	
  –	
  optimizing	
  
for	
  the	
  speed	
  of	
  calculations	
  and	
  capacity	
  of	
  data	
  storage
Contra:
layers	
  that	
  obscure	
  access	
  to	
  hardware	
  features,	
  aka	
  Java
Looking	
  ahead	
  2018:	
  hardware	
  trends
39
… … … … …
Realistically,	
  current	
  use	
  of	
  math	
  in	
  ML	
  suffers	
  from	
  some	
  
“legacy	
  software”	
  aspects:	
  	
  underlying	
  libraries	
  generally	
  
focus	
  on	
  linear	
  algebra,	
  optimizing	
  for	
  1-­‐2	
  variables,	
  etc.	
  	
  
Meanwhile	
  our	
  use	
  cases	
  require	
  graphs,	
  multivariate	
  
problems,	
  and	
  other	
  compelling	
  cases	
  for	
  more	
  advanced	
  
math.	
  We	
  will	
  see	
  these	
  eventually	
  move	
  into	
  hardware	
  

and	
  low-­‐level	
  libraries:	
  	
  tensor	
  decomposition,	
  homology,	
  
hypervolume	
  optimization,	
  etc.
Looking	
  ahead	
  2018:	
  software	
  trends
Indications:	
  	
  cognitive	
  subsystems	
  progressively	
  becoming	
  
automated,	
  e.g.,	
  sensory	
  perception,	
  pattern	
  recognition,	
  
decisions,	
  gaming,	
  mimicry,	
  optimization,	
  knowledge	
  
representation,	
  language,	
  complex	
  movements,	
  planning,	
  
scheduling,	
  etc.	
  
Contra:	
  	
  merely	
  incremental	
  changes	
  for	
  practices	
  in	
  

software	
  engineering	
  and	
  product	
  management	
  –	
  within	
  the	
  
context	
  of	
  AI	
  apps	
  –	
  which	
  has	
  suffered	
  from	
  being	
  	
  too“linear”
40
Indications:
automated,	
  e.g.,	
  sensory	
  perception,	
  pattern	
  recognition,	
  
decisions,	
  gaming,	
  mimicry,	
  optimization,	
  knowledge	
  
representation,	
  language,	
  complex	
  movements,	
  planning,	
  
scheduling,	
  etc.
Contra:
software	
  engineering	
  and	
  product	
  management	
  –	
  within	
  the	
  
context	
  of	
  AI	
  apps	
  –	
  which	
  has	
  
Looking	
  ahead	
  2018:	
  software	
  trends
41
Enormous	
  upside	
  from	
  AI,	
  across	
  verticals;	
  however,	
  to	
  be	
  

in	
  the	
  game,	
  an	
  organization	
  must	
  already	
  have	
  Big	
  Data	
  
infrastructure	
  and	
  related	
  practices	
  in	
  place:	
  (1)	
  cloud	
  and	
  
SRE;	
  (2)	
  eliminating	
  data	
  silos;	
  (3)	
  cleaning	
  data	
  /	
  repairing	
  
metadata;	
  (4)	
  embracing	
  contemporary	
  data	
  science.	
  
Those	
  are	
  prerequisites,	
  there	
  are	
  no	
  short	
  cuts	
  in	
  AI.	
  

Plus,	
  there’s	
  an	
  ongoing	
  talent	
  crunch.	
  
–	
  consensus	
  among	
  major	
  consulting	
  firms,	
  

	
  	
  	
  Strata	
  2017	
  Exec	
  Briefings
Looking	
  ahead	
  2018:	
  people	
  trends
Indications:	
  	
  organizations	
  embracing	
  circularities,	
  focused	
  
on	
  optimizing	
  for	
  fitness	
  functions	
  (populations	
  of	
  priorities,	
  
longer-­‐term	
  ROI)	
  in	
  lieu	
  of	
  optimizing	
  for	
  objective	
  functions	
  
(singular	
  goals,	
  linear	
  cognition,	
  short-­‐term	
  ROI)	
  
Contra:	
  	
  conflict	
  defined	
  by	
  “confident	
  personalities	
  vs.	
  
confidence	
  intervals”,	
  see	
  goo.gl/GPYZ6v
42
Indications:
on	
  optimizing	
  for	
  
longer-­‐term	
  ROI)	
  in	
  lieu	
  of	
  optimizing	
  for	
  
(singular	
  goals,	
  linear	
  cognition,	
  short-­‐term	
  ROI)
Contra:
confidence	
  intervals”,	
  see	
  
Looking	
  ahead	
  2018:	
  people	
  trends
43
Peter	
  Norvig:	
  	
  disruptions	
  in	
  software	
  process	
  for	
  uncertain	
  
domains	
  –	
  the	
  workflow	
  of	
  the	
  AI	
  researcher	
  has	
  been	
  quite	
  
different	
  from	
  the	
  workflow	
  of	
  the	
  software	
  developer	
  	
  

goo.gl/XcDCZ2
François	
  Chollet:	
  	
  “casting	
  the	
  end	
  goal	
  of	
  intelligence	
  as	
  
the	
  optimization	
  of	
  an	
  extrinsic,	
  scalar	
  reward	
  function”	
  	
  

goo.gl/q7Je7D
Summary
Ahead	
  in	
  AI:	
  hardware	
  advances	
  force	
  abrupt	
  
changes	
  in	
  software	
  practices	
  –	
  which	
  has	
  
lagged	
  due	
  to	
  lack	
  of	
  infrastructure,	
  data	
  
quality,	
  outdated	
  process,	
  etc.	
  
HITL	
  (active	
  learning)	
  as	
  management	
  strategy	
  
for	
  AI	
  addresses	
  broad	
  needs	
  across	
  industry,	
  
especially	
  for	
  enterprise	
  organizations.	
  
Big	
  Team	
  begins	
  to	
  take	
  its	
  place	
  in	
  the	
  formula	
  
Big	
  Data	
  +	
  Big	
  Compute	
  +	
  Big	
  Models.
Summary
The	
  “game”	
  is	
  not	
  to	
  replace	
  people	
  –	
  instead	
  it	
  
is	
  about	
  leveraging	
  AI	
  to	
  augment	
  staff,	
  so	
  that	
  
organizations	
  can	
  retain	
  people	
  with	
  valuable	
  
domain	
  expertise,	
  making	
  their	
  contributions	
  
and	
  experience	
  even	
  more	
  vital.	
  
This	
  is	
  a	
  personal	
  opinion,	
  which	
  does	
  not	
  
necessarily	
  reflect	
  the	
  views	
  of	
  my	
  employer.	
  
However,	
  the	
  views	
  of	
  my	
  employer…
Why	
  we’ll	
  never	
  run	
  out	
  of	
  jobs
46
Strata	
  Data	
  
SG,	
  Dec	
  4-­‐7

SJ,	
  Mar	
  5-­‐8

UK,	
  May	
  21-­‐24

CN,	
  Jul	
  12-­‐15	
  
The	
  AI	
  Conf	
  
CN	
  Apr	
  10-­‐13

NY,	
  Apr	
  29-­‐May	
  2

SF,	
  Sep	
  4-­‐7

UK,	
  Oct	
  8-­‐11	
  
JupyterCon	
  
NY,	
  Aug	
  21-­‐24	
  
OSCON	
  
PDX,	
  Jul	
  16-­‐19,	
  2018
47
48
Get	
  Started	
  with	
  
NLP	
  in	
  Python
Just	
  Enough	
  Math Building	
  Data	
  
Science	
  Teams
Hylbert-­‐Speys How	
  Do	
  You	
  Learn?
updates,	
  reviews,	
  conference	
  summaries…	
  
liber118.com/pxn/

@pacoid
Human-in-the-loop: a design pattern for managing teams which leverage ML by Paco Nathan at Big Data Spain 2017

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
 
Traffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataTraffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataJongwook Woo
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLPaco Nathan
 
Introduction to Big Data: Smart Factory
Introduction to Big Data: Smart FactoryIntroduction to Big Data: Smart Factory
Introduction to Big Data: Smart FactoryJongwook Woo
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamGreg Goltsov
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIJongwook Woo
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceEdureka!
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Predictive analytics and big data tutorial
Predictive analytics and big data tutorial Predictive analytics and big data tutorial
Predictive analytics and big data tutorial Benjamin Taylor
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big DataRevolution Analytics
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data ScienceEdureka!
 
Rating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and SparkRating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and SparkJongwook Woo
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data ScienceSanghamitra Deb
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 

Was ist angesagt? (20)

Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
AI on Big Data
AI on Big DataAI on Big Data
AI on Big Data
 
Data Scientist Enablement roadmap 1.0
Data Scientist Enablement roadmap 1.0Data Scientist Enablement roadmap 1.0
Data Scientist Enablement roadmap 1.0
 
Traffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataTraffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big Data
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage ML
 
Introduction to Big Data: Smart Factory
Introduction to Big Data: Smart FactoryIntroduction to Big Data: Smart Factory
Introduction to Big Data: Smart Factory
 
Data science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebookData science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebook
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data Team
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AI
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11
 
Predictive analytics and big data tutorial
Predictive analytics and big data tutorial Predictive analytics and big data tutorial
Predictive analytics and big data tutorial
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big Data
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
 
Rating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and SparkRating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and Spark
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 

Ähnlich wie Human-in-the-loop: a design pattern for managing teams which leverage ML by Paco Nathan at Big Data Spain 2017

Humans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIHumans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIPaco Nathan
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
IIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data ScienceIIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data Scienceds4good
 
Big Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning DemystifiedBig Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning DemystifiedMatt Stubbs
 
Materials for getting started with data science
Materials for getting started with data scienceMaterials for getting started with data science
Materials for getting started with data scienceihansel
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1gauravsc36
 
Benefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycleBenefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycleMartin Kaltenböck
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsMarcel Kurovski
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Agence du Numérique (AdN)
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Srinath Perera
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewDr. Ananth Krishnamoorthy
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...Thomas Rones
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02BIWUG
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointJoris Poelmans
 
Workshop_Presentation.pptx
Workshop_Presentation.pptxWorkshop_Presentation.pptx
Workshop_Presentation.pptxRUDRAPRASADSABAR
 

Ähnlich wie Human-in-the-loop: a design pattern for managing teams which leverage ML by Paco Nathan at Big Data Spain 2017 (20)

Humans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIHumans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AI
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
IIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data ScienceIIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data Science
 
Big Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning DemystifiedBig Data LDN 2017: Deep Learning Demystified
Big Data LDN 2017: Deep Learning Demystified
 
Materials for getting started with data science
Materials for getting started with data scienceMaterials for getting started with data science
Materials for getting started with data science
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1
 
Benefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycleBenefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycle
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
When AI becomes a data-driven machine, and digital is everywhere!
When AI becomes a data-driven machine, and digital is everywhere!When AI becomes a data-driven machine, and digital is everywhere!
When AI becomes a data-driven machine, and digital is everywhere!
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePoint
 
Workshop_Presentation.pptx
Workshop_Presentation.pptxWorkshop_Presentation.pptx
Workshop_Presentation.pptx
 

Mehr von Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 

Mehr von Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Kürzlich hochgeladen

Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 

Kürzlich hochgeladen (20)

Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 

Human-in-the-loop: a design pattern for managing teams which leverage ML by Paco Nathan at Big Data Spain 2017

  • 2. Human-­‐in-­‐a-­‐loop:   design  pattern  for  managing  teams  that  leverage  ML Paco  Nathan    @pacoid   Director,  Learning  Group  @  O’Reilly  Media   Big  Data  Spain,  Madrid    2017-­‐11-­‐16 slides:  goo.gl/ba85nF
  • 3. Framing Imagine  having  a  mostly-­‐automated  system  where  
 people  and  machines  collaborate  together…   May  sound  a  bit  Sci-­‐Fi,  though  arguably  commonplace.  
 One  challenge  is  whether  we  can  advance  beyond  just   handling  rote  tasks.     Instead  of  simply  running  code  libraries,  can  machines  
 make  difficult  decisions,  exercise  judgement  in  complex   situations?     Can  we  build  systems  in  which  people  who  aren’t  
 AI  experts  can  “teach”  machines  to  perform  complex  
 work  –  based  on  examples,  not  code?
  • 5. UX  for  content  discovery:   ▪ partly  generated  +  curated  by  people   ▪ partly  generated  +  curated  by  AI  apps
  • 6. AI  in  Media ▪ content  which  can  represented  as  
 text  can  be  parsed  by  NLP,  then   manipulated  by  available  AI  tooling     ▪ labeled  images  get  really  interesting   ▪ assumption:  text  or  images  –  within  
 a  context  –  have  inherent  structure   ▪ representation  of  that  kind  of  structure   is  rare  in  the  Media  vertical  –  so  far 5
  • 7. {"graf": [[21, "let", "let", "VB", 1, 48], [0, "'s", "'s", "PRP", 0, 49], "take", "take", "VB", 1, 50], [0, "a", "a", "DT", 0, 51], [23, "look", "l "NN", 1, 52], [0, "at", "at", "IN", 0, 53], [0, "a", "a", "DT", 0, 54], [ "few", "few", "JJ", 1, 55], [25, "examples", "example", "NNS", 1, 56], [0 "often", "often", "RB", 0, 57], [0, "when", "when", "WRB", 0, 58], [11, "people", "people", "NNS", 1, 59], [2, "are", "be", "VBP", 1, 60], [26, " "first", "JJ", 1, 61], [27, "learning", "learn", "VBG", 1, 62], [0, "abou "about", "IN", 0, 63], [28, "Docker", "docker", "NNP", 1, 64], [0, "they" "they", "PRP", 0, 65], [29, "try", "try", "VBP", 1, 66], [0, "and", "and" 0, 67], [30, "put", "put", "VBP", 1, 68], [0, "it", "it", "PRP", 0, 69], "in", "in", "IN", 0, 70], [0, "one", "one", "CD", 0, 71], [0, "of", "of", 0, 72], [0, "a", "a", "DT", 0, 73], [24, "few", "few", "JJ", 1, 74], [31, "existing", "existing", "JJ", 1, 75], [18, "categories", "category", "NNS 76], [0, "sometimes", "sometimes", "RB", 0, 77], [11, "people", "people", 1, 78], [9, "think", "think", "VBP", 1, 79], [0, "it", "it", "PRP", 0, 80 "'s", "be", "VBZ", 1, 81], [0, "a", "a", "DT", 0, 82], [32, "virtualizati "virtualization", "NN", 1, 83], [19, "tool", "tool", "NN", 1, 84], [0, "l "like", "IN", 0, 85], [33, "VMware", "vmware", "NNP", 1, 86], [0, "or", " "CC", 0, 87], [34, "virtualbox", "virtualbox", "NNP", 1, 88], [0, "also", "also", "RB", 0, 89], [35, "known", "know", "VBN", 1, 90], [0, "as", "as" 0, 91], [0, "a", "a", "DT", 0, 92], [36, "hypervisor", "hypervisor", "NN" 93], [0, "these", "these", "DT", 0, 94], [2, "are", "be", "VBP", 1, 95], "tools", "tool", "NNS", 1, 96], [0, "which", "which", "WDT", 0, 97], [2, "be", "VBP", 1, 98], [37, "emulating", "emulate", "VBG", 1, 99], [38, "hardware", "hardware", "NN", 1, 100], [0, "for", "for", "IN", 0, 101], [ "virtual", "virtual", "JJ", 1, 102], [40, "software", "software", "NN", 1 103]], "id": "001.video197359", "sha1": "4b69cf60f0497887e3776619b922514f2e5b70a8"} AI  in  Media 6 {"count": 2, "ids": [32, 19], "pos": "np", "rank": 0.0194, "text": "virtualization tool"} {"count": 2, "ids": [40, 69], "pos": "np", "rank": 0.0117, "text": "software applications"} {"count": 4, "ids": [38], "pos": "np", "rank": 0.0114, "text": "hardware"} {"count": 2, "ids": [33, 36], "pos": "np", "rank": 0.0099, "text": "vmware hypervisor"} {"count": 4, "ids": [28], "pos": "np", "rank": 0.0096, "text": "docker"} {"count": 4, "ids": [34], "pos": "np", "rank": 0.0094, "text": "virtualbox"} {"count": 10, "ids": [11], "pos": "np", "rank": 0.0049, "text": "people"} {"count": 4, "ids": [37], "pos": "vbg", "rank": 0.0026, "text": "emulating"} {"count": 2, "ids": [27], "pos": "vbg", "rank": 0.0016, "text": "learning"} Transcript: let's take a look at a few examples often when people are first learning about Docker they try and put it in one of a few existing categories sometimes people think it's a virtualization tool like VMware or virtualbox also known as a hypervisor these are tools which are emulating hardware for virtual software Confidence: 0.973419129848 39 KUBERNETES 0.8747 coreos 0.8624 etcd 0.8478 DOCKER CONTAINERS 0.8458 mesos 0.8406 DOCKER 0.8354 DOCKER CONTAINER 0.8260 KUBERNETES CLUSTER 0.8258 docker image 0.8252 EC2 0.8210 docker hub 0.8138 OPENSTACK orm:Docker a orm:Vendor; a orm:Container; a orm:Open_Source; a orm:Commercial_Software; owl:sameAs dbr:Docker_%28software%29; skos:prefLabel "Docker"@en;
  • 8. Knowledge  Graph ▪ used  to  construct  an  ontology  about   technology,  based  on  learning   materials  from  200+  publishers   ▪ uses  SKOS  as  a  foundation,  ties  into  
 US  Library  of  Congress  and  DBpedia  
 as  upper  ontologies   ▪ primary  structure  is  “human  scale”,  
 used  as  control  points   ▪ majority  (>90%)  of  the  graph  
 comes  from  machine  generated  
 data  products 7
  • 9. AI  is  real,  but  why  now? ▪ Big  Data:  machine  data  (1997-­‐ish)   ▪ Big  Compute:  cloud  computing  (2006-­‐ish)   ▪ Big  Models:  deep  learning  (2009-­‐ish)   The  confluence  of  three  factors  created  a  business  
 environment  where  AI  could  become  mainstream   What  else  is  needed? 8
  • 11. Machine  learning supervised  ML:   ▪ take  a  dataset  where  each  element   has  a  label   ▪ train  models  on  a  portion  of  the   data  to  predict  the  labels,  then  
 evaluate  on  the  holdout   ▪ deep  learning  is  a  popular  example,  
 but  only  if  you  have  lots  of  labeled   training  data  available
  • 12. Machine  learning unsupervised  ML:   ▪ run  lots  of  unlabeled  data  through   an  algorithm  to  detect  “structure”   or  embedding   ▪ for  example,  clustering  algorithms   such  as  K-­‐means   ▪ unsupervised  approaches  for  AI  
 are  an  open  research  question
  • 13. Active  learning special  case  of  semi-­‐supervised  ML:   ▪ send  difficult  decisions/edge  cases  
 to  experts;  let  algorithms  handle   routine  decisions  (automation)   ▪ works  well  in  use  cases  which  have   lots  of  inexpensive,  unlabeled  data   ▪ e.g.,  abundance  of  content  to  be   classified,  where  the  cost  of   labeling  is  the  expense
  • 14. The  reality  of  data  rates “If  you  only  have  10  examples  of  something,  it’s  going
    to  be  hard  to  make  deep  learning  work.  If  you  have
    100,000  things  you  care  about,  records  or  whatever,
    that’s  the  kind  of  scale  where  you  should  really  start
    thinking  about  these  kinds  of  techniques.”   Jeff  Dean    Google
 VB  Summit  2017-­‐10-­‐23   venturebeat.com/2017/10/23/google-­‐brain-­‐chief-­‐says-­‐100000-­‐ examples-­‐is-­‐enough-­‐data-­‐for-­‐deep-­‐learning/
  • 15. The  reality  of  data  rates Use  cases  for  deep  learning  must  have  large,  carefully   labeled  data  sets,  while  reinforcement  learning  needs   much  more  data  than  that.   Active  learning  can  yield  good  results  with  substantially   smaller  data  rates,  while  leveraging  an  organization’s   expertise  to  bootstrap  toward  larger  labeled  data  sets,   e.g.,  as  preparation  for  deep  learning,  etc. reinforcement learning supervised learning active learning deep learning data rates (log scale)
  • 18. Active  learning Real-­‐World  Active  Learning:  Applications  and   Strategies  for  Human-­‐in-­‐the-­‐Loop  Machine  Learning
 radar.oreilly.com/2015/02/human-­‐in-­‐the-­‐loop-­‐ machine-­‐learning.html
 Ted  Cuzzillo
 O’Reilly  Media,  2015-­‐02-­‐05   Develop  a  policy  for  how  human  experts  select  exemplars:   ▪ bias  toward  labels  most  likely  to  influence  the  classifier   ▪ bias  toward  ensemble  disagreement   ▪ bias  toward  denser  regions  of  training  data 17
  • 19. Active  learning Active  learning  and  transfer  learning
 safaribooksonline.com/library/view/oreilly-­‐ artificial-­‐intelligence/9781491985250/ video314919.html
 Luke  Biewald    CrowdFlower
 The  AI  Conf,  2017-­‐09-­‐17   breakthroughs  lag  algorithm  invention,  waiting  for   “killer  data  set”  to  emerge,  often  decade+ 18
  • 20. Design  pattern:  Human-­‐in-­‐the-­‐loop Building  a  business  that  combines  human  experts   and  data  science
 oreilly.com/ideas/building-­‐a-­‐business-­‐that-­‐ combines-­‐human-­‐experts-­‐and-­‐data-­‐science-­‐2
 Eric  Colson    StitchFix
 O’Reilly  Data  Show,  2016-­‐01-­‐28   “what  machines  can’t  do  are  things  around  cognition,
    things  that  have  to  do  with  ambient  information,  or
    appreciation  of  aesthetics,  or  even  the  ability  to
    relate  to  another  human”
 
 19
  • 21. Design  pattern:  Human-­‐in-­‐the-­‐loop Strategies  for  integrating  people  and  machine   learning  in  online  systems
 safaribooksonline.com/library/view/oreilly-­‐ artificial-­‐intelligence/9781491976289/ video311857.html
 Jason  Laska    Clara  Labs
 The  AI  Conf,  2017-­‐06-­‐29   how  to  create  a  two-­‐sided  marketplace  where  machines   and  people  compete  on  a  spectrum  of  relative  expertise   and  capabilities
 
 20
  • 22. Design  pattern:  Human-­‐in-­‐the-­‐loop Building  human-­‐assisted  AI  applications
 oreilly.com/ideas/building-­‐human-­‐ assisted-­‐ai-­‐applications
 Adam  Marcus    B12
 O’Reilly  Data  Show,  2016-­‐08-­‐25   Orchestra:  a  platform  for  building  human-­‐ assisted  AI  applications,  e.g.,  to  create   business  websites
 https://github.com/b12io/orchestra   example  http://www.coloradopicked.com/ 21
  • 23. Design  pattern:  Flash  teams Expert  Crowdsourcing  with  Flash  Teams
 hci.stanford.edu/publications/2014/ flashteams/flashteams-­‐uist2014.pdf
 Daniela  Retelny,  et  al.  
 Stanford  HCI   “A  flash  team  is  a  linked  set  of  modular  tasks  
    that  draw  upon  paid  experts  from  the  crowd,  
    often  three  to  six  at  a  time,  on  demand”   http://stanfordhci.github.io/flash-­‐teams/ 22
  • 24. Weak  supervision  /  Data  programming Creating  large  training  data  sets  quickly
 oreilly.com/ideas/creating-­‐large-­‐training-­‐ data-­‐sets-­‐quickly
 Alex  Ratner    Stanford
 O’Reilly  Data  Show,  2017-­‐06-­‐08   Snorkel:  “weak  supervision”  and  “data   programming”  as  another  instance  of  
 human-­‐in-­‐the-­‐loop
 github.com/HazyResearch/snorkel   conferences.oreilly.com/strata/strata-­‐ny/public/ schedule/detail/61849 23
  • 27. Disambiguating  contexts Overlapping  contexts  pose  hard  problems  in  natural  language  understanding.   That  runs  counter  to  the  correlation  emphasis  of  big  data.
 NLP  libraries  lack  features  for  disambiguation.
  • 28. Disambiguating  contexts 27 Suppose  someone  publishes  a  book  which  uses  the  term   `IOS`:  are  they  talking  about  an  operating  system  for  an   Apple  iPhone,  or  about  an  operating  system  for  a  Cisco   router?     We  handle  lots  of  content  about  both.  Disambiguating  those   contexts  is  important  for  good  UX  in  personalized  learning.   In  other  words,  how  do  machines  help  people  
 distinguish  that  content  within  search?   Potentially  a  good  case  for  deep  learning,  
 except  for  the  lack  of  labeled  data  at  scale.
  • 29. Active  learning  through  Jupyter 28 Jupyter  notebooks  are  used  to  manage  ML  
 pipelines  for  disambiguation,  where  machines  
 and  people  collaborate:   ▪ ML  based  on  examples  –  most  all  of  the  feature   engineering,  model  parameters,  etc.,  has  been   automated   ▪ https://github.com/ceteri/nbtransom   ▪ based  on  use  of  nbformat,  pandas,  scikit-­‐learn
  • 30. Active  learning  through  Jupyter 29 Jupyter  notebooks  are  used  to  manage  ML   pipelines and  people  collaborate:   ▪ ML  based  on  examples  –  most  all  of  the  feature   engineering,  model  parameters,  etc.,  has  been   automated   ▪ https://github.com/ceteri/nbtransom ▪ based  on  use  of   Jupyter  notebook  as…   ▪ one  part  configuration  file   ▪ one  part  data  sample   ▪ one  part  structured  log   ▪ one  part  data  visualization  tool   plus,  subsequent  data  mining  of  these  
 notebooks  helps  augment  our  ontology
  • 31. Active  learning  through  Jupyter 30 ML#Pipelines Jupyter#kernel Browser SSH#tunnel
  • 32. Active  learning  through  Jupyter ▪ Notebooks  allow  the  human  experts  to  access  the   internals  of  a  mostly  automated  ML  pipeline,  rapidly   ▪ Stated  another  way,  both  the  machines  and  the  people   become  collaborators  on  shared  documents   ▪ Anticipates  upcoming  collaborative  document  features   in  JupyterLab
  • 33. Active  learning  through  Jupyter 1. Experts  use  notebooks  to  provide  examples  of  book  chapters,  video   segments,  etc.,  for  each  key  phrase  that  has  overlapping  contexts   2. Machines  build  ensemble  ML  models  based  on  those  examples,   updating  notebooks  with  model  evaluation   3. Machines  attempt  to  annotate  labels  for  millions  of  pieces  of  content,  
 e.g.,  `AlphaGo`,  `Golang`,  versus  a  mundane  use  of  the  verb  `go`   4. Disambiguation  can  run  mostly  automated,  in  parallel  at  scale  –  
 through  integration  with  Apache  Spark   5. In  cases  where  ensembles  disagree,  ML  pipelines  defer  to  human   experts  who  make  judgement  calls,  providing  further  examples   6. New  examples  go  into  training  ML  pipelines  to  build  better  models   7. Rinse,  lather,  repeat
  • 34. Nuances ▪ No  Free  Lunch  theorem:  it  is  better  to  err  on  the   side  of  less  false  positives  /  more  false  negatives   in  use  cases  about  learning  materials   ▪ Employ  a  bias  toward  exemplars  policy,  i.e.,  those   most  likely  to  influence  the  classifier   ▪ Potentially,  “AI  experts”  may  be  Customer  Service   staff  who  review  edge  cases  within  search  results   or  recommended  content  –  as  an  integral  part  of   our  UX  –  then  re-­‐train  the  ML  pipelines  through   examples  
  • 35. Management  strategy  –  before Generally  with  Big  Data,  we  are  considering:   ▪ DAG  workflow  execution  –  which  is  linear   ▪ data-­‐driven  organizations   ▪ ML  based  on  optimizing  for  
 objective  functions   ▪ questions  of  correlation  
 versus  causation   ▪ avoiding  “garbage  in,  garbage  out” Scrub token Document Collection Tokenize Word Count GroupBy token Count Stop Word List Regex token HashJoin Left RHS M R 34
  • 36. Management  strategy  –  after HITL  introduces  circularities:   ▪ aka,  second-­‐order  cybernetics   ▪ leverage  feedback  loops  
 as  conversations   ▪ focus  on  human  scale,  
 design  thinking   ▪ people  and  machines  
 work  together  on  teams   ▪ budget  experts’  time  on  
 handling  the  exceptions AI team content ontology ML models attempt to label the data automatically Expert judgement about edge cases, provides examples ML models trained using examples Expert decisions to extend vocabulary ML models have consensus, confidence labels 35
  • 37. Essential  takeaway  idea:   Depending  on  the  organization,  key  ingredients   needed  to  enable  effective  AI  apps  may  come   from  non-­‐traditional  “tech”  sources  …   In  other  words,  based  on  human-­‐in-­‐the-­‐loop   design  pattern,  AI  expertise  may  emerge  from   your  Sales,  Marketing,  and  Customer  Service   teams  –  which  have  crucial  insights  about  your   customers’  needs.
  • 38. Looking  ahead:   some  trends  at  work
  • 39. Looking  ahead  2018:  hardware  trends Indications:    progressively  more  advanced  mathematics   moves  into  hardware  and  low-­‐level  software,  as  use   cases  and  ROI  become  established  over  time  –  optimizing   for  the  speed  of  calculations  and  capacity  of  data  storage   Contra:    programming  languages  which  use  abstraction   layers  that  obscure  access  to  hardware  features,  aka  Java 38 … … … … …
  • 40. Indications: moves  into  hardware  and  low-­‐level  software,  as  use   cases  and  ROI  become  established  over  time  –  optimizing   for  the  speed  of  calculations  and  capacity  of  data  storage Contra: layers  that  obscure  access  to  hardware  features,  aka  Java Looking  ahead  2018:  hardware  trends 39 … … … … … Realistically,  current  use  of  math  in  ML  suffers  from  some   “legacy  software”  aspects:    underlying  libraries  generally   focus  on  linear  algebra,  optimizing  for  1-­‐2  variables,  etc.     Meanwhile  our  use  cases  require  graphs,  multivariate   problems,  and  other  compelling  cases  for  more  advanced   math.  We  will  see  these  eventually  move  into  hardware  
 and  low-­‐level  libraries:    tensor  decomposition,  homology,   hypervolume  optimization,  etc.
  • 41. Looking  ahead  2018:  software  trends Indications:    cognitive  subsystems  progressively  becoming   automated,  e.g.,  sensory  perception,  pattern  recognition,   decisions,  gaming,  mimicry,  optimization,  knowledge   representation,  language,  complex  movements,  planning,   scheduling,  etc.   Contra:    merely  incremental  changes  for  practices  in  
 software  engineering  and  product  management  –  within  the   context  of  AI  apps  –  which  has  suffered  from  being    too“linear” 40
  • 42. Indications: automated,  e.g.,  sensory  perception,  pattern  recognition,   decisions,  gaming,  mimicry,  optimization,  knowledge   representation,  language,  complex  movements,  planning,   scheduling,  etc. Contra: software  engineering  and  product  management  –  within  the   context  of  AI  apps  –  which  has   Looking  ahead  2018:  software  trends 41 Enormous  upside  from  AI,  across  verticals;  however,  to  be  
 in  the  game,  an  organization  must  already  have  Big  Data   infrastructure  and  related  practices  in  place:  (1)  cloud  and   SRE;  (2)  eliminating  data  silos;  (3)  cleaning  data  /  repairing   metadata;  (4)  embracing  contemporary  data  science.   Those  are  prerequisites,  there  are  no  short  cuts  in  AI.  
 Plus,  there’s  an  ongoing  talent  crunch.   –  consensus  among  major  consulting  firms,  
      Strata  2017  Exec  Briefings
  • 43. Looking  ahead  2018:  people  trends Indications:    organizations  embracing  circularities,  focused   on  optimizing  for  fitness  functions  (populations  of  priorities,   longer-­‐term  ROI)  in  lieu  of  optimizing  for  objective  functions   (singular  goals,  linear  cognition,  short-­‐term  ROI)   Contra:    conflict  defined  by  “confident  personalities  vs.   confidence  intervals”,  see  goo.gl/GPYZ6v 42
  • 44. Indications: on  optimizing  for   longer-­‐term  ROI)  in  lieu  of  optimizing  for   (singular  goals,  linear  cognition,  short-­‐term  ROI) Contra: confidence  intervals”,  see   Looking  ahead  2018:  people  trends 43 Peter  Norvig:    disruptions  in  software  process  for  uncertain   domains  –  the  workflow  of  the  AI  researcher  has  been  quite   different  from  the  workflow  of  the  software  developer    
 goo.gl/XcDCZ2 François  Chollet:    “casting  the  end  goal  of  intelligence  as   the  optimization  of  an  extrinsic,  scalar  reward  function”    
 goo.gl/q7Je7D
  • 45. Summary Ahead  in  AI:  hardware  advances  force  abrupt   changes  in  software  practices  –  which  has   lagged  due  to  lack  of  infrastructure,  data   quality,  outdated  process,  etc.   HITL  (active  learning)  as  management  strategy   for  AI  addresses  broad  needs  across  industry,   especially  for  enterprise  organizations.   Big  Team  begins  to  take  its  place  in  the  formula   Big  Data  +  Big  Compute  +  Big  Models.
  • 46. Summary The  “game”  is  not  to  replace  people  –  instead  it   is  about  leveraging  AI  to  augment  staff,  so  that   organizations  can  retain  people  with  valuable   domain  expertise,  making  their  contributions   and  experience  even  more  vital.   This  is  a  personal  opinion,  which  does  not   necessarily  reflect  the  views  of  my  employer.   However,  the  views  of  my  employer…
  • 47. Why  we’ll  never  run  out  of  jobs 46
  • 48. Strata  Data   SG,  Dec  4-­‐7
 SJ,  Mar  5-­‐8
 UK,  May  21-­‐24
 CN,  Jul  12-­‐15   The  AI  Conf   CN  Apr  10-­‐13
 NY,  Apr  29-­‐May  2
 SF,  Sep  4-­‐7
 UK,  Oct  8-­‐11   JupyterCon   NY,  Aug  21-­‐24   OSCON   PDX,  Jul  16-­‐19,  2018 47
  • 49. 48 Get  Started  with   NLP  in  Python Just  Enough  Math Building  Data   Science  Teams Hylbert-­‐Speys How  Do  You  Learn? updates,  reviews,  conference  summaries…   liber118.com/pxn/
 @pacoid