Paris ML meetup

Slides for ML @ Netflix (Paris ML meetup talk)

  1. Machine Learning @ Netflix (and some lessons learned)
     Yves Raimond (@moustaki)
     Research/Engineering Manager, Search & Recommendations Algorithm Engineering
  2. Netflix evolution
  3. Netflix scale
     ● > 69M members
     ● > 50 countries
     ● > 1000 device types
     ● > 3B hours/month
     ● 36% of peak US downstream traffic
  4. Recommendations @ Netflix
     ● Goal: help members find content to watch and enjoy, to maximize satisfaction and retention
     ● Over 80% of what people watch comes from our recommendations
     ● Top Picks, Because you Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...
  5. Models & Algorithms
     ▪ Regression (linear, logistic, elastic net)
     ▪ SVD and other matrix factorizations
     ▪ Factorization Machines
     ▪ Restricted Boltzmann Machines
     ▪ Deep Neural Networks
     ▪ Markov Models and Graph Algorithms
     ▪ Clustering
     ▪ Latent Dirichlet Allocation
     ▪ Gradient Boosted Decision Trees/Random Forests
     ▪ Gaussian Processes
     ▪ …
  6. Some lessons learned
  7. Build the offline experimentation framework first
  8. When tackling a new problem
     ● What offline metrics can we compute that capture the online improvements we’re actually trying to achieve?
     ● How should the input data to that evaluation be constructed (train, validation, test)?
     ● How fast and easy is it to run a full cycle of offline experiments?
       ○ Minimize time to first metric
     ● How replicable is the evaluation? How shareable are the results?
       ○ Provenance (see Dagobah)
       ○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
  9. When tackling an old problem
     ● Same questions as for a new problem…
       ○ Are the metrics designed when experimentation in that space first began still appropriate now?
  10. Think about distribution from the outermost layers
  11. Layers at which to distribute, from outermost to innermost:
      1. For each combination of hyperparameters (e.g. grid search, random search, Gaussian processes…)
      2. For each subset of the training data
         a. Multi-core learning (e.g. Hogwild!)
         b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
  12. When to use distributed learning?
      ● The communication overhead of distributed ML algorithms is non-trivial
      ● Is your data big enough that the gains from distribution outweigh the communication overhead?
  13. Example: Uncollapsed Gibbs sampler for LDA
  14. Design production code to be experimentation-friendly
  15. Example development process (diagram): Idea and Data → Offline modeling (R, Python, MATLAB, …) → Iterate → Final model → Implement in production system (Java, C++, …) → Production environment (A/B test) → Actual output
      Pitfalls marked along the way: missing post-processing logic, performance issues, code discrepancies, data discrepancies
  16. Avoid dual implementations (diagram: experiment code and production code both built on a shared engine)
  17. To be continued...
  18. We’re hiring! Yves Raimond (@moustaki)
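The "When tackling a new problem" slide asks how evaluation data should be constructed and how quickly a first offline metric can be produced. A minimal sketch, assuming play logs with a day index, a time-based train/validation/test split, a popularity baseline and recall@k as the offline metric (the names `Play`, `temporal_split`, `popularity_ranking` and `recall_at_k` are hypothetical, not from the talk):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Play:
    member: str
    title: str
    day: int  # hypothetical timestamp: days since an arbitrary epoch


def temporal_split(plays: Sequence[Play], valid_start: int, test_start: int
                   ) -> Tuple[List[Play], List[Play], List[Play]]:
    """Train on the past, tune on the next window, report on the window after that."""
    train = [p for p in plays if p.day < valid_start]
    valid = [p for p in plays if valid_start <= p.day < test_start]
    test = [p for p in plays if p.day >= test_start]
    return train, valid, test


def popularity_ranking(train: Sequence[Play], members: Iterable[str]) -> Dict[str, List[str]]:
    """Unpersonalized baseline: every member gets titles ordered by training play count."""
    ranked = [title for title, _ in Counter(p.title for p in train).most_common()]
    return {m: ranked for m in members}


def recall_at_k(ranked: Dict[str, List[str]], held_out: Sequence[Play], k: int) -> float:
    """Average over members of the share of their held-out titles found in their top k."""
    truth: Dict[str, Set[str]] = {}
    for p in held_out:
        truth.setdefault(p.member, set()).add(p.title)
    if not truth:
        return 0.0
    per_member = [len(set(ranked.get(m, [])[:k]) & titles) / len(titles)
                  for m, titles in truth.items()]
    return sum(per_member) / len(per_member)


# Toy log: time to first metric is a few lines once the framework exists.
plays = [Play("a", "X", 1), Play("b", "X", 2), Play("a", "Y", 3),
         Play("b", "Z", 11), Play("a", "X", 21), Play("b", "Y", 22)]
train, valid, test = temporal_split(plays, valid_start=10, test_start=20)
baseline = popularity_ranking(train, members=["a", "b"])
print(recall_at_k(baseline, test, k=1))
```

Splitting by time rather than at random keeps future plays out of training, which is one way to make the offline number track what an online test would see; whether recall@k is the right proxy is exactly the question the slide raises.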
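The "Think about distribution from the outermost layers" slides put the hyperparameter loop outside the data loop. The outermost layer is embarrassingly parallel: each configuration is an independent single-machine run that only returns one score. A minimal sketch, assuming a plain grid search and a hypothetical `toy_train_and_score` standing in for real training:

```python
import itertools
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Params = Dict[str, float]


def param_grid(space: Dict[str, Iterable[float]]) -> List[Params]:
    """Expand {name: values} into every combination (a plain grid search)."""
    names = sorted(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]


def search(train_and_score: Callable[[Params], float], grid: List[Params],
           executor: Executor) -> Tuple[Params, float]:
    """Run one independent training job per configuration; keep the best score.

    The jobs share nothing, so the only communication is returning a float each.
    """
    scores = list(executor.map(train_and_score, grid))
    best = max(range(len(grid)), key=scores.__getitem__)
    return grid[best], scores[best]


def toy_train_and_score(params: Params) -> float:
    # Hypothetical stand-in for "train a model, return a validation metric";
    # it peaks at learning_rate=0.1, reg=0.01.
    return -((params["learning_rate"] - 0.1) ** 2) - ((params["reg"] - 0.01) ** 2)


if __name__ == "__main__":
    grid = param_grid({"learning_rate": [0.01, 0.1, 1.0], "reg": [0.0, 0.01, 0.1]})
    # Threads keep the demo portable; for CPU-bound training a ProcessPoolExecutor
    # or one cluster job per configuration is the more usual choice.
    with ThreadPoolExecutor(max_workers=4) as pool:
        best, score = search(toy_train_and_score, grid, pool)
    print(best, score)
```

Only if a single configuration is still too slow does it become worth reaching for the inner layers (multi-core or distributed learning on data subsets).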
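The "When to use distributed learning?" slide asks whether the data is big enough for distribution to pay off. A back-of-the-envelope sketch, assuming a hypothetical cost model in which compute splits perfectly across workers while synchronization cost grows linearly with the number of workers (real systems differ; the constants here are made up):

```python
import math


def iteration_time(n_workers: int, compute_s: float, comm_s_per_worker: float) -> float:
    """Hypothetical per-iteration time: perfectly split compute plus linear sync cost."""
    if n_workers == 1:
        return compute_s  # a single machine pays no communication cost
    return compute_s / n_workers + comm_s_per_worker * n_workers


def speedup(n_workers: int, compute_s: float, comm_s_per_worker: float) -> float:
    return compute_s / iteration_time(n_workers, compute_s, comm_s_per_worker)


def best_worker_count(compute_s: float, comm_s_per_worker: float,
                      max_workers: int = 1024) -> int:
    """Smallest worker count minimizing iteration time under the model above."""
    return min(range(1, max_workers + 1),
               key=lambda n: iteration_time(n, compute_s, comm_s_per_worker))


for compute_s in (1.0, 100.0, 10_000.0):
    n = best_worker_count(compute_s, comm_s_per_worker=1.0)
    print(compute_s, n, round(speedup(n, compute_s, 1.0), 1), math.sqrt(compute_s))
```

Under these made-up numbers the 1-second job is fastest on one machine, while the 100-second job peaks at 10 workers with a 5x speedup, matching the continuous optimum sqrt(compute / comm) of this particular model.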
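The "Uncollapsed Gibbs sampler for LDA" slide names the technique without showing it. A toy, pure-Python sketch of one sweep, assuming symmetric priors `alpha` and `beta` and the hypothetical helper names below: the sampler draws the document-topic (theta) and topic-word (phi) distributions explicitly instead of integrating them out, so that, given theta and phi, token assignments in different documents are conditionally independent and can be sharded across workers.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sample_dirichlet(concentration: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet by normalizing independent Gamma(c_i, 1) samples."""
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [x / total for x in draws]


def gibbs_sweep(docs: Sequence[Sequence[int]], z: List[List[int]], n_topics: int,
                vocab_size: int, alpha: float, beta: float,
                rng: random.Random) -> Tuple[List[List[int]], Matrix, Matrix]:
    # 1. Counts implied by the current topic assignments.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    for d, (words, topics) in enumerate(zip(docs, z)):
        for w, k in zip(words, topics):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
    # 2. Sample theta and phi explicitly ("uncollapsed").
    theta = [sample_dirichlet([alpha + c for c in row], rng) for row in n_dk]
    phi = [sample_dirichlet([beta + c for c in row], rng) for row in n_kw]
    # 3. Each token depends only on its document's theta and the shared phi,
    #    so this loop could run per document on separate workers; only the
    #    counts from step 1 need to be aggregated between sweeps.
    topics = range(n_topics)
    new_z = [[rng.choices(topics, weights=[theta[d][k] * phi[k][w] for k in topics])[0]
              for w in words]
             for d, words in enumerate(docs)]
    return new_z, theta, phi


def run_lda(docs: Sequence[Sequence[int]], n_topics: int, vocab_size: int,
            n_iters: int = 50, alpha: float = 0.5, beta: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    z = [[rng.randrange(n_topics) for _ in words] for words in docs]
    theta: Matrix = []
    phi: Matrix = []
    for _ in range(n_iters):
        z, theta, phi = gibbs_sweep(docs, z, n_topics, vocab_size, alpha, beta, rng)
    return z, theta, phi


# Toy corpus of word ids over a 4-word vocabulary.
docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0], [3, 2, 2]]
z, theta, phi = run_lda(docs, n_topics=2, vocab_size=4, n_iters=20, seed=1)
print(z)
```

A collapsed sampler usually mixes faster per sweep, but every token update touches global counts; the uncollapsed variant trades that for the independence in step 3, which is what makes it attractive to distribute.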
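The "Avoid dual implementations" slide has experiment code and production code share one engine, which addresses pitfalls such as missing post-processing logic and code discrepancies from the development-process slide. A minimal sketch, assuming a hypothetical linear model and a `rank` function that holds both scoring and post-processing, so the offline metric and the served result go through the same code path:

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

Features = Dict[str, float]


@dataclass(frozen=True)
class LinearModel:
    """Hypothetical model living in the shared engine."""
    weights: Dict[str, float]

    def score(self, features: Features) -> float:
        return sum(self.weights.get(name, 0.0) * value for name, value in features.items())


def rank(model: LinearModel, candidates: Dict[str, Features],
         watched: Set[str], k: int) -> List[str]:
    """Shared engine entry point: scoring and post-processing live together."""
    scored = [(model.score(f), title)
              for title, f in candidates.items() if title not in watched]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by title
    return [title for _, title in scored[:k]]


# Experiment code: the offline metric calls the shared engine.
def offline_hit_rate(model: LinearModel,
                     cases: Sequence[Tuple[Dict[str, Features], Set[str], str]],
                     k: int) -> float:
    hits = sum(1 for candidates, watched, target in cases
               if target in rank(model, candidates, watched, k))
    return hits / len(cases)


# Production code: a thin wrapper around the same engine call.
def serve(model: LinearModel, candidates: Dict[str, Features],
          watched: Set[str], k: int = 10) -> List[str]:
    return rank(model, candidates, watched, k)


model = LinearModel({"popularity": 1.0, "match": 2.0})
candidates = {"A": {"popularity": 0.9, "match": 0.1},
              "B": {"popularity": 0.2, "match": 0.8},
              "C": {"popularity": 0.5, "match": 0.5}}
print(serve(model, candidates, watched={"B"}, k=2))
```

Because the watched-title filter lives in `rank`, an experiment cannot silently skip it; in a real system the shared engine would more likely be a library callable from both the experimentation environment and the serving stack.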