1. Copyright ⓒ2016 CREATIONLINE, INC. All Rights Reserved
Spark Summit 2016 recap
CREATIONLINE, INC.
Kiuchi
2016/7/26 Spark Summit 2016 Debrief & Data Analysis Study Session: presentation slides
About me: Mitsutoshi Kiuchi
Senior Consultant, CREATIONLINE, INC.
Slideshare: http://www.slideshare.net/mkiuchi4
Contributed articles
a. gihyo.jp: “Building cloud applications with Mesosphere DCOS”
b. Nikkei Cloud First, June 2016: “Evaluating Azure IoT Suite”
c. Codezine: “Try machine learning easily in the cloud! Finding anomalous sensor data with Apache Spark on Bluemix”
Specialties: Apache Mesos, Apache Spark, distributed computing, cloud computing, NoSQL DB, graph DB
O’Reilly Certified Developer on Apache Spark
Docker Certified Technical Trainer
Spark 2.0 is (still) coming...
• Easier
– SparkSQL SQL:2003 compliant
– DataFrame
– Machine learning pipeline persistence
• Faster
– 2nd-gen Tungsten Engine
Virtual function call -> “Whole-stage code generation” (= JIT bytecode generation)
• Smarter
– Structured Streaming
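
The "virtual function call -> whole-stage code generation" bullet above can be illustrated with a toy sketch in plain Python. This is not Spark code and does not reflect how Tungsten is actually implemented; `interpreted_sum`, `generate_fused_sum`, and the tuple-based plan format are hypothetical names invented for this illustration. The first function pushes every row through one function call per operator, and the second generates the source of a single fused loop for the whole plan and compiles it at runtime.

```python
# Toy illustration only: plain Python, not Spark or Tungsten internals.
# A "plan" is a list of (kind, expression) pairs over a row variable x.
# This plan format is an assumption made for the sketch.

def interpreted_sum(rows, plan):
    """Operator-at-a-time: each operator wraps the previous iterator,
    so every row costs one function call per operator."""
    it = iter(rows)
    for kind, expr in plan:
        fn = eval("lambda x: " + expr)
        if kind == "filter":
            it = filter(fn, it)
        elif kind == "project":
            it = map(fn, it)
        else:
            raise ValueError("unknown operator: " + kind)
    return sum(it)


def generate_fused_sum(plan):
    """Code generation: emit one loop for the whole plan, with the
    expressions inlined, then compile it once with exec()."""
    lines = ["def fused(rows):", "    total = 0", "    for x in rows:"]
    for kind, expr in plan:
        if kind == "filter":
            lines.append("        if not (" + expr + "):")
            lines.append("            continue")
        elif kind == "project":
            lines.append("        x = " + expr)
        else:
            raise ValueError("unknown operator: " + kind)
    lines.append("        total += x")
    lines.append("    return total")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)
    return namespace["fused"], source


plan = [("filter", "x % 2 == 0"), ("project", "x * 10")]
fused, source = generate_fused_sum(plan)
print(source)                             # the generated single-loop function
print(interpreted_sum(range(10), plan))   # operator-at-a-time result
print(fused(range(10)))                   # fused result, same value
```

Both paths compute the same sum; the generated version simply avoids the per-row, per-operator call overhead, which is the general idea the slide attributes to the 2nd-gen Tungsten engine.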
From the breakout sessions
A few picks...
DATA SCIENCE: 18
DEVELOPER: 12
RESEARCH: 18
ECOSYSTEM: 18
USECASE: 15
ENTERPRISE: 9
DATA SCIENCE
http://www.slideshare.net/SparkSummit/graphframes-graph-queries-in-spark-sql
DEVELOPER
http://www.slideshare.net/JenAman/highperformance-python-on-spark
RESEARCH
http://www.slideshare.net/SparkSummit/gpu-support-in-spark-and-gpucpu-mixed-resource-scheduling-at-production-scale-63065895
ECOSYSTEM / USECASE / ENTERPRISE
http://www.slideshare.net/JenAman/efficient-state-management-with-spark-20-and-scaleout-databases
http://www.slideshare.net/JenAman/elasticsearch-and-apache-lucene-for-apache-spark-and-mllib
http://www.slideshare.net/JenAman/gpu-computing-with-apache-spark-and-python-63064880
http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
http://www.slideshare.net/JenAman/building-realtime-data-pipelines-with-kafka-connect-and-spark-streaming
http://www.slideshare.net/JenAman/netflix-productionizing-spark-on-yarn-for-etl-at-petabyte-scale
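Summary
• Spark will keep advancing after 2.0
– Performance (Tungsten, GPU)
– Usability and a wider range of use cases (GraphFrames, DataFrame)
• The ecosystem continues to expand
• Use cases covering nearly everything imaginable are starting to appear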