Cerrera: in-stream data analytics
cloud platform
Dmitry Kalashnikov,
Artem Bartashev,
Anastasia Mitropolskaya,
Edgar Klimov,
Natalia Gusarova
ITMO University, St. Petersburg, Russia
CERRERA
Big Data Analytics
• Big Data Analytics examines large data sets
containing a variety of data types
• Big Data Analytics has two approaches:
– Batch processing (Hadoop) – store and process later
– Stream processing (Storm) – process on-the-fly
• Stream processing cases: IoT, Sensors, Social
Networks, Logs, Stocks, Personal devices.
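
The difference between the two approaches can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names `batch_mean` and `stream_mean` and the sample readings are assumptions chosen for illustration. The batch version needs the whole data set before it answers; the stream version updates its answer after every reading and keeps only a count and a running sum.

```python
from typing import Iterable, Iterator, List


def batch_mean(stored: List[float]) -> float:
    """Batch style: all readings are stored first, then processed in one pass."""
    return sum(stored) / len(stored)


def stream_mean(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: update the result as each reading arrives.

    Only a count and a running sum are kept, never the full history.
    """
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    readings = [2.0, 4.0, 6.0]
    print(batch_mean(readings))         # one answer, after all data is collected
    print(list(stream_mean(readings)))  # an updated answer after every reading
```

In the stream version the input can be an unbounded generator, for example one that reads from a sensor, because nothing forces the loop to end.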
Big Data Processing Systems
We consider three modern approaches to batch
and stream processing:
– Cloud Solutions for Machine Learning
– Stream Processing Engines
– Cloud Solutions for Stream Data Processing
Cloud Solutions for
Machine Learning
Allow creating and running sophisticated machine learning models in
the cloud.
Strengths:
• Software as a Service
• Lots of built-in components
• Data visualization
• Visual programming
Weakness:
• Stream processing is not supported
Examples:
• Azure Machine Learning
Stream Processing Engines
Stream Processing Engines are distributed real-time computation
systems for processing fast, large streams of data.
Examples:
Strengths:
• Stream processing support
• Fine control
Weaknesses:
• Infrastructure Management issues
• Writing lots of code
• Lack of real-time visualization
Cloud Solutions for
Stream Data Processing
Provide APIs and libraries for developing enterprise stream
processing software and execute that software in the cloud.
Strengths:
• Stream processing support
• Software as a Service
• Real-time scaling
Weaknesses:
• Writing code
• Lack of embedded real-time
visualization
Examples:
Cerrera: Overview
Cerrera lets users describe a data stream processing workflow,
run it, and view visualized results using only a web browser.
Features:
• Stream Processing Support
• Software as a Service
• Visualization
• Visual Programming
• Built-in components
Comparison between Cerrera and others

Project                                     Stream support  Visualization  SaaS  Built-in components  Visual programming
Cerrera                                     Yes             Yes            Yes   Yes                  Yes
Cloud Solutions for Machine Learning        No              Yes            Yes   Yes                  Yes
Stream Processing Engines                   Yes             No             No    Depends              No
Cloud Solutions for Stream Data Processing  Yes             No             Yes   Depends              No
Cerrera: User Interface
• The data processing workflow is a directed acyclic graph.
• Nodes are processing units.
• Edges describe data flow between nodes.
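
As a rough illustration of this graph model, the sketch below stores nodes and edges, rejects a connection whose output and input types differ (the speaker notes for this slide mention such a consistency check), and derives an execution order with Kahn's topological sort. The `Workflow` class, its method names and the type labels are hypothetical; the slides do not describe Cerrera's internal data structures.

```python
from collections import deque
from typing import Dict, List, Tuple


class Workflow:
    """A workflow graph: nodes are processing units, edges carry data between them."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[str, str]] = {}  # name -> (input type, output type)
        self.edges: List[Tuple[str, str]] = []       # (source, target)

    def add_node(self, name: str, input_type: str, output_type: str) -> None:
        self.nodes[name] = (input_type, output_type)

    def connect(self, source: str, target: str) -> None:
        """Add an edge, checking that the source output matches the target input."""
        out_type = self.nodes[source][1]
        in_type = self.nodes[target][0]
        if out_type != in_type:
            raise TypeError(f"{source} emits {out_type}, but {target} expects {in_type}")
        self.edges.append((source, target))

    def execution_order(self) -> List[str]:
        """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
        indegree = {name: 0 for name in self.nodes}
        for _, target in self.edges:
            indegree[target] += 1
        ready = deque(name for name, degree in indegree.items() if degree == 0)
        order: List[str] = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for source, target in self.edges:
                if source == node:
                    indegree[target] -= 1
                    if indegree[target] == 0:
                        ready.append(target)
        if len(order) != len(self.nodes):
            raise ValueError("workflow contains a cycle")
        return order


if __name__ == "__main__":
    wf = Workflow()
    wf.add_node("streamer", "none", "json")
    wf.add_node("splitter", "json", "text")
    wf.add_node("filter", "text", "text")
    wf.connect("streamer", "splitter")
    wf.connect("splitter", "filter")
    print(wf.execution_order())
```

Checking types at connection time, rather than at run time, matches the stated goal of catching mistakes while the user is still drawing the workflow.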
Use Case: Emotion and Finances
Compares sentiment about a company on Twitter with its stock
price in real time.
Workflow nodes: Twitter Streamer, Yahoo Streamer, Text Splitter, Entity Splitter (x2), Text Filter, Sentiment Analysis, MongoDB
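
The data flow of this use case can be sketched as a chain of Python generators. Everything here is an assumption made for illustration: the JSON field names `text` and `price`, the tiny word lexicon, and the word-counting score stand in for whatever the real Streamers and Sentiment Analysis component do, and a plain list takes the place of MongoDB.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical toy lexicon; the slides do not say which sentiment method is used.
POSITIVE = {"good", "great", "up"}
NEGATIVE = {"bad", "poor", "down"}


def entity_splitter(raw_messages: Iterable[str], field: str) -> Iterator[Any]:
    """Extract one field from each JSON message."""
    for raw in raw_messages:
        yield json.loads(raw)[field]


def text_splitter(texts: Iterable[str]) -> Iterator[List[str]]:
    """Split each text into lowercase words."""
    for text in texts:
        yield text.lower().split()


def text_filter(token_lists: Iterable[List[str]]) -> Iterator[List[str]]:
    """Keep only the words that the lexicon knows about."""
    for tokens in token_lists:
        yield [t for t in tokens if t in POSITIVE or t in NEGATIVE]


def sentiment_analysis(token_lists: Iterable[List[str]]) -> Iterator[int]:
    """Score = number of positive words minus number of negative words."""
    for tokens in token_lists:
        positive = sum(1 for t in tokens if t in POSITIVE)
        negative = sum(1 for t in tokens if t in NEGATIVE)
        yield positive - negative


def run(tweets: Iterable[str], quotes: Iterable[str]) -> List[Dict[str, Any]]:
    """Pair each tweet's score with the quote that arrived alongside it."""
    texts = entity_splitter(tweets, "text")
    scores = sentiment_analysis(text_filter(text_splitter(texts)))
    prices = entity_splitter(quotes, "price")
    return [{"sentiment": s, "price": p} for s, p in zip(scores, prices)]


if __name__ == "__main__":
    tweets = ['{"text": "Great results, stock is up"}', '{"text": "bad quarter"}']
    quotes = ['{"price": 10.5}', '{"price": 10.1}']
    print(run(tweets, quotes))
```

Because each stage is a generator, a record moves through the whole chain as soon as it arrives, which mirrors the on-the-fly processing described earlier in the deck.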
Cerrera: Architecture
• The Coordination System manages the entire
infrastructure and the work of all
subsystems.
• The Code Management System translates the
visual representation of the processing
workflow into Java code and builds it into
an executable artifact.
• The Processing System runs the workflow over
the data stream.
• The NoSQL DB stores processing workflows
and result data.
• The SQL DB stores secure user information.
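
To make the role of the Code Management System more concrete, here is a minimal Python sketch of translating a workflow description into Java source text. The input dictionary layout and the Java API in the generated output (`Pipeline`, `addUnit`, `connect`, `run`) are hypothetical; the slides state only that the visual workflow is translated into Java code and built, not what that code looks like.

```python
from typing import Any, Dict, List


def generate_java(workflow: Dict[str, Any]) -> str:
    """Render a workflow description as Java source text.

    The Java API used in the output (Pipeline, addUnit, connect, run)
    is a hypothetical placeholder, not Cerrera's actual runtime API.
    """
    lines: List[str] = [
        f"public class {workflow['name']} {{",
        "    public static void main(String[] args) {",
        "        Pipeline pipeline = new Pipeline();",
    ]
    # One statement per node of the visual workflow.
    for node in workflow["nodes"]:
        lines.append(f'        pipeline.addUnit("{node["id"]}", new {node["type"]}());')
    # One statement per edge of the visual workflow.
    for source, target in workflow["edges"]:
        lines.append(f'        pipeline.connect("{source}", "{target}");')
    lines += ["        pipeline.run();", "    }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    workflow = {
        "name": "EmotionAndFinances",
        "nodes": [
            {"id": "tweets", "type": "TwitterStreamer"},
            {"id": "sentiment", "type": "SentimentAnalysis"},
        ],
        "edges": [("tweets", "sentiment")],
    }
    print(generate_java(workflow))
```

A build step would then compile the generated source into the executable artifact that the Processing System runs; that step is outside this sketch.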
Future plans
• Early private access
• Teams
• Project sharing
• Marketplace
To stay up to date, subscribe at cerrera.org
or follow @cerrera_project
Q&A

Cerrera DINWC2015

Editor's notes

  1. Hello, I’m Dmitry Kalashnikov from ITMO University. Today I’d like to tell you about Cerrera, a cloud platform for in-stream data analytics.
  2. In the modern world, big data analysis plays a key role in scientists’ work. The ability to process huge amounts of available information opens the way to previously unimaginable research. There are two main approaches to Big Data Analytics: batch processing and stream processing. The Big Data community recognized the usefulness and power of batch-oriented computation quite a long time ago, and many systems such as Hadoop, YARN and Pig were created to simplify Big Data analytics. In the last couple of years, however, in-stream processing has become an increasingly important player in the big data arena. Stream analytics is very different from the standard batch methods. We consider a stream to be an unbounded, continuous flow of heterogeneous data that must be processed on time. It is not possible to stop the stream, or to collect the data and process it later, because the computation often has to finish on time. In which cases do such requirements arise? First, sensor data is one of the most noticeable use cases: the spread of ubiquitous computing and the Internet of Things has been driving research in stream processing for the last couple of years. Next, social networks are among the main producers of data on the Internet. Data analysts have found that social network trends are extremely volatile, so without real-time computation results can become outdated fast. Software logs are another important case, since it is crucial to detect failures and problems as soon as possible. Finally, stocks obviously require an immediate response to changes.
  3. Nowadays there are many different systems and platforms for big data analytics. We consider three modern approaches that are in some sense similar to the Cerrera platform: Cloud Solutions for Machine Learning, Stream Processing Engines, and Cloud Solutions for Stream Data Processing.
  4. The first group is cloud solutions for machine learning. Such systems greatly simplify data analysis for a couple of reasons. First, all infrastructure issues are taken care of thanks to their Software as a Service nature: a researcher does not have to think about how to run and scale the computation, where to deploy the processing systems, or where to store the results. These solutions also provide a large number of predefined machine learning and data analysis components to accelerate model development. Other strengths are data visualization and visual programming. The latter is quite important because it frees the researcher from distracting details of the underlying computation systems and lets them concentrate on model development rather than on programming. However, the major disadvantage of these systems is weak or missing support for stream computation, so they cannot easily be used for on-time processing.
  5. Stream Processing Engines hold a key position in Big Data stream analytics, since they are the basis for any modern development or research connected with in-stream computation. These technologies are fault-tolerant and scalable, and they allow writing code in several programming languages such as Python, Java or Scala. Using these engines, a researcher has fine control over the computation, system placement and configuration. On the other hand, this means that administration issues will distract the scientist from data analysis and model building. Moreover, it is necessary to know engine-specific programming details in order to run the computation properly and without bottlenecks. Last but not least is the lack of visualization. It is an obvious point, but bringing in additional systems for visualization usually only increases the number of issues.
  6. The third group is cloud solutions for stream data processing. Such platforms bring the power of stream processing engines into the cloud and therefore remove some weaknesses of the previous group. There are no more tough infrastructure questions. Moreover, most solutions provide real-time autoscaling and redistribution of the computation to achieve the required performance. This eliminates the administration questions, but issues around writing code and visualization remain. Researchers have to study the programming specifics of these platforms and in some cases even learn new, platform-specific languages. The lack of integrated visualization is also a common issue; to deal with it, vendors advise using their other products or third-party solutions.
  7. To address the weaknesses mentioned above, we designed Cerrera. Cerrera is a cloud-based data stream processing platform that lets researchers concentrate on solving their scientific problems rather than those of administrators and developers. It lets users describe a data stream processing workflow, run it, and get visualized results by interacting only with a web browser. To support this, Cerrera has several key features. First, it incorporates a stream processing engine that supports real-time data processing. Second, the infrastructure is cloud-based, distributed and fault-tolerant. Next, Cerrera has a web interface for controlling workflow execution. It also supports real-time visualization and allows the workflow to be described using visual programming. Another point is a built-in set of components with machine learning algorithms and statistical methods, which users can extend with their own components.
  8. On the slide you can see a table in which all the previously mentioned systems are compared by specific features: (1) stream processing support, (2) visualization, (3) SaaS, (4) built-in components and (5) visual programming. By “Yes” we mean full support of the feature; “Depends” means partial support. For example, there are machine learning libraries for some stream processing engines, such as Spark MLlib, and some cloud solutions for stream data processing provide a few components such as an aggregation window. However, we would not call this full support of the feature.
  9. On the slide you can see the prototype of Cerrera’s web-based user interface. Its functionality is primarily focused on building a processing workflow. Besides that, users can manage the workflow lifecycle, visualize the results of the processing and export data. (1) Most of the space is taken by the workflow construction area. Users can simply drag and drop process units into this area and connect them. (2) Components are located in the bottom control panel. (3) Connections between two nodes are immediately checked for consistency; for example, the output type of one node is verified to match the input type of the node it feeds. This catches mistakes as early as possible. (4) Process units can represent different elements: statistical aggregators, XML parsers, NLP modules and so on. (5) There are also special modules, Streamers, that take data from external sources and pass it on for further processing. For example, the Twitter Streamer calls the Twitter API to get new tweets. Some process units have specific parameters that tune their execution, for example the size of a sliding window, a regular expression or a mathematical expression. After the user sets the required parameters, the computation can be run. We will see a particular example of a workflow a bit later. (6) The lifecycle is controlled with the buttons at the top. While the computation is running, the user may open the visualization window using the buttons at the top.
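The consistency check on connections can be illustrated with a small sketch. This is not Cerrera's actual code: the class names, the type labels ("json", "text", "number") and the parameter dictionary are all assumptions made for the example. It only shows the idea of rejecting a connection at the moment it is drawn, before the workflow is ever run.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessUnit:
    """A node in the workflow. All names and type labels are hypothetical."""
    name: str
    input_type: Optional[str]   # None for Streamers, which only produce data
    output_type: Optional[str]  # None for sinks, which only consume data
    params: dict = field(default_factory=dict)


class IncompatibleConnection(TypeError):
    """Raised when the output of one unit does not fit the input of another."""


class Workflow:
    def __init__(self):
        self.edges = []  # accepted connections as (source name, target name)

    def connect(self, source, target):
        # Validate at connection time so mistakes surface while editing.
        if source.output_type is None:
            raise IncompatibleConnection(f"{source.name} produces no output")
        if source.output_type != target.input_type:
            raise IncompatibleConnection(
                f"{source.name} outputs {source.output_type!r} "
                f"but {target.name} expects {target.input_type!r}"
            )
        self.edges.append((source.name, target.name))


# Hypothetical units; the type labels are assumptions for the example.
twitter = ProcessUnit("Twitter Streamer", None, "json")
splitter = ProcessUnit("Entity Splitter", "json", "text")
window = ProcessUnit("Sliding Window", "number", "number", {"size": 10})

wf = Workflow()
wf.connect(twitter, splitter)       # accepted: json feeds json
try:
    wf.connect(splitter, window)    # rejected: text does not feed number
except IncompatibleConnection as err:
    print(err)
```

Checking at connection time rather than at launch means the user sees the error next to the edge that caused it.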
  10. Now we will look at a particular use case that can be implemented with Cerrera. Many scientists and analysts have investigated the connection between stock price changes and people’s opinions about companies on the market. On the slide you can see a workflow that compares the level of public sentiment about a company with changes in the company’s stock price. (1) Tweets are retrieved from the Twitter API using our predefined Twitter Streamer, and (2) stock information is obtained using the Yahoo Streamer, which takes bids and asks in real time from Yahoo Finance. (3) After that, information is extracted from XML or JSON using the Entity Splitter. (4, 5) The next two components prepare the text for sentiment analysis, (6) which is the job of the top-right component. (7) All resulting data are saved into a NoSQL database for further visualization or export.
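A workflow like this one is a directed graph, so one way to picture it is as a mapping from each unit to the units it reads from. The sketch below is an assumption-laden illustration, not Cerrera's real workflow format: the unit identifiers, the names of the two text-preparation steps and the exact wiring are hypothetical. It uses Python's standard `graphlib` module (Python 3.9+) to derive an order in which every unit comes after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical description of the workflow on the slide.
# Each key is a unit; each value is the set of units it reads from.
workflow = {
    "twitter_streamer": set(),
    "yahoo_streamer": set(),
    "entity_splitter": {"twitter_streamer", "yahoo_streamer"},
    "text_cleaner": {"entity_splitter"},        # assumed name for step 4
    "tokenizer": {"text_cleaner"},              # assumed name for step 5
    "sentiment_analyzer": {"tokenizer"},
    "nosql_sink": {"sentiment_analyzer", "entity_splitter"},
}

# static_order() yields every unit after all of the units it depends on.
order = list(TopologicalSorter(workflow).static_order())

for position, unit in enumerate(order, start=1):
    print(position, unit)
```

The relative order of the two Streamers is not fixed, since neither depends on the other; only the dependency constraints are guaranteed.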
  11. Now let’s move to the more technical side of the project. The internal architecture of Cerrera is depicted on the slide. (1) We have already discussed the web site, so we will start with the heart of Cerrera, the Coordination System. The Coordination System manages the entire Cerrera infrastructure and the work of all subsystems. Besides that, it includes our own balancer, which distributes load among homogeneous components. (2) Together, the Code Management System, the Code Repository and the Maven Artifact Repository orchestrate the code building process. The Code Management System translates the JSON representation of the processing workflow into Java code and builds it to create an executable Maven artifact. The Code Repository keeps the code of workflows and processing units. We use GitLab for this purpose. The Maven Artifact Repository, in turn, stores the built packages of processing units and workflows for further access and reuse. In our case, Artifactory is used as the repository. (3) If the Coordination System is the heart, then the Processing System is the brain of Cerrera. The Processing System runs the processing workflow over the data stream. In general, it retrieves data from external sources (such as the Twitter API or sensors), performs all specified computations and saves the resulting data into the NoSQL database. (4) The NoSQL database stores the results of the processing and the workflow descriptions. We use MongoDB for this purpose, since workflows and results are represented in JSON, which is natural for MongoDB. It also performs well in our case because none of our documents references any other document. (5) Finally, the SQL database keeps user information, transactions and other kinds of strongly structured data.
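The talk does not say which policy the balancer uses, so the following is only a sketch of one common choice, least-loaded dispatch, under the assumption that each homogeneous component can be described by a single load number. The class name, the worker names and the `cost` parameter are hypothetical.

```python
import heapq


class LeastLoadedBalancer:
    """Hands each new task to the worker with the smallest current load.

    Illustrative only: the policy actually used by Cerrera is not described.
    """

    def __init__(self, workers):
        # Min-heap of (load, worker name); ties are broken by name.
        self._heap = [(0, name) for name in workers]
        heapq.heapify(self._heap)

    def assign(self, cost=1):
        # Take the least-loaded worker, charge it the task cost, put it back.
        load, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + cost, name))
        return name


balancer = LeastLoadedBalancer(["worker-a", "worker-b", "worker-c"])
assignments = [balancer.assign() for _ in range(6)]
print(assignments)
```

With equal task costs this behaves like round-robin; passing a larger `cost` for heavy workflows steers later tasks toward the other workers.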
  12. We are going to continue developing Cerrera. Our main current goal is to open an early access program for researchers who are interested in using Cerrera. We will provide a test account to anyone who wants to use Cerrera and is ready to share their user experience. Our second goal is to implement teams and project sharing. We would also like to create a marketplace for processing units where users can exchange, rate and discuss different units. If you would like to participate in the early access program, you can subscribe to our newsletter and we will include you in it.