Data scientists help
companies make solid data-
backed decisions.
Data science: a multidisciplinary career
• Statistics
• Computer science
• Social science
• Design
This job also happens to be the
fastest growing job in the United
States, according to LinkedIn.
It also commands a lucrative median
salary of $113,000, high among
fast-growing career paths.
However, there is a
shortage of workers.
According to a report
by McKinsey, we
might soon see a
shortage of up to
250,000 data
scientists.
Hence, it would be very interesting
to look at the type of skills that
someone needs to master in order to
become a data scientist.
Since JobsPikr extracts job data from some
of the popular job boards, we selected the
job listings posted in March 2018 on
Dice.com (a leading U.S.-based job portal).
The next step was to filter for job ads
with the job title “Data Scientist”. This
gave us a data set of close to 8,000 job
listings for data scientists in the US region.
To analyze the
skills required for this
role, we extracted the
terms present in the
“job requirement”
section of each job ad.
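The term-counting step can be sketched roughly as follows. This is not JobsPikr's actual pipeline: the sample ads, the `SKILLS` list, and the `count_skills` helper are hypothetical, and the matching is a simple case-insensitive whole-token lookup.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary; the real analysis may have used a different list.
SKILLS = ["python", "sql", "r", "java", "hadoop", "spark"]

def count_skills(requirement_texts, skills=SKILLS):
    """Count how many job ads mention each skill at least once."""
    counts = Counter()
    for text in requirement_texts:
        # Lowercase tokens; '+' and '#' are kept so a term like "c++" stays whole.
        tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
        for skill in skills:
            if skill in tokens:
                counts[skill] += 1
    return counts

# Made-up "job requirement" sections, for illustration only.
sample_ads = [
    "Strong Python and SQL skills; experience with Spark is a plus.",
    "Proficiency in R or Python, plus SQL.",
    "Java, Hadoop and SQL required.",
]

skill_counts = count_skills(sample_ads)
for skill, n in skill_counts.most_common():
    print(f"{skill}: {n}")
```

Counting each ad at most once per skill keeps a single long ad from inflating a skill's total.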
Summarizing the skills…
Python
Python has amassed a lot of interest
recently as a choice of language for
data scientists because of the following
factors:
• Open Source
• Rich community
• Lower learning curve
• Powerful libraries for data analytics
• Easier integration with databases
For example, scikit-learn is used for machine learning
algorithms, PyBrain for building neural networks, matplotlib for
plotting, and IPython notebooks for presenting the analyses.
SQL
Structured Query Language
(SQL) is essential for data
scientists as it is the standard
language to communicate with
relational database
management systems
(RDBMS).
A data scientist has to write both simple and
complex queries to select data from tables, and also
needs to understand the different data formats used
for data management and filtering.
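To make the difference between a simple and a more complex query concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `jobs` and `job_skills` tables and their rows are invented for illustration and are not taken from the JobsPikr data set.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, state TEXT);
    CREATE TABLE job_skills (job_id INTEGER, skill TEXT);
    INSERT INTO jobs VALUES (1, 'Data Scientist', 'CA'),
                            (2, 'Data Scientist', 'NY'),
                            (3, 'Data Engineer', 'CA');
    INSERT INTO job_skills VALUES (1, 'Python'), (1, 'SQL'),
                                  (2, 'SQL'), (2, 'R'),
                                  (3, 'Java');
""")

# Simple query: filter rows from a single table.
simple_rows = conn.execute(
    "SELECT id, state FROM jobs WHERE title = ? ORDER BY id",
    ("Data Scientist",),
).fetchall()

# More complex query: join two tables, then group and count.
complex_rows = conn.execute("""
    SELECT s.skill, COUNT(*) AS n_ads
    FROM jobs AS j
    JOIN job_skills AS s ON s.job_id = j.id
    WHERE j.title = 'Data Scientist'
    GROUP BY s.skill
    ORDER BY n_ads DESC, s.skill
""").fetchall()

print(simple_rows)
print(complex_rows)
conn.close()
```

The same statements should run largely unchanged on other relational databases, although each RDBMS has its own dialect details.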
R
R is a powerful language
developed in the early 1990s;
it is now widely used for
data science, analysis, and
statistical computing.
Its popularity can
be largely
attributed to the
following:
• Wide range of libraries
• Strong online community
• Open source
• Lower learning curve
Java
Since Java is a long-established
programming language, many
enterprises already have
systems developed with it.
This makes models written in
Java easier to integrate.
Apart from that, leading Big Data frameworks and tools such as
Spark, Hive, and Hadoop run on the JVM (Hadoop and Hive are
written in Java; Spark is written mainly in Scala). Java is also
a great choice when it comes to scalability and speed.
Hadoop
As a framework Hadoop has
gained massive popularity and
has become the de facto open
source software for reliable,
scalable, distributed computing
involving big data analytics.
SAS
This tool is a leader in the
commercial analytics space. It
has a huge set of built-in
statistical functions, a good UI
(Enterprise Guide & Miner) that
any user can learn quickly, and
superior technical
support. However, it is
expensive and its certification
programs can also cost a lot.
Spark
Apache Spark is open source and it has
the ability to keep data resident in
memory, which can lead to faster
iterative machine learning workloads.
In addition, its adoption in the data science community
is strengthened by its Scala foundation and its
built-in machine learning library, MLlib.
C/C++
Similar to Java, C/C++ is also
used to write models, and it is
critical for writing
algorithmic extensions for R
and Python.
Scala
Any data scientist looking to
work on large data sets in a
JVM-centric stack is likely to
use Scala. Many high-
performance data science
frameworks are written in
Scala owing to its strong
concurrency support.
NoSQL
Unlike SQL, NoSQL offers an
architectural approach with
fewer constraints. In general, it
is easier to break down NoSQL
data stores, but more
complicated to query them for
complex results.
For data scientists, NoSQL can be somewhat tricky:
although the technology makes it easy to
rapidly accumulate massive data sets and
scale data stores to meet demand, it often requires
denormalization of data.
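As a rough illustration of what denormalization means, the sketch below turns relational-style records into self-contained documents of the kind a document store might hold. It uses plain Python dictionaries rather than any real NoSQL client, and the records and the `denormalize` helper are hypothetical.

```python
# Normalized, relational-style data: each company is stored once
# and job ads refer to it by ID.
companies = {10: {"name": "Acme Corp", "city": "Austin"}}
job_ads = [
    {"id": 1, "title": "Data Scientist", "company_id": 10},
    {"id": 2, "title": "Data Analyst", "company_id": 10},
]

def denormalize(ads, companies_by_id):
    """Embed a copy of the company record in every job ad document."""
    documents = []
    for ad in ads:
        doc = {"id": ad["id"], "title": ad["title"]}
        # Copy the company data so each document can be read without a join.
        doc["company"] = dict(companies_by_id[ad["company_id"]])
        documents.append(doc)
    return documents

documents = denormalize(job_ads, companies)
for doc in documents:
    print(doc)
```

The trade-off is that the company data is now duplicated, so a change to it has to be applied to every document that embeds it.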
Tableau
VizQL (Visual Query Language)
is Tableau’s database
visualization language. It
queries relational databases,
cubes, cloud databases, and
spreadsheets, and then
generates a wide range of graphs
and charts.
MATLAB
Although MATLAB is not as
popular as R or Python in the
data science space, it still has a
lot of traction in academia.
It is a commercial product
with a high cost and good
customer support.
Hive
This is popular data warehouse
software in the Hadoop
ecosystem that helps data
scientists with data transformation
and analysis.
It provides an SQL-like interface to query data stored
in various databases and file systems that integrate
with Hadoop.
Excel
Microsoft Excel can be
considered a bridge
application for very quick
filtering and data analysis using
built-in statistical methods.
It becomes more powerful
when combined with Visual
Basic for Applications (VBA).
Examples include building your
own Excel-based neural
network and Monte Carlo
simulations.
Cassandra
Apache Cassandra is an open source
distributed NoSQL database
management system designed to
handle large amounts of data across
many commodity servers.
Cassandra was originally developed at Facebook, where
millions of reads and writes can happen every
second, so it is designed for high performance under heavy load.
MapReduce
It is a programming model that
allows for massive scalability
across hundreds or thousands
of servers in a Hadoop cluster.
As the name suggests, MapReduce
consists of two steps, mapping and
reducing the data:
• Mapping filters and sorts
a data set
• Reducing performs a
summary calculation on the
resulting information
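The two steps can be imitated in a single Python process, as in the sketch below. It does not use the Hadoop API; the function names and the sample input are assumptions made for illustration, and the grouping step in the middle stands in for the shuffle that a framework would perform between mapping and reducing.

```python
from collections import defaultdict

def map_phase(record):
    """Map step: emit a (skill, 1) pair for every skill in one record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as a framework would between the steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)

def map_reduce(records):
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))

# Hypothetical input: one line of skills per job ad.
skill_lists = ["python sql", "sql r", "python sql spark"]
totals = map_reduce(skill_lists)
print(totals)
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread across many servers, which is where the scalability comes from.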
TensorFlow
This is an open source
framework developed by the
Google Brain team for machine
learning and deep neural
network research.
Pig
It is a high-level scripting
language used for operating on
large data sets inside Hadoop.
It is primarily used to apply
schemas and transform data.
JobsPikr
Clean and up-to-date job feeds directly from company websites and job
boards
www.jobspikr.com | sales@promptcloud.com

Data science skills summary: Python, SQL, R top requirements
