This document provides an introduction to big data, including definitions of big data and its key characteristics of volume, variety, velocity, variability, and veracity. It discusses big data analysis and how it differs from traditional analytics by examining large, diverse datasets. Hadoop is presented as a popular open-source framework for managing and analyzing big data, and its use by companies like Facebook, LinkedIn, Walmart, and Twitter is described. The document also briefly outlines Hadoop's history and architecture, common Hadoop variants, skills needed to work with Hadoop, and examples of big data case studies.
2. Table of Contents
Topics Covered:
• What is BIG DATA?
• Characteristics of Big Data
• What is BIG DATA Analysis?
• Traditional vs. Current Analytics Trends
• BIG Data using Hadoop!
• Hadoop History
• Hadoop – High Level Architecture
• Hadoop Variants
• Hadoop Skills
• NOSQL Introduction
• Big Data – Case Studies
2 | Oh! Session - Introduction to Big Data
3. What is BIG DATA?
Big Data, simply put, is data which is very BIG!
Big data is a new, “ginormous”
& scary – a very, very scary – term.
No, wait. It is not.
Big data is a term for data sets that are so large or complex that
traditional data processing applications are inadequate.
Examples of Big Data:
SOCIAL MEDIA ACTIVITY – like Facebook, Twitter, LinkedIn, etc.
FINANCIAL TRANSACTIONS – Internet Banking logs, Share Market, etc.
LOCATION TRACKING – Global Positioning System data, etc.
WEB BEHAVIOUR – Internet browsing, Google searches, etc.
4. Characteristics of BIG DATA
Big data can be described by the following characteristics:
Volume
The quantity of generated & stored data. Size determines whether data qualifies as "big".
Variety
The type and nature of the data.
Velocity
The speed at which data is generated.
Variability
The inconsistency of the data set.
Veracity
The quality of captured data, which can vary greatly and affect the accuracy of analysis.
5. What is BIG DATA ANALYSIS?
Big data analytics is the process of examining large data sets containing a variety of data
types – i.e. Big Data – to uncover hidden patterns, unknown correlations, market trends, customer
preferences and other useful business information.
Benefits of Big Data Analytics
Analytical findings drawn from Big Data can lead to:
•more effective marketing
•new revenue opportunities
•better customer service
•improved operational efficiency
•competitive advantages over rival organizations
•& other business benefits.
6. Traditional vs. Current Analytics Trends
Data processing and Analytics: The old way
Traditionally, analytics followed the
creation of modest amounts of structured data by
enterprise applications (CRM, ERP, etc.).
The modeled & cleansed data was loaded into an
enterprise data warehouse.
The complexity of the data analyzed was limited
to relational data only, so TERADATA, EXADATA &
NETEZZA were running the show.
Data processing and Analytics: The new way
Currently, data is growing exponentially and its
variety has grown from text & relational (i.e.
structured) data to a mix of structured, semi-structured &
unstructured data.
The analytical tool-set had to change to handle
the unstructured part of the data, which is why
technologies like Hadoop, SPARK and NOSQL have
become popular; they have reduced costs by
providing open-source systems & resilience through
parallel processing.
7. BIG Data using Hadoop!
Why Hadoop?
The best-known technology for managing structured and unstructured
data is Hadoop, an open-source, Java-based framework.
It is flexible, scalable, robust, cost-effective and adaptive to upcoming technologies.
Hadoop in Action:
Hadoop is a great framework for advertising companies as well. It tracks the millions of
clicks on ads and how users respond to the ads posted by the big ad agencies!
•Facebook – over 1.3 billion active users – storing, managing & keeping track of all profiles along with the
related posts, comments, images, videos, and so on.
•LinkedIn – managing over 1 billion personalized recommendations/week using MapReduce & HDFS
features!
•Walmart – Helping handle more than 1 million customer transactions/hour
•Twitter – Managing and handling 85 million tweets from users/day
•Google – Managing more than 1 terabyte of data/hour
•eBay – handling and managing 80 terabytes of data/day and suggesting additional suitable products to
their customers
•Spadac.com – helps run spatial intelligence & predictive analytics on huge volumes of data for providing
actionable intelligence to its customers
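The MapReduce & HDFS features mentioned above can be sketched with the canonical word-count example. This is a toy, single-process illustration of the map / shuffle / reduce phases in plain Python, not the actual Hadoop API:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in an input split
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data is big", "data is everywhere"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In real Hadoop the map and reduce functions run as tasks spread across the cluster, with the framework handling the shuffle over the network; the per-phase logic, however, is this simple.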
9. Hadoop – High Level Architecture
10. Hadoop Variants
Major variants for Hadoop and their distribution
1. Cloudera Hadoop (CDH)
2. HortonWorks
3. MapR
12. Big Data – Case Studies
1. 2012 US Presidential Election
• Big Data analytics helped Barack Obama win
the US election
2. Data Storage
• NetApp
3. Human Sciences
• NextBio
13. What is MONGODB?
Data in this model is stored inside documents.
Documents are not typically forced to have a schema and are
therefore flexible and easy to change.
No joins required.
15. MONGODB – Similarities with HADOOP
Both MONGODB and HADOOP:
• Replication possible
• Horizontally scalable
• Master–slave concept
• Commodity hardware can be used
16. MONGODB – Differences with HADOOP
MONGODB:
• Data is stored in a database
• Data is processed serially
• Data can be written at any time
HADOOP:
• Data is stored in a file system
• Data is processed in parallel (data parallelism)
• Data is written once (write-once, read-many)
17. Thank You
Feel Free to drop your queries to:
Benoy Daniel Benoy.daniel@axa-tech.com
Bibhusisa Pattanaik Bibhusisa.Pattanaik@axa-tech.com
Editor's Notes
1. Flexible:
It is a known fact that only about 20% of the data in organizations is structured, and the rest is all unstructured, so it is crucial to manage the unstructured data that would otherwise go unattended. Hadoop manages different types of Big Data, whether structured or unstructured, encoded or formatted, or any other type of data, and makes it useful for the decision-making process. Moreover, Hadoop is simple, relevant and schema-less! Though Hadoop natively supports Java programming, any programming language can be used with Hadoop through the MapReduce model (e.g. via Hadoop Streaming). Though Hadoop works best on Linux, it can also work on other operating systems such as Windows, BSD and OS X.
2. Scalable
Hadoop is a scalable platform, in the sense that new nodes can be easily added in the system as and when required without altering the data formats, how data is loaded, how programs are written, or even without modifying the existing applications. Hadoop is an open source platform and runs on industry-standard hardware. Moreover, Hadoop is also fault tolerant – this means, even if a node gets lost or goes out of service, the system automatically reallocates work to another location of the data and continues processing as if nothing had happened!
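The fault tolerance described above rests on replication: each block of data lives on several nodes, so losing one node loses no data. A toy simulation of the idea (HDFS's default replication factor is 3; the node names are invented):

```python
import random

REPLICATION_FACTOR = 3  # HDFS default: each block stored on 3 nodes

def place_blocks(blocks, nodes):
    # Store each block on REPLICATION_FACTOR distinct nodes,
    # the way HDFS spreads block replicas across the cluster.
    return {block: random.sample(nodes, REPLICATION_FACTOR) for block in blocks}

def recoverable(placement, failed_node):
    # A block survives a node failure if at least one of its
    # replicas lives on a healthy node.
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["block-a", "block-b", "block-c"], nodes)
print(recoverable(placement, "node3"))  # True: replicas survive a single failure
```

Because every block has three replicas on distinct nodes, no single failure can take out all copies; the real system then re-replicates the affected blocks to restore the replication factor, which is the "continues processing as if nothing had happened" behaviour in the note.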
3. Robust Ecosystem:
Hadoop has a very robust and rich ecosystem that is well suited to meet the analytical needs of developers, web start-ups and other organizations. The Hadoop ecosystem consists of various related projects such as MapReduce, Hive, HBase, ZooKeeper, HCatalog and Apache Pig, which make Hadoop very competent to deliver a broad spectrum of services.
4. Hadoop is getting more “Real-Time”!
Did you ever wonder how to stream information into a cluster and analyze it in real time? Hadoop has the answer for it. Yes, Hadoop’s competencies are getting more and more real-time. Hadoop also provides a standard approach to a wide set of APIs for big data analytics comprising MapReduce, query languages and database access, and so on.
6. Cost Effective:
Loaded with such great features, the icing on the cake is that Hadoop generates cost benefits by bringing massively parallel computing to commodity servers, resulting in a substantial reduction in the cost per terabyte of storage, which in turn makes it reasonable to model all your data. The basic idea behind Hadoop is to perform cost-effective analysis of the data present across the world wide web!
7. Upcoming Technologies using Hadoop:
While reinforcing its capabilities, Hadoop is leading to phenomenal technical advancements. For instance, HBase will soon become a vital platform for Blob Stores (Binary Large Objects) and for lightweight OLTP (Online Transaction Processing). Hadoop has also begun serving as a strong foundation for new-school graph and NoSQL databases, and better versions of relational databases.