We provide online training in Hadoop development and Hadoop administration.
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
3. What is Big Data?
• A huge amount of data is available, and it holds many
meaningful insights; it needs to be analysed.
• Existing Online Transaction Processing (OLTP) and Business
Intelligence (BI) systems do not scale easily in terms of cost,
effort, and manageability.
• It is not just volume, but also the variety and velocity of data.
• Big data is a term for the challenges we face due to the
exponentially growing volume, variety, and velocity of data.
9. Shorter Time to React
• Data that enters your organization often has value only for a
limited window of time.
• This window usually shuts well before the data has been
transformed and loaded into a data warehouse for deeper
analysis.
• The higher the volume of data entering your organization per
second, the bigger the challenge.
10. Data Economics
• Why is volume good?
– No individual record is particularly valuable
– Having every record is incredibly valuable
• Why is the storage decision important?
• How much value can I extract from every byte of data versus
the cost to save that data?
– If value > cost, keep it online, on a database or filer
– If cost > value, discard it or archive it on tape (throwing
data away is costly)
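The rule on this slide can be sketched as a small function. This is only an illustration: the function name, the per-byte units, and the handling of the value == cost case (which the slide does not cover) are assumptions.

```python
def storage_decision(value_per_byte: float, cost_per_byte: float) -> str:
    """Apply the slide's value-versus-cost rule to one data set.

    Both arguments are hypothetical estimates in the same currency unit.
    """
    if value_per_byte > cost_per_byte:
        # Value exceeds cost: keep the data online (database or filer).
        return "keep online"
    # Cost exceeds value. The tie case is not covered by the slide and is
    # treated the same way here purely as an assumption.
    return "archive or discard"


print(storage_decision(value_per_byte=0.004, cost_per_byte=0.001))
print(storage_decision(value_per_byte=0.0005, cost_per_byte=0.001))
```

In practice the hard part is estimating the value side, because the value of having every record only shows up when the records are analysed together.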
11. Data Storage
Schema                  | Structured                       | Unstructured
Storage medium          | RDBMS                            | Filers
Storage reliability     | Very reliable                    | Very reliable
Processing ability      | Very reliable                    | Unstructured schema poses challenges
Location of processing  | SQL queries pull data to server  | Random means to retrieve sense
Impact of data increase | Cost increases linearly          | Cost increases linearly
Support for Big Data    | No                               | No
13. Big Data Approach
Big Data refers to technologies that can capture, process,
and analyze data.
14. NoSQL Database Types
• Key-value store
– Key can be custom or auto-generated
– Value can be a complex object such as XML, BLOB, or JSON
– Popular: DynamoDB, Azure Table Store (ATS), Riak
• Column store
– Data is stored as families of columns; high scalability
with a very high-performance architecture
– Examples: HBase, Cassandra, Vertica, and Hypertable
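The two data models above can be sketched with plain Python dictionaries. This is a toy in-memory illustration, not the API of DynamoDB, HBase, or any other product; all names (`put`, `read_family`, the sample keys and values) are hypothetical.

```python
import json
import uuid

# Key-value store: an opaque value looked up by a custom or generated key.
kv_store = {}


def put(value, key=None):
    """Store a value; generate a key when the caller does not supply one."""
    if key is None:
        key = str(uuid.uuid4())  # auto-generated key
    kv_store[key] = value
    return key


custom_key = put(json.dumps({"name": "Alice"}), key="customer:42")  # JSON value
auto_key = put(b"\x00\x01")  # BLOB value under a generated key

# Column store: row key -> column family -> column -> value.
column_store = {
    "customer:42": {
        "profile": {"name": "Alice", "city": "Pune"},
        "orders": {"2013-01-05": "order-1001"},
    }
}


def read_family(row_key, family):
    """Read one column family of a row without touching the others."""
    return column_store[row_key][family]


print(kv_store[custom_key])
print(read_family("customer:42", "profile"))
```

Grouping columns into families is what lets a column store read only the part of a row that a query needs.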
15. NoSQL Database Types
• Document database
– Designed to store, retrieve, and manage document-oriented
information; expands on the key-value store
– Examples: MongoDB, CouchDB
• Graph database
– Designed for data whose relations are well
represented as graphs, usually with nodes
connected by edges
– Example: Neo4j
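As a rough illustration of the two models on this slide, the sketch below uses plain Python structures. It does not use the MongoDB or Neo4j APIs; the `find` and `reachable` helpers and all sample data are hypothetical.

```python
from collections import deque

# Document model: each record is a self-describing, possibly nested document.
documents = [
    {"_id": 1, "name": "Alice", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Bob", "orders": []},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Graph model: nodes connected by directed edges, stored as adjacency lists.
graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": [], "dave": []}


def reachable(adjacency, start):
    """Breadth-first traversal: every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}


print(find(documents, name="Alice"))
print(sorted(reachable(graph, "alice")))
```

The traversal is the kind of relationship query that a graph database is built to answer directly, where a relational store would need repeated joins.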
16. Analytical Database
• An analytical database is a type of database built to store,
manage, and consume big data.
• Optimized for advanced analytics involving highly complex
queries on terabytes of data, complex statistical processing,
data mining, and NLP (natural language processing).
• Examples of analytical databases are Vertica (acquired by HP),
Aster Data (acquired by Teradata), Greenplum (acquired by
EMC), and so on.
21. Sears – Competes on Big Data
• They have data on over 100 million customers, which they
analyse to make real-time, relevant offers to those customers.
• The solution was 3 years in the making, which included
programming that would capture, analyze, and report on
customer activity at an individual level, across all 4,000
locations.
• Sears has a 300-node Hadoop cluster that is populated
with over 2 petabytes of structured customer transaction data,
sales data, and supply chain data.
• Results: Sears achieved an active member base in the eight
digits, exceeding the projected 36-month membership target in
17 months.