2. What is Big Data?
• Big does not always have to mean petabytes
• Big means too big for traditional systems to handle efficiently
3. Big Data Facts
• Twitter generates 8 TB of data every day
• eBay data warehouse is 10+ PB
• Facebook data warehouse is 36+ PB
• Yahoo! has 100+ PB of data
• Google scans and indexes 500+ PB of data
4. Data Types
• Structured
– Pre-defined schema
– Example: relational database system
• Semi-Structured
– Some structure, but no fixed, pre-defined schema
– Does not fit neatly into the rows and tables of a database
– Examples: logs, tweets
• Unstructured
– No identifiable structure
– Examples: free-form text, reports, customer feedback forms
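As a rough illustration of the three data types listed above, the sketch below shows one made-up sample of each in Python. All values, field names, and the `parse_log` helper are hypothetical and chosen only for the example; the log format in particular is an assumption, not a standard.

```python
import json

# Structured: every record follows the same pre-defined schema,
# like a row in a relational table (hypothetical columns).
structured_row = {"user_id": 42, "name": "Ada", "signup_year": 2012}

# Semi-structured: fields can be recovered, but records are not
# guaranteed to share one fixed shape (hypothetical tweet and log line).
tweet_json = '{"user": "ada", "text": "hello", "hashtags": ["bigdata"]}'
log_line = "2012-06-01 12:00:03 WARN disk=/dev/sda1 usage=91%"

# Unstructured: free-form text with no fields to parse out directly.
feedback = "The checkout page was slow, but support sorted it out quickly."


def parse_log(line):
    """Split an assumed 'date time level key=value ...' log line into a dict."""
    date, time, level, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    return {"timestamp": f"{date} {time}", "level": level, **fields}


tweet = json.loads(tweet_json)
parsed_log = parse_log(log_line)

print(structured_row["name"])
print(tweet["hashtags"])
print(parsed_log["level"], parsed_log["usage"])
```

The structured row can be queried by column name right away, the tweet and log line need a parsing step first, and the feedback text would need text analysis before any field-like information could be extracted.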
Copyright Hortonworks 2012 4
5. Characteristics of Big Data
• Volume
• Velocity
• Variety
• Value
6. Problem with Legacy Solution
• Expensive
– Scale up costs lots of $$
• Rigid
• Stale Data
7. Hadoop Approach
• Process data locally
• Expect Hardware failures
• Handle failover elegantly
• Duplicate a small percentage of the data to small groups (versus the entire database)
Hi, my name is Abhijit Lele. I am a Solutions Engineer at Hortonworks. I help our customers understand and achieve their business and technical goals with Hadoop and the Big Data ecosystem in general.
So if we were to turn our original assumptions on their heads, we might be able to come up with an alternate set of rules that allows for a new way of thinking about large data stores.
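The points on the Hadoop Approach slide (expect hardware failures, handle failover, duplicate each piece of data to a small group of machines) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HDFS actually places replicas: the round-robin placement, the node names, and the functions `place_blocks` and `read_block` are all hypothetical. The replication factor of 3 matches the usual HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def read_block(block, placement, failed):
    """Return the first live node holding the block, skipping failed nodes."""
    for node in placement[block]:
        if node not in failed:
            return node
    raise RuntimeError(f"block {block} lost: all replicas failed")


nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_blocks(num_blocks=10, nodes=nodes, replication=3)

# Each node stores only part of the data set, not a full copy.
blocks_on_node1 = [b for b, holders in placement.items() if "node1" in holders]
print(len(blocks_on_node1), "of", len(placement), "blocks live on node1")

# Failover: if node1 is down, block 0 is served by another replica.
print(read_block(0, placement, failed={"node1"}))
```

In this model a single node failure never makes a block unreadable, and no machine has to hold the whole data set, which is the contrast with mirroring an entire database.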