Why is everyone buzzing about Big Data? Here are some slides I presented at FFMassive at SXSW 2012 regarding what big data is, some of the stats, and some different approaches to solving the problem. This is very high level, oriented at folks who haven't encountered big data much.
6. CRM/customer support
POS/purchases
ERP/accounting
email/documents/collab.
BI & data warehouse
system & network logs
web logs/clickstream
google analytics/omniture
other SaaS products / APIs
facebook/twitter/yelp/4sq
experian/epsilon/acxiom
mobile devices
sensors
machine-to-machine
product reviews
google search results
?
many terabytes of data,
sometimes many petabytes
more data than ever before
#ffmassive
9. Big Data is a collection of data sets so
large and complex that it becomes
difficult to process using on-hand
database management tools or
traditional data processing applications.
Source: Wikipedia
10. The challenges include
capture, curation, storage, search,
sharing, analysis, and visualization.
Source: Wikipedia
12. Who Can
Manage
Innovation and
Complexity to
Deliver Value
Quickly?
Multiple Layers of
Technology to
Integrate
Can Take Months to
Build One Analytic
Application
PROVEN VALUE
FOR RANGE OF
APPLICATIONS
EXPLOSION OF PRODUCTS
AND VENDORS
Current State
Source: Think Big Analytics
18. #1 enterprise cloud for big data
some of our customers / our partners
Editor's notes
Gartner: Big data chiefly means three things: large (big) data volume, large throughput of data per second or minute, and a large variety of different types of data to handle. Variety: the prior slide has just a small subset of the data sources our clients are excited about. What do you need in order to be able to solve these problems?
So let’s dig into it. Big data is a pretty easy idea to explain: we produce data, all the time, constantly, and we produce a lot of it. Data centers now take up 1.3% of global energy usage, as much as the entire continent of Australia. So we have some similarly big challenges and even bigger opportunities. On the left of this slide I’ve listed just a few of the kinds of data sources that might be available to an agency, should they choose to ingest them. Everything from their own clients’ customer databases, to streams of tweets from Twitter, to Google search results and even forum posts, can be ingested in the pursuit of building something that generates insights for their clients.
The last few years have seen a rapid pace of innovation in big data. With any new approach, new skills are required. It wasn’t that long ago that web pioneers solved one hard problem (web search ad display) with big data. They quickly rolled big data out to meet a range of applications and create lots of value. And with that came the explosion of products and vendors vying to meet this market. The new skills required can’t be found internally or from traditional or offshore consulting firms. Partnering with a team that helps you make the right choices lets you manage innovation and accelerate your time to value.
Despite how well-known Hadoop is, even in the agency ecosystem, there’s still often confusion about what it actually does and what problems it solves. Let me show you an illustrative – if a little silly – example that might help you to understand exactly when and why you would use Hadoop.
Welcome to the Batch Sub Shop! We make sandwiches, lots of sandwiches. If we get a big order for 1,000 subs, we execute that order all at once. Each calculation or job that Hadoop executes has two phases: the map phase and the reduce phase. In the map phase, input data is modified, transformed, parsed, or otherwise altered or prepared. In our Batch Sub Shop, the map phase is when we slice our bread and our veggies and prepare our meat. In the reduce phase, transformed data from the map phase is assembled into the final output we want. In our Batch Sub Shop, the reduce phase is when we assemble all the sandwich orders from the sliced bread, veggies, and meats we prepared in the map phase. In a few hours, we’ll deliver a huge batch of sandwiches, fresh and delicious. For those of you who have started to evaluate Hadoop, I hope you’re getting the joke here. Hadoop is great in the same way that a caterer is great: if you have a big order and you don’t need it right away, it’s the perfect choice. Similarly, if you have a large amount of data that you need analyzed and you don’t need the result right away, you should use Hadoop. This is one of the reasons that Hadoop is so popular for analyzing historical data in a batch processing paradigm.
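To make the two phases concrete, here is a minimal sketch in plain Python. It does not use Hadoop or its API; it only imitates the map, group, and reduce steps in a single process. The click-log format and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions made up for this illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: parse each raw log line and emit a (page, 1) pair."""
    for line in records:
        _user, page = line.split()
        yield page, 1


def shuffle(pairs):
    """Group the mapped values by key (a framework would do this between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: assemble the grouped values into the final output, one total per page."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records):
    """Run the whole batch job; nothing is returned until every record is processed."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical click log: "<user> <page>" per line.
clicks = ["alice /home", "bob /menu", "alice /menu", "carol /home", "bob /menu"]
page_views = run_job(clicks)
print(page_views)
```

As in the sub shop, the slicing (map) happens record by record, but the finished order (the page totals) only exists once the reduce step has seen everything.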
But say you were really hungry, and you just really wanted to eat your sandwich now. Our Batch Sub Shop will make you wait 3 hours! Sure, you’ll get 1,000 sandwiches at the 3-hour mark, but that’s not very helpful if you just wanted one right away. Similarly, Hadoop is not the right big data tool to use when you want results right away, in real time. Not only do you have to assemble all your data in one place, as we assembled all our ingredients in one place in the Batch Sub Shop, but you also have to wait for the full computation to finish before you get any results. This can often take hours. This makes Hadoop appropriate for batch or offline calculations that can run, say, overnight, and whose results we won’t need to see till morning. But what if we need results right away?
Enter the Streaming Sub Shop. This sub shop works like a conveyor belt. Ingredients enter on the left and as they move through the shop, we slice ‘em, dice ‘em, assemble those sandwiches, and get them toasted and served. The first sandwich will come out in just a few minutes, and sandwiches will keep coming out for as long as ingredients are fed in. Similarly, there are technologies complementary to Hadoop which enable this kind of stream processing of big data.
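The difference between the two sub shops can be sketched in a few lines of plain Python. This is only an illustration of the idea, not any particular streaming product; the order amounts and the function names (`batch_total`, `streaming_totals`) are hypothetical.

```python
from typing import Iterable, Iterator


def batch_total(orders: Iterable[int]) -> int:
    """Batch: collect every order first, then produce a single result at the end."""
    all_orders = list(orders)  # wait for the complete input
    return sum(all_orders)


def streaming_totals(orders: Iterable[int]) -> Iterator[int]:
    """Streaming: emit an updated running total as soon as each order arrives."""
    total = 0
    for amount in orders:
        total += amount
        yield total


# Hypothetical order amounts arriving one by one.
orders = [5, 12, 7, 3]

print("batch:", batch_total(orders))
for running_total in streaming_totals(orders):
    print("stream:", running_total)
```

The batch function gives one answer after all input has been seen, while the streaming generator hands back a usable result after the very first order and keeps updating it, which is the conveyor-belt behavior described above.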
But now let’s return to the central challenge of big data: why aren’t you doing it right now? Why aren’t your competitors? It’s because it’s hard, you lack the expertise, and you haven’t or can’t hire the necessary resources – all of whom are rare and expensive.
We are a big data cloud services provider for the enterprise. We bundle together all the analytics infrastructure you need, like Hadoop, real-time analytics, and powerful databases, and provide the hosting, support, and expertise, so that you can focus on analytics and on driving those business use cases and apps, not on wrangling complex systems.