Two years ago, anyone claiming they could stand up a petabyte-scale data warehouse in under an hour, and then have a non-technical business user querying it live 30 minutes later without knowing SQL or any coding language, would have been laughed out of the room. These days, that’s called taking advantage of disruptive technology. Amazon Web Services and Tableau Software have changed not only how organizations store and access their data, but ultimately how they innovate with it. The fast, scalable, and inexpensive services that AWS provides for housing data, combined with Tableau’s flexible and user-friendly visual analytics, mean that within hours an organization can securely put its massive data assets into the hands of its domain experts without expensive overhead or lengthy ramp-up time.
Attend this webinar to learn how Amazon Web Services and Tableau Software are used together every day to:
• Empower visual ad-hoc data discovery against big data
• Revolutionize corporate reporting and dashboards
• Promote data-driven decision making at every level
The presentation will include:
• A live demonstration of AWS and Tableau working together
• A real customer case study focused on fraud detection and online video metrics
• Live Q&A and an opportunity to trial both solutions
AWS Webcast - Tableau Big Data Solution Showcase
1. From weeks to hours: how Tableau and AWS changed big data analytics
AWS Big Data Solution Showcase
The recording of this webinar is available here:
https://connect.awswebcasts.com/p8hwp1gyvtd
3. Agenda
Everything you need to get up and running with AWS + Tableau
– AWS big data related services
– Tableau analytics on AWS
– Live demo
– Customer success story: Mixpo
– Q & A
4. Big data & AWS
Technologies and techniques for working productively
with data, at any scale.
5. Big data and AWS Cloud computing
Big data
• Potentially massive datasets
• Iterative, experimental style of data manipulation and analysis
• Frequently not steady-state workload; peaks and valleys
• Hard to configure/manage the infrastructure
Cloud computing
• Massive, virtually unlimited capacity
• Iterative, experimental style of infrastructure deployment/usage
• Elasticity for highly variable workloads
• Managed services for data storage and analysis
6. AWS Data Services
Data
• Variety: Structured, Unstructured, Text, Binary
• Volume: Gigabytes, Terabytes, Petabytes
• Velocity: Millisecond, Second, Minute, Hour, Day
Services
• EBS, EC2 Instance
• RDS, Redshift: Relational
• EMR: Hadoop
• DynamoDB: NoSQL
• Kinesis: Stream
• S3, Glacier: Storage
• Elasticache: Caching
• Data Pipeline: Orchestrate
7. Store anything
Object storage
Scalable
Designed for 99.999999999% durability
Amazon S3
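To put the durability figure in perspective, here is a minimal arithmetic sketch. It assumes the 99.999999999% figure can be read as an independent, per-object, per-year survival probability, and the bucket size of 10 million objects is a hypothetical value chosen for illustration.

```python
from fractions import Fraction

# Durability figure quoted on the slide: 99.999999999% ("eleven nines").
# Stored as an exact fraction to avoid floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count):
    """Average number of objects lost per year, assuming the durability
    figure is an independent per-object annual survival probability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of stored objects
    print(float(expected_annual_losses(n)))
    print(int(years_per_lost_object(n)))
```

Under these assumptions, storing 10 million objects works out to an average of one lost object roughly every 10,000 years.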
8. Real-time processing
High throughput; elastic
Easy to use
EMR, S3, Redshift, DynamoDB Integration
Amazon Kinesis
9. NoSQL Database
Seamless scalability
Zero admin
Single digit millisecond latency
Amazon DynamoDB
14. The Opportunity of the Cloud
Time to Implement
Total Cost of Ownership
Access. Anywhere. Anytime. Any Device.
20. Amazon Web Services and
Tableau together make seeing,
exploring, analyzing, and reporting
off of Big Data an achievable
everyday task for the everyday
person.
54. Q&A Session Transcription
Question
Answer(s)
Resource(s)
I've noticed that Tableau Extracts for detailed data take a long time to create (even using RDS). Any recommendations on how to reduce how long it takes to create the initial extract?
Tableau Data Extracts that take a long time to create can usually be traced back to one of two things: 1. a slow data environment, or 2. "Long Data", meaning a table that has quite a few columns (100+). If RDS is the data source, the number of columns may be the issue. Try excluding any data columns you're not using in your analysis when you take a Tableau Data Extract. While setting up the extract, there is an option to hide unused columns, which keeps them out of the Tableau Data Extract.
* http://bensullins.com/leveraging-your-tableau-server-to-create-large-data-extracts/
Can we host Tableau Server locally within our internal network?
Tableau Server can absolutely be hosted internally within your organization's network and still take full advantage of hosted Amazon Web Services data environments like Redshift, EMR, and RDS. Depending on the use case for sharing interactive analytics, some Tableau customers deploy one instance of Tableau Server on an internal network for internal reporting and/or staging, and host a second instance of Tableau Server on an EC2 instance to serve customers or partners with analytic reports and applications without having to open ports in their firewall.
* http://www.tableausoftware.com/learn/whitepapers/ensuring-high-availability
* http://downloads.tableausoftware.com/quickstart/feature-guides/aws.pdf
What challenges do organizations face in adopting Tableau? Do you run into embedded structures that might be threatened by how it empowers non-technical end users?
Many organizations begin adopting both Tableau Desktop and Tableau Server from the business side, and after some time IT becomes involved to help manage and further support Tableau deployments. Often the IT group is very excited to help support Tableau adoption once they realize that it lets them focus on strategic projects instead of supporting analytic efforts (refreshing local data sources, working through the reporting queue, etc.). Because Tableau supports a true self-service Business Intelligence model where business users engage with data directly, IT is able to stay focused on platform health. When Tableau is combined with AWS solutions like Redshift, EMR, and RDS, the overhead for IT to manage data environments becomes even smaller. Hosting Tableau Server in AWS EC2 goes even further in helping IT organizations manage the capabilities and costs of their overall platform.
* http://www.tableausoftware.com/drive
Where can I get more information about Tableau Server on VPC?
Tableau has published a quickstart guide on hosting Tableau Server in the AWS cloud using a VPC. You can also refer to the walkthrough guide on our community forum page.
* http://downloads.tableausoftware.com/quickstart/feature-guides/aws.pdf
* http://community.tableausoftware.com/thread/135464
Is this HIPAA secure?
Tableau Answer: Tableau is used by many healthcare organizations in the United States that must meet HIPAA compliance. This is accomplished in several ways, all depending on the unique data environments and requirements of each institution. Please see the Tableau Forum thread where this is discussed by several of those healthcare institutions. AWS Answer: Yes, Redshift is HIPAA compliant, and you can take advantage of features like built-in encryption to run HIPAA-compliant workloads on AWS.
* http://community.tableausoftware.com/message/194129
55. Q&A Session Transcription (cont.)
Question
Answer(s)
Resource(s)
Using Tableau 8, it was not possible to mix data from Oracle and SQL Server in one analysis. Is this still true?
Tableau can take query results from multiple data sources, such as Redshift, SQL Server, Oracle, Salesforce, Splunk, and Hadoop (to name a few), and aggregate them on the fly. We call this process Data Blending, and it requires no SQL query writing because Tableau can dynamically detect like fields and use those as blending keys. This capability is especially powerful when you need to quickly evaluate the value or veracity of data sources that are candidates for an Amazon Redshift environment.
* http://www.tableausoftware.com/videos/data-integration
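The blending idea in the answer above can be sketched in plain Python. This is an illustration of the concept only, not Tableau's implementation: it assumes blending behaves like aggregating each source separately and then left-joining the secondary result onto the primary one by the shared field. The function names, field names, and sample rows are all hypothetical.

```python
from collections import defaultdict


def aggregate(rows, key, measure):
    """Sum `measure` per distinct `key` value for one data source."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


def blend(primary, secondary, key, primary_measure, secondary_measure):
    """Left-join the aggregated secondary source onto the primary one.

    Keys that exist only in the secondary source are dropped; keys missing
    from the secondary source get None for the secondary measure.
    """
    p = aggregate(primary, key, primary_measure)
    s = aggregate(secondary, key, secondary_measure)
    return [
        {key: k, primary_measure: v, secondary_measure: s.get(k)}
        for k, v in sorted(p.items())
    ]


# Hypothetical rows standing in for query results from two databases.
redshift_rows = [
    {"region": "East", "sales": 100.0},
    {"region": "East", "sales": 50.0},
    {"region": "West", "sales": 70.0},
]
oracle_rows = [
    {"region": "East", "quota": 120.0},
    {"region": "North", "quota": 90.0},
]

blended = blend(redshift_rows, oracle_rows, "region", "sales", "quota")
for row in blended:
    print(row)
```

Here "region" plays the role of the blending key: "West" keeps its sales with no quota, and "North" is dropped because it appears only in the secondary source.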
I'm using Tableau with Redshift with some billions of rows of aggregated data. The queries, especially when using joins, take tens of seconds or minutes, which is just too much for explorative analysis (I'd want max 10 seconds per query). Are there easy ways to sample the data in Tableau?
Tableau doesn't have an automatic way to sample data from a connection. If performance is an issue with queries against a Redshift environment, I highly suggest exploring some of the tuning techniques listed in the joint Tableau and Amazon whitepaper.
* http://www.tableausoftware.com/learn/whitepapers/tuning-your-amazon-redshift-and-tableau-software-deployment-better-performance
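Since sampling is not automatic, one possible workaround is to sample on the database side. The sketch below is not from the webinar; it only builds a query string that keeps a random fraction of rows using Redshift's RANDOM() function, on the assumption that such a query could then be supplied through a custom SQL connection. The helper name and the table name are hypothetical.

```python
def sampled_query(table, fraction):
    """Build a query that keeps roughly `fraction` of the table's rows.

    RANDOM() yields a value in [0, 1) for each row, so the filter keeps
    each row with probability `fraction`. The sample changes on every run.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Reject anything that is not a plain (optionally schema-qualified) name.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError("unexpected characters in table name")
    return f"SELECT * FROM {table} WHERE RANDOM() < {fraction}"


if __name__ == "__main__":
    # Hypothetical table name; keeps about 1% of rows.
    print(sampled_query("analytics.page_views", 0.01))
```

Because the filter is evaluated per row, the sample size is approximate, and aggregates computed on the sample need to be scaled or treated as estimates.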
Can I build analytics in Tableau by connecting to an MDM source and big data information from AWS cloud services? How are the keys and joins resolved?
Tableau Answer: Tableau helps both business and IT groups jointly keep data safe and secure inside organizations. MDM solutions often play a role in how this is accomplished, and they differ depending on the technology, approach, and goal of the organization itself.
Do any universities teach Tableau?
Many universities have started incorporating Tableau into their academic programs for a variety of courses. In support of academic institutions using Tableau for learning environments, Tableau has started the "Tableau for Teaching" program, which allows any full-time student (elementary, high school, collegiate) as well as instructors at fully accredited institutions to use Tableau for free.
* http://www.tableausoftware.com/academic
Is it possible to find out the number of CPUs on the Tableau Server that was handling 23 million rows?
Technical specification recommendations for Tableau Server implementations are readily available.
* http://www.tableausoftware.com/products/techspecs
Is this how it looks for an end user or is this the admin interface?
The majority of the demonstration during the webinar used Tableau Desktop, which would be considered the report author's view. Hosted Tableau Server views designed purely for interactive consumption do not offer the creation features seen in Tableau Desktop. Please see the accompanying link, which shows a final Tableau Server example.
* https://demodepot.tableausoftware.com/views/SecuritiesTechnical/1#1
56. Q&A Session Transcription (cont.)
Question
Answer(s)
Resource(s)
Can you share a dashboard with another Tableau Desktop Professional user without creating an extract (by sharing the connection to Redshift)?
Tableau workbook files can be shared between Tableau Desktop users without any extracted data. The Tableau Workbook file (extension .twb) contains the analytics but no local data, just a record of how to connect back to Amazon Redshift.
* http://www.theinformationlab.co.uk/2013/12/02/tableau-file-types-and-extensions/
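To make the "connection but no data" point concrete, here is a small sketch that reads connection details out of a workbook-like XML document. A .twb file is XML, but the snippet below is a simplified, hypothetical stand-in: the element layout, attribute names, and server name are assumptions for illustration, and real workbooks contain far more.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a .twb file. It describes where
# the data lives but holds no rows of data itself.
SAMPLE_TWB = """
<workbook>
  <datasources>
    <datasource name="clickstream">
      <connection class="redshift" server="example.redshift.amazonaws.com"
                  port="5439" dbname="analytics" />
    </datasource>
  </datasources>
</workbook>
"""


def list_connections(twb_xml):
    """Return the attributes of every <connection> element as dicts."""
    root = ET.fromstring(twb_xml.strip())
    return [dict(conn.attrib) for conn in root.iter("connection")]


if __name__ == "__main__":
    for conn in list_connections(SAMPLE_TWB):
        print(conn["class"], conn["server"], conn["dbname"])
```

The recipient still needs their own network access and credentials for the database, since the file only records how to reach it.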
How does the speed of querying a dataset on Redshift compare with querying a Tableau Data Extract data source on a Tableau server?
Performance of Tableau queries against Amazon Redshift as a data source vs. a Tableau Data Extract hosted on Tableau Server depends entirely on the type of data and the complexity of the query. From a scalability standpoint, Amazon Redshift may be the better choice for bigger datasets given its ability to elastically provision more compute power.
If you are building out a data model for Tableau dashboards, should you use vertical or horizontal data structures for your data marts?
Tableau works best with vertical data structures.
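As a rough illustration, and assuming "vertical" here means a tall layout with one row per measurement rather than one column per measurement, the sketch below reshapes a horizontal (wide) table into a vertical one. The function name, field names, and sample data are hypothetical.

```python
def to_vertical(rows, id_fields, variable_name="metric", value_name="value"):
    """Unpivot wide rows: every non-id column becomes its own row."""
    out = []
    for row in rows:
        ids = {field: row[field] for field in id_fields}
        for column, value in row.items():
            if column not in id_fields:
                out.append({**ids, variable_name: column, value_name: value})
    return out


# Horizontal layout: one column per month.
wide = [
    {"campaign": "A", "jan": 10, "feb": 12},
    {"campaign": "B", "jan": 7, "feb": 9},
]

# Vertical layout: one row per campaign and month.
tall = to_vertical(wide, ["campaign"])
for row in tall:
    print(row)
```

In the vertical form, adding a new month means adding rows rather than a new column, so the month can be used as a single field for filtering or grouping.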
How does the Mac version of Tableau connect to Redshift?
Tableau Desktop Professional for the Mac uses the same ODBC-based connection approach for Amazon Redshift as Tableau Desktop for Windows.
You can find the drivers here: http://www.tableausoftware.com/support/drivers
Do you have a testing version of Tableau, with some test datasets, that would allow one to practice designing dashboards and day-to-day analytics, please? :)
For anyone interested in using Tableau to experiment with building visual analytics and leveraging Amazon Redshift, I highly recommend trying the AWS test drive page set up by Slalom Consulting.
* https://www.slalom.com/aws
We have a non-performing platform hosted locally, with slow response times when you interact in Tableau. Would simply putting the Tableau extract on Redshift result in a boost in performance?
Tableau Answer: If the data environment your organization is using internally is slow or not set up for analytics, I would recommend looking into Amazon Redshift or RDS. Neither of these options would even require you to take a Tableau Data Extract. Tableau customer Mixpo had almost exactly this scenario and saw tremendous results using Redshift.
* http://www.tableausoftware.com/learn/webinars/explore-big-data-analytics-amazon-redshift
Can Tableau Server be clustered for HA?
Tableau Answer: Tableau Server can absolutely be clustered to ensure an HA (Highly Available) environment. There are no restrictions on the Tableau side, but there are cursor limitations on the Redshift side. Please refer to the whitepaper for more details.
* http://www.tableausoftware.com/sites/default/files/whitepapers/high_availablility_reduced_downtime.pdf
What challenges can one encounter while working with Tableau on Redshift? How complete is the integration of Tableau and Redshift? For example, are all the analytical functions that Tableau generates in its SQL available in Redshift?
Tableau Answer: Every organization's data and analytical requirements are unique. Knowing how to tune performance in both Tableau and Redshift is very helpful and is covered in the joint Tableau and Amazon Redshift whitepaper.
* http://www.tableausoftware.com/learn/whitepapers/tuning-your-amazon-redshift-and-tableau-software-deployment-better-performance