4. Whois Nick Galbreath
Director of Engineering at Etsy covering:
– Fraud
– Security
– Support Engineering
– (and other stuff outside of this talk)
Software Development background in
E-Commerce and Social Media
Books, Patents, Oh My… http://client9.com/
2012-06-12 is my two-year anniversary at Etsy
5. $525,000,000 in community sales
875,000 active sellers
41MM unique visitors
15MM registered members
150 countries
6. What Could Possibly Go Wrong?
• Marketplace Risk like Big Auction Site
• Payment Risk like Payments Company
• Social Risk like that Big Social Network
With a member base frequently:
• New to Etsy
• New to Running a Business
• New to the Internet
Photo Credit: Rod Ramsey http://bit.ly/KnI8uB
7. To Make It More Interesting:
Continuous Deployment
On average, there are 50+ production code changes per day. So when we
have a problem:
Is it an operations problem?
Is it a development problem?
Is it a product problem causing complaints to come in?
Or is it an attack?
Learn more: http://bit.ly/KFYYlZ
8. Old Workflow: #notwinning
Logging into production network (!)
Finding the right file
Unzipping the right file
Grepping
• Writing very clever scripts to extract data
• Writing more clever scripts to merge data
• Making a report – in plain text
• Alerting
34 minutes for one day’s log, for nothing!
9. Splunk installed at Etsy mid-2010
"Hey... let's go try this NEW thing!"
(door slamming shut)
"Sorry... we're closed."
Steve Martin. Comedy is not pretty. 1979.
Track 8 ~2:45
Serious New Technology Fatigue
Why don’t we use a Real Database with SQL?
Grep technology works
&^#*&@^*#^%^ YAQL – Yet Another Query Language
10. L’Outrage
Then a colleague:
• Didn’t know Etsy’s stack (new)
• Remote and out of office
• Didn’t have production access
• Didn’t know any of my very clever scripts
• Not experienced with Splunk
• In about 30 minutes, whips up a real-time email alert for
a velocity check on a particular URL
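A velocity check of this kind counts how often one client hits one URL inside a short time window and raises an alert when the count crosses a threshold. The sketch below is a minimal offline illustration in Python, not the Splunk alert the colleague built; the function name, window, threshold, and sample IPs and URL are all assumptions made for the example.

```python
from collections import defaultdict, deque

def velocity_alerts(events, url, window_seconds=60, threshold=5):
    """Return client IPs that hit `url` more than `threshold` times
    within any sliding window of `window_seconds`.

    `events` is an iterable of (timestamp_seconds, client_ip, uri)
    tuples, assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # client_ip -> timestamps still in the window
    flagged = []
    for ts, ip, uri in events:
        if uri != url:
            continue
        hits = recent[ip]
        hits.append(ts)
        # Drop timestamps that have fallen out of the window.
        while hits and ts - hits[0] >= window_seconds:
            hits.popleft()
        if len(hits) > threshold and ip not in flagged:
            flagged.append(ip)
    return flagged

# Hypothetical sample data: one noisy client and one quiet client.
sample = [(t, "203.0.113.7", "/signin") for t in range(0, 20, 2)]
sample += [(5, "198.51.100.2", "/signin"), (50, "198.51.100.2", "/signin")]
sample.sort()
print(velocity_alerts(sample, "/signin", window_seconds=60, threshold=5))
```

In a real deployment the threshold would be tuned against normal traffic for that URL, and the alert would go to email or chat rather than to standard output.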
I only have one thing to say about this…
11. OH, YEAAHH!
400+GB indexed per day
30+ TB total storage
60+ data sources from “hundreds of servers”
(via central syslog aggregation)
12. Data-Driven Security
Three examples of how we use data and Splunk
to help make Etsy a safer place to conduct
business.
• Web Application Security
• Account Takeover
• Payments and PCI
That said, we are barely scratching the surface
of Splunk!
Data-Driven, by Mat Edelson. Johns Hopkins Engineering Magazine, Fall 2011
http://eng.jhu.edu/wse/magazine-fall-11/item/data-driven/
Illustration by Mark McGinnis. No association, just a great article & illustration.
14. Make Security Visible
Your peers actually are interested in security.
But are you letting them?
Turn security from a binary event into
a continuous event.
15. Detect the Steps
A journey of a thousand miles begins with a single step.
Lao-tzu, China 600BC
A single breach begins with a journey of a thousand steps.
Nick Galbreath, USA 2012AD
18. The Dumbest Check Possible for SQLi
We have some snazzy technology for detecting SQLi in Splunk, but you
don’t need it to get started:
source=access.log
(uri="*UNION+ALL*" OR uri="*UNION%20ALL*")
Will wildly undercount, but also has a low false-positive rate
Will detect scans from various tools
Will get you started in making security visible
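The same "dumbest possible" idea can be tried outside Splunk on a plain list of request URIs. The sketch below is an illustration under assumptions, not Etsy's detection logic: instead of wildcard matching it URL-decodes each URI and looks for the words UNION ALL, and the function name and sample URIs are hypothetical.

```python
from urllib.parse import unquote_plus

def looks_like_union_probe(uri):
    """Crude check: does the decoded URI contain the words UNION ALL?"""
    # unquote_plus turns both '+' and '%20' into spaces.
    decoded = unquote_plus(uri).upper()
    return "UNION ALL" in decoded

# Hypothetical request URIs for illustration.
uris = [
    "/listing.php?id=1+UNION+ALL+SELECT+1,2,3",
    "/listing.php?id=1%20union%20all%20select%20null",
    "/search?q=union+square+prints",
]
hits = [u for u in uris if looks_like_union_probe(u)]
print(len(hits), "of", len(uris), "URIs flagged")
```

Like the Splunk search, this misses most real injection attempts; its value is that almost everything it does flag is worth a look.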
19. SQLi and Database Errors
source="error.log" ( "syntax error" NOT "smarty" NOT "ClientLogger" ) | eval event=_raw | table event
• We use Splunk to alert on any database syntax errors too.
• SQLi attacks and probes will likely trigger a burst of syntax errors
if code doesn’t properly sanitize data
That was close
Do the same with server 500 errors, core dumps
20. Investigating Rent-A-CPU Traffic
source="access.log"
| lookup datacenter-cidrs provider_cidr AS true_client_ip
OUTPUTNEW provider_name
| where isnotnull(provider_name)
| top provider_name
Public data: see Appendix
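The lookup in the query above maps each client IP to a hosting provider by CIDR range and then counts requests per provider. A minimal offline sketch of that matching step in Python follows; the provider names and ranges are hypothetical placeholders (documentation-only address blocks), and a real run would load the public datacenter list instead.

```python
import ipaddress
from collections import Counter

# Hypothetical provider ranges, using documentation-only address blocks.
PROVIDER_CIDRS = {
    "ExampleCloud": ["192.0.2.0/24"],
    "SampleHosting": ["198.51.100.0/25", "203.0.113.0/24"],
}

# Pre-parse the ranges once so each lookup is a simple membership test.
NETWORKS = [
    (ipaddress.ip_network(cidr), name)
    for name, cidrs in PROVIDER_CIDRS.items()
    for cidr in cidrs
]

def provider_for(ip):
    """Return the provider whose range contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for network, name in NETWORKS:
        if addr in network:
            return name
    return None

def top_providers(client_ips):
    """Count requests per provider, ignoring IPs outside every range."""
    counts = Counter()
    for ip in client_ips:
        name = provider_for(ip)
        if name is not None:
            counts[name] += 1
    return counts.most_common()

ips = ["192.0.2.10", "198.51.100.5", "203.0.113.9", "198.51.100.200", "10.1.2.3"]
print(top_providers(ips))
```

The linear scan is fine for a sketch; with thousands of ranges a sorted-interval or trie lookup would be the more usual choice.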
21. SANS ISC 10K Sources
source="access.log"
| where isnotnull(true_client_ip)
| lookup isc-bad-ips src_ip AS true_client_ip
| where isnotnull(rank)
| table true_client_ip, rank, reports, attacks, last_seen
| stats count by true_client_ip,rank
| sort rank
Public data: see Appendix
22. Attacker-Driven Testing
“I thought I found something but then it stopped working…”
Email to security-reports@etsy.com from an ethical hacker
Attacker-driven testing augments Etsy’s proactive security measures
Splunk alerts us on potential attacks using a number of parameters
What URLs are being targeted?
Maybe they found something?
Can it be reproduced? (sometimes completely automated validation)
Fixes can be pushed out that day, if not within minutes.
23. Security Post-Mortems
For any security vulnerability, found either externally or internally,
exploited or not, we hold “blameless post-mortems”
Use to teach about security issues
e.g. review OWASP Top 10 http://bit.ly/fXsJg6
Can we make it so this mistake doesn’t happen again or can be
automatically detected?
A key to a post-mortem is knowing when something started and when it
ended. Logs “at your fingertips” via Splunk help greatly
(and are absolutely essential for actual incidents)
25. Account Takeover
• Stolen credentials
• Brute forcing of credentials
• Using account takeover of email to
take over other accounts
Horrible for the victim and really slow to clean up
26. Many Users Failing to Sign-in from One IP
source="info.log" log_namespace="login"
reason="wrong password" true_client_ip!=38.117.156.XXX
| dedup etsy_username,true_client_ip
| transaction true_client_ip
| where eventcount > XXXX
| table true_client_ip,etsy_username
| geoip true_client_ip
| table true_client_ip,true_client_ip_countryname,etsy_username
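The core of the query above is: deduplicate (username, IP) pairs from failed logins, group by IP, and keep IPs with more distinct usernames than some threshold. A rough Python equivalent is sketched below; the threshold, function name, and sample records are assumptions, and the geo lookup step is left out.

```python
from collections import defaultdict

def ips_with_many_failed_users(failed_logins, min_users):
    """Map each IP to its failed usernames, keeping only IPs where more
    than `min_users` distinct usernames failed to sign in.

    `failed_logins` is an iterable of (client_ip, username) pairs.
    """
    users_by_ip = defaultdict(set)
    for ip, username in failed_logins:
        users_by_ip[ip].add(username)  # the set plays the role of dedup
    return {
        ip: sorted(users)
        for ip, users in users_by_ip.items()
        if len(users) > min_users
    }

# Hypothetical failed-login records.
failed = [
    ("203.0.113.7", "alice"), ("203.0.113.7", "bob"),
    ("203.0.113.7", "carol"), ("203.0.113.7", "alice"),
    ("198.51.100.2", "dave"), ("198.51.100.2", "dave"),
]
print(ips_with_many_failed_users(failed, min_users=2))
```

Counting distinct usernames rather than raw failures is what separates this check from the per-user brute-force check: one person mistyping a password many times does not trip it.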
27. Brute Forcing Passwords?
source="info.log"
log_namespace="login"
reason="wrong password"
true_client_ip!=38.117.156.XXX
| transaction etsy_username
| where eventcount > XXXX
| table etsy_username,true_client_ip,eventcount
| sort -eventcount
People will try 100 passwords manually
Frequency Buckets set in Splunk Dashboard
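Frequency buckets group usernames by how many failed attempts they received, which makes the long tail of heavy brute forcing easy to see next to ordinary typos. The sketch below shows one way to compute such buckets in Python; the bucket edges and sample data are hypothetical and are not the values used in Etsy's dashboard.

```python
from collections import Counter

# Hypothetical bucket edges: (label, inclusive lower bound), highest first.
BUCKETS = [("100+", 100), ("20-99", 20), ("5-19", 5), ("1-4", 1)]

def bucket_for(count):
    """Return the label of the first bucket whose lower bound `count` meets."""
    for label, lower in BUCKETS:
        if count >= lower:
            return label
    return None

def failure_histogram(failed_usernames):
    """Count failures per username, then count usernames per bucket."""
    per_user = Counter(failed_usernames)
    histogram = Counter(bucket_for(n) for n in per_user.values())
    return per_user, histogram

# Hypothetical stream of usernames, one entry per failed attempt.
failed = ["alice"] * 3 + ["bob"] * 25 + ["carol"] * 120 + ["dave"]
per_user, histogram = failure_histogram(failed)
print(per_user.most_common(2))
print(dict(histogram))
```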
28. I Forgot My Password x1000
source="/web/access.log"
request_uri=/forgot_password.php
http_method=POST
| transaction true_client_ip
| where eventcount > XXX
| table true_client_ip,eventcount
| sort -eventcount
Hello from Serbia!
Not just fraud… has disclosed problems in email transport
and product problems with our reset flow
29. Apply the same analysis to other things that should not change much
– Payment cards
– Email addresses
– Passwords (successful change)
– Regular physical addresses
30. CAPTCHA
Splunk 2x2 dashboard keeps us in-the-know on
how often CAPTCHAs are being shown,
to whom, and how often they pass.
reCAPTCHA http://www.google.com/recaptcha
31. Integrated into Support Tools
Splunk is glued into our internal tools used by
General Support and MITS (Marketplace Integrity /
Trust & Safety) teams.
33. Payments @ Etsy
Ramping up on our own payments platform
Full PCI Environment
With separate Splunk installation
This space intentionally left blank.
34. Alerting on Unusual Payment Activity
All the WebApp security and account takeover rules apply,
along with special checks for payment activity
Abnormally large payments
Payment velocity
Very small payments (skimming?)
The usual IP address checks.
Part of a larger payment risk solution
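The three payment checks listed on this slide (abnormally large, very small, and high velocity) can be illustrated with a few lines of Python. This is only a sketch under assumptions: the thresholds, field layout, and sample payments are hypothetical, amounts are in cents to avoid floating-point comparisons, and the talk describes these alerts as one part of a larger payment risk solution.

```python
from collections import Counter

def payment_flags(payments, large_cents=100_000, small_cents=100, max_per_account=3):
    """Flag unusual payments within one time window.

    `payments` is a list of (account_id, amount_cents) pairs.
    Returns a list of (account_id, value, reason) tuples.
    """
    flags = []
    for account, amount in payments:
        if amount >= large_cents:
            flags.append((account, amount, "abnormally large"))
        elif amount <= small_cents:
            flags.append((account, amount, "very small"))
    # Velocity: too many payments from one account in the window.
    per_account = Counter(account for account, _ in payments)
    for account, n in per_account.items():
        if n > max_per_account:
            flags.append((account, n, "velocity"))
    return flags

# Hypothetical payments in one window.
payments = [
    ("a1", 2_500), ("a1", 150_000), ("a2", 50),
    ("a3", 1_200), ("a3", 1_300), ("a3", 900), ("a3", 1_100),
]
for flag in payment_flags(payments):
    print(flag)
```

Fixed thresholds are the simplest starting point; a fuller check would compare each payment against that account's own history.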
35. Compliance and Reporting
Instead of building custom applications with fuzzy requirements
“Log it, let Splunk figure it out later”
Even the business guys can use it for ad-hoc queries.
Unexpected side effect: removing and/or changing data is really hard.
This is good. Compare to SQL. (Splunk also has a secure log system)
Easy to make reports
PCI QSA so far says this meets PCI requirements.
36. Internal Risk
Again, instead of building out a new
application (with fuzzy requirements)
Log It, Splunk it later.
Who is making what changes
Who is looking at potentially sensitive
data
And alert on it.
Used in payments and main support applications
Etsy Support and MITS 2012
100% Good Eggs
Team Etsy 2012
38. Acknowledgements
This presentation would not be possible without the hard work by:
Marcus Barczak, Operations Engineering
Jerry Soung, Fraud and Risk Engineering
Zane Lackey, Security
Big thanks to everyone at Etsy in Engineering, Payments, Operations,
Support and MITS
And of course, the fine folks at Splunk!
39. Data and References
Datacenter IP List:
https://github.com/client9/ipcat
ISC Top Troublemaker IPs:
http://isc.sans.edu/ipsascii.html
http://isc.sans.edu/sources.html
On Security and Continuous Deployment:
http://bit.ly/KFYYlZ
Other presentations on Etsy and Security/Fraud/DevOps:
http://slidesha.re/Kw5zdV http://slidesha.re/IMaavq
http://slidesha.re/JGaU2s
40. Security Engineering and “Just Culture”
Treating security mistakes as “accidents” (whether exploited or not)
Based originally on health care initiatives
Patient Safety and “Just Culture”, David Marx JD
– http://psnet.ahrq.gov/resource.aspx?resourceID=1582
– http://bit.ly/LhRHaT (presentation)
John Allspaw on Blameless Post-Mortems:
http://codeascraft.etsy.com/2012/05/22/blameless-postmortems/
41. www.etsy.com
It’s time for questions!
Nick Galbreath
@ngalbreath
http://slidesha.re/KPvHYu