
A Technical Dive into Defensive Trickery




As delivered at the O'Reilly Security Conference 2016 in New York City.



  1. 1. A Technical Dive Into Defensive Trickery Dan Kaminsky Chief Scientist White Ops
  2. 2. Hi! • I’m Dan Kaminsky • Been fixing things for almost two decades • Broke a big thing • People only remember that
  3. 3. Mission of this talk • You may think things are impossible. • You may think some of these specific things are impossible. • I want to challenge your assumptions. • DEATH TO NIHILISM • With the healing power of surprising data  • Also I’ve given quite a few high level keynotes as of late and I’d like to actually discuss the nerdery that’s consumed me this year. • LET’S DANCE
  4. 4. Security Is Hard • Denial of Service Attacks DDoS is hard to remediate • Cryptography TLS is hard to deploy • Data Loss Prevention Attacks are hard to survive • Code Safety Not getting owned is hard
  5. 5. Make Security Easy: What we’re doing about it • Denial of Service Attacks DDoS is hard to remediate Overflowd: Let the victims of network flows learn from Netflow • Cryptography TLS is hard to deploy JFE: Launch one Daemon, all networking is TLS secured w/ valid cert • Data Loss Prevention Attacks are hard to survive Ratelock: Make the cloud enforce security policies, including hard rate limits • Code Safety Not getting owned is hard Autoclave: Run entire operating systems in tighter sandboxes than Chrome
  6. 6. We can do better • We did do better, at the first O’Reilly Security Hackathon • Led by White Ops Labs (me) • Hosted at Code for America (awesome) • Thanks! • Overflowd: +Cosmo Mielke, Jeff Ward • JFE: +David Strauss of Pantheon • Ratelock: +Andy McMurry, Mark Shlimovich • Moar! • Stay tuned.
  7. 7. Denial of Service Attacks DDoS is hard to remediate
  8. 8. Someday, systems will not get hacked • That day is not today. • Mirai vs. Dyn == Parts of the Internet actually went down • No defense survives 10M nodes flooding you • When things go wrong, what can we do? • Step 1: Communicate • Step 0: Figure out who we’re supposed to communicate with
  9. 9. The Nocmonkey Curse (Besides being called monkeys) • 1) Spoofed Traffic • Attackers lie about where they are on the network • This will always be possible • 2) Asymmetrically Routed Traffic • Traceroute just shows how to reach your attacker • It doesn’t show how their traffic is reaching you • These are the problematic packets! • 3) Bad Contact Data • IP address ranges are large, “Autonomous systems” aren’t, contact data is stale • Attacks are usually remediated, but it’s hard, slow, unreliable, not scaling • Literally the opposite of what the Net is supposed to be • Can we do better?
  10. 10. The Two Great Hopes • Attacker networks hit victim networks. • They’re not directly connected – many parties in the middle. • 1) Everyone monitors their networks • At least for traffic management and capacity planning • Generally use Netflow – provides source/dest metrics with light protocol analysis • 2) Not everyone on the Internet is a jerk • And even if they are, getting abuse calls is annoying, and the big floods are bad for business • Many would act, if the benefit was incremental and the risk was low
  11. 11. Netflow usually just goes to a network’s own operators, and mass aggregators. Maybe just a little should flow to the networks being affected.
  12. 12. If they already knew, why do we have to call them?
  13. 13. Overflowd: Stochastic Traffic Factoring Utility 1/1M packets cause anti-abuse metadata to be sent to source and dest, by Netflow infrastructure.
  14. 14. Demo • {'data': {'bcount': 682512, 'protocol': 6, 'tos': 0, 'etime': 1325314888, 'daddr': '', 'pcount': 17001… • Whitelisted flow metadata, so recipient can match • 'signature': {'key': 'd52b9644ba6ffd2bdaa6505e649fd80ca… 'signature': 'z5yMEHH0pYe++uOiNhWzLkCyXsT… • NaCl Signatures, unchained for now • “Oh, somebody’s spoofing? OK, what signature have I been seeing all year, on other networks” • 'metadata': {'info': 'FLOWSEEN', 'class': 'INFORMATIONAL', 'time': 1477778027.138109}} • Could also have MACHINE_SUSPICIOUS, HUMAN_SUSPICIOUS, HUMAN_CONFIRMED_PLEASE_CONTACT, etc • ‘contact’: {‘email’: ‘’}
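The message format in the demo can be sketched roughly as follows. This is a hypothetical reconstruction, not the real overflowd code: `SAMPLE_RATE`, `SECRET`, and the function names are invented, and it uses an HMAC-SHA256 stand-in where the actual design uses NaCl (Ed25519) signatures.

```python
import hashlib
import hmac
import json
import random
import time

SAMPLE_RATE = 1_000_000          # roughly 1 in 1M packets triggers a report
SECRET = b"demo-signing-key"     # stand-in; the real design uses NaCl signatures

def build_report(flow):
    """Build a signed FLOWSEEN report from whitelisted flow metadata."""
    msg = {
        "data": flow,            # whitelisted fields only -- never raw netflow
        "metadata": {"info": "FLOWSEEN", "class": "INFORMATIONAL",
                     "time": time.time()},
    }
    body = json.dumps(msg, sort_keys=True).encode()
    msg["signature"] = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return msg

def maybe_report(flow):
    """Stochastic sampling: report each flow with probability 1/SAMPLE_RATE."""
    if random.randrange(SAMPLE_RATE) == 0:
        return build_report(flow)
    return None
```

A recipient who sees the same signing key on reports all year, across networks, can use it to discount spoofed traffic.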
  15. 15. Still Deciding On Channels • 65535/udp • The end • Doesn’t require acknowledgement, does need fragmentation • ICMP • Would follow packets further along route, maybe • Might get dropped earlier too • HTTP/HTTPS • Many networks have an easier time picking up .well-known web paths • Can’t just be passively received • TODO
  16. 16. Explicit Plan • We have no idea how precisely this data would, or should, be consumed • We do know we don’t want to share much more data than a legitimate person should already know • Not sending raw netflow, not sending at high rates • May send faster on known badness – badness and packet count are not equal! • We think interesting and useful things would be built in the presence of overflowd
  17. 17. Cryptography: TLS is hard to deploy
  18. 18. Crypto is hard.
  19. 19. That’s just one service. Here’s more.
  20. 20. Has Anyone Ever Not Seen This?
  21. 21. Well, at least nobody’s judging you for a not entirely perfect TLS suite…
  22. 22. Those are secure configurations. Here’s the insecure one.
  23. 23. Reality • TLS required certificate authorities • Certificate authorities required bizdudes • Software vendors couldn’t automate bizdudes • Software vendors couldn’t automate TLS • Software vendors could and did automate listening on standard ports • Just not with security • The TLS mess chains back to the devops non-viability of automatically acquiring certificates
  24. 24. We Live In The (Near) Future • Let’s Encrypt • Free Certificate Authority • Allows Automatic Certificate Provisioning using open ACME protocol • Services can in fact autoprovision certificates now! • Caddy • HAProxy • Nginx
  25. 25. Why Not Both Everything?
  26. 26. JFE: Jump to Full Encryption
  27. 27. # ./jfe -D
  28. 28. # curl http://163.jfe.example hello world
  29. 29. # curl https://163.jfe.example hello world
  30. 30. # curl https://163.jfe.example:40080 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
  31. 31. What’s Going On Here (That you didn’t know existed) • iptables -t mangle -A PREROUTING -p tcp --dport 23:65535 ! -d -j TPROXY --tproxy-mark 0x1/0x1 --on-port 1 • Grab all traffic from port 23 through 65K, send it to port 1 • self.sock.setsockopt(socket.SOL_IP, 19, 1) # IP_TRANSPARENT • Allow listener on Port 1 to receive traffic from other IPs and Ports • sniff = client.recv(128, socket.MSG_PEEK).split("\n")[0] • Sniff the first 128 bytes on the socket, without actually “draining” from it • ctx.set_servername_callback(on_sni) • Do things (like get a new cert) during initial handshaking • cert=free_tls_certificates.certbotClient.issue_certificate(…) • Get cert from Let’s Encrypt (with a little help)
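The MSG_PEEK trick is easy to demo in isolation; a minimal sketch, using a socketpair in place of a real TPROXY-redirected client:

```python
import socket

# Peek at the first bytes a client sends -- enough to tell a TLS ClientHello
# from plaintext HTTP -- without draining them from the socket, so whatever
# handles the connection next still sees the complete stream.
a, b = socket.socketpair()
a.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")

peeked = b.recv(128, socket.MSG_PEEK)    # look, don't consume
first_line = peeked.split(b"\r\n")[0]

drained = b.recv(128)                    # the same bytes are still there
a.close()
b.close()
```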
  32. 32. JFE Just Works Full system TLS! Fully patched! Could support other protocols/wrappers!
  33. 33. Bugs! We got ‘em! • Trusts the client for the name to acquire • Zero configuration == Attacker configuration • Some efforts at validation but incomplete for now • Rate limits at Let’s Encrypt can be problematic • Low Performance • Threading model only thing that survives blocking network in free_tls_certs • Other languages have problems missing setsockopt or MSG_PEEK or or or… • Localhost • Connections appear to come from localhost (not great) • Connections are routed to localhost (actually bad, things that bind to are still exposed)
  34. 34. Fixing Localhost: The Plans • IPTables TPROXY is janky and clearly nobody else has fixed this either • Squid, HAProxy, various SSL MITM attack tools (lol) all get stuck here, try to just be an intercepting proxy to another host downwire • NFTables clearly the approach to take • New firewalling subsystem in Linux • Could gate packet redirection with IP Address Aliases (eth0:1) • Could gate packet redirection with cgroups (as per containers) • Full system is powerful, full container might be easier • More aligned with how software is generally being deployed nowadays
  35. 35. Also JFE TODO • Would need to find a way to query wrapped sockets for metadata • Should figure out how client socket wrapping might work • Must be mandatory • I have plans here • Could support/detect encrypted backends • Doesn’t matter if backend has janky crypto if it’s wrapped with something better • Could integrate with clouds • Open socket on client == provisioned socket on ELB w/ provisioned cert • Amazon does do all this, other clouds do too • DTLS? IPSec? Websockets? SSH? • Yes, DNSSEC/DANE plays into this. Of course it does. • Many useful things to help on.
  37. 37. Data Loss Prevention Attacks are hard to survive
  38. 38. Risk Management Is Not All Or Nothing • There’s $20 in the Gas Station Cash Register • Not all corporate payroll for the month of July • But we assume if they can get any of our data, they probably got all of our data • Why?
  39. 39. They probably got all of our data.
  40. 40. Our Designs Are Often “All or Nothing” Affairs • Classical JBOS (Just a Bunch Of Servers) design • Shared credentials • Complex services • Full mutual trust – root on one is root on all • Rate limits for a database would be useless in the event of a hack • If you can steal some data… • …you can disable the rate limits… • …and steal all the data. • This is why you’re supposed to salt and stretch stored password hashes • “After your data is lost, make it hard for an attacker to convert it back to passwords”
  41. 41. What is this “After” malarky?
  42. 42. Ratelock: Restricting Data Loss with Serverless Cloud Enforcement
  43. 43. The Clouds are not JBOS. They provide services with authenticated semantics. Somebody else’s problem
  44. 44. ./ add foo bar true (Password stored in DynamoDB, proxied through Lambda)
  45. 45. ./ check foo bar true ./ check foo wrong false Both checks against DynamoDB, proxied. Lambda “invoke” right against function “ratelock” only thing required.
  46. 46. # while [ 1 ]; do ./ check foo bar; sleep 0.25; done true ... true ... true ... true ... false ... false ... false The proxy starts providing false errors. The caller doesn’t have the ability to directly bypass the proxy. The complex server can get completely compromised. The simple policy survives.
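The policy being demonstrated can be sketched as a toy in-process version. `RateLock`, `max_per_sec`, and the backing dict are all illustrative; the real check runs inside Lambda against DynamoDB, where a compromised caller cannot reach it.

```python
import time

class RateLock:
    """Toy password-check proxy with a hard rate cap the caller can't bypass."""
    def __init__(self, db, max_per_sec=7):
        self.db = db                  # stand-in for DynamoDB
        self.max_per_sec = max_per_sec
        self.window = []              # timestamps of recent checks

    def check(self, user, pw):
        now = time.monotonic()
        self.window = [t for t in self.window if now - t < 1.0]
        if len(self.window) >= self.max_per_sec:
            return False              # over budget: fail closed, reveal nothing
        self.window.append(now)
        return self.db.get(user) == pw
```

Even if the complex server in front of this is fully compromised, the attacker can only drive the check endpoint at the capped rate; the simple policy survives.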
  47. 47. “What if you can’t trust Lambda?”
  48. 48. Here’s a string Amazon will verify, but never leak, even to you. USEFUL!
  49. 49. $ ./ add demouser 1234567 $ cat authdb.json {"demouser":"BvL40myloWAo39hbIp RpKOy4Skdtswcaa7WJUzWf"} We actually create an IAM user “demouser” under a special path. We just create the user, we don’t grant privileges. But we do get a secret key…which that isn’t.
  50. 50. add_user aes = (CTR, sha256(userpw)) raw = b64decode(aws_secret) enc = aes.encrypt(raw) saved_pw = b64encode(enc) The secret key is first base64 decoded, and then encrypted with the user’s password. We save that. Why decode?
  51. 51. check_user enc = b64decode(saved_pw) aes = (CTR, sha256(userpw)) raw = aes.decrypt(enc) aws_secret = b64encode(raw) To invert the process, we decrypt the saved value with what is supposed to be the user’s password, and base64 encode.
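The add_user/check_user flow can be sketched end to end. This is a hypothetical stand-in that uses a SHA-256 counter-mode keystream in place of AES-CTR (same property the slides need: encryption with no integrity, so a wrong password still decrypts to plausible-looking bytes); the fixed default nonce is for demonstration only.

```python
import base64
import hashlib

def _keystream(key, nonce, length):
    """Counter-mode keystream from SHA-256 (stand-in for AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def add_user(aws_secret, userpw, nonce=b"\x00" * 8):
    """Encrypt the (base64) AWS secret key under the user's password."""
    key = hashlib.sha256(userpw.encode()).digest()
    raw = base64.b64decode(aws_secret)   # decode first: no "is it base64?" tell
    enc = bytes(x ^ y for x, y in zip(raw, _keystream(key, nonce, len(raw))))
    return base64.b64encode(nonce + enc).decode()

def check_user(saved_pw, userpw):
    """Invert add_user; a wrong password yields a different, valid-looking key."""
    blob = base64.b64decode(saved_pw)
    nonce, enc = blob[:8], blob[8:]
    key = hashlib.sha256(userpw.encode()).digest()
    raw = bytes(x ^ y for x, y in zip(enc, _keystream(key, nonce, len(enc))))
    return base64.b64encode(raw).decode()
```

The only way to tell a right guess from a wrong one is to present the candidate key to IAM, online.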
  52. 52. aws_secret can’t be checked offline. They have to ask IAM. Online. Good luck doing that 100M times.
  53. 53. If there’s one thing Amazon is going to keep online, it’s IAM.
  54. 54. If we didn’t b64decode the Secret Key, there’d be a simple offline attack – post-decrypt, is it Base64? This is why we aren’t using PyNaCl – we need encryption without integrity, for maybe the first time ever!
  55. 55. Some Notes • One of the largest e-commerce sites in the world provided required rates for their password server • 7/sec • Yahoo 500M / 7 per sec = 2.26 years • Who are we building instadump for, anyway? • Backups can go to an asymmetric key – encrypt online, decrypt offline • Not just for passwords, this can rate limit any sort of data loss • Working on this • Not just for rate limits, can apply any policy • Notification, delay, extra approvals • What else can we factor out to the cloud functions? • OpenSSL Engine?
  56. 56. Many server breaches. No known Lambda breaches. No known IAM breaches. Nice table, is it…actuarial?
  57. 57. #NotJustAmazon Somebody at Google App Engine is one of us.
  58. 58. But what if we can’t trust the cloud? (There have been breaches, there are many clouds, even at single providers…)
  59. 59. Code Safety Not getting owned is hard.
  60. 60. “If only users would stop running dangerous code.”
  61. 61. This PDF must be read. By somebody. That is their job.
  62. 62. Stop Victim Shaming. It’s not helping.
  63. 63. “Why isn’t everything run in a sandbox? Or at least AV?”
  64. 64. Have you ever tried to find documentation on sandboxing? Chrome Source Code doesn’t count.
  65. 65. What about Containers? What about Docker?
  66. 66. docker run -it --privileged -p80:80 dakami/guachrome
  67. 67. GREAT FOR DEVELOPERS Security? Is it easy?
  68. 68. There’s just a lot that containers need to secure: That Chrome instance needs 98 syscalls from the host. • accept access arch_prctl bind brk capset chdir chmod clone close connect creat dup epoll_create epoll_ctl epoll_wait execve exit exit_group fchmod fchown fcntl fdatasync fstat ftruncate futex getcwd getdents getegid geteuid getgid getpeername getpid getpriority getrlimit getsockname getsockopt gettid getuid ioctl kill listen lseek lstat madvise mkdir mmap mount mprotect mremap munmap nanosleep newfstatat open openat pipe poll ppoll prctl pread pwrite read readlink recvfrom recvmsg rename rt_sigaction rt_sigprocmask sched_getaffinity sched_setscheduler sched_yield select sendmsg sendto setfsgid setfsuid setitimer setpriority setrlimit set_robust_list setsockopt shmat shmctl shmget shutdown signaldeliver sigreturn socket socketpair stat statfs times umask uname unlink wait4 write writev
  69. 69. 1) Why it’s 122 pages 2) How it’s not easy (for anyone)
  70. 70. Same code, hosted slightly differently…
  71. 71. All of Chrome, Docker, Linux, Java… 13 syscalls. • futex ioctl ppoll read recvfrom recvmsg sendto write rt_sigaction rt_sigreturn readv writev close • (Yes, shared memory maps and open files are minimal as well.) • It is much easier to secure 13 syscalls than 98. In fact…
  72. 72. Actually, it looks like this. (Plus a bit of goop to further lockdown ioctl.) It could probably be smaller.
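What “it looks like this” might look like concretely: a hypothetical Docker-style seccomp profile generated from the 13-syscall allowlist on the previous slide. The exact AutoClave profile is wider and further locks down ioctl; this is only an illustration of the shape.

```python
import json

# The 13 syscalls a running (post-boot) VM needed, per the previous slide.
ALLOWED = ["close", "futex", "ioctl", "ppoll", "read", "readv", "recvfrom",
           "recvmsg", "rt_sigaction", "rt_sigreturn", "sendto", "write",
           "writev"]

# Deny-by-default: any syscall off the list kills the process.
profile = {
    "defaultAction": "SCMP_ACT_KILL",
    "syscalls": [{"names": ALLOWED, "action": "SCMP_ACT_ALLOW"}],
}
print(json.dumps(profile, indent=2))
```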
  73. 73. AutoClave: Syscall Firewalls for VM Isolation WARNING: Lots of stuff hasn’t been pushed to master. I prioritized the code other people helped with, and I’d do it again.
  74. 74. Live Demo? Sure, go to
  75. 75. You’ll see:
  76. 76. Linux and Windows running fine under extreme syscall firewalls. Fully ephemeral, fully repeatable. (Slightly wider ruleset than just described)
  77. 77. If you’d like to try to break out, here’s hypervisor root (Ctrl-F2)
  78. 78. Who wants to have a PDF parsing party! (They’re even more fun than crypto parties)
  79. 79. What’s going on? • VMs have always required less of the host than containers • Easier to secure kernel-to-kernel than userspace-to-kernel • VMs require many more syscalls to start up, than to continue running • Syscall firewall is thus delayed as long as possible – until VNC/network/explicit post-boot activation • Probably the one significant security contribution here • VMs can be restored from memory, I mean, they actually can • Linux does not really allow process freeze/restore • CRIU tries. Oh, does it try.
  80. 80. Bypass-shared-memory • Patch from crew • I was trying to do this myself, but they actually manage a qemu fork • When restoring from memory, the big part is system memory. It’s read() in during restore, not fast • Better method: Generate memory image incrementally with mmap/MAP_SHARED, execute new restorations with mmap/MAP_PRIVATE • Means 100 instances share the “template state” via Copy on Write • It’s fine, we block madvise • (Well, now we do) • Restores move from 5s to <250ms
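The MAP_SHARED/MAP_PRIVATE idea is easy to demonstrate with a tiny stand-in for the guest RAM image (Linux/Unix only; the file contents and sizes are illustrative):

```python
import mmap
import os
import tempfile

# A "template" image on disk stands in for the snapshotted guest RAM.
fd, path = tempfile.mkstemp()
os.write(fd, b"template" + b"\x00" * 4088)        # one 4 KiB page

# MAP_PRIVATE: writes trigger copy-on-write, so 100 instances can share the
# template's pages and the kernel copies only what each instance dirties.
view = mmap.mmap(fd, 4096, flags=mmap.MAP_PRIVATE)
view[:8] = b"instance"                            # private to this mapping

instance_bytes = view[:8]
os.lseek(fd, 0, os.SEEK_SET)
template_bytes = os.read(fd, 8)                   # the template is untouched

view.close()
os.close(fd)
os.unlink(path)
```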
  81. 81. Bugs • Need to actually lock user input until system is sufficiently booted • Fails closed, but still fails • Need to integrate lots of usability tweaks • Need to support slightly different syscall firewalls depending on enabled features • Need to integrate with containers • Both want to use virtfs, which requires all the syscalls, both could use virtfs-proxy-helper, not clear fs calls are entirely proxied • Perf, perf, perf – VMs bleed for every bit of it • Need a solution that doesn’t require bare metal. • This is an actual good reason a) for nested virt and b) for making nested virt performant (it’s not) • Add more VMs, figure out how to host this at scale!
  82. 82. Maybe we don’t need unikernels to give every incoming connection a completely fresh/ephemeral VM. We like to cheat.
  83. 83. Security gets a syscall firewall. Performance gets instant boot. Developers get free rein as root. This is not a zero-sum game! Developer Ergonomics is the best phrase.
  84. 84. Let’s Make Security Easy • Finding an abuse contact was hard. Now you just look for the tracers amongst the noise. Easy. • TLS was hard. Now you run a daemon, and it’s just there. Easy. • Surviving a breach was hard. Now you design your systems to lose an amount you can live with. Easy. • Running dangerous code was…ok, it was always easy. But now not getting infected by that code is also easy.
  85. 85. #MakeSecurityEasy Not just a hashtag. We can do this. • HALP • I can’t write it all! • Another hackathon in the very near future is likely, talk to me about interest