2. Hi!
• I’m Dan Kaminsky
• Been fixing things for almost two decades
• Broke a big thing
• People only remember that
3. Mission of this talk
• You may think things are impossible.
• You may think some of these specific things are impossible.
• I want to challenge your assumptions.
• DEATH TO NIHILISM
• With the healing power of surprising data
• Also I’ve given quite a few high level keynotes as of late and I’d like to
actually discuss the nerdery that’s consumed me this year.
• LET’S DANCE
4. Security Is Hard
• Denial of Service Attacks
DDoS is hard to remediate
• Cryptography
TLS is hard to deploy
• Data Loss Prevention
Attacks are hard to survive
• Code Safety
Not getting owned is hard
5. Make Security Easy: What we’re doing about
it
• Denial of Service Attacks
DDoS is hard to remediate
Overflowd: Let the victims of network flows learn from Netflow
• Cryptography
TLS is hard to deploy
JFE: Launch one Daemon, all networking is TLS secured w/ valid cert
• Data Loss Prevention
Attacks are hard to survive
Ratelock: Make the cloud enforce security policies, including hard rate limits
• Code Safety
Not getting owned is hard
Autoclave: Run entire operating systems in tighter sandboxes than Chrome
6. We can do better
• We did do better, at the first O’Reilly Security Hackathon
• Led by White Ops Labs (me)
• Hosted at Code for America (awesome)
• Thanks!
• Overflowd: +Cosmo Mielke, Jeff Ward
• JFE: +David Strauss of Pantheon
• Ratelock: +Andy McMurry of getmedal.com, Mark Shlimovich
• Moar!
• Stay tuned.
8. Someday, systems will not get hacked
• That day is not today.
• Mirai vs. Dyn == Parts of the Internet actually went down
• No defense survives 10M nodes flooding you
• When things go wrong, what can we do?
• Step 1: Communicate
• Step 0: Figure out who we’re supposed to communicate with
9. The Nocmonkey Curse
(Besides being called monkeys)
• 1) Spoofed Traffic
• Attackers lie about where they are on the network
• This will always be possible
• 2) Asymmetrically Routed Traffic
• Traceroute just shows how to reach your attacker
• It doesn’t show how their traffic is reaching you
• These are the problematic packets!
• 3) Bad Contact Data
• IP address ranges are large, “Autonomous systems” aren’t, contact data is stale
• Attacks are usually remediated, but it’s hard, slow, unreliable, not scaling
• Literally the opposite of what the Net is supposed to be
• Can we do better?
10. The Two Great Hopes
• Attacker networks hit victim networks.
• They’re not directly connected – many parties in the middle.
• 1) Everyone monitors their networks
• At least for traffic management and capacity planning
• Generally use Netflow – provides source/dest metrics with light protocol
analysis
• 2) Not everyone on the Internet is a jerk
• And even if they are, getting abuse calls is annoying, and the big floods are
bad for business
• Many would act, if the benefit was incremental and the risk was low
11. Netflow usually just goes to a
network’s own operators, and mass
aggregators.
Maybe just a little should flow to the
networks being affected.
13. Overflowd:
Stochastic Traffic Factoring Utility
1/1M packets cause anti-abuse
metadata to be sent to source and
dest, by Netflow infrastructure.
https://github.com/dakami/overflowd
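A minimal sketch of the 1-in-1M sampling idea, not the overflowd code itself. The function names are hypothetical, and treating every packet as an independent coin flip is an assumption.

```python
import random

SAMPLE_RATE = 1 / 1_000_000  # "1/1M packets" from the slide


def should_emit(rng: random.Random, rate: float = SAMPLE_RATE) -> bool:
    """Flip a biased coin for one observed packet (hypothetical helper)."""
    return rng.random() < rate


def expected_tracers(pcount: int, rate: float = SAMPLE_RATE) -> float:
    """Average number of tracer messages a flow of `pcount` packets triggers."""
    return pcount * rate
```

Under this model a flood of 10M packets averages about 10 tracers, while the 17001-packet flow from the demo slide averages far less than one. Big floods announce themselves; ordinary traffic mostly stays quiet.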
14. Demo
• {'data': {'bcount': 682512, 'protocol': 6, 'tos': 0, 'etime': 1325314888,
'daddr': '122.166.77.74', 'pcount': 17001…
• Whitelisted flow metadata, so recipient can match
• 'signature': {'key': 'd52b9644ba6ffd2bdaa6505e649fd80ca…
'signature': 'z5yMEHH0pYe++uOiNhWzLkCyXsT…
• NaCl Signatures, unchained for now
• “Oh, somebody’s spoofing? OK, what signature have I been seeing all year, on other
networks”
• 'metadata': {'info': 'FLOWSEEN', 'class': 'INFORMATIONAL', 'time':
1477778027.138109}}
• Could also have MACHINE_SUSPICIOUS, HUMAN_SUSPICIOUS,
HUMAN_CONFIRMED_PLEASE_CONTACT, etc
• 'contact': {'email': 'dan@whiteops.com'}
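A rough sketch of how a tracer like the one above could be assembled. The whitelist is an assumption built from the field names visible in the demo output, build_tracer is a hypothetical name, and the NaCl signature step is left out.

```python
import time

# Assumed whitelist: just the field names visible in the demo output.
WHITELIST = ("bcount", "protocol", "tos", "etime", "daddr", "pcount")


def build_tracer(flow, contact_email, info="FLOWSEEN",
                 klass="INFORMATIONAL", now=None):
    """Keep only whitelisted flow fields and attach metadata and contact.

    Hypothetical helper. A real message would also carry a signature
    over this payload; that step is omitted here.
    """
    return {
        "data": {k: flow[k] for k in WHITELIST if k in flow},
        "metadata": {
            "info": info,
            "class": klass,
            "time": time.time() if now is None else now,
        },
        "contact": {"email": contact_email},
    }
```

Anything not on the whitelist never leaves the network, which matches the goal of not sharing more than the recipient should already know.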
15. Still Deciding On Channels
• 65535/udp
• The end
• Doesn’t require acknowledgement, does need fragmentation
• ICMP
• Would follow packets further along route, maybe
• Might get dropped earlier too
• HTTP/HTTPS
• Many networks have an easier time picking up .well-known web paths
• Can’t just be passively received
• TODO
16. Explicit Plan
• We have no idea how precisely this data would be, or should be,
consumed
• We do know we don’t want to share much more data than a legitimate
person should already know
• Not sending raw netflow, not sending at high rates
• May send faster on known badness – badness and packet count are not
equal!
• We think interesting and useful things would be built in the presence
of overflowd
23. Reality
• TLS required certificate authorities
• Certificate authorities required bizdudes
• Software vendors couldn’t automate bizdudes
• Software vendors couldn’t automate TLS
• Software vendors could and did automate listening on standard ports
• Just not with security
• The TLS mess chains back to the devops non-viability of
automatically acquiring certificates
24. We Live In The (Near) Future
• Let’s Encrypt
• Free Certificate Authority
• Allows Automatic Certificate
Provisioning using open ACME
protocol
• Services can in fact
autoprovision certificates now!
• Caddy
• HAProxy
• Nginx
31. What’s Going On Here
(That you didn’t know existed)
• iptables -t mangle -A PREROUTING -p tcp --dport 23:65535 ! -d 127.0.0.1 -j
TPROXY --tproxy-mark 0x1/0x1 --on-port 1
• Grab all traffic from port 23 through 65K, send it to port 1
• self.sock.setsockopt(socket.SOL_IP, 19, 1) # IP_TRANSPARENT
• Allow listener on Port 1 to receive traffic from other IPs and Ports
• sniff = client.recv(128, socket.MSG_PEEK).split("\n")[0]
• Sniff the first 128 bytes on the socket, without actually “draining” from it
• ctx.set_servername_callback(on_sni)
• Do things (like get a new cert) during initial handshaking
• cert=free_tls_certificates.certbotClient.issue_certificate(…)
• Get cert from Let’s Encrypt (with a little help)
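A small self-contained demo of the MSG_PEEK step only. It uses a local socketpair instead of a TPROXY listener, and sniff_first_line and demo are hypothetical names, not JFE’s.

```python
import socket


def sniff_first_line(conn: socket.socket, limit: int = 128) -> bytes:
    """Peek at up to `limit` bytes; the data stays queued on the socket."""
    peeked = conn.recv(limit, socket.MSG_PEEK)
    return peeked.split(b"\n")[0]


def demo():
    """Send a request over a local socketpair, peek at it, then read it."""
    client, server = socket.socketpair()
    try:
        client.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
        first = sniff_first_line(server)
        rest = server.recv(4096)  # the peeked bytes are still delivered here
        return first, rest
    finally:
        client.close()
        server.close()
```

Because the peek doesn’t drain anything, the wrapper can look at the first bytes, decide how to treat the connection, and still hand the whole stream on.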
32. JFE Just Works
Full system TLS! Fully patched!
Could support other protocols/wrappers!
33. Bugs! We got ‘em!
• Trusts the client for the name to acquire
• Zero configuration == Attacker configuration
• Some efforts at validation but incomplete for now
• Rate limits at Let’s Encrypt can be problematic
• Low Performance
• Threading model only thing that survives blocking network in free_tls_certs
• Other languages have problems missing setsockopt or MSG_PEEK or or or…
• Localhost
• Connections appear to come from localhost (not great)
• Connections are routed to localhost (actually bad, things that bind to
127.0.0.1 are still exposed)
34. Fixing Localhost: The Plans
• IPTables TPROXY is janky and clearly nobody else has fixed this either
• Squid, HAProxy, various SSL MITM attack tools (lol) all get stuck here, try to
just be an intercepting proxy to another host downwire
• NFTables clearly the approach to take
• New firewalling subsystem in Linux
• Could gate packet redirection with IP Address Aliases (eth0:1)
• Could gate packet redirection with cgroups (as per containers)
• Full system is powerful, full container might be easier
• More aligned with how software is generally being deployed nowadays
35. Also JFE TODO
• Would need to find a way to query wrapped sockets for metadata
• Should figure out how client socket wrapping might work
• Must be mandatory
• I have plans here
• Could support/detect encrypted backends
• Doesn’t matter if backend has janky crypto if it’s wrapped with something better
• Could integrate with clouds
• Open socket on client == provisioned socket on ELB w/ provisioned cert
• Amazon does do all this, other clouds do too
• DTLS? IPSec? Websockets? SSH?
• Yes, DNSSEC/DANE plays into this. Of course it does.
• Many useful things to help on.
38. Risk Management Is Not All Or Nothing
• There’s $20 in the Gas Station
Cash Register
• Not all corporate payroll for the
month of July
• But we assume if they can get
any of our data, they probably
got all of our data
• Why?
40. Our Designs Are Often “All or Nothing” Affairs
• Classical JBOS (Just a Bunch Of Servers) design
• Shared credentials
• Complex services
• Full mutual trust – root on one is root on all
• Rate limits for a database would be useless in the event of a hack
• If you can steal some data…
• …you can disable the rate limits…
• …and steal all the data.
• This is why you’re supposed to salt and stretch stored password hashes
• “After your data is lost, make it hard for an attacker to convert it back to passwords”
43. The Clouds are not JBOS.
They provide services with
authenticated semantics.
Somebody else’s problem (label repeated across the diagram)
44. ./ratelock.py add foo bar
true
(Password stored in DynamoDB, proxied through Lambda)
45. ./ratelock.py check foo bar
true
./ratelock.py check foo wrong
false
Both checks against DynamoDB, proxied.
Lambda “invoke” right against function “ratelock” only thing required.
46. # while [ 1 ];
do ./ratelock.py check foo bar;
sleep 0.25; done
true ... true ... true ... true ... false ...
false ... false
The proxy starts returning false, even for the correct password. The caller can’t bypass
the proxy directly.
The complex server can get completely compromised. The simple policy survives.
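An in-memory toy of the behaviour shown above, not the ratelock code. The real checks run in Lambda against DynamoDB; the class name, the fixed-window policy and the limit of 4 checks are all assumptions.

```python
import hmac


class RateLockedChecker:
    """Toy stand-in for the proxy: a password check behind a hard rate limit."""

    def __init__(self, max_checks: int, window_seconds: float):
        self.max_checks = max_checks
        self.window_seconds = window_seconds
        self._secrets = {}
        self._window_start = None
        self._count = 0

    def add(self, user: str, password: str) -> bool:
        # Toy storage only; the slides keep the password in DynamoDB.
        self._secrets[user] = password
        return True

    def check(self, user: str, password: str, now: float) -> bool:
        # Start a fresh window once the old one has expired.
        if self._window_start is None or now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._count = 0
        self._count += 1
        if self._count > self.max_checks:
            return False  # over budget: refuse even the correct password
        stored = self._secrets.get(user)
        return stored is not None and hmac.compare_digest(
            stored.encode(), password.encode())
```

The point is where this runs. If the caller can only invoke it, owning the caller doesn’t let you raise the limit.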
48. Here’s a string Amazon will verify, but never
leak, even to you. USEFUL!
49. $ ./walliam.py add
demouser 1234567
$ cat authdb.json
{"demouser":"BvL40myloWAo39hbIp
RpKOy4Skdtswcaa7WJUzWf"}
We actually create an IAM user “demouser” under a special path. We just create
the user, we don’t grant privileges. But we do get a secret key…which the stored string isn’t.
50. add_user
aes = (CTR, sha256(userpw))
raw = b64decode(aws_secret)
enc = aes.encrypt(raw)
saved_pw = b64encode(enc)
The secret key is first base64 decoded, and then encrypted with the user’s
password. We save that. Why decode?
51. check_user
enc = b64decode(saved_pw)
aes = (CTR, sha256(userpw))
raw = aes.decrypt(enc)
aws_secret = b64encode(raw)
To invert the process, we decrypt the saved value with what is supposed to be the
user’s password, and base64 encode.
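A runnable sketch of add_user and check_user from the two slides above. Big assumption: the standard library has no AES, so keystream_xor (a SHA-256 counter keystream) stands in for AES-CTR. It is for illustration only and is not the walliam code.

```python
import base64
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in for AES-CTR: XOR data with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def add_user(userpw: str, aws_secret: str) -> str:
    """Decode the secret key, encrypt it under the password, re-encode."""
    key = hashlib.sha256(userpw.encode()).digest()
    raw = base64.b64decode(aws_secret)
    return base64.b64encode(keystream_xor(key, raw)).decode()


def check_user(userpw: str, saved_pw: str) -> str:
    """Return a candidate secret key; only IAM can say whether it is right."""
    key = hashlib.sha256(userpw.encode()).digest()
    enc = base64.b64decode(saved_pw)
    return base64.b64encode(keystream_xor(key, enc)).decode()
```

Note that check_user never returns true or false. Every password yields a well-formed candidate key, and the only way to test it is to ask IAM.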
52. aws_secret can’t be checked offline.
They have to ask IAM. Online.
Good luck doing that 100M times.
53. If there’s one thing Amazon is
going to keep online, it’s IAM.
54. If we didn’t b64decode the Secret
Key, there’d be a simple offline attack
– post-decrypt, is it Base64?
This is why we aren’t using PyNaCl – we need encryption without integrity, for
maybe the first time ever!
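A quick illustration of that offline check. Assumption: a wrong-password decrypt is modelled as random bytes from os.urandom, and the helper names are hypothetical.

```python
import base64
import os
import string

B64_ALPHABET = frozenset((string.ascii_letters + string.digits + "+/=").encode())


def looks_like_base64(candidate: bytes) -> bool:
    """The attacker's offline test: is every byte in the Base64 alphabet?"""
    return all(byte in B64_ALPHABET for byte in candidate)


def count_survivors(trials: int = 1000):
    """Count wrong guesses that pass the offline test under each design."""
    # Design A: encrypt the Base64 text itself. A wrong key gives 40 random bytes.
    text_design = sum(looks_like_base64(os.urandom(40)) for _ in range(trials))
    # Design B: encrypt the decoded bytes. A wrong key gives 30 random bytes,
    # which always re-encode to valid Base64.
    raw_design = sum(looks_like_base64(base64.b64encode(os.urandom(30)))
                     for _ in range(trials))
    return text_design, raw_design
```

With the text design almost every wrong guess can be thrown away offline. With the decoded design every guess survives, so each one costs an online call.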
55. Some Notes
• One of the largest e-commerce sites in the world provided required rates
for their password server
• 7/sec
• Yahoo 500M / 7 per sec = 2.26 years
• Who are we building instadump for, anyway?
• Backups can go to an asymmetric key – encrypt online, decrypt offline
• Not just for passwords, this can rate limit any sort of data loss
• Working on this
• Not just for rate limits, can apply any policy
• Notification, delay, extra approvals
• What else can we factor out to the cloud functions?
• OpenSSL Engine?
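The 2.26 years figure above, worked out. The function name is hypothetical, and a 365.25-day year is an assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed year length


def years_to_check_all(accounts: int, checks_per_second: float) -> float:
    """Time to try one guess per account at a hard rate limit, in years."""
    return accounts / checks_per_second / SECONDS_PER_YEAR


years = years_to_check_all(500_000_000, 7)  # 500M accounts at 7 checks/sec
```

That is one guess per account. An attacker who needs many guesses per account multiplies the time accordingly.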
56. Many server breaches.
No known Lambda breaches.
No known IAM breaches.
Nice table, is it…actuarial?
71. All of Chrome, Docker, Linux, Java…
13 syscalls.
• futex ioctl ppoll read recvfrom recvmsg sendto write rt_sigaction
rt_sigreturn readv writev close
• (Yes, shared memory maps and open files are minimal as well.)
• It is much easier to secure 13 syscalls than 98. In fact…
72. Actually, it looks like this.
(Plus a bit of goop to further lockdown ioctl.)
It could probably be smaller.
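The filter shown on this slide isn’t in the transcript. As a rough model only, here is the 13-syscall allowlist from the previous slide as a lookup. The KILL default and the verdict name are assumptions; the real thing is a kernel-level syscall filter, not Python.

```python
# The 13 syscalls named on the previous slide.
ALLOWED_SYSCALLS = frozenset(
    "futex ioctl ppoll read recvfrom recvmsg sendto write "
    "rt_sigaction rt_sigreturn readv writev close".split()
)


def verdict(syscall_name: str) -> str:
    """Model of an allowlist firewall: permit the listed calls, refuse the rest.

    "KILL" as the default action is an assumption for illustration.
    """
    return "ALLOW" if syscall_name in ALLOWED_SYSCALLS else "KILL"
```

This model leaves out the extra ioctl lockdown mentioned above, which would need argument checks on top of the name check.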
73. AutoClave:
Syscall Firewalls for VM Isolation
https://github.com/dakami/autoclave
WARNING: Lots of stuff hasn’t been pushed to master. I prioritized the code other people
helped with, and I’d do it again.
76. Linux and Windows running fine under extreme
syscall firewalls. Fully ephemeral, fully repeatable.
(Slightly wider ruleset than just described)
77. If you’d like to try to break out, here’s
hypervisor root (Ctrl-F2)
78. Who wants to have a PDF parsing
party!
(They’re even more fun than crypto
parties)
79. What’s going on?
• VMs have always required less of the host than containers
• Easier to secure kernel-to-kernel than userspace-to-kernel
• VMs require many more syscalls to start up, than to continue
running
• Syscall firewall is thus delayed as long as possible – until VNC/network/explicit
post-boot activation
• Probably the one significant security contribution here
• VMs can be restored from memory, I mean, they actually can
• Linux does not really allow process freeze/restore
• CRIU tries. Oh, does it try.
80. Bypass-shared-memory
• Patch from hyper.sh crew
• I was trying to do this myself, but they actually manage a qemu fork
• When restoring from memory, the big part is system memory. It’s read() in
during restore, not fast
• Better method: Generate memory image incrementally with
mmap/MAP_SHARED, execute new restorations with mmap/MAP_PRIVATE
• Means 100 instances share the “template state” via Copy on Write
• It’s fine, we block madvise
• (Well, now we do)
• Restores move from 5s to <250ms
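A tiny Unix-only demo of the copy-on-write half of this trick, using a 4 KB file in place of a VM memory image. It is not the qemu patch; restore_private and demo are hypothetical names, and the MAP_SHARED template-generation step is skipped.

```python
import mmap
import os
import tempfile


def restore_private(path: str, length: int) -> mmap.mmap:
    """Map the template copy-on-write: writes stay private to this mapping."""
    with open(path, "r+b") as f:
        return mmap.mmap(f.fileno(), length, flags=mmap.MAP_PRIVATE,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)


def demo():
    """Two 'guests' share one template; one writes, nothing else changes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"TEMPLATE" * 512)  # 4096-byte stand-in for a memory image
        path = f.name
    try:
        guest_a = restore_private(path, 4096)
        guest_b = restore_private(path, 4096)
        guest_a[0:8] = b"GUEST-A!"  # touches only guest A's private copy
        seen = (bytes(guest_a[0:8]), bytes(guest_b[0:8]))
        with open(path, "rb") as f:
            on_disk = f.read(8)
        guest_a.close()
        guest_b.close()
        return seen + (on_disk,)
    finally:
        os.unlink(path)
```

Nothing is read() in up front. Pages are shared until a guest writes to them, which is why 100 instances can sit on one template.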
81. Bugs
• Need to actually lock user input until system is sufficiently booted
• Fails closed, but still fails
• Need to integrate lots of usability tweaks
• Need to support slightly different syscall firewalls depending on enabled
features
• Need to integrate with hyper.sh/clear containers
• Both want to use virtfs, which requires all the syscalls; both could use
virtfs-proxy-helper, but it’s not clear fs calls are entirely proxied
• Perf, perf, perf – VMs bleed for every bit of it
• Need a solution that doesn’t require bare metal.
• This is an actual good reason a) for nested virt and b) for making nested virt
performant (it’s not)
• Add more VMs, figure out how to host this at scale!
82. Maybe we don’t need unikernels
to give every incoming connection
a completely fresh/ephemeral VM
We like to cheat
We like we like to cheat
83. Security gets a syscall firewall.
Performance gets instant boot.
Developers get free rein as root.
This is not a zero sum game!
Developer Ergonomics is the best phrase.
84. Let’s Make Security Easy
• Finding an abuse contact was hard. Now you just look for the tracers
amongst the noise. Easy.
• TLS was hard. Now you run a daemon, and it’s just there. Easy.
• Surviving a breach was hard. Now you design your systems to lose an
amount you can live with. Easy.
• Running dangerous code was…ok, it was always easy. But now not
getting infected by that code is also easy.
85. #MakeSecurityEasy
Not just a hashtag. We can do this.
• HALP
• I can’t write it all!
• https://github.com/dakami
• https://labs.whiteops.com
• Another hackathon in the very near future is likely, talk to me about interest
• dan@whiteops.com