
A Technical Dive into Defensive Trickery




As delivered at the O'Reilly Security Conference 2016 in New York City.



  1. 1. A Technical Dive Into Defensive Trickery Dan Kaminsky Chief Scientist White Ops
  2. 2. Hi! • I’m Dan Kaminsky • Been fixing things for almost two decades • Broke a big thing • People only remember that
  3. 3. Mission of this talk • You may think things are impossible. • You may think some of these specific things are impossible. • I want to challenge your assumptions. • DEATH TO NIHILISM • With the healing power of surprising data  • Also I’ve given quite a few high level keynotes as of late and I’d like to actually discuss the nerdery that’s consumed me this year. • LET’S DANCE
  4. 4. Security Is Hard • Denial of Service Attacks DDoS is hard to remediate • Cryptography TLS is hard to deploy • Data Loss Prevention Attacks are hard to survive • Code Safety Not getting owned is hard
  5. 5. Make Security Easy: What we’re doing about it • Denial of Service Attacks DDoS is hard to remediate Overflowd: Let the victims of network flows learn from Netflow • Cryptography TLS is hard to deploy JFE: Launch one Daemon, all networking is TLS secured w/ valid cert • Data Loss Prevention Attacks are hard to survive Ratelock: Make the cloud enforce security policies, including hard rate limits • Code Safety Not getting owned is hard Autoclave: Run entire operating systems in tighter sandboxes than Chrome
  6. 6. We can do better • We did do better, at the first O’Reilly Security Hackathon • Led by White Ops Labs (me) • Hosted at Code for America (awesome) • Thanks! • Overflowd: +Cosmo Mielke, Jeff Ward • JFE: +David Strauss of Pantheon • Ratelock: +Andy McMurry, Mark Shlimovich • Moar! • Stay tuned.
  7. 7. Denial of Service Attacks DDoS is hard to remediate
  8. 8. Someday, systems will not get hacked • That day is not today. • Mirai vs. Dyn == Parts of the Internet actually went down • No defense survives 10M nodes flooding you • When things go wrong, what can we do? • Step 1: Communicate • Step 0: Figure out who we’re supposed to communicate with
  9. 9. The Nocmonkey Curse (Besides being called monkeys) • 1) Spoofed Traffic • Attackers lie about where they are on the network • This will always be possible • 2) Asymmetrically Routed Traffic • Traceroute just shows how to reach your attacker • It doesn’t show how their traffic is reaching you • These are the problematic packets! • 3) Bad Contact Data • IP address ranges are large, “Autonomous systems” aren’t, contact data is stale • Attacks are usually remediated, but it’s hard, slow, unreliable, not scaling • Literally the opposite of what the Net is supposed to be • Can we do better?
  10. 10. The Two Great Hopes • Attacker networks hit victim networks. • They’re not directly connected – many parties in the middle. • 1) Everyone monitors their networks • At least for traffic management and capacity planning • Generally use Netflow – provides source/dest metrics with light protocol analysis • 2) Not everyone on the Internet is a jerk • And even if they are, getting abuse calls is annoying, and the big floods are bad for business • Many would act, if the benefit was incremental and the risk was low
  11. 11. Netflow usually just goes to a network’s own operators, and mass aggregators. Maybe just a little should flow to the networks being affected.
  12. 12. If they already knew, why do we have to call them?
  13. 13. Overflowd: Stochastic Traffic Factoring Utility 1/1M packets cause anti-abuse metadata to be sent to source and dest, by Netflow infrastructure.
  14. 14. Demo • {'data': {'bcount': 682512, 'protocol': 6, 'tos': 0, 'etime': 1325314888, 'daddr': '', 'pcount': 17001… • Whitelisted flow metadata, so recipient can match • 'signature': {'key': 'd52b9644ba6ffd2bdaa6505e649fd80ca… 'signature': 'z5yMEHH0pYe++uOiNhWzLkCyXsT… • NaCl Signatures, unchained for now • “Oh, somebody’s spoofing? OK, what signature have I been seeing all year, on other networks” • 'metadata': {'info': 'FLOWSEEN', 'class': 'INFORMATIONAL', 'time': 1477778027.138109}} • Could also have MACHINE_SUSPICIOUS, HUMAN_SUSPICIOUS, HUMAN_CONFIRMED_PLEASE_CONTACT, etc • ‘contact’: {‘email’: ‘’}
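The message format in the demo can be sketched roughly as follows. This is a hypothetical reconstruction, not the real overflowd code: `SAMPLE_RATE`, `SECRET`, and the function names are invented, and it uses an HMAC-SHA256 stand-in where the actual design uses NaCl (Ed25519) signatures.

```python
import hashlib
import hmac
import json
import random
import time

SAMPLE_RATE = 1_000_000          # roughly 1 in 1M packets triggers a report
SECRET = b"demo-signing-key"     # stand-in; the real design uses NaCl signatures

def build_report(flow):
    """Build a signed FLOWSEEN report from whitelisted flow metadata."""
    msg = {
        "data": flow,            # whitelisted fields only -- never raw netflow
        "metadata": {"info": "FLOWSEEN", "class": "INFORMATIONAL",
                     "time": time.time()},
    }
    body = json.dumps(msg, sort_keys=True).encode()
    msg["signature"] = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return msg

def maybe_report(flow):
    """Stochastic sampling: report each flow with probability 1/SAMPLE_RATE."""
    if random.randrange(SAMPLE_RATE) == 0:
        return build_report(flow)
    return None
```

A recipient who sees the same signing key on reports all year, across networks, can use it to discount spoofed traffic.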
  15. 15. Still Deciding On Channels • 65535/udp • The end • Doesn’t require acknowledgement, does need fragmentation • ICMP • Would follow packets further along route, maybe • Might get dropped earlier too • HTTP/HTTPS • Many networks have an easier time picking up .well-known web paths • Can’t just be passively received • TODO
  16. 16. Explicit Plan • We have no idea how precisely this data would, or should, be consumed • We do know we don’t want to share much more data than a legitimate person should already know • Not sending raw netflow, not sending at high rates • May send faster on known badness – badness and packet count are not equal! • We think interesting and useful things would be built in the presence of overflowd
  17. 17. Cryptography: TLS is hard to deploy
  18. 18. Crypto is hard.
  19. 19. That’s just one service. Here’s more.
  20. 20. Has Anyone Ever Not Seen This?
  21. 21. Well, at least nobody’s judging you for a not entirely perfect TLS suite…
  22. 22. Those are secure configurations. Here’s the insecure one.
  23. 23. Reality • TLS required certificate authorities • Certificate authorities required bizdudes • Software vendors couldn’t automate bizdudes • Software vendors couldn’t automate TLS • Software vendors could and did automate listening on standard ports • Just not with security • The TLS mess chains back to the devops non-viability of automatically acquiring certificates
  24. 24. We Live In The (Near) Future • Let’s Encrypt • Free Certificate Authority • Allows Automatic Certificate Provisioning using open ACME protocol • Services can in fact autoprovision certificates now! • Caddy • HAProxy • Nginx
  25. 25. Why Not Both Everything?
  26. 26. JFE: Jump to Full Encryption
  27. 27. # ./jfe -D
  28. 28. # curl http://163.jfe.example hello world
  29. 29. # curl https://163.jfe.example hello world
  30. 30. # curl https://163.jfe.example:40080 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
  31. 31. What’s Going On Here (That you didn’t know existed) • iptables -t mangle -A PREROUTING -p tcp --dport 23:65535 ! -d -j TPROXY --tproxy-mark 0x1/0x1 --on-port 1 • Grab all traffic from port 23 through 65K, send it to port 1 • self.sock.setsockopt(socket.SOL_IP, 19, 1) # IP_TRANSPARENT • Allow listener on Port 1 to receive traffic from other IPs and Ports • sniff = client.recv(128, socket.MSG_PEEK).split("\n")[0] • Sniff the first 128 bytes on the socket, without actually “draining” from it • ctx.set_servername_callback(on_sni) • Do things (like get a new cert) during initial handshaking • cert=free_tls_certificates.certbotClient.issue_certificate(…) • Get cert from Let’s Encrypt (with a little help)
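The MSG_PEEK trick is easy to demo in isolation; a minimal sketch, using a socketpair in place of a real TPROXY-redirected client:

```python
import socket

# Peek at the first bytes a client sends -- enough to tell a TLS ClientHello
# from plaintext HTTP -- without draining them from the socket, so whatever
# handles the connection next still sees the complete stream.
a, b = socket.socketpair()
a.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")

peeked = b.recv(128, socket.MSG_PEEK)    # look, don't consume
first_line = peeked.split(b"\r\n")[0]

drained = b.recv(128)                    # the same bytes are still there
a.close()
b.close()
```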
  32. 32. JFE Just Works Full system TLS! Fully patched! Could support other protocols/wrappers!
  33. 33. Bugs! We got ‘em! • Trusts the client for the name to acquire • Zero configuration == Attacker configuration • Some efforts at validation but incomplete for now • Rate limits at Let’s Encrypt can be problematic • Low Performance • Threading model only thing that survives blocking network in free_tls_certs • Other languages have problems missing setsockopt or MSG_PEEK or or or… • Localhost • Connections appear to come from localhost (not great) • Connections are routed to localhost (actually bad, things that bind to are still exposed)
  34. 34. Fixing Localhost: The Plans • IPTables TPROXY is janky and clearly nobody else has fixed this either • Squid, HAProxy, various SSL MITM attack tools (lol) all get stuck here, try to just be an intercepting proxy to another host downwire • NFTables clearly the approach to take • New firewalling subsystem in Linux • Could gate packet redirection with IP Address Aliases (eth0:1) • Could gate packet redirection with cgroups (as per containers) • Full system is powerful, full container might be easier • More aligned with how software is generally being deployed nowadays
  35. 35. Also JFE TODO • Would need to find a way to query wrapped sockets for metadata • Should figure out how client socket wrapping might work • Must be mandatory • I have plans here • Could support/detect encrypted backends • Doesn’t matter if backend has janky crypto if it’s wrapped with something better • Could integrate with clouds • Open socket on client == provisioned socket on ELB w/ provisioned cert • Amazon does do all this, other clouds do too • DTLS? IPSec? Websockets? SSH? • Yes, DNSSEC/DANE plays into this. Of course it does. • Many useful things to help on.
  37. 37. Data Loss Prevention Attacks are hard to survive
  38. 38. Risk Management Is Not All Or Nothing • There’s $20 in the Gas Station Cash Register • Not all corporate payroll for the month of July • But we assume if they can get any of our data, they probably got all of our data • Why?
  39. 39. They probably got all of our data.
  40. 40. Our Designs Are Often “All or Nothing” Affairs • Classical JBOS (Just a Bunch Of Servers) design • Shared credentials • Complex services • Full mutual trust – root on one is root on all • Rate limits for a database would be useless in the event of a hack • If you can steal some data… • …you can disable the rate limits… • …and steal all the data. • This is why you’re supposed to salt and stretch stored password hashes • “After your data is lost, make it hard for an attacker to convert it back to passwords”
  41. 41. What is this “After” malarky?
  42. 42. Ratelock: Restricting Data Loss with Serverless Cloud Enforcement
  43. 43. The Clouds are not JBOS. They provide services with authenticated semantics. Somebody else’s problem
  44. 44. ./ add foo bar true (Password stored in DynamoDB, proxied through Lambda)
  45. 45. ./ check foo bar true ./ check foo wrong false Both checks against DynamoDB, proxied. Lambda “invoke” right against function “ratelock” only thing required.
  46. 46. # while [ 1 ]; do ./ check foo bar; sleep 0.25; done true ... true ... true ... true ... false ... false ... false The proxy starts providing false errors. The caller doesn’t have the ability to directly bypass the proxy. The complex server can get completely compromised. The simple policy survives.
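The policy being demonstrated can be sketched as a toy in-process version. `RateLock`, `max_per_sec`, and the backing dict are all illustrative; the real check runs inside Lambda against DynamoDB, where a compromised caller cannot reach it.

```python
import time

class RateLock:
    """Toy password-check proxy with a hard rate cap the caller can't bypass."""
    def __init__(self, db, max_per_sec=7):
        self.db = db                  # stand-in for DynamoDB
        self.max_per_sec = max_per_sec
        self.window = []              # timestamps of recent checks

    def check(self, user, pw):
        now = time.monotonic()
        self.window = [t for t in self.window if now - t < 1.0]
        if len(self.window) >= self.max_per_sec:
            return False              # over budget: fail closed, reveal nothing
        self.window.append(now)
        return self.db.get(user) == pw
```

Even if the complex server in front of this is fully compromised, the attacker can only drive the check endpoint at the capped rate; the simple policy survives.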
  47. 47. “What if you can’t trust Lambda?”
  48. 48. Here’s a string Amazon will verify, but never leak, even to you. USEFUL!
  49. 49. $ ./ add demouser 1234567 $ cat authdb.json {"demouser":"BvL40myloWAo39hbIp RpKOy4Skdtswcaa7WJUzWf"} We actually create an IAM user “demouser” under a special path. We just create the user, we don’t grant privileges. But we do get a secret key…which that isn’t.
  50. 50. add_user aes = (CTR, sha256(userpw)) raw = b64decode(aws_secret) enc = aes.encrypt(raw) saved_pw = b64encode(enc) The secret key is first base64 decoded, and then encrypted with the user’s password. We save that. Why decode?
  51. 51. check_user enc = b64decode(saved_pw) aes = (CTR, sha256(userpw)) raw = aes.decrypt(enc) aws_secret = b64encode(raw) To invert the process, we decrypt the saved value with what is supposed to be the user’s password, and base64 encode.
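The add_user/check_user flow can be sketched end to end. This is a hypothetical stand-in that uses a SHA-256 counter-mode keystream in place of AES-CTR (same property the slides need: encryption with no integrity, so a wrong password still decrypts to plausible-looking bytes); the fixed default nonce is for demonstration only.

```python
import base64
import hashlib

def _keystream(key, nonce, length):
    """Counter-mode keystream from SHA-256 (stand-in for AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def add_user(aws_secret, userpw, nonce=b"\x00" * 8):
    """Encrypt the (base64) AWS secret key under the user's password."""
    key = hashlib.sha256(userpw.encode()).digest()
    raw = base64.b64decode(aws_secret)   # decode first: no "is it base64?" tell
    enc = bytes(x ^ y for x, y in zip(raw, _keystream(key, nonce, len(raw))))
    return base64.b64encode(nonce + enc).decode()

def check_user(saved_pw, userpw):
    """Invert add_user; a wrong password yields a different, valid-looking key."""
    blob = base64.b64decode(saved_pw)
    nonce, enc = blob[:8], blob[8:]
    key = hashlib.sha256(userpw.encode()).digest()
    raw = bytes(x ^ y for x, y in zip(enc, _keystream(key, nonce, len(enc))))
    return base64.b64encode(raw).decode()
```

The only way to tell a right guess from a wrong one is to present the candidate key to IAM, online.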
  52. 52. aws_secret can’t be checked offline. They have to ask IAM. Online. Good luck doing that 100M times.
  53. 53. If there’s one thing Amazon is going to keep online, it’s IAM.
  54. 54. If we didn’t b64decode the Secret Key, there’d be a simple offline attack – post-decrypt, is it Base64? This is why we aren’t using PyNaCl – we need encryption without integrity, for maybe the first time ever!
  55. 55. Some Notes • One of the largest e-commerce sites in the world provided required rates for their password server • 7/sec • Yahoo 500M / 7 per sec = 2.26 years • Who are we building instadump for, anyway? • Backups can go to an asymmetric key – encrypt online, decrypt offline • Not just for passwords, this can rate limit any sort of data loss • Working on this • Not just for rate limits, can apply any policy • Notification, delay, extra approvals • What else can we factor out to the cloud functions? • OpenSSL Engine?
  56. 56. Many server breaches. No known Lambda breaches. No known IAM breaches. Nice table, is it…actuarial?
  57. 57. #NotJustAmazon Somebody at Google App Engine is one of us.
  58. 58. But what if we can’t trust the cloud? (There have been breaches, there are many clouds, even at single providers…)
  59. 59. Code Safety Not getting owned is hard.
  60. 60. “If only users would stop running dangerous code.”
  61. 61. This PDF must be read. By somebody. That is their job.
  62. 62. Stop Victim Shaming. It’s not helping.
  63. 63. “Why isn’t everything run in a sandbox? Or at least AV?”
  64. 64. Have you ever tried to find documentation on sandboxing? Chrome Source Code doesn’t count.
  65. 65. What about Containers? What about Docker?
  66. 66. docker run -it --privileged -p80:80 dakami/guachrome
  67. 67. GREAT FOR DEVELOPERS Security? Is it easy?
  68. 68. There’s just a lot that containers need to secure: That Chrome instance needs 98 syscalls from the host. • accept access arch_prctl bind brk capset chdir chmod clone close connect creat dup epoll_create epoll_ctl epoll_wait execve exit exit_group fchmod fchown fcntl fdatasync fstat ftruncate futex getcwd getdents getegid geteuid getgid getpeername getpid getpriority getrlimit getsockname getsockopt gettid getuid ioctl kill listen lseek lstat madvise mkdir mmap mount mprotect mremap munmap nanosleep newfstatat open openat pipe poll ppoll prctl pread pwrite read readlink recvfrom recvmsg rename rt_sigaction rt_sigprocmask sched_getaffinity sched_setscheduler sched_yield select sendmsg sendto setfsgid setfsuid setitimer setpriority setrlimit set_robust_list setsockopt shmat shmctl shmget shutdown signaldeliver sigreturn socket socketpair stat statfs times umask uname unlink wait4 write writev
  69. 69. 1) Why it’s 122 pages 2) How it’s not easy (for anyone)
  70. 70. Same code, hosted slightly differently…
  71. 71. All of Chrome, Docker, Linux, Java… 13 syscalls. • futex ioctl ppoll read recvfrom recvmsg sendto write rt_sigaction rt_sigreturn readv writev close • (Yes, shared memory maps and open files are minimal as well.) • It is much easier to secure 13 syscalls than 98. In fact…
  72. 72. Actually, it looks like this. (Plus a bit of goop to further lockdown ioctl.) It could probably be smaller.
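What “it looks like this” might look like concretely: a hypothetical Docker-style seccomp profile generated from the 13-syscall allowlist on the previous slide. The exact AutoClave profile is wider and further locks down ioctl; this is only an illustration of the shape.

```python
import json

# The 13 syscalls a running (post-boot) VM needed, per the previous slide.
ALLOWED = ["close", "futex", "ioctl", "ppoll", "read", "readv", "recvfrom",
           "recvmsg", "rt_sigaction", "rt_sigreturn", "sendto", "write",
           "writev"]

# Deny-by-default: any syscall off the list kills the process.
profile = {
    "defaultAction": "SCMP_ACT_KILL",
    "syscalls": [{"names": ALLOWED, "action": "SCMP_ACT_ALLOW"}],
}
print(json.dumps(profile, indent=2))
```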
  73. 73. AutoClave: Syscall Firewalls for VM Isolation WARNING: Lots of stuff hasn’t been pushed to master. I prioritized the code other people helped with, and I’d do it again.
  74. 74. Live Demo? Sure, go to
  75. 75. You’ll see:
  76. 76. Linux and Windows running fine under extreme syscall firewalls. Fully ephemeral, fully repeatable. (Slightly wider ruleset than just described)
  77. 77. If you’d like to try to break out, here’s hypervisor root (Ctrl-F2)
  78. 78. Who wants to have a PDF parsing party! (They’re even more fun than crypto parties)
  79. 79. What’s going on? • VMs have always required less of the host than containers • Easier to secure kernel-to-kernel than userspace-to-kernel • VMs require many more syscalls to start up, than to continue running • Syscall firewall is thus delayed as long as possible – until VNC/network/explicit post-boot activation • Probably the one significant security contribution here • VMs can be restored from memory, I mean, they actually can • Linux does not really allow process freeze/restore • CRIU tries. Oh, does it try.
  80. 80. Bypass-shared-memory • Patch from crew • I was trying to do this myself, but they actually manage a qemu fork • When restoring from memory, the big part is system memory. It’s read() in during restore, not fast • Better method: Generate memory image incrementally with mmap/MAP_SHARED, execute new restorations with mmap/MAP_PRIVATE • Means 100 instances share the “template state” via Copy on Write • It’s fine, we block madvise • (Well, now we do) • Restores move from 5s to <250ms
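The MAP_SHARED/MAP_PRIVATE idea is easy to demonstrate with a tiny stand-in for the guest RAM image (Linux/Unix only; the file contents and sizes are illustrative):

```python
import mmap
import os
import tempfile

# A "template" image on disk stands in for the snapshotted guest RAM.
fd, path = tempfile.mkstemp()
os.write(fd, b"template" + b"\x00" * 4088)        # one 4 KiB page

# MAP_PRIVATE: writes trigger copy-on-write, so 100 instances can share the
# template's pages and the kernel copies only what each instance dirties.
view = mmap.mmap(fd, 4096, flags=mmap.MAP_PRIVATE)
view[:8] = b"instance"                            # private to this mapping

instance_bytes = view[:8]
os.lseek(fd, 0, os.SEEK_SET)
template_bytes = os.read(fd, 8)                   # the template is untouched

view.close()
os.close(fd)
os.unlink(path)
```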
  81. 81. Bugs • Need to actually lock user input until system is sufficiently booted • Fails closed, but still fails • Need to integrate lots of usability tweaks • Need to support slightly different syscall firewalls depending on enabled features • Need to integrate with containers • Both want to use virtfs, which requires all the syscalls, both could use virtfs-proxy-helper, not clear fs calls are entirely proxied • Perf, perf, perf – VMs bleed for every bit of it • Need a solution that doesn’t require bare metal. • This is an actual good reason a) for nested virt and b) for making nested virt performant (it’s not) • Add more VMs, figure out how to host this at scale!
  82. 82. Maybe we don’t need unikernels to give every incoming connection a completely fresh/ephemeral VM. We like to cheat.
  83. 83. Security gets a syscall firewall. Performance gets instant boot. Developers get free rein as root. This is not a zero-sum game! Developer Ergonomics is the best phrase.
  84. 84. Let’s Make Security Easy • Finding an abuse contact was hard. Now you just look for the tracers amongst the noise. Easy. • TLS was hard. Now you run a daemon, and it’s just there. Easy. • Surviving a breach was hard. Now you design your systems to lose an amount you can live with. Easy. • Running dangerous code was…ok, it was always easy. But now not getting infected by that code is also easy.
  85. 85. #MakeSecurityEasy Not just a hashtag. We can do this. • HALP • I can’t write it all! • Another hackathon in the very near future is likely, talk to me about interest