Black Ops 2006
Viz Edition
CCC 2006
Dan Kaminsky
Director Of Penetration Testing
IOActive
Thanks and No Thanks
• Thank You To Swissotel Amsterdam, who
provided a net connection with which I
could actually finish these slides
• No Thanks to Delta Hotel of Amsterdam,
which put a TV on a really weak shelf.
– I suppose it’s my fault I put my laptop
underneath.
– The “Star System” is officially meaningless
Who Am I?
• Coauthor of several book series
– Hack Proofing Your Network
– Stealing The Network
• Formerly of Cisco and Avaya
– Presently partnering with IOActive
– One of the “Blue Hat Hackers” that has been
auditing Windows Vista
• Been doing talks for six years now
– TCP/IP, DNS, MD5, SSH, etc.
What Are We Here To Do?
• Break TCP/IP A Little More
– Not in the documentation
– It’s for a good cause ;)
• Analyze Data Linguistically
• Make Pretty Pretty Pictures!
For Various Definitions Of Pretty:
Visual Bindiff
The Ancient Tongue:
TCP/IP
• Can’t all be about pretty pictures
• A new problem has popped up: Network
oligopolies are threatening to install
firewalls that limit or eliminate bandwidth
on a per-company basis
– Their own media services might be fast,
others will be slow
– Their own VPN services might be fast, others
will be slow
• Question: Is it possible to detect and
locate devices violating network
neutrality?
What’s The Closest Tool We Have?
• Firewalk
– Mike Schiffman’s Firewall Analysis Tool
– Packets elicit an ICMP Time Exceeded error if
they reach a router with TTL=0
• TTL is decremented by one at each hop, so if you
start low, you can trace the route to a host
– A firewalled packet won’t live long enough to
reach TTL=0
– So you can locate the firewall, and divine
things about its ruleset, based on when your
packets stop getting ICMP Time Exceeded
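The firewalking logic above can be sketched as a small simulation. This is not Firewalk itself and sends no packets: the linear path, the hop numbering, and the names `probe` and `locate_firewall` are assumptions made for illustration.

```python
from typing import Optional


def probe(ttl: int, path_len: int, firewall_hop: Optional[int]) -> str:
    """Simulate one TTL-limited probe over hops 1..path_len.

    Each hop decrements the TTL. A hop that sees the TTL reach zero
    answers with ICMP Time Exceeded. The firewall hop silently drops
    any probe that it would otherwise have to forward.
    """
    for hop in range(1, path_len + 1):
        ttl -= 1
        if ttl == 0:
            return f"time-exceeded@{hop}"
        if hop == firewall_hop:
            return "silence"
    return "delivered"


def locate_firewall(path_len: int, firewall_hop: Optional[int]) -> Optional[int]:
    """Raise the TTL until replies stop; the last hop that answered is the filter."""
    for ttl in range(1, path_len + 2):
        result = probe(ttl, path_len, firewall_hop)
        if result == "silence":
            return ttl - 1
        if result == "delivered":
            return None
    return None


if __name__ == "__main__":
    print(locate_firewall(path_len=6, firewall_hop=3))
```

A real tool would also have to cope with lost replies and routers that never answer, which this sketch ignores.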
Limitations of Firewalking
• But Firewalk tells us what is blocked, not who is
blocked…and it tells us nothing about who
is allowed to go fast, and who is made to
go slow
– Suddenly, we devolve to a much older
question: Is it possible to find out that a target
firewall is, or is not, blocking against or
accepting traffic from an arbitrary IP address?
TCP Does Speed Measurement
• TCP speed analysis is done blindly
– Endpoints do not negotiate with one another
– Everyone sends their packets, routers route
what they will. Endpoints need to adjust to
what the routers are willing to pass.
• Routers communicate with endpoints by dropping
their packets
• Can we combine this router backchannel
w/ Firewalk?
In From The Side
• What causes packets to drop?
– Too many packets
• What are we going to do?
– Send too many packets
• Two channels are set up
– A primary channel, which drops packets at some
known rate
– A secondary channel, whose purpose it is to interfere
(or not) with the primary channel
• When the secondary interferes with the primary,
we get feedback via the primary channel
– The traffic composing the secondary channel can
come from anywhere, be composed of anything, and
can be TTL’d just like in a normal firewalk.
The TTL Channel
• Normally, you don’t know which router
along a path is dropping your packets
 
• If you are the source of the drop-inducing
packets, you can control how far your
noise goes out – thus, you can discover
which router is hitting its limit / censoring
your net connection
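A toy fluid model of the TTL channel, under loud assumptions: each hop has a fixed capacity, an overloaded hop drops both flows proportionally, and noise sent with TTL t loads only the first t hops. The function names and the numbers are hypothetical, not measurements.

```python
from typing import Optional, Sequence


def primary_loss(capacities: Sequence[float], primary_rate: float,
                 noise_rate: float, noise_ttl: int) -> float:
    """Fraction of the primary channel lost when noise reaches `noise_ttl` hops."""
    primary, noise = primary_rate, noise_rate
    for hop, capacity in enumerate(capacities, start=1):
        if hop > noise_ttl:
            noise = 0.0  # the noise packets have expired before this hop
        load = primary + noise
        if load > capacity:
            scale = capacity / load  # proportional drop on an overloaded hop
            primary *= scale
            noise *= scale
    return 1.0 - primary / primary_rate


def find_limiting_hop(capacities: Sequence[float], primary_rate: float,
                      noise_rate: float) -> Optional[int]:
    """Return the first noise TTL at which the primary channel starts to suffer."""
    baseline = primary_loss(capacities, primary_rate, noise_rate, noise_ttl=0)
    for ttl in range(1, len(capacities) + 1):
        if primary_loss(capacities, primary_rate, noise_rate, ttl) > baseline + 1e-9:
            return ttl
    return None


if __name__ == "__main__":
    # Hop 3 is the narrow one in this made-up path.
    print(find_limiting_hop([100, 100, 40, 100], primary_rate=30, noise_rate=50))
```

The point of the sketch is only the search loop: extend the noise one hop at a time and watch the primary channel for feedback.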
 
Scorchmarking
• Why Scorchmarking?
– Routers are burning packets…those that get through
might have a scorch mark or two 
• Basic Model
– Client downloads a file from a site, at some given
speed negotiated via TCP.
– At the same time, traffic is injected from different IP
addresses. This should cause drops.
• If it doesn’t, the network is either penalizing the primary
channel (easy to drop against) or rewarding the secondary
channel (resilient to drops)
Advanced Scorchmarking [0]
• Having to depend on a client is lame
– Wouldn’t it be nice if we could scan the
Internet for these servers?
• What fundamental service is a receiving
client providing?
– It is acknowledging our traffic – letting us
know how much it received, and how many
milliseconds it took to receive it
• Aren’t there other ways we could extract
the same data from hosts?
Advanced Scorchmarking [1]
• What else will acknowledge receiving traffic from
us?
– TCP Servers
• Sting, from Stefan Savage, used this to great effect
– DNS Servers 
– Routers.
• Supposedly, routers won’t send more than a certain number
of ICMP Time Exceeded packets per second
• In reality, they seem to ICMP Time Exceeded ACK however
much you throw at them
• Even if they didn’t, you could use the difference in ICMP
Time Exceeded rates between Primary and Secondary
channel, to determine whether interference was showing up.
• Everyone’s got a NAT – so you can query everyone for
whether certain sorts of traffic are being blocked to them
Advanced Scorchmarking [2]
• So, yes.
– You can scan for violations of Network Neutrality
– You can find networks that are blocking or passing
particular IP ranges
• It’s not exactly efficient though
• Neutrality violations are easier to find than the
standard FW case
– Firewalls are normally between the WAN and the LAN
(Slow Net -> FW -> Fast Net)
– Neutrality violators are mid-WAN (Slow Net -> Fw ->
Slow Net -> Fast Net)
– Easier to overload the slow net after the firewall
• Boxes with max TTL rates override this
Speed Limits
• Fundamental Problem: Have to max out
bandwidth on the link to trigger the backchannel
– No packets dropping, no data
– Means you have to DoS a link – not scalable/legal
• Potential Solution: Find capped acknowledgers
– The mythical ICMP Time Exceeded rate limit works
well
• Primary and Secondary channel both eliciting ITE’s
• When secondary channel gets a packet through, it takes up a
slot on the primary channel’s
• ITE is perfect, since you can TTL limit any packet
• Depends on the firewall passing the primary’s ITE’s
• Maybe Linux / NATs actually implement rate limits?
– Another option: What if we have code on the client?
Windows Media Player:
More Than Just DRM. Really!
• Bulk Transfer: RTP
– Runs over Unicast UDP
– Yes, the same Unicast UDP that penetrates NAT so
well!
• Flow Control / Quality Monitoring: RTCP
• No technical reason RTCP needs to go back to
the same address that RTP stream is coming
from
– So: We pretend to provide media streams from all
sorts of sites, and use WMP to collect traffic stats for
us 
• It might work…
Symbols
• But this is not to be a talk on TCP/IP
hackery…
SSH’s Hex Problem
• $ ssh dan@blah
The authenticity of host 'blah (1.2.3.4)'
can't be established.
RSA key fingerprint is
09:a9:b1:99:84:17:7d:ba:c6:55:46:5a:17:f8:
83:01.
Are you sure you want to continue
connecting (yes/no)?
• 09:a9:b1…am I supposed to do something with this?
– Yes. According to SSH’s design, you’re supposed to
reject the proposed fingerprint if it looks unfamiliar.
(Seriously.)
• The “Two Billion SSH Key” attack (by ADM) just comes up
with 2B keys and emits the visibly closest key. It works.
Hex sucks.
A better mapping must be possible…
Cryptomnemonics
• There are three classes of memory, at least to
the degree that is useful in cryptography
– Rejection: “I’ve never seen that before”
– Recognition: “It’s that one, not that other one”
– Recollection: “Let me describe it to you.”
• SSH just requires rejection – “What? That’s
new.”
• Hex domain clearly does not work. What else is
available?
– To restate the problem: Humans do not operate on
hexadecimal symbols effectively. Are there any
other symbol sets we can use?
Alternative Symbolic Domains
• Abstract Art via déjà vu
• Calculated faces via
Passfaces
• Both have attempted to
address limited capacity
for recollection by moving
authentication to a
recognition problem
• But recognition offers only
a limited number of bits:
9^5=59049 < 2^16
– This is OK, since Passfaces is
online and thus can lock a user
out before 59K attempts are up
– We are not online – but we only
need to reject, not recognize
and certainly not recollect
The Nymic Domain:
Names Are Identity Symbols
• Humans don’t remember arbitrary bits, but we
do remember stories.
• Stories change (the bits shift over time), but
names stay the same
• Can we map the 160 bits SSH needs us to
accept or reject, to names?
– Take 512 male names: 9 bits of info per male name
– Take 1024 female names: 10 bits of info per female
name
– Take 8192 last names: 13 bits of info per last name
– 9+10+13=32. 5 couples = 160 bits
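The 9+10+13 split can be sketched as follows. The name lists here are synthetic placeholders (`m0`, `f0`, `l0`, ...), an assumption for illustration; a real tool would load actual name lists of the same sizes. The bit order inside each 32-bit word is also an assumption.

```python
from typing import List

# Placeholder name tables with the sizes given on the slide.
MALE = [f"m{i}" for i in range(512)]      # 9 bits
FEMALE = [f"f{i}" for i in range(1024)]   # 10 bits
LAST = [f"l{i}" for i in range(8192)]     # 13 bits


def fingerprint_to_couples(fingerprint: bytes) -> List[str]:
    """Map a 160-bit value to five 'male and female lastname' couples."""
    if len(fingerprint) != 20:
        raise ValueError("expected exactly 20 bytes (160 bits)")
    couples = []
    for offset in range(0, 20, 4):
        word = int.from_bytes(fingerprint[offset:offset + 4], "big")
        male = word >> 23               # top 9 bits
        female = (word >> 13) & 0x3FF   # next 10 bits
        last = word & 0x1FFF            # low 13 bits
        couples.append(f"{MALE[male]} and {FEMALE[female]} {LAST[last]}")
    return couples


if __name__ == "__main__":
    for line in fingerprint_to_couples(bytes(range(20))):
        print(line)
```

Because every index is a fixed-width bit field, the mapping is reversible: the same tables turn the five couples back into the original 160 bits.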
Demo
• $ ssh dan@blah
Key Data:
julio and epifania dezzutti
luther and rolande doornbos
manual and twyla imbesi
dirk and cuc kolopajlo
omar and jeana hymel
The authenticity of host 'blah (1.2.3.4)'
can't be established.
Are you sure you want to continue connecting
(yes/no)?
• It is critical that the Key Data be shown every time there’s
a connection. The user must become familiar with the
“characters” in the “story”.
– This actually seems to work.
What about Bubble Babble?
• $ ssh-keygen.exe -B -f id_dsa.pub
1024 xegoz-tosys-vusik-masar-cifyc-cyled-kikih-
zukuf-nypok-sezyt-noxax id_dsa.pub
• Problem: Humans do not remember arbitrary
sequences of syllables well
• Names are special sequences – sharing with
pre-existing language logic should improve
retention
– Still, names are arbitrary (Boutros Boutros-Ghali);
could merge approaches:
Xegoz and Tosys Vusik
Masar and Cifyc Cyled
Kikih and Zukuf Nypok
Sezyt Noxax
– Requires testing
Inverting The Symbol Flow:
Passnyms
• Suppose you have 8 characters with one of 64
characters in each slot.
– aI7$13nM
– 64==2^6, so (2^6)^8 == 2^48, i.e. 48 bits
– “Lowercase A, lowercase l, seven, dollar sign, one,
three, lower case n, upper case M”
• This is twenty three syllables!
• What if, instead, you typed:
– dirk and cuc kolopajlo
omar and jeana hymel
– 64 bits of entropy, 14 syllables, can be spell
checked as user types it in
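The bit counts above are quick to check. The variable names are illustrative; the figures come from the slides (64-symbol alphabet, 8 slots, 9/10/13 bits per couple).

```python
import math

# Classic password: 8 slots, 64 possible characters per slot.
password_bits = 8 * math.log2(64)

# One couple: 9 bits (male) + 10 bits (female) + 13 bits (last name).
couple_bits = 9 + 10 + 13

# A two-couple passnym.
passnym_bits = 2 * couple_bits

if __name__ == "__main__":
    print(password_bits, couple_bits, passnym_bits)
```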
It Is Easier To Interface With
Systems When Symbols Align
• Hacking is a form of interfacing 
• We can break things with garbage symbols
– “Dumb Fuzzing”: Take a file, flip some bits, see what
happens
• We can break more things with meaningful
symbols used in unexpected ways
– “Smart Fuzzing”: Take a file, understand its internal
structure, fuzz the structure, see what happens
• Dumb fuzzing is very easy.
• Smart fuzzing is very labor intensive…requires
smart people, maybe specifications.
• Is there any way we can automatically discover
symbol sets?
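A minimal sketch of the dumb fuzzing described above: flip a few randomly chosen bits and hand the result to the parser under test. The function name `flip_bits` and the seeding scheme are assumptions.

```python
import random


def flip_bits(data: bytes, n_flips: int, seed: int = 0) -> bytes:
    """Return a copy of `data` with `n_flips` distinct, randomly chosen bits inverted."""
    rng = random.Random(seed)  # seeded so a crashing input can be regenerated
    buf = bytearray(data)
    # sample() picks distinct bit positions, so no flip cancels another.
    for pos in rng.sample(range(len(buf) * 8), n_flips):
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


if __name__ == "__main__":
    original = b"<HEAD><TITLE>hello</TITLE></HEAD>"
    print(flip_bits(original, n_flips=3, seed=42))
```

Nothing here knows anything about the file's structure, which is exactly what makes it dumb, and cheap.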
File Formats Are Languages
• Kids don’t get documentation when they
learn new languages. They just pick ‘em
up.
– They can do this because they actually design
all sorts of internal structure and redundancy
into them.
• Children make languages.
• Adults make working languages.
• Programmers make barely working
languages.
– Let’s autodiscover them!
N’est-ce pas Non Sequitur
• Sequitur: Linear Time Pattern Finder
– Creates hierarchical Context Free Grammars from arbitrary input
• Compression Algorithm in which you can “look under the
covers” to see what’s going on
• Created by Craig Nevill-Manning as his PhD thesis a
decade ago
– He’s now Chief Research Scientist at Google
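To make "grammar from arbitrary input" concrete, here is a sketch of a much simpler relative of Sequitur: greedy replacement of the most frequent adjacent pair, in the style of Re-Pair. It is explicitly not Sequitur (it is offline and not linear time), but it produces the same kind of hierarchical rules. All names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rules = Dict[int, Tuple[int, int]]


def infer_grammar(data: bytes, min_count: int = 2) -> Tuple[List[int], Rules]:
    """Repeatedly replace the most frequent adjacent pair with a new rule.

    Terminals are byte values 0..255; rule ids start at 256.
    """
    seq: List[int] = list(data)
    rules: Rules = {}
    next_id = 256
    while True:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, count = counts.most_common(1)[0]
        if count < min_count:
            break
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)  # left-to-right, non-overlapping replacement
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules


def expand(symbol: int, rules: Rules) -> bytes:
    """Recover the raw bytes behind a terminal or a rule id."""
    if symbol not in rules:
        return bytes([symbol])
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    stream, grammar = infer_grammar(b"abcabcabcabc")
    print(stream, grammar)
```

As with Sequitur, you can "look under the covers": every rule id expands back to a byte string that occurred repeatedly in the input.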
Syntax Highlighting For Hex Dumps
• Trivial Algorithm: In a
hierarchical grammar,
each byte requires
traversing to a certain
depth in order to
recover the raw literal.
• Color each byte by
how deep in the tree
you have to go.
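The coloring rule can be sketched on a hand-written grammar. The grammar below is made up for illustration (rule ids at 256 and above, byte values below), and mapping depth to an actual color is left out.

```python
from typing import Dict, List, Tuple


def byte_depths(start: List[int], rules: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Return (byte, depth) pairs: how many rules deep each literal byte sits."""
    out: List[Tuple[int, int]] = []

    def walk(symbol: int, depth: int) -> None:
        if symbol in rules:
            for child in rules[symbol]:
                walk(child, depth + 1)
        else:
            out.append((symbol, depth))

    for symbol in start:
        walk(symbol, 0)
    return out


if __name__ == "__main__":
    rules = {
        256: [ord("a"), ord("b")],   # rule 256 -> "ab"
        257: [256, ord("c")],        # rule 257 -> rule 256, "c"
    }
    start = [257, 257, ord("d")]     # expands to "abcabcd"
    for value, depth in byte_depths(start, rules):
        print(chr(value), depth)
```

Bytes that only ever appear as top-level literals get depth 0; bytes buried inside nested repeats get higher depths, and so a different color.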
BLUR-O-VISION
What’s Actually Going On?
• (0) -> … (73),b4,(73),ca,(73),e6,(73),02,(74),18,
(74),2c,(74),4a,(74),5c,(74),6e,(74),80,(74),98,
(74),b0,(74),c8,(74),e8,(74),fc,(74),10,(75),20,
(75),30,(75),40,(75),50,(75),64,(75),82,(75),90,
(75),9e,(75)
…
(84),d6,(84),ee,(84),0c,(85),28,(85),3c,(85),4e,
(85),66,(85),7e,(85),8c,(85),9e,(85),ac,(85),be,
(85),ca,(85),ea,(85),08,(86),26,(86),44,(86),56,
(86),6a,(86),7c,(86),8a,(86),a6,(86),b6,(86),cc,
(86),de,(86),02,(87)
• Repeated sequence, single byte literal.
Repeated sequence, single byte literal. Rinse,
lather, repeat.
Intersymbol Link Discovery
• Turns code on left into
symbolic set on right;
it’s easy then to link
the symbols together
as per the graph.
• This works for non-textual data
• Sequitur imputes meaningful
symbols from arbitrary input
data
Context Free Grammar Fuzzer:
THE CFG9000
• Reduce input data to a stream of symbols
• Fuzz data at the symbol level, rather than at
pure bytes
– Shuffle
– Drop
– Repeat
– Uniform Corrupt
• Consistently corrupt all instances of a given symbol
• <HEAD> -> <FOOBAR>
• Sequitur is not necessarily the best way to
generate a grammar.
– Doesn’t handle recursion, common in genomic data
– Suffix trees may yield better output
– Sequitur may scale better (100MB input not an issue)
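The four symbol-level operations can be sketched on a tiny one-level grammar. This is not the CFG9000 code; the stream representation (rule names as `str`, literals as `bytes`) and all function names are assumptions.

```python
import random
from typing import Dict, List, Union

Symbol = Union[str, bytes]  # str = rule name, bytes = literal data


def render(stream: List[Symbol], rules: Dict[str, bytes]) -> bytes:
    """Expand a symbol stream back into raw bytes."""
    return b"".join(rules[s] if isinstance(s, str) else s for s in stream)


def shuffle(stream: List[Symbol], rng: random.Random) -> List[Symbol]:
    out = list(stream)
    rng.shuffle(out)
    return out


def drop(stream: List[Symbol], rng: random.Random) -> List[Symbol]:
    i = rng.randrange(len(stream))
    return stream[:i] + stream[i + 1:]


def repeat(stream: List[Symbol], rng: random.Random) -> List[Symbol]:
    i = rng.randrange(len(stream))
    return stream[:i + 1] + [stream[i]] + stream[i + 1:]


def uniform_corrupt(rules: Dict[str, bytes], name: str, replacement: bytes) -> Dict[str, bytes]:
    """Change one rule body, so every instance of that symbol changes the same way."""
    corrupted = dict(rules)
    corrupted[name] = replacement
    return corrupted


if __name__ == "__main__":
    rules = {"A": b"<HEAD>", "B": b"</HEAD>"}
    stream = ["A", b"title", "B", "A", b"x", "B"]
    rng = random.Random(1)
    print(render(shuffle(stream, rng), rules))
    print(render(stream, uniform_corrupt(rules, "A", b"<FOOBAR>")))
```

Uniform corruption is the one operation a byte-level fuzzer cannot do cheaply: it needs to know which byte ranges are "the same symbol".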
Sample CFG9000 Output
• calculate_rule_usage(p->rulep->rulep->rulep-
>rulep->rulep->rulep->rulep->rulep->rulep-
>rulep->rulep->rulep->rulep->rule() }
• calculate_rule_usage(calculate_rule_usage(calc
ulate_rule_usage(calculate_rule_usage(calculat
e_rule_usage(calculate_rule_usage(calculate_ru
le_usage(calculate_rule_usage(calculate_rule_u
sage(calculate_rule_usage(calculate_rule_usag
e(calculate_rule_usage(calculate_rule_usage(ca
lculate_rule_usage(calculate_rule_usage(calcula
te_rule_usage(calculate_rule_usage(calculate_r
ule_usage(p->rule());
Slashdot Fuzzed
Slashdot Fuzzed (2)
It’s Not The Best CFG Fuzzing
Ever…
• Many physicists would agree that, had it not been for
congestion control, the evaluation of web browsers might
never have occurred. In fact, few hackers worldwide
would disagree with the essential unification of voice-
over-IP and public private key pair. In order to solve this
riddle, we confirm that SMPs can be made stochastic,
cacheable, and interposable.
– Rooter: A Methodology for the Typical Unification of Access
Points and Redundancy
– By A Context-Free Grammar Generating CompSci Papers
• Authors handcoded “meaningful symbols” in CompSci
speak. The eventual goal is the autogeneration of
symbol and inter-symbol patterns.
Symbolic Discovery Is Inevitable
• “An early inference procedure was described by
Chomsky and Miller (1957a), as reported in Solomonoff
(1959). Chomsky proposed a method for detecting loops
in finite state languages. The approach requires a set of
valid sentences, and an oracle that determines whether
a sentence is in the language.
The algorithm proceeds by deleting part of a valid
sentence and asking the oracle whether the sentence is
still valid. If it is, the deleted part is reinserted into the
sequence and repeated, so that it appears twice. If the
sentence is still in the language, a cycle has been
detected.”
– Inferring Sequential Structure, Craig Nevill-Manning, 1996
– This couldn’t POSSIBLY be useful for building a structure
for a dumb fuzzer to operate against.
• Instead of seeing if the parser crashes, just see if it considers
the input valid
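The delete-then-duplicate test from the quote can be sketched directly. The oracle here is a regular expression standing in for a real parser's accept/reject answer; that choice, the brute-force search over every substring, and the function names are assumptions.

```python
import re
from typing import Callable, List, Tuple


def find_cycles(sentence: str, oracle: Callable[[str], bool]) -> List[Tuple[int, int]]:
    """Return (start, end) spans that can be both deleted and doubled.

    A span qualifies when the sentence stays valid with the span removed
    and also with the span appearing twice in a row.
    """
    cycles = []
    for i in range(len(sentence)):
        for j in range(i + 1, len(sentence) + 1):
            part = sentence[i:j]
            deleted = sentence[:i] + sentence[j:]
            doubled = sentence[:j] + part + sentence[j:]
            if oracle(deleted) and oracle(doubled):
                cycles.append((i, j))
    return cycles


def toy_oracle(candidate: str) -> bool:
    """Stand-in for a parser: accepts the language a(bc)*d."""
    return re.fullmatch(r"a(bc)*d", candidate) is not None


if __name__ == "__main__":
    print(find_cycles("abcbcd", toy_oracle))
```

For fuzzing, the oracle would be "did the target accept this input", and each detected span is a candidate repeating unit to mutate.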
TODO
• “Requitur”; Sequitur implementation optimized
for fuzzer use
– Generate larger symbols
• No two byte symbols please; we’re not trying to compress,
we’re trying to elucidate structure
– Eliminate redundant symbols
• Kieffer-Yang optimization in ~2001: If symbol (x) == symbol
(y), then delete (y) and set all instances of (y) to (x)
• Need to do this to actually consistently fuzz all instances of a
particular trope
– Possibly remove in-memory grammar requirement
• Use mechanisms from Ray, an out-of-memory variant
– Add foreign grammar capability
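The redundant-symbol step can be sketched as written on the slide: if two rules have the same body, delete one and point its users at the other. This is a reading of the slide's one-line description, not the published optimization; the grammar layout and names are assumptions.

```python
from typing import Dict, List, Tuple


def merge_duplicate_rules(start: List[int],
                          rules: Dict[int, List[int]]) -> Tuple[List[int], Dict[int, List[int]]]:
    """Collapse rules with identical bodies, repeating until nothing changes."""
    start = list(start)
    rules = {rid: list(body) for rid, body in rules.items()}
    while True:
        canonical: Dict[Tuple[int, ...], int] = {}
        alias: Dict[int, int] = {}
        for rid in sorted(rules):
            body = tuple(rules[rid])
            if body in canonical:
                alias[rid] = canonical[body]   # (y) becomes an alias of (x)
            else:
                canonical[body] = rid
        if not alias:
            return start, rules
        start = [alias.get(s, s) for s in start]
        rules = {rid: [alias.get(s, s) for s in body]
                 for rid, body in rules.items() if rid not in alias}


if __name__ == "__main__":
    rules = {256: [97, 98], 257: [97, 98], 258: [256, 99], 259: [257, 99]}
    print(merge_duplicate_rules([258, 259], rules))
```

The loop matters: merging 257 into 256 is what makes 258 and 259 identical, so a single pass would miss the second merge.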
What’s Out Now
• 8 Bit Clean – Can Analyze Arbitrary Data
• Mergedot – Can create graph from
Sequitur output
How To Think Of Sequitur
• Any time you’re manipulating data as
bytes, think of manipulating it as symbols
– Trigram histograms on bytes -> Trigram
histograms on symbols
– Bayesian probabilities on characters ->
Bayesian probabilities on symbols
– Adapt yourself to more than 256 codes per
symbol and reap the benefit
• If your code is already Unicode aware you might
be one step ahead!
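The bytes-to-symbols switch costs almost nothing if the code treats its input as a sequence of integers. A sketch, with a made-up symbol stream in which ids of 256 and above stand for grammar rules:

```python
from collections import Counter
from typing import Sequence, Tuple


def ngram_histogram(seq: Sequence[int], n: int = 3) -> "Counter[Tuple[int, ...]]":
    """Count n-grams over any integer sequence: raw bytes or grammar symbols."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


if __name__ == "__main__":
    # Trigrams on bytes.
    print(ngram_histogram(b"abcabc").most_common(2))
    # The same call on symbols, where codes may exceed 255.
    print(ngram_histogram([256, 257, 256, 257, 256]).most_common(2))
```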
Fuzzy Wuzzy Wuz A Symbol
• Symbol analysis systems (language translators,
etc) have issues w/ TMTOWTDI (There’s More
Than One Way To Do It)
– Very similar messages can be encapsulated in very
different ways
– Very similar messages can be encapsulated in very
similar, but not identical ways
• Sequitur only handles exact matches – fuzzy
grammar imputation doesn’t appear to exist yet
– Are there any systems for analyzing complex, unequal
but somewhat related sets of symbols?
Another Approach: DotPlots
• Popular mechanism in bioinformatics for visual analysis
of genomes.
• Some attempts to apply dotplots outside of
bioinformatics
– Textual analysis
– Audio
• Remembered an old paper, entitled Visualizing Music
And Audio Using Self-Similarity
– Jonathan Foote from Xerox
• Brute Force solution – compare songs to themselves,
splitting them into tiny chunks and marking light for
similar and dark for dissimilar
– Disassociated Studio will do this for you
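The brute-force self-comparison can be sketched for arbitrary bytes. The similarity metric here (fraction of positions where two chunks agree) is an assumption chosen for simplicity, and the function names are illustrative.

```python
from typing import List


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions at which two equal-length chunks agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dotplot(data: bytes, window: int) -> List[List[float]]:
    """Compare every fixed-size chunk of `data` against every other chunk.

    Cell [y][x] is 1.0 (light) for identical chunks, 0.0 (dark) for
    chunks with nothing in common. A trailing partial chunk is ignored.
    """
    chunks = [data[i:i + window] for i in range(0, len(data) - window + 1, window)]
    return [[similarity(a, b) for b in chunks] for a in chunks]


if __name__ == "__main__":
    for row in dotplot(b"ABCDABCDXYZW", window=4):
        print(row)
```

The cost is quadratic in the number of chunks, so the window size is the main knob for keeping large files tractable.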
Day Tripper from the Beatles…
Music shows internal pattern.
So does MPEG.
What Exactly Are We Doing
• Jonathan Helfman’s
“DotPlot Patterns: A
Literal Look at Pattern
Languages” offers an
introduction
• Instead of “to, be, not” etc, we use chunks
of data from arbitrary files
– The same similarity metric used to
disambiguate names for the SSH hack, is
used to measure similarity here 
There are so many patterns we
might see…
…and no matter how much we’ve
learned of this pattern language…
???
So How Might This Be Useful?
• A) Format Identification
– 1) Do different file formats appear different?
– 2) Do different instances of the same file
format appear similar?
– 3) Does one format embedded in another
make itself apparent?
• B) Fuzzer Guidance
– 1) Can we locate the actual byte offsets
where one section ends and another begins?
– 2) Can we visualize and compare fuzzer
operations via Dotplots?
Format Identification
• 1) Do different files appear different,
and does the appearance reflect the
existence of internal structure?
• 2) Do different instances of the same file format appear
similar?
• 3) Does one format embedded in another make itself
apparent?
Java Class Files
.NET Assemblies
CNN’s Home Page
SMBTorture Traffic
(Packets – Note, Stop/Start Is Visible)
Kernel32.dll
Chromosome 22
(This is, after all, a genomics hack)
The Legend Of Zelda
Format Identification
• 1) Do different files appear different, and
does the appearance reflect the existence
of internal structure?
– Answer: Yes. They do.
• 2) Do different instances of the same
file format appear similar?
• 3) Does one format embedded in another make itself
apparent?
Books from Project Gutenberg:
Consistent
Despite English’s low
information content,
lack of even mildly
related strings causes
little self-similarity
across symbol clusters
US Code:
Moderately Consistent
Legalese is a massively
structured dialect.
Symbols appear in very
distinct patterns that are
more reminiscent of
machine code than text.
HTML:
Consistent
HTML repeats smaller
symbols (tags) and larger
symbol clusters (via
template engines) regularly.
This shows up visually as a
tightly repeating pattern.
Java Class Files (Compared):
Mildly Consistent
Binary code (be it bytecode
or x86) tends to be very
structured. Still, we are
dependent on both the
content and the compiler
to generate distinct
patterns.
x86:
Consistent (In Sections)
x86 tends not to be
handwritten; as such
complex instructions are
emitted in a highly
structured form.
Exception?
• 64 kilobyte graphical
demonstration
• Run through a packer

• Compression
removes patterns
NES Games
6502 Assembly Tends
To Show Consistent
Patterns, But…
Mario Games Look Rather
Different.
1) Output is highly
dependent on the
compiler
2) Output is highly
dependent upon the
actual content
File formats are merely
shells for actual
content. You are
analyzing the content;
the format is just
syntactic sugar.
Format Identification
• 1) Do different files appear different, and does the
appearance reflect the existence of internal structure?
– Answer: Yes. They do.
• 2) Do different instances of the same
file format appear similar?
– Answer: Somewhat. Similar content looks
like itself, but you’re measuring the
fundamental entropy of the underlying
content, not the format of the content
itself.
• 3) Does one format embedded in another make
itself apparent?
File Formats Contain Multiple Subformats
Another Look At Kernel32.DLL
These are all different
parts of Kernel32.
Quickly Browsing Large Files:
Tilt-Shift View
• Instead of measuring
absolute Y against
absolute X, make X
relative
– Advance through the
file going down, look
back a number of
bytes going right
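One possible reading of the tilt-shift layout, as a sketch: each row is a position in the file, each column is how many chunks back we look. The chunk-based lookback, the similarity metric, and the names are all assumptions.

```python
from typing import List


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions at which two equal-length chunks agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def tilt_shift(data: bytes, window: int, lookback: int) -> List[List[float]]:
    """Row y = chunk y of the file; column x = similarity to the chunk x steps earlier.

    Cells that would look back past the start of the file are 0.0.
    """
    chunks = [data[i:i + window] for i in range(0, len(data) - window + 1, window)]
    rows = []
    for y, current in enumerate(chunks):
        row = []
        for back in range(lookback):
            earlier = y - back
            row.append(similarity(current, chunks[earlier]) if earlier >= 0 else 0.0)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in tilt_shift(b"ABCDABCDXYZW", window=4, lookback=2):
        print(row)
```

The output is a strip of fixed width instead of a full square, which is what makes large files browsable.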
Complain All You Want.
Hex Still Sucks.
Format Identification
• 1) Do different files appear different, and does the
appearance reflect the existence of internal structure?
– Answer: Yes. They do.
• 2) Do different instances of the same file format appear
similar?
– Answer: Somewhat. Similar content looks like itself,
but you’re measuring the fundamental entropy of the
underlying content, not the format of the content itself.
• 3) Does one format embedded in another
make itself apparent?
– Answer: Yes. Multiple, distinct sections
are clearly visible in a way that hex cannot
show.
Fuzzer Guidance
• 1) Can we locate the actual byte offsets
where one section ends and another begins?
– Why would we want to?
• Fuzzers break parsers.
• Many subformats to a format, many subparsers to a parser
• To a rough level of approximation, fuzzing a single subformat
lets you stress a single subparser
• So once we split a file up, we can selectively attack one
subparser at a time.
• 2) Can we visualize and compare fuzzer operations via
Dotplots?
Simple Math
We select an interesting blob
from kernel32.dll. The blob is
at pixel offset 507x507, and
is a square around 570 pixels
wide.
Window size on viz was 32.
507*32 = The interesting
section starts 16224 bytes
into the file.
570*32 = The interesting
section is 18240 bytes long.
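The arithmetic above in code form. The function name is illustrative; the numbers are the ones on the slide.

```python
from typing import Tuple


def pixel_to_bytes(pixel_offset: int, pixel_width: int, window: int) -> Tuple[int, int]:
    """Convert a square region of the plot into (byte offset, byte length)."""
    return pixel_offset * window, pixel_width * window


if __name__ == "__main__":
    offset, length = pixel_to_bytes(pixel_offset=507, pixel_width=570, window=32)
    print(offset, length)
```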
What’s The Actual Data?
dd if=kernel32.dll bs=1 skip=16100
| hexdump - | more
Using Hardcorr as a “first knife” to
locate interesting-to-fuzz regions
Fuzzer Guidance
• 1) Can we locate the actual byte offsets where
one section ends and another begins?
– Answer: Yes. We can quickly route from the image
to the byte offset, through basic arithmetic.
• 2) Can we visualize and compare
fuzzer operations via Dotplots?
Differentials
• Major use of dotplots in bioinformatics is to
compare one genome against another
– Autocorrelation: Compare A to A
– Cross-Correlation: Compare A to B
• Most files are sufficiently dissimilar that
little interesting structure shows up
– Notable exception: Different versions of
the same binary
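Cross-correlation of two versions of a file can be sketched the same way as the self-comparison, with the main diagonal doing the diffing. The metric, the names, and the assumption that both versions stay aligned (no inserted or deleted bytes) are all simplifications.

```python
from typing import List


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions at which two equal-length chunks agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chunked(data: bytes, window: int) -> List[bytes]:
    return [data[i:i + window] for i in range(0, len(data) - window + 1, window)]


def cross_dotplot(a: bytes, b: bytes, window: int) -> List[List[float]]:
    """Cell [y][x] compares chunk y of file A with chunk x of file B."""
    return [[similarity(ca, cb) for cb in chunked(b, window)] for ca in chunked(a, window)]


def changed_chunks(a: bytes, b: bytes, window: int) -> List[int]:
    """Chunk indices where the main diagonal is no longer a perfect match."""
    plot = cross_dotplot(a, b, window)
    size = min(len(plot), len(plot[0]) if plot else 0)
    return [i for i in range(size) if plot[i][i] < 1.0]


if __name__ == "__main__":
    old = b"AAAABBBBCCCC"
    new = b"AAAABxBBCCCC"
    print(changed_chunks(old, new, window=4))
```

When bytes are inserted or removed the matching line shifts off the main diagonal, which is visible in the full plot but not in this diagonal-only shortcut.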
Visual Bindiff!
MSVCR70.DLL v. MSVCR71.DLL
Fuzzers:
Very Broken Patchers 
Mangle.C – Single Bit
Differences
CFG9000 – Large Scale
Reordering
Fuzzer Guidance
• 1) Can we locate the actual byte offsets where one
section ends and another begins?
– Answer: Yes. We can quickly route from the image
to the byte offset, through basic arithmetic.
• 2) Can we visualize and compare
fuzzer operations via Dotplots?
– Answer: Yes – visual diffing effectively
shows differences between files,
including differences introduced by
various flavors of fuzzers.

I Want These * Bugs Off My * InternetI Want These * Bugs Off My * Internet
I Want These * Bugs Off My * Internet
 
Chicken Chicken Chicken Chicken
Chicken Chicken Chicken ChickenChicken Chicken Chicken Chicken
Chicken Chicken Chicken Chicken
 
Some Thoughts On Bitcoin
Some Thoughts On BitcoinSome Thoughts On Bitcoin
Some Thoughts On Bitcoin
 
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of TryingShowing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
 
Interpolique
InterpoliqueInterpolique
Interpolique
 
232 md5-considered-harmful-slides
232 md5-considered-harmful-slides232 md5-considered-harmful-slides
232 md5-considered-harmful-slides
 
Bh us-02-kaminsky-blackops
Bh us-02-kaminsky-blackopsBh us-02-kaminsky-blackops
Bh us-02-kaminsky-blackops
 
Dmk neut toor
Dmk neut toorDmk neut toor
Dmk neut toor
 
Dmk audioviz
Dmk audiovizDmk audioviz
Dmk audioviz
 
Bo2004
Bo2004Bo2004
Bo2004
 
Gwc3
Gwc3Gwc3
Gwc3
 
Advanced open ssh
Advanced open sshAdvanced open ssh
Advanced open ssh
 

Dmk blackops2006 ccc

  • 1. Black Ops 2006 Viz Edition CCC 2006 Dan Kaminsky Director Of Penetration Testing IOActive
  • 2. Thanks and No Thanks • Thank You To Swissotel Amsterdam, who provided a net connection with which I could actually finish these slides • No Thanks to Delta Hotel of Amsterdam, which put a TV on a really weak shelf. – I suppose it’s my fault I put my laptop underneath. – The “Star System” is officially meaningless
  • 3. Who Am I? • Coauthor of several book series – Hack Proofing Your Network – Stealing The Network • Formerly of Cisco and Avaya – Presently partnering with IOActive – One of the “Blue Hat Hackers” that has been auditing Windows Vista • Been doing talks for six years now – TCP/IP, DNS, MD5, SSH, etc.
  • 4. What Are We Here To Do? • Break TCP/IP A Little More – Not in the documentation – It’s for a good cause ;) • Analyze Data Linguistically • Make Pretty Pretty Pictures!
  • 5. For Various Definitions Of Pretty: Visual Bindiff
  • 6. The Ancient Tongue: TCP/IP • Can’t all be about pretty pictures • A new problem has popped up: Network oligopolies are threatening to install firewalls that limit or eliminate bandwidth on a per-company basis – Their own media services might be fast, others will be slow – Their own VPN services might be fast, others will be slow • Question: Is it possible to detect and locate devices violating network neutrality?
  • 7. What’s The Closest Tool We Have? • Firewalk – Mike Schiffman’s Firewall Analysis Tool – Packets elicit an ICMP Time Exceeded error if they reach a router with TTL=0 • TTL is decremented by one for each hop, so if you start low and count up, you can trace the route to a host – A firewalled packet won’t live long enough to reach TTL=0 – So you can locate the firewall, and divine things about its ruleset, based on when your packets stop getting ICMP Time Exceeded
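The TTL walk described on this slide can be sketched as a small simulation. This is a toy model, not Firewalk itself: the `Hop` class, the `probe` and `locate_filter` names, and the rule that a filtering hop drops the packet before it can answer are all assumptions made for illustration. No packets are sent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    name: str
    blocks_port: Optional[int] = None  # port this hop filters, if any (hypothetical)


def probe(path: List[Hop], ttl: int, port: int) -> Optional[str]:
    """Return the hop that would answer with ICMP Time Exceeded, or None."""
    for hop in path:
        if hop.blocks_port == port:
            return None        # silently dropped by a filter
        ttl -= 1
        if ttl == 0:
            return hop.name    # TTL expired here, so Time Exceeded comes back
    return None                # reached the far end without expiring


def locate_filter(path: List[Hop], port: int) -> Optional[int]:
    """Raise the TTL until replies stop; return the first silent hop count."""
    for ttl in range(1, len(path) + 1):
        if probe(path, ttl, port) is None:
            return ttl
    return None                # every hop answered: nothing filters this port


path = [Hop("r1"), Hop("r2"), Hop("fw", blocks_port=25), Hop("r4")]
```

With this path, `locate_filter(path, 25)` reports hop 3, while `locate_filter(path, 80)` reports nothing, which is the "what is blocked" answer the next slide finds insufficient.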
  • 8. Limitations of Firewalking • But Firewalk tells us what, not who is blocked…and it tells us nothing about who is allowed to go fast, and who is made to go slow – Suddenly, we devolve to a much older question: Is it possible to find out that a target firewall is, or is not, blocking against or accepting traffic from an arbitrary IP address?
  • 9. TCP Does Speed Measurement • TCP speed analysis done blindly – Endpoints do not negotiate with one another – Everyone sends their packets, routers route what they will. Endpoints need to adjust to what the routers are willing to pass. • Routers communicate with endpoints by dropping their packets • Can we combine this router backchannel w/ Firewalk?
  • 10. In From The Side • What causes packets to drop? – Too many packets • What are we going to do? – Send too many packets • Two channels are set up – A primary channel, which drops packets at some known rate – A secondary channel, whose purpose it is to interfere (or not) with the primary channel • When the secondary interferes with the primary, we get feedback via the primary channel – The traffic composing the secondary channel can come from anywhere, be composed of anything, and can be TTL’d just like in a normal firewalk.
  • 11. The TTL Channel • Normally, you don’t know which router along a path is dropping your packets • If you are the source of the drop-inducing packets, you can control how far your noise goes out – thus, you can discover which router is hitting its limit / censoring your net connection
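A rough numerical sketch of the two-channel idea, under simplifying assumptions that are not in the slides: each router has a fixed packets-per-second capacity, drops are shared proportionally, and the noise rate is not itself thinned by earlier drops. All names and numbers are hypothetical.

```python
def primary_loss(capacities, primary_rate, noise_rate, noise_ttl):
    """Fraction of primary packets lost when noise travels `noise_ttl` hops."""
    delivered = float(primary_rate)
    for hop_index, capacity in enumerate(capacities, start=1):
        # TTL-limited noise only loads the hops it lives long enough to reach.
        noise = noise_rate if hop_index <= noise_ttl else 0.0
        load = delivered + noise
        if load > capacity:
            delivered *= capacity / load   # overloaded hop drops proportionally
    return 1.0 - delivered / primary_rate


def first_congested_hop(capacities, primary_rate, noise_rate):
    """Smallest noise TTL at which the primary channel starts losing more."""
    baseline = primary_loss(capacities, primary_rate, noise_rate, 0)
    for ttl in range(1, len(capacities) + 1):
        if primary_loss(capacities, primary_rate, noise_rate, ttl) > baseline + 1e-9:
            return ttl
    return None


capacities = [1000, 1000, 200, 1000]   # hop 3 is the tight link in this example
```

Here the primary channel only degrades once the noise TTL reaches 3, so the feedback on the primary channel points at the third hop.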
  • 12. Scorchmarking • Why Scorchmarking? – Routers are burning packets…those that get through might have a scorch mark or two  • Basic Model – Client downloads a file from a site, at some given speed negotiated via TCP. – At the same time, traffic is injected from different IP addresses. This should cause drops. • If it doesn’t, the network is either penalizing the primary channel (easy to drop against) or rewarding the secondary channel (resilient to drops)
  • 13. Advanced Scorchmarking [0] • Having to depend on a client is lame – Wouldn’t it be nice if we could scan the Internet for these servers? • What fundamental service is a receiving client providing? – It is acknowledging our traffic – letting us know how much it received, and how many milliseconds it took to receive it • Aren’t there other ways we could extract the same data from hosts?
  • 14. Advanced Scorchmarking [1] • What else will acknowledge receiving traffic from us? – TCP Servers • Sting, from Stefan Savage, used this to great effect – DNS Servers  – Routers. • Supposedly, routers won’t send more than a certain number of ICMP Time Exceeded packets per second • In reality, they seem to ICMP Time Exceeded ACK however much you throw at them • Even if they didn’t, you could use the difference in ICMP Time Exceeded rates between Primary and Secondary channel, to determine whether interference was showing up. • Everyone’s got a NAT – so you can query everyone for whether certain sorts of traffic are being blocked to them
  • 15. Advanced Scorchmarking [2] • So, yes. – You can scan for violations of Network Neutrality – You can find networks that are blocking or passing particular IP ranges • It’s not exactly efficient though • Neutrality violations are easier to find than the standard FW case – Firewalls are normally between the WAN and the LAN (Slow Net -> FW -> Fast Net) – Neutrality violators are mid-WAN (Slow Net -> Fw -> Slow Net -> Fast Net) – Easier to overload the slow net after the firewall • Boxes with max TTL rates override this
  • 16. Speed Limits • Fundamental Problem: Have to max out bandwidth on the link to trigger the backchannel – No packets dropping, no data – Means you have to DoS a link – not scalable/legal • Potential Solution: Find capped acknowledgers – The mythical ICMP Time Exceeded rate limit works well • Primary and Secondary channel both eliciting ITE’s • When secondary channel gets a packet through, it takes up a slot on the primary channel’s • ITE is perfect, since you can TTL limit any packet • Depends on the firewall passing the primary’s ITE’s • Maybe Linux / NATs actually implement rate limits? – Another option: What if we have code on the client?
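The capped-acknowledger idea can be sketched as arithmetic. The model below assumes a router really does enforce a per-second cap on ICMP Time Exceeded replies (the slide itself calls this limit mythical) and that reply slots are shared in proportion to arriving probes. Function names and rates are hypothetical.

```python
def primary_replies(limit, primary_pps, secondary_pps, secondary_passes):
    """Expected Time Exceeded replies per second seen by the primary channel."""
    arriving = primary_pps + (secondary_pps if secondary_passes else 0)
    if arriving <= limit:
        return float(primary_pps)          # cap not reached, every probe answered
    return limit * primary_pps / arriving  # slots shared in proportion to load


def secondary_is_passed(observed, limit, primary_pps, secondary_pps):
    """Guess whether the secondary source got through, from the primary's reply rate."""
    if_blocked = primary_replies(limit, primary_pps, secondary_pps, False)
    if_passed = primary_replies(limit, primary_pps, secondary_pps, True)
    return abs(observed - if_passed) < abs(observed - if_blocked)
```

With a cap of 100 replies per second and 100 probes per second on each channel, the primary sees about 100 replies if the secondary is filtered and about 50 if it is passed, without saturating the link itself.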
  • 17. Windows Media Player: More Than Just DRM. Really! • Bulk Transfer: RTP – Runs over Unicast UDP – Yes, the same Unicast UDP that penetrates NAT so well! • Flow Control / Quality Monitoring: RTCP • No technical reason RTCP needs to go back to the same address that RTP stream is coming from – So: We pretend to provide media streams from all sorts of sites, and use WMP to collect traffic stats for us  • It might work…
  • 18. Symbols • But this is not to be a talk on TCP/IP hackery…
  • 19. SSH’s Hex Problem • $ ssh dan@blah The authenticity of host 'blah (1.2.3.4)' can't be established. RSA key fingerprint is 09:a9:b1:99:84:17:7d:ba:c6:55:46:5a:17:f8:83:01. Are you sure you want to continue connecting (yes/no)? • 09:a9:b1…am I supposed to do something with this? – Yes. According to SSH’s design, you’re supposed to reject the proposed fingerprint if it looks unfamiliar. (Seriously.) • The “Two Billion SSH Key” attack (by ADM) just comes up with 2B keys and emits the visibly closest key. It works.
  • 20. Hex sucks. A better mapping must be possible…
  • 21. Cryptomnemonics • There are three classes of memory, at least to the degree as is useful in cryptography – Rejection: “I’ve never seen that before” – Recognition: “It’s that one, not that other one” – Recollection: “Let me describe it to you.” • SSH just requires rejection – “What? That’s new.” • Hex domain clearly does not work. What else is available? – To restate the problem: Humans do not operate on hexadecimal symbols effectively. Are there any other symbol sets we can use?
  • 22. Alternative Symbolic Domains • Abstract Art via déjà vu • Calculated faces via Passfaces • Both have attempted to address limited capacity for recollection by moving authentication to a recognition problem • But recognition offers only a limited number of bits: 9^5=59049 < 2^16 – This is OK, since Passfaces is online and thus can lock a user out before 59K attempts are up – We are not online – but we only need to reject, not recognize and certainly not recollect
  • 23. The Nymic Domain: Names Are Identity Symbols • Humans don’t remember arbitrary bits, but we do remember stories. • Stories change (the bits shift over time), but names stay the same • Can we map the 160 bits SSH needs us to accept or reject, to names? – Take 512 male names: 9 bits of info per male name – Take 1024 female names: 10 bits of info per female name – Take 8192 last names: 13 bits of info per last name – 9+10+13=32. 5 couples = 160 bits
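The 9+10+13 bit split can be sketched as follows. The word lists here are generated placeholders (`m000`, `f0000`, `l0000`), not real name lists, and hashing the key with SHA-1 in `story_for_key` is an assumption about where the 160 bits come from.

```python
import hashlib

# Placeholder word lists; a real tool would load actual names of these sizes.
MALE = [f"m{i:03d}" for i in range(512)]      # 2**9 entries  ->  9 bits
FEMALE = [f"f{i:04d}" for i in range(1024)]   # 2**10 entries -> 10 bits
LAST = [f"l{i:04d}" for i in range(8192)]     # 2**13 entries -> 13 bits


def couples(digest: bytes):
    """Map a 160-bit digest to five 'male and female last' strings."""
    if len(digest) != 20:
        raise ValueError("expected 160 bits (20 bytes)")
    value = int.from_bytes(digest, "big")
    out = []
    for _ in range(5):
        chunk = value & 0xFFFFFFFF        # take the low 32 bits
        value >>= 32
        last = chunk & 0x1FFF             # 13 bits
        female = (chunk >> 13) & 0x3FF    # 10 bits
        male = chunk >> 23                # remaining 9 bits
        out.append(f"{MALE[male]} and {FEMALE[female]} {LAST[last]}")
    return out


def story_for_key(key_blob: bytes):
    """Hypothetical helper: hash a public key blob and render it as couples."""
    return couples(hashlib.sha1(key_blob).digest())
```

Because every bit of the digest selects part of some name, two different digests always produce different sets of couples.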
  • 24. Demo • $ ssh dan@blah Key Data: julio and epifania dezzutti luther and rolande doornbos manual and twyla imbesi dirk and cuc kolopajlo omar and jeana hymel The authenticity of host 'blah (1.2.3.4)' can't be established. Are you sure you want to continue connecting (yes/no)? • It is critical that the Key Data be shown every time there’s a connection. The user must become familiar with the “characters” in the “story”. – This actually seems to work.
  • 25. What about Bubble Babble? • $ ssh-keygen.exe -B -f id_dsa.pub 1024 xegoz-tosys-vusik-masar-cifyc-cyled-kikih-zukuf-nypok-sezyt-noxax id_dsa.pub • Problem: Humans do not remember arbitrary sequences of syllables well • Names are special sequences – sharing with pre-existing language logic should improve retention – Still, names are arbitrary (Boutros Boutros-Ghali); could merge approaches: Xegoz and Tosys Vusik Masar and Cifyc Cyled Kikih and Zukuf Nypok Sezyt Noxax – Requires testing
  • 26. Inverting The Symbol Flow: Passnyms • Suppose you have 8 characters with one of 64 characters in each slot. – aI7$13nM – 64==2^6, so (2^6)^8 == 2^48, i.e. 48 bits – “Lowercase A, lowercase l, seven, dollar sign, one, three, lower case n, upper case M” • This is twenty three syllables! • What if, instead, you typed: – dirk and cuc kolopajlo omar and jeana hymel – 64 bits of entropy, 14 syllables, can be spell checked as user types it in
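The bit counts on this slide can be checked with a few lines. The helper name `slot_bits` is made up for the example; the figures come from the slides.

```python
import math


def slot_bits(alphabet_size: int, slots: int) -> float:
    """Entropy in bits of `slots` independent, uniformly chosen symbols."""
    return slots * math.log2(alphabet_size)


COUPLE_BITS = 9 + 10 + 13          # male + female + last name, per couple

password_bits = slot_bits(64, 8)   # the 8-character password example
passnym_bits = 2 * COUPLE_BITS     # two couples typed as a pass phrase
```

This gives 48 bits for the password and 64 bits for the two-couple passnym, assuming names are picked uniformly at random.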
  • 27. It Is Easier To Interface With Systems When Symbols Align • Hacking is a form of interfacing  • We can break things with garbage symbols – “Dumb Fuzzing”: Take a file, flip some bits, see what happens • We can break more things with meaningful symbols used in unexpected ways – “Smart Fuzzing”: Take a file, understand its internal structure, fuzz the structure, see what happens • Dumb fuzzing is very easy. • Smart fuzzing is very labor intensive…requires smart people, maybe specifications. • Is there any way we can automatically discover symbol sets?
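"Dumb fuzzing" as described here fits in a few lines. This is a minimal sketch; `flip_bits` and its seed parameter are illustrative names, not taken from any particular fuzzer.

```python
import random


def flip_bits(data: bytes, flips: int, seed: int = 0) -> bytes:
    """Return a copy of `data` with `flips` randomly chosen bits inverted."""
    rng = random.Random(seed)          # seeded so a crashing case can be replayed
    buf = bytearray(data)
    if not buf:
        return bytes(buf)
    for _ in range(flips):
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)
```

The mutated bytes would then be fed to the parser under test. Nothing here knows about the file's structure, which is exactly the limitation the slide goes on to discuss.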
  • 28. File Formats Are Languages • Kids don’t get documentation when they learn new languages. They just pick ‘em up. – They can do this because they actually design all sorts of internal structure and redundancy into them. • Children make languages. • Adults make working languages. • Programmers make barely working languages. – Let’s autodiscover them!
  • 29. N’est-ce pas Non Sequitur • Sequitur: Linear Time Pattern Finder – Creates hierarchical Context Free Grammars from arbitrary input • Compression Algorithm in which you can “look under the covers” to see what’s going on • Created by Craig Nevill-Manning as his PhD thesis a decade ago – He’s now Chief Research Scientist at Google
  • 30. Syntax Highlighting For Hex Dumps • Trivial Algorithm: In a hierarchical grammar, each byte requires traversing to a certain depth in order to recover the raw literal. • Color each byte by how deep in the tree you have to go.
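The depth-coloring idea can be sketched without a full Sequitur implementation. The code below swaps in a simpler greedy digram replacement (in the style of Re-Pair), which is not linear time and is not Sequitur, but it also yields a hierarchical grammar in which each byte sits at some depth. All function names are illustrative.

```python
from collections import Counter


def build_grammar(data: bytes):
    """Greedy digram replacement (Re-Pair style), standing in for Sequitur."""
    seq = list(data)     # terminals are ints 0..255
    rules = {}           # rule id (>= 256) -> (left symbol, right symbol)
    next_id = 256
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break        # no digram repeats any more
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules


def expand(seq, rules) -> bytes:
    """Recover the original bytes from the top-level sequence and rules."""
    out, stack = bytearray(), list(reversed(seq))
    while stack:
        sym = stack.pop()
        if sym < 256:
            out.append(sym)
        else:
            left, right = rules[sym]
            stack.extend((right, left))
    return bytes(out)


def byte_depths(seq, rules):
    """For each byte, how many rules must be opened to reach the raw literal."""
    depths = []
    stack = [(sym, 0) for sym in reversed(seq)]
    while stack:
        sym, depth = stack.pop()
        if sym < 256:
            depths.append(depth)
        else:
            left, right = rules[sym]
            stack.extend(((right, depth + 1), (left, depth + 1)))
    return depths
```

Mapping each depth to a color then gives the highlighted hex dump: bytes that belong to repeated structure come out deep, and one-off bytes stay at depth 0.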
  • 32. What’s Actually Going On? • (0) -> … (73),b4,(73),ca,(73),e6,(73),02,(74),18, (74),2c,(74),4a,(74),5c,(74),6e,(74),80,(74),98, (74),b0,(74),c8,(74),e8,(74),fc,(74),10,(75),20, (75),30,(75),40,(75),50,(75),64,(75),82,(75),90, (75),9e,(75) … (84),d6,(84),ee,(84),0c,(85),28,(85),3c,(85),4e, (85),66,(85),7e,(85),8c,(85),9e,(85),ac,(85),be, (85),ca,(85),ea,(85),08,(86),26,(86),44,(86),56, (86),6a,(86),7c,(86),8a,(86),a6,(86),b6,(86),cc, (86),de,(86),02,(87) • Repeated sequence, single byte literal. Repeated sequence, single byte literal. Rinse, lather, repeat.
  • 33. Intersymbol Link Discovery • Turns code on left into symbolic set on right; it’s easy then to link the symbols together as per the graph. • This works for non-textual data • Sequitur imputes meaningful symbols from arbitrary input data
  • 34. Context Free Grammar Fuzzer: THE CFG9000 • Reduce input data to a stream of symbols • Fuzz data at the symbol level, rather than at pure bytes – Shuffle – Drop – Repeat – Uniform Corrupt • Consistently corrupt all instances of a given symbol • <HEAD> -> <FOOBAR> • Sequitur is not necessarily the best way to generate a grammar. – Doesn’t handle recursion, common in genomic data – Suffix trees may yield better output – Sequitur may scale better (100MB input not an issue)
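The four symbol-level operations can be sketched against a hand-written symbol table. This is not the CFG9000 code: the table, the stream, and the function names are hypothetical, and a real run would take its symbols from the inferred grammar.

```python
import random


def render(stream, table) -> bytes:
    """Expand a stream of symbol names back into bytes."""
    return b"".join(table[s] for s in stream)


def shuffle(stream, rng):
    out = list(stream)
    rng.shuffle(out)
    return out


def drop(stream, rng):
    i = rng.randrange(len(stream))
    return stream[:i] + stream[i + 1:]


def repeat(stream, rng):
    i = rng.randrange(len(stream))
    return stream[:i + 1] + [stream[i]] + stream[i + 1:]


def uniform_corrupt(table, symbol, replacement: bytes):
    """Change one symbol's expansion, so every instance is corrupted alike."""
    new_table = dict(table)
    new_table[symbol] = replacement
    return new_table


table = {"HEAD_OPEN": b"<HEAD>", "HEAD_CLOSE": b"</HEAD>", "TEXT": b"hi"}
stream = ["HEAD_OPEN", "TEXT", "HEAD_CLOSE", "HEAD_OPEN", "TEXT", "HEAD_CLOSE"]
fuzzed = render(stream, uniform_corrupt(table, "HEAD_OPEN", b"<FOOBAR>"))
```

`fuzzed` contains `<FOOBAR>` in both places where `<HEAD>` used to be, which is the consistent corruption the slide describes.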
  • 35. Sample CFG9000 Output • calculate_rule_usage(p->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rule() } • calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(p->rule());
  • 38. It’s Not The Best CFG Fuzzing Ever… • Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable. – Rooter: A Methodology for the Typical Unification of Access Points and Redundancy – By A Context-Free Grammar Generating CompSci Papers • Authors handcoded “meaningful symbols” in CompSci speak. The eventual goal is the autogeneration of symbol and inter-symbol patterns.
  • 39. Symbolic Discovery Is Inevitable • “An early inference procedure was described by Chomsky and Miller (1957a), as reported in Solomonoff (1959). Chomsky proposed a method for detecting loops in finite state languages. The approach requires a set of valid sentences, and an oracle that determines whether a sentence is in the language. The algorithm proceeds by deleting part of a valid sentence and asking the oracle whether the sentence is still valid. If it is, the deleted part is reinserted into the sequence and repeated, so that it appears twice. If the sentence is still in the language, a cycle has been detected.” – Inferring Sequential Structure, Craig Nevill-Manning, 1996 – This couldn’t POSSIBLY be useful for building a structure for a dumb fuzzer to operate against. • Instead of seeing if the parser crashes, just see if it considers the input valid
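The delete-then-double procedure in the quote can be sketched directly. The oracle here is a regular expression standing in for "does the parser accept this input"; the function name, the `max_len` bound, and the example language are assumptions.

```python
import re


def find_loops(sentence: str, oracle, max_len: int = 4):
    """Find substrings that can be deleted or doubled while staying valid."""
    loops = []
    for start in range(len(sentence)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(sentence):
                break
            part = sentence[start:end]
            deleted = sentence[:start] + sentence[end:]
            doubled = sentence[:start] + part + part + sentence[end:]
            # Step 1: still valid without the part? Step 2: valid with it twice?
            if oracle(deleted) and oracle(doubled):
                loops.append((start, part))
    return loops


def oracle(candidate: str) -> bool:
    """Stand-in validity oracle for the toy language a(bc)*d."""
    return re.fullmatch(r"a(bc)*d", candidate) is not None
```

For the valid sentence `abcd`, the only loop found is `bc` at offset 1, which is the repeatable unit of the toy language. A fuzzer could then repeat or drop that unit knowing the result still parses.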
  • 40. TODO • “Requitur”; Sequitur implementation optimized for fuzzer use – Generate larger symbols • No two byte symbols please; we’re not trying to compress, we’re trying to elucidate structure – Eliminate redundant symbols • Kieffer-Yang optimization in ~2001: If symbol (x) == symbol (y), then delete (y) and set all instances of (y) to (x) • Need to do this to actually consistently fuzz all instances of a particular trope – Possibly remove in-memory grammar requirement • Use mechanisms from Ray, an out-of-memory variant – Add foreign grammar capability
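The redundant-symbol step can be sketched as follows: if two rules expand to the same terminals, keep one and point every use of the other at it. This is only an illustration of the rule stated on the slide, not the published optimization's full algorithm, and the grammar representation (a dict of rule name to body) is an assumption.

```python
def merge_duplicates(start, rules):
    """Merge rules whose full expansions are identical.

    `rules` maps a rule name to a list of symbols; any symbol that is not
    a key of `rules` is treated as a terminal.
    """
    cache = {}

    def expansion(sym):
        if sym not in rules:
            return (sym,)
        if sym not in cache:
            cache[sym] = tuple(t for child in rules[sym] for t in expansion(child))
        return cache[sym]

    canonical = {}   # expansion -> surviving rule name
    alias = {}       # every rule name -> the rule that replaces it
    for name in sorted(rules):
        alias[name] = canonical.setdefault(expansion(name), name)

    def rewrite(body):
        return [alias.get(sym, sym) for sym in body]

    new_rules = {n: rewrite(b) for n, b in rules.items() if alias[n] == n}
    return rewrite(start), new_rules


rules = {"X": ["a", "b"], "Y": ["a", "b"], "Z": ["Y", "c"]}
start, merged = merge_duplicates(["X", "Z", "Y"], rules)
```

After merging, `Y` is gone and every former use of it refers to `X`, so corrupting `X` now hits all instances of that byte sequence.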
  • 41. What’s Out Now • 8 Bit Clean – Can Analyze Arbitrary Data • Mergedot – Can create graph from Sequitur output
  • 42. How To Think Of Sequitur • Any time you’re manipulating data as bytes, think of manipulating it as symbols – Trigram histograms on bytes -> Trigram histograms on symbols – Bayesian probabilities on characters -> Bayesian probabilities on symbols – Adapt yourself to more than 256 codes per symbol and reap the benefit • If your code is already Unicode aware you might be one step ahead!
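Moving a byte-level statistic up to symbols mostly means not assuming values below 256. A minimal sketch of the trigram case, with an illustrative function name:

```python
from collections import Counter


def ngram_histogram(symbols, n: int = 3) -> Counter:
    """Count n-grams over any sequence: raw bytes or grammar symbol ids."""
    return Counter(
        tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)
    )
```

The same call works on `b"abcabc"` and on a list of symbol ids such as `[300, 301, 302, 300, 301, 302]`, because the keys are tuples rather than fixed-width byte codes.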
  • 43. Fuzzy Wuzzy Wuz A Symbol • Symbol analysis systems (language translators, etc) have issues w/ TMTOWTDI (There’s More Than One Way To Do It) – Very similar messages can be encapsulated in very different ways – Very similar messages can be encapsulated in very similar, but not identical ways • Sequitur only handles exact matches – fuzzy grammar imputation doesn’t appear to exist yet – Are there any systems for analyzing complex, unequal but somewhat related sets of symbols?
  • 44. Another Approach: DotPlots • Popular mechanism in bioinformatics for visual analysis of genomes. • Some attempts to apply dotplots outside of bioinformatics – Textual analysis – Audio • Remembered an old paper, entitled Visualizing Music And Audio Using Self-Similarity – Jonathan Foote from Xerox • Brute Force solution – compare songs to themselves, splitting them into tiny chunks and marking light for similar and dark for dissimilar – Disassociated Studio will do this for you
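The brute-force self-comparison can be sketched in plain Python. The similarity metric below (fraction of positions holding the same byte) is an assumption; the talk's own tools may use a different measure. The PGM writer is just a convenient way to look at the result.

```python
def chunks(data: bytes, window: int):
    return [data[i:i + window] for i in range(0, len(data), window)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions with equal bytes (assumed metric), 0.0 to 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def dotplot(a: bytes, b: bytes, window: int = 32):
    """Matrix of chunk similarities; pass the same data twice to self-compare."""
    cols = chunks(b, window)
    return [[similarity(row, col) for col in cols] for row in chunks(a, window)]


def to_pgm(matrix) -> str:
    """Render as a plain-text PGM image: light for similar, dark for dissimilar."""
    height = len(matrix)
    width = len(matrix[0]) if matrix else 0
    lines = ["P2", f"{width} {height}", "255"]
    for row in matrix:
        lines.append(" ".join(str(round(255 * v)) for v in row))
    return "\n".join(lines) + "\n"
```

Self-comparison always lights up the main diagonal; repeated structure shows up as additional light cells off the diagonal.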
  • 45. Day Tripper from the Beatles… Music shows internal pattern.
  • 47. What Exactly Are We Doing • Jonathan Helfman’s “DotPlot Patterns: A Literal Look at Pattern Languages” offers an introduction • Instead of “to, be, not” etc, we use chunks of data from arbitrary files – The same similarity metric used to disambiguate names for the SSH hack, is used to measure similarity here
  • 48. There are so many patterns we might see…
  • 49. …and no matter how much we’ve learned of this pattern language…
  • 50. ???
  • 51. So How Might This Be Useful? • A) Format Identification – 1) Do different file formats appear different? – 2) Do different instances of the same file format appear similar? – 3) Does one format embedded in another make itself apparent? • B) Fuzzer Guidance – 1) Can we locate the actual byte offsets where one section ends and another begins? – 2) Can we visualize and compare fuzzer operations via Dotplots?
  • 52. Format Identification • 1) Do different files appear different, and does the appearance reflect the existence of internal structure? • 2) Do different instances of the same file format appear similar? • 3) Does one format embedded in another make itself apparent?
  • 56. SMBTorture Traffic (Packets – Note, Stop/Start Is Visible)
  • 58. Chromosome 22 (This is, after all, a genomics hack)
  • 59. The Legend Of Zelda
  • 60. Format Identification • 1) Do different files appear different, and does the appearance reflect the existence of internal structure? – Answer: Yes. They do. • 2) Do different instances of the same file format appear similar? • 3) Does one format embedded in another make itself apparent?
  • 61. Books from Project Gutenberg: Consistent Despite English’s low information content, lack of even mildly related strings causes little self-similarity across symbol clusters
  • 62. US Code: Moderately Consistent Legalese is a massively structured dialect. Symbols appear in very distinct patterns that are more reminiscent of machine code than text.
  • 63. HTML: Consistent HTML repeats smaller symbols (tags) and larger symbol clusters (via template engines) regularly. This shows up visually as a tightly repeating pattern.
  • 64. Java Class Files (Compared): Mildly Consistent Binary code (be it bytecode or x86) tends to be very structured. Still, we are dependent on both the content and the compiler to generate distinct patterns.
  • 65. x86: Consistent (In Sections) x86 tends not to be handwritten; as such complex instructions are emitted in a highly structured form.
  • 66. Exception? • 64 kilobyte graphical demonstration • Run through a packer  • Compression removes patterns
  • 67. NES Games 6502 Assembly Tends To Show Consistent Patterns, But…
  • 68. Mario Games Look Rather Different. 1) Output is highly dependent on the compiler 2) Output is highly dependent upon the actual content File formats are merely shells for actual content. You are analyzing the content; the format is just syntactic sugar.
  • 69. Format Identification • 1) Do different files appear different, and does the appearance reflect the existence of internal structure? – Answer: Yes. They do. • 2) Do different instances of the same file format appear similar? – Answer: Somewhat. Similar content looks like itself, but you’re measuring the fundamental entropy of the underlying content, not the format of the content itself. • 3) Does one format embedded in another make itself apparent?
  • 70. File Formats Contain Multiple Subformats Another Look At Kernel32.DLL These are all different parts of Kernel32.
  • 71. Quickly Browsing Large Files: Tilt-Shift View • Instead of measuring absolute Y against absolute X, make X relative – Advance through the file going down, look back a number of bytes going right
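One reading of the tilt-shift view, sketched under assumptions: each row is a position in the file, and each column looks back a fixed number of chunks from that position. The similarity metric and all names are illustrative, as is the choice to leave cells before the start of the file at 0.0.

```python
def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions with equal bytes (assumed metric)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def tilt_shift(data: bytes, window: int = 32, lookback: int = 8):
    """Rows advance through the file; column x compares against x chunks back."""
    parts = [data[i:i + window] for i in range(0, len(data), window)]
    rows = []
    for y, current in enumerate(parts):
        row = []
        for x in range(lookback):
            if y - x < 0:
                row.append(0.0)      # nothing that far back yet
            else:
                row.append(similarity(current, parts[y - x]))
        rows.append(row)
    return rows
```

The output has one row per chunk but only `lookback` columns, so its size grows linearly with the file instead of quadratically, which is what makes it usable for browsing large files.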
  • 72. Complain All You Want. Hex Still Sucks.
  • 73. Format Identification • 1) Do different files appear different, and does the appearance reflect the existence of internal structure? – Answer: Yes. They do. • 2) Do different instances of the same file format appear similar? – Answer: Somewhat. Similar content looks like itself, but you’re measuring the fundamental entropy of the underlying content, not the format of the content itself. • 3) Does one format embedded in another make itself apparent? – Answer: Yes. Multiple, distinct sections are clearly visible in a way that hex cannot show.
  • 74. Fuzzer Guidance • 1) Can we locate the actual byte offsets where one section ends and another begins? – Why would we want to? • Fuzzers break parsers. • Many subformats to a format, many subparsers to a parser • To a rough level of approximation, fuzzing a single subformat lets you stress a single subparser • So once we split a file up, we can selectively attack one subparser at a time. • 2) Can we visualize and compare fuzzer operations via Dotplots?
  • 75. Simple Math • We select an interesting blob from kernel32.dll. The blob is at pixel offset 507x507, and is a square around 570 pixels wide. Window size on viz was 32. • 507*32 = 16224: the interesting section starts 16224 bytes into the file. • 570*32 = 18240: the interesting section is 18240 bytes long.
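The pixel-to-byte arithmetic can be wrapped in a helper. The function names and the `extract` routine are illustrative; the numbers are the ones from the slide.

```python
def pixel_to_bytes(pixel_offset: int, pixel_width: int, window: int):
    """Convert a square region of the dotplot into (start, length) in bytes."""
    return pixel_offset * window, pixel_width * window


def extract(path: str, start: int, length: int) -> bytes:
    """Read the selected region out of the file for closer inspection."""
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length)


start, length = pixel_to_bytes(507, 570, 32)   # (16224, 18240)
```

`extract(path, start, length)` would then hand back just the region picked out by eye, ready to be fed to a fuzzer on its own.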
  • 76. What’s The Actual Data? dd if=kernel32.dll bs=1 skip=16100 | hexdump - | more
  • 77. Using Hardcorr as a “first knife” to locate interesting-to-fuzz regions
  • 78. Fuzzer Guidance • 1) Can we locate the actual byte offsets where one section ends and another begins? – Answer: Yes. We can quickly route from the image to the byte offset, through basic arithmetic. • 2) Can we visualize and compare fuzzer operations via Dotplots?
  • 79. Differentials • Major use of dotplots in bioinformatics is to compare one genome against another – Autocorrelation: Compare A to A – Cross-Correlation: Compare A to B • Most files are sufficiently dissimilar that not very interesting structure shows up – Notable exception: Different versions of the same binary
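Cross-correlation between two versions of a file can be reduced to a small question per chunk: where in B does this chunk of A best match, and how well? The sketch below assumes the same byte-equality metric as before; names and the example data are hypothetical.

```python
def chunks(data: bytes, window: int):
    return [data[i:i + window] for i in range(0, len(data), window)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions with equal bytes (assumed metric)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def best_matches(a: bytes, b: bytes, window: int = 4):
    """For each chunk of `a`, the index and score of its best match in `b`."""
    cols = chunks(b, window)
    if not cols:
        raise ValueError("second input is empty")
    result = []
    for row in chunks(a, window):
        scores = [similarity(row, col) for col in cols]
        best = max(range(len(cols)), key=scores.__getitem__)
        result.append((best, scores[best]))
    return result


original = b"AAAABBBBCCCC"
bit_flipped = b"AAAABBBCCCCC"   # one byte changed in place
reordered = b"CCCCAAAABBBB"     # same chunks, different order
```

A small in-place change keeps every chunk on the diagonal with a slightly lower score, while reordering keeps perfect scores but moves them off the diagonal. That is the visual difference between the two fuzzer styles compared later in the deck.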
  • 82. Fuzzers: Very Broken Patchers • Mangle.C – Single Bit Differences • CFG9000 – Large Scale Reordering
  • 83. Fuzzer Guidance • 1) Can we locate the actual byte offsets where one section ends and another begins? – Answer: Yes. We can quickly route from the image to the byte offset, through basic arithmetic. • 2) Can we visualize and compare fuzzer operations via Dotplots? – Answer: Yes – visual diffing effectively shows differences between files, including differences introduced by various flavors of fuzzers.