This deck was presented in a webinar by Everett Sizemore of Inflow with Q&A participation from Gareth Brown and Patrick Hathaway from URL Profiler. Learn more about content audits here: http://www.goinflow.com/digital-content-audits-seo-inbound-marketing/
2. What We’re Going to Cover
• What is a Content Audit
• 14 Reasons to Do A Content Audit
• How to get the data you need
• How to put all of the data together
• How to analyze the data
• How to present your findings & recommendations
• Resources and links
goInFlow.com
3. What Is a Content Audit?
“A Mind-Numbingly Detailed Odyssey
Through Your Web Site”
~ Jeffrey Veen, CEO and co-founder of Typekit; VP, Products at Adobe. June 2002
www.adaptivepath.com/ideas/doing-content-inventory/
“I actually think it’s more like an enlightening journey.
What do you have? What do you need? What don’t you
need? Where can things improve?”
~ Kristina Halvorson, Founder of BrainTraffic and Author of
Content Strategy for the Web. March 2nd, 2009
http://blog.braintraffic.com/2009/03/the-content-inventory-is-your-friend/
5. Content Audit or Content Inventory?
Inventory = quantitative. Audit = qualitative.
~ Scott Baldwin, Director, UX & Design
@yellowpencilweb, UX Magazine Contributor
Jan. 2010
http://nform.com/blog/2010/01/doing-a-content-audit-or-inventory/
“A content inventory is the process and the result of
cataloging the entire contents of a website. An allied
practice—a content audit—is the process of
evaluating that content. A content inventory and a
content audit are closely related concepts, and they
are often conducted in tandem.”
~ Wikipedia, Quoted as far back as 2012
6. What Is a Content Audit in 2015?
But where is the analysis?
7. 14 Reasons to Do a Content Audit
1. Determine the most effective way to escape a Panda penalty
2. Determine which pages need copywriting, copyediting, design or other improvements
3. Determine which pages need to be updated and made more current, and prioritize them
4. Determine which pages should be consolidated due to overlapping topics
5. Determine which pages should be removed, and what the approach to pruning should be
6. Prioritize based on a variety of metrics: Visits, Conversions, PA, Copyscape Risk…
7. Find gap opportunities to drive content ideation and editorial calendars
8. Determine which pages are ranking for which keywords
9. Determine which pages "should" be ranking for which keywords
10. Find the strongest pages on a domain and develop a strategy to leverage them
11. Uncover content marketing opportunities
12. Auditing and creating an inventory of content assets when buying/selling a website
13. Understanding the content assets of a new client (i.e. what you have to work with)
14. Uncover other technical and site architecture issues
8. Find “legacy” pages that have been around longer than most of
the employees, including those that don’t even have the same
design template and branding anymore.
✓ Determine the most effective way to escape a Panda penalty
✓ Uncover other technical and site architecture issues
✓ Determine which pages need copywriting, copyediting, design or other improvements
✓ Determine which pages need to be updated and made more current, and prioritize them
✓ Determine which pages should be removed, and what the approach to pruning should be
✓ Find the strongest pages on a domain and develop a strategy to leverage them
✓ Understanding the content assets of a new client (i.e. what you have to work with)
9. Finding non-canonical URLs, poor site architecture, orphan pages
and other technical issues.
✓ Determine the most effective way to escape a Panda penalty
✓ Determine which pages should be removed, and what the approach to pruning should be
✓ Uncover other technical and site architecture issues
10. Find pages that were once removed from the index, subsequently
disallowed in the robots.txt file, and then found again by Google -
resulting in thousands of these:
✓ Uncover other technical and site architecture issues
✓ Determine which pages should be removed, and what the approach to pruning should be
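One way to spot the situation this slide describes (URLs that are disallowed in robots.txt yet still show up in the index) is to check a list of indexed URLs against the robots.txt rules. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt rules, the `blocked_urls` helper and the URL list are made-up examples, not taken from the webinar.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; in practice, load the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /old-campaigns/
"""

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical list of URLs found in the index (e.g. exported from a crawl).
indexed = [
    "https://www.example.com/",
    "https://www.example.com/tag/widgets/",
    "https://www.example.com/old-campaigns/2008-sale/",
    "https://www.example.com/products/blue-widget/",
]

flagged = blocked_urls(ROBOTS_TXT, indexed)
for url in flagged:
    print("Indexed but disallowed:", url)
```

URLs flagged this way are candidates for a closer look, since a crawler that is blocked from a page cannot see a noindex tag or a redirect placed on it.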
11. Consolidating pagerank from date-based event and campaign
landing pages.
[Diagram: date-based URLs /URL-1/date/ and /URL-2/date/, each with its own number of external links, consolidated into a single New/Canonical/URL. Other labels on the diagram: Social media, Templates…]
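The consolidation described on this slide amounts to a redirect map from each dated URL to one undated URL. The sketch below assumes, purely for illustration, that dated URLs look like `/<slug>/<year>/`; the pattern, the `build_redirect_map` helper, the example paths and the link counts are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed URL pattern: /<slug>/<4-digit year>/ ; adjust to the site's real scheme.
DATED_URL = re.compile(r"^/(?P<slug>[^/]+)/(?P<year>\d{4})/$")

def build_redirect_map(link_counts):
    """Map each dated URL to its undated canonical URL and total the
    external links that each canonical URL would collect via 301s."""
    redirects = {}
    inherited_links = defaultdict(int)
    for path, links in link_counts.items():
        match = DATED_URL.match(path)
        if match is None:
            continue  # not a date-based URL; leave it alone
        canonical = f"/{match.group('slug')}/"
        redirects[path] = canonical
        inherited_links[canonical] += links
    return redirects, dict(inherited_links)

# Hypothetical external-link counts per URL.
external_links = {
    "/spring-sale/2013/": 12,
    "/spring-sale/2014/": 7,
    "/about/": 40,
}

redirects, totals = build_redirect_map(external_links)
for old, new in redirects.items():
    print(f"301 {old} -> {new}")
print(totals)
```

The totals give a rough sense of how many external links each canonical URL would gather once the dated versions redirect to it.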
13. 1 Petabyte is 1,000,000,000,000,000 Bytes
That’s 1,000,000,000 Megabytes or 1,000,000 Gigabytes
Google chewed through more than 20 Petabytes each day back in 2008. In 2015… ?
14. Google receives over 4 million search queries per minute from an internet population of 2.4 billion users. In that same minute…
• Facebook users shared nearly 2.5 million times.
• Pinterest users pinned 3,472 images.
• Twitter users tweeted nearly 300,000 times.
• Instagram users posted nearly 220,000 photos.
• YouTube users uploaded 72 hours of video.
• About 164 WordPress blogs were created.
• Yelp received another 26,380 reviews.
• Googlebot may have been wasting its precious time and resources crawling your tag pages.
• A Google user could’ve landed on your outdated post from 2008 and clicked the “back” button.
22. Customize the process for your needs.
Don’t need keyword research or a keyword matrix? Skip ‘em!
Need topics and reading level? Ask URL Profiler for uClassify data too.
29. Content Audit Strategies for Common Scenarios
From www.goinflow.com/content-audit-strategies/
30. <segue type=“pet_peeve” target=“Magento”>
1. It Makes Diagnosing Issues More Difficult
This is a great query that can provide you with a wealth of information, but only IF your URLs are structured correctly: site:yourdomain.com inurl:product
2. It Makes Redirects More Difficult
If product pages are in the root, I can’t tell the server to redirect them easily without affecting all other pages in the root, including the home page. If they’re in a “products” folder, I can.
3. You Lose Some Breadcrumb Control…
</segue>
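The redirect point can be illustrated with a simple prefix rule. In this sketch (the paths and the `matches_prefix` helper are hypothetical), a rule keyed on `/products/` touches only product pages, while the only prefix available for root-level products is `/`, which matches every page on the site, including the home page.

```python
def matches_prefix(path, prefix):
    """True if a prefix-based redirect rule for `prefix` would apply to `path`."""
    return path.startswith(prefix)

# Hypothetical site paths.
paths = [
    "/",
    "/about/",
    "/blog/post-1/",
    "/products/blue-widget/",
    "/products/red-widget/",
]

# Products in a folder: the rule isolates the product pages.
in_folder = [p for p in paths if matches_prefix(p, "/products/")]

# Products on the root: the only shared prefix is "/", which matches everything.
on_root = [p for p in paths if matches_prefix(p, "/")]

print(in_folder)
print(len(on_root) == len(paths))
```

The same reasoning applies to the `inurl:product` query: it can only isolate product pages when they share a URL segment.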
31. Large Site with No Penalty Risk
From www.goinflow.com/content-audit-strategies/
32. Content Audit Strategies for Common Scenarios
From www.goinflow.com/content-audit-strategies/
33. Content Audit Strategies for Common Scenarios
From www.goinflow.com/content-audit-strategies/
35. Example Summary of Findings
As a result of our comprehensive content audit, we are recommending the following:
• Removal of about 624 pages from the Google index by deletion or consolidation:
  - 203 pages were marked for Removal with a 404 error (no redirect needed)
  - 110 pages were marked for Removal with a 301 redirect to another page
  - 311 pages were marked for Consolidation of content into other pages, followed by a redirect to the page into which they were consolidated
• Rewriting or improving of 668 pages:
  - 605 Product Pages are to be rewritten due to use of manufacturer product descriptions (duplicate content), these being prioritized by opportunity.
  - 63 "Other" pages are to be rewritten due to low-quality or duplicate content.
• Keeping 26 pages as-is with no rewriting or improvements needed.
These changes reflect an immediate need to "improve or remove" content in order to
avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality
and duplicate content, especially concerning Representative and Dealers
pages with some added risk from Style pages.
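The headline counts in a summary like this can be tallied from the action assigned to each page in the audit spreadsheet. The sketch below assumes one action label per audited page; the label names and the `summarize` helper are hypothetical, and the sample data simply rebuilds the figures quoted on this slide.

```python
from collections import Counter

def summarize(actions):
    """Count pages per recommended action and group them into the
    three headline buckets used in the summary."""
    counts = Counter(actions)
    remove = counts["remove_404"] + counts["remove_301"] + counts["consolidate"]
    improve = counts["rewrite_product"] + counts["rewrite_other"]
    keep = counts["keep"]
    return {
        "remove": remove,
        "improve": improve,
        "keep": keep,
        "total": remove + improve + keep,
    }

# Rebuild the slide's figures as one action label per audited page.
actions = (
    ["remove_404"] * 203
    + ["remove_301"] * 110
    + ["consolidate"] * 311
    + ["rewrite_product"] * 605
    + ["rewrite_other"] * 63
    + ["keep"] * 26
)

summary = summarize(actions)
print(summary)
```

Checking that the sub-counts add up to the headline numbers (203 + 110 + 311 = 624 and 605 + 63 = 668) is a quick sanity test before presenting the findings.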
37. Resources and Links
• Content Audit Spreadsheet Template
• Strategies for Common Audit Scenarios
• How to Do a Content Audit: Step-By-Step
• URL Profiler
• Screaming Frog
• Excel Vlookup Tutorial
• Import.io Web Data Crawler
• Kimono Turn Websites Into Structured APIs
• A1 Website Analyzer by Microsys (Screaming Frog Alternative)
• Google Big Query – Process big data in the cloud
• Crawl Optimization Article by Blind Five Year Old
• Aleyda’s Page Analysis Tools - http://www.allseosoftware.com/page-analysis-tools/
• Deepcrawl - http://deepcrawl.co.uk/
• Botify - https://www.botify.com/
• Strucr - https://strucr.com
@baglibones and @goinflow on Twitter :: https://plus.google.com/+EverettSizemore
38. Bonus Slide #1 – Keep Products Off the Root
Magento: www.placementedge.com/blog/how-to-add-a-prefix-to-magento-product-urls/
BigCommerce
39. Bonus Slide #2 – How Long Do These Take?
How long it takes doesn’t only depend on the size of the site. Complexity is another major issue, as is the working style and experience of the person performing the audit. But generally speaking…
Low End: 15 Hours
Average: 30 Hours
High End: 45 Hours
40. Bonus Slide #3 – What About Big Data Sites?
• Import.IO Crawler Webinar - Advanced Web Data Extraction
• Kimono and import.io can possibly be set up to crawl the site like Screaming Frog (SF), but for bigger sites.
• A1 Website Analyzer: http://www.microsystools.com/products/website-analyzer/
• Google Big Query (https://cloud.google.com/bigquery/what-is-bigquery)
  - You can set up a database in the cloud and have calculated metrics.
• Internal traffic and stats
• Log files / Splunk
• CMS / eCommerce Platform Exports
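As an illustration of the log-file option, the sketch below counts Googlebot requests per URL path from access log lines in the common "combined" format. The sample log lines and the `googlebot_hits` helper are made up for this example, and the user-agent check is a plain substring match that does not verify the requests really came from Google.

```python
import re
from collections import Counter

# Matches the request and user-agent fields of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests per path whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("path")] += 1
    return counts

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Fabricated sample lines, for illustration only.
sample_log = [
    f'66.249.66.1 - - [10/Mar/2015:10:00:00 +0000] "GET /tag/widgets/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Mar/2015:10:00:05 +0000] "GET /tag/widgets/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [10/Mar/2015:10:00:07 +0000] "GET /products/blue-widget/ HTTP/1.1" 200 8042 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    f'66.249.66.1 - - [10/Mar/2015:10:00:09 +0000] "GET /products/blue-widget/ HTTP/1.1" 200 8042 "-" "{GOOGLEBOT_UA}"',
]

hits = googlebot_hits(sample_log)
for path, count in hits.most_common():
    print(count, path)
```

A per-path count like this can be joined to the rest of the audit data by URL to show where crawl activity is being spent.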
There are lots of plugins, modules and hacks for common eCommerce or CMS platforms which can help you move product URLs off the root. Volusion does this out of the box. Magento requires some tweaking. And BigCommerce just requires a radio-button change.
The time required changes depending on the project and the person doing the audit. Even then, it can be difficult to predict before you actually dig in, and you will have to work that out yourself in order to quote a cost to the client. The figures on this slide can typically serve as a rule of thumb.
Let’s say you have an enterprise-level website with millions of pages, where Screaming Frog has problems crawling and indexing even while using the server version. Or where you are beyond the Free Google Analytics data threshold. Or maybe you just have your own tool preferences. While we don’t have time to go over each of these in depth, this slide includes a few other options for data-gathering.