Presented at BrightonSEO September 2021
Did you know that secrets about Google's Web Rendering Service are hiding in plain sight? Discover the relationship between Chromium and Google Search so you can leverage this open-source technology to discover technical SEO issues on your site.
Let us share with you a deep love of Chromium. Chromium runs Chrome. It also runs Google Search's Web Rendering Service. If Chromium adopts it, Google Search adopts it. Join in the love story so you can leverage this open-source technology to discover technical SEO issues on your site.
Do SEOs Need to Know About Chromium? Of CORS! Extended Edition - BrightonSEO 2021
1. @jammer_volts @Christi35135477
Do SEOs Need to Know
About Chromium?
Of CORS!
Christine Brady
Senior Technical SEO at Deepcrawl
Jamie Indigo
Lead Senior Technical SEO at Deepcrawl
16. @jammer_volts @ChristineLBrady #BrightonSEO
"When we crawl your page with Googlebot, we
go fetch the content and then we give it to
Chrome. Then Chrome runs all the scripts. It
loads additional content.
Once everything's loaded we take a snapshot
of the page and that's the content that
actually gets indexed."
- Erik Hendriks, Software Engineer at Google
Resource: Rendering (WMConf MTV '19)
19. @jammer_volts @ChristineLBrady #BrightonSEO
Chromium can run as a headless browser
"Head" means visual interface.
"Headless" means no visual interface.
Headless browsers can run parallel tests faster and
consume less memory/resources than browsers with
a visual interface (head).
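To see headless rendering for yourself, here is a minimal sketch that asks a locally installed Chromium to load a page without a visual interface and print the resulting DOM. The `--headless` and `--dump-dom` switches are real Chromium flags; the binary name `chromium` and the helper names are assumptions (your install may be called `google-chrome` or `chromium-browser`). This is a local approximation, not a copy of how Google's Web Rendering Service is configured.

```python
import shutil
import subprocess


def dump_dom_command(url, binary="chromium"):
    """Build the command line for a headless render that prints the DOM."""
    # "chromium" is an assumed binary name; adjust it for your system.
    return [binary, "--headless", "--dump-dom", url]


def rendered_html(url, binary="chromium", timeout=60):
    """Return the serialized DOM after headless Chromium has loaded the page."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} was not found on PATH")
    completed = subprocess.run(
        dump_dom_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout
```

Calling `rendered_html("https://example.com")` returns the HTML after scripts have run, which you can compare against the raw source to spot content that only exists after rendering.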
22. @jammer_volts @ChristineLBrady #BrightonSEO
Is Chromium the same as Chrome?
No. Chromium & Chrome are built on the same framework.
● Chromium is an open-source project whose mission is to continue building out Chrome to provide a more secure, faster & more reliable computing service
● Chrome is proprietary software that has the capability to log in to your Google account at the browser level
27. @jammer_volts @ChristineLBrady #BrightonSEO
Chromium parses the crawled HTML into a DOM and identifies resources
[Diagram: HTML -> HTML Parser -> DOM Tree]
<link rel="manifest" href="/manifest.56b1cedc.json">
<link rel="preload" as="font" type="font/woff2"
crossorigin=""
href="/static/media/ZillaSlab-Bold.subset.0beac26b.woff2">
[Diagram: DOM tree with Body branching into h1 (text), p (text), and img]
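The "identifies resources" step can be sketched with Python's built-in `html.parser`. This is only an illustration of the idea, not Chromium's actual parser; the `ResourceFinder` class and `find_resources` helper are hypothetical names.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the URLs a page asks the browser to load."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("href"):
            url = attributes["href"]
        elif tag in ("script", "img") and attributes.get("src"):
            url = attributes["src"]
        else:
            return
        self.resources.append({
            "tag": tag,
            "url": url,
            "rel": attributes.get("rel"),
            # The crossorigin attribute marks a request that will use CORS.
            "crossorigin": "crossorigin" in attributes,
        })


def find_resources(html):
    """Parse an HTML string and return the resources it references."""
    finder = ResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.resources
```

Fed the two `<link>` tags from the slide, this returns the manifest and the preloaded font, with only the font flagged as a `crossorigin` request.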
30. @jammer_volts @ChristineLBrady #BrightonSEO
Layout Calculations
[Diagram: HTML -> HTML Parser -> DOM Tree; Style Sheets -> CSS Parser -> Style Rules; DOM Tree + Style Rules -> Attachment -> Render Tree -> Layout]
Style rules are applied to elements. Then Chromium calculates how much space each element takes up and where it is on screen.
32. @jammer_volts @ChristineLBrady #BrightonSEO
Performance: Core Web Vitals!
Announced
May 5th, 2020
"Earlier this month, the Chrome team announced Core
Web Vitals, a set of metrics related to speed,
responsiveness and visual stability, to help site owners
measure user experience on the web."
Announced
May 28th, 2020
Source
34. @jammer_volts @ChristineLBrady #BrightonSEO
Security: Origin policies
An origin identifies where a resource is served from.
It consists of:
1. Protocol (e.g., HTTP or HTTPS)
2. Hostname (e.g., hackedu.io)
3. Port (80, 443, 8080, etc.)
Scheme/Protocol Hostname/Domain Port
https:// www.deepcrawl.com :443
35. @jammer_volts @ChristineLBrady #BrightonSEO
Same-Origin Policy (SOP)
The same-origin policy is a critical security mechanism that restricts how a document or script loaded by one origin can interact with a resource from another origin.
Requests for same-origin resources are always allowed.
Protocol Hostname Port
https www.deepcrawl.com :443
Protocol Hostname Port
https www.deepcrawl.com :443
I knew I could rely on you <3
Resource: Same-origin policy - Web security | MDN
36. @jammer_volts @ChristineLBrady #BrightonSEO
Origin Comparison in action
Compared against: http://store.company.com/dir/page.html
URL | Outcome | Reason
http://store.company.com/dir2/other.html | Same origin | Only the path differs
http://store.company.com/dir/inner/another.html | Same origin | Only the path differs
https://store.company.com/page.html | Failure | Different protocol
http://store.company.com:81/dir/page.html | Failure | Different port (http:// is port 80 by default)
http://news.company.com/dir/page.html | Failure | Different host
Resource: Definition of an origin, Mozilla
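The comparison above can be sketched in a few lines of Python with the standard `urllib.parse` module. The helper names `origin` and `same_origin` are made up for illustration, and the sketch only fills in default ports for http and https.

```python
from urllib.parse import urlsplit

# Default ports implied when a URL does not state one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (protocol, hostname, port) triple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin only if protocol, hostname and port all match."""
    return origin(url_a) == origin(url_b)
```

Comparing each URL in the table against `http://store.company.com/dir/page.html` with `same_origin` gives the same outcomes: the two path-only variants match, and the protocol, port, and host variants do not.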
37. @jammer_volts @ChristineLBrady #BrightonSEO
Cross-Origin Resource Sharing (CORS)
Webpages ask for resources on different origins all the time!
Our CORS policy is how we do this safely.
CORS lets a server state which other origins may read its responses, so untrusted sites stay blocked by default.
Protocol Hostname Port
https www.company.com :443
Protocol Hostname Port
https sketchyscript.biz :443
Denied.
Resource: Cross-Origin Resource Sharing (CORS) - HTTP | MDN
38. @jammer_volts @ChristineLBrady #BrightonSEO
Why Should SEOs Care about CORS Errors?
While the Same-Origin Policy can complicate resource sharing, it helps make the internet a little more secure. Luckily, CORS exists so we can safely make requests from our browser and get the data needed for a richer user experience.
By learning about Chromium, and how to respect the Same-Origin Policy by using CORS, technical SEOs do their part in keeping the web more secure and provide a great user experience by ensuring all of their content is properly rendered.
That is in addition to the possible profit loss due to resources or paid ads being blocked… But let’s stick to “doing our part to make the web a better place.” ;)
39. @jammer_volts @ChristineLBrady #BrightonSEO
Cross Origin Resource Sharing (CORS)
An HTTP-header-based mechanism that lets a server tell the browser which other origins are permitted to load its resources.
Server: https://api.deepcrawl.com
Client: https://www.deepcrawl.com
HTTP Header: Access-Control-Allow-Origin: https://www.deepcrawl.com
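On the server side, the exchange above boils down to checking the request's `Origin` header against a list of origins you trust. Here is a minimal sketch under that assumption; the function name `cors_response_headers` and the allowlist are hypothetical, and real frameworks usually provide their own CORS middleware.

```python
def cors_response_headers(request_origin, allowed_origins):
    """Return the CORS headers a server could add for a given request origin."""
    if request_origin in allowed_origins:
        return {
            # Echo back the one origin that is allowed to read this response.
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches that the response varies by the Origin request header.
            "Vary": "Origin",
        }
    # No CORS headers: the browser will block cross-origin reads.
    return {}
```

For example, with `allowed_origins = {"https://www.deepcrawl.com"}`, a request from `https://www.deepcrawl.com` gets the header back, while a request from any other origin gets none.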
40. @jammer_volts @ChristineLBrady #BrightonSEO
What requests use CORS?
The cross-origin sharing standard can enable cross-origin HTTP requests for:
● Invocations of the XMLHttpRequest or Fetch APIs
● Web fonts (for cross-domain font usage in @font-face within CSS)
● WebGL textures
● Images/video frames drawn to a canvas using drawImage()
● CSS Shapes from images
Resource: Cross-Origin Resource Sharing (CORS) - HTTP | MDN
42. @jammer_volts @ChristineLBrady #BrightonSEO
Common CORS Errors
1. Access-Control-Allow-Origin
2. CORS disabled
3. Reason: CORS request did not succeed
4. Reason: CORS header ‘Origin’ cannot be added
5. Reason: CORS request external redirect not allowed
6. Reason: CORS request not http
7. Reason: CORS header ‘Access-Control-Allow-Origin’ missing
8. Reason: CORS header ‘Access-Control-Allow-Origin’ does not match ‘xyz’
Resource: CORS errors - HTTP | MDN
44. @jammer_volts @ChristineLBrady #BrightonSEO
CORS: Access-Control-Allow-Origin
CORS HTTP Response Header
Access-Control-Allow-Origin:
● Indicates which client origins are allowed
Accepted Values:
● ( * ) wildcard: allow any origin to access the resource
● A single specified origin, e.g., https://www.deepcrawl.com (no trailing slash)
Syntax:
● Access-Control-Allow-Origin: https://www.deepcrawl.com
Resources: Access-Control-Allow-Origin
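On the browser side, the check against this header is close to a plain string comparison. Here is a simplified sketch, assuming a request without credentials (credentialed requests do not accept the wildcard); the function name `passes_cors_check` is hypothetical.

```python
def passes_cors_check(page_origin, allow_origin_header):
    """Simplified browser-side check of Access-Control-Allow-Origin."""
    if allow_origin_header is None:
        # Header missing: the cross-origin read is blocked.
        return False
    value = allow_origin_header.strip()
    # Either any origin is allowed, or the value must equal the page's origin.
    return value == "*" or value == page_origin
```

Because the match is exact, a header value with a trailing slash or a different protocol does not match the page's origin, which is a common cause of the "does not match" error.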
45. @jammer_volts @ChristineLBrady #BrightonSEO
How to Test CORS in Chrome Dev Tools
1. Open Chrome Dev Tools
a. PC: Control + Shift + J
b. Mac: Command + Option + J
2. Select Network Panel
a. Errors will be listed in red, and
the status column will show
“CORS error”
3. Hover over the error; the tooltip will show the error code, such as “MissingAllowOriginHeader”
No errors: to view header details, click on the resource file to open its headers display
Resources: (1) Dev Chrome 88 CORS Release
46. @jammer_volts @ChristineLBrady #BrightonSEO
Improved CORS Debugging in Chrome Dev Tools
Keeping Chrome Dev Tools Open >
● CORS related ‘TypeErrors’ in the
Console panel link to the Network
panel.
● To view the error messages &
potential solutions, click on the two
far right icons.
Resources: (2) Dev Chrome 93 CORS Release
47. @jammer_volts @ChristineLBrady #BrightonSEO
Resource Allocation: Intensive Resources
Ads can consume a disproportionate
amount of device resources.
Intensive Resources Result in:
● Drained battery life
● Strained networks/data plans
● Poor user experience
Resources: (1) Chromium Resource Heavy Ads (2) Eli Schwartz Tweet
48. @jammer_volts @ChristineLBrady #BrightonSEO
Intensive Resources Criteria
Ads that meet any of the following criteria are considered “heavy”:
● Use the main thread for more than 60 seconds total
● Use the main thread for more than 15 seconds in any 30 second window (50% utilization over 30 seconds)
● Use more than 4 megabytes of network bandwidth
Intervention is Projected to Save:
● 12.8% of the network usage
● 16.1% of all CPU usage
Resources: (1) Handling Heavy Ad Interventions | Web (2) Heavy Ad Intervention Criteria
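The three thresholds above are easy to turn into a quick check for an ad you are auditing. Here is a minimal sketch; the function name `heavy_ad_reasons` and its inputs are hypothetical, the measurements would have to come from your own profiling, and treating a megabyte as 1024 * 1024 bytes is an assumption.

```python
MAX_TOTAL_CPU_SECONDS = 60
MAX_WINDOW_CPU_SECONDS = 15  # within any 30 second window
MAX_NETWORK_BYTES = 4 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def heavy_ad_reasons(total_cpu_seconds, peak_window_cpu_seconds, network_bytes):
    """List which heavy-ad thresholds a measured ad exceeds (empty if none)."""
    reasons = []
    if total_cpu_seconds > MAX_TOTAL_CPU_SECONDS:
        reasons.append("total CPU")
    if peak_window_cpu_seconds > MAX_WINDOW_CPU_SECONDS:
        reasons.append("peak CPU")
    if network_bytes > MAX_NETWORK_BYTES:
        reasons.append("network")
    return reasons
```

An ad counts as heavy if the returned list is non-empty, since exceeding any single threshold is enough.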
49. @jammer_volts @ChristineLBrady #BrightonSEO
Coming Soon: HTTPS-first future
Beginning in M94, Chrome will offer HTTPS-First Mode. This will:
● Attempt to upgrade all page loads to HTTPS
● Display a full-page warning before loading sites that don’t support it
● Limit the ability for sites to opt out of security policies over insecure connections
● Restrict how, and for how long, Chrome stores site content provided over insecure connections
Resource: Increasing HTTPS adoption
51. @jammer_volts @ChristineLBrady #BrightonSEO
What are Workers?
Browsers use a single main thread to run all JavaScript in a page.
JavaScript was designed around the idea of a single main thread, and therefore has its limitations.
Overcoming this can be done with “workers.”
There are two different types of “workers”:
● Web Workers
● Service Workers
[Diagram: Webpage, Web Worker, Service Worker]
Resource: Web.dev Workers Overview
52. @jammer_volts @ChristineLBrady #BrightonSEO
Web Worker
Web workers are a means for web content to run scripts in background threads.
The “worker” thread can perform tasks without interfering with the user interface.
● Dedicated workers are only accessible by the script that created them
● Shared workers can be used by multiple scripts, even in different windows
● Subworkers can be spawned, and must be hosted within the same origin as the parent page
[Diagram: Webpage, Web Worker]
Resource: Web.dev Using Web Workers
53. @jammer_volts @ChristineLBrady #BrightonSEO
Service Worker
[Diagram: Webpage, Service Worker; lifecycle states: Installing -> Installed -> Activating -> Activated -> Redundant]
A service worker helps serve cached assets first, providing a default experience before getting more data from the network.
Restrictions:
● Runs only over HTTPS, for security purposes
Service Worker Events:
● Install
● Activate
● Message
Functional Events:
● Fetch
● Sync
● Push
Resource: Web.dev Using Service Workers
54. @jammer_volts @ChristineLBrady #BrightonSEO
Will it index: Web Workers
Bot tamer Dave Smart created an experiment looking at 4 types of web workers:
1. Worker 1: Returns the timestamps of when the worker was called and when it responded (this happens almost instantly); these are then displayed on the page.
2. Worker 2: Uses the fetch API to request a simple endpoint to get a timestamp, then returns that and displays it.
3. Worker 3: Does the same thing as 2, but using XMLHttpRequest instead of fetch.
4. Worker 4: Contains a long-running calculation and returns timestamps of when the worker was called and when it responded (like worker 1).
Resource: Web Worker Content - Will It Index?
55. @jammer_volts @ChristineLBrady #BrightonSEO
Are Web Workers indexed?
1. Worker 1 (basic timestamp): Yes!
2. Worker 2 (fetch API): No
3. Worker 3 (XMLHttpRequest): No
4. Worker 4 (long calculation): It Depends
For workers 2 & 3, the XHR calls were always reported as “Other error.”
Worker 4 was only indexed if the calculation took less than 1 second.
Resource: Web Worker Content - Will It Index?
56. @jammer_volts @ChristineLBrady #BrightonSEO
Will it index: Service Workers
Nope. This works for human users, but Google has no intention of supporting service workers.
Resource: Ask Me Anything about JS and Google Search, u/splitti, Reddit