@hamletbatista
SCALING AUTOMATED QUALITY TEXT GENERATION FOR ENTERPRISE SITES
HAMLET BATISTA
LET’S CRAWL A FEW PAGES FROM THIS SITE
WE ARE MISSING KEY META TAGS!
AND SOME PAGES ARE LACKING CONTENT
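Before generating anything, a crawl can flag which pages have no `<title>`, no meta description, or very little visible text. The sketch below uses only Python's standard `html.parser`. The `PageAudit` class, the `audit_page` helper, and its 150-word thin-content threshold are hypothetical choices for illustration, not values from this deck.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the title, meta description, and visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.words = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.words.extend(data.split())


def audit_page(html, min_words=150):
    """Return a list of issues found in one page (min_words is a hypothetical cut-off)."""
    page = PageAudit()
    page.feed(html)
    page.close()
    issues = []
    if not page.title.strip():
        issues.append("missing <title>")
    if not (page.meta_description or "").strip():
        issues.append("missing meta description")
    if len(page.words) < min_words:
        issues.append(f"thin content ({len(page.words)} words)")
    return issues
```

An image-only product page, for example, would come back with both a missing meta description and a thin-content flag, which is exactly the ecommerce case described next.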
LET’S FIX THAT WITH AUTOMATION!
We want to address four scenarios that are common on enterprise websites.
For large ecommerce sites, we will focus on:
1. Pages with large images and no text.
2. Pages with large images and some text.
For large web publishers, we will focus on:
1. Pages with a lot of quality text and no metadata.
2. Pages with very little text.
AGENDA
We are going to explore different text generation strategies and
recommend the best one for each problem.
Specifically, we will cover:
1. Image captioning
2. Visual question answering
3. Text summarization
4. Question answering from text (short answers)
5. Long-form question answering
6. Full article generation
AGENDA
We are going to build two models from scratch:
1. We will build an image captioning and visual question answering model.
2. We will also build a state-of-the-art text summarization model.
At the end, I will share some resources to learn more about these topics.
HOW TO FIND RELEVANT RESEARCH AND CODE
Papers with Code
HOW TO FIND RELEVANT RESEARCH AND CODE
Papers with Code (SOTA)
TEXT GENERATION FOR ECOMMERCE SITES
IMAGE CAPTIONING AND VISUAL QUESTION ANSWERING
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
THE PYTHIA MODULAR FRAMEWORK
Pythia GitHub
THE PYTHIA MODULAR FRAMEWORK
Pythia
IMAGE CAPTIONING AND VISUAL QUESTION ANSWERING RESULTS
Pythia
LET’S BUILD A CAPTIONING MODEL!
Pythia captioning demo
RUN ALL CELLS
Pythia captioning demo
USE THE LAST SECTION TO TEST IMAGES
Pythia captioning demo
USE THE LAST SECTION TO CAPTION IMAGES
“a group of people in a boat on the water”
THERE IS MORE TEXT HIDDEN IN THE OTHER IMAGES!
TRY THE NEW TITLES FROM CAPTIONS IN CLOUDFLARE FIRST
Cloudflare App
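A raw caption such as “a group of people in a boat on the water” still needs light cleanup before it can serve as a page title. This is a minimal sketch under stated assumptions: the `caption_to_title` helper, the optional site-name suffix, and the 60-character limit are all hypothetical, not features of Pythia or of the Cloudflare app.

```python
def caption_to_title(caption, suffix="", max_len=60):
    """Turn a generated image caption into a page title (hypothetical helper)."""
    # Collapse whitespace and drop trailing periods/spaces from the model output.
    text = " ".join(caption.split()).strip(" .")
    if not text:
        return suffix.strip()
    # Captioning models often emit lowercase text, so capitalize the first letter.
    text = text[0].upper() + text[1:]
    title = text + suffix
    if len(title) <= max_len:
        return title
    # Too long: cut the caption on a word boundary, leaving room for "…" and the suffix.
    budget = max(max_len - len(suffix) - 1, 1)
    cut = text[:budget].rsplit(" ", 1)[0].rstrip(" ,")
    return cut + "…" + suffix
```

Cutting on a word boundary keeps truncated titles readable; the suffix is kept whole so branding survives truncation.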
YOU CAN ALSO ASK QUESTIONS ABOUT IMAGES
“how many people?”
“4 with 62.9 confidence”
YOU CAN ALSO ASK QUESTIONS ABOUT IMAGES
“what are these people riding?”
“boat with 99.96 confidence”
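Because some answers come back with much lower confidence (62.9 for the head count versus 99.96 for “boat”), it may be worth routing weak answers to a human before publishing them. A sketch, assuming the confidences are on the 0-100 scale they appear to use here; `split_by_confidence` and the 90.0 cut-off are hypothetical.

```python
def split_by_confidence(qa_results, threshold=90.0):
    """Split (question, answer, confidence) tuples into accepted and needs-review dicts.

    Confidence is assumed to be on a 0-100 scale, matching the demo output.
    """
    accepted, needs_review = {}, {}
    for question, answer, confidence in qa_results:
        target = accepted if confidence >= threshold else needs_review
        target[question] = answer
    return accepted, needs_review


results = [
    ("how many people?", "4", 62.9),
    ("what are these people riding?", "boat", 99.96),
]
accepted, needs_review = split_by_confidence(results)
```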
CAPTIONING AND VISUAL QUESTION ANSWERING PAPER
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
CAPTIONING AND VISUAL QUESTION ANSWERING RESULTS
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
WHERE DID I LEARN ABOUT THIS?
Advanced Machine Learning Specialization
WHERE DID I LEARN ABOUT THIS?
Introduction to Deep Learning
TEXT GENERATION FOR WEB PUBLISHERS
AI TEXT GENERATOR: TALKTOTRANSFORMER.COM
Talk to Transformer
TEXT SUMMARIZATION
Papers with Code (Text Summarization)
TEXT SUMMARIZATION PAPER (EXTRACTIVE)
Papers with Code (Extractive Text Summarization)
TEXT SUMMARIZATION RESULTS (EXTRACTIVE)
Fine-tune BERT for Extractive Summarization
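Extractive summarization selects whole sentences from the source instead of writing new ones, which makes it a natural fit for publisher pages that have plenty of quality text but no meta description. BertSum scores sentences with a fine-tuned BERT; the sketch below swaps in a much simpler word-frequency scorer, purely to show the select-and-reorder mechanics. The `extractive_summary` name, its regex sentence splitter, and the tiny stopword list are assumptions.

```python
import re
from collections import Counter

# A deliberately tiny stopword list for illustration; real projects use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "to", "and", "for", "with"}


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored just for length.
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
```

Re-sorting the selected indices keeps the chosen sentences in document order, which usually reads more naturally than score order.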
TEXT SUMMARIZATION PAPER (ABSTRACTIVE)
Papers with Code (Abstractive Text Summarization)
TEXT SUMMARIZATION RESULTS (ABSTRACTIVE)
MASS: Masked Sequence to Sequence Pre-training for Language Generation
LET’S BUILD AN EXTRACTIVE TEXT SUMMARIZATION MODEL!
https://github.com/nlpyang/BertSum
BERTSUM MODEL OVERVIEW
"Meanwhile, although BERT has segmentation embeddings for indicating different sentences, it only has two labels (sentence A or sentence B), instead of multiple sentences as in extractive summarization. Therefore, we modify the input sequence and embeddings of BERT to make it possible …"
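Concretely, the BertSum paper inserts a [CLS] token before each sentence and a [SEP] token after it, and alternates the segment embedding (E_A, E_B, E_A, and so on) from one sentence to the next so BERT can tell many sentences apart; the output vector at each sentence's [CLS] position is then used as that sentence's representation for scoring. The `-use_interval true` flag in the training command presumably switches this on. The sketch below builds those inputs in plain Python; whitespace splitting stands in for BERT's WordPiece tokenizer, and `bertsum_inputs` is a hypothetical helper, not code from the repository.

```python
def bertsum_inputs(sentences):
    """Build BertSum-style tokens, interval segment ids, and [CLS] positions.

    Whitespace splitting stands in for BERT's WordPiece tokenizer (an assumption).
    """
    tokens, segment_ids, cls_positions = [], [], []
    for i, sentence in enumerate(sentences):
        cls_positions.append(len(tokens))  # this [CLS] will represent sentence i
        piece = ["[CLS]"] + sentence.split() + ["[SEP]"]
        tokens.extend(piece)
        # Interval segment embeddings: 0 (E_A) for the 1st, 3rd, ... sentence, 1 (E_B) otherwise.
        segment_ids.extend([i % 2] * len(piece))
    return tokens, segment_ids, cls_positions
```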
BERTSUM DOWNLOAD AND SETUP
BERTSUM Colab notebook
BERTSUM TRAINING BERT+TRANSFORMER MODEL
BERTSUM Colab notebook
BERTSUM TRAINING BERT+TRANSFORMER MODEL
BERTSUM Colab notebook
# First run: use a single GPU so the code can download the pretrained BERT model.
# Change -visible_gpus 0,1,2 -gpu_ranks 0,1,2 -world_size 3 to -visible_gpus 0 -gpu_ranks 0 -world_size 1.
# After downloading, you could kill the process and rerun the code with multiple GPUs.
# BERT+Transformer model (trailing backslashes continue the single shell command)
!python train.py -mode train -encoder transformer -dropout 0.1 \
  -bert_data_path ../bert_data/cnndm \
  -model_path ../models/bert_transformer -lr 2e-3 \
  -visible_gpus 0 -gpu_ranks 0 -world_size 1 \
  -report_every 50 -save_checkpoint_steps 1000 \
  -batch_size 3000 -decay_method noam -train_steps 50000 \
  -accum_count 2 -log_file ../logs/bert_transformer \
  -use_interval true -warmup_steps 10000 -ff_size 2048
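The `-lr 2e-3 -decay_method noam -warmup_steps 10000` flags do not mean the model trains at 2e-3 throughout. Assuming BertSum's `noam` option follows the usual Noam form (linear warmup, then inverse-square-root decay, with no model-size factor), which is worth confirming in the repository's optimizer code, the effective rate can be sketched as follows. `noam_lr` is a hypothetical helper.

```python
def noam_lr(step, base_lr=2e-3, warmup_steps=10000):
    """Assumed Noam-style schedule: linear warmup, then inverse-square-root decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return base_lr * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1000, 10000, 20000, 50000):
    print(f"step {step:>6}: lr = {noam_lr(step):.2e}")
```

Under that assumption the rate peaks at 2e-3 × 10000^-0.5 = 2e-5 at step 10,000 and has halved to 1e-5 by step 40,000.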
BERTSUM TRAINING BERT+TRANSFORMER MODEL
We are simply following the instructions in the GitHub repository.
BERTSUM TRAINING BERT+TRANSFORMER MODEL
1. Training takes two days on Colab (with interruptions)
2. Saving progress and resuming is critical
BERTSUM TRAINING BERT+TRANSFORMER MODEL
BERTSUM Colab notebook
BERTSUM RESUMING TRAINING
BERTSUM Colab notebook
# Resume run: same flags as the first run, plus -train_from pointing at the last saved checkpoint.
# As in the first run, -visible_gpus 0 -gpu_ranks 0 -world_size 1 replaces -visible_gpus 0,1,2 -gpu_ranks 0,1,2 -world_size 3.
# Paths containing spaces ("My Drive", "DeepCrawl Webinar") must be quoted, or the shell splits them into separate arguments.
# BERT+Transformer model
!python train.py -mode train -encoder transformer -dropout 0.1 \
  -train_from "../../drive/My Drive/Presentations/DeepCrawl Webinar/models/bert_transformer/model_step_49000.pt" \
  -bert_data_path ../bert_data/cnndm \
  -model_path "../../drive/My Drive/Presentations/DeepCrawl Webinar/models/bert_transformer" \
  -lr 2e-3 -visible_gpus 0 -gpu_ranks 0 -world_size 1 \
  -report_every 50 -save_checkpoint_steps 1000 \
  -batch_size 3000 -decay_method noam -train_steps 50000 \
  -accum_count 2 \
  -log_file "../../drive/My Drive/Presentations/DeepCrawl Webinar/logs/bert_transformer" \
  -use_interval true -warmup_steps 10000 -ff_size 2048
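Since Colab sessions get interrupted, each resume needs the newest `model_step_<N>.pt` checkpoint for `-train_from`. Sorting file names alphabetically would put `model_step_9000.pt` after `model_step_49000.pt`, so a small helper should compare step numbers instead. `latest_checkpoint` is a hypothetical sketch based only on the checkpoint naming visible in the command above.

```python
import re
from pathlib import Path

CHECKPOINT = re.compile(r"model_step_(\d+)\.pt$")


def latest_checkpoint(model_dir):
    """Return the model_step_<N>.pt path with the largest N, or None if none exist."""
    best_step, best_path = -1, None
    for path in Path(model_dir).glob("model_step_*.pt"):
        match = CHECKPOINT.search(path.name)
        # Compare numeric steps: a plain string sort would rank 9000 above 49000.
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

In the notebook it could be called as `ckpt = latest_checkpoint("../../drive/My Drive/Presentations/DeepCrawl Webinar/models/bert_transformer")`, and IPython's `{ckpt}` interpolation in `!` commands can then pass it along as `-train_from "{ckpt}"`.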
SIMPLER: JUST GET A TRAINED MODEL FROM THE AUTHOR!
BERTSUM Colab notebook
BERTSUM TESTING BERT+TRANSFORMER MODEL
BERTSUM Colab notebook
BERTSUM TESTING RESULTS
BERTSUM Colab notebook
Gold summary: 'click on the brilliant interactive graphic below for details on each hole of the masters 2015 course'
Candidate summary after 50,000 training steps: 'click on the graphic below to get a closer look at what the biggest names in the game will face when they tee off on thursday .'
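ROUGE, listed in the resources at the end, is the usual way to score a candidate summary like this one against the gold summary. The sketch below computes a simplified ROUGE-1 (clipped unigram overlap) with no stemming or stopword handling, so its numbers will not exactly match the official ROUGE scripts; `rouge_1` is a hypothetical helper.

```python
import re
from collections import Counter


def rouge_1(reference, candidate):
    """Simplified ROUGE-1: unigram precision, recall, and F1 (no stemming)."""
    ref = Counter(re.findall(r"[a-z0-9']+", reference.lower()))
    cand = Counter(re.findall(r"[a-z0-9']+", candidate.lower()))
    overlap = sum((ref & cand).values())  # each word matches at most min(count) times
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For the example above, pass the gold summary as `reference` and the candidate as `candidate`.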
TEXT GENERATION FOR WEB PUBLISHERS
QUESTION ANSWERING
Papers with Code (Question Answering)
QUESTION ANSWERING PAPER: XLNET
Papers with Code (Question Answering)
QUESTION ANSWERING RESULTS: XLNET
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNET CODE
https://github.com/zihangdai/xlnet
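For short-answer question answering, SQuAD-style readers such as XLNet and BERT predict a start score and an end score for every token, and the answer is the highest-scoring valid span. The sketch below shows that decoding step in its simplest exhaustive form with made-up scores. XLNet's own SQuAD code uses a more involved top-k search, so treat `best_span` as a hypothetical illustration of the idea, not the repository's method.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) token span maximizing start_logit + end_logit.

    Only spans with start <= end and length <= max_answer_len are considered.
    """
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


tokens = ["the", "boat", "is", "red"]
start, end = best_span([0.1, 2.0, 0.0, 0.5], [0.0, 0.3, 0.1, 1.5])
answer = " ".join(tokens[start : end + 1])
```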
LONG-FORM QUESTION ANSWERING
Introducing long-form question answering
LONG-FORM QUESTION ANSWERING
Subreddit: Explain Like I'm Five
LONG-FORM QUESTION ANSWERING
Scripts and links to recreate the ELI5 dataset.
LONG-FORM QUESTION ANSWERING BASELINE
FINALLY, LET’S GO FOR SOMETHING MORE AMBITIOUS
Generating Wikipedia by Summarizing Long Sequences
GENERATING WIKIPEDIA BY SUMMARIZING LONG SEQUENCES
Generating Wikipedia by Summarizing Long Sequences
GENERATING WIKIPEDIA BY SUMMARIZING LONG SEQUENCES
CONCLUSION
“We have shown that generating Wikipedia can be approached as a multi-document summarization problem with a large, parallel dataset, and demonstrated a two-stage extractive-abstractive framework for carrying it out. The coarse extraction method used in the first stage appears to have a significant effect on final performance, suggesting further research on improving it would be fruitful.

We introduce a new, decoder-only sequence transduction model for the abstractive stage, capable of handling very long input-output examples. This model significantly outperforms traditional encoder/decoder architectures on long sequences, allowing us to condition on many reference documents and to generate coherent and informative Wikipedia articles.”
Generating Wikipedia by Summarizing Long Sequences
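The conclusion stresses that the coarse extraction stage matters. One of the extractors the paper compares is tf-idf: rank the paragraphs gathered from reference documents by their similarity to the article title, and keep the top ones as input for the abstractive model. Here is a standard-library sketch of that ranking step; the tokenizer, the smoothed idf formula, and the `rank_by_tfidf` name are assumptions, not the paper's code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_tfidf(query, paragraphs):
    """Return paragraph indices sorted by TF-IDF cosine similarity to the query."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed idf; terms that appear in no paragraph get weight 0 below.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}

    def vector(counts):
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scores = [cosine(q, vector(doc)) for doc in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

The top-ranked paragraphs would then be concatenated, up to the abstractive model's input budget, as the extractive stage's output.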
CAN WE HAVE THE SOURCE CODE? YES!
GitHub link
HERE ARE SOME TRAINING COST ESTIMATES
GitHub link
RESOURCES TO LEARN MORE
Faster Data Science Education
https://www.kaggle.com/learn/overview
Data Scientist’s Guide to Summarization
https://towardsdatascience.com/data-scientists-guide-to-summarization-fc0db952e363
An open source neural machine translation system
http://opennmt.net/
Bottom-Up Abstractive Summarization
http://opennmt.net/OpenNMT-py/Summarization.html
Abstractive Text Summarization (tutorial 2) , Text Representation made very easy
https://hackernoon.com/abstractive-text-summarization-tutorial-2-text-representation-made-very-easy-ef4511a1a46
RESOURCES TO LEARN MORE
Build an Abstractive Text Summarizer in 94 Lines of Tensorflow !! (Tutorial 6)
https://hackernoon.com/build-an-abstractive-text-summarizer-in-94-lines-of-tensorflow-tutorial-6-f0e1b4d88b55
What Is ROUGE And How It Works For Evaluation Of Summarization Tasks?
https://rxnlp.com/how-rouge-works-for-evaluation-of-summarization-tasks/
Introducing Eli5: How Facebook is Tackling Long-Form Question-Answering Conversations
https://towardsdatascience.com/introducing-eli5-how-facebook-is-tackling-long-form-question-answering-conversations-4f8e59374717
Pythia’s Documentation
https://learnpythia.readthedocs.io/en/latest/

Converting with Comedy: Research Parallels for CROConverting with Comedy: Research Parallels for CRO
Converting with Comedy: Research Parallels for CROVWO
 
Ppt regarding of Digital Marketing cours
Ppt regarding of Digital Marketing coursPpt regarding of Digital Marketing cours
Ppt regarding of Digital Marketing courstegveersingh09
 
2024 Google SERP Features: New Strategies To Gain Visibility
2024 Google SERP Features: New Strategies To Gain Visibility2024 Google SERP Features: New Strategies To Gain Visibility
2024 Google SERP Features: New Strategies To Gain VisibilitySearch Engine Journal
 
Unifying feature management with experiments - Server Side Webinar (1).pdf
Unifying feature management with experiments - Server Side Webinar (1).pdfUnifying feature management with experiments - Server Side Webinar (1).pdf
Unifying feature management with experiments - Server Side Webinar (1).pdfVWO
 
Ice Cream Brand Harmony Study - TINT Emotional Profiling Research
Ice Cream Brand Harmony Study - TINT Emotional Profiling ResearchIce Cream Brand Harmony Study - TINT Emotional Profiling Research
Ice Cream Brand Harmony Study - TINT Emotional Profiling ResearchTINT Marketing
 
ToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptx
ToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptxToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptx
ToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptxivanrazine1
 
Friends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptx
Friends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptxFriends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptx
Friends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptxGregory Edwards
 
Elevate Your Design Skills: Enroll in Pune's Premier UI/UX Design Course
Elevate Your Design Skills: Enroll in Pune's Premier UI/UX Design CourseElevate Your Design Skills: Enroll in Pune's Premier UI/UX Design Course
Elevate Your Design Skills: Enroll in Pune's Premier UI/UX Design Courseamirshaikhv21realtyp
 

Kürzlich hochgeladen (20)

Crafting High-Converting eCommerce Landing Pages
Crafting High-Converting eCommerce Landing PagesCrafting High-Converting eCommerce Landing Pages
Crafting High-Converting eCommerce Landing Pages
 
A navigation of two creative processes Study
A navigation of two creative processes StudyA navigation of two creative processes Study
A navigation of two creative processes Study
 
The best Crypto Marketing Strategies pdf
The best Crypto Marketing Strategies pdfThe best Crypto Marketing Strategies pdf
The best Crypto Marketing Strategies pdf
 
Digital Marketing Analytics: Driving Hotel Success (2016 May report)
Digital Marketing Analytics: Driving Hotel Success (2016 May report)Digital Marketing Analytics: Driving Hotel Success (2016 May report)
Digital Marketing Analytics: Driving Hotel Success (2016 May report)
 
Increase Your Website Sales & Leads Webinar
Increase Your Website Sales & Leads WebinarIncrease Your Website Sales & Leads Webinar
Increase Your Website Sales & Leads Webinar
 
SVETLANA YONCHEVA Evolution of digital marketing.pdf
SVETLANA YONCHEVA Evolution of digital marketing.pdfSVETLANA YONCHEVA Evolution of digital marketing.pdf
SVETLANA YONCHEVA Evolution of digital marketing.pdf
 
Podvertise.fm - Founder.University - Pitch Deck 2024
Podvertise.fm - Founder.University - Pitch Deck 2024Podvertise.fm - Founder.University - Pitch Deck 2024
Podvertise.fm - Founder.University - Pitch Deck 2024
 
Podvertise.fm - Podcast Advertising Marketplace - Startup Pitch Deck
Podvertise.fm - Podcast Advertising Marketplace - Startup Pitch DeckPodvertise.fm - Podcast Advertising Marketplace - Startup Pitch Deck
Podvertise.fm - Podcast Advertising Marketplace - Startup Pitch Deck
 
Run more experiments with fewer resources
Run more experiments with fewer resourcesRun more experiments with fewer resources
Run more experiments with fewer resources
 
A_B Testing Personalized Meditation Recommendations.pdf
A_B Testing Personalized Meditation Recommendations.pdfA_B Testing Personalized Meditation Recommendations.pdf
A_B Testing Personalized Meditation Recommendations.pdf
 
Snapshot of Consumer Behaviors of February 2024-EOLiSurvey (EN).pdf
Snapshot of Consumer Behaviors of February 2024-EOLiSurvey (EN).pdfSnapshot of Consumer Behaviors of February 2024-EOLiSurvey (EN).pdf
Snapshot of Consumer Behaviors of February 2024-EOLiSurvey (EN).pdf
 
Product Demo: HubSpot's Coolest AI Tools for B2B Tech Companies
Product Demo: HubSpot's Coolest AI Tools for B2B Tech CompaniesProduct Demo: HubSpot's Coolest AI Tools for B2B Tech Companies
Product Demo: HubSpot's Coolest AI Tools for B2B Tech Companies
 
Converting with Comedy: Research Parallels for CRO
Converting with Comedy: Research Parallels for CROConverting with Comedy: Research Parallels for CRO
Converting with Comedy: Research Parallels for CRO
 
Ppt regarding of Digital Marketing cours
Ppt regarding of Digital Marketing coursPpt regarding of Digital Marketing cours
Ppt regarding of Digital Marketing cours
 
2024 Google SERP Features: New Strategies To Gain Visibility
2024 Google SERP Features: New Strategies To Gain Visibility2024 Google SERP Features: New Strategies To Gain Visibility
2024 Google SERP Features: New Strategies To Gain Visibility
 
Unifying feature management with experiments - Server Side Webinar (1).pdf
Unifying feature management with experiments - Server Side Webinar (1).pdfUnifying feature management with experiments - Server Side Webinar (1).pdf
Unifying feature management with experiments - Server Side Webinar (1).pdf
 
Ice Cream Brand Harmony Study - TINT Emotional Profiling Research
Ice Cream Brand Harmony Study - TINT Emotional Profiling ResearchIce Cream Brand Harmony Study - TINT Emotional Profiling Research
Ice Cream Brand Harmony Study - TINT Emotional Profiling Research
 
ToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptx
ToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptxToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptx
ToShare_UG 13_03_24_Full_BelgianTrailblazerCommunity.pptx
 
Friends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptx
Friends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptxFriends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptx
Friends of Search '24 - Scaling SEO_ Lessons for All Types of Sites.pptx
 
Elevate Your Design Skills: Enroll in Pune's Premier UI/UX Design Course
Elevate Your Design Skills: Enroll in Pune's Premier UI/UX Design CourseElevate Your Design Skills: Enroll in Pune's Premier UI/UX Design Course
Elevate Your Design Skills: Enroll in Pune's Premier UI/UX Design Course
 

Scaling automated quality text generation for enterprise sites

Speaker notes

  1. Writing quality content and metadata at scale is a big problem for most enterprise sites. In this webinar we are going to explore what is possible given the latest advances in deep learning and natural language processing. Our main focus will be generating metadata (titles, meta descriptions, H1s, etc.) that is critical for technical SEO performance. But we will cover full article generation as well.
  2–3. I will also cover the concepts you need to understand to get practical value out of these advanced techniques.
  4. I love the site Papers with Code. It has a clearly organized and frequently updated list of the latest deep learning papers that include code to reproduce their results.
  5. Feel free to browse the SOTA (state of the art) section, which lists many of the best papers. That is where we found several of the examples we will be reviewing today.
  6. When it comes to generating quality metadata for web publisher sites, we are mostly talking about article pages. Let’s explore two examples: one with a lot of text, and another with very little text.
  7. When a page has a lot of text and we need to generate metadata for it, the most appropriate approach is text summarization. There are two types of automated text summarization techniques: extractive and abstractive. Extractive summarization copies the most relevant sentences from the text, while abstractive summarization generates new sentences.
  8. The paper I used for this example is Fine-tune BERT for Extractive Summarization by Yang Liu. Not only did he share the code needed to reproduce the paper's results, he also emailed me a trained model when I asked. I found his paper on the Papers with Code website. We will walk step by step through putting this code to practical use for our problem.
  9–17. You can copy my notebook and follow my steps.
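Note 1 lists titles, meta descriptions and H1s as the metadata to generate. As a minimal sketch of the final step, assuming you already have a few summary sentences for a page, the helper below joins them into a meta description and trims it at a word boundary. The function name `make_meta_description` and the 155-character budget are hypothetical choices for illustration. Snippet length in search results is not a fixed character count, so treat the limit as a rough guideline.

```python
import re


def make_meta_description(sentences: list[str], max_chars: int = 155) -> str:
    """Join summary sentences into a meta description, cut at a word boundary.

    Hypothetical helper; max_chars=155 is an assumed budget, not a standard.
    """
    text = " ".join(s.strip() for s in sentences if s.strip())
    text = re.sub(r"\s+", " ", text)  # collapse stray whitespace and newlines
    if len(text) <= max_chars:
        return text
    # Reserve one character for the ellipsis, then back off to the last space
    # so we never cut a word in half.
    cut = text[: max_chars - 1]
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip(" ,;:") + "…"
```

In practice you would feed it the sentences chosen by a summarizer and review the output before publishing.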
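Note 6 splits article pages into two cases: pages with a lot of text and pages with very little. One way to triage a crawl is to route each page by word count. Long pages go to summarization, and short ones go to a generation approach such as question answering or article generation. The sketch below assumes the plain body text has already been extracted. `choose_strategy` and the 300-word threshold are hypothetical and should be tuned on your own pages.

```python
def choose_strategy(body_text: str, min_words: int = 300) -> str:
    """Route an article page to a text-generation strategy by its length.

    Hypothetical helper; min_words=300 is an assumed threshold.
    """
    word_count = len(body_text.split())
    if word_count >= min_words:
        return "summarization"  # enough text to summarize into metadata
    return "generation"  # too little text; needs QA or article generation
```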
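Note 7 distinguishes extractive summarization, which copies the most relevant sentences, from abstractive summarization, which writes new ones. To make the extractive idea concrete, here is a deliberately simple baseline that scores sentences by average word frequency. This is a classic frequency heuristic, not the BERT-based method from note 8. The tiny stopword list and the regex sentence splitter are simplifying assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; use a fuller list in practice).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "it", "as", "by", "be"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, kept in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    top = sorted(ranked[:k])  # restore document order for readability
    return [sentences[i] for i in top]
```

Replacing `score` with the per-sentence outputs of a trained model is, roughly, where a learned extractive summarizer differs from this baseline.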
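Note 8 refers to Yang Liu's paper on fine-tuning BERT for extractive summarization. The paper scores each sentence with a fine-tuned BERT model, ranks the sentences by score, and applies trigram blocking at prediction time to reduce redundancy: a candidate is skipped if it shares a trigram with a sentence already selected. The sketch below covers only that selection step. It assumes the per-sentence scores already exist (they are passed in by hand here). It is not the paper's code, and the function names are hypothetical.

```python
def trigrams(sentence: str) -> set[tuple[str, ...]]:
    # Lowercased, whitespace-tokenized word trigrams of one sentence.
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def select_with_trigram_blocking(sentences: list[str], scores: list[float],
                                 k: int = 3) -> list[str]:
    """Pick up to k sentences by descending score, skipping any sentence that
    shares a trigram with one already selected; return them in document order."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen: list[int] = []
    seen: set[tuple[str, ...]] = set()
    for i in order:
        tri = trigrams(sentences[i])
        if tri & seen:
            continue  # redundant with an already selected sentence
        chosen.append(i)
        seen |= tri
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]
```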