Natural Language Processing—An Introduction
Colleen M. Farrelly, Staticlysm
Brief bio –
Colleen M. Farrelly is a machine learning scientist whose expertise includes
supervised learning, unsupervised learning, psychometrics, topological data
analysis, and natural language processing. She has an analytics book in review
that touches upon the analysis of text data with topological data analysis tools.
Introduction
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Text Data and Applications
• What do all of these have in common?
  • Clinical case notes
  • Chatbot conversations
  • Client email interactions
  • Court case summaries/transcripts
  • Published research articles
  • Tweets
  • Voice recordings
Text Data and Applications
• Commonalities
  • Text data
  • Contain potentially informative features for predicting an outcome or categorizing data
  • May contain information not available in structured datasets
  • Linguistic insight on the speaker/writer
Example
Legal
• Imagine both the witness and the robber in these two examples.
• How might these observations impact the outcome of a police investigation?
• Statement 1:
  • She pulled the gun, took the money, and ran.
• Statement 2:
  • The petite blonde pulled a shotgun on the clerk at station 2, filled a bag with cash from the register, and absconded with the money and a handful of pens.
• How many suspects might the police have to stop to find Bonnie and Clyde? Which witness statement might have more impact on a jury?
• How might differences in clinical case notes by clinicians inform health outcome models? How might they reflect on the individual clinician?
Making Sense of Text Data
• Natural language processing (NLP)
  • Collection of tools to parse human language into something understandable by algorithms
  • What is said
• Computational linguistics
  • Deriving insight about human behavior or traits based on text data
  • How it’s said
Common NLP Tools
An Overview
Parsing Documents/Sentences
An Example
• Tokens (words or punctuation)
• Punctuation (non-word tokens)
• Stop words (less important words)
• Root words (stemming/lemmatizing)
Bonnie hopped into Clyde’s new car.
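The four parsing steps above can be sketched on the example sentence using only the Python standard library. This is a minimal illustration, not how NLTK or spaCy implement these steps: the stop-word list and the `toy_stem` function are hypothetical stand-ins for the much larger word lists and proper stemmers or lemmatizers those libraries provide.

```python
import re

SENTENCE = "Bonnie hopped into Clyde's new car."

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "into", "of", "and", "to", "in"}


def tokenize(text):
    """Split text into word tokens (possessives kept whole) and punctuation tokens."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def is_punctuation(token):
    """A token with no letters or digits is treated as punctuation."""
    return not any(ch.isalnum() for ch in token)


def toy_stem(word):
    """Very rough suffix stripping, for illustration only (not a real stemmer)."""
    w = word.lower()
    if w.endswith("'s"):
        w = w[:-2]  # drop the possessive marker
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling after "ing"/"ed": "hopp" -> "hop"
            if suffix != "s" and w[-1] == w[-2] and w[-1] not in "aeiouls":
                w = w[:-1]
            break
    return w


tokens = tokenize(SENTENCE)
punctuation = [t for t in tokens if is_punctuation(t)]
words = [t for t in tokens if not is_punctuation(t)]
content_words = [w for w in words if w.lower() not in STOP_WORDS]
roots = [toy_stem(w) for w in content_words]

print(tokens)         # all tokens, words and punctuation
print(punctuation)    # non-word tokens
print(content_words)  # words left after removing stop words
print(roots)          # rough root forms
```

With this toy setup, "into" is the only stop word removed, and "hopped" and "Clyde's" reduce to "hop" and "clyde".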
Tagging Features
• Parts of speech
• Clauses
• Grammatical relations
• Entity recognition
Bonnie hopped into Clyde’s new car.
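As a rough illustration of part-of-speech tagging and entity recognition on the same sentence, the sketch below uses a dictionary lookup with a few fallback rules. Everything in it is an assumption made for the example: the mini-lexicon, the tag names, and the capitalization heuristic are hypothetical, and production taggers are trained statistical models rather than hand-written rules. Clauses and grammatical relations need a parser and are not covered here.

```python
import re

SENTENCE = "Bonnie hopped into Clyde's new car."

# Hypothetical mini-lexicon; a real tagger learns this from annotated text.
LEXICON = {"into": "ADP", "new": "ADJ", "car": "NOUN", "the": "DET"}


def tag_token(token):
    """Assign a coarse part-of-speech tag using lookup plus fallback rules."""
    if not any(ch.isalnum() for ch in token):
        return "PUNCT"
    lower = token.lower()
    if lower in LEXICON:
        return LEXICON[lower]
    if token[0].isupper():
        # Capitalized and unknown: guess a proper noun (unreliable sentence-initially).
        return "PROPN"
    if lower.endswith("ed"):
        return "VERB"
    return "NOUN"


tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", SENTENCE)
tagged = [(tok, tag_token(tok)) for tok in tokens]

# Candidate named entities: proper nouns with any possessive marker removed.
entities = [
    tok[:-2] if tok.endswith("'s") else tok
    for tok, tag in tagged
    if tag == "PROPN"
]

for tok, tag in tagged:
    print(f"{tok:10s} {tag}")
print("Candidate entities:", entities)
```

The heuristic finds "Bonnie" and "Clyde" as candidate entities but cannot say that they are people; assigning an entity type is what a trained recognizer adds.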
Deriving Sentiment
• Language-dependent
• Sentiment dictionaries
  • Positive/negative/neutral (AFINN, for instance)
  • Emotion groups from psychological models
Bonnie hopped into Clyde’s new car.
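Dictionary-based sentiment scoring can be sketched as a lookup and a sum. The word scores below are invented for illustration in the style of an integer-valued sentiment word list; they are not taken from AFINN or any published dictionary, and the second and third example sentences are made up.

```python
import re

# Hypothetical valence scores (negative = unpleasant, positive = pleasant).
# These are NOT real AFINN values.
TOY_LEXICON = {"great": 3, "happy": 3, "slow": -1, "angry": -3, "terrible": -3}


def score_text(text):
    """Sum the scores of known words and map the total to a coarse label."""
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(TOY_LEXICON[w] for w in words if w in TOY_LEXICON)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return total, label


examples = [
    "Bonnie hopped into Clyde's new car.",
    "The clerk was angry about the terrible service.",
    "Great car, happy driver!",
]
for sentence in examples:
    print(score_text(sentence), sentence)
```

Because the lookup is tied to English words, a different word list is needed for each language, which is the language dependence noted above.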
Vectorizing/Summarizing Results
• Many options for turning NLP results into data usable by machine learning and statistical tools:
  • Vectorization
  • Word frequency matrices
  • Summary tables
Bonnie hopped into Clyde’s new car.
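A word frequency matrix has one row per document and one column per vocabulary term. The sketch below builds one with the standard library; the second document is invented so that the matrix has more than one row.

```python
import re
from collections import Counter

docs = [
    "Bonnie hopped into Clyde's new car.",
    "Clyde parked the new car.",  # hypothetical second document
]


def tokenize(text):
    """Lowercase word tokens, keeping possessives attached."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


# One Counter of term frequencies per document.
counts = [Counter(tokenize(d)) for d in docs]

# Shared vocabulary defines the column order.
vocabulary = sorted(set().union(*counts))

# Rows = documents, columns = vocabulary terms, cells = raw counts.
matrix = [[c[term] for term in vocabulary] for c in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

Note that "clyde" and "clyde's" end up in separate columns, which is one reason stemming or lemmatizing is usually applied before vectorizing.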
Using Statistical Tools to Understand NLP
An Overview
Summary Statistics
• Common summary statistic uses
  1. Conversation length (example: engagement metric)
  2. Swear count (example: escalation marker)
  3. Conversation sentiment over time (example: engagement and satisfaction)
  4. Keyword frequency (example: products with most issues)
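The four summary statistics above can be computed from a tokenized conversation in a few lines. The transcript, sentiment scores, swear-word list, and product keywords below are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical chat transcript as (speaker, message) turns.
conversation = [
    ("client", "My new router is great."),
    ("agent", "Glad to hear it. Anything else?"),
    ("client", "The modem is terrible and the router app is slow."),
    ("client", "Damn, the modem dropped again."),
]

TOY_SENTIMENT = {"great": 3, "glad": 2, "terrible": -3, "slow": -1}  # illustrative scores
SWEAR_WORDS = {"damn"}          # assumed list
KEYWORDS = {"router", "modem"}  # assumed product names


def words(text):
    return re.findall(r"[a-z']+", text.lower())


turn_words = [words(msg) for _, msg in conversation]

# 1. Conversation length, in turns and in words
length_in_turns = len(conversation)
length_in_words = sum(len(ws) for ws in turn_words)

# 2. Swear count
swear_count = sum(w in SWEAR_WORDS for ws in turn_words for w in ws)

# 3. Sentiment over time: one score per turn, in order
sentiment_by_turn = [sum(TOY_SENTIMENT.get(w, 0) for w in ws) for ws in turn_words]

# 4. Keyword frequency
keyword_counts = Counter(w for ws in turn_words for w in ws if w in KEYWORDS)

print(length_in_turns, length_in_words, swear_count)
print(sentiment_by_turn)
print(dict(keyword_counts))
```

In this toy transcript the per-turn sentiment drops from positive to negative by the third turn, the kind of trend a satisfaction metric would track.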
Use as Machine Learning Features
• Examples combining NLP data with data from structured databases
  1. Clustering (example: types of churn from client feedback and account data)
  2. Predictive modeling (example: patient outcomes from case notes and medical records)
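Combining text-derived features with structured fields usually comes down to placing both in one feature row per record. The sketch below shows that step only, for the churn example; the client records, field names, and terms of interest are hypothetical.

```python
import re

# Hypothetical client records: structured account fields plus free-text feedback.
clients = [
    {"id": 1, "tenure_months": 3, "monthly_fee": 40.0,
     "feedback": "Too expensive and slow support."},
    {"id": 2, "tenure_months": 36, "monthly_fee": 25.0,
     "feedback": "Moving abroad, service was fine."},
]

TEXT_TERMS = ["expensive", "slow", "moving"]  # assumed terms of interest


def text_features(text):
    """Count how often each term of interest appears in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [words.count(term) for term in TEXT_TERMS]


def feature_row(record):
    """Concatenate structured fields with text-derived counts."""
    structured = [float(record["tenure_months"]), float(record["monthly_fee"])]
    return structured + [float(x) for x in text_features(record["feedback"])]


feature_names = ["tenure_months", "monthly_fee"] + [f"count_{t}" for t in TEXT_TERMS]
X = [feature_row(r) for r in clients]

print(feature_names)
for row in X:
    print(row)
```

A matrix like `X` can then be passed to a clustering or predictive model; because the columns sit on very different scales, they would normally be standardized first.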
Psychometric Applications
• Some published papers:
  1. Personality trait identification in industrial psychology research
  2. Author identification in plagiarism software
  3. Quantification of release risk in justice systems
  4. Quantification of relapse risk in mental health applications
Other Uses of NLP
Other Common NLP Applications
• Chatbots
• Personal assistants
• Translation services
• Sentence completion
In General
Useful References/Software
Main NLP Software Options
• NLTK (Python)
• spaCy (Python)
• Stanford CoreNLP (Java)
• John Snow Labs/Spark NLP (Spark)
Some NLP Literature
• Dunnmon, J. A., Ratner, A. J., Saab, K., Khandwala, N., Markert, M., Sagreiya, H., ... & Ré, C. (2020). Cross-modal data programming enables rapid medical machine learning. Patterns, 100019.
• Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 142-150).
• Pennebaker, J. W. (2011). The secret life of pronouns. New Scientist, 211(2828), 42-45.
• Polsley, S., Jhunjhunwala, P., & Huang, R. (2016, December). CaseSummarizer: A system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations (pp. 258-262).
• Velupillai, S., Suominen, H., Liakata, M., Roberts, A., Shah, A. D., Morley, K., ... & Chapman, W. (2018). Using clinical natural language processing for health outcomes research: Overview and actionable suggestions for future advances. Journal of Biomedical Informatics, 88, 11-19.
Thank you!
Contact Information
cfarrelly@med.miami.edu
SAS Global 2021 Introduction to Natural Language Processing

SAS Global 2021 Introduction to Natural Language Processing

  • 2. Natural Language Processing: An Introduction
      Colleen M. Farrelly, Staticlysm
      Brief bio: Colleen M. Farrelly is a machine learning scientist whose expertise includes supervised learning, unsupervised learning, psychometrics, topological data analysis, and natural language processing. She has an analytics book in review that touches upon the analysis of text data with topological data analysis tools.
  • 4. Text Data and Applications
      • What do all of these have in common?
          • Clinical case notes
          • Chatbot conversations
          • Client email interactions
          • Court case summaries/transcripts
          • Published research articles
          • Tweets
          • Voice recordings
      SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
  • 5. Text Data and Applications
      • Commonalities
          • Text data
          • Contain potentially informative features for predicting an outcome or categorizing data
          • May contain information not available in structured datasets
          • Linguistic insight on the speaker/writer
  • 6. Example: Legal
      • Imagine both the witness and the robber in these two examples.
      • How might these observations impact the outcome of a police investigation?
      • Statement 1:
          • She pulled the gun, took the money, and ran.
      • Statement 2:
          • The petite blonde pulled a shotgun on the clerk at station 2, filled a bag with cash from the register, and absconded with the money and a handful of pens.
      • How many suspects might the police have to stop to find Bonnie and Clyde? Which witness statement might have more impact on a jury?
      • How might differences in clinical case notes by clinicians inform health outcome models? How might they reflect on the individual clinician?
  • 7. Making Sense of Text Data
      • Natural language processing (NLP)
          • Collection of tools to parse human language into something understandable by algorithms
          • What is said
      • Computational linguistics
          • Deriving insight about human behavior or traits based on text data
          • How it’s said
  • 9. Parsing Documents/Sentences: An Example
      • Tokens (words or punctuation)
      • Punctuation (non-word tokens)
      • Stop words (less important words)
      • Root words (stemming/lemmatizing)
      Example sentence: Bonnie hopped into Clyde’s new car.
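The parsing steps on this slide can be sketched in plain Python. This is only a minimal illustration using the standard library: the stop-word list is a tiny hypothetical one, and `crude_stem` is a naive suffix stripper standing in for a real stemmer or lemmatizer such as those shipped with NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# real NLP libraries ship much larger, language-specific lists.
STOP_WORDS = {"a", "an", "the", "into", "of", "and", "to", "in"}

def tokenize(text):
    """Split text into word tokens (keeping possessives whole) and punctuation tokens."""
    return re.findall(r"[A-Za-z]+(?:['\u2019][A-Za-z]+)?|[^\w\s]", text)

def is_punctuation(token):
    """A token with no letters or digits is treated as punctuation."""
    return not any(ch.isalnum() for ch in token)

def crude_stem(word):
    """Naive root-word extraction: lowercase, drop a possessive, strip a trailing 'ed'."""
    word = word.lower()
    word = re.sub(r"['\u2019]s$", "", word)          # clyde's -> clyde
    if word.endswith("ed") and len(word) > 4:
        word = word[:-2]                             # hopped -> hopp
        if len(word) > 2 and word[-1] == word[-2]:
            word = word[:-1]                         # hopp -> hop
    return word

sentence = "Bonnie hopped into Clyde's new car."
tokens = tokenize(sentence)
words = [t for t in tokens if not is_punctuation(t)]
content_words = [w for w in words if w.lower() not in STOP_WORDS]
roots = [crude_stem(w) for w in content_words]

print(tokens)  # all tokens, including the final period
print(roots)   # root forms of the non-stop words
```

Each stage narrows the sentence: punctuation is filtered first, then stop words, and only the remaining content words are reduced to root forms.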
  • 10. Tagging Features
      • Parts of speech
      • Clauses
      • Grammatical relations
      • Entity recognition
      Example sentence: Bonnie hopped into Clyde’s new car.
  • 11. Deriving Sentiment
      • Language-dependent
      • Sentiment dictionaries
          • Positive/negative/neutral (AFINN, for instance)
          • Emotion groups from psychological models
      Example sentence: Bonnie hopped into Clyde’s new car.
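Dictionary-based sentiment scoring can be sketched as a lookup and a sum. The lexicon below is a hypothetical mini-dictionary in the AFINN style (integer scores from negative to positive); its entries and scores are invented for illustration and are not copied from AFINN itself.

```python
import re

# Hypothetical AFINN-style lexicon: word -> integer score (assumed values).
LEXICON = {
    "love": 3, "great": 3, "happy": 3,
    "bad": -3, "terrible": -3, "stolen": -2,
}

def sentiment_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of the words in text; unknown words contribute 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def sentiment_label(score):
    """Map a numeric score onto positive/negative/neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for text in [
    "Bonnie hopped into Clyde's new car.",
    "The clerk had a terrible, bad day.",
    "I love this great car!",
]:
    score = sentiment_score(text)
    print(text, score, sentiment_label(score))
```

Because the scores come from an English word list, the same code gives meaningless results on other languages, which is one reason sentiment analysis is language-dependent.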
  • 12. Vectorizing/Summarizing Results
      • Many options for turning NLP results into usable data in machine learning and statistical tools:
          • Vectorization
          • Word frequency matrices
          • Summary tables
      Example sentence: Bonnie hopped into Clyde’s new car.
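A word frequency matrix can be built with the standard library alone. In this minimal sketch the two documents are invented, and the tokenizer is a simple lowercase letter matcher rather than a full NLP pipeline.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequency_matrix(documents):
    """Return (vocabulary, matrix) with one row per document and one column per term."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, matrix

# Hypothetical example documents.
docs = [
    "Bonnie hopped into the car.",
    "Clyde drove the car; the car was new.",
]
vocabulary, matrix = term_frequency_matrix(docs)

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a fixed-length numeric vector, so it can be passed to a statistical or machine learning tool in the same way as a row from a structured table.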
  • 13. Using Statistical Tools to Understand NLP: An Overview
  • 14. Summary Statistics
      • Common summary statistic uses
          1. Conversation length (example: engagement metric)
          2. Swear count (example: escalation marker)
          3. Conversation sentiment over time (example: engagement and satisfaction)
          4. Key word frequency (example: products with most issues)
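Three of these summary statistics (conversation length, a flagged-word count, and key word frequency) can be sketched as follows. The conversation, the product keyword set, and the flagged-term set are all hypothetical and exist only for this illustration.

```python
import re
from collections import Counter

# Hypothetical support chat as (speaker, message) pairs, invented for illustration.
conversation = [
    ("client", "My router keeps dropping the connection."),
    ("agent", "Sorry to hear that. Have you restarted the router?"),
    ("client", "Yes, twice. The damn router is still broken."),
]

KEYWORDS = {"router", "modem", "billing"}  # hypothetical product terms
FLAGGED = {"damn"}                         # hypothetical escalation terms

def words(text):
    """Lowercase the text and extract word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(conversation):
    """Compute length, flagged-word, and keyword statistics for one conversation."""
    all_words = [w for _, message in conversation for w in words(message)]
    counts = Counter(all_words)
    return {
        "turns": len(conversation),
        "word_count": len(all_words),
        "flagged_count": sum(counts[w] for w in FLAGGED),
        "keyword_frequency": {k: counts[k] for k in sorted(KEYWORDS) if counts[k]},
    }

stats = summarize(conversation)
print(stats)
```

Running `summarize` over many conversations gives one row of numbers per conversation, which can then be tracked over time or compared across products.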
  • 15. Use as Machine Learning Features
      • Examples combining NLP data with data from structured databases
          1. Clustering (example: types of churn from client feedback and account data)
          2. Predictive modeling (example: patient outcomes from case notes and medical records)
  • 16. Psychometric Applications
      • Some published papers:
          1. Personality trait identification in industrial psychology research
          2. Author identification in plagiarism software
          3. Quantification of release risk in justice systems
          4. Quantification of relapse risk in mental health applications
  • 18. Other Common NLP Applications
      • Chatbots
      • Personal assistants
      • Translation services
      • Sentence completion
  • 19. In General
  • 21. Main NLP Software Options
      • NLTK (Python)
      • spaCy (Python)
      • Stanford CoreNLP (Java)
      • John Snow Labs/Spark NLP (Spark)
  • 22. Some NLP Literature
      • Dunnmon, J. A., Ratner, A. J., Saab, K., Khandwala, N., Markert, M., Sagreiya, H., ... & Ré, C. (2020). Cross-modal data programming enables rapid medical machine learning. Patterns, 100019.
      • Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 142-150).
      • Pennebaker, J. W. (2011). The secret life of pronouns. New Scientist, 211(2828), 42-45.
      • Polsley, S., Jhunjhunwala, P., & Huang, R. (2016, December). Casesummarizer: a system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations (pp. 258-262).
      • Velupillai, S., Suominen, H., Liakata, M., Roberts, A., Shah, A. D., Morley, K., ... & Chapman, W. (2018). Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances. Journal of Biomedical Informatics, 88, 11-19.