Text Mining Using LDA with Context
Christoph Kling, Steffen Staab
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
Text Mining Using LDA with Context 2/68Steffen Staab
Text Mining Documents
Documents are
 PDFs, emails, tweets,
Flickr photo tags, CVs, ...
Documents consist of
 bag of words
 metadata
- author(s)
- timestamp
- geolocation
- publisher
- booktitle
- device
...
[Figure: example topics with top words: Chinese food (dimsum, duck, eggs, ...); Vegan food (vegan, tofu, ...); Breakfast (eggs, ham, ...)]
Objective:
Cluster, categorize,
& explain
Latent Dirichlet Allocation (LDA)
Document-topic distributions
Topic-word distributions
K topics
M documents
Each document m ∈ {1, ..., M} has length N_m
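The quantities listed above can be tied together with a minimal sketch of the standard LDA generative process. It assumes symmetric Dirichlet priors; the names `alpha`, `beta`, `V` (vocabulary size) and `generate_corpus` are illustrative and do not come from the slides.

```python
import numpy as np

def generate_corpus(K, M, V, doc_lengths, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Topic-word distributions: one distribution over V words for each of K topics.
    phi = rng.dirichlet(np.full(V, beta), size=K)      # shape (K, V)
    # Document-topic distributions: one distribution over K topics per document.
    theta = rng.dirichlet(np.full(K, alpha), size=M)   # shape (M, K)
    docs = []
    for m in range(M):
        # For each of the N_m word positions: draw a topic, then a word from it.
        z = rng.choice(K, size=doc_lengths[m], p=theta[m])
        words = np.array([rng.choice(V, p=phi[k]) for k in z], dtype=int)
        docs.append(words)
    return phi, theta, docs

phi, theta, docs = generate_corpus(K=3, M=4, V=10, doc_lengths=[5, 8, 6, 7])
```

Inference runs in the opposite direction: only `docs` is observed, and `phi` and `theta` have to be estimated.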
Use Metadata to Help Topic Prediction
 Improve topic detection
→ Morning times may help to improve the breakfast topic
 Describe dependencies: metadata ↔ topics
→ breakfast topic happens
during morning hours
 Usage
 Autocompletion
→ From words to words
 Prediction of search queries
→ From metadata to words
→ From words to metadata
Structures of Metadata Spaces
 Nominal
 Ordinal
 Cyclic
 Spherical
 Networked
[Figure: example network with nodes Nejdl, Staab, Kling]
Challenges for Using Metadata for Text Mining
 Generalizing the Text Mining Model
Creating a special text mining model for every dataset and its
particular kinds of metadata spaces is impractical
→ we need flexible models!
 Efficiency of the Text Mining Model
Rich metadata
→ complex models
→ complex inference, slow convergence of samplers
→ analysis of big datasets impossible
 Explaining the Result
Importance of Metadata
→ learn how to weight metadata
→ exclude irrelevant metadata (improves efficiency!)
Complex dependencies & complex probability functions
→ Learned parameters incomprehensible
→ Reduced usefulness for data analysis / visualisation
→ No sanity checks on parameters
Topic Models for Arbitrary Metadata
 Predict document-topic distributions using metadata
→ Gaussian Process Regression Topic Model
(Agovic & Banerjee, 2012)
→ Dirichlet-Multinomial Regression Topic Model
(Mimno & McCallum, 2012)
→ Structural Topic Model (logistic normal regression)
(Roberts et al., 2013)
Regression input: Metadata
Regression output: Topic distribution
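As a hedged illustration of this regression direction, the sketch below follows the Dirichlet-multinomial regression idea of mapping a metadata feature vector to a Dirichlet prior over topics through a log-linear function. The feature encoding, the topic labels and the weight matrix `lam` are invented for illustration and are not taken from the slides.

```python
import numpy as np

def dmr_prior(x, lam):
    """Log-linear map from metadata features x (F,) to Dirichlet parameters (K,)."""
    return np.exp(x @ lam)

def expected_topic_distribution(x, lam):
    """Mean of that Dirichlet prior, i.e. the predicted document-topic distribution."""
    alpha = dmr_prior(x, lam)
    return alpha / alpha.sum()

# Hypothetical features: [bias, is_morning]; hypothetical topics: [breakfast, dinner].
lam = np.array([[0.0, 0.0],
                [1.5, -0.5]])
morning = expected_topic_distribution(np.array([1.0, 1.0]), lam)
evening = expected_topic_distribution(np.array([1.0, 0.0]), lam)
```

With these made-up weights the morning feature shifts prior mass towards the first topic, which is the kind of dependency the breakfast example describes.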
[Figures: graphical models for Dirichlet-multinomial regression, Gaussian process regression, and logistic normal regression; in each, the metadata feed into the document-topic distributions]
Topic Models for Arbitrary Metadata
 Alternating inference:
 Estimate topics
 Estimate regression model
 Use prediction for re-estimating topics
 Re-estimate regression model with new topics
 ...
Topic Models for Arbitrary Metadata
 Applicable to a wide range of metadata!
 Estimation of regression parameters relatively expensive
 Learned parameters have no natural interpretation
 Alternating process of parameter estimation is expensive
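The alternating scheme (estimate topics, estimate the regression model, use its prediction when re-estimating topics) can be sketched on toy data as follows. This is a simplified stand-in, not any of the cited models: the topic-word matrix `phi` is held fixed, topics are estimated by a few EM steps, and the regression is an ordinary least-squares fit on log proportions. All function names and numbers are assumptions.

```python
import numpy as np

def estimate_topics(counts, phi, alpha, n_iter=20):
    """EM for document-topic proportions with a fixed topic-word matrix phi (K, V).
    alpha (M, K) holds per-document pseudo-counts predicted from metadata."""
    M, K = alpha.shape
    theta = np.full((M, K), 1.0 / K)
    for _ in range(n_iter):
        # Responsibility of each topic for each (document, word) pair.
        resp = theta[:, :, None] * phi[None, :, :]            # (M, K, V)
        resp /= resp.sum(axis=1, keepdims=True)
        expected = (resp * counts[:, None, :]).sum(axis=2)    # (M, K)
        theta = expected + alpha
        theta /= theta.sum(axis=1, keepdims=True)
    return theta

def fit_regression(X, theta):
    """Least-squares fit of log topic proportions on metadata features."""
    W, *_ = np.linalg.lstsq(X, np.log(theta), rcond=None)
    return W

def predict_prior(X, W, strength=5.0):
    """Softmax of the regression output, scaled to pseudo-counts."""
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return strength * p / p.sum(axis=1, keepdims=True)

def alternating_inference(counts, phi, X, n_rounds=5):
    alpha = np.ones((counts.shape[0], phi.shape[0]))   # uninformative start
    for _ in range(n_rounds):
        theta = estimate_topics(counts, phi, alpha)    # estimate topics
        W = fit_regression(X, theta)                   # estimate regression model
        alpha = predict_prior(X, W)                    # prediction feeds next round
    return theta, W

# Toy data: 2 topics, 3 words, 4 documents, features [bias, is_morning].
phi = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
counts = np.array([[8, 2, 0], [7, 2, 1], [0, 2, 8], [1, 2, 7]])
X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
theta, W = alternating_inference(counts, phi, X)
```

Each round repeats both estimation steps in full, which gives a feel for why the alternation is listed as a cost.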
Topic Models for Arbitrary Metadata
 Dirichlet-multinomial and logistic-normal regression do not
support complex input data
(e.g. geographical data, temporal cycles, …)
 Gaussian process regression topic models are very
powerful with the right kernel function
...but require expert knowledge for kernel selection and
efficient inference!
Hierarchical Multi-Dirichlet Process Topic Models: The Idea
Topic Prediction
[Figure: topic probability plotted against metadata (e.g. time) for documents, e.g. emails]
[Figures: the same data fitted by Dirichlet-multinomial regression, by Gaussian process regression, and by cluster-based prediction]
Idea
 Two-step model:
1) Cluster similar documents
2) Learn topics for clusters and documents simultaneously
▪ Learn topic distributions of document clusters
▪ Use cluster-topic distributions for topic prediction
Performance, Complex Metadata
 Cluster documents separately for each type of metadata
+ nominal, ordinal, cyclic, spherical data
+ any data which can be clustered!
Mixture of Metadata Predictions
 Metadata clusters are associated with topics
[Figure: metadata clusters linked to topics; labels shown: German Beer, Party]
 The topic prediction for a single document is a mixture of
the predictions of its metadata clusters
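The mixture described above can be written down directly. In this minimal sketch the cluster-topic distributions and mixing weights are fixed, made-up numbers (in the model they are learned), and the function name `predict_topics` is an assumption.

```python
import numpy as np

def predict_topics(cluster_ids, cluster_topics, weights):
    """Mix the topic distributions of the clusters a document belongs to.

    cluster_ids:    {metadata name: index of the document's cluster}
    cluster_topics: {metadata name: array (n_clusters, K) of topic distributions}
    weights:        {metadata name: mixing weight}, summing to 1
    """
    return sum(weights[name] * cluster_topics[name][cluster_ids[name]]
               for name in cluster_ids)

# Hypothetical topics: [beer, party, work]; two clusters per metadata type.
cluster_topics = {
    "time":  np.array([[0.1, 0.1, 0.8], [0.4, 0.5, 0.1]]),
    "place": np.array([[0.05, 0.05, 0.9], [0.6, 0.3, 0.1]]),
}
weights = {"time": 0.3, "place": 0.7}
prediction = predict_topics({"time": 1, "place": 1}, cluster_topics, weights)
```

Because every component is a probability distribution and the weights sum to one, the result is again a distribution over topics.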
Smoothing of HMDP
Cluster-Based Prediction vs. Outliers and Noisy Data
[Figure: topic probability plotted against metadata (e.g. time)]
Adjacency Smoothing
 Naive approach: Smoothed value of a cluster is the mean
of the cluster and its adjacent clusters
 Repeat n times
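The naive approach above is easy to sketch. The ring adjacency used here is an assumed example for cyclic metadata such as hours of the day; the function names are illustrative.

```python
import numpy as np

def adjacency_smooth(values, neighbors, n=1):
    """Replace each cluster's value by the mean of itself and its adjacent clusters.

    values:    array (C, K), one row of topic probabilities per cluster
    neighbors: list of C lists with the indices of adjacent clusters
    n:         number of smoothing passes
    """
    values = np.asarray(values, dtype=float)
    for _ in range(n):
        values = np.stack([
            values[[c] + list(neighbors[c])].mean(axis=0)
            for c in range(len(values))
        ])
    return values

def ring_neighbors(C):
    """Adjacency for cyclic metadata: every cluster touches its two ring neighbours."""
    return [[(c - 1) % C, (c + 1) % C] for c in range(C)]

# One topic that initially appears only in cluster 0 of a 4-cluster cycle.
smoothed = adjacency_smooth([[1.0], [0.0], [0.0], [0.0]], ring_neighbors(4), n=1)
```

After one pass the probability has spread to the two neighbours of cluster 0; more passes spread it further, so `n` controls the smoothing strength.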
Smoothing topics associated with metadata clusters
 Documents receive topics from their own and neighboring
metadata clusters
Performance, Complex Metadata
 Smooth topics associated with metadata clusters, for each structure of metadata space:
 Nominal
 Ordinal
 Cyclic
 Spherical
 Networked
Smoothing
 Smoothing strength is learned during inference
Similar clusters → stronger smoothing
Dissimilar clusters → weaker smoothing
 Alternatively, the smoothing strength can be predefined by the user
Metadata Weighting in HMDPs
Feature Weighting
 One variable, η, governs the influence of a metadata cluster on
documents
 If η < threshold, ignore the metadata variable.
Metadata Weighting
 Importance of metadata is learned during inference,
answering the question:
What percentage of the topics is explained by a given
type of metadata? (e.g. time, geographical coordinates, ...)
→ Interpretable parameter!
 Metadata with a low weight can be removed during
inference
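A minimal sketch of threshold-based removal, assuming the weights are shares that sum to one. The threshold value, the example weights and the renormalisation of the remaining weights are assumptions made for illustration; the slides do not specify them.

```python
def prune_metadata(weights, threshold=0.05):
    """Drop metadata whose weight is below the threshold, renormalise the rest."""
    kept = {name: w for name, w in weights.items() if w >= threshold}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {name: w / total for name, w in kept.items()}

# Hypothetical learned weights (share of topics explained by each metadata type).
weights = {"timeline": 0.50, "mailing_list": 0.38,
           "daily_cycle": 0.10, "yearly_cycle": 0.02}
pruned = prune_metadata(weights)
```

Dropping a metadata type also drops its clusters from later inference steps, which is where the efficiency gain comes from.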
Example Application
Dataset
 Linux kernel mailing list
3,400,000 emails with timestamps and mailing list ID
 Timeline
 Yearly cycle
 Weekly cycle
 Daily cycle
 Mailing list
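One way to derive these five metadata spaces from an email's timestamp and list ID is sketched below. The granularity (calendar months, weekdays, hours) and the list ID `"linux-kernel"` are assumptions; the slides do not state how the clusters were formed.

```python
from datetime import datetime, timezone

def metadata_clusters(ts, list_id):
    """Map one email's metadata to a cluster per metadata space."""
    return {
        "timeline": (ts.year, ts.month),   # ordinal: one cluster per calendar month
        "yearly_cycle": ts.month - 1,      # cyclic: 12 clusters
        "weekly_cycle": ts.weekday(),      # cyclic: 7 clusters, Monday = 0
        "daily_cycle": ts.hour,            # cyclic: 24 clusters
        "mailing_list": list_id,           # nominal: one cluster per list
    }

example = metadata_clusters(
    datetime(2015, 3, 14, 9, 26, tzinfo=timezone.utc), "linux-kernel")
```

The same timestamp thus lands in one cluster of the linear timeline and in one cluster of each cycle, so a document can inherit topics from all of them.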
Topics
[Figures: learned topics; details not recoverable from the extraction]
 Professional topics: [figure]
 Hobbyist topics: [figure]
 Metadata weighting: [figure]
Metadata with a low weight can be removed during inference
Efficient Inference in HMDP
Hierarchical Multi-Dirichlet Process Topic Model (HMDP)
[Figure: graphical model of the HMDP, with metadata, cluster-topic distributions, and document-topic distributions]
Inference: nearly completely collapsed!
We only need to learn
 Global topic distribution
 Topic assignments to words
 Dirichlet parameters
Approximations:
 Variational
 Practical
 Stochastic
→ low memory consumption
→ online inference
Parameters of HMDP
 Cluster-topic distributions:
How many documents of a cluster contain topic x?
 Metadata-weights
How many of the topics of documents are explained
by metadata x?
 Dirichlet process scaling parameters
How many pseudo-counts do we add to the topic
distributions?
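The pseudo-count reading of the scaling parameter can be illustrated with a finite Dirichlet stand-in for the Dirichlet process; the slides give no formula, so the function below is an assumption. Here `parent` plays the role of the higher-level (e.g. cluster) topic distribution and `scale` is the number of pseudo-counts it contributes.

```python
import numpy as np

def smoothed_topic_distribution(topic_counts, parent, scale):
    """Posterior mean under a Dirichlet prior with mean `parent`
    and a total of `scale` pseudo-counts."""
    topic_counts = np.asarray(topic_counts, dtype=float)
    parent = np.asarray(parent, dtype=float)
    return (topic_counts + scale * parent) / (topic_counts.sum() + scale)

# A document with 4 topic assignments, smoothed by 4 pseudo-counts from its parent.
estimate = smoothed_topic_distribution([3, 1, 0], parent=[0.2, 0.3, 0.5], scale=4.0)
```

A larger `scale` pulls the document's distribution towards the parent distribution; a smaller one lets the document's own counts dominate.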
Properties of HMDP
 Interpretable parameters
 Simultaneous inference of topics and metadata-topic
dependencies
 Efficient online inference
Comparison of Topic Models for Arbitrary Metadata
Comparison
 Gaussian Process Topic Model
The “perfect” model:
 Can cope with arbitrary metadata
 Models dependencies between metadata
 Parameter learning is very expensive
 Kernel selection and inference require expert knowledge
 Parameters of Gaussian processes hard to interpret
Comparison
 Dirichlet-Multinomial Regression Topic Model
The “straightforward” model:
 Can cope with many metadata
 Parameter learning is cheaper than for Gaussian
processes but still expensive (due to alternating inference
and repeated distance calculations)
 Cannot cope with complex metadata
(e.g. geographical, cyclic, ...)
 Does not model dependencies between metadata
 Regression weights of Dirichlet-multinomial regression
hard to interpret
Comparison
 Hierarchical Multi-Dirichlet Process Topic Model
The “fast” model:
 Can cope with arbitrary metadata
 Fast inference (simultaneously for topics and topic
predictions)
 All parameters have natural interpretations as probabilities
or pseudo-counts
 Requires a (simple) pre-clustering of documents
 Does not model dependencies between metadata
THANK YOU FOR YOUR
ATTENTION!

 
Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Get Your Hands Dirty with Intel® Distribution for Python*
Get Your Hands Dirty with Intel® Distribution for Python*Get Your Hands Dirty with Intel® Distribution for Python*
Get Your Hands Dirty with Intel® Distribution for Python*
 
Sagemaker built_in algorithems.pptx
Sagemaker built_in algorithems.pptxSagemaker built_in algorithems.pptx
Sagemaker built_in algorithems.pptx
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 

More from Steffen Staab

Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureSteffen Staab
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSteffen Staab
 
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...Steffen Staab
 
Web Futures: Inclusive, Intelligent, Sustainable
Web Futures: Inclusive, Intelligent, SustainableWeb Futures: Inclusive, Intelligent, Sustainable
Web Futures: Inclusive, Intelligent, SustainableSteffen Staab
 
Concepts in Application Context ( How we may think conceptually )
Concepts in Application Context ( How we may think conceptually )Concepts in Application Context ( How we may think conceptually )
Concepts in Application Context ( How we may think conceptually )Steffen Staab
 
Storing and Querying Semantic Data in the Cloud
Storing and Querying Semantic Data in the CloudStoring and Querying Semantic Data in the Cloud
Storing and Querying Semantic Data in the CloudSteffen Staab
 
Ontologien und Semantic Web - Impulsvortrag Terminologietag
Ontologien und Semantic Web - Impulsvortrag TerminologietagOntologien und Semantic Web - Impulsvortrag Terminologietag
Ontologien und Semantic Web - Impulsvortrag TerminologietagSteffen Staab
 
Opinion Formation and Spreading
Opinion Formation and SpreadingOpinion Formation and Spreading
Opinion Formation and SpreadingSteffen Staab
 
10 Jahre Web Science
10 Jahre Web Science10 Jahre Web Science
10 Jahre Web ScienceSteffen Staab
 
Wwsss intro2016-final
Wwsss intro2016-finalWwsss intro2016-final
Wwsss intro2016-finalSteffen Staab
 
10 Years Web Science
10 Years Web Science10 Years Web Science
10 Years Web ScienceSteffen Staab
 
Semantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and PracticesSemantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and PracticesSteffen Staab
 
Closing Session ISWC 2015
Closing Session ISWC 2015Closing Session ISWC 2015
Closing Session ISWC 2015Steffen Staab
 
ISWC2015 Opening Session
ISWC2015 Opening SessionISWC2015 Opening Session
ISWC2015 Opening SessionSteffen Staab
 
Bias in the Social Web
Bias in the Social WebBias in the Social Web
Bias in the Social WebSteffen Staab
 
Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data Steffen Staab
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySteffen Staab
 

More from Steffen Staab (20)

Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sure
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine Learning
 
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
 
Web Futures: Inclusive, Intelligent, Sustainable
Web Futures: Inclusive, Intelligent, SustainableWeb Futures: Inclusive, Intelligent, Sustainable
Web Futures: Inclusive, Intelligent, Sustainable
 
Eyeing the Web
Eyeing the WebEyeing the Web
Eyeing the Web
 
Concepts in Application Context ( How we may think conceptually )
Concepts in Application Context ( How we may think conceptually )Concepts in Application Context ( How we may think conceptually )
Concepts in Application Context ( How we may think conceptually )
 
Storing and Querying Semantic Data in the Cloud
Storing and Querying Semantic Data in the CloudStoring and Querying Semantic Data in the Cloud
Storing and Querying Semantic Data in the Cloud
 
Semantics reloaded
Semantics reloadedSemantics reloaded
Semantics reloaded
 
Ontologien und Semantic Web - Impulsvortrag Terminologietag
Ontologien und Semantic Web - Impulsvortrag TerminologietagOntologien und Semantic Web - Impulsvortrag Terminologietag
Ontologien und Semantic Web - Impulsvortrag Terminologietag
 
Opinion Formation and Spreading
Opinion Formation and SpreadingOpinion Formation and Spreading
Opinion Formation and Spreading
 
The Web We Want
The Web We WantThe Web We Want
The Web We Want
 
10 Jahre Web Science
10 Jahre Web Science10 Jahre Web Science
10 Jahre Web Science
 
Wwsss intro2016-final
Wwsss intro2016-finalWwsss intro2016-final
Wwsss intro2016-final
 
10 Years Web Science
10 Years Web Science10 Years Web Science
10 Years Web Science
 
Semantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and PracticesSemantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and Practices
 
Closing Session ISWC 2015
Closing Session ISWC 2015Closing Session ISWC 2015
Closing Session ISWC 2015
 
ISWC2015 Opening Session
ISWC2015 Opening SessionISWC2015 Opening Session
ISWC2015 Opening Session
 
Bias in the Social Web
Bias in the Social WebBias in the Social Web
Bias in the Social Web
 
Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuity
 

Recently uploaded

Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Sérgio Sacani
 
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaEGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaDr.Mahmoud Abbas
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGSoniaBajaj10
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsDanielBaumann11
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxzeus70441
 
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasBACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasChayanika Das
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfSubhamKumar3239
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfGABYFIORELAMALPARTID1
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasChayanika Das
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsMarkus Roggen
 

Recently uploaded (20)

Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaEGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UG
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptx
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasBACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdf
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
 

Text Mining using LDA with Context

  • 1. Text Mining Using LDA with Context. Christoph Kling, Steffen Staab. Institute for Web Science and Technologies · University of Koblenz-Landau, Germany & Web and Internet Science Group · ECS · University of Southampton, UK
  • 2. Text Mining Documents
      Documents are PDFs, emails, tweets, Flickr photo tags, CVs, ...
      Documents consist of
      - bag of words
      - metadata: author(s), timestamp, geolocation, publisher, booktitle, device, ...
      [Figure: example topics. Chinese food: dimsum, duck, eggs, ...; Vegan food: vegan, tofu, ...; Breakfast: eggs, ham, ...]
      Objective: cluster, categorize, & explain
  • 3. Latent Dirichlet Allocation (LDA)
  • 4. Latent Dirichlet Allocation (LDA)
      - Document-topic distributions
      - Topic-word distributions
      - K topics
      - M documents
      - Each document m (of M) has length N_m
  • 5. Use Metadata to Help Topic Prediction
      - Improve topic detection
        → Morning times may help to improve the breakfast topic
      - Describe dependencies: metadata ↔ topics
        → breakfast topic happens during morning hours
      [Figure: example topics Chinese food, Vegan food, Breakfast]
  • 6. Use Metadata to Help Topic Prediction
      - Improve topic detection
        → Morning times may help to improve the breakfast topic
      - Describe dependencies: metadata ↔ topics
        → breakfast topic happens during morning hours
      - Usage
        - Autocompletion
          → From words to words
        - Prediction of search queries
          → From metadata to words
          → From words to metadata
      [Figure: example topics Chinese food, Vegan food, Breakfast]
  • 7. Structures of Metadata Spaces
      - Nominal
      - Ordinal
      - Cyclic
      - Spherical
      - Networked
      [Figure labels: Nejdl, Staab, Kling]
  • 8. Challenges for Using Metadata for Text Mining
      - Generalizing the Text Mining Model
        Creating a special text mining model for every dataset with its kind of metadata spaces is impractical
        → we need flexible models!
  • 9. Challenges for Using Metadata for Text Mining
      - Generalizing the Text Mining Model
      - Efficiency of the Text Mining Model
        Rich metadata
        → complex models
        → complex inference, slow convergence of samplers
        → analysis of big datasets impossible
  • 10. Challenges for Using Metadata for Text Mining
      - Generalizing the Text Mining Model
      - Efficiency of the Text Mining Model
      - Explaining the Result
        Importance of metadata
        → learn how to weight metadata
        → exclude irrelevant metadata (improves efficiency!)
        Complex dependencies & complex probability functions
        → Learned parameters incomprehensible
        → Reduced usefulness for data analysis / visualisation
        → No sanity checks on parameters
  • 11. Topic Models for Arbitrary Metadata
  • 12. Topic Models for Arbitrary Metadata
      - Predict document-topic distributions using metadata
        → Gaussian Process Regression Topic Model (Agovic & Banerjee, 2012)
        → Dirichlet-Multinomial Regression Topic Model (Mimno & McCallum, 2012)
        → Structural Topic Model (logistic normal regression) (Roberts et al., 2013)
  • 13. Topic Models for Arbitrary Metadata
      - Predict document-topic distributions using metadata
        → Gaussian Process Regression Topic Model
        → Dirichlet-Multinomial Regression Topic Model
        → Structural Topic Model (logistic normal regression)
      Regression input: metadata
      Regression output: topic distribution
  • 14. Topic Models for Arbitrary Metadata
      Dirichlet-multinomial regression: metadata → document-topic distributions
  • 15. Topic Models for Arbitrary Metadata
      Gaussian process regression: metadata → document-topic distributions
  • 16. Topic Models for Arbitrary Metadata
      Logistic normal regression: metadata → document-topic distributions
  • 17. Topic Models for Arbitrary Metadata
      - Alternating inference:
        - Estimate topics
        - Estimate regression model
        - Use prediction for re-estimating topics
        - Re-estimate regression model with new topics
        - ...
  • 18. Topic Models for Arbitrary Metadata
      - Alternating inference:
        - Estimate topics
        - Estimate regression model
        - Use prediction for re-estimating topics
        - Re-estimate regression model with new topics
        - ...
  • 19. Topic Models for Arbitrary Metadata
      - Applicable to a wide range of metadata!
      - Estimation of regression parameters is relatively expensive
      - Learned parameters have no natural interpretation
      - Alternating process of parameter estimation is expensive
  • 20. Topic Models for Arbitrary Metadata
      - Dirichlet-multinomial and logistic-normal regression do not support complex input data (e.g. geographical data, temporal cycles, …)
      - Gaussian process regression topic models are very powerful with the right kernel function
        ...but require expert knowledge for kernel selection and efficient inference!
  • 21. Hierarchical Multi-Dirichlet Process Topic Models: The Idea
  • 22. Topic Prediction
      [Plot: topic probability vs. metadata (e.g. time); data points are documents, e.g. emails]
  • 23. Dirichlet-Multinomial Regression
      [Plot: topic probability vs. metadata (e.g. time)]
  • 24. Gaussian Process Regression
      [Plot: topic probability vs. metadata (e.g. time)]
  • 25. Cluster-Based Prediction
      [Plot: topic probability vs. metadata (e.g. time)]
  • 26. Cluster-Based Prediction
      [Plot: topic probability vs. metadata (e.g. time)]
  • 27. Cluster-Based Prediction
      [Plot: topic probability vs. metadata (e.g. time), several panels]
  • 28. Cluster-Based Prediction
      [Plot: topic probability vs. metadata (e.g. time), several panels]
  • 29. Idea
      - Two-step model:
        1) Cluster similar documents
        2) Learn topics for clusters and documents simultaneously
           ▪ Learn topic distributions of document clusters
           ▪ Use cluster-topic distributions for topic prediction
  • 30. Performance, Complex Metadata
      - Cluster documents for each metadata
  • 31. Performance, Complex Metadata
      - Cluster documents for each metadata
  • 32. Performance, Complex Metadata
      - Cluster documents for each metadata
        + nominal, ordinal, cyclic, spherical data
        + any data which can be clustered!
  • 33. Performance, Complex Metadata
      - Metadata clusters are associated with topics
      [Figure labels: German Beer Party]
  • 34. Mixture of Metadata Predictions
      - Metadata clusters are associated with topics
      [Figure labels: German Beer Party]
      - The topic prediction for a single document is a mixture of the predictions of its metadata clusters
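The mixture on slide 34 can be illustrated with a minimal sketch. It is not the HMDP inference itself (which works with Dirichlet processes); it only shows a document's topic prediction as a weighted average of the topic distributions of its metadata clusters. The function name `mix_predictions`, the example distributions, and the weights are hypothetical.

```python
def mix_predictions(cluster_topic_dists, weights):
    """Weighted average of the topic distributions of a document's metadata clusters.

    cluster_topic_dists: one topic distribution (list of probabilities) per metadata.
    weights: one non-negative weight per metadata (hypothetical values).
    """
    if len(cluster_topic_dists) != len(weights):
        raise ValueError("need exactly one weight per cluster distribution")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    num_topics = len(cluster_topic_dists[0])
    mixed = [0.0] * num_topics
    for dist, weight in zip(cluster_topic_dists, weights):
        for k, prob in enumerate(dist):
            mixed[k] += (weight / total) * prob
    return mixed


# Hypothetical document with two metadata: a time cluster and a location cluster.
time_cluster = [0.7, 0.2, 0.1]
geo_cluster = [0.1, 0.1, 0.8]
prediction = mix_predictions([time_cluster, geo_cluster], weights=[0.5, 0.5])
```

Because each input is a probability distribution and the weights are normalised, the result is again a distribution over topics.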
  • 35. Smoothing of HMDP
  • 36. Cluster-Based Prediction vs Outliers and noisy data
      [Plot: topic probability vs. metadata (e.g. time)]
  • 37. Adjacency Smoothing
      - Naive approach: the smoothed value of a cluster is the mean of the cluster and its adjacent clusters
      - Repeat n times
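The naive adjacency smoothing of slide 37 can be sketched as follows. This is an illustrative sketch only: it assumes one scalar value per cluster (e.g. a topic probability) and an explicit neighbour list; the names `adjacency_smooth` and `cyclic_neighbors` are hypothetical, and the cyclic layout (e.g. hours of a day) is just one possible adjacency structure.

```python
def cyclic_neighbors(n):
    """Neighbour lists for n clusters arranged on a cycle (assumes n >= 3)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def adjacency_smooth(values, neighbors, n_iter=1):
    """Replace each cluster value by the mean of itself and its adjacent clusters.

    values: one number per cluster.
    neighbors: neighbors[i] lists the indices of the clusters adjacent to cluster i.
    n_iter: how many times the smoothing step is repeated.
    """
    current = list(values)
    for _ in range(n_iter):
        # The comprehension reads only the previous iteration's values.
        current = [
            (current[i] + sum(current[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
            for i in range(len(current))
        ]
    return current


# Four clusters on a cycle; only cluster 2 has observed the topic.
smoothed = adjacency_smooth([0.0, 0.0, 1.0, 0.0], cyclic_neighbors(4), n_iter=1)
```

After one step the isolated peak is shared with its two neighbours, which is the effect the slide describes for outliers and noisy clusters.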
  • 38. Smoothing topics associated with metadata clusters
      - Documents receive topics from their own and neighboring metadata clusters
  • 39. Performance, Complex Metadata
      - Smooth topics associated with metadata clusters
  • 40. Structures of metadata spaces
      - Nominal
      - Ordinal
      - Cyclic
      - Spherical
      - Networked
  • 41. Smoothing
      - Smoothing strength is learned during inference
        Similar clusters → stronger smoothing
        Dissimilar clusters → softer smoothing
      - Alternatively, the smoothing strength can be predefined by the user
  • 42. Metadata Weighting in HMDPs
  • 43. Feature Weighting
      - One variable η governs the influence of a metadata cluster on documents
      - If η < threshold, ignore the variable.
  • 44. Metadata Weighting
      - The importance of metadata is learned during inference, answering the question:
        What percentage of the topics is explained by a given metadata (e.g. time, geographical coordinates, ...)?
        → Interpretable parameter!
      - Metadata with a low weight can be removed during inference
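The thresholding rule from slides 43 and 44 can be sketched like this. The weight values, the threshold, and the function name `prune_metadata` are hypothetical, and renormalising the remaining weights is an assumption of this sketch rather than something stated on the slides.

```python
def prune_metadata(weights, threshold):
    """Drop metadata whose weight is below the threshold and renormalise the rest.

    weights: dict mapping a metadata name to its learned weight (eta).
    threshold: metadata with a weight below this value are ignored.
    """
    kept = {name: w for name, w in weights.items() if w >= threshold}
    total = sum(kept.values())
    if total == 0:
        return {}
    # Renormalisation is an assumption made for this illustration.
    return {name: w / total for name, w in kept.items()}


# Hypothetical learned weights for three metadata.
eta = {"timeline": 0.40, "daily_cycle": 0.02, "mailing_list": 0.58}
active = prune_metadata(eta, threshold=0.05)
```

Here `daily_cycle` falls below the threshold and is removed, so later inference steps would only consider the two remaining metadata.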
  • 45. Example Application
  • 46. Dataset
      - Linux kernel mailing list: 3,400,000 emails with timestamps and mailing list ID
  • 47. Dataset
      - Linux kernel mailing list: 3,400,000 emails with timestamps and mailing list ID
        - Timeline
        - Yearly cycle
        - Weekly cycle
        - Daily cycle
        - Mailing list
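As a rough illustration of how one timestamp can feed the four time-related metadata of slide 47, the sketch below assigns an email to one cluster per metadata. The binning (calendar year, month, weekday, hour, all in UTC) and the function name `time_clusters` are assumptions made for this example; the slides do not say how the clusters were actually formed.

```python
from datetime import datetime, timezone


def time_clusters(unix_timestamp):
    """Map a Unix timestamp to one cluster index per time-related metadata.

    The bins used here are hypothetical: year for the timeline, month for the
    yearly cycle, weekday (Monday = 0) for the weekly cycle, hour for the daily cycle.
    """
    dt = datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
    return {
        "timeline": dt.year,
        "yearly_cycle": dt.month - 1,
        "weekly_cycle": dt.weekday(),
        "daily_cycle": dt.hour,
    }


# 1970-01-01 00:00:00 UTC was a Thursday.
clusters = time_clusters(0)
```

The cyclic metadata (yearly, weekly, daily) wrap around, so the last and first bins of each cycle are adjacent, which matters for the adjacency smoothing described earlier.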
  • 48. Topics
  • 49. Topics
  • 50. Topics
      - Professional topics:
      - Hobbyist topics:
  • 51. Topics
      - Metadata weighting:
  • 52. Topics
      - Metadata weighting: can be removed during inference
  • 53. Efficient Inference in HMDP
  • 54. Hierarchical Multi-Dirichlet Process Topic Model (HMDP)
      [Figure: metadata, cluster-topic distributions, document-topic distributions]
  • 55. Hierarchical Multi-Dirichlet Process Topic Model (HMDP)
      Inference: nearly completely collapsed inference!
  • 56. Hierarchical Multi-Dirichlet Process Topic Model (HMDP)
      We only need to learn
      - Global topic distribution
      - Topic assignments to words
  • 57. Hierarchical Multi-Dirichlet Process Topic Model (HMDP)
      We only need to learn
      - Global topic distribution
      - Topic assignments to words
      - Dirichlet parameters
  • 58. Hierarchical Multi-Dirichlet Process Topic Model (HMDP)
      Approximations:
      - Variational
      - Practical
      - Stochastic
        → low memory consumption
        → online inference
  • 59. Parameters of HMDP
      - Cluster-topic distributions: How many documents of a cluster contain topic x?
  • 60. Parameters of HMDP
      - Cluster-topic distributions: How many documents of a cluster contain topic x?
      - Metadata weights: How many of the topics of documents are explained by metadata x?
  • 61. Parameters of HMDP
      - Cluster-topic distributions: How many documents of a cluster contain topic x?
      - Metadata weights: How many of the topics of documents are explained by metadata x?
      - Dirichlet process scaling parameters: How many pseudo-counts do we add to the topic distributions?
  • 62. Properties of HMDP
      - Interpretable parameters
      - Simultaneous inference of topics and metadata-topic dependencies
      - Efficient online inference
  • 63. Comparison of Topic Models for Arbitrary Metadata
  • 64. Comparison: Gaussian Process Topic Model
      The “perfect” model:
      - Can cope with arbitrary metadata
      - Models dependencies between metadata
      - Parameter learning is very expensive
      - Kernel selection and inference require expert knowledge
      - Parameters of Gaussian processes are hard to interpret
  • 65. Comparison: Multinomial Regression Topic Model
      The “straightforward” model:
      - Can cope with many metadata
      - Parameter learning is cheaper than for Gaussian processes but still expensive (due to alternating inference and repeated distance calculations)
      - Cannot cope with complex metadata (e.g. geographical, cyclic, ...)
      - Does not model dependencies between metadata
      - Regression weights of Dirichlet-multinomial regression are hard to interpret
  • 66. Comparison: Hierarchical Multi-Dirichlet Process Topic Model
      The “fast” model:
      - Can cope with arbitrary metadata
      - Fast inference (simultaneously for topics and topic predictions)
      - All parameters have natural interpretations as probabilities or pseudo-counts
      - Requires a (simple) pre-clustering of documents
      - Does not model dependencies between metadata
  • 67. THANK YOU FOR YOUR ATTENTION!