Slide 1: Prof. Ansgar Scherp – asc@informatik.uni-kiel.de
Knowledge Discovery in
Social Media and
Scientific Digital Libraries
Ansgar Scherp
Darmstadt, Feb 9, 2016
Thanks to: Chifumi Nishioka, Falk Böschen
Slide 2
KDD Social Media & Digital Libraries
How to deal with the vast amount of content related to
research and innovation?
“The ability to deal with digital information will be as important a
cultural technique as reading and writing.”
Slide 3
KDD Social Media & Digital Libraries
• Examples of current research
1. Classifying tweets
2. Automated subject indexing
3. Extracting text from scholarly figures
• Not covered today
– Schema extraction from Linked Open Data
– Analysis of the evolution of Linked Open Data
Slide 4
Classifying Tweets: Example
To what extent do different approaches to tweet classification
differ fundamentally?
Author’s hashtag: (here: none)
Human: #research #talk #darmstadt
Machine: #talk #socialmedia
(e.g., [Nishida et al. 12]) (e.g., [Ren et al. 14], [Yang et al. 14])
[NSD15] C. Nishioka, A. Scherp, and K. Dellschaft: Comparing Tweet Classifications by
Authors' Hashtags, Machine Learning, and Human Annotators, WI, Singapore, 2015.
Slide 5
Twitter Dataset: TREC Tweets2011
• Contains about 16 million tweets
• 10 randomly selected main topics, each with two subtopics
• Main topic: hashtag occurs at least 200 times

ID  Main topic    Subtopics
1   #health       #nutrition, #news
2   #apple        #iphone, #mac
3   #photography  #nature, #art
4   #green        #solar, #eco
5   #celebrity    #news, #gossip
6   #fashion      #news, #shoes
7   #fitness      #health, #exercise
8   #humor        #quotes, #funny
9   #quote        #love, #life
10  #travel       #lp, #tips

• 5 classes per topic
• Retrieved 3 tweets per class, i.e., 15 tweets per topic
• Task: classify tweets into groups
Slide 6
Method 1: Hashtag Classifier
• Assign classes to tweets by author’s hashtags
Class ‘#SpendingReview’
Class ‘#TurkeyDayTravel #travel’
Class ‘#TurkeyDayTravel’
Class ‘#travel’
• Multiple hashtags → treated as a single class
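The hashtag classifier can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the tweet texts are hypothetical, and lower-casing and sorting the hashtags (so that tag order and capitalization do not create separate classes) is an assumption.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#\w+")

def hashtag_classes(tweets):
    """Group tweet ids by the exact set of hashtags their author used."""
    classes = defaultdict(list)
    for tweet_id, text in tweets.items():
        # Assumption: hashtags are compared case-insensitively and as a set,
        # so '#travel #TurkeyDayTravel' equals '#TurkeyDayTravel #travel'.
        key = tuple(sorted({tag.lower() for tag in HASHTAG.findall(text)}))
        classes[key].append(tweet_id)
    return dict(classes)

# Hypothetical tweets using the hashtags shown on the slide.
tweets = {
    "t1": "Budget cuts announced #SpendingReview",
    "t2": "Airport queues #TurkeyDayTravel #travel",
    "t3": "On the road #TurkeyDayTravel",
    "t4": "Packing list #travel",
    "t5": "More delays #travel #TurkeyDayTravel",
}
classes = hashtag_classes(tweets)
```

A tweet with several hashtags lands in the class of the whole combination, so t2 and t5 share a class that is distinct from both '#travel' and '#TurkeyDayTravel'.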
Slide 7
Method 2: Machine Classifier
• Latent Dirichlet Allocation (LDA) to represent tweets
as probabilities over latent topics [Blei et al. 03]
• Construction of the model from TREC Tweets2011
– Train topic model over tweets aggregated per
Twitter user [Hong et al. 10]
– Infer probability distribution over topics for each of
the 15 tweets
• Cluster tweets using k-means
– # of clusters optimized by Hartigan’s index
and Average Silhouette [Kaufman et al. 05]
– Using cosine similarity as a distance measure
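As an illustration of the cluster-selection step, the average silhouette under cosine distance can be computed as below. This is a sketch in plain Python under stated assumptions: the topic distributions are hypothetical, the cluster labels would normally come from k-means, and at least two clusters are required.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def average_silhouette(vectors, labels):
    """Mean silhouette width over all points (needs >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the own cluster, b: to the nearest other cluster
        a = sum(cosine_distance(vectors[i], vectors[j]) for j in own) / len(own)
        b = min(
            sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)

# Hypothetical LDA topic distributions of four tweets over three topics.
vectors = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.05, 0.15, 0.8]]
good = average_silhouette(vectors, [0, 0, 1, 1])
bad = average_silhouette(vectors, [0, 1, 0, 1])
```

Running k-means for several values of k and keeping the k with the highest average silhouette is one way to pick the number of clusters; Hartigan's index would be computed separately.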
Slide 8
Method 3: Human Classifier
• Online experiment: asked 163 human annotators
to manually classify the 15 tweets of a topic

ID  Main topic    Subtopics           # Annotators
1   #health       #nutrition, #news   20
2   #apple        #iphone, #mac       18
3   #photography  #nature, #art       15
4   #green        #solar, #eco        14
5   #celebrity    #news, #gossip      15
6   #fashion      #news, #shoes       15
7   #fitness      #health, #exercise  18
8   #humor        #quotes, #funny     15
9   #quote        #love, #life        16
10  #travel       #lp, #tips          17
∑                                     163
Slide 9
Method 3: Human Classifier
• Annotators can create an arbitrary number of classes
and label them
• Annotators see each tweet’s text as well as screenshots
of the linked pages, but the hashtag sign ‘#’ is removed
[Figure: annotation interface; callout “Class label”]
Slide 10
Degree of Classifier Agreement
• Methods 1-3 produce groups of Tweets
• Compare groups with Cohen’s kappa [Fu et al. 2012]
• Convert classifications into match tables
– Elements in same group: 1
– Otherwise: 0
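• Example: tweets a, b, c, d, e are classified by Classifier 1 and
Classifier 2 into {a, b}, {c, d}, {e} and {a, b, c}, {d, e}
• Compare match tables using Cohen’s κ
• Example: match tables of Classifier 1 and Classifier 2 => Cohen’s κ

Classifier 1        Classifier 2
   a b c d             a b c d
b  1                b  1
c  0 0              c  1 1
d  0 0 1            d  0 0 0
e  0 0 0 0          e  0 0 0 1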
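The match-table comparison can be reproduced with a short sketch. The groupings below are read off the two match tables on this slide ({a, b}, {c, d}, {e} versus {a, b, c}, {d, e}); the function names are illustrative, and Cohen's kappa is computed in its standard two-rater form over the ten tweet pairs.

```python
from itertools import combinations

def match_table(groups):
    """Map each unordered item pair to 1 if both items share a group, else 0."""
    group_of = {item: idx for idx, group in enumerate(groups) for item in group}
    return {
        pair: int(group_of[pair[0]] == group_of[pair[1]])
        for pair in combinations(sorted(group_of), 2)
    }

def cohens_kappa(table_a, table_b):
    """Cohen's kappa of two binary match tables over the same pairs.

    Undefined (division by zero) if chance agreement is exactly 1.
    """
    pairs = sorted(table_a)
    n = len(pairs)
    observed = sum(table_a[p] == table_b[p] for p in pairs) / n
    ones_a = sum(table_a[p] for p in pairs) / n
    ones_b = sum(table_b[p] for p in pairs) / n
    expected = ones_a * ones_b + (1 - ones_a) * (1 - ones_b)
    return (observed - expected) / (1 - expected)

table_1 = match_table([{"a", "b"}, {"c", "d"}, {"e"}])
table_2 = match_table([{"a", "b", "c"}, {"d", "e"}])
kappa = cohens_kappa(table_1, table_2)
```

With these two tables, 6 of 10 pairs agree and chance agreement is 0.56, so this calculation gives a kappa of roughly 0.09.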
Slide 11
Agreements Between Classifiers
• Hashtag/Machine (HaM)
– Almost no agreement
– Exception: topic 3 “photography”, where 11 of 15 tweets
also use the hashtags as words in the text
• Hashtag/Human (HaHu)
– Slight agreement
• Machine/Human (MHu)
– Almost no agreement
– Exception: topic 10 “travel”, with agreement on the
disagreement for tweets having the hashtag “#tips”

ID   HaM    HaHu   MHu
1    -0.05  0.12   0.00
2    0.02   0.05   0.05
3    0.24   0.06   0.11
4    0.01   0.11   0.00
5    0.00   0.07   -0.04
6    0.00   0.15   0.04
7    0.04   0.09   0.05
8    -0.04  0.17   0.03
9    -0.02  0.13   0.00
10   0.01   0.10   0.45
M    0.02   0.10   0.07
SD   0.08   0.10   0.12
Slide 12
Inter-Human-Annotator Agreement
• Fleiss’ kappa: measures agreement
among more than two raters
• Consistently larger agreement
among human classifiers than for
HaHu and MHu
• Difference is significant

ID   Fleiss’ κ
1    0.17
2    0.10
3    0.13
4    0.16
5    0.53
6    0.20
7    0.14
8    0.31
9    0.33
10   0.38
M    0.25
SD   0.14
Researchers should use ground truth made by human 
annotators rather than hashtags for tweet classification
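Fleiss' kappa, used on this slide for agreement among the human annotators, can be sketched as follows. Treating every tweet pair as an item that each annotator rates as "different group" or "same group" is an assumption about how the match tables feed into the statistic, and the counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters putting item i in category j.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Agreement within each item, averaged over all items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)
    ]
    p_exp = sum(p * p for p in p_cats)
    return (p_bar - p_exp) / (1 - p_exp)

# Hypothetical example: 4 tweet pairs, 3 annotators,
# columns = [different group, same group].
counts = [[0, 3], [3, 0], [1, 2], [3, 0]]
kappa = fleiss_kappa(counts)
```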
Slide 13
Automated Subject Indexing
[GNS15] G. Große-Bölting, C. Nishioka, A. Scherp: A Comparison of Different
Strategies for Automated Semantic Document Annotation. K-CAP 2015
• Nomination for Best Paper Award at K-CAP 2015
• Award “Prof. Dr. Werner Petersen-Preis der Technik 2015”
STW (Standard Thesaurus Wirtschaft): Cancer (18899-3), Research (10436-6), USA (17829-1), …
Published as Linked Open Data!
Slide 14
Automated Subject Indexing
• Scientific search engine GERHARD (‘97-‘99)
• Ontology with ~10,000 classes in three languages
Slide 15
Experiment Framework
Each strategy is a composition of methods from 1. + 2. + 3.
1. Concept Extraction
detect concepts (candidate annotations) from each document
2. Concept Activation
compute a score for each concept of a document
3. Annotation Selection
select annotations from concepts for each document
4. Evaluation
measure performance of strategies with ground truth
Slide 16
Configurations
Concept Extraction: Entity, Tri-gram, RAKE, LDA
Concept Activation: Statistical Methods (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
Annotation Selection: Top-k (2 methods), kNN (1 method)
Slide 17
Configurations: Entity-based
24 strategies
Concept Extraction: Entity, Tri-gram, RAKE, LDA
Concept Activation: Statistical Methods (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
Annotation Selection: Top-k (2 methods), kNN (1 method)
… using a domain-specific taxonomy like STW
Slide 18
Concept Activation Methods
• Concept frequency: number of occurrences of a concept in a document
• CF-IDF as extension of the popular TF-IDF model,
replacing terms with concepts [Goossen et al. 11]
– IDF lowers the weight of concepts appearing in many
documents
• These methods do not actually “activate” anything …
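A minimal sketch of CF-IDF, assuming the plain analogue of TF-IDF: the raw count of a concept in a document multiplied by log(N / df), where df is the number of documents containing the concept. The exact normalization in [Goossen et al. 11] may differ, and the concept lists are hypothetical.

```python
import math
from collections import Counter

def cf_idf(doc_concepts):
    """Score concepts per document; doc_concepts is a list of concept lists."""
    n_docs = len(doc_concepts)
    # Document frequency: in how many documents does each concept occur?
    doc_freq = Counter(c for concepts in doc_concepts for c in set(concepts))
    scores = []
    for concepts in doc_concepts:
        concept_freq = Counter(concepts)
        scores.append({
            c: count * math.log(n_docs / doc_freq[c])
            for c, count in concept_freq.items()
        })
    return scores

# Hypothetical concepts detected in three documents.
docs = [
    ["Bank", "Bank", "Tax"],
    ["Bank", "Interest Rate"],
    ["Bank", "Tax", "Tax"],
]
scores = cf_idf(docs)
```

"Bank" occurs in every document, so its IDF factor is log(1) = 0 and it contributes nothing, while the rarer "Interest Rate" receives the highest weight per occurrence.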
Slide 19
Hierarchy-based Methods
• Reveal concepts that are not explicitly mentioned
by using a hierarchical knowledge base (KB)
• KBs are of high quality and freely available!
[Figure: example concept hierarchy with the nodes World Wide Web, Web Searching, Web Mining, Site Wrapping, Web Log Analysis, Social Recommendation, Social Tagging]
• Base Activation: defined over the set of child concepts of a
concept and a decay parameter
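Base activation can be sketched as a recursion over the hierarchy. The form used here is an assumption (the common spreading-activation scheme in which a concept's score is its own frequency plus the decayed sum of its children's scores), and the parent-child edges, frequencies, and decay value are hypothetical.

```python
def base_activation(concept, freq, children, decay):
    """Own frequency plus decay times the summed activation of all children."""
    child_sum = sum(
        base_activation(child, freq, children, decay)
        for child in children.get(concept, ())
    )
    return freq.get(concept, 0) + decay * child_sum

# Hypothetical hierarchy built from node names of the slide's figure.
children = {
    "World Wide Web": ["Web Searching", "Web Mining"],
    "Web Mining": ["Site Wrapping", "Web Log Analysis"],
}
# Hypothetical concept frequencies detected in one document.
freq = {"Web Searching": 2, "Site Wrapping": 1, "Web Log Analysis": 1}

web_mining = base_activation("Web Mining", freq, children, decay=0.5)
www = base_activation("World Wide Web", freq, children, decay=0.5)
```

Neither "Web Mining" nor "World Wide Web" occurs in the document, yet both receive a non-zero score from their descendants, which is how concepts that are not explicitly mentioned are revealed.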
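Slide 20
Hierarchy-based Methods
• One-hop activation
– Developed with domain experts at ZBW
– $C_d$: set of concepts detected in a document $d$
– Maximum activation distance: one hop

$$a_{\mathrm{1hop}}(c, d) = \begin{cases} \mathrm{freq}(c, d) + \lambda \cdot \sum_{c_i \in C_l(c)} \mathrm{freq}(c_i, d) & \text{if } |C_l(c) \cap C_d| \geq 2 \\ \mathrm{freq}(c, d) & \text{otherwise} \end{cases}$$

with $C_l(c)$ the set of child concepts of $c$ and $\lambda$ the decay parameter

Works very well … why?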
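One-hop activation can be sketched as below. The reading used here is an assumption: a concept receives the decayed frequencies of its direct children only when at least two of those children were detected in the document, and keeps its own frequency otherwise. The hierarchy, frequencies, and decay value are hypothetical.

```python
def one_hop_activation(concept, freq, children, decay):
    """Activate a concept from its direct children only (one hop)."""
    detected = {c for c, n in freq.items() if n > 0}
    hit_children = [ch for ch in children.get(concept, ()) if ch in detected]
    own = freq.get(concept, 0)
    # Assumption: the threshold is "at least two detected children".
    if len(hit_children) >= 2:
        return own + decay * sum(freq[ch] for ch in hit_children)
    return own

# Hypothetical hierarchy and concept frequencies for one document.
children = {
    "World Wide Web": ["Web Searching", "Web Mining"],
    "Web Mining": ["Site Wrapping", "Web Log Analysis"],
}
freq = {"Web Searching": 2, "Site Wrapping": 1, "Web Log Analysis": 3}

web_mining = one_hop_activation("Web Mining", freq, children, decay=0.5)
www = one_hop_activation("World Wide Web", freq, children, decay=0.5)
```

"Web Mining" has two detected children and is activated; "World Wide Web" has only one detected direct child, so it stays at its own frequency of zero, and nothing propagates beyond one hop.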
Slide 21
Graph-based Methods
• Represent concepts as co-occurrence graph
[Figure: example co-occurrence graph with the nodes Tax, Bank, Interest Rate, Financial Crisis, Central Bank]
• HITS for link analysis of web sites [Kleinberg 99],
with hub and authority scores each summed over a concept’s neighbors
• Degree as number of edges linked with a concept
[Zouaq et al. 12]
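Both graph-based scores can be sketched on a small co-occurrence graph. The edges below are hypothetical (only the node names come from the slide), and the HITS variant shown, alternating hub and authority updates with L2 normalization on an undirected graph, is an assumption rather than the authors' exact setup.

```python
import math

def degree(graph):
    """Number of edges attached to each concept."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def hits(graph, iterations=50):
    """Hub and authority scores for an undirected adjacency dict of sets."""
    hub = {n: 1.0 for n in graph}
    auth = {n: 1.0 for n in graph}
    for _ in range(iterations):
        # Authority: sum of the hub scores of all neighbors, then normalize.
        auth = {n: sum(hub[m] for m in graph[n]) for n in graph}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of the authority scores of all neighbors, then normalize.
        hub = {n: sum(auth[m] for m in graph[n]) for n in graph}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

# Hypothetical co-occurrence edges between the concepts named on the slide.
graph = {
    "Bank": {"Central Bank", "Interest Rate", "Financial Crisis"},
    "Central Bank": {"Bank", "Interest Rate"},
    "Interest Rate": {"Bank", "Central Bank"},
    "Financial Crisis": {"Bank", "Tax"},
    "Tax": {"Financial Crisis"},
}
degrees = degree(graph)
hub, auth = hits(graph)
```

On this graph both measures rank "Bank" first, since it has the most edges and sits in the densest part of the graph.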
Slide 22
Configurations: n-grams
15 strategies
Concept Extraction: Entity, Tri-gram, RAKE, LDA
Concept Activation: Statistical Method (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
Annotation Selection: Top-k (2 methods), kNN (1 method)
Slide 23
Configurations: RAKE [Rose et al. 10]
3 strategies
Concept Extraction: Entity, Tri-gram, RAKE, LDA
Concept Activation: Statistical Method (Frequency), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
Annotation Selection: Top-k (2 methods), kNN (1 method)
Slide 24
Configuration: LDA [Blei et al. 03]
43 strategies in total*
Concept Extraction: Entity, Tri-gram, RAKE, LDA
Concept Activation: Statistical Methods (Frequency), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
Annotation Selection: Top-k (2 methods), kNN (1 method)
Slide 25
Datasets: 3 Scientific Domains

                 Economics      Politics            Computer Science
Source           ZBW            FIV                 SemEval 2010
# documents      62,924         28,324              244
# annotations    5.26 (± 1.84)  12 (± 4.02)         5.05 (± 2.41)
Knowledge base   STW            European Thesaurus  ACM CCS
# entities       6,335          7,912               2,299
# labels         11,679         8,421               9,086

• Computer science dataset: SemEval 2010 [Kim et al. 10]
• Pre-processing of author keywords needed [Wang et al. 14]
• Total of ~100,000 scientific documents: largest so far!
Slide 26
Best Performing Configurations
Best strategy: Entity × HITS × kNN
Results reported per domain: economy, politics, computer
Close ones: OneHop as well as any other graph-based method
Concept Extraction: Entity, Tri-gram, RAKE, LDA
Concept Activation: Statistical Methods (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
Annotation Selection: Top-k (2 methods), kNN (1 method)
Slide 27
Text Extraction from Scholarly Figures
[BS15] F. Böschen, A. Scherp: Multi-oriented Text Extraction from Information
Graphics. DocEng 2015: 35-38
Pipeline: Binarization → Clustering → Extraction → OCR → Text
• Fully automated text extraction (TX) pipeline
• No assumptions, no training
• Novel combination of data mining (DM) & computer vision (CV)
[Figure: example bar chart “Preferred Operating System”, y-axis “Number of Users”; Windows (16), Linux (3), Mac (1); N = 20]
Slide 28
Challenges for Research
• Different font sizes
• … font colors
• … background colors
• … emphases
• Different angles
• Overlapping elements
Slide 29
121 Scholarly Figures in Economics
(from ZBW Open Access Corpus)
Current results: text recognition improves by
up to 30% over the baseline (BL)
Slide 30
Evaluation Setup
• How to match output (left) with gold standard (right)?

            Output: item 1     Gold standard: Item 1
Unigrams    {e, i, m, t, 1}    {e, m, t, I, 1}
Bigrams     {em, it, te}       {em, te, It}
Trigrams    {ite, tem}         {tem, Ite}
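The n-gram sets on this slide can be reproduced with a short sketch. Computing character n-grams per whitespace-separated token matches the sets shown (no n-gram spans the space in "item 1"); scoring by exact, case-sensitive set overlap with precision and recall is an assumption, since the slide leaves the matching question open.

```python
def char_ngrams(text, n):
    """Set of character n-grams, computed within each whitespace-separated token."""
    return {
        token[i:i + n]
        for token in text.split()
        for i in range(len(token) - n + 1)
    }

def precision_recall(output, gold, n):
    """Precision and recall of the output's n-gram set against the gold standard."""
    out, ref = char_ngrams(output, n), char_ngrams(gold, n)
    hits = len(out & ref)
    precision = hits / len(out) if out else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall

unigram_scores = precision_recall("item 1", "Item 1", 1)
bigram_scores = precision_recall("item 1", "Item 1", 2)
trigram_scores = precision_recall("item 1", "Item 1", 3)
```

The single case error ("i" versus "I") costs one of five unigrams but one of two trigrams, so longer n-grams penalize the same OCR mistake more heavily.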
Slide 31
Limits of Current Evaluation
• Baseline #1: OCR engine Tesseract (Google)
with layout analysis
• 1 pass per figure
• Baseline #2: OCR engine Tesseract (Google)
with layout analysis
• Multiple, angle-rotated passes
Comparison with related work: very difficult!
Slide 32
Evaluation: Orientation Distributions
Note: “horizontal” covers orientations within ±15° (Tesseract’s rotation tolerance)
Slide 33
Mockup: Use of TX in ZBW’s EconBiz
Slide 34
Summary: KDD in Social Media & DL
How to deal with the vast amount of content related to
research and innovation?
• H2020 INSO-4 project, duration: 04/2016-03/2019
• Platform with data mining and visualization tools for
enabling information professionals to deal with large
corpora of scientific content, data, social media
New

Weitere ähnliche Inhalte

Was ist angesagt?

Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Dmitry Kan
 
Kaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesKaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesJo-fai Chow
 
DataXDay - Exploring graphs: looking for communities & leaders
DataXDay - Exploring graphs: looking for communities & leadersDataXDay - Exploring graphs: looking for communities & leaders
DataXDay - Exploring graphs: looking for communities & leadersDataXDay Conference by Xebia
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataTravis Oliphant
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingSujit Pal
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Martin Junghanns
 

Was ist angesagt? (7)

Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
 
Kaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesKaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New Opportunities
 
DataXDay - Exploring graphs: looking for communities & leaders
DataXDay - Exploring graphs: looking for communities & leadersDataXDay - Exploring graphs: looking for communities & leaders
DataXDay - Exploring graphs: looking for communities & leaders
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language Processing
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
 

Andere mochten auch

The Scientific Method
The Scientific MethodThe Scientific Method
The Scientific Methodtscheuch
 
Socio Scientific Issues Introduction 2014
Socio Scientific Issues Introduction 2014Socio Scientific Issues Introduction 2014
Socio Scientific Issues Introduction 2014ngibellini
 
Moosa khokhar emperor penguin
Moosa khokhar emperor penguinMoosa khokhar emperor penguin
Moosa khokhar emperor penguinMrs Seo
 
Muscular System_SN.ppt
Muscular System_SN.pptMuscular System_SN.ppt
Muscular System_SN.pptShama
 
Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...
Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...
Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...Mikael Andersson
 
The importance of scientific literacy
The importance of scientific literacyThe importance of scientific literacy
The importance of scientific literacyTest Generator
 
Scientific method procedures (Teach)
Scientific method procedures (Teach)Scientific method procedures (Teach)
Scientific method procedures (Teach)Moira Whitehouse
 
B slide scientific explantion
B slide scientific explantionB slide scientific explantion
B slide scientific explantionAbraham Peled
 
Nutrition: Food, Nutrition and Health
Nutrition: Food, Nutrition and HealthNutrition: Food, Nutrition and Health
Nutrition: Food, Nutrition and HealthBates2ndQuarterLPN
 

Andere mochten auch (11)

The Scientific Method
The Scientific MethodThe Scientific Method
The Scientific Method
 
Socio Scientific Issues Introduction 2014
Socio Scientific Issues Introduction 2014Socio Scientific Issues Introduction 2014
Socio Scientific Issues Introduction 2014
 
Anterior Muscles
Anterior MusclesAnterior Muscles
Anterior Muscles
 
Moosa khokhar emperor penguin
Moosa khokhar emperor penguinMoosa khokhar emperor penguin
Moosa khokhar emperor penguin
 
Muscular System_SN.ppt
Muscular System_SN.pptMuscular System_SN.ppt
Muscular System_SN.ppt
 
Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...
Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...
Hidden survivalmuscle - Find the muscle that flatten youre belly and strength...
 
The importance of scientific literacy
The importance of scientific literacyThe importance of scientific literacy
The importance of scientific literacy
 
Feed for health: a best practice COST action
Feed for health: a best practice COST actionFeed for health: a best practice COST action
Feed for health: a best practice COST action
 
Scientific method procedures (Teach)
Scientific method procedures (Teach)Scientific method procedures (Teach)
Scientific method procedures (Teach)
 
B slide scientific explantion
B slide scientific explantionB slide scientific explantion
B slide scientific explantion
 
Nutrition: Food, Nutrition and Health
Nutrition: Food, Nutrition and HealthNutrition: Food, Nutrition and Health
Nutrition: Food, Nutrition and Health
 

Ähnlich wie Knowledge Discovery in Social Media and Scientific Digital Libraries

Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMOVING Project
 
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...MOVING Project
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...Angelo Salatino
 
Linked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar Scherp
Linked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar ScherpLinked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar Scherp
Linked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar ScherpADTELLIGENCE GmbH
 
2016 09-28 social network analysis with node-xl_emke
2016 09-28 social network analysis with node-xl_emke2016 09-28 social network analysis with node-xl_emke
2016 09-28 social network analysis with node-xl_emkeDr Martina Emke
 
Project MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AIProject MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AIbutest
 
Project MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AIProject MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AIbutest
 
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...Charalampos Chelmis
 
Chocolate Flavoured Data Science
Chocolate Flavoured Data ScienceChocolate Flavoured Data Science
Chocolate Flavoured Data ScienceThilo Stadelmann
 
Fcv acad ind_szeliski
Fcv acad ind_szeliskiFcv acad ind_szeliski
Fcv acad ind_szeliskizukun
 
Fcv acad ind_szeliski
Fcv acad ind_szeliskiFcv acad ind_szeliski
Fcv acad ind_szeliskizukun
 
Mohan C R CV
Mohan C R CVMohan C R CV
Mohan C R CVMOHAN C R
 
Entities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web SearchEntities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web SearcheXascale Infolab
 
BSc Computing CSY2026 Modern Networks Date of Issue .docx
BSc Computing  CSY2026 Modern Networks Date of Issue .docxBSc Computing  CSY2026 Modern Networks Date of Issue .docx
BSc Computing CSY2026 Modern Networks Date of Issue .docxAASTHA76
 
Apache Cassandra Lunch #50: Machine Learning with Spark + Cassandra
Apache Cassandra Lunch #50: Machine Learning with Spark + CassandraApache Cassandra Lunch #50: Machine Learning with Spark + Cassandra
Apache Cassandra Lunch #50: Machine Learning with Spark + CassandraAnant Corporation
 
Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldaelang
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 

Ähnlich wie Knowledge Discovery in Social Media and Scientific Digital Libraries (20)

Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open Data
 
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
 
Linked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar Scherp
Linked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar ScherpLinked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar Scherp
Linked Open Data & E-Commerce von Jun.-Prof. Dr. habil. Ansgar Scherp
 
Miso
MisoMiso
Miso
 
2016 09-28 social network analysis with node-xl_emke
2016 09-28 social network analysis with node-xl_emke2016 09-28 social network analysis with node-xl_emke
2016 09-28 social network analysis with node-xl_emke
 
Project MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AIProject MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AI
 
Project MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AIProject MLExAI: Machine Learning Experiences in AI
Project MLExAI: Machine Learning Experiences in AI
 
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
 
Chocolate Flavoured Data Science
Chocolate Flavoured Data ScienceChocolate Flavoured Data Science
Chocolate Flavoured Data Science
 
Fcv acad ind_szeliski
Fcv acad ind_szeliskiFcv acad ind_szeliski
Fcv acad ind_szeliski
 
Fcv acad ind_szeliski
Fcv acad ind_szeliskiFcv acad ind_szeliski
Fcv acad ind_szeliski
 
Mohan C R CV
Mohan C R CVMohan C R CV
Mohan C R CV
 
Entities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web SearchEntities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web Search
 
Slides ecir2016
Slides ecir2016Slides ecir2016
Slides ecir2016
 
BSc Computing CSY2026 Modern Networks Date of Issue .docx
BSc Computing  CSY2026 Modern Networks Date of Issue .docxBSc Computing  CSY2026 Modern Networks Date of Issue .docx
BSc Computing CSY2026 Modern Networks Date of Issue .docx
 
Apache Cassandra Lunch #50: Machine Learning with Spark + Cassandra
Apache Cassandra Lunch #50: Machine Learning with Spark + CassandraApache Cassandra Lunch #50: Machine Learning with Spark + Cassandra
Apache Cassandra Lunch #50: Machine Learning with Spark + Cassandra
 
Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the field
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Recsys 2016
Recsys 2016Recsys 2016
Recsys 2016
 

Mehr von Ansgar Scherp

Analysis of GraphSum's Attention Weights to Improve the Explainability of Mul...
Analysis of GraphSum's Attention Weights to Improve the Explainability of Mul...Analysis of GraphSum's Attention Weights to Improve the Explainability of Mul...
Analysis of GraphSum's Attention Weights to Improve the Explainability of Mul...Ansgar Scherp
 
STEREO: A Pipeline for Extracting Experiment Statistics, Conditions, and Topi...
STEREO: A Pipeline for Extracting Experiment Statistics, Conditions, and Topi...STEREO: A Pipeline for Extracting Experiment Statistics, Conditions, and Topi...
STEREO: A Pipeline for Extracting Experiment Statistics, Conditions, and Topi...Ansgar Scherp
 
Text Localization in Scientific Figures using Fully Convolutional Neural Netw...
Text Localization in Scientific Figures using Fully Convolutional Neural Netw...Text Localization in Scientific Figures using Fully Convolutional Neural Netw...
Text Localization in Scientific Figures using Fully Convolutional Neural Netw...Ansgar Scherp
 
A Comparison of Approaches for Automated Text Extraction from Scholarly Figures
A Comparison of Approaches for Automated Text Extraction from Scholarly FiguresA Comparison of Approaches for Automated Text Extraction from Scholarly Figures
A Comparison of Approaches for Automated Text Extraction from Scholarly FiguresAnsgar Scherp
 
About Multimedia Presentation Generation and Multimedia Metadata: From Synthe...
About Multimedia Presentation Generation and Multimedia Metadata: From Synthe...About Multimedia Presentation Generation and Multimedia Metadata: From Synthe...
About Multimedia Presentation Generation and Multimedia Metadata: From Synthe...Ansgar Scherp
 
A Framework for Iterative Signing of Graph Data on the Web
A Framework for Iterative Signing of Graph Data on the WebA Framework for Iterative Signing of Graph Data on the Web
A Framework for Iterative Signing of Graph Data on the WebAnsgar Scherp
 
Smart photo selection: interpret gaze as personal interest
Smart photo selection: interpret gaze as personal interestSmart photo selection: interpret gaze as personal interest
Smart photo selection: interpret gaze as personal interestAnsgar Scherp
 
Events in Multimedia - Theory, Model, Application
Events in Multimedia - Theory, Model, ApplicationEvents in Multimedia - Theory, Model, Application
Events in Multimedia - Theory, Model, ApplicationAnsgar Scherp
 
Can you see it? Annotating Image Regions based on Users' Gaze Information
Can you see it? Annotating Image Regions based on Users' Gaze InformationCan you see it? Annotating Image Regions based on Users' Gaze Information
Can you see it? Annotating Image Regions based on Users' Gaze InformationAnsgar Scherp
 
Linked open data - how to juggle with more than a billion triples
Linked open data - how to juggle with more than a billion triplesLinked open data - how to juggle with more than a billion triples
Linked open data - how to juggle with more than a billion triplesAnsgar Scherp
 
SchemEX -- Building an Index for Linked Open Data
SchemEX -- Building an Index for Linked Open DataSchemEX -- Building an Index for Linked Open Data
SchemEX -- Building an Index for Linked Open DataAnsgar Scherp
 
SchemEX -- Building an Index for Linked Open Data
SchemEX -- Building an Index for Linked Open DataSchemEX -- Building an Index for Linked Open Data
SchemEX -- Building an Index for Linked Open DataAnsgar Scherp
 
A Model of Events for Integrating Event-based Information in Complex Socio-te...
A Model of Events for Integrating Event-based Information in Complex Socio-te...A Model of Events for Integrating Event-based Information in Complex Socio-te...
A Model of Events for Integrating Event-based Information in Complex Socio-te...Ansgar Scherp
 
strukt - A Pattern System for Integrating Individual and Organizational Knowl...
strukt - A Pattern System for Integrating Individual and Organizational Knowl...strukt - A Pattern System for Integrating Individual and Organizational Knowl...
strukt - A Pattern System for Integrating Individual and Organizational Knowl...Ansgar Scherp
 
Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...
Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...
Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...Ansgar Scherp
 
Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)
Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)
Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)Ansgar Scherp
 

Knowledge Discovery in Social Media and Scientific Digital Libraries

  • 1. Slide 1. Prof. Ansgar Scherp – asc@informatik.uni-kiel.de
      Knowledge Discovery in Social Media and Scientific Digital Libraries
      Ansgar Scherp, Darmstadt, Feb 9, 2016
      Thanks to: Chifumi Nishioka, Falk Böschen
  • 2. KDD Social Media & Digital Libraries
      How to deal with the vast amount of content related to research and innovation?
      “Ability to deal with digital information will be an important cultural technique as reading and writing.”
  • 3. KDD Social Media & Digital Libraries
      • Examples of current research
        1. Classifying tweets
        2. Automated subject indexing
        3. Extracting text from scholarly figures
      • Not covered today
        – Schema extraction from Linked Open Data
        – Analysis of the evolution of Linked Open Data
  • 4. Classifying Tweets: Example
      To what extent are there fundamental differences between different approaches to tweet classification?
      • Author’s hashtag: (here: none)
      • Human: #research #talk #darmstadt
      • Machine: #talk #socialmedia
      (e.g., [Nishida et al. 12]) (e.g., [Ren et al. 14] [Yang et al. 14])
      [NSD15] C. Nishioka, A. Scherp, and K. Dellschaft: Comparing Tweet Classifications by Authors' Hashtags, Machine Learning, and Human Annotators, WI, Singapore, 2015.
  • 5. Twitter Dataset: TREC Tweets2011
      • Contains about 16 million tweets
      • 10 randomly created main topics, each with two subtopics
      • Main topic: hashtag occurs at least 200 times

        #    main topic      subtopics
        1    #health         #nutrition, #news
        2    #apple          #iphone, #mac
        3    #photography    #nature, #art
        4    #green          #solar, #eco
        5    #celebrity      #news, #gossip
        6    #fashion        #news, #shoes
        7    #fitness        #health, #exercise
        8    #humor          #quotes, #funny
        9    #quote          #love, #life
        10   #travel         #lp, #tips

      • 5 classes per topic
      • Retrieved 3 tweets per class, i.e., 15 tweets per topic
      • Task: classify tweets into groups
  • 6. Method 1: Hashtag Classifier
      • Assign classes to tweets by author’s hashtags
        – Class ‘#SpendingReview’
        – Class ‘#TurkeyDayTravel #travel’
        – Class ‘#TurkeyDayTravel’
        – Class ‘#travel’
      • Multiple hashtags → considered as a single class
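The hashtag classifier can be sketched in a few lines: every distinct combination of hashtags becomes one class. The function name, the lowercasing of tags, and the example tweet texts are assumptions made for illustration; only the hashtags themselves come from the slide.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#\w+")

def hashtag_classes(tweets):
    """Group tweet ids by the exact set of hashtags each tweet carries."""
    classes = defaultdict(list)
    for tweet_id, text in tweets.items():
        # Order-insensitive key; lowercasing the tags is an assumption.
        key = frozenset(tag.lower() for tag in HASHTAG.findall(text))
        classes[key].append(tweet_id)
    return dict(classes)

# Hypothetical tweet texts using hashtags from the slide.
tweets = {
    "t1": "Long queues at the airport #TurkeyDayTravel #travel",
    "t2": "Packing tips for the holidays #travel",
    "t3": "Flying home tomorrow #TurkeyDayTravel",
    "t4": "More holiday traffic #travel #TurkeyDayTravel",
}
classes = hashtag_classes(tweets)
```

Here t1 and t4 end up in the combined class ‘#TurkeyDayTravel #travel’, which is kept separate from the single-hashtag classes.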
  • 7. Method 2: Machine Classifier
      • Latent Dirichlet Allocation (LDA) to represent tweets as probability distributions over latent topics [Blei et al. 03]
      • Construction of the model from TREC Tweets2011
        – Train the topic model over tweets aggregated by their Twitter users [Hong et al. 10]
        – Infer the probability distribution over topics for each of the 15 tweets
      • Cluster tweets using k-means
        – Number of clusters optimized by Hartigan’s index and average silhouette [Kaufman et al. 05]
        – Cosine similarity as the distance measure
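The clustering step can be illustrated with a small k-means variant that assigns each tweet to the centroid with the highest cosine similarity. This is a minimal sketch, not the authors' implementation: the topic vectors are made up, the initialization (first k vectors) is a simplification, and LDA inference as well as the choice of k via Hartigan’s index or silhouette are left out.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def kmeans_cosine(vectors, k, iterations=20):
    """Lloyd-style k-means that assigns by highest cosine similarity."""
    # Naive initialization with the first k vectors (an assumption).
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda j: cosine_similarity(v, centroids[j]))
            for v in vectors
        ]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical topic distributions of four tweets over three latent topics.
topic_vectors = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
]
labels = kmeans_cosine(topic_vectors, k=2)
```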
  • 8. Method 3: Human Classifier
      • Online experiment: asked 163 human annotators to manually classify the 15 tweets of each topic

        #    main topic      subtopics             # annotators
        1    #health         #nutrition, #news     20
        2    #apple          #iphone, #mac         18
        3    #photography    #nature, #art         15
        4    #green          #solar, #eco          14
        5    #celebrity      #news, #gossip        15
        6    #fashion        #news, #shoes         15
        7    #fitness        #health, #exercise    18
        8    #humor          #quotes, #funny       15
        9    #quote          #love, #life          16
        10   #travel         #lp, #tips            17
        ∑                                          163
  • 9. Method 3: Human Classifier
      • Annotators can create an arbitrary number of classes and label them
      • Annotators have access to the tweet’s textual content as well as screenshots of the links, but with the hashtag sign ‘#’ removed
      [Figure; callout: Class label]
  • 10. Degree of Classifier Agreement
      • Methods 1-3 produce groups of tweets
      • Compare groups with Cohen’s kappa [Fu et al. 2012]
      • Convert classifications into match tables
        – Elements in same group: 1
        – Otherwise: 0
      • Example: tweets a, b, c, d, e are classified by two classifiers into {a, b}, {c, d}, {e} and {a, b, c}, {d, e}

        First classifier        Second classifier
            a  b  c  d              a  b  c  d
        b   1                   b   1
        c   0  0                c   1  1
        d   0  0  1             d   0  0  0
        e   0  0  0  0          e   0  0  0  1

      • Compare the match tables using Cohen’s kappa
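The comparison can be worked through in code. The sketch below turns two groupings into pairwise match vectors (the lower triangle of a match table) and computes Cohen’s kappa as (p_o − p_e) / (1 − p_e). The groupings are the ones encoded by the two match tables on this slide; the function names are chosen for illustration.

```python
from itertools import combinations

def match_vector(groups, items):
    """1 if a pair of items shares a group, else 0, for every unordered pair."""
    group_of = {item: idx for idx, group in enumerate(groups) for item in group}
    return [int(group_of[a] == group_of[b]) for a, b in combinations(items, 2)]

def cohens_kappa(x, y):
    """Cohen's kappa for two binary rating vectors of equal length."""
    n = len(x)
    p_o = sum(a == b for a, b in zip(x, y)) / n          # observed agreement
    p_x1, p_y1 = sum(x) / n, sum(y) / n
    p_e = p_x1 * p_y1 + (1 - p_x1) * (1 - p_y1)          # chance agreement
    return (p_o - p_e) / (1 - p_e)

items = ["a", "b", "c", "d", "e"]
first = match_vector([{"a", "b"}, {"c", "d"}, {"e"}], items)
second = match_vector([{"a", "b", "c"}, {"d", "e"}], items)
kappa = cohens_kappa(first, second)
```

For these two tables the classifiers agree on 6 of 10 pairs, with a chance agreement of 0.56, so kappa comes out at roughly 0.09.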
  • 11. Agreements Between Classifiers
      • Hashtag/Machine (HaM)
        – Almost no agreement
        – Except topic 3 “photography”: 11 of 15 tweets also use the hashtags as words in the text
      • Hashtag/Human (HaHu)
        – Slight agreement
      • Machine/Human (MHu)
        – Almost no agreement
        – Except topic 10 “travel”: agreement on the disagreement at tweets having the hashtag “#tips”

        ID    HaM     HaHu    MHu
        1     -0.05   0.12    0.00
        2      0.02   0.05    0.05
        3      0.24   0.06    0.11
        4      0.01   0.11    0.00
        5      0.00   0.07   -0.04
        6      0.00   0.15    0.04
        7      0.04   0.09    0.05
        8     -0.04   0.17    0.03
        9     -0.02   0.13    0.00
        10     0.01   0.10    0.45
        M      0.02   0.10    0.07
        SD     0.08   0.10    0.12
  • 12. Inter-Human-Annotator Agreement
      • Fleiss’ kappa: measures agreement among more than two raters
      • Consistently larger agreement among human classifiers than for HaHu and MHu
      • Difference is significant

        ID    Fleiss’ kappa
        1     0.17
        2     0.10
        3     0.13
        4     0.16
        5     0.53
        6     0.20
        7     0.14
        8     0.31
        9     0.33
        10    0.38
        M     0.25
        SD    0.14

      Researchers should use ground truth made by human annotators rather than hashtags for tweet classification
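For reference, Fleiss’ kappa can be computed from a table of rating counts as sketched below. How the slides map annotations to subjects and categories is not stated; treating each tweet pair as a subject with the categories "same group" and "different group" is an assumption, and the counts are invented.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put subject i into category j.

    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Per-subject agreement among rater pairs.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_subj) / n_subjects
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical data: 3 annotators, 4 tweet pairs, categories [same, different].
counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa_fleiss = fleiss_kappa(counts)
```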
  • 13. Automatic Subject Indexing
      [GNS15] G. Große-Bölting, C. Nishioka, A. Scherp: A Comparison of Different Strategies for Automated Semantic Document Annotation. K-CAP 2015
      • STW (Standard Thesaurus Wirtschaft)
        – Cancer (18899-3)
        – Research (10436-6)
        – USA (17829-1)
        – …
      • Nomination for Best Paper Award at K-CAP 2015
      • Award „Prof. Dr. Werner Petersen-Preis der Technik 2015”
      • Published as Linked Open Data!
  • 14. Automated Subject Indexing
      • Scientific search engine GERHARD (‘97-‘99)
      • Ontology with ~10,000 classes in three languages
  • 15. Experiment Framework
      Each strategy is a composition of methods from steps 1 + 2 + 3.
      1. Concept Extraction: detect concepts (candidate annotations) from each document
      2. Concept Activation: compute a score for each concept of a document
      3. Annotation Selection: select annotations from concepts for each document
      4. Evaluation: measure performance of strategies with ground truth
  • 16. Configurations
      • Concept Extraction: Entity, Tri-gram, RAKE, LDA
      • Concept Activation: Statistical Methods (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
      • Annotation Selection: Top-k (2 methods), kNN (1 method)
  • 17. Configurations: Entity-based (24 strategies)
      • Concept Extraction: Entity, Tri-gram, RAKE, LDA
      • Concept Activation: Statistical Methods (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
      • Annotation Selection: Top-k (2 methods), kNN (1 method)
      … using a domain-specific taxonomy like STW
  • 18. Concept Activation Methods
      • Concept frequency
      • CF-IDF as an extension of the popular TF-IDF model, replacing terms with concepts [Goossen et al. 11]
        – IDF lowers the weight of concepts appearing in many documents
      • These methods do not actually “activate” anything …
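A minimal sketch of CF-IDF follows, using the usual TF-IDF form with concepts in place of terms: the concept frequency in a document times log(|D| / number of documents containing the concept). The exact normalization used on the slide was lost in extraction, so raw counts are an assumption, as are the example documents.

```python
import math
from collections import Counter

def cf_idf(docs):
    """docs: list of concept lists. Returns one {concept: score} dict per doc."""
    n_docs = len(docs)
    # Number of documents in which each concept occurs at least once.
    doc_freq = Counter(c for doc in docs for c in set(doc))
    scores = []
    for doc in docs:
        cf = Counter(doc)  # raw concept frequency (normalization is an assumption)
        scores.append({c: cf[c] * math.log(n_docs / doc_freq[c]) for c in cf})
    return scores

# Hypothetical documents, already reduced to their detected concepts.
docs = [
    ["bank", "bank", "tax"],
    ["bank", "interest rate"],
    ["bank", "financial crisis", "financial crisis"],
]
scores = cf_idf(docs)
```

A concept that occurs in every document, such as "bank" here, receives a score of zero, which is the lowering effect of IDF described above.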
  • 19. Hierarchy-based Methods
      • Reveal concepts that are not explicitly mentioned by using a hierarchical knowledge base (KB)
      • KBs are of high quality and freely available!
      • Example hierarchy (figure): World Wide Web, Web Searching, Web Mining, Social Recommendation, Social Tagging, Site Wrapping, Web Log Analysis
      • Base Activation, defined with the set of child concepts of a concept and a decay parameter
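Since the formula on this slide did not survive extraction, the following is only a sketch of one common way to write base activation: a concept's score is its own frequency plus the decayed sum of its children's activations. The recursion, the parent-child edges of the example hierarchy, and the frequencies are all assumptions.

```python
def base_activation(concept, cf, children, decay):
    """Own concept frequency plus decayed activation of all child concepts."""
    return cf.get(concept, 0) + decay * sum(
        base_activation(child, cf, children, decay)
        for child in children.get(concept, ())
    )

# Assumed parent -> children edges for the concepts named on the slide.
children = {
    "World Wide Web": ["Web Searching", "Web Mining"],
    "Web Searching": ["Social Recommendation", "Social Tagging"],
    "Web Mining": ["Site Wrapping", "Web Log Analysis"],
}
# Hypothetical concept frequencies in one document.
cf = {"Social Tagging": 2, "Web Mining": 1, "Web Log Analysis": 1}

score = base_activation("World Wide Web", cf, children, decay=0.5)
```

With these numbers, "World Wide Web" is never mentioned in the document but still receives a score through its descendants.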
  • 20. Hierarchy-based Methods
      • One-hop activation
        – Developed with domain experts at ZBW
        – Uses the set of concepts detected in a document
        – Maximum activation distance: one hop
        – Piecewise definition with a case condition on the size of a set intersection
      Works very well … why?
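The one-hop idea can be sketched as follows. The formula itself is garbled in the source, so this is a guess at its shape rather than the authors' definition: a concept is boosted by its directly connected concepts only if at least two of them were detected in the document. The threshold of two, the use of direct neighbours, and all names and numbers are assumptions.

```python
def one_hop_activation(concept, cf, neighbours, decay, min_overlap=2):
    """Boost a concept by detected direct neighbours, at most one hop away."""
    detected = {c for c, freq in cf.items() if freq > 0}
    related = set(neighbours.get(concept, ())) & detected
    own = cf.get(concept, 0)
    # Assumed case condition: enough neighbours must be present in the document.
    if len(related) >= min_overlap:
        return own + decay * sum(cf[c] for c in related)
    return own

# Hypothetical neighbourhood and concept frequencies.
neighbours = {
    "Web Mining": ["World Wide Web", "Site Wrapping", "Web Log Analysis"],
}
cf_two = {"Site Wrapping": 1, "Web Log Analysis": 3}
cf_one = {"Site Wrapping": 1}

boosted = one_hop_activation("Web Mining", cf_two, neighbours, decay=0.5)
unboosted = one_hop_activation("Web Mining", cf_one, neighbours, decay=0.5)
```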
  • 21. Graph-based Methods
      • Represent concepts as a co-occurrence graph (example nodes: Tax, Bank, Interest Rate, Financial Crisis, Central Bank)
      • HITS for link analysis of web sites [Kleinberg 99]
      • Degree as the number of edges linked with a concept [Zouaq et al. 12]
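Both graph-based scores can be sketched on a small co-occurrence graph. The node names come from the slide, but the edges are invented for illustration, and the HITS routine is a plain power iteration rather than the authors' code.

```python
import math

def degree(graph):
    """Number of edges attached to each concept."""
    return {node: len(nbrs) for node, nbrs in graph.items()}

def hits(graph, iterations=50):
    """Power iteration for HITS; graph maps node -> set of nodes it links to."""
    hubs = {n: 1.0 for n in graph}
    auths = {n: 1.0 for n in graph}
    for _ in range(iterations):
        # Authority: sum of hub scores of nodes pointing at n.
        auths = {n: sum(hubs[m] for m in graph if n in graph[m]) for n in graph}
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {n: v / norm for n, v in auths.items()}
        # Hub: sum of authority scores of nodes n points at.
        hubs = {n: sum(auths[m] for m in graph[n]) for n in graph}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {n: v / norm for n, v in hubs.items()}
    return hubs, auths

# Hypothetical undirected co-occurrence edges, stored symmetrically.
graph = {
    "Tax": {"Bank"},
    "Bank": {"Tax", "Interest Rate", "Central Bank", "Financial Crisis"},
    "Interest Rate": {"Bank", "Central Bank"},
    "Central Bank": {"Bank", "Interest Rate", "Financial Crisis"},
    "Financial Crisis": {"Bank", "Central Bank"},
}
degrees = degree(graph)
hubs, auths = hits(graph)
```

On an undirected graph like this one, hub and authority scores coincide, so either can serve as the concept score.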
  • 22. Configurations: n-grams (15 strategies)
      • Concept Extraction: Entity, Tri-gram, RAKE, LDA
      • Concept Activation: Statistical Method (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
      • Annotation Selection: Top-k (2 methods), kNN (1 method)
  • 23. Configurations: RAKE [Rose et al. 10] (3 strategies)
      • Concept Extraction: Entity, Tri-gram, RAKE, LDA
      • Concept Activation: Statistical Method (Frequency), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
      • Annotation Selection: Top-k (2 methods), kNN (1 method)
  • 24. Configuration: LDA [Blei et al. 03]
      • Concept Extraction: Entity, Tri-gram, RAKE, LDA
      • Concept Activation: Statistical Methods (Frequency), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
      • Annotation Selection: Top-k (2 methods), kNN (1 method)
      43 strategies in total*
  • 25. Datasets: 3 Scientific Domains

                          Economics        Politics              Computer Science
        Source            ZBW              FIV                   SemEval 2010
        # documents       62,924           28,324                244
        # annotations     5.26 (± 1.84)    12 (± 4.02)           5.05 (± 2.41)
        Knowledge base    STW              European Thesaurus    ACM CCS
        # entities        6,335            7,912                 2,299
        # labels          11,679           8,421                 9,086

      • Computer science dataset: SemEval 2010 [Kim et al. 10]
      • Pre-processing of author keywords needed [Wang et al. 14]
      • Total of ~100,000 scientific documents: largest so far!
  • 26. Best Performing Configurations
      • Concept Extraction: Entity, Tri-gram, RAKE, LDA
      • Concept Activation: Statistical Methods (2 methods), Hierarchy-based Methods (3 methods), Graph-based Methods (3 methods)
      • Annotation Selection: Top-k (2 methods), kNN (1 method)
      Best strategy: Entity × HITS × kNN (economics, politics, computer science)
      Close ones: OneHop as well as any other graph-based method
  • 27. Text Extraction from Scholarly Figures
      [Example figure: bar chart “Preferred Operating System”; y-axis: Number of Users; bars: Windows (16), Linux (3), Mac (1); total users N = 20]
      • Pipeline labels: Binarization, Clustering, Extraction, OCR, Text
      • Fully-automated TX pipeline
      • No assumptions, no training
      • Novel combination of DM & CV
      [BS15] F. Böschen, A. Scherp: Multi-oriented Text Extraction from Information Graphics. DocEng 2015: 35-38
  • 28. Challenges for Research
      • Different font sizes
      • … font colors
      • … background colors
      • … emphases
      • Different angles
      • Overlapping elements
  • 29. 121 Scholarly Figures in Economics (from ZBW Open Access Corpus)
      Current results: text recognition improved over the baseline (BL) by up to 30%
  • 30. Evaluation Setup
      • How to match output (left) with gold standard (right)?

                    Output             Gold standard
        Text        item 1             Item 1
        Unigrams    {e, i, m, t, 1}    {e, m, t, I, 1}
        Bigrams     {em, it, te}       {em, te, It}
        Trigrams    {ite, tem}         {tem, Ite}
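The n-gram sets in this example can be reproduced with a short helper that builds character n-grams inside each token. Scoring the match by precision and recall over these sets is an assumption about how the comparison might be done; the slide only shows the sets.

```python
def char_ngrams(text, n):
    """Set of character n-grams taken inside each whitespace-separated token."""
    return {
        tok[i:i + n]
        for tok in text.split()
        for i in range(len(tok) - n + 1)
    }

def precision_recall(output, gold, n):
    """Overlap of n-gram sets between extracted text and gold standard."""
    out, ref = char_ngrams(output, n), char_ngrams(gold, n)
    hits = len(out & ref)
    precision = hits / len(out) if out else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall

bigrams_out = char_ngrams("item 1", 2)
bigrams_gold = char_ngrams("Item 1", 2)
trigram_scores = precision_recall("item 1", "Item 1", 3)
```

Because the comparison is case-sensitive, the single wrong character ("i" instead of "I") costs one n-gram at every level.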
  • 31. Limits of Current Evaluation
      • Baseline #1: OCR engine Tesseract (Google) with layout analysis
        – 1 pass per figure
      • Baseline #2: OCR engine Tesseract (Google) with layout analysis
        – Multiple, angle-rotated passes
      Comparison with related work: very difficult!
  • 32. Evaluation: Orientation Distributions
      Note: horizontal means within ±15° (Tesseract’s rotation tolerance)
  • 33. Mockup: Use of TX in ZBW’s EconBiz
  • 34. Summary: KDD in Social Media & DL
      How to deal with the vast amount of content related to research and innovation?
      • H2020 INSO-4 project, duration: 04/2016-03/2019
      • Platform with data mining and visualization tools for enabling information professionals to deal with large corpora of scientific content, data, social media