Text-mining practical
Lars Juhl Jensen
1.
Text-mining practical Lars Juhl Jensen
2.
unix primer
3.
the command line
4.
some useful commands
5.
cat
6.
less
7.
head -10
8.
tail -10
9.
grep 'needle'
10.
cut -f 2
11.
sort
12.
sort -nr
13.
uniq -c
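A minimal sketch of the commands above, run on toy files invented for this example (they are not part of the practical's data). `less` is left out because it is interactive, and `head -n 2` is used as the portable spelling of `head -2`.

```shell
# Toy files made up for this example.
workdir=$(mktemp -d)
printf 'doc1\tAKT1\ndoc2\tTP53\ndoc3\tAKT1\n' > "$workdir/sample.tsv"
printf '12\n7\n30\n' > "$workdir/numbers.txt"
printf 'AKT1\nAKT1\nTP53\n' > "$workdir/names.txt"

cat "$workdir/sample.tsv"            # print the whole file
head -n 2 "$workdir/sample.tsv"      # first two lines
tail -n 1 "$workdir/sample.tsv"      # last line
grep 'AKT1' "$workdir/sample.tsv"    # only the lines that contain the needle
cut -f 2 "$workdir/sample.tsv"       # second tab-delimited column
sort "$workdir/numbers.txt"          # text order: 12, 30, 7
sort -nr "$workdir/numbers.txt"      # numeric, descending: 30, 12, 7
uniq -c "$workdir/names.txt"         # count runs of adjacent identical lines

# Keep a few results in variables so they can be inspected.
second_column=$(cut -f 2 "$workdir/sample.tsv")
numeric_desc=$(sort -nr "$workdir/numbers.txt")
text_order=$(LC_ALL=C sort "$workdir/numbers.txt")
```

Note that `uniq` only merges identical lines that are next to each other, which is why its input is normally sorted first.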
14.
redirecting output
15.
write to file
16.
command > filename
17.
using pipes
18.
command1 | command2
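As a small illustration of redirection and pipes, the sketch below writes a toy list of names to a file, then pipes `sort` into `uniq -c` and redirects the result to a second file. The file names and contents are made up for the example.

```shell
workdir=$(mktemp -d)

# Redirect: write the output of printf to a file (> overwrites, >> appends).
printf 'TP53\nAKT1\nAKT1\n' > "$workdir/names.txt"

# Pipe: sort feeds uniq -c, so identical names are adjacent when counted.
sort "$workdir/names.txt" | uniq -c > "$workdir/counts.txt"
cat "$workdir/counts.txt"

# Number of distinct names, with wc's padding removed.
n_distinct=$(wc -l < "$workdir/counts.txt" | tr -d ' ')
echo "$n_distinct distinct names"
```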
19.
putting it all together
20.
cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile
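The pipeline above can be tried on a toy `infile`. The four-column layout used here (document, start, end, matched name) is an assumption made for the example, not the format of the practical's files.

```shell
workdir=$(mktemp -d)

# Hypothetical input: document <TAB> start <TAB> end <TAB> matched name
printf 'd1\t0\t4\tAKT1\nd1\t9\t13\tTP53\nd2\t3\t7\tAKT1\n' > "$workdir/infile"
printf 'd3\t1\t5\tAKT1\nd3\t8\t12\tTP53\nd4\t2\t6\tBRCA1\n' >> "$workdir/infile"

# Take column 4, group identical values, count them, and keep the
# (at most) 100 most frequent ones.
cut -f 4 "$workdir/infile" | sort | uniq -c | sort -nr | head -n 100 > "$workdir/outfile"
cat "$workdir/outfile"

# Most frequent value and number of distinct values.
top_name=$(head -n 1 "$workdir/outfile" | awk '{ print $2 }')
n_names=$(wc -l < "$workdir/outfile" | tr -d ' ')
```

Each stage does one job: `cut` selects the column, the first `sort` makes duplicates adjacent, `uniq -c` counts them, `sort -nr` orders by count, and `head` truncates the list.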
21.
the task
22.
disease gene finding
23.
named entity recognition
24.
human genes
25.
gene prioritization
26.
what I have done
27.
information retrieval
28.
two diseases
29.
prostate cancer
30.
schizophrenia
31.
two sets of documents
32.
62,755 abstracts
33.
65,588 abstracts
34.
one directory with each set
35.
one file with each abstract
36.
dictionary
37.
tab-delimited file
38.
human genes
39.
22,523 entities
40.
synonyms
41.
from many databases
42.
orthographic variation
43.
prefixes and suffixes
44.
automatically generated
45.
2,726,495 names
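If both counts describe the same dictionary, 2,726,495 names over 22,523 entities is roughly 121 names per entity. The sketch below shows how such counts could be obtained from a tab-delimited dictionary. The two-column layout (entity identifier, name) and the toy entries are assumptions for illustration only.

```shell
export LC_ALL=C   # byte-wise sorting, so results do not depend on the locale
workdir=$(mktemp -d)

# Hypothetical layout: entity identifier <TAB> name
printf 'G1\tAKT1\nG1\tAkt\nG1\tPKB\nG1\tProtein kinase B\n' > "$workdir/dictionary.tsv"
printf 'G2\tTP53\nG2\tp53\n' >> "$workdir/dictionary.tsv"

# Distinct entities (column 1) and distinct names (column 2).
n_entities=$(cut -f 1 "$workdir/dictionary.tsv" | sort -u | wc -l | tr -d ' ')
n_names=$(cut -f 2 "$workdir/dictionary.tsv" | sort -u | wc -l | tr -d ' ')
echo "$n_entities entities, $n_names names"

# Names per entity, entities with the most names first.
cut -f 1 "$workdir/dictionary.tsv" | sort | uniq -c | sort -nr
```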
46.
tagdir program
47.
flexible matching
48.
upper- and lower-case
49.
spaces and hyphens
50.
tab-delimited output
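The idea of flexible matching can be approximated with `grep`. This is only a sketch of the concept, not how the tagdir program works: `-i` ignores case, `[ -]?` allows an optional space or hyphen between words, and `-w` requires the match to end at a word boundary. The sentences are invented.

```shell
workdir=$(mktemp -d)

# Toy text with several spellings of the same name.
printf 'Protein kinase B is activated.\n' > "$workdir/abstract.txt"
printf 'protein-kinase B levels rose.\n' >> "$workdir/abstract.txt"
printf 'PROTEIN KINASE-B was inhibited.\n' >> "$workdir/abstract.txt"
printf 'Protein kinase beta differs.\n' >> "$workdir/abstract.txt"
printf 'A protein was found.\n' >> "$workdir/abstract.txt"

# Show the matching lines, then count them.
grep -i -w -E 'protein[ -]?kinase[ -]?b' "$workdir/abstract.txt"
n_matches=$(grep -c -i -w -E 'protein[ -]?kinase[ -]?b' "$workdir/abstract.txt")
echo "$n_matches matching lines"
```

The first three lines match; "Protein kinase beta" does not, because `-w` rejects a match that stops in the middle of a word.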
51.
what you will do
52.
named entity recognition
53.
find unfortunate names
54.
create “black list”
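One way to look for unfortunate names is to count how often each name was matched and inspect the most frequent ones by hand. The sketch below assumes a three-column tagger output (document, matched name, gene identifier) and uses the word "and" as a made-up example of a bad name; both are hypothetical.

```shell
workdir=$(mktemp -d)

# Hypothetical tagger output: document <TAB> matched name <TAB> gene identifier
printf 'd1\tAKT1\tG1\nd1\tand\tG7\nd2\tand\tG7\n' > "$workdir/matches.tsv"
printf 'd2\tTP53\tG2\nd3\tand\tG7\nd3\tAKT1\tG1\n' >> "$workdir/matches.tsv"

# Most frequently matched names first; these are the ones worth inspecting.
cut -f 2 "$workdir/matches.tsv" | sort | uniq -c | sort -nr | head -n 100

# After inspection, names that are not real gene mentions go on the list.
printf 'and\n' > "$workdir/blacklist.txt"

# Keep only matches whose name (column 2) is not on the black list.
awk -F '\t' 'NR == FNR { bad[$1]; next } !($2 in bad)' \
    "$workdir/blacklist.txt" "$workdir/matches.tsv" > "$workdir/filtered.tsv"
n_kept=$(wc -l < "$workdir/filtered.tsv" | tr -d ' ')
echo "$n_kept matches kept"
```

The comparison here is exact and case-sensitive, which is a simplification compared with the flexible matching described earlier.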
55.
information extraction
56.
co-mentioning
57.
within abstracts
58.
rank genes for each disease
59.
find shared gene
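Since each document set was retrieved for one disease, a simple co-mentioning score is the number of abstracts in that set that mention a gene. The sketch below ranks genes this way for two toy sets and lists the genes found in both. The file layout (document, gene) and the placeholder gene names are assumptions, and the counts are not real results.

```shell
export LC_ALL=C   # comm needs both inputs sorted in the same order
workdir=$(mktemp -d)

# Hypothetical tagger output per disease: document <TAB> gene
printf 'p1\tGENE_A\np1\tGENE_A\np1\tGENE_B\np2\tGENE_A\np3\tGENE_B\np3\tGENE_A\n' > "$workdir/prostate.tsv"
printf 's1\tGENE_C\ns2\tGENE_B\ns2\tGENE_C\ns3\tGENE_D\n' > "$workdir/schizophrenia.tsv"

rank_genes() {
  # Count each gene once per abstract, then rank by number of abstracts.
  sort -u "$1" | cut -f 2 | sort | uniq -c | sort -nr
}
rank_genes "$workdir/prostate.tsv" > "$workdir/prostate.rank"
rank_genes "$workdir/schizophrenia.tsv" > "$workdir/schizophrenia.rank"

top_prostate=$(head -n 1 "$workdir/prostate.rank" | awk '{ print $2 }')
top_schizophrenia=$(head -n 1 "$workdir/schizophrenia.rank" | awk '{ print $2 }')

# Genes that appear in both ranked lists.
awk '{ print $2 }' "$workdir/prostate.rank" | sort > "$workdir/prostate.genes"
awk '{ print $2 }' "$workdir/schizophrenia.rank" | sort > "$workdir/schizophrenia.genes"
shared=$(comm -12 "$workdir/prostate.genes" "$workdir/schizophrenia.genes")
echo "shared: $shared"
```

In practice one would probably compare only the top of each ranking rather than the full lists, for example by adding `head` before extracting the gene names.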
60.
61.
a helping hand
62.
“black list”
63.
100+ matches
64.
10+ matches
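Assuming the two thresholds refer to names matched at least 100 or at least 10 times, candidate lists at either cutoff could be produced as sketched below. The input file (one matched name per line) and the placeholder names are hypothetical.

```shell
workdir=$(mktemp -d)

# Toy input: one matched name per line (120, 15 and 3 occurrences).
{
  awk 'BEGIN { for (i = 0; i < 120; i++) print "name a" }'
  awk 'BEGIN { for (i = 0; i < 15; i++) print "name b" }'
  awk 'BEGIN { for (i = 0; i < 3; i++) print "name c" }'
} > "$workdir/matched_names.txt"

candidates() {
  # $1 = minimum number of matches; prints the names that reach it.
  sort "$workdir/matched_names.txt" | uniq -c |
    awk -v min="$1" '$1 >= min { sub(/^ *[0-9]+ /, ""); print }'
}

over_100=$(candidates 100)
n_over_10=$(candidates 10 | wc -l | tr -d ' ')
echo "100+ matches: $over_100"
echo "names with 10+ matches: $n_over_10"
```

The `sub` call strips the count that `uniq -c` puts in front of each line, so names containing spaces are printed whole.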
65.
66.
wrap up
67.
Protein kinase B
68.
PKB
69.
Akt
70.
AKT1
71.
same protein
72.
synonyms matter
73.
“black list” is crucial
74.
text mining is useful
75.
not black magic
76.
Thanks for your attention!