Unknown Genes, Community Profiling, & Biotorrents.net. Morgan Langille, UC Davis
Genes with unknown function
Questions: If we wanted to start studying a gene of unknown function, which one(s) should we study first? How many unannotated genes could be annotated? What proportion of unknown genes (hypothetical proteins) are probably not real proteins (i.e., pseudogenes, mis-predicted ORFs, etc.)? What proportion of unknown gene families are probably phage-related? Can some of these families (hopefully the top-ranking ones) be characterized using non-similarity-based bioinformatic approaches?
Outline of project
Community Profiling
Phylogenetic profiling (Wu et al., PLoS Genetics, 2005): For C. hydrogenoformans, identified the presence or absence of homologs in all other completely sequenced genomes. Identified many hypothetical proteins that had the same profile as known sporulation proteins.
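The idea of matching presence/absence profiles can be sketched in a few lines. This is only an illustration under stated assumptions: the gene names, genome labels, and homolog sets below are hypothetical toy data, not values from Wu et al.

```python
from collections import defaultdict

# Hypothetical toy data: gene -> set of genomes that contain a homolog.
homologs = {
    "spo0A": {"g1", "g2", "g4"},      # known sporulation gene (illustrative)
    "sigE": {"g1", "g2", "g4"},       # known sporulation gene (illustrative)
    "hyp_0012": {"g1", "g2", "g4"},   # hypothetical protein
    "hyp_0345": {"g3"},               # hypothetical protein
}
genomes = ["g1", "g2", "g3", "g4"]


def profile(gene):
    """Return the presence/absence profile as a tuple of 0/1, one entry per genome."""
    return tuple(int(g in homologs[gene]) for g in genomes)


def group_by_profile(genes):
    """Group genes that share an identical presence/absence profile."""
    groups = defaultdict(list)
    for gene in genes:
        groups[profile(gene)].append(gene)
    return dict(groups)


groups = group_by_profile(homologs)
for prof, members in groups.items():
    print(prof, sorted(members))
```

In this toy case the hypothetical protein hyp_0012 lands in the same group as the two sporulation genes, which is the kind of hint phylogenetic profiling is meant to give.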
Community Profiling: KEGG and COG profiles (DeLong et al., Science, 2006)
Community Profiling: Look across multiple metagenomic samples. Gene families that have similar profiles may have similar functions. This is similar to using co-expression to identify genes with similar functions.
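One way to make "similar profiles" concrete is a correlation between the abundance vectors of two gene families across samples. The sketch below uses Pearson correlation as an assumed similarity measure, and the family names and counts are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / (sd_x * sd_y)


# Hypothetical counts of three gene families across four metagenomic samples.
family_a = [10, 20, 30, 40]
family_b = [1, 2, 3, 4]      # rises and falls with family_a
family_c = [40, 30, 20, 10]  # opposite trend

print(round(pearson(family_a, family_b), 3))
print(round(pearson(family_a, family_c), 3))
```

Families whose profiles correlate strongly across samples are the candidates for shared function, in the same spirit as co-expression analysis.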
So what have I done? Took "all metagenomics peptides" from CAMERA: 43M sequences (mostly GOS). Searched against 11,000 Pfams using HMMER 3. Used “cluster” to group genes and samples.
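Turning search output into a Pfam-by-sample count table might look like the sketch below. It assumes HMMER 3 tabular output (`--tblout`) from `hmmsearch`, where the first column is the target sequence, the third is the query profile, and the fifth is the full-sequence E-value. The `sample|read` naming of sequences, the E-value cutoff, and the example lines are assumptions for illustration, not details from the slides.

```python
from collections import Counter


def count_hits(tblout_lines, max_evalue=1e-5):
    """Count hits per (Pfam, sample) pair from tblout-style lines.

    Assumes sequence names look like 'sample|read' (a hypothetical convention).
    """
    counts = Counter()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue > max_evalue:
            continue  # assumed significance cutoff
        sample = target.split("|", 1)[0]
        counts[(query, sample)] += 1
    return counts


# Made-up example lines in the assumed column layout.
example = [
    "# target name  accession  query name  accession  E-value ...",
    "GOS_01|read1 - Phage_integrase PF00589.17 1e-20 70.1 0.0",
    "GOS_01|read2 - Phage_integrase PF00589.17 3e-12 45.0 0.1",
    "GOS_02|read9 - Phage_integrase PF00589.17 0.5 5.0 0.0",
]
counts = count_hits(example)
print(counts)
```

The resulting table of counts per family and sample is the kind of matrix a clustering tool would take as input.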
Results (heatmap of metagenomic samples vs. Pfams): Red = above-average number of Pfams. Green = below-average number of Pfams. Not yet normalized for the number of sequences per sample or for the number of Pfams.
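The missing per-sample normalization could be as simple as dividing each count by the number of sequences in its sample. This is a minimal sketch of that one option, with hypothetical family names, sample names, and numbers; it is not something the slides report having done.

```python
def normalize(counts, seqs_per_sample):
    """Convert raw hit counts into hits per sequence for each sample."""
    return {
        pfam: {sample: n / seqs_per_sample[sample] for sample, n in per_sample.items()}
        for pfam, per_sample in counts.items()
    }


# Hypothetical raw counts: the second sample is four times deeper than the first.
raw_counts = {"PF_A": {"s1": 10, "s2": 40}, "PF_B": {"s1": 5, "s2": 5}}
seqs_per_sample = {"s1": 1000, "s2": 4000}

normalized = normalize(raw_counts, seqs_per_sample)
print(normalized)
```

After normalization PF_A has the same relative abundance in both samples, so its apparent enrichment in s2 was only a sequencing-depth effect.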
Example of phage Pfams clustering together
Measuring functional relatedness: Need to measure community profiling performance. The hierarchical clusters were broken into 575 groups using a correlation cutoff of 0.90 or above. Pfams were mapped to GO terms using pfam2GO. 1893 Pfams had no associated GO term; 695 of these were Domains of Unknown Function (DUFs). 3377 Pfams had one or more associated GO terms and could be used for further analysis. Only 67 (of 575) clusters contained 4 or more Pfams with at least one GO term.
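Breaking profiles into groups at a correlation cutoff can be illustrated as follows. Note the swap: instead of cutting a hierarchical clustering tree as in the slides, this sketch links any two families whose Pearson correlation is at least 0.90 (effectively single linkage via union-find). The profiles and family names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_by_cutoff(profiles, cutoff=0.90):
    """Link families with correlation >= cutoff and return the connected groups."""
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical abundance profiles across four samples.
profiles = {
    "PF_A": [1, 2, 3, 4],
    "PF_B": [2, 4, 6, 9],
    "PF_C": [4, 3, 2, 1],
    "DUF_X": [8, 6, 4, 2],
}
groups = group_by_cutoff(profiles)
print(groups)
```

A family of unknown function (here DUF_X) that groups with an annotated family is where a functional hypothesis would come from.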
Measuring GO similarity: G-SESAME measures the semantic similarity of any two GO terms. It is not downloadable, so queries had to be made to their web server (not fun). Pair-wise similarity was measured for each pair of GO terms in each cluster; had to check that both terms were in the same namespace.
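The per-cluster scoring with the namespace check might be structured like the sketch below. The similarity function here is a stand-in lookup table, not G-SESAME (which is only reachable through its web server), and the term labels, namespaces, and scores are placeholders.

```python
from itertools import combinations


def cluster_score(terms, namespace, similarity):
    """Average pairwise similarity over term pairs that share a namespace.

    Returns None when the cluster has no comparable pair.
    """
    scores = [
        similarity(a, b)
        for a, b in combinations(terms, 2)
        if namespace[a] == namespace[b]  # only compare within one GO namespace
    ]
    return sum(scores) / len(scores) if scores else None


# Placeholder terms and namespaces (not real GO identifiers).
namespace = {
    "bp1": "biological_process",
    "bp2": "biological_process",
    "mf1": "molecular_function",
}

# Stand-in for a semantic similarity service: a small hand-made lookup table.
toy_scores = {frozenset(("bp1", "bp2")): 0.8}


def toy_similarity(a, b):
    return toy_scores[frozenset((a, b))]


score = cluster_score(["bp1", "bp2", "mf1"], namespace, toy_similarity)
print(score)
```

Only the bp1/bp2 pair is scored, since pairs that cross namespaces are skipped before the similarity function is ever called.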
Results: Average G-SESAME scores were computed for each cluster. The average of all cluster averages was 0.484. 10 clusters had a score of 0.60 or greater. The data was then randomized by placing the same GO terms in different random clusters, which gave an average score of 0.412-0.420 over 4 iterations. Each of the 4 iterations had only 1 or 0 clusters with a score of 0.60 or greater.
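The randomization control can be sketched as a shuffle that keeps the same GO terms but reassigns them to clusters at random. Keeping the original cluster sizes and using a fixed seed are assumptions made here for illustration, and the term labels are placeholders.

```python
import random


def randomize_clusters(clusters, seed=0):
    """Reassign the same terms to random clusters, preserving cluster sizes."""
    rng = random.Random(seed)
    pool = [term for cluster in clusters for term in cluster]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for cluster in clusters:
        shuffled.append(pool[start:start + len(cluster)])
        start += len(cluster)
    return shuffled


# Placeholder clusters of GO term labels.
clusters = [["t1", "t2", "t3", "t4"], ["t5", "t6"], ["t7", "t8", "t9"]]

# Four iterations, mirroring the number of randomizations in the slides.
iterations = [randomize_clusters(clusters, seed=i) for i in range(4)]
for shuffled in iterations:
    print(shuffled)
```

Scoring each shuffled set the same way as the real clusters gives the background distribution that the observed average is compared against.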
Community Profiling Results: In the real data, 10 clusters are > 0.60. In the randomized data, 1 or 0 clusters are > 0.60.
BitTorrent: A peer-to-peer file sharing protocol. ~27-55% of all Internet traffic. Mostly illegal file sharing. Files are shared in small pieces between several users.
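The "small pieces" idea can be illustrated by splitting data into fixed-size pieces and hashing each one with SHA-1, which is how the original BitTorrent protocol lets a downloader verify pieces received from different peers. The piece length and the example data below are arbitrary choices for the sketch.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int = 262144):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


# Hypothetical dataset: 400,000 bytes of repeated sequence text.
data = b"ACGT" * 100_000
hashes = piece_hashes(data)
print(len(hashes), [h.hex()[:8] for h in hashes])
```

Because every piece has its own hash, pieces can arrive from several users in any order and still be checked independently.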
Torrents for Biology: Why use torrent technology? Download large datasets much faster. Searchable central listing. Decentralization of data.
What is BioTorrents? A legal file sharing website for scientists. Users can upload their own research results, data, and software. Users can browse or search through all datasets. Data is not hosted on BioTorrents.
www.biotorrents.net
Browse & Search
Details
Sign Up
Upload
Other Features: Forum, RSS Feed, Top 10, FAQ, Links
Who will upload data? Everyone! Realistically: large organizations (e.g., NCBI, CAMERA), which may need some convincing to host their data via torrents in addition to FTP, HTTP, etc.; and scientists that really support open science, sharing data before it is formally complete and published.
Technical Challenges: Many institutions frown on BitTorrent technology. A port must be opened/forwarded. The client program and computer must be left running. Ensuring data is legal, virus free, etc.: users that upload many legitimate torrents will provide more confidence to people downloading. Making downloading and uploading easy.

Weitere ähnliche Inhalte

Was ist angesagt?

Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataLeighton Pritchard
 
Fairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsFairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsTim Clark
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Alejandra Gonzalez-Beltran
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marcGenomeInABottle
 
Metabolic Network Analysis
Metabolic Network AnalysisMetabolic Network Analysis
Metabolic Network AnalysisMas Kot
 
Full text
Full textFull text
Full textbutest
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014Anil Thanki
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisationBiogeeks
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyMelanie Courtot
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOAEBI
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...Enrico Glaab
 
140127 Performance Metrics WG
140127 Performance Metrics WG140127 Performance Metrics WG
140127 Performance Metrics WGGenomeInABottle
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchEuropean Bioinformatics Institute
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Projectvaibhavdeoda
 

Was ist angesagt? (20)

Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
 
Fairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsFairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology views
 
The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
Metabolic Network Analysis
Metabolic Network AnalysisMetabolic Network Analysis
Metabolic Network Analysis
 
Full text
Full textFull text
Full text
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisation
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontology
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
MicrobeDB Overview
MicrobeDB OverviewMicrobeDB Overview
MicrobeDB Overview
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
 
140127 Performance Metrics WG
140127 Performance Metrics WG140127 Performance Metrics WG
140127 Performance Metrics WG
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Project
 

Andere mochten auch

司馬光 對話方塊
司馬光 對話方塊司馬光 對話方塊
司馬光 對話方塊honan4108
 
International Group Work For Sustainable Development
International Group Work For Sustainable DevelopmentInternational Group Work For Sustainable Development
International Group Work For Sustainable DevelopmentKatherine Haxton
 
Infolit day 24_may2016
Infolit day 24_may2016Infolit day 24_may2016
Infolit day 24_may2016HELIGLIASA
 
Comunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidasComunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidasCasa de la Mujer
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Thoughtworks
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalGianmario Spacagna
 
和菓子ここだけの話
和菓子ここだけの話和菓子ここだけの話
和菓子ここだけの話stucon
 
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013seoinhouse
 
User experience for drupal
User experience for drupalUser experience for drupal
User experience for drupalAnne Stefanyk
 
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...Paul R. DiModica
 

Andere mochten auch (19)

司馬光 對話方塊
司馬光 對話方塊司馬光 對話方塊
司馬光 對話方塊
 
Final presentation 2012
Final presentation 2012Final presentation 2012
Final presentation 2012
 
International Group Work For Sustainable Development
International Group Work For Sustainable DevelopmentInternational Group Work For Sustainable Development
International Group Work For Sustainable Development
 
Infolit day 24_may2016
Infolit day 24_may2016Infolit day 24_may2016
Infolit day 24_may2016
 
Comunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidasComunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidas
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
 
Evolver Architects
Evolver ArchitectsEvolver Architects
Evolver Architects
 
Squaw Lake
Squaw LakeSquaw Lake
Squaw Lake
 
Shimla Kullu Manali Dalhousie
Shimla Kullu Manali DalhousieShimla Kullu Manali Dalhousie
Shimla Kullu Manali Dalhousie
 
和菓子ここだけの話
和菓子ここだけの話和菓子ここだけの話
和菓子ここだけの話
 
Spring3.1 aop-mvc
Spring3.1 aop-mvcSpring3.1 aop-mvc
Spring3.1 aop-mvc
 
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
 
¿Hablamos de futuro?
¿Hablamos de futuro?¿Hablamos de futuro?
¿Hablamos de futuro?
 
User experience for drupal
User experience for drupalUser experience for drupal
User experience for drupal
 
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
 
Chuong 1 new
Chuong 1 newChuong 1 new
Chuong 1 new
 
Fall Simmer Pot Recipes
Fall Simmer Pot RecipesFall Simmer Pot Recipes
Fall Simmer Pot Recipes
 
NFS: para la gestion de espacios de trabajo
NFS: para la gestion de espacios de trabajoNFS: para la gestion de espacios de trabajo
NFS: para la gestion de espacios de trabajo
 

Ähnlich wie Unknown Genes, Community Profiling, & Biotorrents.net

Genome science intermine
Genome science intermineGenome science intermine
Genome science intermineELIXIR UK
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Syed Lokman
 
Investigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisInvestigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisCatherine Canevet
 
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
BIOLINK 2008:    Linking database submissions to primary citations with PubMe...BIOLINK 2008:    Linking database submissions to primary citations with PubMe...
BIOLINK 2008: Linking database submissions to primary citations with PubMe...Heather Piwowar
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesLeighton Pritchard
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenix Bioinformatics
 
RDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developmentsRDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developmentsCIARD Movement
 
GlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giantsBenjamin Good
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experimentsHelena Deus
 
Interpret gene expression results 2013
Interpret gene expression results 2013Interpret gene expression results 2013
Interpret gene expression results 2013Elsa von Licy
 
Build your own gene panels 2013
Build your own gene panels 2013Build your own gene panels 2013
Build your own gene panels 2013Elsa von Licy
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biologyrobertstevens65
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02PILLAI ASWATHY VISWANATH
 
Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Deepak K
 
Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011cmzmasek
 

Ähnlich wie Unknown Genes, Community Profiling, & Biotorrents.net (20)

Genome science intermine
Genome science intermineGenome science intermine
Genome science intermine
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 
Investigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisInvestigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysis
 
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
BIOLINK 2008:    Linking database submissions to primary citations with PubMe...BIOLINK 2008:    Linking database submissions to primary citations with PubMe...
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenes
 
RDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developmentsRDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developments
 
GlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen Warren Workshop in Boston
GlyGen Warren Workshop in Boston
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giants
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
 
Text and data integration
Text and data integrationText and data integration
Text and data integration
 
Interpret gene expression results 2013
Interpret gene expression results 2013Interpret gene expression results 2013
Interpret gene expression results 2013
 
Build your own gene panels 2013
Build your own gene panels 2013Build your own gene panels 2013
Build your own gene panels 2013
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biology
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02
 
Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.
 
Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011
 

Mehr von Morgan Langille

GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopMorgan Langille
 
Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Morgan Langille
 
Inferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic compositionInferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic compositionMorgan Langille
 
Characterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown FunctionCharacterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown FunctionMorgan Langille
 
HMMER 3 & Community Profiling
HMMER 3 & Community ProfilingHMMER 3 & Community Profiling
HMMER 3 & Community ProfilingMorgan Langille
 
Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...Morgan Langille
 
Microbial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference ReviewMicrobial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference ReviewMorgan Langille
 
A graduate student's experience in bioinformatics
A graduate student's experience in bioinformaticsA graduate student's experience in bioinformatics
A graduate student's experience in bioinformaticsMorgan Langille
 

Mehr von Morgan Langille (8)

GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...
 
Inferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic compositionInferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic composition
 
Characterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown FunctionCharacterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown Function
 
HMMER 3 & Community Profiling
HMMER 3 & Community ProfilingHMMER 3 & Community Profiling
HMMER 3 & Community Profiling
 
Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...
 
Microbial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference ReviewMicrobial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference Review
 
A graduate student's experience in bioinformatics
A graduate student's experience in bioinformaticsA graduate student's experience in bioinformatics
A graduate student's experience in bioinformatics
 

Kürzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
Unknown Genes, Community Profiling, & Biotorrents.net

  • 1. Unknown Genes, Community Profiling, & Biotorrents.net. Morgan Langille, UC Davis
  • 3. Questions: If we wanted to start studying a gene of unknown function, which one(s) should we study first? How many un-annotated genes could be annotated? What proportion of unknown genes (hypothetical proteins) are probably not real proteins (e.g. pseudogenes, mis-predicted ORFs, etc.)? What proportion of unknown gene families are probably phage-related? Can some of these families (hopefully the top-ranking ones) be characterized using non-similarity-based bioinformatic approaches?
  • 6. Phylogenetic profiling (Wu et al., PLoS Genetics, 2005): For C. hydrogenoformans, identified presence or absence of homologs in all other completely sequenced genomes. Identified many hypothetical proteins that had the same profile as other sporulation proteins.
  • 7. Community Profiling: KEGG, COG (DeLong et al., Science, 2006)
  • 8. Community Profiling: Look across multiple metagenomic samples. Gene families that have similar profiles may have similar function. Similar to using co-expression to identify similarly functioning genes.
  • 9. So what have I done? Took "all metagenomics peptides" from CAMERA: 43M sequences (mostly GOS). Searched against 11,000 Pfams using HMMER 3. Used “cluster” to group genes and samples.
  • 10. Results (heatmap of metagenomic samples vs. Pfams): Red = above avg. number of Pfams. Green = below avg. number of Pfams. Have not normalized for number of sequences per sample or for number of Pfams.
  • 11. Example of phage Pfams clustering together
  • 12. Measuring functional relatedness: Need to measure community profiling performance. The hierarchical clusters were broken into 575 groups using a correlation cutoff of 0.90 or above. Pfams were mapped to GO terms using pfam2GO. 1893 Pfams had no associated GO term; 695 of these were Domains of Unknown Function (DUFs). 3377 Pfams had one or more associated GO terms and could be used for further analysis. Only 67 (of 575) clusters contained 4 or more Pfams with at least one GO term.
  • 13. Measuring GO similarity: G-SESAME measures the semantic similarity of any two GO terms. It is not downloadable, so queries had to be made to their web server (not fun). Pair-wise similarity was measured for each pair of GO terms in each cluster; had to check if terms were in the same namespace.
  • 14. Results: Average G-SESAME scores for each cluster. The average of all cluster averages was 0.484. 10 clusters had a score of 0.60 or greater. The data was then randomized by using the same GO terms but in different random clusters, giving a score of 0.412-0.420 over 4 iterations. Each of the 4 iterations had only 1 or 0 clusters with a score of 0.60 or greater.
  • 18. BitTorrent: A peer-to-peer file sharing protocol. ~27-55% of all Internet traffic. Mostly illegal file sharing. Files are shared in small pieces between several users.
  • 19. Torrents for Biology: Why use torrent technology? Download large datasets much faster. Searchable central listing. Decentralization of data.
  • 20. What is BioTorrents? A legal file sharing website for scientists. Users can upload their own research results, data, and software. Users can browse or search through all datasets. Data is not hosted on BioTorrents.
  • 26. Other Features: Forum, RSS Feed, Top 10, FAQ, Links
  • 27. Who will upload data? Everyone! Realistically: large organizations (e.g. NCBI, CAMERA, etc.), which may need some convincing to host their data via torrents in addition to FTP, HTTP, etc.; and scientists that really support open science, sharing data before it is formally complete and published.
  • 28. Technical Challenges: Many institutions frown on BitTorrent technology. A port must be opened/forwarded. Client program and computer must be left running. Ensuring data is legal, virus free, etc. (users that upload many legitimate torrents will provide more confidence to people downloading). Making downloading and uploading easy.
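
The community profiling evaluation described in slides 12 to 14 (group Pfams whose abundance profiles correlate at 0.90 or above, score each group by average pairwise functional similarity, then compare against randomized groups) can be sketched roughly as follows. This is only an illustrative sketch, not the code used in the talk: the toy profiles, the `toy_similarity` function (a stand-in for G-SESAME web queries), and the single-linkage grouping (a simplification of cutting a hierarchical clustering tree) are all assumptions.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_by_correlation(profiles, cutoff=0.90):
    """Join families whose profiles correlate at >= cutoff (single linkage).

    Assumption: a simplification standing in for cutting a hierarchical tree.
    """
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def mean_cluster_score(groups, similarity):
    """Average, over groups, of the mean pairwise similarity within a group."""
    per_group = []
    for g in groups:
        pairs = list(combinations(g, 2))
        if pairs:
            per_group.append(sum(similarity(a, b) for a, b in pairs) / len(pairs))
    return sum(per_group) / len(per_group) if per_group else 0.0


def shuffled_groups(groups, rng):
    """Keep the same members and group sizes, but assign members at random."""
    members = [m for g in groups for m in g]
    rng.shuffle(members)
    out, start = [], 0
    for g in groups:
        out.append(members[start:start + len(g)])
        start += len(g)
    return out


# Hypothetical toy data: counts of four Pfam families across four samples.
profiles = {
    "PF_A": [1, 2, 3, 4],
    "PF_B": [2, 4, 6, 8],
    "PF_C": [4, 3, 2, 1],
    "PF_D": [8, 6, 4, 2],
}
# Hypothetical functional labels; a real analysis would compare GO terms.
labels = {"PF_A": "sporulation", "PF_B": "sporulation",
          "PF_C": "phage", "PF_D": "phage"}


def toy_similarity(a, b):
    """Stand-in for a GO semantic similarity score in [0, 1]."""
    return 1.0 if labels[a] == labels[b] else 0.0


groups = group_by_correlation(profiles)
observed = mean_cluster_score(groups, toy_similarity)

# Randomized baseline over 4 iterations, mirroring the 4 iterations on slide 14.
rng = random.Random(0)
baseline = [mean_cluster_score(shuffled_groups(groups, rng), toy_similarity)
            for _ in range(4)]
```

Comparing `observed` with the values in `baseline` mirrors the comparison of 0.484 against 0.412-0.420 on slide 14: shuffling preserves the group sizes and the pool of annotations, so any gap reflects how much the profile-based grouping itself contributes.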