SlideShare ist ein Scribd-Unternehmen logo
1 von 63
Downloaden Sie, um offline zu lesen
GALAXY
       BIOINFORMATICS WORKFLOW
                    ENVIRONMENT

Rutger Vos, 3 April 2012
Overview
Informatics in the post-genomic era
The past (?)

      Analyses glued together using
       scripting languages, directly on
       the CLI or in GUI
      Sanger sequencing
      Smaller data volumes
      Fewer remote data resources
      Hypothesis-driven
The present

    Graphical or text-
     based workflow tools
    “Next generation”
     sequencing
    Large data sets
    Many remote data
     resources
    “Data-driven”
NGS – Roche 454 pyrosequencing

Pyrosequencing                     Genome Sequencer FLX

    “Emulsion PCR”
    Bead with primer in each
     droplet
    Each bead is placed in a
     well with luciferase
    Plate is analyzed by fiber-
     optic chip
NGS – Illumina/Solexa

Reversible dye-terminator seq    HiSeq2000 (BGI)

    DNA attaches to primer on
     slide and is amplified
    4 RT-bases are added
    Camera detects labeled
     nucleotides
    Next 1-base cycle
NGS – IonTorrent

Ion semiconductor sequencing     Ion Torrent PGM

    “sequencing by synthesis”
    Not light-based, sensor
     detects H+ ions during
     synthesis
    Longer reads
NGS – SOLiD Sequencing

Oligonucleotide ligation and
                                    SOLiD 5500 Genetic Analyzer
detection
    Beads with DNA fragments
    Universal adapter attached
     to fragments
    PCR product attaches to
     slide
    Fluorescent probes ligate to
     the primer
The future


     “Next-next generation”
      sequencing (MinION?)
     Smaller data sets?
     Semantic web?
     Back to hypotheses?
Workflows
Tools for automating bioinformatics analyses
Examples of workflow tools


 Taverna        taverna.org.uk

 Galaxy         usegalaxy.org

 eHive          ensembl.org/info/docs/eHive

 Mobyle         mobyle.pasteur.fr

 Cipres         phylo.org

 Yahoo! Pipes   pipes.yahoo.com

 “make”         gnu.org/software/make
Galaxy
Provenance, histories

    Where do the
     data come from?
    How were they
     altered?
    Galaxy tracks the
     history of data.
    Histories can be
     converted to
     workflows
Creating a workflow
Workflow editor
Reproducibility

  Good science requires that results be reproducible
  Some analyses are run many times
Sharing

    “Standing on the
     shoulders of giants”
    Not re-inventing the
     wheel
    “Executable
     papers” (doi:
     10.1101/gr.
     094508.109)
Data
What data types, expressed in which formats,
does Galaxy operate on? How do I get data
into and out of Galaxy?
Data types
•    Sequences
•    Alignments
•    Intervals
•    Tabular data
File format conversion


    Built-in converters
     between related
     file formats are
     provided
    Additional
     converters can be
     added
Galaxy sequence formats

     FASTA – def line, sequence
     FASTQ – FASTA + quality - SOLEXA:
     ABI/SCF – binary sequence trace (see below)
     SFF (454) – binary flowgram format
Galaxy alignment formats

      MAF – multiple alignment format (see below)
      (S|B)AM – text or binary format for reads and ref
      AXT – pairwise alignment, from LAV
      LAV – BLASTZ output, pairwise alignment
Galaxy interval/feature formats

    BED
         chrom, start, end, …
    INTERVAL
         BED with headers
    GFF (GFF3)
         like BED and
          interval, but 1-based
          inclusive
    WIG(GLE)
         dense, continuous-
          valued tracks
Other data formats


 In addition to interval/feature tabular data as listed
    previously, other files with similar properties can be
    processed by some tools:
        *.txt tab-separated values (e.g. tabular FASTA)
        HTML (for additional prose)
        LPED/PBED (to describe SNPs, really two files, one for
         coordinates, other for alleles)
Data I/O
•    Upload via FTP
•    Fetch by URL
•    Import from data library
•    Provided by interoperable web service
Upload data using FTP
Fetch data from a URL
Import from data library
Fetch data from proxy service

    User submits proxy request to Galaxy
    Galaxy forwards request to remote service
    Service returns data
    Galaxy infers data type and presents results
“Big Data”


     NGS has led to massive
      data sets
     Data formats are simple,
      binary, and/or compressed
     Still, people drive around
      with USB hard disks
Data sharing/publishing


      The Galaxy platform
       allows users to
       publish and share
       their data, for
       example as
       supplemental
       materials to a
       publication*


         * example: http://genome.cshlp.org/content/19/11/2144
Tools
Which operations can I run on Galaxy?
Galaxy tools


  Send and Get data – Upload, fetch, send, submit
  Data manipulation – Join, sort, filter

  Format conversion – FASTA, other format operations

  Statistics – Regressions, simulations, model tests

  NGS – BAM, FASTQ, SOLiD, 454 file operations

  RNA analysis – cufflinks, tophat

  Evolution – branch lengths, NJ, HyPhy
Visualization
What data can I view in Galaxy?
Galaxy visualization - histograms
Galaxy visualization - scatterplots
Galaxy visualization – XY plot
Galaxy visualization – box plot
Galaxy visualization - trackster
Deployment
How to deploy your analyses on Galaxy, on
public servers, in the cloud or locally
Public servers
•    “Main” at UseGalaxy.org
•    NBIC
•    Others listed on Galaxy wiki
Local server

Pro:                          Con:

    Data close by                Complicated install
    Can add your own tools       Many dependencies
    Can develop Galaxy           UNIX-based
     further
    UNIX-based
Cloud Galaxy

  Galaxy can be installed in the (Amazon EC2 cloud)
  Private data without the hardware hassle

  Uploading and storing data can be costly, however
Implementation
How does it work under the hood?
Galaxy under the hood


             Issues
           command




 1. Parses HTTP request
 2. Identifies which tool to use
 3. Reads tool description
 4. Queues tool
 5. Parses result
 6. Returns HTML representation of result
Web server


    Simple Galaxy installs use
     a built-in, python-based
     HTTP server
    More robust installs
     typically use the Apache
     httpd server
Code base


      Most of the framework
       and the wrapper code is
       written in python
      Some wrappers in other
       languages, e.g. perl
Interface language


    Under the hood, Galaxy executes command-line programs and
     scripts
    Their interfaces and tool tips are described in XML files
Queuing

              Jobs are executed
               asynchronously
              Progress is shown in the
               data browser
              On big servers (e.g.
               “Main”), queuing is
               managed by the
               dedicated “Torque”
               system
UNIX


    Galaxy (simply)executes
     command-line programs
     within a UNIX-like
     environment
    Galaxy doesn’t have to
     “know” how to run those
     programs, it finds out from
     their descriptions at runtime
Database



      Analysis metadata is stored
       in a database, by default
       this is SQLite
      More robust installs use
       PostgreSQL
Configuration


     Galaxy has many moving
      parts that can be
      configured
     Configuration is done using
      simple text-based INI files
Version control


    Revision control (or version
     control) provides unlimited
     undo and detailed tracking
     of changes
    Galaxy uses Mercurial
    Popular now are svn, git
     and hg
Community
How to get in touch with the world-wide
community to get the most out of Galaxy?
Wiki
Mailing lists



      @lists.bx.psu.edu:
           galaxy-user
           galaxy-dev
           galaxy-announce
           galaxy-commits
Tutorials


     Galaxy 101:
          Getting data from UCSC
          Performing simple data
           manipulation
          Understanding Galaxy's
           History system
          Creating and editing
           workflows
          Applying workflows to
           your data
Screencasts
Events

      Galaxy Community
       Conference
      ISMB
      ECCB
      BOSC
      PAG
      Bio-IT World
      GMOD Meetings
“Tool shed”



                  Easy sharing of
                   new tools
                  Based on Mercurial
                  Turns Galaxy into a
                   modular ecosystem
CiteULike group




         Social citation manager
         There is a Galaxy group:
              citeulike.org/group/16008

           Articles are tagged by:
              PROJECT,
                     ISGALAXY, SHARED,
              HOWTO, METHODS,
              REPRODUCIBILITY, WORKFLOW
Organizations


     Development:
          Penn State University
          Emory University
     Support:
          NSF
          NHGRI
     Power user:
          NBIC
Links

    UseGalaxy.org – the Main server
    GetGalaxy.org – for local installs
    UseGalaxy.org/galaxy101 – intro tutorial
    Galaxy.nbic.nl – Sombrero, the NBIC server
    CiteULike.org/group/16008 – references
    genome.cshlp.org/content/19/11/2144 – windshield paper
    SlideShare.net/rvosa – these slides

Weitere ähnliche Inhalte

Was ist angesagt?

An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
Bioinformatics Omics
Bioinformatics OmicsBioinformatics Omics
Bioinformatics OmicsHiplot
 
Comparitive genome mapping and model systems
Comparitive genome mapping and model systemsComparitive genome mapping and model systems
Comparitive genome mapping and model systemsHimanshi Chauhan
 
Next Generation Sequencing of DNA
Next Generation Sequencing of DNANext Generation Sequencing of DNA
Next Generation Sequencing of DNAmaryamshah13
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisDespoina Kalfakakou
 
PubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug DiscoveryPubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug DiscoverySunghwan Kim
 
RNA Secondary Structure Prediction
RNA Secondary Structure PredictionRNA Secondary Structure Prediction
RNA Secondary Structure PredictionSumin Byeon
 
Pathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformaticsPathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformaticssarwat bashir
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceCambridge Semantics
 
RNA secondary structure prediction
RNA secondary structure predictionRNA secondary structure prediction
RNA secondary structure predictionMuhammed sadiq
 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesMelanie Courtot
 
NGS data formats and analyses
NGS data formats and analysesNGS data formats and analyses
NGS data formats and analysesrjorton
 

Was ist angesagt? (20)

An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
RNA-Seq
RNA-SeqRNA-Seq
RNA-Seq
 
Bioinformatics Omics
Bioinformatics OmicsBioinformatics Omics
Bioinformatics Omics
 
Comparitive genome mapping and model systems
Comparitive genome mapping and model systemsComparitive genome mapping and model systems
Comparitive genome mapping and model systems
 
Ngs ppt
Ngs pptNgs ppt
Ngs ppt
 
Genome Assembly 2018
Genome Assembly 2018Genome Assembly 2018
Genome Assembly 2018
 
Next Generation Sequencing of DNA
Next Generation Sequencing of DNANext Generation Sequencing of DNA
Next Generation Sequencing of DNA
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysis
 
PubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug DiscoveryPubChem and Its Applications for Drug Discovery
PubChem and Its Applications for Drug Discovery
 
RNA Secondary Structure Prediction
RNA Secondary Structure PredictionRNA Secondary Structure Prediction
RNA Secondary Structure Prediction
 
Pathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformaticsPathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformatics
 
Intro to illumina sequencing
Intro to illumina sequencingIntro to illumina sequencing
Intro to illumina sequencing
 
Prosite
PrositeProsite
Prosite
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data Science
 
RNA-Seq with R-Bioconductor
RNA-Seq with R-BioconductorRNA-Seq with R-Bioconductor
RNA-Seq with R-Bioconductor
 
RNA secondary structure prediction
RNA secondary structure predictionRNA secondary structure prediction
RNA secondary structure prediction
 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resources
 
Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
NGS data formats and analyses
NGS data formats and analysesNGS data formats and analyses
NGS data formats and analyses
 

Andere mochten auch

Galaxy presentation
Galaxy presentationGalaxy presentation
Galaxy presentationBabubij
 
Geothermal energy
Geothermal energyGeothermal energy
Geothermal energyefabra
 
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced AnalyticsHuman/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced AnalyticsNC Military Business Center
 
Development and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 PanelDevelopment and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 PanelThermo Fisher Scientific
 
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...Thermo Fisher Scientific
 
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef SystemFully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef SystemThermo Fisher Scientific
 
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...QIAGEN
 
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10Thermo Fisher Scientific
 
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingImproved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingIntegrated DNA Technologies
 
Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Integrated DNA Technologies
 
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...Integrated DNA Technologies
 
Sun, Moon and Planets Slideshow
Sun, Moon and Planets SlideshowSun, Moon and Planets Slideshow
Sun, Moon and Planets Slideshowsanstanton
 
DATA STRUCTURES
DATA STRUCTURESDATA STRUCTURES
DATA STRUCTURESbca2010
 
Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談Scissor Lee
 
Special quadrilaterals proofs ans constructions
Special quadrilaterals  proofs ans constructions Special quadrilaterals  proofs ans constructions
Special quadrilaterals proofs ans constructions cristufer
 

Andere mochten auch (20)

Galaxy presentation
Galaxy presentationGalaxy presentation
Galaxy presentation
 
Earth Moon And Sun
Earth Moon And SunEarth Moon And Sun
Earth Moon And Sun
 
Geothermal energy
Geothermal energyGeothermal energy
Geothermal energy
 
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced AnalyticsHuman/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
 
Example site map
Example site mapExample site map
Example site map
 
Development and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 PanelDevelopment and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 Panel
 
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
 
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef SystemFully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef System
 
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
 
xGen® Lockdown® Probes
xGen® Lockdown® ProbesxGen® Lockdown® Probes
xGen® Lockdown® Probes
 
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
 
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingImproved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
 
Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...
 
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
 
Galaxies
GalaxiesGalaxies
Galaxies
 
Sun, Moon and Planets Slideshow
Sun, Moon and Planets SlideshowSun, Moon and Planets Slideshow
Sun, Moon and Planets Slideshow
 
DATA STRUCTURES
DATA STRUCTURESDATA STRUCTURES
DATA STRUCTURES
 
Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談
 
Unit 1.my school
Unit 1.my schoolUnit 1.my school
Unit 1.my school
 
Special quadrilaterals proofs ans constructions
Special quadrilaterals  proofs ans constructions Special quadrilaterals  proofs ans constructions
Special quadrilaterals proofs ans constructions
 

Ähnlich wie The Galaxy bioinformatics workflow environment

Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1Shaojun Xie
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
 
Horizontal scaling with Galaxy
Horizontal scaling with GalaxyHorizontal scaling with Galaxy
Horizontal scaling with GalaxyEnis Afgan
 
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Amazon Web Services
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduriRavi Madduri
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 
e-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubee-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubeFAO
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneIan Foster
 
GeoChronos
GeoChronosGeoChronos
GeoChronoscurryr
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Globus
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloudthetfoot
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 

Ähnlich wie The Galaxy bioinformatics workflow environment (20)

Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1
 
3rd presentation
3rd presentation3rd presentation
3rd presentation
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - Introduction
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Horizontal scaling with Galaxy
Horizontal scaling with GalaxyHorizontal scaling with Galaxy
Horizontal scaling with Galaxy
 
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
e-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubee-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCube
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundane
 
GeoChronos
GeoChronosGeoChronos
GeoChronos
 
Big data
Big dataBig data
Big data
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
 
Grid computing
Grid computingGrid computing
Grid computing
 
IMPACT/myGrid Hackathon - Taverna Roadmap
IMPACT/myGrid Hackathon - Taverna RoadmapIMPACT/myGrid Hackathon - Taverna Roadmap
IMPACT/myGrid Hackathon - Taverna Roadmap
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 

Mehr von Rutger Vos

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Rutger Vos
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over EvolutieRutger Vos
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course BiodiversiteitRutger Vos
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data scienceRutger Vos
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionRutger Vos
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Rutger Vos
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterflyRutger Vos
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningRutger Vos
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Rutger Vos
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataRutger Vos
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Rutger Vos
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveRutger Vos
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenRutger Vos
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationRutger Vos
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline introRutger Vos
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsRutger Vos
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Rutger Vos
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRutger Vos
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLRutger Vos
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB NaturalisRutger Vos
 

Mehr von Rutger Vos (20)

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course Biodiversiteit
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data science
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolution
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterfly
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learning
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence data
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspective
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proeven
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline intro
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomics
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collections
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XML
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB Naturalis
 

Kürzlich hochgeladen

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Kürzlich hochgeladen (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

The Galaxy bioinformatics workflow environment

  • 1. GALAXY BIOINFORMATICS WORKFLOW ENVIRONMENT Rutger Vos, 3 April 2012
  • 2. Overview Informatics in the post-genomic era
  • 3. The past (?)   Analyses glued together using scripting languages, directly on the CLI or in GUI   Sanger sequencing   Smaller data volumes   Fewer remote data resources   Hypothesis-driven
  • 4. The present   Graphical or text- based workflow tools   “Next generation” sequencing   Large data sets   Many remote data resources   “Data-driven”
  • 5. NGS – Roche 454 pyrosequencing Pyrosequencing Genome Sequencer FLX   “Emulsion PCR”   Bead with primer in each droplet   Each bead is placed in a well with luciferase   Plate is analyzed by fiber- optic chip
  • 6. NGS – Illumina/Solexa Reversible dye-terminator seq HiSeq2000 (BGI)   DNA attaches to primer on slide and is amplified   4 RT-bases are added   Camera detects labeled nucleotides   Next 1-base cycle
  • 7. NGS – IonTorrent Ion semiconductor sequencing Ion Torrent PGM   “sequencing by synthesis”   Not light-based, sensor detects H+ ions during synthesis   Longer reads
  • 8. NGS – SOLiD Sequencing Oligonucleotide ligation and SOLiD 5500 Genetic Analyzer detection   Beads with DNA fragments   Universal adapter attached to fragments   PCR product attaches to slide   Fluorescent probes ligate to the primer
  • 9. The future   “Next-next generation” sequencing (MinION?)   Smaller data sets?   Semantic web?   Back to hypotheses?
  • 10. Workflows Tools for automating bioinformatics analyses
  • 11. Examples of workflow tools Taverna taverna.org.uk Galaxy usegalaxy.org eHive ensembl.org/info/docs/eHive Mobyle mobyle.pasteur.fr Cipres phylo.org Yahoo! Pipes pipes.yahoo.com “make” gnu.org/software/make
  • 13. Provenance, histories   Where do the data come from?   How were they altered?   Galaxy tracks the history of data.   Histories can be converted to workflows
  • 16. Reproducibility   Good science requires that results be reproducible   Some analyses are run many times
  • 17. Sharing   “Standing on the shoulders of giants”   Not re-inventing the wheel   “Executable papers” (doi: 10.1101/gr. 094508.109)
  • 18. Data What data types, expressed in which formats, does Galaxy operate on? How do I get data into and out of Galaxy?
  • 19. Data types •  Sequences •  Alignments •  Intervals •  Tabular data
  • 20. File format conversion   Built-in converters between related file formats are provided   Additional converters can be added
  • 21. Galaxy sequence formats   FASTA – def line, sequence   FASTQ – FASTA + quality - SOLEXA:   ABI/SCF – binary sequence trace (see below)   SFF (454) – binary flowgram format
  • 22. Galaxy alignment formats   MAF – multiple alignment format (see below)   (S|B)AM – text or binary format for reads and ref   AXT – pairwise alignment, from LAV   LAV – BLASTZ output, pairwise alignment
  • 23. Galaxy interval/feature formats   BED   chrom, start, end, …   INTERVAL   BED with headers   GFF (GFF3)   like BED and interval, but 1-based inclusive   WIG(GLE)   dense, continuous- valued tracks
  • 24. Other data formats In addition to interval/feature tabular data as listed previously, other files with similar properties can be processed by some tools:   *.txt tab-separated values (e.g. tabular FASTA)   HTML (for additional prose)   LPED/PBED (to describe SNPs, really two files, one for coordinates, other for alleles)
  • 25. Data I/O •  Upload via FTP •  Fetch by URL •  Import from data library •  Provided by interoperable web service
  • 28. Import from data library
  • 29. Fetch data from proxy service   User submits proxy request to Galaxy   Galaxy forwards request to remote service   Service returns data   Galaxy infers data type and presents results
  • 30. “Big Data”   NGS has led to massive data sets   Data formats are simple, binary, and/or compressed   Still, people drive around with USB hard disks
  • 31. Data sharing/publishing   The Galaxy platform allows users to publish and share their data, for example as supplemental materials to a publication* * example: http://genome.cshlp.org/content/19/11/2144
  • 32. Tools Which operations can I run on Galaxy?
  • 33. Galaxy tools   Send and Get data – Upload, fetch, send, submit   Data manipulation – Join, sort, filter   Format conversion – FASTA, other format operations   Statistics – Regressions, simulations, model tests   NGS – BAM, FASTQ, SOLiD, 454 file operations   RNA analysis – cufflinks, tophat   Evolution – branch lengths, NJ, HyPhy
  • 34. Visualization What data can I view in Galaxy?
  • 36. Galaxy visualization - scatterplots
  • 40. Deployment How to deploy your analyses on Galaxy, on public servers, in the cloud or locally
  • 41. Public servers •  “Main” at UseGalaxy.org •  NBIC •  Others listed on Galaxy wiki
  • 42. Local server Pro: Con:   Data close by   Complicated install   Can add your own tools   Many dependencies   Can develop Galaxy   UNIX-based further   UNIX-based
  • 43. Cloud Galaxy   Galaxy can be installed in the (Amazon EC2 cloud)   Private data without the hardware hassle   Uploading and storing data can be costly, however
  • 44. Implementation How does it work under the hood?
  • 45. Galaxy under the hood Issues command 1. Parses HTTP request 2. Identifies which tool to use 3. Reads tool description 4. Queues tool 5. Parses result 6. Returns HTML representation of result
  • 46. Web server   Simple Galaxy installs use a built-in, python-based HTTP server   More robust installs typically use the Apache httpd server
  • 47. Code base   Most of the framework and the wrapper code is written in python   Some wrappers in other languages, e.g. perl
  • 48. Interface language   Under the hood, Galaxy executes command-line programs and scripts   Their interfaces and tool tips are described in XML files
  • 49. Queuing   Jobs are executed asynchronously   Progress is shown in the data browser   On big servers (e.g. “Main”), queuing is managed by the dedicated “Torque” system
  • 50. UNIX   Galaxy (simply)executes command-line programs within a UNIX-like environment   Galaxy doesn’t have to “know” how to run those programs, it finds out from their descriptions at runtime
  • 51. Database   Analysis metadata is stored in a database, by default this is SQLite   More robust installs use PostgreSQL
  • 52. Configuration   Galaxy has many moving parts that can be configured   Configuration is done using simple text-based INI files
  • 53. Version control   Revision control (or version control) provides unlimited undo and detailed tracking of changes   Galaxy uses Mercurial   Popular now are svn, git and hg
  • 54. Community How to get in touch with the world-wide community to get the most out of Galaxy?
  • 55. Wiki
  • 56. Mailing lists   @lists.bx.psu.edu:   galaxy-user   galaxy-dev   galaxy-announce   galaxy-commits
  • 57. Tutorials   Galaxy 101:   Getting data from UCSC   Performing simple data manipulation   Understanding Galaxy's History system   Creating and editing workflows   Applying workflows to your data
  • 59. Events   Galaxy Community Conference   ISMB   ECCB   BOSC   PAG   Bio-IT World   GMOD Meetings
  • 60. “Tool shed”   Easy sharing of new tools   Based on Mercurial   Turns Galaxy into a modular ecosystem
  • 61. CiteULike group   Social citation manager   There is a Galaxy group:   citeulike.org/group/16008   Articles are tagged by:   PROJECT, ISGALAXY, SHARED, HOWTO, METHODS, REPRODUCIBILITY, WORKFLOW
  • 62. Organizations   Development:   Penn State University   Emory University   Support:   NSF   NHGRI   Power user:   NBIC
  • 63. Links   UseGalaxy.org – the Main server   GetGalaxy.org – for local installs   UseGalaxy.org/galaxy101 – intro tutorial   Galaxy.nbic.nl – Sombrero, the NBIC server   CiteULike.org/group/16008 – references   genome.cshlp.org/content/19/11/2144 – windshield paper   SlideShare.net/rvosa – these slides