Some Timesavers
Programming better
• “Being able to use, understand, and improve your code in 6 months & in 60 years” (approximate quote, Damian Conway)
• Variable naming
• Coding width: 100 characters
• Indenting
• Follow conventions, e.g. “Google R Style”
• Versioning: Dropbox & http://github.com/
• Automated testing
   preprocess_snps <- function(snp_table, testing = FALSE) {
       if (testing) {
           # Run a bunch of tests of extreme situations.
           # Quit if a test gives a weird result.
       }
       # Real part of the function goes here.
   }
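One way the `testing` switch above might be filled in is sketched below. The column names `snp_id` and `genotype`, the 0/1/2 genotype coding, and the rule of dropping rows with missing or invalid genotypes are all assumptions made up for illustration; they are not from the original slide.

```r
# Hypothetical sketch: column names and the filtering rule are assumptions.
preprocess_snps <- function(snp_table, testing = FALSE) {
    if (testing) {
        # Extreme situation 1: an empty table should come back empty.
        empty <- data.frame(snp_id = character(0), genotype = integer(0),
                            stringsAsFactors = FALSE)
        stopifnot(nrow(preprocess_snps(empty)) == 0)
        # Extreme situation 2: a table in which every genotype is missing.
        all_missing <- data.frame(snp_id = c("rs1", "rs2"),
                                  genotype = c(NA_integer_, NA_integer_),
                                  stringsAsFactors = FALSE)
        stopifnot(nrow(preprocess_snps(all_missing)) == 0)
        # Ordinary case: valid rows are kept, in their original order.
        ok <- data.frame(snp_id = c("rs1", "rs2"), genotype = c(0L, 2L),
                         stringsAsFactors = FALSE)
        stopifnot(identical(preprocess_snps(ok)$snp_id, c("rs1", "rs2")))
    }
    # Real part of the function: keep rows whose genotype is 0, 1, or 2.
    keep <- !is.na(snp_table$genotype) & snp_table$genotype %in% c(0, 1, 2)
    snp_table[keep, , drop = FALSE]
}
```

Calling `preprocess_snps(my_table, testing = TRUE)` runs the self-checks first and stops with an error if any of them fails; with the default `testing = FALSE` only the real work is done.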
Education

A Quick Guide to Organizing Computational Biology Projects
William Stafford Noble (1,2,*)
(1) Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington, United States of America
(2) Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America

Introduction

Most bioinformatics coursework focuses on algorithms, with perhaps some components devoted to learning programming skills and learning how to use existing bioinformatics software. Unfortunately, for students who are preparing for a research career, this type of curriculum fails to address many of the day-to-day organizational challenges associated with performing computational experiments. In practice, the principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues.

The purpose of this article is to describe one good strategy for carrying out computational experiments. I will not describe profound issues such as how to formulate hypotheses, design experiments, or draw conclusions. Rather, I will focus on [...]

[...] understanding your work or who may be evaluating your research skills. Most commonly, however, that “someone” is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.

This leads to the second principle, which is actually more like a version of Murphy’s Law: Everything you do, you will probably have to do over again. Inevitably, you will discover some flaw in your initial preparation of the data being analyzed, or you will get access to new data, or you will decide that your parameterization of a particular model was not broad enough. This means that the experiment you did last week, or even the set of experiments you’ve been working on over the past month, will probably need to be redone. If you have organized [...]

[...] under a common root directory. The exception to this rule is source code or scripts that are used in multiple projects. Each such program might have a project directory of its own.

Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.

Within the data and results directories, it is often tempting to apply a similar, logical organization. For example, you may have two or three data sets against which you plan to benchmark your algorithms, so you could create one directory for each of them under data.
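The top-level layout described above (data, results, doc, src, bin under one project root) could be created with a few lines of R. This is a minimal sketch: the helper name `create_project` and the use of a temporary directory are assumptions for illustration, not something the article prescribes.

```r
# Hypothetical helper: creates the five top-level directories named in the text.
create_project <- function(root) {
    top_level <- c("data", "results", "doc", "src", "bin")
    paths <- file.path(root, top_level)
    for (d in paths) {
        # recursive = TRUE also creates the project root if it does not exist yet.
        dir.create(d, recursive = TRUE, showWarnings = FALSE)
    }
    invisible(paths)
}

# Example: a project called "msms", created under a temporary directory.
project_dirs <- create_project(file.path(tempdir(), "msms"))
```

Running the helper a second time is harmless, because existing directories are left untouched.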
Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted year-month-day so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts.
doi:10.1371/journal.pcbi.1000424.g001

[...] With this approach, the distinction between data and results may not be useful. Instead, one could imagine a top-level directory called something like experiments, with subdirectories with names like 2008-12-19. Optionally, the directory name might also include a word or two indicating the topic of the experiment therein. In practice, a single experiment will often require more than one day of work, and so you may end up working a few days or more before creating a new subdirectory. Later, when you or someone else wants to know what you did, the chronological structure of your work will be self-evident.

Below a single experiment directory, the organization of files and directories is logical, and depends upon the structure of your experiment. In many simple [...]

The Lab Notebook

In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, and they should be relatively verbose, with links or embedded images or tables displaying the results of the experiments that you performed. In addition to describing precisely what you did, the notebook should record your observations, conclusions, and ideas for future work. Particularly when an experiment turns out badly, it is tempting simply to link the final plot or table of results and start a new experiment. Before doing that, [...]

[...] These types of entries provide a complete picture of the development of the project over time.

In practice, I ask members of my research group to put their lab notebooks online, behind password protection if necessary. When I meet with a member of my lab or a project team, we can refer to the online lab notebook, focusing on the current entry but scrolling up to previous entries as necessary. The URL can also be provided to remote collaborators to give them status updates on the project.

Note that if you would rather not work out your own “home-brew” electronic notebook, several alternatives are available. For example, a variety of commercial software systems have been created to help scientists create and maintain elec- [...]
In each results folder:
• script: getResults.rb or WHATIDID.txt
• intermediates
• output
                                                                                                                                                   book, several
                                                                                                                                                                        so         are
                                                                                                                                                   For example, a variety of commercial
                                                                                                                                                                                           could create one
conclusions. Rather, I will focus on andneed to be redone. If you have organized it is directory forhave been of them under data.
                                           logical,       depends upon the structure           plot or table of results and start a new
                                                                                               experiment. Before doing that,
                                                                                                                                                   software systems       each created to
                                                           of your experiment. In many simple                                                 help scientists create and maintain elec-
Markdown.
• A few tools
knitr (Sweave): Analyzing & Reporting in a single file.
MyFile.Rnw
\documentclass{article}
\usepackage[sc]{mathpazo}
\usepackage[T1]{fontenc}

\begin{document}

<<setup, include=FALSE, cache=FALSE, echo=FALSE>>=
# this is equivalent to \SweaveOpts{...}
opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
options(replace.assign=TRUE,width=90)
@


\title{A Minimal Demo of knitr}

\author{Yihui Xie}

\maketitle
You can test if \textbf{knitr} works with this minimal demo. OK, let's
get started with some boring random numbers:

<<boring-random,echo=TRUE,cache=TRUE>>=
set.seed(1121)
(x=rnorm(20))
mean(x);var(x)
@

The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
and histograms recorded by the PDF device:

<<boring-plots,cache=TRUE,echo=TRUE>>=
## two plots side by side (option fig.show='hold')
par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3,las=1)
boxplot(x)
hist(x,main='')
@

Do the above chunks work? You should be able to compile the \TeX{}
document and get a PDF file like this one: \url{https://github.com/downloads/
### in R:
library(knitr)
knit("MyFile.Rnw")
# --> creates MyFile.tex

### in shell:
pdflatex MyFile.tex
# --> creates MyFile.pdf

Rendered output (MyFile.pdf):

A Minimal Demo of knitr
Yihui Xie
February 26, 2012

You can test if knitr works with this minimal demo. OK, let's get started with some boring random numbers:

set.seed(1121)
(x <- rnorm(20))
## [1] 0.14496 0.43832 0.15319 1.08494 1.99954 -0.81188 0.16027 ...
## [10] -0.02531 0.15088 0.11008 1.35968 -0.32699 -0.71638 1.80977 ...
## [19] 0.13272 -0.15594
mean(x)
## [1] 0.3217
var(x)
## [1] 0.5715

The first element of x is 0.145. Boring boxplots and histograms recorded by the PDF device:

## two plots side by side (option fig.show='hold')
par(mar = c(4, 4, 0.1, 0.1), cex.lab = 0.95, cex.axis = 0.9,
    mgp = c(2, 0.7, 0), tcl = -0.3, las = 1)
boxplot(x)
hist(x, main = "")

[Figure: boxplot of x and histogram of x (y-axis: Frequency), side by side]
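The two steps on the slide (knit in R, then pdflatex in the shell) can be chained from one small script. A minimal Ruby sketch, assuming `Rscript` and `pdflatex` are on the PATH; `knit_commands` and `build` are hypothetical helper names, not part of knitr.

```ruby
# Return the two commands from the slide for a given .Rnw file:
# 1. knit the .Rnw into a .tex file, 2. compile the .tex into a PDF.
# Assumption: the file name contains no single quotes.
def knit_commands(rnw)
  base = File.basename(rnw, ".Rnw")
  [
    ["Rscript", "-e", "library(knitr); knit('#{rnw}')"],
    ["pdflatex", "#{base}.tex"]
  ]
end

# Run both steps in order and stop at the first failure.
def build(rnw)
  knit_commands(rnw).each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end

# Usage (needs R with knitr, and LaTeX, installed):
#   build("MyFile.Rnw")
```

Keeping the command list separate from the runner makes it easy to print the commands for inspection before running them.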
ggplot2: beautiful & (almost) effortless R plots

[Figure: bar chart of count by factor(cyl) (levels 4, 6, 8)]
[Figure: the same bar chart with bars filled by factor(gear) (levels 3, 4, 5)]

ggplot(mtcars, aes(factor(cyl))) + geom_bar()
ggplot(mtcars, aes(factor(cyl), fill=factor(gear))) + geom_bar()
Ruby.



“Friends don’t let friends do Perl” - reddit user
Getting help.
• In real life: Make friends with people. Talk to them.
• Online:
  • Specific discussion mailing lists (e.g., R, Stacks, bioruby, MAKER, ...)
  • Programming: http://stackoverflow.com
  • Bioinformatics: http://www.biostars.org
  • Sequencing-related: http://seqanswers.com
• Once I wanted to set up a BLAST server.

Anurag Priyam, Mechanical engineering student, Kharagpur
Aim: An open source idiot-proof web-interface for custom BLAST
http://www.sequenceserver.com/
1. Installing
   gem install sequenceserver

   Do you have BLAST+? If not:
      gem install blast

   Do you have BLAST-formatted databases? If not:
      sequenceserver format-databases /path/to/fastas

2. Configure.
   # .sequenceserver.conf
   bin: ~/ncbi-blast-2.2.25+/bin/
   database: /Users/me/blast_databases/

3. Launch.
   sequenceserver
   ### Launched SequenceServer at: http://0.0.0.0:4567
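Before launching, it can help to check that the configuration file has both entries. A minimal Ruby sketch, assuming the file can be read as YAML (its `key: value` layout suggests so); `parse_config` and `missing_directories` are hypothetical helpers, not part of SequenceServer.

```ruby
require "yaml"

REQUIRED_KEYS = %w[bin database].freeze

# Parse the configuration text and return a Hash.
# Raises ArgumentError if one of the two entries from the slide is missing.
def parse_config(text)
  config = YAML.safe_load(text) || {}
  missing = REQUIRED_KEYS - config.keys
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?
  config
end

# Return the keys whose directories do not exist on this machine.
def missing_directories(config)
  REQUIRED_KEYS.reject { |key| File.directory?(File.expand_path(config[key])) }
end

# The example configuration from the slide.
example = <<~CONF
  # .sequenceserver.conf
  bin: ~/ncbi-blast-2.2.25+/bin/
  database: /Users/me/blast_databases/
CONF

config = parse_config(example)
puts "BLAST+ binaries: #{config['bin']}"
puts "BLAST databases: #{config['database']}"
```

Calling `missing_directories(config)` on your own machine lists any configured path that does not exist yet.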
http://0.0.0.0:4567
So what did we do this week?
CummeRbund? SOAP? WTF?

Aim: first stages of working with a non-model organism.
• Read quality: FastQC [required for all data!]
• Genome
  • Assembly: SOAPdenovo
  • Assembly quality:
    • Internal metrics (scaffold size, number)
    • Comparison with other data (assembled RNA)
  • Gene identification
    • MAKER (automated; uses many tools)
    • Apollo (fixing MAKER's gene models)
• RNA
  • de novo assembly: Trinity
  • Gene expression comparison (Queen vs Worker vs Male)
    • TopHat (mapping to genome)
    • Cufflinks (de novo gene prediction & quantification)
    • CummeRbund (easy visualization)
• SNPs & population stuff
  • from mapping of pools of RNA
  • from RAD (Stacks)
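The "internal metrics" for assembly quality can be computed from nothing more than the list of scaffold lengths. A minimal Ruby sketch: the slide names only scaffold size and number, so N50 is added here as one commonly used size summary, and the lengths in the example are made up.

```ruby
# Summarise an assembly from its scaffold lengths (in bp).
def assembly_metrics(lengths)
  raise ArgumentError, "no scaffolds given" if lengths.empty?

  sorted = lengths.sort.reverse
  total = sorted.sum
  running = 0
  # N50: walking from the longest scaffold down, the length of the scaffold
  # at which the running total first reaches half of the assembly size.
  n50 = sorted.find { |len| (running += len) * 2 >= total }

  { count: sorted.size, total: total, longest: sorted.first, n50: n50 }
end

# Hypothetical scaffold lengths, for illustration only.
p assembly_metrics([100, 80, 50, 30, 20, 10, 10])
```

In practice the lengths would come from the assembler's scaffold FASTA file; fewer, longer scaffolds and a higher N50 generally indicate a more contiguous assembly, though not necessarily a more correct one.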
What is special about my genome?
• After assembly:
  • Candidate genes?
  • Gene expression comparisons?
  • Genome-wide scans for enrichment (of protein domains; of pathways...)
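One common way to score enrichment of a protein domain in a gene set is a one-sided hypergeometric test. The slide does not name a method, so this is an assumption; the function names and all counts below are hypothetical. A minimal Ruby sketch:

```ruby
# Exact binomial coefficient using Integer arithmetic.
def choose(n, k)
  return 0 if k < 0 || k > n
  k = [k, n - k].min
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# Probability of seeing at least `hits` domain-carrying genes when drawing
# `sample` genes without replacement from `total` genes, of which
# `annotated` carry the domain (upper tail of the hypergeometric distribution).
def enrichment_p_value(total, annotated, sample, hits)
  upper = [annotated, sample].min
  tail = (hits..upper).sum do |k|
    choose(annotated, k) * choose(total - annotated, sample - k)
  end
  Rational(tail, choose(total, sample)).to_f
end

# Hypothetical counts: 10,000 genes in the genome, 200 with the domain,
# 50 genes of interest, 5 of which carry the domain.
puts enrichment_p_value(10_000, 200, 50, 5)
```

A genome-wide scan repeats this for every domain or pathway, so the resulting p-values need a multiple-testing correction before any of them is interpreted.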
20120622 fridayadelboden

Dice.com Bay Area Search - Beyond Learning to Rank TalkSimon Hughes
 
Data analysis – using computers for presentation
Data analysis – using computers for presentationData analysis – using computers for presentation
Data analysis – using computers for presentationNoonapau
 

Ähnlich wie 20120622 fridayadelboden (20)

Applying Machine Learning to Software Clustering
Applying Machine Learning to Software ClusteringApplying Machine Learning to Software Clustering
Applying Machine Learning to Software Clustering
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 
qualitative.ppt
qualitative.pptqualitative.ppt
qualitative.ppt
 
Using Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchUsing Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative Research
 
Brochure curriculum (1)
Brochure curriculum (1)Brochure curriculum (1)
Brochure curriculum (1)
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research
 
Data management for TA's
Data management for TA'sData management for TA's
Data management for TA's
 
Topical clustering of search results
Topical clustering of search resultsTopical clustering of search results
Topical clustering of search results
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple Rules
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug Needham
 
SciForge Workshop@Potsdam Institute for Climate Impact Reserach; Nov 2014
SciForge Workshop@Potsdam Institute for Climate Impact Reserach; Nov 2014SciForge Workshop@Potsdam Institute for Climate Impact Reserach; Nov 2014
SciForge Workshop@Potsdam Institute for Climate Impact Reserach; Nov 2014
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
A Case Study Of A Reusable Component Collection
A Case Study Of A Reusable Component CollectionA Case Study Of A Reusable Component Collection
A Case Study Of A Reusable Component Collection
 
CIS 525 Education guide/Tutorialrank.com
CIS 525 Education guide/Tutorialrank.comCIS 525 Education guide/Tutorialrank.com
CIS 525 Education guide/Tutorialrank.com
 
Scrum an extension pattern language for hyperproductive software development
Scrum an extension pattern language  for hyperproductive software developmentScrum an extension pattern language  for hyperproductive software development
Scrum an extension pattern language for hyperproductive software development
 
Object Oriented System Design
Object Oriented System DesignObject Oriented System Design
Object Oriented System Design
 
Fundamentals of data structures ellis horowitz & sartaj sahni
Fundamentals of data structures   ellis horowitz & sartaj sahniFundamentals of data structures   ellis horowitz & sartaj sahni
Fundamentals of data structures ellis horowitz & sartaj sahni
 
01.intro
01.intro01.intro
01.intro
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Data analysis – using computers for presentation
Data analysis – using computers for presentationData analysis – using computers for presentation
Data analysis – using computers for presentation
 

Mehr von Yannick Wurm

2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomicsYannick Wurm
 
2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics researchYannick Wurm
 
2017 11-15-reproducible research
2017 11-15-reproducible research2017 11-15-reproducible research
2017 11-15-reproducible researchYannick Wurm
 
2016 09-16-fairdom
2016 09-16-fairdom2016 09-16-fairdom
2016 09-16-fairdomYannick Wurm
 
2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosomeYannick Wurm
 
2016 05-30-monday-assembly
2016 05-30-monday-assembly2016 05-30-monday-assembly
2016 05-30-monday-assemblyYannick Wurm
 
2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker badYannick Wurm
 
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...Yannick Wurm
 
2015 11-17-programming inr.key
2015 11-17-programming inr.key2015 11-17-programming inr.key
2015 11-17-programming inr.keyYannick Wurm
 
2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitchYannick Wurm
 
Week 5 genetic basis of evolution
Week 5   genetic basis of evolutionWeek 5   genetic basis of evolution
Week 5 genetic basis of evolutionYannick Wurm
 
Biol113 week4 evolution
Biol113 week4 evolutionBiol113 week4 evolution
Biol113 week4 evolutionYannick Wurm
 
2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible researchYannick Wurm
 
2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.keyYannick Wurm
 
2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcommYannick Wurm
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.keyYannick Wurm
 
2015 09-28 bio721 intro
2015 09-28 bio721 intro2015 09-28 bio721 intro
2015 09-28 bio721 introYannick Wurm
 

Mehr von Yannick Wurm (20)

2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics
 
2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research
 
2017 11-15-reproducible research
2017 11-15-reproducible research2017 11-15-reproducible research
2017 11-15-reproducible research
 
2016 09-16-fairdom
2016 09-16-fairdom2016 09-16-fairdom
2016 09-16-fairdom
 
2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome
 
2016 05-30-monday-assembly
2016 05-30-monday-assembly2016 05-30-monday-assembly
2016 05-30-monday-assembly
 
2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad
 
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
 
2015 11-17-programming inr.key
2015 11-17-programming inr.key2015 11-17-programming inr.key
2015 11-17-programming inr.key
 
2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch
 
Week 5 genetic basis of evolution
Week 5   genetic basis of evolutionWeek 5   genetic basis of evolution
Week 5 genetic basis of evolution
 
Biol113 week4 evolution
Biol113 week4 evolutionBiol113 week4 evolution
Biol113 week4 evolution
 
Evolution week3
Evolution week3Evolution week3
Evolution week3
 
2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research
 
2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key
 
Evolution week2
Evolution week2Evolution week2
Evolution week2
 
2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key
 
Sbc322 intro.key
Sbc322 intro.keySbc322 intro.key
Sbc322 intro.key
 
2015 09-28 bio721 intro
2015 09-28 bio721 intro2015 09-28 bio721 intro
2015 09-28 bio721 intro
 

Kürzlich hochgeladen

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 

Kürzlich hochgeladen (20)

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 

20120622 fridayadelboden

  • 9. Programming better
      • “Being able to use, understand and improve your code in 6 months & in 60 years” (approximate quote, Damian Conway)
      • Variable naming
      • Coding width: 100 characters
      • Indenting
      • Follow conventions, e.g. “Google R Style”
      • Versioning: Dropbox & http://github.com/
      • Automated testing

          preprocess_snps <- function(snp_table, testing=FALSE) {
              if (testing) {
                  # run a bunch of tests of extreme situations.
                  # quit if a test gives a weird result.
              }
              # real part of function.
          }
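The `testing` flag on the slide can be made concrete with a minimal sketch. The slide does not say what the "real part" of `preprocess_snps` does, so the helper `filter_complete_snps` (dropping SNP rows with missing genotype calls) and the column names `snp_id` and `sample_a` are assumptions made purely for illustration.

```r
# Hypothetical "real part": keep only SNP rows with no missing genotype calls.
# This body is an assumption; the slide leaves the real part unspecified.
filter_complete_snps <- function(snp_table) {
  snp_table[complete.cases(snp_table), , drop = FALSE]
}

preprocess_snps <- function(snp_table, testing = FALSE) {
  if (testing) {
    # Run a few tests of extreme situations.
    empty    <- data.frame(snp_id = character(0), sample_a = character(0),
                           stringsAsFactors = FALSE)
    all_na   <- data.frame(snp_id = c("rs1", "rs2"), sample_a = c(NA, NA),
                           stringsAsFactors = FALSE)
    one_good <- data.frame(snp_id = c("rs1", "rs2"), sample_a = c("AA", NA),
                           stringsAsFactors = FALSE)
    # stopifnot() quits with an error if a test gives a weird result.
    stopifnot(
      nrow(filter_complete_snps(empty)) == 0,
      nrow(filter_complete_snps(all_na)) == 0,
      identical(filter_complete_snps(one_good)$snp_id, "rs1")
    )
  }
  # Real part of the function.
  filter_complete_snps(snp_table)
}
```

Calling `preprocess_snps(my_table, testing = TRUE)` runs the checks before doing the real work; with the default `testing = FALSE` the checks are skipped. A dedicated test framework would be an alternative design, but the inline flag keeps the tests next to the code they protect, which appears to be the point of the slide.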
  • 10. Education: A Quick Guide to Organizing Computational Biology Projects. William Stafford Noble (1,2*). 1 Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington, United States of America; 2 Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America.

      Introduction

      Most bioinformatics coursework focuses on algorithms, with perhaps some components devoted to learning programming skills and learning how to use existing bioinformatics software. Unfortunately, for students who are preparing for a research career, this type of curriculum fails to address many of the day-to-day organizational challenges associated with performing computational experiments. In practice, the principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues.

      The purpose of this article is to describe one good strategy for carrying out computational experiments. I will not describe profound issues such as how to formulate hypotheses, design experiments, or draw conclusions. Rather, I will focus on [...]

      [...] understanding your work or who may be evaluating your research skills. Most commonly, however, that “someone” is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.

      This leads to the second principle, which is actually more like a version of Murphy’s Law: Everything you do, you will probably have to do over again. Inevitably, you will discover some flaw in your initial preparation of the data being analyzed, or you will get access to new data, or you will decide that your parameterization of a particular model was not broad enough. This means that the experiment you did last week, or even the set of experiments you’ve been working on over the past month, will probably need to be redone. If you have organized [...]

      [...] under a common root directory. The exception to this rule is source code or scripts that are used in multiple projects. Each such program might have a project directory of its own.

      Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.

      Within the data and results directories, it is often tempting to apply a similar, logical organization. For example, you may have two or three data sets against which you plan to benchmark your algorithms, so you could create one directory for each of them under data.
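The top-level layout described above (data, results, doc, src and bin under one project root, with chronological organization one level down) can be sketched in a few lines of R. The function name `create_project_skeleton` is hypothetical, and placing a single dated directory under `results` in year-month-day form is an assumption about how the chronological level might look.

```r
# Create the top-level project layout described in the article excerpt.
# The function name and the dated results subdirectory are illustrative
# assumptions, not something the article prescribes.
create_project_skeleton <- function(root, experiment_date = Sys.Date()) {
  top_level <- c("data", "results", "doc", "src", "bin")
  # Chronological level: one directory per experiment date under results/.
  dated <- file.path(root, "results", format(experiment_date, "%Y-%m-%d"))
  for (path in c(file.path(root, top_level), dated)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dated)
}
```

For example, `create_project_skeleton("msms")` would create `msms/data`, `msms/results/<today>`, `msms/doc`, `msms/src` and `msms/bin` in the current working directory.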
  • 11. Education: A Quick Guide to Organizing Computational Biology Projects. William Stafford Noble.

      Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts. doi:10.1371/journal.pcbi.1000424.g001

      [...] this approach, the distinction between data and results may not be useful. Instead, one could imagine a top-level directory called something like experiments, with subdirectories with names like 2008-12-19. Optionally, the directory name might also include a word or two indicating the topic of the experiment therein. In practice, a single experiment will often require more than one day of work, and so you may end up working a few days or more before creating a new subdirectory. Later, when you or someone else wants to know what you did, the chronological structure of your work will be self-evident.

      Below a single experiment directory, the organization of files and directories is logical, and depends upon the structure of your experiment. In many simple [...]

      The Lab Notebook

      In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, and they should be relatively verbose, with links or embedded images or tables displaying the results of the experiments that you performed. In addition to describing precisely what you did, the notebook should record your observations, conclusions, and ideas for future work. Particularly when an experiment turns out badly, it is tempting simply to link the final plot or table of results and start a new experiment. Before doing that, [...]

      These types of entries provide a complete picture of the development of the project over time.

      In practice, I ask members of my research group to put their lab notebooks online, behind password protection if necessary. When I meet with a member of my lab or a project team, we can refer to the online lab notebook, focusing on the current entry but scrolling up to previous entries as necessary. The URL can also be provided to remote collaborators to give them status updates on the project.

      Note that if you would rather not create your own “home-brew” electronic notebook, several alternatives are available. For example, a variety of commercial software systems have been created to help scientists create and maintain elec[...]
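A dated lab notebook kept in the root of the results directory, as described above, could be maintained with a small helper along these lines. This is only a sketch: the function name `append_notebook_entry`, the file name `notebook.md` and the heading style are all assumptions, since the article only asks for dated, verbose entries in a document at the root of the results directory.

```r
# Append a dated entry to a plain-text lab notebook in the results directory.
# The file name "notebook.md" and the "## <date>" heading are hypothetical
# choices made for this sketch.
append_notebook_entry <- function(results_dir, text, entry_date = Sys.Date()) {
  notebook <- file.path(results_dir, "notebook.md")
  # Dates are written year-month-day so entries read in chronological order.
  entry <- c(paste("##", format(entry_date, "%Y-%m-%d")), "", text, "")
  con <- file(notebook, open = "a")  # "a" appends, creating the file if needed
  on.exit(close(con))
  writeLines(entry, con)
  invisible(notebook)
}
```

An entry might then be recorded with `append_notebook_entry("results", "Reran the cross-validation splits; see results/2009-01-15.")`, where the entry text is whatever observations, conclusions and ideas for future work belong to that day.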
  • 12. Education: A Quick Guide to Organizing Computational Biology Projects. William Stafford Noble, Department of Genome Sciences, School of Medicine, and Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America.
    [Slide shows the first page of the article together with its Figure 1. The recoverable passages follow.]
    Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.
    Results subdirectories have names like 2008-12-19. Optionally, the directory name might also include a word or two indicating the topic of the experiment therein.
    The Lab Notebook: In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. Entries in the notebook should be dated, and they should be relatively verbose, with links or embedded images or tables displaying the results of the experiments that you performed.
    Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts. doi:10.1371/journal.pcbi.1000424.g001
    Slide annotation: In each results folder:
      • script: getResults.rb or WHATIDID.txt
      • intermediates
      • output
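The layout on this slide (data, results, doc, src and bin at the project root, with dated subdirectories under results) can be set up with a few lines of Ruby. This is only a sketch: the function names `init_project` and `new_results_dir` and their parameters are hypothetical, and creating an empty WHATIDID.txt in each results folder is one possible reading of the slide annotation.

```ruby
require 'date'
require 'fileutils'

# Create the top-level directories named on the slide under project_root.
def init_project(project_root)
  %w[data results doc src bin].each do |sub|
    FileUtils.mkdir_p(File.join(project_root, sub))
  end
  project_root
end

# Create results/<year>-<month>-<day>[-topic]/ and an empty WHATIDID.txt in it.
# `topic` is an optional word or two describing the experiment (hypothetical parameter).
def new_results_dir(project_root, date: Date.today, topic: nil)
  name = date.strftime('%Y-%m-%d') # year-month-day names sort chronologically
  name += "-#{topic}" if topic
  dir = File.join(project_root, 'results', name)
  FileUtils.mkdir_p(dir)
  notes = File.join(dir, 'WHATIDID.txt')
  FileUtils.touch(notes) unless File.exist?(notes)
  dir
end
```

Calling `new_results_dir('msms', topic: 'cv')` at the start of an experiment gives it its own dated folder, so the chronological structure builds up without extra effort.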
  • 15. A few tools
  • 20. knitr (Sweave): analyzing & reporting in a single file.

    MyFile.Rnw:

        \documentclass{article}
        \usepackage[sc]{mathpazo}
        \usepackage[T1]{fontenc}
        \begin{document}

        <<setup, include=FALSE, cache=FALSE, echo=FALSE>>=
        # this is equivalent to \SweaveOpts{...}
        opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
        options(replace.assign=TRUE,width=90)
        @

        \title{A Minimal Demo of knitr}
        \author{Yihui Xie}
        \maketitle

        You can test if \textbf{knitr} works with this minimal demo. OK, let's
        get started with some boring random numbers:

        <<boring-random,echo=TRUE,cache=TRUE>>=
        set.seed(1121)
        (x=rnorm(20))
        mean(x);var(x)
        @

        The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
        and histograms recorded by the PDF device:

        <<boring-plots,cache=TRUE,echo=TRUE>>=
        ## two plots side by side (option fig.show='hold')
        par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3,las=1)
        boxplot(x)
        hist(x,main='')
        @

        Do the above chunks work? You should be able to compile the \TeX{}
        document and get a PDF file like this one: \url{https://github.com/downloads/
        [text cut off on the slide]

    In R:

        library(knitr)
        knit("MyFile.Rnw")
        # --> creates MyFile.tex

    In shell:

        pdflatex MyFile.tex
        # --> creates MyFile.pdf

    [Rendered PDF shown on the slide: "A Minimal Demo of knitr", Yihui Xie, February 26, 2012, with the echoed R code and its output (mean(x) is 0.3217, var(x) is 0.5715, the first element of x is 0.145), followed by a boxplot and a histogram of x side by side.]
  • 23. ggplot2: beautiful & (almost) effortless R plots

        ggplot(mtcars, aes(factor(cyl))) + geom_bar()

    [Plot: bar chart of count against factor(cyl), with bars for 4, 6 and 8.]

        ggplot(mtcars, aes(factor(cyl), fill=factor(gear))) + geom_bar()

    [Plot: the same bar chart with each bar filled by factor(gear): 3, 4, 5.]
  • 24.
  • 26. Ruby. “Friends don’t let friends do Perl” - reddit user
  • 33. Getting help.
    • In real life: Make friends with people. Talk to them.
    • Online:
      • Specific discussion mailing lists (e.g.: R, Stacks, bioruby, MAKER...)
      • Programming: http://stackoverflow.com
      • Bioinformatics: http://www.biostars.org
      • Sequencing-related: http://seqanswers.com
  • 34.
  • 35.
  • 38. Once I wanted to set up a BLAST server.
    Anurag Priyam, mechanical engineering student, Kharagpur.
    Aim: an open-source, idiot-proof web interface for custom BLAST.
  • 42. http://www.sequenceserver.com/

    1. Installing

        gem install sequenceserver

       Do you have BLAST+? If not:

        gem install blast

       Do you have BLAST-formatted databases? If not:

        sequenceserver format-databases /path/to/fastas

    2. Configure.

        # .sequenceserver.conf
        bin: ~/ncbi-blast-2.2.25+/bin/
        database: /Users/me/blast_databases/

    3. Launch.

        sequenceserver
        ### Launched SequenceServer at: http://0.0.0.0:4567
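Before launching, it can help to confirm that the two paths in the configuration file actually exist. The sketch below assumes the file is plain YAML with the `bin` and `database` keys shown on the slide. The helper name `check_sequenceserver_conf` is hypothetical and is not part of SequenceServer.

```ruby
require 'yaml'

# Return a list of problems found in a SequenceServer-style config file.
# Assumption: the file is plain YAML with `bin` and `database` keys.
def check_sequenceserver_conf(path)
  conf = YAML.safe_load(File.read(path))
  conf = {} unless conf.is_a?(Hash) # an empty or malformed file yields no keys
  problems = []
  %w[bin database].each do |key|
    value = conf[key]
    if value.nil?
      problems << "#{key}: not set"
    elsif !Dir.exist?(File.expand_path(value.to_s)) # expands a leading ~
      problems << "#{key}: #{value} is not a directory"
    end
  end
  problems
end
```

An empty list means both directories were found. It does not check that BLAST+ binaries or formatted databases are actually inside them.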
  • 44.
  • 45. So what did we do this week? CummeRbund? SOAP? WTF? Aim: first stages of working with a non-model organism.
  • 57.
    • Read quality: FastQC [required for all data!]
    • Genome
      • Assembly: SOAPdenovo
      • Assembly quality:
        • Internal metrics (scaffold size, number).
        • Comparison with other data (assembled RNA)
      • Gene identification
        • MAKER (automated; uses many tools)
        • Apollo (fixing MAKER’s gene models)
    • RNA
      • de novo assembly: Trinity
      • Gene expression comparison (Queen vs Worker vs Male)
        • TopHat (mapping to genome)
        • Cufflinks (de novo gene prediction & quantification)
        • CummeRbund (easy visualization)
    • SNPs & population stuff
      • from mapping of pools of RNA
      • from RAD (Stacks)
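The "internal metrics" for assembly quality (scaffold size, number) can be computed directly from the assembler's FASTA output. The following is a rough sketch, not the method used in the course: the function names are hypothetical, and N50 is an added assumption, since the slide names only scaffold size and number.

```ruby
# Collect the length of every scaffold in a FASTA-formatted string.
def scaffold_lengths(fasta_text)
  lengths = []
  fasta_text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      lengths << 0               # a header line starts a new scaffold
    elsif !lengths.empty?
      lengths[-1] += line.length # a sequence may span several lines
    end
  end
  lengths
end

# Summarise scaffold count, total size, longest scaffold and N50.
def assembly_metrics(lengths)
  sorted = lengths.sort.reverse
  total = sorted.sum
  running = 0
  # N50: the scaffold length at which the running total, going from longest
  # to shortest, first reaches half of the total assembly size.
  n50 = sorted.find { |len| (running += len) * 2 >= total }
  { count: sorted.length, total: total, longest: sorted.first, n50: n50 }
end
```

For a real assembly the file would be read with `File.read` or streamed line by line. Numbers like these are only comparable between assemblies of the same data.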
  • 62. What is special about my genome?
    • After assembly:
      • Candidate genes?
      • Gene expression comparisons?
      • Genome-wide scans for enrichment (of protein domains; of pathways....)

Editor's notes

  12. # This is my project intro

      Yes oh yes ants are the best

      # Results

      Lorem ipsum **dolor sit amet**, consectetur adipiscing elit. Morbi a quam et urna fringilla facilisis. Sed commodo, turpis et luctus pellentesque, nisl nunc luctus mauris, ut sollicitudin enim massa eu dolor. Phasellus interdum neque porta lorem vehicula auctor. Etiam justo magna, aliquam at tempus non, adipiscing vitae nibh. Integer pharetra laoreet eros, at ultrices leo gravida vel. Integer sollicitudin nibh eros, ut ullamcorper tellus. *Nulla ac tortor sed massa bibendum accumsan et fringilla ligula*. Etiam at metus lorem, vitae euismod metus. Maecenas sollicitudin elit eget nulla consequat fermentum tincidunt ipsum adipiscing. Donec ut fringilla turpis. Nunc augue purus, elementum id imperdiet et, volutpat vel magna. Donec euismod libero non augue varius sed venenatis magna tempor. Suspendisse rhoncus felis velit, et scelerisque risus.

      ## They really are

      Uh-huh

      ## They really really are

      Ok good job because:

       * bla
       * blabla
       * blablabla

      # Conclusion

      You win: Ants are cool. I want to look at them and crush them and sequence them and genotype them.
  14. Many output formats.
  15. Many output formats.
  16. Many output formats.
  17. Many output formats.