This document provides an overview of a tutorial on building reproducible network data visualization workflows using Cytoscape and IPython Notebook. The tutorial will cover integrating data, analyzing networks, visualizing results, and preparing outputs for publication. It will demonstrate setting up a portable data analysis environment using Docker and sharing work through GitHub. The bulk of the tutorial will focus on using IPython Notebook as an electronic lab notebook for interactive and reproducible experiments with Cytoscape.
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytoscape and IPython Notebook
1. SDCSB Advanced Cytoscape Tutorial
Building Reproducible Network Data Visualization Workflows with Cytoscape and IPython Notebook
4/17/2015 @Sanford
Keiichiro Ono
UCSD Trey Ideker Lab
Cytoscape Core Team
5. Keiichiro Ono
Background
- Bioinformatics
- Computer Science
Work
- Research: bioinformatics workflow, visualization pipeline, data visualization, integration of networks and other biological data (molecular interactions, pathways, annotations)
- Software development: Cytoscape, NeXO, cyberinfrastructure, all kinds of small tools
Like
- Art: Kandinsky, Mondrian
- Music: electronica (techno, minimal, Detroit), jazz
- Sci-fi: movies, novels
Life
- US: San Diego, San Francisco Bay Area, Los Angeles, Orange County
- Japan: Gifu, Tokyo
17. Problems in Bioinformatics
- No more free lunch
- Buying expensive machines no longer yields performance gains for free. Code has to be designed for massively distributed environments (from scale-up to scale-out).
- Complex data analysis pipelines
- Pipelines must be built by connecting multiple resources or services
- Need for complex, customized data visualization
- Reproducibility
➡ But building, deploying, and maintaining reproducible pipelines is not straightforward
21. REST
- Docker
- Data analysis environment in a portable container
- GitHub
- For source code sharing
- IPython Notebook
- Your electronic lab notebook
- cyREST
- RESTful API module for Cytoscape
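A minimal sketch of how a notebook might send a network to Cytoscape through cyREST using only the Python standard library. It assumes Cytoscape is running locally with cyREST on its default port 1234, and that `POST /v1/networks` accepts Cytoscape.js-style JSON. The helper names `to_cytoscapejs` and `post_network` are hypothetical.

```python
import json
import urllib.request

# Assumption: cyREST listens on its default port and exposes the v1 API.
BASE_URL = "http://localhost:1234/v1"


def to_cytoscapejs(name, edges):
    """Convert (source, target) pairs into a Cytoscape.js-style JSON dict."""
    node_ids = sorted({node for edge in edges for node in edge})
    return {
        "data": {"name": name},
        "elements": {
            "nodes": [{"data": {"id": n}} for n in node_ids],
            "edges": [
                {"data": {"id": f"{s}-{t}", "source": s, "target": t}}
                for s, t in edges
            ],
        },
    }


def post_network(network, base_url=BASE_URL):
    """POST the network to cyREST and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/networks",
        data=json.dumps(network).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    net = to_cytoscapejs("toy", [("A", "B"), ("B", "C")])
    print(json.dumps(net, indent=2))
    # Only works while Cytoscape with cyREST is running:
    # print(post_network(net))
```

Building the payload separately from the HTTP call keeps the data-preparation step testable even when Cytoscape is not running. The response should identify the newly created network.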
24. - Full-stack
- Data preparation to web application
- Easy to learn
- Strong support from data science community
- Tons of high-performance libraries
25. A community for developers and users of Python data tools
pydata.org
45. Git/GitHub For Sharing Code/Notebooks
- Git: distributed source code management system
- GitHub: (public) remote repository plus a great user interface for working with OSS code
46. Forking
- Create a new repository from an existing one
- Complete copy of the original, with full access for you
- Contribute changes back to the original with a Pull Request
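Forking itself happens on GitHub. The follow-up steps are plain git commands. Below is a minimal sketch that assumes a fork already exists under your account. `your-username`, `upstream-owner`, the repository name, and both helper functions are hypothetical placeholders. The commands only run if you call `run_all`.

```python
import subprocess
from typing import List


def fork_workflow_commands(user: str, upstream_owner: str, repo: str) -> List[List[str]]:
    """git commands that clone your fork and link it to the original repository."""
    fork_url = f"https://github.com/{user}/{repo}.git"
    upstream_url = f"https://github.com/{upstream_owner}/{repo}.git"
    return [
        # Local working copy of your fork (its remote is named 'origin').
        ["git", "clone", fork_url],
        # Register the original repository under the name 'upstream'.
        ["git", "-C", repo, "remote", "add", "upstream", upstream_url],
        # Download upstream's latest commits so they can be merged or rebased.
        ["git", "-C", repo, "fetch", "upstream"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in fork_workflow_commands("your-username", "upstream-owner", "my-notebooks"):
        print(" ".join(cmd))
```

After you commit and push a branch to your fork, you open the Pull Request from the GitHub web interface.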
52. [Diagram: Bare Metal Machine → OS (Linux) → Docker → five isolated containers, each with its own Frameworks and Application]
54. What is Docker?
- A container for running applications in an isolated environment
- Application = a stack of image layers
- Shareable environments
- Environments as code
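To make "Application = a stack of image layers" concrete, here is a conceptual model only. It is not how Docker's storage drivers are implemented. Each layer is a mapping from file path to contents, and upper layers shadow lower ones. A sentinel plays the role of the whiteout entries that hide files deleted in a later layer. The layer contents are invented for illustration.

```python
from collections import ChainMap

# Sentinel marking a path deleted in a layer (stands in for a whiteout entry).
DELETED = object()


def build_view(layers):
    """Merge layers listed bottom (base image) to top; upper layers win."""
    # ChainMap looks keys up in its maps from left to right, so the top
    # layer has to come first.
    merged = ChainMap(*reversed(layers))
    return {path: data for path, data in merged.items() if data is not DELETED}


if __name__ == "__main__":
    base = {"/etc/os-release": "Ubuntu", "/usr/bin/python": "python2"}
    tools = {"/usr/bin/python": "python3", "/tmp/build.log": "pip output"}
    app = {"/tmp/build.log": DELETED, "/notebook.sh": "start the notebook server"}
    for path, data in sorted(build_view([base, tools, app]).items()):
        print(path, "->", data)
```

Because each layer is a separate unit, an image can be downloaded and cached layer by layer. The first `docker run` of a new image shows this happening.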
55. Docker Hub
- Sharing environments as code!
- Dockerfile - Definition of your container
- “GitHub of Images”
62. Actual command to run the image (one line):
docker run -d -v $PWD:/notebooks -p 80:8888 -e "PASSWORD=yourpass" -e "USE_HTTP=1" idekerlab/vizbi-2015
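You can also assemble the same `docker run` command from Python, for example inside a setup notebook. This sketch assumes the image reads the `PASSWORD` and `USE_HTTP` environment variables, as the command shows. `docker_run_args` and `start_container` are hypothetical helper names. Passing an argument list to `subprocess.run` avoids any shell quoting, and `os.getcwd()` takes the place of `$PWD`.

```python
import os
import shlex
import subprocess

IMAGE = "idekerlab/vizbi-2015"


def docker_run_args(workdir, password, host_port=80, image=IMAGE):
    """Build the docker run argument list used in the tutorial."""
    return [
        "docker", "run",
        "-d",                           # detached: run in the background
        "-v", f"{workdir}:/notebooks",  # bind-mount workdir at /notebooks
        "-p", f"{host_port}:8888",      # publish host port -> container port 8888
        "-e", f"PASSWORD={password}",   # env vars read by the image
        "-e", "USE_HTTP=1",
        image,
    ]


def start_container(args):
    """Run the command; `docker run -d` prints the new container's ID."""
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    args = docker_run_args(os.getcwd(), "yourpass")
    print(shlex.join(args))
    # Uncomment to actually start the container (requires Docker):
    # print(start_container(args))
```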
63. ~/g/sdcsb-advanced-tutorial git:master ❯❯❯ docker run -d -v $PWD:/notebooks -p 80:8888 -e "PASSWORD=sdcsb" -e "USE_HTTP=1" idekerlab/vizbi-2015
Unable to find image 'idekerlab/vizbi-2015:latest' locally
Pulling repository idekerlab/vizbi-2015
7dfae1b52000: Pulling dependent layers
511136ea3c5a: Download complete
f3c84ac3a053: Download complete
a1a958a24818: Download complete
9fec74352904: Download complete
d0955f21bf24: Download complete
4f527ba3fd02: Download complete
ac7605e8bbf0: Download complete
8e8747f25e33: Download complete
.
.
.
This takes a very long time the first time, because every image layer has to be downloaded…
64. ~/g/sdcsb-advanced-tutorial git:master ❯❯❯ docker ps -a
CONTAINER ID   IMAGE                         COMMAND          CREATED         STATUS         PORTS                  NAMES
fa3a9466a261   idekerlab/vizbi-2015:latest   "/notebook.sh"   3 minutes ago   Up 3 minutes   0.0.0.0:80->8888/tcp   sad_wright
Check Status
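The `PORTS` column tells you where to point your browser. `0.0.0.0:80->8888/tcp` means host port 80 on every interface is forwarded to port 8888 inside the container, where the notebook server listens. Below is a small parser that assumes the `host_ip:host_port->container_port/proto` form shown above. Exposed but unpublished ports such as `8888/tcp` are skipped, and port ranges are not handled. `PortMapping`, `parse_ports`, and `notebook_url` are hypothetical names.

```python
from typing import List, NamedTuple


class PortMapping(NamedTuple):
    host_ip: str
    host_port: int
    container_port: int
    proto: str


def parse_ports(column: str) -> List[PortMapping]:
    """Parse a docker ps PORTS value such as '0.0.0.0:80->8888/tcp'."""
    mappings = []
    for spec in column.split(","):
        spec = spec.strip()
        if "->" not in spec:
            # Exposed but not published: nothing reachable from the host.
            continue
        host_part, container_part = spec.split("->", 1)
        # rsplit keeps IPv6 host addresses such as '::' intact.
        host_ip, host_port = host_part.rsplit(":", 1)
        container_port, proto = container_part.split("/", 1)
        mappings.append(PortMapping(host_ip, int(host_port), int(container_port), proto))
    return mappings


def notebook_url(mapping: PortMapping, host: str = "localhost") -> str:
    """URL of the notebook server as seen from the host machine."""
    return f"http://{host}:{mapping.host_port}/"


if __name__ == "__main__":
    for m in parse_ports("0.0.0.0:80->8888/tcp"):
        print(m, notebook_url(m))
```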