  • General

    SIB Summer School: From Data to Models in Biological Systems

    Kandersteg, 14-19 August 2011


    Waldhotel Doldenhorn, Kandersteg

    Motivation and Format

    Modern biological science is able to generate enormous quantities of complex data, often through multiple sources. Efficient and accurate analysis of such data can only be accomplished through appropriate computational methods.

    Systems Biology, with its aim of integrating complex measurements to understand the functioning of biological systems, is expected to make heavy use of these data, and, in particular, of their analysis within network paradigms. Working together, bioinformaticians and systems biologists can develop advanced approaches to extract knowledge from the data, and thereby drive the creation of hypotheses and models that have the potential to open new avenues of understanding for biology at the systems level. and SIB have joined forces to organise a PhD Summer School entitled “From Data to Models in Biological Systems”. The objective of this course is to teach PhD students how to integrate, manage and analyse experimental data using advanced bioinformatic tools. Topics covered include the visualization of complex biological interactions and the application of modeling to predict network processes.

    • Sunday Aug 14th

      Arrival, Welcome and Dinner: 19:00-21:00

      • Monday Aug 15th

        Morning session
        Speaker: Sven Bergmann (University of Lausanne and SIB)
        Title: Integrative analysis of large-scale data
        Abstract: High-throughput technologies like microarrays, next-generation sequencing or mass-spec allow for measuring huge sets of observables for very large collections of samples. Processing these data in order to extract biological insights is very challenging. One important aspect is the reduction of complexity by grouping similar features or samples together. Specifically, modules contain subsets of samples that exhibit a coherent pattern over some of the measured features. I will present a computational tool, the Iterative Signature Algorithm (ISA), which enables the efficient extraction of such modules from large datasets. Existing structured information on the features, like that available in ontologies, can be used to annotate modules. The modular approach can also be used to co-analyze several sets of data. For example, we developed the so-called Ping-pong Algorithm to identify co-modules from two large datasets. We applied this tool to the integrative analysis of gene expression and drug response data from the NCI60 sample collection. This approach allows for predicting gene-drug interactions using only high-throughput data with a significant increase of true positives. Modular analysis is also useful in the context of genome-wide association studies (GWAS) that aim at linking molecular phenotypes (like gene expression or mass-spec data) with genotypes. We will present recent research in this field that is aimed towards an integrative analysis of organismal and molecular phenotypes with very large collections of genotypic markers.
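
To make the idea of a module more concrete, the following is a minimal Python sketch of an iterative, threshold-based module search in the spirit of the ISA. It is not the eisa implementation: the normalisation (z-scores per row and per column), the averaging of scores, the threshold values and all function names are assumptions chosen for illustration only.

```python
from math import sqrt


def zscore(values):
    """Standardise a list to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        return [0.0] * len(values)  # constant input carries no signal
    return [(v - mean) / sd for v in values]


def above_threshold(scores, t):
    """Indices whose score exceeds mean + t * sd of all scores."""
    mean = sum(scores) / len(scores)
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return {i for i, s in enumerate(scores) if s > mean + t * sd}


def find_module(matrix, seed_features, t_feat=0.5, t_samp=0.5, max_iter=100):
    """Refine a seed set of features (rows) into a (features, samples) module."""
    n_feat, n_samp = len(matrix), len(matrix[0])
    # Each row standardised across samples (used to score features).
    by_feature = [zscore(row) for row in matrix]
    # Each column standardised across features (used to score samples).
    cols = [zscore([matrix[g][c] for g in range(n_feat)]) for c in range(n_samp)]
    by_sample = [[cols[c][g] for c in range(n_samp)] for g in range(n_feat)]

    features, samples = set(seed_features), set()
    for _ in range(max_iter):
        if not features:
            return [], []
        # Score every sample by its average over the current features.
        samp_scores = [sum(by_sample[g][c] for g in features) / len(features)
                       for c in range(n_samp)]
        samples = above_threshold(samp_scores, t_samp)
        if not samples:
            return [], []
        # Score every feature by its average over the selected samples.
        feat_scores = [sum(by_feature[g][c] for c in samples) / len(samples)
                       for g in range(n_feat)]
        new_features = above_threshold(feat_scores, t_feat)
        if new_features == features:  # fixed point reached
            break
        features = new_features
    return sorted(features), sorted(samples)


# Toy data: features 0-2 are elevated in samples 0-2 only.
toy = [[10, 10, 10, 0, 0, 0] for _ in range(3)] + [[0] * 6 for _ in range(3)]
print(find_module(toy, seed_features=[0, 1, 4]))
```

Starting from a seed that includes one unrelated feature, the iteration alternates between scoring samples and scoring features until the feature set stops changing.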

        Slides of the lecture.

        Afternoon session
        Speakers: Rico Rueedi, Tanguy Corre, Andrea Prunnotto, Barbara Piasecka (University of Lausanne and SIB)
        The goal of the ISA afternoon workshop is to provide you with a hands-on experience of how to perform a modular analysis. To this end we ask participants to install: R and our ISA Bioconductor package, eisa.

        Software downloads necessary prior to arrival:
        R and our ISA Bioconductor package
        Usually, eisa can be installed by issuing the commands (within R):
        > source("")
        > biocLite("eisa","ALL")
        although, depending on the platform, preliminary steps may be required.

        In case the basic installation command does not work, several packages first need to be installed "manually". The command:


        would install the required packages if needed.

        Then the basic command :

        While we will provide some toy data, we also highly encourage interested participants to bring their own data set for analysis. Although the ISA was originally developed for the analysis of gene expression data, it can be applied to any table of two-dimensional large-scale numerical data. By "large-scale" we mean that the smaller dimension should be at least around 50, while the larger dimension can be up to 100,000 (although that may imply a relatively long computation time, so for practical purposes one should focus first on a smaller subset). For participants with their own data set, we can also provide an ISA package for Matlab, if preferred.

        Recommended introductory preparation:
        - Bergmann S. et al., Phys. Rev. E 67, 031902
        - Gábor Csárdi, Zoltán Kutalik, Sven Bergmann. Bioinformatics 2010, 26(10):1376-7
        - Andreas Lüscher, Gábor Csárdi, Aitana Morton de Lachapelle, Zoltán Kutalik, Bastian Peter, Sven Bergmann. Bioinformatics 2010, 26(16):2062-3
        - The documentation of the R-implementation of ISA, on which the workshop will be loosely based

        Slides of the tutorial.
        Exercises of the tutorial.

        17:30-18:00: Break
        18:00-19:00: Pre-session with Nicolas Le Novère (reminder on elementary chemical kinetics and phase diagrams)
        19:00-20:30: Dinner
        20:30- : Free time

        • Tuesday Aug 16th

          Morning session: 9:00- 12:30 (coffee break 10:30)
          Speaker: Nicolas Le Novère (EMBL-EBI)
          Title: Standard formats to exchange and represent models of biological processes
          Abstract: Over the last two decades, the number of computational models of biological processes has exploded. Furthermore, those models have become larger and more complex. To fully benefit from that evolution, systems biologists must be able to re-use models. This requires sharing, but also being able to interpret them properly. Community standard formats have played a significant role in that respect. We will present in detail the Systems Biology Markup Language (SBML), a format now supported by more than 200 software applications, and the Systems Biology Graphical Notation (SBGN). We will also briefly introduce other fledgling efforts complementing them in order to cover the whole modeling life-cycle. The presentation will be completed by a series of demos and tutorials. The students will learn how to write simple models in SBML and represent them in SBGN. Using COPASI and CellDesigner, we will cover model design, simulation and analysis. The students will then retrieve and reuse more complex models from BioModels Database.

          Recommended introductory preparation and course material
          One link to all

          Software downloads necessary prior to arrival:
          - COPASI
          - CellDesigner

          12:30-14:00 Lunch

          Afternoon session: 14:00- 17:30 (coffee break 15:30)
          Speaker: Seán O'Donoghue (EMBL-Heidelberg)
          Title: Visualisation of omics data

          19:00-20:30 Dinner
          20:30-22:00 Student Presentations

          • Wednesday Aug 17th

            Social Outing 9:00-17:00

            18:30-20:00 Dinner

            20:00-21:00: Pre-session tutorial: Dagmar Iber.
            Abstract: In this lecture we will provide a brief introduction to the modelling of biological signaling and the analysis of non-linear systems.

            21:00-22:00 Pre-session tutorial: Bas Teusink

            Recommended material for the pre-session:
            - Dagmar Iber's book chapter

            • Thursday Aug 18th

              Morning session: 9:00- 12:30 (coffee break 10:30)
              Speaker: Dagmar Iber (ETH Zurich and SIB)
              Title: Spatio-temporal models of biological signaling
              Abstract: This lecture will focus on spatio-temporal models of biological signaling interactions. We will cover the modelling of single and interacting morphogen gradients and introduce travelling waves and Turing patterns. In the second part we will cover how gradients can be read out and used in the patterning of tissues and organs.
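
As a small illustration of a single morphogen gradient and its read-out, the sketch below uses a common textbook model, assumed here rather than taken from the lecture material: a morphogen that diffuses with coefficient D and is degraded linearly at rate k, held at concentration c0 at the source (x = 0). Its steady-state profile is c(x) = c0 * exp(-x / lambda) with decay length lambda = sqrt(D / k). All function and parameter names are hypothetical.

```python
from math import exp, log, sqrt


def decay_length(diffusion, degradation):
    """Characteristic length lambda = sqrt(D / k) of the steady-state gradient."""
    return sqrt(diffusion / degradation)


def concentration(x, c0, diffusion, degradation):
    """Steady-state concentration c(x) = c0 * exp(-x / lambda) at distance x."""
    return c0 * exp(-x / decay_length(diffusion, degradation))


def boundary_position(c0, threshold, diffusion, degradation):
    """Distance from the source at which the gradient falls to a read-out threshold."""
    return decay_length(diffusion, degradation) * log(c0 / threshold)


# Example parameters (arbitrary units, chosen only for illustration).
D, k, c0 = 1.0, 0.01, 1.0
print(decay_length(D, k))                # characteristic length of the gradient
print(boundary_position(c0, 0.5, D, k))  # where the concentration has halved
```

In this simple model a cell that responds above a fixed concentration threshold defines a boundary whose position depends only on lambda and on the ratio c0 / threshold.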

              Recommended introductory preparation:
              - Reaction-diffusion model as a framework for understanding biological pattern formation. Kondo S, Miura T. Science. 2010 Sep 24;329(5999):1616-20.

              Lecture notes.
              Exercises I - Introduction to modelling
              Exercises II - Spatio-temporal modelling

              Additional Material
              Matlab tutorial
              Comsol tutorial

              Afternoon session 14:00- 17:30 (coffee break 15:30)
              Speaker: Bas Teusink (VU University of Amsterdam)
              Title: Genome-scale metabolic models: uses and limitations
              Abstract: The post-genomics revolution has confronted mainstream biologists with the need for models for data integration, analysis, and - ultimately - understanding of the complexity of biological systems. Hence, if we want to make optimal use of functional genomics data, we need models of genome scale. Such genome-scale metabolic models are based on bioinformatics, comparison with other genome-scale models, literature, and experimental evidence for the activity of specific pathways. In this course we will first discuss the metabolic reconstruction process, i.e. issues related to the construction of a genome-scale model. Then I will explain and discuss several so-called constraint-based modelling techniques applied to such models. These modelling techniques all aim to predict or interpret flux distributions through the metabolic network. The most important ones will be discussed.
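
The constraints behind such techniques can be illustrated with a tiny example. The Python sketch below checks whether a candidate flux distribution v satisfies the two basic constraints usually assumed in constraint-based modelling: steady state (S · v = 0 for every internal metabolite, with S the stoichiometric matrix) and lower/upper bounds on each flux. The five-reaction toy network, the bounds and the function names are invented for illustration. Methods such as flux balance analysis additionally optimise an objective over this feasible set with a linear-programming solver, which is not shown here.

```python
def net_production(stoichiometry, fluxes):
    """Return S . v: the net production rate of each metabolite."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in stoichiometry]


def is_feasible(stoichiometry, fluxes, bounds, tol=1e-9):
    """True if the fluxes are mass-balanced and within their bounds."""
    balanced = all(abs(rate) <= tol
                   for rate in net_production(stoichiometry, fluxes))
    within_bounds = all(lo - tol <= v <= hi + tol
                        for v, (lo, hi) in zip(fluxes, bounds))
    return balanced and within_bounds


# Toy network (hypothetical): uptake -> A, A -> B, A -> C, B -> out, C -> out.
# Rows are metabolites A, B, C; columns are the five reactions in that order.
S = [
    [1, -1, -1,  0,  0],  # A
    [0,  1,  0, -1,  0],  # B
    [0,  0,  1,  0, -1],  # C
]
bounds = [(0, 10)] + [(0, 1000)] * 4  # uptake limited to 10 units

print(is_feasible(S, [10, 6, 4, 6, 4], bounds))  # balanced, within bounds
print(is_feasible(S, [10, 6, 4, 6, 3], bounds))  # C accumulates
print(is_feasible(S, [12, 6, 6, 6, 6], bounds))  # uptake exceeds its bound
```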

              Recommended introductory preparation: Bas Teusink's book chapter


              Software downloads necessary prior to arrival: any internet browser (except Internet Explorer)

              19:00 Dinner

              20:30-21:30: Pre-session tutorial: Daryl Shanley. The evening session will introduce the biology of ageing with an emphasis on what has been achieved and what we hope to achieve by adopting a systems approach. The morning lecture will provide a structured overview illustrated with examples of how knowledge and data have been integrated to generate testable hypotheses and, in some cases, how this has been followed up with detailed dynamic modelling (e.g. Passos 2010). The hands-on session will use: 1) Cytoscape to explore data integration; and 2) CellDesigner and COPASI to build, calibrate and test a dynamic model.
              21:30-22:30: Pre-session tutorial: Michael Stadler. Checking of proper software setup for the Friday afternoon tutorial on epigenomics.

              • Friday Aug 19th

                Morning session: 9:00- 12:30 (coffee break 10:30)
                Speaker: Daryl Shanley (University of Newcastle)
                Title: Systems Biology of Aging
                Abstract: The process of ageing can be understood as the accumulation of unrepaired damage to molecules, cells and tissues. There are many unresolved questions in ageing: for example, it is well known that a drastic reduction in food intake extends lifespan in many organisms, but we do not know the underlying mechanisms; there are many single-gene mutations known to extend lifespan, but again we lack knowledge of how. All organisms face challenges from the environment and have evolved sophisticated systems of protection against sources of damage, e.g. the antioxidant system, and, in the event of damage occurring, systems of repair or removal. Ageing thus involves multiple biochemical and cellular mechanisms affecting multiple tissues, and adopting an integrative systems approach is essential to reach a full understanding (Kirkwood 2011).

                Recommended introductory preparation:
                - Kirkwood TBL (2011) Systems biology of ageing and longevity. Phil. Trans. R. Soc. B 366:64-70
                - Passos JF et al. (2010) Feedback between p21 and reactive oxygen production is necessary for cell senescence. Mol. Sys. Biol. 6:347

                Software downloads necessary prior to arrival:
                - Cytoscape
                - COPASI
                - CellDesigner

                12:30-14:00 Lunch

                Afternoon session: 14:00- 17:30 (coffee break 15:30)
                Speaker: Michael Stadler (FMI, University of Basel)
                Title: Epigenomics and Cell Differentiation; Data Analysis and modeling options
                Abstract: Epigenomics studies the genome-wide distribution of stable, yet reprogrammable nuclear changes that control gene expression, such as histone modifications and DNA methylation. Epigenetic control helps to maintain a pluripotent state of a stem cell, or restrict its developmental potential in the course of cellular differentiation. The goal of the Cell Plasticity project from is to decode the regulation of stem cell status and cell differentiation in different experimental systems (hematopoiesis, neurogenesis, the epithelial-mesenchymal transition, and oncogenic transformation in AML). Ultimately, by combining experimental and computational approaches, we aim to generate predictive models for the epigenetic contribution to the state of pluripotency and to cell differentiation.
                In order to measure transcription, histone modifications and DNA methylation at genomic scale, we make use of next generation sequencing: millions of short sequence reads are obtained from isolated RNA or enriched DNA samples and used to quantify epigenetic states and transcriptional activities. While the recent technical advances in next generation sequencing have greatly improved our ability to perform epigenomics experiments, the large amounts of data generated by these experiments require new and interdisciplinary approaches to understand and interpret this new kind of data.

                Recommended introductory preparation:
                Reviews on epigenomics and cell differentiation:
                1. Ng HH, Surani MA. The transcriptional and signalling networks of pluripotency. Nature Cell Biology 2011, 13:490-496.
                2. Bernstein BE, Meissner A, Lander ES: The mammalian epigenome. Cell 2007, 128:669-681.
                3. Mohn F, Schubeler D: Genetics and epigenetics: stability and plasticity during cellular differentiation. Trends Genet 2009, 25:129-136

                Software downloads necessary prior to arrival: R and a few Bioconductor packages. To download and install R, follow instructions from To download and install the Bioconductor packages, make sure your computer is connected to the internet, start R and type these commands into the R console:
                biocLite(c("Biobase", "Biostrings", "IRanges", "GenomicRanges", "Rsamtools"))
                Proper setup of the software will be checked during the pre-session tutorial on Thursday evening.

                Familiarity with R is not absolutely required but highly beneficial for the tutorial. Many tutorials and introductory courses are available from the internet, e.g. at, or at