This three-day course will provide an overview of the RNA-seq analysis pipeline, as well as the downstream analysis of the resulting data using Bioconductor packages in R. The course will cover the following topics:

  • The structure of an RNA-seq analysis pipeline:
    • Raw data quality check;
    • RNA-seq read alignment;
    • Gene expression level quantification and normalization by read counting;
    • De novo transcript reconstruction and differential splicing.
  • Overview of downstream analysis:
    • Differential expression analysis with R/Bioconductor packages;
    • Class discovery: use of Principal Component Analysis, clustering, heatmaps and Gene Set Enrichment Analysis in RNA-seq analysis.
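As a rough illustration of the normalization step listed above, the following sketch scales the raw read counts of one sample to counts per million (CPM), a simple library-size normalization. The gene names and counts are invented for the example, and the function name is hypothetical; Bioconductor packages generally apply more elaborate normalizations than this.

```python
def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    # Each gene's count is expressed relative to a library of one million reads.
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical raw counts for a single sample (made-up numbers).
sample_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(sample_counts)

for gene, value in cpm.items():
    print(f"{gene}\t{value:.1f}")
```

After this scaling, values from samples sequenced at different depths are on a comparable scale, which is the purpose of the normalization step.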

Next Generation Sequencing (NGS) techniques will not be covered in this course, and neither experimental design nor statistical methods will be discussed in detail.


We currently live in an era where most computers possess multiple computing units, and where parallelization is key. In particular, GPGPUs (General-Purpose Graphics Processing Units) are built for massive parallelism and have recently risen to prominence, as they are now used for many scientific tasks such as physics or biological simulations, statistical inference and machine learning.

In this crash course we will focus on CUDA as well as several CUDA-based APIs, including OpenMP GPU offloading and Python APIs. Through concrete examples, we will describe the principles at the core of a successful parallelization attempt.

Have you ever been stuck with a file format that doesn't precisely conform to your needs, found yourself doing annoyingly repetitive data manipulations, or struggled to efficiently manage and explore your data? Python to the rescue!

Python is an open-source, general-purpose scripting language which runs on all major operating systems. It was designed to be easy to read and write, with comparatively simple syntax, and is thus a good choice for beginners in programming. Python is applied in many disciplines and is one of the most common languages for bioinformatics. The Python community enthusiastically maintains a rich collection of libraries/modules for everything from web development to machine learning. Other programming languages such as R have comparable functionality to Python; however, some tasks are more natural (and easier!) in Python.

This 3-day course is addressed to beginners who want to become familiar with writing Python code to accomplish common tasks such as automated data parsing, basic statistical operations and graphical representations.
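To give a flavour of the kind of task mentioned above, here is a minimal sketch of automated data parsing followed by a basic statistical operation, using only the Python standard library. The table contents, column names and the `mean_by_group` helper are hypothetical and only serve the illustration; in practice the data would be read from a file.

```python
import csv
import io
import statistics

# Hypothetical tab-separated measurements (made-up values).
raw = (
    "sample\tcondition\tvalue\n"
    "s1\tcontrol\t4.0\n"
    "s2\tcontrol\t6.0\n"
    "s3\ttreated\t9.0\n"
    "s4\ttreated\t11.0\n"
)


def mean_by_group(text, group_col, value_col):
    """Parse tab-separated text and return the mean of value_col per group."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {name: statistics.mean(values) for name, values in groups.items()}


means = mean_by_group(raw, "condition", "value")
print(means)
```

Replacing `io.StringIO(text)` with an opened file object would apply the same logic to a file on disk.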


With the constant evolution of technologies, laboratory biologists are faced with an increasing need for bioinformatics skills to deal with high-throughput data storage, retrieval and analysis.

Although several resources developed for such tasks have a web interface (most of the time, the first choice of biologists), many operations can be handled more efficiently from a command-line interface (CLI).

During the first part of this workshop, researchers and professionals involved in Big Data management at VitalIT/SIB as well as in Data Management Plan preparation at UNIL/CHUV will teach you best practices in data management and how to collect, describe, store, secure and archive research data. You will be introduced to the need for preparing a Data Management Plan (DMP), an evolving document that reports how research data will be managed during and after a research project.

This "First Steps with R" course is addressed to beginners wanting to become familiar with the R environment and master the most common commands to be able to start exploring their own datasets.

Experiments designed to quantify gene expression often yield hundreds of genes that show statistically significant differences between two classes (two biological states, two phenotype states, two experimental conditions, etc.). Once differentially expressed genes are identified, enrichment analysis (EA) methods can be applied to identify groups of genes (e.g. particular pathways) that are differentially expressed, and to offer insights into biological mechanisms. One example of such a method is Gene Set Enrichment Analysis (GSEA), which is very popular and frequently used for high-throughput gene expression data analysis.

This course will cover GSEA and alternative enrichment tools. Since most of their implementations are directly linked to databases that annotate the function of genes in the cell, the course will also introduce GO enrichment analysis.
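As an illustration of what an enrichment test computes, the sketch below implements a simple over-representation analysis with the hypergeometric distribution, an approach commonly used for GO term enrichment. It is not the rank-based statistic used by GSEA itself, and the function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe, in_set, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    universe: number of genes tested in the experiment
    in_set:   number of those genes annotated to the gene set (e.g. a GO term)
    selected: number of differentially expressed genes
    overlap:  number of differentially expressed genes in the gene set
    """
    max_overlap = min(in_set, selected)
    total = comb(universe, selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Made-up example: 20 genes tested, 5 annotated to the term,
# 5 differentially expressed, 3 of which carry the annotation.
p_value = hypergeom_enrichment_pvalue(20, 5, 5, 3)
print(f"p = {p_value:.4f}")
```

A small p-value suggests that the gene set is over-represented among the differentially expressed genes; in a real analysis, p-values across many gene sets would also need a multiple-testing correction.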