Topic outline

  • General

    Information Retrieval and Text Mining for Biology

    Geneva, 3-5 June 2015



    In the life sciences, scientific information is still communicated and disseminated largely through text, notably research articles. This activity aims to give participants an overview of the state of the art in information retrieval (identifying relevant documents in a collection) and text mining (identifying and extracting relevant information from a document) as applied to biology.
    We will give an overview of the successes and evolution of the field since the 2000s (some tasks have been automated with performance equal to or better than that of humans), of current challenges, and of the most promising recent trends. Interactive demonstrations of existing tools, maintained and used by SIB or other groups, will add a practical dimension to the state of the art.


    - understand the methods and the pipeline supporting most text mining services

    - evaluate different text analytics tasks with the appropriate metrics

    - use some text mining services to support biocuration


    Skill requirements: none
    Material requirements: will be defined later if any


    Registration is now open. Please register via the CUSO/Staromics website.

    The deadline for registration and cancellation is 29 May 2015.


    Geneva, CMU, auditorium S1-S2

    Additional information

    For administrative questions, please contact
    For technical and scientific questions, please contact

  • June 3 – Training

    9:00 to 12:00 – Julien Gobeill (SIB Swiss Institute of Bioinformatics - HES-SO University of Applied Sciences)

    Focus: Introduction, and how to evaluate Information Retrieval and Text Mining. 

    14:00 to 17:00 – Patrick Ruch (SIB Swiss Institute of Bioinformatics - HES-SO University of Applied Sciences)

    Focus: Text Mining and Information Retrieval Applications

  • June 4 – Mini symposium day 1

    9:00 to 9:15 – Ioannis Xenarios (SIB)


    9:15 to 10:15 – Cecilia Arighi, Research Associate Professor at Department of Computer and Information Sciences, University of Delaware

    Focus of the talk: BioCreative challenge evaluations.

    10:45 to 12:00 – Donat Agosti, President of Plazi, Bern

    Focus of the talk: Plazi, a persistent and openly accessible digital taxonomic literature.

    12:00 to 14:00 – Lunch break

    14:00 to 15:00 – Thomas Lemberger, European Molecular Biology Organization (EMBO)

    Focus of the talk: SourceData: towards an integration of biocuration in publishing.

    15:00 to 15:30 – Coffee break

    15:30 to 17:00 – Round table

  • June 5 – Mini symposium day 2

    9:00 to 10:15 – Johanna McEntyre, European Bioinformatics Institute (EMBL-EBI), Cambridge

    Title: A tour of the Europe PMC full text database and related text-based tools at the EMBL-EBI.

    10:15 to 10:45 – Coffee break

    10:45 to 11:45 – Thérèse Vachon, Novartis Institutes for BioMedical Research, Basel 

    11:45 to 13:15 – Lunch break

    13:15 to 14:15 – Lynette Hirschman, Chief Scientist for Biomedical Informatics, MITRE Corporation, Bedford

    Title: The Cost of Curation

    Focus: There is an urgent need to provide scalable, timely, affordable curation of the biomedical literature; current curation pipelines lag behind publication and handle only a portion of published material. This talk will explore the complex balance between increasing curation efficiency and throughput on the one hand, and maintaining quality on the other. It will touch on three related topics: 1) cost-quality trade-offs for biomedical curation: what we know and what we need to know; 2) recent research on a novel hybrid curation approach that combines automated extraction of biomedical entities with crowdsourcing to provide scalable, cost-effective curation; and 3) opportunities to integrate new tools and approaches into the curation workflow to increase efficiency while maintaining quality.

    14:15 to 15:15 – Zhiyong Lu, NCBI, NLM, NIH, Washington DC

    Title: NCBI/NLM text mining tools for literature search, analysis, and curation: The case of PubMed and PMC.

    15:15 to 15:30 – Coffee break

    15:30 to 16:45 – Panel discussion

    16:45 to 17:00 – Wrap up