Information Retrieval and Text Mining for Biology
Geneva, 3-5 June 2015
In the life sciences, scientific information is still communicated and disseminated largely through text, notably research articles. The proposed activity aims to provide participants with the state of the art in information retrieval (identifying relevant documents in a collection) and text mining (identifying and extracting relevant information in a document) applied to biology.
We will give an overview of successes and developments since the 2000s (tasks have been automated with performance equal to or greater than that of humans), current challenges, and the most promising recent trends. Interactive demonstrations of existing tools, maintained and used by SIB or other groups, will add a practical dimension to the state of the art.
- understand the methods and the pipeline supporting most text mining services
- evaluate different text analytics tasks with the appropriate metrics
- use some text mining services to support biocuration
Skill requirements: none
Material requirements: will be defined later if any
Registration is now open. Please register through the CUSO/Staromics website.
The deadline for registration and cancellation is 29 May 2015.
Geneva, CMU, auditorium S1-S2
9:00 to 12:00 – Julien Gobeill (SIB Swiss Institute of Bioinformatics - HES-SO University of Applied Sciences)
Focus: Introduction, and how to evaluate Information Retrieval and Text Mining.
14:00 to 17:00 – Patrick Ruch (SIB Swiss Institute of Bioinformatics - HES-SO University of Applied Sciences)
Focus: Text Mining and Information Retrieval Applications
9:00 to 9:15 – Ioannis Xenarios (SIB)
9:15 to 10:15 – Cecilia Arighi, Research Associate Professor at Department of Computer and Information Sciences, University of Delaware
Focus of the speech: BioCreative challenge evaluations.
10:45 to 12:00 – Donat Agosti, President of Plazi, Bern
Focus of the speech: Plazi, a persistent and openly accessible digital taxonomic literature.
12:00 to 14:00 – Lunch break
14:00 to 15:00 – Thomas Lemberger, European Molecular Biology Organization (EMBO)
Focus of the speech: SourceData: towards an integration of biocuration in publishing.
15:00 to 15:30 – Coffee Break
15:30 to 17:00 – Round table
9:00 to 10:15 – Johanna McEntyre, European Bioinformatics Institute (EMBL-EBI), Cambridge
Title: A tour of the Europe PMC full text database and related text-based tools at the EMBL-EBI.
10:15 to 10:45 – Coffee break
10:45 to 11:45 – Thérèse Vachon, Novartis Institutes for BioMedical Research, Basel
11:45 to 13:15 – Lunch break
13:15 to 14:15 – Lynette Hirschman, Chief Scientist for Biomedical Informatics, MITRE Corporation, Bedford
Title: The Cost of Curation
Focus: There is an urgent need to provide scalable, timely, affordable curation of the biomedical literature; current curation pipelines lag publication time and handle only a portion of published material. This talk will explore topics related to the complex balance between increasing curation efficiency and throughput on the one hand, and maintaining quality on the other. The talk will touch on three related topics: 1) cost-quality trade-offs for biomedical curation – what we know and what we need to know; 2) recent research on a novel hybrid curation approach that combines automated extraction of biomedical entities with crowdsourcing to provide scalable, cost effective curation; and 3) opportunities to integrate new tools and approaches into the curation workflow to increase efficiency while maintaining quality.
14:15 to 15:15 – Zhiyong Lu, NCBI, NLM, NIH, Washington DC
Title: NCBI/NLM text mining tools for literature search, analysis, and curation: The case of PubMed and PMC.
15:15 to 15:30 – Coffee break
15:30 to 16:45 – Panel discussion
16:45 to 17:00 – Wrap up