Section outline

  • 9h to 17h - Patrick Ruch and Julien Gobeill (SIB and Unige) - Machine learning for text mining and data curation

    We will introduce the main types of text mining applications (ad hoc retrieval, automatic text classification, information extraction, and others) and show how they can be combined and assessed to build text mining pipelines. We will also introduce some of the core normalization layers and data/software resources (terminologies, stemming, feedback, and so on) that support these tasks.

    The practical session will be based on the latest Gene Ontology task, held at the BioCreative IV challenge (2013). We will cover two subtasks: (1) filtering full-text articles to predict which passages are relevant for curation, and (2) predicting functional annotations from the selected sentences. For both subtasks we will look at the data, then select and implement the best learning algorithm, and finally see how to evaluate our results.

    Reading material: Overview of the gene ontology task at BioCreative IV.
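
As a rough illustration of what subtask (1) and its evaluation might look like, the sketch below trains a small multinomial Naive Bayes classifier to separate curation-relevant sentences from other sentences, then scores predictions with precision, recall and F1. This is only a minimal sketch under stated assumptions: the toy sentences, the labels `relevant` and `other`, and all function names are hypothetical, and Naive Bayes is a stand-in, not necessarily the algorithm or data format used in the session or in BioCreative IV.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set; NOT taken from the BioCreative IV corpus.
TRAIN = [
    ("The protein localizes to the nucleus and binds DNA", "relevant"),
    ("Mutant cells show reduced kinase activity", "relevant"),
    ("We thank the core facility for technical support", "other"),
    ("Samples were stored at minus 80 degrees", "other"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(examples):
    """Count classes and per-class tokens from (sentence, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


model = train_naive_bayes(TRAIN)
```

Calling `predict(model, sentence)` on held-out sentences and passing the gold and predicted labels to `precision_recall_f1` gives the kind of per-class scores typically used to compare learning algorithms. Subtask (2) could in principle reuse the same scaffolding with annotation terms as labels, though that is an assumption rather than the session's stated approach.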