Topic 3
Tuesday 24 June
Big data efforts applied to disease understanding, diagnostics and treatments
9:00 - 10:30 Prof. Norbert Graf (Universität des Saarlandes, Germany) - Information technology and personalized Medicine. A clinical perspective
Abstract: Medicine is undergoing a paradigm shift, which gradually transforms the nature of healthcare from reactive to preventive. The changes are catalyzed by a new approach to disease that has triggered the emergence of personalized medicine focusing on integrated diagnosis, treatment and prevention of disease in individual patients. The prerequisites for this are the convergence of systems approaches to disease, new measurement, modeling and visualization technologies, and new computational and mathematical tools (http://www.cra.org/ccc/initiatives).
While the goal is clear, the path to it has been fraught with roadblocks in terms of technical, scientific, and sociological challenges. The first step to facilitate the gradual translation from current medical practices to personalized medicine is to bring together internationally recognised leaders in their fields to create an innovative computational, service-oriented IT infrastructure. The emphasis must be to provide an open, modular architectural framework for tools, models and services:
- to share and efficiently handle the enormous personalized data sets coming from clinical trials and hospital information systems (HIS)
- to ensure that policies for privacy, non-‐discrimination, and access to data, services, tools and models are implemented to maximize data protection and data security
- to enable demanding Virtual Physiological Human (VPH) simulations, for which standardization and semantic data interoperability are major issues
- to integrate models from system biology with VPH models
- to build and standardize tools and models for explicit reuse of tools and services
- to guarantee that tools, services and models are clinically driven and do enhance decision support
- to provide tools for large-scale, privacy-preserving data mining and literature mining
- to enhance patient empowerment
The design and development of such a modular architectural framework is technologically challenging. In addition, all tools, models and services need to be evaluated and validated by end-users. Usability of these tools is a major issue and is essential for starting a certification process. Feedback loops to developers for continuous improvements have to be integrated. Such an innovative architecture should promote the principle of open source. All tools, models and services have to be tested in concrete advanced clinical research projects and clinical trials that target urgent topics of the medical research community, a key area of societal importance. Maintenance and further development of the framework need to be addressed from the beginning. To sustain such a self-supporting infrastructure, realistic use cases have to offer tangible results for end-users in their daily practice. Teaching and educational programs for end-users have to be implemented to facilitate access to the platform and the use of tools, models and services.
10:30 - 11:00 Coffee break
11:00 - 12:30 Prof. Norbert Graf - Demonstration of some tools and discussions
12:30 – 14:00 Lunch
14:00 – 17:30 Timothy W. Clark (Mass General Institute for Neurodegenerative Disease, USA) - Next-generation scientific publishing and scientific reproducibility
Lecture: This talk will review a series of problems in scientific communication traceable to the incomplete transition of printed material to the Web, as well as to the relentless working of Moore's Law on scientific instruments. It will analyze various critiques and proposals for implementing the "next generation" in scientific publishing. This topic has exceptional importance for bioinformaticians because it promises / threatens to provide them with a potentially huge volume of data for meta-analysis. Likewise, it has implications for industrial drug discovery and translational research.
14:50 Coffee break
15:00 Break into groups and discuss the following papers, which should have been read prior to arrival; prepare to present on them to the wider group.
Group A
- Begley CG, Ellis LM: Drug development: Raise standards for preclinical cancer research. Nature 2012, 483(7391):531-533.
- Arrowsmith J (2011) Trial watch: Phase II failures: 2008–2010. Nat Rev Drug Discov 10(5):328-329.
- Vasilevsky NA, et al. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148.
- Helsby MA, Fenn JR and Chalmers AD (2013) Reporting research antibody use: how to increase experimental reproducibility. F1000Research 2013, 2:153 (doi: 10.12688/f1000research.2-153.v2)
Questions: What do the first two papers suggest are probable causes of reproducibility failures? How do the second two papers address them? Are the solutions suggested adequate? Are there any practices of the pharmaceutical companies themselves that can lead to reproducibility failures?
Group B
- Joint Declaration of Data Citation Principles, 2014: http://force11.org/datacitation
- Perkins, Lee & Tanentzapf, The systematic identification of cytoskeletal genes required for Drosophila melanogaster muscle maintenance. Nature Scientific Data (2014) doi:10.1038/sdata.2014.2
- Hu Y and Bajorath J (2012) Freely available compound data sets and software tools for chemoinformatics and computational medicinal chemistry applications. F1000Research 2012, 1 (doi: 10.12688/f1000research.1-11.v1)
- Hiltemann S, et al. (2014) CGtag: complete genomics toolkit and annotation in a cloud-based Galaxy. GigaScience 3(1):1.
Questions: Briefly explain the potential contribution of data citation to improving reproducibility. Are there other potential advantages? Review the three journal articles and comment on them in terms of the data citation principles.
15:50 Break
16:00 Groups A and B each have 20 minutes to present their conclusions, plus ten minutes each for discussion.
17:00 Concluding discussion: compare and contrast: Nature Scientific Data vs. F1000 Research vs. PeerJ vs. GigaScience as next-generation publishing efforts.
19:30 – 21:30 Dinner