Course: Querying SIB resources with SPARQL

General

Querying SIB Swiss Institute of Bioinformatics resources with SPARQL

SWAT4HCLS - Edinburgh (Dec. 2019)

The SIB Swiss Institute of Bioinformatics has been publishing data using the Resource Description Framework (RDF) since 2007, with the UniProt knowledgebase as the first SIB resource to provide its data on the semantic web. Since then, a growing number of SIB resources have modelled their knowledge in RDF and made it queryable and accessible through their own SPARQL endpoints.
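In practice, "queryable through a SPARQL endpoint" means that a query can be sent over HTTP following the SPARQL 1.1 Protocol. The minimal sketch below uses only the Python standard library. The UniProt endpoint URL and the `up:` vocabulary are assumptions to check against UniProt's documentation, and `build_request` and `run_query` are hypothetical helper names, not part of any SIB tool.

```python
import json
import urllib.parse
import urllib.request

# Assumption: UniProt's public SPARQL endpoint is at this URL; other SIB
# resources expose their own endpoints at different addresses.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"

QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein .
}
LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build an HTTP GET request in the style of the SPARQL 1.1 Protocol:
    the query text goes in the `query` URL parameter, and the Accept header
    asks for the standard SPARQL JSON results format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=60) as resp:
        return json.load(resp)
```

A call such as `run_query(UNIPROT_ENDPOINT, QUERY)` would return a dictionary whose `["results"]["bindings"]` list holds one entry per solution, each mapping a variable name such as `protein` to a `{"type": ..., "value": ...}` object.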

In this tutorial, we explain how to use data from nine independent SIB resources (GlyConnect, UniProt, Rhea, OrthoDB, OMA, Bgee, HAMAP, MetaNetX and neXtProt) to answer biological questions.

For each resource, we first introduce the kind of data available, then explain how it is modelled and how to query it with SPARQL. We then illustrate the strength of SPARQL 1.1 federated queries, showing how the connected SIB databases can answer questions that none of them could answer on its own.
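In SPARQL 1.1, federation is written with the `SERVICE` keyword: the graph pattern inside `SERVICE <url> { ... }` is evaluated by the remote endpoint, and its solutions are joined with the rest of the query on shared variables. The sketch below composes such a query as a string. The Rhea endpoint URL, the `up:` and `rh:` prefixes, and the properties linking UniProt catalytic-activity annotations to Rhea reactions are assumptions based on the public schemas, and `federated_query` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed endpoint URL and prefixes; check them against each resource's docs.
RHEA_ENDPOINT = "https://sparql.rhea-db.org/sparql"

PREFIXES: Dict[str, str] = {
    "up": "http://purl.uniprot.org/core/",
    "rh": "http://rdf.rhea-db.org/",
}


def federated_query(
    prefixes: Dict[str, str],
    select_vars: List[str],
    local_pattern: str,
    service_url: str,
    remote_pattern: str,
    limit: int = 10,
) -> str:
    """Compose a SPARQL 1.1 query whose SERVICE block runs on a remote endpoint."""
    prefix_lines = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())
    variables = " ".join(f"?{v}" for v in select_vars)
    return (
        f"{prefix_lines}\n"
        f"SELECT {variables}\n"
        "WHERE {\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{service_url}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}\n"
    )


query = federated_query(
    PREFIXES,
    ["protein", "rhea", "equation"],
    # Evaluated by the endpoint receiving the query (assumed: UniProt):
    # proteins whose catalytic-activity annotation points to a Rhea reaction.
    "?protein up:annotation ?ann . ?ann up:catalyticActivity ?ca . "
    "?ca up:catalyzedReaction ?rhea .",
    RHEA_ENDPOINT,
    # Evaluated remotely (assumed: Rhea): the reaction's equation text.
    "?rhea rh:equation ?equation .",
)
```

The resulting string would be submitted to the UniProt endpoint, which forwards the `SERVICE` block to Rhea and joins the two sets of solutions on `?rhea`.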

In terms of domain knowledge, the tutorial covers proteins, glycans, reactions of biological interest, orthology, metabolic networks, chemical mapping, and genome/proteome annotations.

The tutorial starts with a quick introduction to RDF and SPARQL 1.1 in general.
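RDF represents knowledge as subject-predicate-object triples, and the core of a SPARQL `WHERE` clause is a basic graph pattern: a set of triple patterns in which terms starting with `?` are variables. A solution is an assignment of values to variables that makes every pattern match some triple in the graph. The toy in-memory evaluator below illustrates only that matching idea. The `ex:` terms are hypothetical, and real SPARQL engines rely on indexes and query planning rather than nested loops.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy graph: plain strings stand in for IRIs and literals.
EXAMPLE_GRAPH: List[Triple] = [
    ("ex:P1", "rdf:type", "ex:Protein"),
    ("ex:P1", "ex:name", '"Protein one"'),
    ("ex:P2", "rdf:type", "ex:Protein"),
    ("ex:C1", "rdf:type", "ex:Compound"),
]


def is_var(term: str) -> bool:
    """In this sketch, pattern terms starting with '?' are variables."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate_bgp(patterns: List[Triple], graph: Iterable[Triple]) -> List[Binding]:
    """Return every binding under which all triple patterns match the graph."""
    triples = list(graph)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in triples:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Usage: which proteins have a name?
solutions = evaluate_bgp(
    [("?p", "rdf:type", "ex:Protein"), ("?p", "ex:name", "?name")],
    EXAMPLE_GRAPH,
)
# solutions == [{"?p": "ex:P1", "?name": '"Protein one"'}]
```

Because `?p` appears in both patterns, `ex:P2` is dropped: it is a protein but has no `ex:name` triple, which is exactly the join behaviour of a SPARQL basic graph pattern.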

By the end of the course, participants are expected to have:

a basic understanding of SIB resources
some understanding of RDF and SPARQL

Authors

Jerven Bolleman	Introduction to RDF & SPARQL, GlyConnect, UniProt, HAMAP, Ensembl (EBI RDF platform; Elixir friend), DisGeNET (Elixir friend)
Dmitry Kuznetsov	OrthoDB
Thierry Lombardot	Rhea, IDSM (Elixir friend)
Julien Mariethoz	GlyConnect
*Tarcisio Mendes de Faria*	Bgee, OMA browser
Anne Morgat	Rhea, IDSM (Elixir friend)
Marco Pagni	MetaNetX
Monique Zahn	neXtProt

Presentations
- Introduction: querying SIB Swiss Institute of Bioinformatics resources with SPARQL (slides)
  
  Presentation of the SIB Swiss Institute of Bioinformatics, plus quick introductions to RDF and SPARQL in general.
- SPARQLing Rhea (slides)
  
  Rhea is a comprehensive expert-curated resource of biochemical transformations, transport reactions, and spontaneous reactions of biological interest.
- SPARQLing Rhea (queries)
  
  Accompanying Jupyter Notebook: a hands-on introduction to querying metabolism-related data across multiple data sources using SPARQL.
- SPARQLing MetaNetX/MNXref (slides)
  
  MetaNetX/MNXref is a resource for systems biology and metabolomics.
- SPARQLing neXtProt (slides)
  
  The neXtProt knowledgebase is an integrative resource providing both data on human proteins and the tools to explore them.
- SPARQLing OMA (slides)
  
  In this tutorial, you will learn how to query and retrieve orthology and paralogy information from the OMA database with SPARQL.
- SPARQLing Bgee (slides)
  
  In this tutorial, you will learn how to query gene expression patterns from the Bgee database with SPARQL.
- SPARQLing GlyConnect (slides)
  
  GlyConnect is a platform integrating sources of information to help characterise the molecular components of protein glycosylation.
- SPARQLing HAMAP (slides)
  
  HAMAP is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated, manually created annotation rules that specify annotations that apply to family members. HAMAP is used to annotate protein records in UniProtKB via UniProt's automatic annotation pipeline.
- SPARQLing UniProt (slides)
  
  The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation.
  In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.
- SPARQLing OrthoDB (slides)
  
  OrthoDB: the hierarchical catalog of orthologs mapping genomics to functional data.
- SPARQLing Elixir friends: IDSM, Ensembl & DisGeNET (slides)
  
  IDSM (Elixir Czech node): Integrated Database of Small Molecules
  Ensembl (EBI RDF platform): Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation
  DisGeNET: genes and variants associated with human diseases
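The Rhea slides and the accompanying notebook work with tabular query results. As a rough sketch, the query below lists Rhea reactions with their accession and equation, and `bindings_to_rows` (a hypothetical helper) flattens a response in the W3C SPARQL 1.1 Query Results JSON format into one dictionary per solution. The Rhea modelling assumed here (reactions as subclasses of `rh:Reaction`, described with `rh:accession` and `rh:equation`) should be checked against Rhea's own documentation.

```python
from typing import Dict, List, Optional

# Assumed Rhea vocabulary: reactions modelled as subclasses of rh:Reaction,
# carrying rh:accession and rh:equation literals. Verify before use.
RHEA_QUERY = """
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?reaction ?accession ?equation
WHERE {
  ?reaction rdfs:subClassOf rh:Reaction ;
            rh:accession ?accession ;
            rh:equation ?equation .
}
LIMIT 10
"""


def bindings_to_rows(results: dict) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL 1.1 JSON results document into one dict per solution.

    Variables listed in head.vars but unbound in a given solution (for
    example because of OPTIONAL) become None, so every row has the same keys.
    """
    variables = results["head"]["vars"]
    rows: List[Dict[str, Optional[str]]] = []
    for binding in results["results"]["bindings"]:
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows
```

A notebook would typically send `RHEA_QUERY` to the Rhea endpoint, decode the JSON response, and pass it to `bindings_to_rows`, for example to build a data frame for further analysis.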
