Practical 3 ENSEMBL/BIOMART
Use ENSEMBL BioMart to retrieve different kinds of information
a) Find all genes on human chromosome band Xq28 that are associated with a disease (genes with a MIM Morbid entry)
Go to BioMart by clicking the BioMart link at the top of the ENSEMBL home page.
Start a new session if you already have a search in progress.
Then select the database (ENSEMBL Genes 76) and the dataset (Human GRCh38).
Filter by region (chromosome X, band q28) and limit the results to genes with MIM disease IDs.
Count the matches (click the "Count" button).
Select the attributes: ENSEMBL Gene ID; Associated Gene Name; MIM Morbid Accession; MIM Morbid Description
Click the "Results" button.
Check whether you can find the gene involved in hemophilia.
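The same query can also be written as a BioMart XML query (the "XML" button next to "Perl" shows the one BioMart generates for your session). The sketch below builds such a query in Python. All internal names used here (hsapiens_gene_ensembl, chromosome_name, band_start, band_end, with_mim_morbid, external_gene_name, mim_morbid_accession, mim_morbid_description) are assumptions that can differ between ENSEMBL releases, so compare them with the XML that BioMart shows for your own query.

```python
import xml.etree.ElementTree as ET

XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_query(dataset, filters, attributes, formatter="TSV"):
    """Return a BioMart XML query as a string.

    filters maps a filter name to a string value, or to True/False
    for yes/no filters such as "has a MIM disease ID".
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        if isinstance(value, bool):
            # Yes/no filter: excluded="0" keeps genes that have the annotation.
            ET.SubElement(ds, "Filter", {"name": name, "excluded": "0" if value else "1"})
        else:
            ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return XML_HEADER + ET.tostring(query, encoding="unicode")


# Assumed internal names for the filters and attributes chosen in part a).
xq28_query = build_query(
    "hsapiens_gene_ensembl",
    {
        "chromosome_name": "X",
        "band_start": "q28",
        "band_end": "q28",
        "with_mim_morbid": True,
    },
    [
        "ensembl_gene_id",
        "external_gene_name",
        "mim_morbid_accession",
        "mim_morbid_description",
    ],
)
print(xq28_query)
```

The printed string is what you would pass as the "query" parameter to the BioMart martservice URL of the ENSEMBL site you are using.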
b) You have a list of Danio rerio (zebrafish) genes and want to retrieve the 1000 bp 5' upstream regions of their mouse orthologs
Gene list:
ENSDARG00000019601
ENSDARG00000009014
ENSDARG00000087508
ENSDARG00000036036
ENSDARG00000089441
ENSDARG00000093052
ENSDARG00000091211
ENSDARG00000016771
ENSDARG00000078322
ENSDARG00000045408
ENSDARG00000088116
ENSDARG00000002758
ENSDARG00000037116
ENSDARG00000013430
1) Start a new session if you already have a search in progress.
Then select the database (ENSEMBL Genes 76) and the dataset (Danio rerio genes (Zv9)).
Filter by ENSEMBL Gene ID and paste the gene list above to limit the results to those genes.
Count the matches (click the "Count" button).
Select the attributes: Ensembl Gene ID; Mouse Ensembl Gene ID; Orthology confidence [0 low, 1 high]; Associated Gene Name; Description
Click the "Results" button.
You will see that the result is not a clean one-to-one list: many genes have several hits, and some genes have no hit at all. This is normal.
To help you choose among multiple hits, add one more attribute: % Identity with respect to query gene
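If you export the result table as TSV, choosing one mouse ortholog per zebrafish gene can be done with a few lines of Python. This is only a sketch: the column order (zebrafish gene ID, mouse gene ID, orthology confidence, % identity) and the ranking rule (confidence first, then % identity) are assumptions, and the rows in the example are made-up placeholders, not real BioMart output.

```python
import csv
import io


def best_orthologs(tsv_text):
    """Map each query gene to its best mouse ortholog (None if it has no hit).

    Assumed columns: query gene ID, mouse gene ID, confidence (0/1), % identity.
    """
    best = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for query_id, mouse_id, confidence, identity in reader:
        if not mouse_id:
            # Gene without ortholog: remember it, but do not overwrite a hit.
            best.setdefault(query_id, None)
            continue
        score = (int(confidence or 0), float(identity or 0.0))
        current = best.get(query_id)
        if current is None or score > current[1]:
            best[query_id] = (mouse_id, score)
    return {q: (hit[0] if hit else None) for q, hit in best.items()}


# Made-up placeholder rows, for illustration only.
toy_table = (
    "Gene ID\tMouse Gene ID\tConfidence\tIdentity\n"
    "zf_gene_1\tmouse_gene_A\t0\t80\n"
    "zf_gene_1\tmouse_gene_B\t1\t60\n"
    "zf_gene_1\tmouse_gene_C\t1\t75\n"
    "zf_gene_2\t\t\t\n"
)
chosen = best_orthologs(toy_table)
print(chosen)
```

Ranking by confidence before identity is one possible rule. Depending on your question, you may instead want to keep every high-confidence ortholog.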
Then take the mouse ENSEMBL gene IDs you have identified and run a new BioMart query:
2) Start a new session if you already have a search in progress.
Then select the database (ENSEMBL Genes 76) and the dataset (Mus musculus genes (GRCm38.p2)).
Filter by ENSEMBL Gene ID and limit the results to the mouse gene IDs you identified in step 1.
Count the matches (click the "Count" button).
Under Attributes, choose the Sequences page and select: Ensembl Gene ID; Flank (Gene); Upstream flank [1000]
Click the "Results" button.
Export the results as a FASTA file.
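Once you have the exported file, it is worth checking that every gene really returned a 1000 bp sequence. The sketch below is a minimal FASTA reader written for this check. It assumes each header is unique, and the example records are made-up placeholders.

```python
def read_fasta(text):
    """Parse FASTA text into a dict: header (without '>') -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def wrong_length(records, expected=1000):
    """Return the headers whose sequence length differs from `expected`."""
    return [h for h, seq in records.items() if len(seq) != expected]


# Made-up placeholder records; with real data keep expected=1000.
toy_fasta = ">gene_1\nACGT\nAC\n>gene_2\nAAA\n"
toy_records = read_fasta(toy_fasta)
print(wrong_length(toy_records, expected=6))
```

For the real export, read the file contents into a string, pass it to read_fasta, and look at any header that wrong_length reports.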
Of course, this two-step procedure is a bit tedious. You could automate it by getting the Perl scripts of both queries (click the "Perl" button) and connecting them (this requires programming knowledge).
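As an illustration of what "connecting" the two queries means, here is a minimal sketch in Python rather than Perl. It does not use the scripts that BioMart generates: it sends hand-written XML queries to the BioMart web service. The service URL and all internal names (drerio_gene_ensembl, mmusculus_gene_ensembl, mmusculus_homolog_ensembl_gene, upstream_flank, gene_flank) are assumptions that depend on the ENSEMBL release, and the function names are made up for this example. Unlike the manual procedure, the sketch keeps every mouse ortholog instead of choosing one per gene.

```python
import urllib.parse
import urllib.request

MART_URL = "http://www.ensembl.org/biomart/martservice"  # assumed endpoint

QUERY_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    '<Query virtualSchemaName="default" formatter="{formatter}" header="0" '
    'uniqueRows="1" datasetConfigVersion="0.6">'
    '<Dataset name="{dataset}" interface="default">{body}</Dataset></Query>'
)


def make_query(dataset, formatter, filters, attributes):
    """Build a BioMart XML query from (name, value) filters and attribute names."""
    body = "".join('<Filter name="{}" value="{}"/>'.format(n, v) for n, v in filters)
    body += "".join('<Attribute name="{}"/>'.format(a) for a in attributes)
    return QUERY_TEMPLATE.format(formatter=formatter, dataset=dataset, body=body)


def post_query(xml):
    """Send one query to the BioMart service and return the response text."""
    data = urllib.parse.urlencode({"query": xml}).encode("ascii")
    with urllib.request.urlopen(MART_URL, data=data) as response:
        return response.read().decode("utf-8")


def upstream_fasta(zebrafish_ids, fetch=post_query, flank=1000):
    """Step 1: zebrafish IDs -> mouse ortholog IDs. Step 2: upstream flanks."""
    orth_xml = make_query(
        "drerio_gene_ensembl", "TSV",
        [("ensembl_gene_id", ",".join(zebrafish_ids))],
        ["ensembl_gene_id", "mmusculus_homolog_ensembl_gene"],
    )
    mouse_ids = set()
    for line in fetch(orth_xml).splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[1]:  # skip genes without ortholog
            mouse_ids.add(fields[1])
    if not mouse_ids:
        return ""
    seq_xml = make_query(
        "mmusculus_gene_ensembl", "FASTA",
        [("ensembl_gene_id", ",".join(sorted(mouse_ids))),
         ("upstream_flank", str(flank))],
        ["ensembl_gene_id", "gene_flank"],
    )
    return fetch(seq_xml)
```

Calling upstream_fasta(["ENSDARG00000019601", "ENSDARG00000009014"]) would contact the server and return FASTA text. The fetch parameter exists so that the function can be tried out with a stand-in function instead of a live connection.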