Exercise 2 - Task 1
Working with Ensembl
2.1 Exploring features related to a gene
- Find the gene report for the human Pyruvate Carboxylase gene.
Hint: use the search/find boxes to do a text search.
- What is the official HUGO Gene Nomenclature Committee (HGNC) gene symbol?
- Are there HGNC synonyms?
- How many transcripts are predicted for this gene?
- How many transcripts of this gene are protein coding and have a Consensus CDS (CCDS) entry?
- Find the MANE (Matched Annotation from NCBI and EMBL-EBI) transcript.
Note down the mRNA RefSeq ID.
Explain what MANE is.
What other flags are set for this transcript? (Give a short description)
- Find the UniProt ID for this transcript.
- What is the size of this mRNA?
How many exons does it have?
How many coding exons does it have?
Calculate the average length of the coding (!) exons.
- Find the lengths of the longest and the shortest coding (!) exon.
- What are the gene's associated molecular functions?
- On what evidence are these functions based?
Hint: look for Gene ontology (GO) on the left side
- In which cellular components can the enzyme be found?
- On what evidence are these locations based?
Hint: look for Gene ontology (GO) on the left side
- Explore the “Gene ontology” biological process terms in Ensembl.
Find the GO ID corresponding to "Gluconeogenesis" and use AmiGO to find out
how many unique human gene products involved in gluconeogenesis were contributed by UniProt and have experimental evidence.
List them together with their direct GO class annotation.
Hint: Use the filters on the left side in AmiGO and set them as needed.
- Visualize the 3D structure of PC predicted by AlphaFold.
A mutation of a single residue in the PT domain, F1077A, can disrupt tetramer formation and inactivate the enzyme.
Mark the exon in the structure that carries the F1077A mutation.
Hint: Go to the "Transcript" tab and look for protein information in the left menu.
- On which chromosomal band is the Pyruvate Carboxylase gene located?
On which contigs is it positioned in the genomic sequence assembly?
Name the contigs. How long are they in base pairs?
- Explain the term Contig (consult the glossary from the Help (?) link).
- Why are some genes above and some below the DNA contigs? (Think about DNA organisation!)
Which is the coding strand for the Pyruvate Carboxylase gene?
- Are there orthologues in zebrafish?
If so, what are their gene symbols and where are they located in the zebrafish genome?
Name and explain the type of orthology. How can this type arise? (Hint: use the tooltip.)
Show a gene tree image of the orthologues you identified.
- What do the Target %id and Query %id numbers indicate (listed in the Orthologues view)?
Hint: Consult the Help (?) link, or use mouse over.
- Perform a "region comparison" between Homo sapiens and Mus musculus (C57BL/6J) at the Pyruvate Carboxylase gene.
Are there other orthologous genes in the region ±20 kb? Name them!
Hint: Configure the image to join orthologous genes (Comparative features -> Join genes) and change the Location coordinates by subtracting 20 kb from the start position and adding 20 kb to the end position.
The blue lines connect the orthologous genes.
- Show a graphical overview of the synteny. What is the size of the entire syntenic block that carries the Pyruvate Carboxylase gene region?
Which are the next upstream and the next downstream human genes that have no homologous gene in Mus musculus?
Hint: In the Synteny view, click on "Centre on gene PC" to get a list of genes.
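Several questions in 2.1 come down to simple coordinate arithmetic once the numbers have been read off the Ensembl exon table. The following is a minimal Python sketch of that arithmetic, not part of the Ensembl tools. The function names (coding_exon_stats, exon_for_residue, pad_region) are made up for this illustration, and the example lengths and coordinates are hypothetical placeholders, not the real PC annotation. It assumes that only the coding part of each exon is counted and that exons are listed in 5' to 3' order along the transcript.

```python
from statistics import mean


def coding_exon_stats(coding_lengths):
    """Return (average, longest, shortest) length in bp of the coding exons."""
    return mean(coding_lengths), max(coding_lengths), min(coding_lengths)


def exon_for_residue(coding_lengths, residue):
    """Return the 1-based index of the coding exon that holds a residue's codon.

    coding_lengths: CDS length in bp of each coding exon, in 5'->3' order.
    residue: 1-based amino acid position (e.g. 1077 for F1077A).
    If a codon is split across two exons, the exon holding its first
    base is reported.
    """
    first_base = (residue - 1) * 3 + 1  # 1-based CDS position of the codon start
    covered = 0
    for index, length in enumerate(coding_lengths, start=1):
        covered += length
        if first_base <= covered:
            return index
    raise ValueError("residue lies beyond the end of the CDS")


def pad_region(chrom, start, end, flank=20_000):
    """Format an Ensembl-style location string widened by `flank` bp per side."""
    return f"{chrom}:{max(1, start - flank)}-{end + flank}"


# Hypothetical values for illustration only, NOT the real PC gene data.
example_lengths = [120, 90, 300, 150]
print(coding_exon_stats(example_lengths))
print(exon_for_residue(example_lengths, 75))
print(pad_region("11", 1_000_000, 1_100_000))
```

To use it for the exercise, replace the placeholder list with the coding lengths taken from the exon table of the MANE transcript, and the placeholder coordinates with the gene's start and end positions. Note that the index returned by exon_for_residue counts coding exons only, so it has to be shifted if the transcript starts with non-coding exons.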
2.2 Retrieve all unique germline SNPs (variations) from dbSNP for the PC gene that have "stop gained" as variant consequence and a pathogenic clinical significance
These SNPs should have the following properties:
- Presence in the germline
- Stop gained in the coding sequence
- Clinical significance: pathogenic
The following information should be extracted:
- Ensembl Gene ID
- Chromosome location (bp position) of gene
- Ensembl Transcript ID
- Variant Name
- Variant Source
- Chromosome location (bp position) of variant
- Transcript location (bp position) of variant
- Protein location (aa position) of variant
- Variant Consequence
- Consequence specific allele
- Protein allele
- Clinical significance
1. Go to http://www.ensembl.org/ and select the “BioMart” link.
2. Choose the species and the focus for your query on the START page:
- Select “Ensembl Genes 106” as we are looking for a gene list (NOT a SNP list!)
- Select “Homo sapiens genes”
3. Click "Count". It indicates the total number of Ensembl genes present in the database.
4. Use the "GENE:" filter!
- Check “Input external reference ID list” and select "HGNC symbol(s)"; enter "PC" in the input field.
5. Check that the number of selected genes is 1 (click "Count").
6. Select the appropriate Variant filters according to the questions.
7. Select the required Attributes (output) for your SNP list:
- Hint: Select “Variant (Germline)”.
- Select all attributes you need to answer the question.
- When done: click on "Results".
- Choose “MS Excel (XLS)” (or HTML if MS Excel is not installed) as output format.
Tick "Unique results only".
- Use Excel to filter the list and generate the final output!
- Report the number of unique SNPs found with the criteria listed above.
- Follow the links or use the Variant name to search in dbSNP.
- Report the disease name.
- List the publications for the variant(s).
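If Excel is not available, the filtering step can also be done with a short script. The following Python sketch assumes the BioMart results were exported as tab-separated text (TSV) instead of XLS and that the column headers are "Variant name", "Variant consequence" and "Clinical significance"; check these against the header line of your own export and adjust them if they differ. The function name filter_variants and the variant names in the sample are hypothetical placeholders, not real dbSNP entries.

```python
import csv
import io


def filter_variants(tsv_text, consequence="stop_gained", significance="pathogenic"):
    """Return the sorted unique variant names that match both criteria.

    Column names are assumptions about the BioMart export header.
    Cells may hold several comma-separated terms; a term must match
    exactly, so "likely pathogenic" alone does not count as "pathogenic".
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    names = set()
    for row in reader:
        consequences = {c.strip() for c in row["Variant consequence"].split(",")}
        terms = {t.strip().lower() for t in row["Clinical significance"].split(",")}
        if consequence in consequences and significance in terms:
            names.add(row["Variant name"])  # the set removes duplicate rows
    return sorted(names)


# Hypothetical sample rows for illustration only (placeholder variant names).
sample = (
    "Variant name\tVariant consequence\tClinical significance\n"
    "rsEXAMPLE1\tstop_gained\tpathogenic\n"
    "rsEXAMPLE1\tstop_gained\tpathogenic\n"
    "rsEXAMPLE2\tmissense_variant\tpathogenic\n"
    "rsEXAMPLE3\tstop_gained\tlikely pathogenic\n"
    "rsEXAMPLE4\tstop_gained\tlikely pathogenic,pathogenic\n"
)

matches = filter_variants(sample)
print(len(matches), matches)
```

For a real export, read the file content with open(path, encoding="utf-8").read() and pass it to filter_variants. The same variant usually appears once per transcript, which is why the names are collected in a set before counting.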
Links: