Bioinformatics 1 Exercise 2

Notes

Introduction (Exercise 2)

Cheat sheet (Exercise 2)

Part I – Regulatory genomics

1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene symbol, number of exons, Ensembl transcript ID, Ensembl gene ID, 3’UTR sequence as a FASTA file, and length of the 3’UTR (Ensembl BioMart).
1.2 Using Notepad++, check whether the 3’UTR of PC contains a sequence complementary to positions 2-8 of the microRNA hsa-miR-182-5p.
1.3 Identify the positions of the transcription start site and the transcription end in the human genome assembly hg38 (UCSC Genome Browser).
1.4 Get the sequences (+10 bp/-10 bp) around the intron-exon borders and the exon-intron borders of pyruvate carboxylase using the UCSC Table Browser and Notepad++.
1.5 Construct a sequence logo and a frequency plot for both cases using WebLogo. Can you identify (regulatory) sequence motifs?
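
As a cross-check for the manual search in task 1.2, the seed comparison can also be scripted. The following is a minimal Python sketch, not part of the exercise itself: it takes positions 2-8 of a microRNA, builds the reverse complement (the sequence that would pair with the seed, written as DNA), and reports every 1-based position where it occurs in a 3’UTR. The function names and both toy sequences are made up for illustration; substitute the real hsa-miR-182-5p sequence and the 3’UTR exported from BioMart.

```python
def reverse_complement(seq):
    """Return the reverse complement of an RNA or DNA sequence as DNA."""
    seq = seq.upper().replace("U", "T")
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def seed_match_positions(mirna, utr, start=2, end=8):
    """Find sites in `utr` that are complementary to miRNA positions start..end.

    Positions are 1-based and inclusive, as in the exercise text.
    Returns the searched target string and a list of 1-based match positions.
    """
    seed = mirna[start - 1:end]
    target = reverse_complement(seed)
    utr = utr.upper().replace("U", "T")
    positions = []
    index = utr.find(target)
    while index != -1:
        positions.append(index + 1)
        index = utr.find(target, index + 1)  # step by 1 to allow overlaps
    return target, positions


# Hypothetical toy sequences (NOT the real miRNA or the real PC 3'UTR).
toy_mirna = "UACGGAUCCAGUACGUAGCUAA"
toy_utr = "TTTGATCCGTAAACCCGATCCGTGG"
target, positions = seed_match_positions(toy_mirna, toy_utr)
print(target, positions)
```

The string printed as `target` is the same text one would paste into the Notepad++ search box.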

Part II – Protein function

2.1 What is the function of pyruvate carboxylase, and in which pathways and processes is this enzyme involved?
– Show human pathway maps and find Enzyme ID (EC number) using KEGG
– Identify functional domains and gene ontology annotation of the pyruvate carboxylase protein sequence using UniProt and InterPro.
2.2 Find orthologous protein sequences in Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae (OrthoDB), perform a multiple sequence alignment using ClustalW (Clustal Omega), and visualize it with Jalview.

Part III – Bioinformatics challenge

3.1 Seven transmembrane-spanning G protein-coupled receptors (GPCRs) comprise the largest known membrane protein family encoded by the human genome. Which of the intracellular amino acid sequences of these GPCRs (7m.txt) contain a protein kinase A (PKA) phosphorylation motif (RXXS/T)? What are the respective numbers and positions of these motifs?
Hints: One possible way is to use Notepad++ to transform the text into 2 columns, separating name and sequence by a tab (\t), and then to use the regular expression R..[ST] to find the motifs and replace each with a marker (e.g. &&&&). The matches can then be located in EXCEL with the formula C1=FINDEN("&&&&",$B1,1) (FINDEN is the name of the FIND function in German-language EXCEL). Further occurrences are found with e.g. D1=FINDEN("&&&&",$B1,C1+4), which starts the search 4 characters after the position of the previous match.
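
The same search can be sketched in Python as an alternative to the Notepad++/EXCEL route. This is a hedged illustration, not the intended solution. It assumes the input has already been brought into the two-column form described in the hints (name, tab, sequence), and the function names and toy records are invented for the example. The motif R..[ST] is wrapped in a lookahead so that overlapping motifs are also reported, whereas a plain find-and-replace handles only non-overlapping matches.

```python
import re

# R, any two residues, then S or T. The lookahead (?=...) lets matches overlap.
PKA_MOTIF = re.compile(r"(?=(R..[ST]))")


def find_pka_motifs(sequence):
    """Return a list of (1-based start position, matched motif) tuples."""
    return [(m.start() + 1, m.group(1))
            for m in PKA_MOTIF.finditer(sequence.upper())]


def scan_table(lines):
    """Scan tab-separated 'name<TAB>sequence' lines (assumed layout).

    Returns a dict mapping each name to its list of motif hits.
    """
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, sequence = line.split("\t", 1)
        results[name] = find_pka_motifs(sequence)
    return results


# Hypothetical toy records (NOT taken from 7m.txt).
toy_lines = ["rec1\tMKRRASSLQ", "rec2\tMGGAVLP"]
for name, hits in scan_table(toy_lines).items():
    print(name, len(hits), hits)
```

For the real data, the lines of the reformatted file could be passed in with `scan_table(open("7m.txt"))`, assuming the file then has one tab-separated record per line.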