Exercise 2 - Task 2
Transcriptional regulation / Transcription factor binding sites
2.2.1 Find potential TFBS in promoter regions of genes
-
Use the UCSC genome browser to retrieve promoter sequences (-700 ... +300 relative to the TSS) for the human genes in the list below.
Then use the PFM for the upregulating response elements from the previous task to search these sequences for potential TFBS
(transcription factor binding sites).
- Gene list
- Use FIMO from the MEME Suite
FIMO can use the PFM as produced by the sequence logo program.
-
How many p53 target genes could you identify (generate a list of these genes)?
-
For the most significant target, find a publication that confirms the regulation by p53.
- Hints
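When FIMO is run from the command line rather than through the web form, it reads motifs in MEME motif format, so a count matrix may need converting first. The following is a minimal Python sketch of such a conversion, assuming the PFM is available as per-base count lists. The function name `pfm_to_meme`, the motif name `p53_up` and the example counts are made-up placeholders (not the matrix from the previous task), and a uniform background is assumed.

```python
def pfm_to_meme(name, counts):
    """Convert a position frequency matrix to minimal MEME motif text.

    counts: dict mapping 'A', 'C', 'G', 'T' to equally long lists of
    per-position counts (hypothetical input layout for this sketch).
    """
    alphabet = "ACGT"
    width = len(counts["A"])
    if any(len(counts[base]) != width for base in alphabet):
        raise ValueError("all four rows must have the same length")
    # Number of aligned sites, taken from the first column.
    nsites = sum(counts[base][0] for base in alphabet)
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # assumption: uniform background
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {width} nsites= {nsites} E= 0",
    ]
    for i in range(width):
        column_total = sum(counts[base][i] for base in alphabet)
        # Each output row is one motif position with probabilities for A C G T.
        probs = [counts[base][i] / column_total for base in alphabet]
        lines.append(" ".join(f"{p:.6f}" for p in probs))
    return "\n".join(lines) + "\n"


# Made-up two-position example, only to show the call.
example_counts = {"A": [8, 0], "C": [0, 2], "G": [2, 0], "T": [0, 8]}
meme_text = pfm_to_meme("p53_up", example_counts)
```

The resulting text could be saved as, for example, `p53_up.meme` and scanned against the promoter FASTA file with a command along the lines of `fimo --thresh 1e-4 p53_up.meme promoters.fa` (both file names are hypothetical).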
2.2.2 Map peaks from ChIPseq data to the human genome
-
Map the p53 peak regions from a ChIPseq result file (BED file format, genome assembly hg19/GRCh37) to the human genome using the UCSC genome browser.
ChIPseq peak-file: p53_macs_peaks.bed
- Make sure to select the correct genome assembly version. (*)
- Add the data as a custom track (rename the track to “p53”)
- Display the region around the CDKN1A gene
- Show the mapped regions together with other known regulatory features
- CpG islands
- ENCODE Regulation
- TFBS conserved
- Where are the p53 peaks located with respect to the TSS (near/far/upstream/downstream)?
- Find all peaks that overlap with a CpG island.
- Find all peaks that overlap with a DNase cluster.
- Extract the peak sequence from the peak that overlaps with CDKN1A
- Find the exact TFBS in this peak with your p53 PFM (upregulated)
- Map the exact binding site back to the ChIPseq peak regions in the UCSC genome browser and display the result.
- Hints
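The overlap questions above (peaks versus CpG islands or DNase clusters) can be answered in the UCSC table browser, but the underlying interval test is simple enough to sketch. The following Python sketch assumes both tracks have been exported as BED text. The function names and the coordinates in the example are made up for illustration and are not taken from p53_macs_peaks.bed.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end, name) tuples.

    BED coordinates are 0-based and half-open: start is included, end is not.
    """
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and header lines
        fields = line.split()
        name = fields[3] if len(fields) > 3 else ""
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


def overlapping(peaks, features):
    """Return the peaks that share at least one base with any feature."""
    hits = []
    for chrom, start, end, name in peaks:
        for f_chrom, f_start, f_end, _ in features:
            # Two half-open intervals overlap if each starts before the other ends.
            if f_chrom == chrom and start < f_end and f_start < end:
                hits.append((chrom, start, end, name))
                break
    return hits


# Made-up coordinates, only to show the call.
peaks = parse_bed("chr6\t100\t200\tpeak_1\nchr6\t500\t600\tpeak_2\n")
cpg_islands = parse_bed("chr6\t150\t300\tCpG_a\nchr1\t500\t600\tCpG_b\n")
peaks_on_cpg = overlapping(peaks, cpg_islands)
```

This brute-force comparison is quadratic in the number of intervals, which should be acceptable for a few thousand peaks. For larger data sets, a dedicated interval tool would be the better choice.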
2.2.3 Predict a TFBS motif from ChIPseq data
-
Predict a TFBS motif using peak sequences from ChIPseq data
ChIPseq peak-file: p53_macs_peaks.bed
- Find the top 300 ChIPseq peaks according to their -log10(p-value) in the 5th column:
Import p53_macs_peaks.bed into Excel, sort by the 5th column in descending order, and save the first 300 lines as a new BED file.
- Extract the peak sequences from the top 300 ChIPseq peaks using the UCSC table browser.
- Discover a TFBS motif with MEME-ChIP using the sequences from the top 300 peaks
- How many sites contained the discovered motif?
- Compare the motif (logo) obtained from the ChIPseq data with the motif(s) you generated in task 2.1 and the motifs from the JASPAR database.
- Discuss the differences (if any). Which are the most similar motifs that can be found in the databases?
Hints
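As an alternative to Excel, the top-300 selection can be sketched in a few lines of Python. This assumes the peak file is tab-separated with the -log10(p-value) in the 5th column, as stated in the task. The function name `top_peaks` and the rows in the example are made up for illustration.

```python
def top_peaks(bed_text, n=300, score_col=4):
    """Return the n BED lines with the highest score, best first.

    score_col is the 0-based index of the score column (4 = 5th column).
    """
    rows = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and header lines
        rows.append(line.split("\t"))
    # Sort numerically; sorting the scores as text would misorder them.
    rows.sort(key=lambda fields: float(fields[score_col]), reverse=True)
    return ["\t".join(fields) for fields in rows[:n]]


# Made-up rows, only to show the call.
example_bed = (
    "chr1\t10\t20\tp1\t5.5\n"
    "chr1\t30\t40\tp2\t120.1\n"
    "chr2\t5\t9\tp3\t33.0\n"
)
best_two = top_peaks(example_bed, n=2)
```

For the real file, the text would be read from p53_macs_peaks.bed and the returned lines written to a new BED file, which can then be uploaded to the UCSC table browser.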
Links:
Notes: (*) You may also convert the peak coordinates to the newest genome assembly using liftOver.
If you use liftOver, you need to re-add the score values in the 5th BED-file column.
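One way to re-add the scores is to join them back onto the lifted peaks by peak name. The Python sketch below assumes that the 4th BED column holds a unique peak name and that this name survives the liftOver step. Both are assumptions that should be checked against the actual files, and the function name `restore_scores` and the example rows are made up.

```python
def restore_scores(original_bed, lifted_bed):
    """Attach the 5th-column scores of original_bed to the lifted peaks.

    Peaks are matched by the name in the 4th column. Lifted peaks without
    a matching name in the original file are dropped.
    """
    scores = {}
    for line in original_bed.splitlines():
        fields = line.split("\t")
        if len(fields) >= 5:
            scores[fields[3]] = fields[4]

    restored = []
    for line in lifted_bed.splitlines():
        fields = line.split("\t")
        if len(fields) < 4 or fields[3] not in scores:
            continue
        # Keep the lifted coordinates and name, then append the old score.
        restored.append("\t".join(fields[:4] + [scores[fields[3]]]))
    return restored


# Made-up rows, only to show the call.
original = "chr1\t100\t200\tpeak_1\t55.2\nchr1\t900\t950\tpeak_2\t12.0\n"
lifted = "chr1\t150\t250\tpeak_1\nchr1\t960\t1010\tpeak_2\n"
with_scores = restore_scores(original, lifted)
```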