Bioinformatic Random Seed

Keep up with what's happening in Bioinformatics and Machine Learning (^ω^)


Reference Genome

Last update: 6/8/2021
By: Huitian (Yolanda) Diao


A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database that scientists assemble as a representative example of the set of genes in one idealized individual organism of a species. Because reference genomes are assembled from DNA sequenced from a number of individual donors, they do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of DNA sequences from the different donors. Reference genomes exist for many species of viruses, bacteria, fungi, plants, and animals [1].
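Reference genomes are commonly distributed as FASTA files. In these files each sequence (typically a chromosome or scaffold) begins with a '>' header line, followed by one or more lines of bases. The sketch below loads such a file into a dictionary keyed by sequence name, which makes the "sequence database" idea concrete. It is a minimal illustration, not a replacement for a tested parser such as Biopython's `SeqIO`. The toy records and the choice to upper-case all bases are assumptions for this example.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence name: sequence}.

    Minimal sketch (assumptions): the name is the first whitespace-delimited
    token after '>', a sequence may span several lines, and bases are
    upper-cased, so soft-masked (lowercase) repeats are not preserved.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical toy reference with two sequences.
toy = """>chr1 toy example
ACGTNNacgt
GGCC
>chrM
TTAA
""".splitlines()

ref = parse_fasta(toy)
print(ref)  # {'chr1': 'ACGTNNACGTGGCC', 'chrM': 'TTAA'}
```

For a full-size reference such as the roughly 3 Gb human genome, indexed random access (for example with a `.fai` index built by `samtools faidx`) is usually preferable to holding every chromosome in memory.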

Genome versus Transcriptome

Genome = all the DNA of an organism
Transcriptome = all the RNA transcribed from the genome in a given cell, tissue, or condition
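The relationship between the two can be sketched in code. Each transcript is read from a transcribed region of the genome, taken from the appropriate strand, and written in RNA letters (U instead of T). The example below assumes single-exon toy transcripts given as hypothetical `(id, chrom, start, end, strand)` tuples with 0-based, half-open coordinates. It ignores splicing and all other RNA processing, so it is only an illustration of "transcriptome derived from genome", not a real annotation pipeline.

```python
from typing import Dict, List, Tuple

# Base-pairing complement used to read the minus strand.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def transcribe(genome: Dict[str, str],
               transcripts: List[Tuple[str, str, int, int, str]]) -> Dict[str, str]:
    """Return {transcript id: RNA sequence} for single-exon toy transcripts.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    Minus-strand transcripts are reverse-complemented before T -> U.
    """
    rna = {}
    for tid, chrom, start, end, strand in transcripts:
        dna = genome[chrom][start:end]
        if strand == "-":
            # The RNA matches the reverse complement of the plus-strand reference.
            dna = dna.translate(COMPLEMENT)[::-1]
        rna[tid] = dna.replace("T", "U")
    return rna


# Hypothetical toy genome and two hypothetical transcripts.
genome = {"chr1": "AAATGCCCGGGTTTAAA"}
transcripts = [("tx1", "chr1", 2, 8, "+"), ("tx2", "chr1", 8, 14, "-")]
print(transcribe(genome, transcripts))  # {'tx1': 'AUGCCC', 'tx2': 'AAACCC'}
```

Only the transcribed intervals appear in the output, which mirrors why a transcriptome is much smaller than the genome it comes from.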

Differences between Reference Genomes: GENCODE vs. Ensembl

GENCODE uses the UCSC convention of prefixing chromosome names with “chr” (e.g. “chr1” and “chrM”), whereas Ensembl calls these “1” and “MT”. At the time of writing (Ensembl 89), a few transcripts differ because of conversion issues. In addition, around 160 genes in the pseudoautosomal regions (PARs) are annotated twice in GENCODE, once on chrX and once on chrY, but only once in Ensembl. Together these differences affect fewer than 1% of the transcripts. Apart from the gene annotation itself, the links to external databases also differ [2].
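The chromosome-naming difference is the one that most often breaks pipelines, for example when a GENCODE GTF is combined with an Ensembl FASTA. Below is a minimal sketch of converting between the two conventions described above. It handles only the two rules stated in the text (the "chr" prefix and chrM/MT). The function names are hypothetical. Unplaced scaffolds and alternate contigs use different names in the two systems and would need a proper alias table, which this sketch does not attempt.

```python
from typing import Callable


def ucsc_to_ensembl(name: str) -> str:
    """Convert a UCSC/GENCODE chromosome name to Ensembl style (primary chromosomes only)."""
    if name == "chrM":
        return "MT"  # mitochondrial genome is named differently
    if name.startswith("chr"):
        return name[3:]  # drop the "chr" prefix
    return name  # already Ensembl style


def ensembl_to_ucsc(name: str) -> str:
    """Convert an Ensembl chromosome name to UCSC/GENCODE style (primary chromosomes only)."""
    if name == "MT":
        return "chrM"
    if name.startswith("chr"):
        return name  # already UCSC style
    return "chr" + name


def rename_first_column(line: str, convert: Callable[[str], str]) -> str:
    """Rename the chromosome (first tab-separated field) of a GTF/BED-like line."""
    if line.startswith("#"):
        return line  # leave header/comment lines untouched
    fields = line.split("\t")
    fields[0] = convert(fields[0])
    return "\t".join(fields)


print(ucsc_to_ensembl("chr1"), ucsc_to_ensembl("chrM"))  # 1 MT
print(ensembl_to_ucsc("X"), ensembl_to_ucsc("MT"))       # chrX chrM
```

Converting the annotation to match the FASTA (rather than the other way round) is usually the simpler choice, because the annotation file is text and can be rewritten line by line.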

Genomic Orientation

[Figure: Genomic orientation]

References

[1] Wikipedia - Reference Genome.
[2] UCSC Genome FAQ.