Genome browsers
Annotation sources
Databases
Bioinformatics tools
Coding regions
SIFT
Predicts whether an amino acid substitution affects protein function, based on sequence homology with related proteins and the physical properties of the amino acids involved.
PolyPhen-2
Infers the impact of an amino acid substitution on protein structure and function, using physical and comparative (evolutionary) considerations.
MutationTaster
Uses a Bayes classifier to predict the disease potential of a change at the nucleotide or amino acid level. This includes evaluation of changes at intron borders and of synonymous substitutions.
MutationAssessor
Evaluates the functional impact of an amino acid change in cancer, using evolutionary conservation at the affected site in homologous sequences.
FATHMM
Evaluates the functional effect of missense mutations using hidden Markov models built from alignments of conserved protein sequences, combined with pathogenicity weights.
Introns
dbscSNV RF and Ada
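Both scores evaluate the splice site consensus regions (-3 to +8 at the 5' splice site and -12 to +2 at the 3' splice site) to determine the impact of a nucleotide change on splicing. Both are ensemble scores: RF is computed with a random forest, and Ada with adaptive boosting (AdaBoost), to give the probability of impact.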
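The consensus windows quoted above can be written down as a small lookup. The sketch below is illustrative only and is not part of dbscSNV: the function name `in_splice_consensus` and the signed-offset convention (no position 0; negative offsets are exonic at the 5' site and intronic at the 3' site) are assumptions made for this example.

```python
# Illustrative sketch: check whether a position falls inside the splice
# consensus windows quoted in the text. Offsets are signed distances from
# the exon/intron boundary; by convention there is no position 0.
SPLICE_WINDOWS = {
    "donor": (-3, 8),      # 5' splice site: last 3 exonic to first 8 intronic bases
    "acceptor": (-12, 2),  # 3' splice site: last 12 intronic to first 2 exonic bases
}


def in_splice_consensus(offset: int, site: str) -> bool:
    """Return True if `offset` lies inside the consensus window for `site`."""
    if offset == 0:
        raise ValueError("offset 0 is undefined; positions run ..., -1, +1, ...")
    if site not in SPLICE_WINDOWS:
        raise ValueError(f"unknown splice site type: {site!r}")
    low, high = SPLICE_WINDOWS[site]
    return low <= offset <= high


if __name__ == "__main__":
    for site, offset in [("donor", 5), ("donor", 9), ("acceptor", -12), ("acceptor", 3)]:
        print(site, offset, in_splice_consensus(offset, site))
```

Only variants for which this check is true would be expected to carry a dbscSNV score; positions outside the windows are simply not covered by the database.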
Conservation
GERP
Identifies constrained (conserved) elements by measuring the selective pressure on a site to remain unchanged across species.
Functional Whole Genome
GenoCanyon
Unsupervised statistical learning method that infers the functional potential of each base in the genome.
fitCons
Integrates functional assay data with estimates of selective pressure to score the fitness consequences of mutations at positions that share the same functional genomic pattern.
Other annotation databases
Table Name | Explanation | Date |
1000g2015aug (6 data sets) | The 1000G team fixed a bug in chrX frequency calculation. Based on 201508 collection v5b (based on 201305 alignment) | 20150824 |
abraom | 2.3 million Brazilian genomic variants | 20181204 |
avsnp150 | dbSNP150 with allelic splitting and left-normalization | 20170929 |
cadd13 | CADD version 1.3 | 20170123 |
cadd13gt10 | CADD version 1.3 score>10 | 20170123 |
cadd13gt20 | CADD version 1.3 score>20 | 20170123 |
cg46 | alternative allele frequency in 46 unrelated human subjects sequenced by Complete Genomics | 20120222 |
cg69 | allele frequency in 69 human subjects sequenced by Complete Genomics | 20120222 |
clinvar_20221231 | ClinVar version 20221231 with separate columns (CLNALLELEID CLNDN CLNDISDB CLNREVSTAT CLNSIG) | 20230105 |
cosmic68wgs | COSMIC database version 68 on WGS data | 20140224 |
dbnsfp42a | reformatted to include more columns than dbnsfp41a | 20210710 |
dbscsnv11 | dbscSNV version 1.1 for splice site prediction by AdaBoost and Random Forest | 20151218 |
eigen | whole-genome Eigen scores, see ref | 20160330 |
ensGene | FASTA sequences for all annotated transcripts in Gencode v43 Basic collection lifted over to hg19 (last update was 2023-02-15 at UCSC) | 20230315 |
esp6500siv2_all | alternative allele frequency in all subjects in the NHLBI-ESP project with 6500 exomes, including the indel calls and the chrY calls. Lifted over from hg19. | 20141222 |
exac03 | ExAC 65000 exome allele frequency data for ALL, AFR (African), AMR (Admixed American), EAS (East Asian), FIN (Finnish), NFE (Non-Finnish European), OTH (other), SAS (South Asian). Version 0.3. Left normalization done. | 20151129 |
exac03nonpsych | ExAC on non-Psychiatric disease samples (updated header) | 20160423 |
exac03nontcga | ExAC on non-TCGA samples (updated header) | 20160423 |
fathmm | whole-genome FATHMM_coding and FATHMM_noncoding scores (noncoding and coding scores in the 2015 version were reversed) | 20160315 |
gene4denovo201907 | gene4denovo database | 20191101 |
gerp++elem | conserved genomic regions by GERP++ | 20140223 |
gerp++gt2 | whole-genome GERP++ scores greater than 2 (an RS score threshold of 2 provides high sensitivity while still strongly enriching for truly constrained sites) | 20120621 |
gme | Greater Middle East allele frequency including NWA (northwest Africa), NEA (northeast Africa), AP (Arabian peninsula), Israel, SD (Syrian desert), TP (Turkish peninsula) and CA (Central Asia) | 20161024 |
gnomad211_exome | gnomAD exome collection (v2.1.1), with “AF AF_popmax AF_male AF_female AF_raw AF_afr AF_sas AF_amr AF_eas AF_nfe AF_fin AF_asj AF_oth non_topmed_AF_popmax non_neuro_AF_popmax non_cancer_AF_popmax controls_AF_popmax” header | 20190318 |
gnomad312_genome | version 3.1.2 whole-genome data | 20221228 |
gwava | whole-genome GWAVA_region_score and GWAVA_tss_score (GWAVA_unmatched_score has a bug in the file), see ref. | 20150623 |
hrcr1 | 40 million variants from 32K samples in the Haplotype Reference Consortium | 20151203 |
icgc28 | International Cancer Genome Consortium version 28 | 20210122 |
intervar_20180118 | InterVar: clinical interpretation of missense variants (indels not supported) | 20180325 |
kaviar_20150923 | 170 million Known VARiants from 13K genomes and 64K exomes in 34 projects | 20151203 |
knownGene | FASTA sequences for all annotated transcripts in UCSC Known Gene (last update was 2009-05-10 at UCSC) | 20211019 |
ljb26_all | whole-exome SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, MetaSVM, MetaLR, VEST, CADD, GERP++, PhyloP and SiPhy scores from dbNSFP version 2.6 | 20140925 |
mcap13 | M-CAP scores v1.3 | 20181203 |
mitimpact2 | pathogenicity predictions of human mitochondrial missense variants | 20150520 |
nci60 | NCI-60 human tumor cell line panel exome sequencing allele frequency data | 20130724 |
popfreq_all_20150413 | A database containing all allele frequency from 1000G, ESP6500, ExAC and CG46 | 20150413 |
popfreq_max_20150413 | A database containing the maximum allele frequency from 1000G, ESP6500, ExAC and CG46 | 20150413 |
refGene | FASTA sequences for all annotated transcripts in RefSeq Gene (last update was 2020-08-22 at UCSC) | 20211019 |
refGeneWithVer | FASTA sequences for all annotated transcripts in RefSeq Gene with version number (last update was 2020-08-22 at UCSC) | 20211019 |
regsnpintron | prioritize the disease-causing probability of intronic SNVs | 20180920 |
regsnpintron | liftOver of the entry above | 20180922 |
revel | REVEL scores for non-synonymous variants | 20161205 |
snp138 | dbSNP138, lifted over to hg18 | 20140910 |
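The relationship between popfreq_all_20150413 and popfreq_max_20150413 in the table above can be illustrated with a short sketch: the second keeps, for each variant, the largest of the frequencies stored in the first. The code below is a hypothetical illustration, not the procedure used to build these tables. The function names, the 1% rarity threshold, the handling of missing values, and the example frequencies are all assumptions made up for this example.

```python
from typing import Mapping, Optional


def max_allele_frequency(freqs: Mapping[str, Optional[float]]) -> Optional[float]:
    """Return the largest observed frequency, or None if no source reports one."""
    # Assumption: a missing value (None) means the variant is absent from that source.
    observed = [f for f in freqs.values() if f is not None]
    return max(observed) if observed else None


def is_rare(freqs: Mapping[str, Optional[float]], threshold: float = 0.01) -> bool:
    """Treat a variant as rare if its maximum frequency is below `threshold`."""
    # Assumption: variants absent from every source are considered rare.
    maximum = max_allele_frequency(freqs)
    return maximum is None or maximum < threshold


if __name__ == "__main__":
    # Made-up frequencies for a single variant, keyed by source database.
    example = {"1000G": 0.012, "ESP6500": None, "ExAC": 0.020, "CG46": 0.0}
    print(max_allele_frequency(example))
    print(is_rare(example))
```

Filtering on the maximum rather than on a single population is the more conservative choice: a variant is kept as rare only if it is rare in every source that reports it.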