Genomic tools and databases

Genome browsers

IGV

Online browser

Annotation sources

UCSC annotations

Databases

OMIM

ClinVar

VarSome

ClinGen

Bioinformatics tools

Coding regions

SIFT  

Predicts whether an amino acid substitution affects protein function. The prediction is based on sequence homology with related proteins and on the physical properties of the amino acids involved.
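SIFT reports a score between 0 and 1, where lower values indicate a less tolerated substitution. A minimal sketch of how such a score is usually turned into a label follows. The function name is hypothetical, and the 0.05 cutoff is the conventional default, assumed here rather than taken from this page.

```python
def classify_sift(score: float, cutoff: float = 0.05) -> str:
    """Label a SIFT score; lower scores mean the substitution is less tolerated."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    # Assumed convention: scores below the cutoff are called deleterious.
    return "deleterious" if score < cutoff else "tolerated"


if __name__ == "__main__":
    for value in (0.01, 0.32):
        print(value, classify_sift(value))
```

The cutoff is a parameter so that a stricter or looser threshold can be tried without changing the function.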

PolyPhen-2

Predicts the impact of an amino acid substitution on protein structure and function using physical (structural) and comparative (evolutionary) considerations.

MutationTaster

Uses a Bayesian classifier to predict the disease-causing potential of a change at the nucleotide or amino acid level. This includes variants at intron borders and synonymous substitutions.

MutationAssessor

Evaluates the functional impact of an amino acid change, with a focus on cancer, using the evolutionary conservation of the affected residue in homologous proteins.

FATHMM      

Evaluates the effect of missense mutations using hidden Markov models built from alignments of conserved protein sequences, combined with pathogenicity weights.

Introns

dbscSNV RF and Ada

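Both scores evaluate the splicing consensus regions (-3 to +8 at the 5' splice site and -12 to +2 at the 3' splice site) to determine the impact of a nucleotide change on splicing. The RF score comes from a random forest and the Ada score from an AdaBoost ensemble; each gives the probability that the variant alters splicing.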
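The two windows can be expressed as a simple range check. This is a sketch under stated assumptions: the -3 to +8 window is taken to belong to the donor (5') site and the -12 to +2 window to the acceptor (3') site, offsets follow the usual convention with no position 0, and the names `SPLICE_WINDOWS` and `in_splice_consensus` are hypothetical.

```python
# Consensus windows as (first, last) offsets relative to the splice junction.
# Assumed mapping: donor = 5' splice site, acceptor = 3' splice site.
SPLICE_WINDOWS = {"donor": (-3, 8), "acceptor": (-12, 2)}


def in_splice_consensus(offset: int, site: str) -> bool:
    """Return True if an offset falls inside the consensus window of a site."""
    if offset == 0:
        # Positions jump from -1 to +1, so 0 is not a valid offset.
        raise ValueError("offset 0 is undefined")
    if site not in SPLICE_WINDOWS:
        raise ValueError(f"unknown site: {site!r}")
    first, last = SPLICE_WINDOWS[site]
    return first <= offset <= last


if __name__ == "__main__":
    print(in_splice_consensus(5, "donor"))      # inside the donor window
    print(in_splice_consensus(3, "acceptor"))   # outside the acceptor window
```

Only variants for which this check is true would be expected to carry a score in these windows.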

Conservation

GERP  

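Identifies constrained elements in multiple sequence alignments by quantifying the deficit of substitutions at each position, which reflects selective pressure to remain unchanged.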
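A minimal sketch of how such a constraint score can be read, assuming the GERP++ definition of the rejected-substitution (RS) score as the expected number of substitutions under neutrality minus the number observed. The function names are hypothetical, and the default cutoff of 2 mirrors the gerp++gt2 entry in the annotation table.

```python
def rejected_substitutions(expected: float, observed: float) -> float:
    """RS score: substitutions expected under neutrality minus those observed."""
    return expected - observed


def is_constrained(rs: float, threshold: float = 2.0) -> bool:
    """Flag a position as constrained when its RS score exceeds the threshold."""
    return rs > threshold


if __name__ == "__main__":
    rs = rejected_substitutions(expected=5.0, observed=1.5)
    print(rs, is_constrained(rs))
```

A positive RS score means fewer substitutions than expected, which is the signal of constraint; a score near zero or negative suggests neutral evolution.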

Functional Whole Genome 

GenoCanyon  

Unsupervised statistical learning method that infers the functional potential of each base in the genome.

fitCons     

Integrates functional genomic assays with patterns of selective pressure to estimate the fitness consequence of mutations at positions that share the same functional genomic pattern.

Other annotation databases

Table Name | Explanation | Date
1000g2015aug (6 data sets) | The 1000G team fixed a bug in chrX frequency calculation. Based on 201508 collection v5b (based on 201305 alignment) | 20150824
abraom | 2.3 million Brazilian genomic variants | 20181204
avsnp150 | dbSNP150 with allelic splitting and left-normalization | 20170929
cadd13 | CADD version 1.3 | 20170123
cadd13gt10 | CADD version 1.3 score>10 | 20170123
cadd13gt20 | CADD version 1.3 score>20 | 20170123
cg46 | Alternative allele frequency in 46 unrelated human subjects sequenced by Complete Genomics | 20120222
cg69 | Allele frequency in 69 human subjects sequenced by Complete Genomics | 20120222
clinvar_20221231 | ClinVar version 20221231 with separate columns (CLNALLELEID CLNDN CLNDISDB CLNREVSTAT CLNSIG) | 20230105
cosmic68wgs | COSMIC database version 68 on WGS data | 20140224
dbnsfp42a | Reformatted to include more columns than dbnsfp41a | 20210710
dbscsnv11 | dbscSNV version 1.1 for splice site prediction by AdaBoost and Random Forest | 20151218
eigen | Whole-genome Eigen scores | 20160330
ensGene | FASTA sequences for all annotated transcripts in Gencode v43 Basic collection lifted over to hg19 (last update was 2023-02-15 at UCSC) | 20230315
esp6500siv2_all | Alternative allele frequency in all subjects in the NHLBI-ESP project with 6500 exomes, including the indel calls and the chrY calls. Lifted over from hg19 | 20141222
exac03 | ExAC 65000 exome allele frequency data for ALL, AFR (African), AMR (Admixed American), EAS (East Asian), FIN (Finnish), NFE (Non-Finnish European), OTH (other), SAS (South Asian). Version 0.3. Left normalization done | 20151129
exac03nonpsych | ExAC on non-psychiatric disease samples (updated header) | 20160423
exac03nontcga | ExAC on non-TCGA samples (updated header) | 20160423
fathmm | Whole-genome FATHMM_coding and FATHMM_noncoding scores (noncoding and coding scores in the 2015 version were reversed) | 20160315
gene4denovo201907 | gene4denovo database | 20191101
gerp++elem | Conserved genomic regions by GERP++ | 20140223
gerp++gt2 | Whole-genome GERP++ scores greater than 2 (an RS score threshold of 2 provides high sensitivity while still strongly enriching for truly constrained sites) | 20120621
gme | Greater Middle East allele frequency including NWA (northwest Africa), NEA (northeast Africa), AP (Arabian peninsula), Israel, SD (Syrian desert), TP (Turkish peninsula) and CA (Central Asia) | 20161024
gnomad211_exome | gnomAD exome collection (v2.1.1), with “AF AF_popmax AF_male AF_female AF_raw AF_afr AF_sas AF_amr AF_eas AF_nfe AF_fin AF_asj AF_oth non_topmed_AF_popmax non_neuro_AF_popmax non_cancer_AF_popmax controls_AF_popmax” header | 20190318
gnomad312_genome | Version 3.1.2 whole-genome data | 20221228
gwava | Whole-genome GWAVA_region_score and GWAVA_tss_score (GWAVA_unmatched_score has a bug in the file) | 20150623
hrcr1 | 40 million variants from 32K samples in the Haplotype Reference Consortium | 20151203
icgc28 | International Cancer Genome Consortium version 28 | 20210122
intervar_20180118 | InterVar: clinical interpretation of missense variants (indels not supported) | 20180325
kaviar_20150923 | 170 million Known VARiants from 13K genomes and 64K exomes in 34 projects | 20151203
knownGene | FASTA sequences for all annotated transcripts in UCSC Known Gene (last update was 2009-05-10 at UCSC) | 20211019
ljb26_all | Whole-exome SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, MetaSVM, MetaLR, VEST, CADD, GERP++, PhyloP and SiPhy scores from dbNSFP version 2.6 | 20140925
mcap13 | M-CAP scores v1.3 | 20181203
mitimpact2 | Pathogenicity predictions of human mitochondrial missense variants | 20150520
nci60 | NCI-60 human tumor cell line panel exome sequencing allele frequency data | 20130724
popfreq_all_20150413 | A database containing all allele frequencies from 1000G, ESP6500, ExAC and CG46 | 20150413
popfreq_max_20150413 | A database containing the maximum allele frequency from 1000G, ESP6500, ExAC and CG46 | 20150413
refGene | FASTA sequences for all annotated transcripts in RefSeq Gene (last update was 2020-08-22 at UCSC) | 20211019
refGeneWithVer | FASTA sequences for all annotated transcripts in RefSeq Gene with version number (last update was 2020-08-22 at UCSC) | 20211019
regsnpintron | liftOver of the regsnpintron entry below | 20180922
regsnpintron | Prioritizes the disease-causing probability of intronic SNVs | 20180920
revel | REVEL scores for non-synonymous variants | 20161205
snp138 | SNP138 lifted over to hg18 | 20140910
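
The popfreq_max entry describes a simple reduction: for each variant, take the highest allele frequency reported by any of the contributing sources. A minimal sketch of that idea follows. The function names, the treatment of a missing source as "no evidence of the allele", and the 1% rarity cutoff are all assumptions made for illustration, not part of the database itself.

```python
from typing import Mapping, Optional


def max_allele_frequency(freqs: Mapping[str, Optional[float]]) -> float:
    """Return the highest allele frequency across sources, ignoring missing values."""
    observed = [f for f in freqs.values() if f is not None]
    # Assumption: a variant absent from every source is treated as frequency 0.
    return max(observed, default=0.0)


def is_rare(freqs: Mapping[str, Optional[float]], cutoff: float = 0.01) -> bool:
    """Flag a variant as rare when its maximum frequency is below the cutoff."""
    return max_allele_frequency(freqs) < cutoff


if __name__ == "__main__":
    # Hypothetical frequencies for one variant; None marks a source without data.
    variant = {"1000G": 0.002, "ESP6500": None, "ExAC": 0.015, "CG46": None}
    print(max_allele_frequency(variant), is_rare(variant))
```

Filtering on the maximum rather than on a single source is the more conservative choice, because a variant that is common in any one population is not treated as rare.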