Database developments

August 2018, R19

  • This R19 release compiles data on over 29000 somatic mutations, 8000 variants reported in SNP databases, 1200 germline mutations related to Li-Fraumeni syndrome, 2700 cell-lines, 900 experimentally induced mutations, and functional data on over 4400 mutant proteins.
  • Variant descriptions are now provided on both hg19 and hg38 genome builds.
  • The dataset of germline mutations (rare disease-causing variants) has been updated with data published between January 2016 and June 2018, increasing its size by 30%.
  • The dataset on the functional impact of p53 mutant proteins has been updated with data from 12 studies, including one major study that analyzed the growth-suppression activity of over 9500 DNA-binding domain variants (see Kotler et al.).
  • Data on polymorphisms (variants that are frequent in the healthy human population) have been curated to include the most recent data from the dbSNP151, Flossie, gnomAD, 1000G and ESP6500 databases for the full TP53 gene sequence, including the 5'UTR and 3'UTR regions (hg38, chr17:7661725-7689853). Allele frequencies were retrieved from these databases to classify variants as "validated polymorphisms" (MAF > 0.001 in at least one of these databases).
  • New annotations on the predicted functional impact of variants have been added, based on REVEL, BayesDel and an optimized Align-GVGD algorithm (see Fortuno et al.).
  • Direct links to individual mutations in other databases (ClinVar, COSMIC, gnomAD) have been added.
  • The dataset of somatic mutations has not been updated, as most new data are captured in other databases (cBioPortal, COSMIC, TCGA and ICGC data portals). Instead, we have added somatic mutation counts from cBioPortal for each individual mutation.
  • The dataset of induced mutations has not been changed because no new data were found.
  • Data on TP53 status in cell-lines have not been updated.
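The "validated polymorphism" rule described above (MAF > 0.001 in at least one of the listed databases) can be illustrated with a minimal Python sketch. This is not the database's own curation code. The `Variant` record, the function names, the use of inclusive bounds for the hg38 region, and the choice to restrict the check to that region are all assumptions made for illustration; positions and frequencies in the usage lines are made up.

```python
from dataclasses import dataclass, field

# Threshold from the R19 notes: MAF strictly greater than 0.001
# in at least one source database.
MAF_THRESHOLD = 0.001

# TP53 region on hg38 as stated in the notes (inclusive bounds assumed here).
TP53_HG38 = ("chr17", 7661725, 7689853)


@dataclass
class Variant:
    # Hypothetical record layout, not the database's actual schema.
    chrom: str
    pos: int  # hg38 position
    maf_by_db: dict = field(default_factory=dict)  # e.g. {"gnomAD": 0.002}


def in_tp53_region(v: Variant) -> bool:
    """True if the variant falls inside the TP53 region given in the notes."""
    chrom, start, end = TP53_HG38
    return v.chrom == chrom and start <= v.pos <= end


def is_validated_polymorphism(v: Variant) -> bool:
    """Apply the 'MAF > 0.001 in at least one database' rule."""
    if not in_tp53_region(v):
        return False
    return any(maf > MAF_THRESHOLD for maf in v.maf_by_db.values())


# Illustrative (made-up) records.
common = Variant("chr17", 7676154, {"gnomAD": 0.60, "ESP6500": 0.0005})
rare = Variant("chr17", 7675088, {"gnomAD": 0.00001, "1000G": 0.0})
```

With these inputs, `common` qualifies because a single database exceeds the threshold, while `rare` does not. A frequency of exactly 0.001 would not qualify, since the rule uses a strict inequality.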
