Datasets and their annotations
geneVariationIARC TP53 Database,R20.txt
Column head |
Description |
MUT_ID |
Unique identifier of each gene variation reported in the database.
This identifier is used in all datasets (somatic, polymorphisms, germline). |
hg19_Chr17_coordinates |
Chromosome coordinate of mutation: start position based on hg19 human genome assembly. |
hg38_Chr17_coordinates |
Chromosome coordinate of mutation: start position based on hg38 human genome assembly. |
ExonIntron |
Location of the mutation in the introns or exons in TP53 gene for the reference sequence NM_000546.5.
Terms occurring in this column are "1-intron" to "11-intron" and "2-exon"
to "11-exon". An "i" or "e" prefix means that the mutation is located within the indicated intron
or exon with no information on the precise location.
|
Codon_number |
For mutations in exons, codon number
at which the mutation is located (1-393). If a mutation spans more
than one codon (e.g. tandem mutation or deletion of several
bases), only the first (5') codon is entered. For mutations in
introns, 0 is entered.
|
WT_nucleotide |
Base in the reference sequence at the position of the mutation on the coding sequence. |
Mutant_nucleotide |
Mutant base, described on the coding strand. |
Description |
Nucleotide change read from the coding sequence. For
deletions and insertions, the number of bases deleted (del) or
inserted (ins) is given. For more complex mutation events, a full
description is given as indicated in the original publication.
|
c_description |
Mutation nomenclature according to HGVS standards and using the NM_000546.5 coding sequence as reference. |
g_description |
Mutation nomenclature according to HGVS standards and using the GenBank NC_000017.10 (hg19 assembly) genomic sequence as reference. |
g_description_GRCh38 |
Mutation nomenclature according to HGVS standards and using the GenBank NC_000017.11 (hg38 assembly) genomic sequence as reference. |
Type |
Nature of the mutation. The terms
occurring in this column are "A:T>C:G" (A to C or T
to G base change), "A:T>G:C" (A to G or T to C base
change), "A:T>T:A" (A to T or T to A base change),
"G:C>A:T" (G to A or C to T base change at non CpG
sites), "G:C>A:T at CpG" (G to A or C to T base
change at CpG sites),
"G:C>C:G" (G to C or C to G base change),
"G:C>T:A" (G to T or C to A base change),
"tandem" (two consecutive base changes), "ins"
(insertion), "del" (deletion) and "complex"
(complex changes). |
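As an illustration, the six single-base substitution classes listed above can be derived from the reference and mutant bases. The sketch below is a hypothetical helper, not part of the database tooling: the name `substitution_type` and the `at_cpg` argument are assumptions, and the CpG status is taken as an input rather than computed from sequence context.

```python
# Complement map used to express a change on either strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def substitution_type(wt: str, mut: str, at_cpg: bool = False) -> str:
    """Return the Type label for a single-base substitution (hypothetical helper)."""
    wt, mut = wt.upper(), mut.upper()
    if wt not in COMPLEMENT or mut not in COMPLEMENT or wt == mut:
        raise ValueError("expected two different bases among A, C, G, T")
    # Each label starts with A or G, so a change read from T or C
    # is first converted to its complement.
    if wt in ("T", "C"):
        wt, mut = COMPLEMENT[wt], COMPLEMENT[mut]
    label = f"{wt}:{COMPLEMENT[wt]}>{mut}:{COMPLEMENT[mut]}"
    # Only G:C>A:T changes are split by CpG status in the column description.
    if label == "G:C>A:T" and at_cpg:
        label += " at CpG"
    return label
```

For example, a C to T change at a CpG site and a G to A change at a CpG site both map to "G:C>A:T at CpG".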
Splice_site |
Annotation on the position of the mutation within conserved nucleotides of p53 consensus, cryptic or alternative splice sites:
consensus SD or SA= the mutation is located at conserved dinucleotides involved in p53 consensus splice sites
(SD for splice donor site, SA for splice acceptor site) producing the full-length p53 protein (TA isoform);
cryptic SD or SA= the mutation is located at conserved dinucleotides involved in splice sites (gt or ag)
that have been observed experimentally in p53 sequences carrying mutated consensus splice sites;
alternative SD or SA= conserved dinucleotides involved in splice sites (gt or ag)
responsible for producing p53 isoforms beta and gamma;
alternative = mutated nucleotides are in the "cassette" sequence responsible for producing the p53 delta isoform;
no= the position is outside the above mentioned nucleotides.
Information on splice site can be found here.
|
CpG_site |
Yes or No indicates whether the position of the mutation falls within a CpG
site.
|
Context_coding |
Trinucleotide sequence context of variants. The 5' base and 3' base of the start position of the variant are indicated on the left and right respectively of the mutated base.
This context is provided on the coding strand of the gene sequence and is based on hg38 TP53 sequence.
|
Mut_rate |
Substitution rates were calculated for all single base substitutions in the coding sequence of p53 according to the
dinucleotide substitution rates derived from human-mouse aligned sequences of chromosomes 21 and 10
(Lunter and Hein 2004).
The mutation probabilities for a given single nucleotide substitution are calculated by averaging the dinucleotide substitution
rates at that position for the forward and reverse strands. |
WT_codon |
For mutations in exons, sequence of the codon in which the mutation occurred in NM_000546.5 transcript. |
Mutant_codon |
Base sequence of the mutated codon in NM_000546.5 transcript. |
WT_AA |
Wild-type amino acid encoded at the codon in which the mutation occurred (three-letter amino acid
abbreviation). Check AA letter code and
Genetic code |
Mutant_AA |
Mutated amino acid encoded at the
codon in which the mutation occurred (three-letter amino acid
abbreviation). The chain terminating mutations due to single base
substitutions are designated by "stop".
Check AA letter code and
Genetic code
|
ProtDescription |
Mutation description at the protein level as recommended by HGVS and using the Uniprot reference sequence P04637. |
Mut_rateAA |
Mutation rate of amino-acid substitution calculated by summing up the nucleotide substitution
rates. This value is only valid for amino-acid substitutions
resulting from single nucleotide substitutions.
|
Effect |
Effect of the mutation. The terms
occurring in this column are: missense (change of one amino-acid), nonsense (introduction of a stop codon), FS
(frameshift), silent (no change in the protein sequence),
splice (mutations located in the two first and two last conserved nucleotides of the introns and are thus predicted to alter splicing,
or mutations that have been shown to alter splicing experimentally), other (inframe deletions or insertions), intronic (mutations in introns outside splicing sites), NA (mutations upstream in 5' or 3' UTR).
|
Polymorphism |
Polymorphic status of the gene variation.
Validated : MAF > 0.001 in ESP6500, 1000G or gnomAD databases;
No : not reported or reported at MAF < 0.001 in ESP6500, 1000G, or gnomAD databases;
NA : not applicable.
|
SNPlink |
Link to NCBI SNP database. |
gnomADlink |
Link to gnomAD database. |
SourceDatabases |
SNP databases from which the variants have been extracted. |
PubMedlink |
PubMed ID of the publications that reported the polymorphic status of the variant. |
Domain function |
Function of the domain in which the mutated residue is located. |
Residue function |
Known function of the wild-type residue. When the function is not known but the structure
is known, the solvent accessibility (SA) of the residue is indicated by the terms buried, exposed or partially exposed
(SA calculated with Naccess and 1TSR (chain B) structure of p53:
<20 = buried, >=20 and <50 = partially exposed, >=50 = exposed).
|
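The solvent accessibility cutoffs quoted for this column can be written as a small lookup. This is a minimal sketch assuming the SA value is already available as a number; the function name `accessibility_class` is hypothetical.

```python
def accessibility_class(sa: float) -> str:
    """Map a solvent accessibility (SA) value to the term used in the column."""
    if sa < 20:
        return "buried"
    if sa < 50:
        return "partially exposed"
    return "exposed"
```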
Hotspot |
"Yes" indicates that the variant is located in a codon defined as a cancer hotspot by Chang (2017). |
Structural_motif |
2D and 3D motifs where the mutation is located according to structures described in
Cho et al. (1994) and May and May (1999). |
SA |
Solvent accessibility of the wild-type residue as calculated with
Naccess and the 1TSR (chain B) structure of p53.
|
AGVGD class |
Missense variant functional predictions by an optimized Align-GVGD tool.
Mutations classified as "C0" are considered tolerated while other classes are considered damaging.
Further details in Fortuno (2018).
|
BayesDel |
Missense variant functional predictions by BayesDel tool (Feng 2017) used without allele frequency.
Score >=0.16: damaging
Score <0.16: tolerated
Further details in Fortuno (2018).
|
REVEL |
Missense variant functional predictions by REVEL tool (Ioannidis 2016).
Score >=0.5: damaging
Score <0.5: tolerated
Further details in Fortuno (2018).
|
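The cutoffs given for the AGVGD class, BayesDel and REVEL columns can be applied as follows. This is only a sketch of the stated thresholds; the names `SCORE_CUTOFFS`, `score_prediction` and `agvgd_prediction` are hypothetical, and scores are assumed to be plain numbers.

```python
# Score cutoffs quoted in the column descriptions above.
SCORE_CUTOFFS = {"BayesDel": 0.16, "REVEL": 0.5}

def score_prediction(tool: str, score: float) -> str:
    """Map a BayesDel or REVEL score to 'damaging' or 'tolerated'."""
    return "damaging" if score >= SCORE_CUTOFFS[tool] else "tolerated"

def agvgd_prediction(agvgd_class: str) -> str:
    """Class 'C0' is tolerated; every other Align-GVGD class is damaging."""
    return "tolerated" if agvgd_class.strip().upper() == "C0" else "damaging"
```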
SIFT class |
Functional classification based on SIFT program using default settings.
Missense mutations are classified as "damaging" or "tolerated".
|
Polyphen2 |
Functional classification based on Polyphen2
HVAR annotations retrieved with Annovar software:
D: probably Damaging
P: Possibly damaging
B: Benign.
|
Transactivation |
Promoter-specific transcriptional activity measured in yeast functional assays and expressed as percent of wild-type activity.
Data from Kato (2003) |
TransactivationClass |
Functional classification based on the overall transcriptional activity (TA) on 8 different promoters as measured in yeast assays by Kato et al.
For each mutant, the median of the 8 promoter-specific activities (expressed as percent of the wild-type protein) is calculated and missense mutations are classified as "non-functional" if the median is <=20,
"partially functional" if the median is >20 and <=75, "functional" if the median is >75 and <=140, and "supertrans" if the median is >140. |
DNE_LOFclass |
Functional classification for loss of growth-suppression and dominant-negative activities based on Z-scores from Giacomelli et al., (2018) study:
DNE_LOF = p53WTNutlin3 Z-score >= 0.61 and Etoposide Z-score <= -0.21;
notDNE_notLOF = p53WTNutlin3 Z-score < 0.61 and Etoposide Z-score > -0.21;
notDNE_LOF if p53WTNutlin3 Z-score < 0.61 and Etoposide Z-score <= -0.21;
unclass = others
|
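The TransactivationClass and DNE_LOFclass rules described above are both threshold-based and can be sketched in a few lines. The function and argument names below are hypothetical; the inputs are assumed to be the 8 promoter-specific activities (percent of wild type) and the two Z-scores, already parsed as numbers.

```python
from statistics import median

def transactivation_class(promoter_activities):
    """Classify a missense mutant from its 8 promoter-specific activities."""
    if len(promoter_activities) != 8:
        raise ValueError("expected activities for 8 promoters")
    m = median(promoter_activities)
    if m <= 20:
        return "non-functional"
    if m <= 75:
        return "partially functional"
    if m <= 140:
        return "functional"
    return "supertrans"

def dne_lof_class(nutlin3_z, etoposide_z):
    """Classify a mutant from its p53WTNutlin3 and Etoposide Z-scores."""
    dne = nutlin3_z >= 0.61      # dominant-negative criterion
    lof = etoposide_z <= -0.21   # loss-of-function criterion
    if dne and lof:
        return "DNE_LOF"
    if not dne and not lof:
        return "notDNE_notLOF"
    if not dne and lof:
        return "notDNE_LOF"
    return "unclass"             # remaining combination
```

Note that a mutant meeting the dominant-negative criterion but not the loss-of-function criterion falls into "unclass" under these rules.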
DNE class |
Dominant-negative (DN) effect on transactivation by wild-type p53.
Classification established for mutants for which DN activity data are available on more than
2 p53-response elements. Data are based on WAF1 and RGC promoters in various studies
(these promoters were the most frequently used in different studies to assess DNE status), and on
two large systematic studies (Dearth et al., which includes 76 mutants;
Monti et al., which includes 104 mutants).
Mutants were classified as "Yes" if they had dominant-negative activity on both WAF1 and
RGC promoters, or on all promoters in the large studies, "Moderate" if they had dominant-negative
activity on some but not all promoters, and "No" if they had no dominant-negative
activity on both WAF1 and RGC promoters, or none of the promoters in the large studies.
|
Structure/Function class |
Functional predictions derived from a computer model that takes into account
the 3D structure of WT and mutant proteins and is trained on the transactivation dataset from
Kato et al. Mutations are classified as "functional" or "non-functional". More details
here.
|
EffectGroup3 |
Mutation classification based on protein 3D structure and mutation type.
This classification has been used to derive genotype-phenotype correlations
in sporadic breast cancers (Olivier et al., 2006).
0=silent+intron;
1=missense in DNA-binding loops(L2,H1,L3,L1,S2,S2',H2);
2=other missense;
3=inFrame del/ins;
4=FS+splice+nonsense.
|
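One possible reading of the EffectGroup3 codes in terms of the Effect column is sketched below. The mapping of Effect terms to groups ("silent" and "intronic" to 0, "other" to 3) is an assumption based on the two column descriptions, and the `in_dna_binding_motif` flag stands in for a check of the structural motif (L2, H1, L3, L1, S2, S2', H2) that is not implemented here. The function name is hypothetical.

```python
def effect_group3(effect: str, in_dna_binding_motif: bool = False):
    """Return the EffectGroup3 code (0-4), or None when no group applies."""
    if effect in ("silent", "intronic"):
        return 0
    if effect == "missense":
        # Group 1 is reserved for missense mutations in DNA-binding motifs.
        return 1 if in_dna_binding_motif else 2
    if effect == "other":  # in-frame deletions or insertions
        return 3
    if effect in ("FS", "splice", "nonsense"):
        return 4
    return None  # e.g. "NA"
```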
SwissProtLink |
SwissProt identification number with link to the variant page of the SwissProt database. |
Somatic_count |
Number of occurrences in the IARC somatic dataset (number of tumors reported to carry this somatic mutation). Total count is 29891 in R20.
|
Germline_count |
Number of occurrences in the IARC germline dataset (number of pedigree/individual carriers of this germline mutation). Total count is 1532 in R20.
|
CellLine_count |
Number of occurrences in the IARC cell-line dataset (number of cell-lines reported to carry this mutation). Total cell-line count is 2766 in R20.
|
COSMIClink |
Link to mutation ID in COSMIC database. |
CLINVARlink |
Link to CLINVAR database. |
TCGA_ICGC_GENIE_count |
Sum of mutation occurrences from TCGA (MC3), ICGC (v28) and GENIE (V5) datasets. Total count is 23570. |
Predicted effect on splicing:
Column head |
Description |
Site Type |
Indicates whether the predicted splice site is an acceptor site or a donor site. |
p53 Site |
Indicates whether the predicted splice site corresponds to a canonical p53 splice site. |
WT score |
Fit score of the predicted splice site for the non-mutated sequence (scores are specific to each prediction tool). |
MUT score |
Fit score of the predicted splice site for the mutated sequence (scores are specific to each prediction tool). |
Variation |
Predicted effect of the mutation on the predicted splice site. |
Source |
Prediction tool used. |
Predicted effect on p53 protein isoforms:
The predictions provided are based on whether the mutation falls within the specific isoform. For a description of p53 isoforms, see here.
Column head |
Description |
TAp53alpha |
Indicates whether the mutation falls within the canonical isoform coding for the full-length p53 protein. |
TAp53beta....deltap53alpha |
Indicates whether the mutation falls within the specified isoform. |
Somatic mutations found in human tumor samples
somaticMutationDataIARC TP53 Database, R20.txt
This dataset contains TP53 somatic mutations identified in human tumor samples (including metastasis and cell-lines).
It includes data on the type and position of mutations, detailed information on the tumor in which the mutations have been found,
and on various characteristics of the patients in which the tumor developed.
Each row in the downloaded tab-delimited text file represents a single mutation reported in a tumor sample with an arbitrarily assigned unique identification number.
A unique identification number is also attributed to the tumor sample and to the patient. Table content is as follows:
Column head |
Description |
The first set of columns describe the mutation. |
Mutation_ID |
Unique identification number for a Sample/Mutation association.
Tandem mutations (two adjacent base substitutions) are considered as one mutation event;
therefore tandem mutations have only one identification number and are a single record. |
MUT_ID....SwissProtLink |
see mutation annotations |
The second set of columns are assigned to the description of the organ site, tissue and type of
lesion in which the mutation has been identified. The descriptions given in the publication are translated into the standards
of the International Classification of Diseases for Oncology (ICD-O 3rd Edition, World Health Organization, Geneva, 2000)
and SNOMED.
For information on tumor classification, grading and staging, check out
ICD-O training at SEER,
Cancer
Information at NCI and Oncologychannel.com.
|
Sample_Name |
A sample name is assigned as follows: first 3 letters of the first author's name, year of publication (2 digit), followed by the ID number indicated in the publication.
The same name or number can occur several times as in some samples more than one mutation has been reported. |
Sample_ID |
Unique sample identification number. This number allows the automatic retrieval of samples with
multiple mutations. |
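Because each row is one Sample/Mutation association, samples with multiple mutations can be retrieved by grouping rows on Sample_ID. The sketch below reads tab-delimited text with the standard `csv` module; the function name and the toy rows are hypothetical, and only the Mutation_ID and Sample_ID column names are taken from this description.

```python
import csv
import io
from collections import defaultdict

def samples_with_multiple_mutations(handle):
    """Return {Sample_ID: [Mutation_ID, ...]} for samples with more than one row."""
    by_sample = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_sample[row["Sample_ID"]].append(row["Mutation_ID"])
    return {s: muts for s, muts in by_sample.items() if len(muts) > 1}

# Toy input standing in for the downloaded file (values are made up).
toy = "Mutation_ID\tSample_ID\n1\t10\n2\t10\n3\t11\n"
multi = samples_with_multiple_mutations(io.StringIO(toy))
```

With a real download, the same function could be called on an open file handle instead of the `io.StringIO` wrapper.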
Sample_source...TNM |
see sample annotations |
p53_IHC |
p53 immunostaining graded as 'positive', 'negative' or '+/-'. ND stands for not done. |
Add_Info |
Any relevant additional information is entered here. |
The third set of columns are assigned to the description of the patient origin and life-style.
They contain heterogeneous notes, usually comments emphasized by authors reporting the mutations.
It should be noted that this information is generally qualitative. No quantitative information on exposure of risk
factors is included in the database. This information does not presuppose that a formal, causal link has been established between
such factors and the mutation described. Moreover, for most exogenous risk factors, individual exposure has not been
monitored. This information is given solely to (i) permit the retrieval of mutations found in patients
belonging to defined groups or having specific risk factors, and (ii) facilitate access to the corresponding publications. For
detailed comparison between exposure groups, users are invited to perform their own analysis based on the information given in the
original publication.
|
Individual_ID |
Unique identification number for an individual included in the database. It is automatically assigned
by the database system.
|
Sex...Country |
see patient annotations |
TP53polymorphism |
Presence of a polymorphism in TP53 gene. |
Germline_mutation |
Germline mutation detected in any gene in the patient. |
Family_history |
Information on the presence or absence of cancers in the family of the patient. |
Tobacco |
Information on the smoking status of the patient. Terms occurring in this column are 'smoker' (with
qualitative amount in brackets), 'non-smoker', 'passive-smoker' and 'chewer'. |
Alcohol |
Information on the drinking status of the patient. Terms occurring in this column are 'drinker'
(with qualitative amount in brackets), and 'non-drinker'. |
Exposure |
Risk factors to which the patient has been exposed, such as aflatoxins, radon, thorotrast, etc. |
Infectious_agent |
Pathogen (virus or bacteria) detected in the patient. |
Ref_ID |
Unique identification number for the reference in which the mutation is described. |
PubMed |
PubMed reference number provided by NCBI. |
Exclude_analysis |
Studies that we recommend excluding from any analysis because of dubious quality.
Such studies are identified based on the following criteria: they report several samples with
multiple mutations, and/or a high proportion of rare variants or variants classified as functional.
|
somaticMutationReference-IARC TP53 Database, R20.txt
This file lists the publications in which the mutations are described
and gives the method used to detect the mutations. Each row (record) represents a
citation with an arbitrarily assigned unique identification number
(Ref_ID). See standardized annotations for the
description of the column content.
Prevalence of TP53 somatic mutations by tumor site
prevalenceSomaticIARC TP53 Database, R20.txt
This dataset contains information on the proportion of tumors that carry a somatic TP53
mutation extracted from publications contained in the Somatic dataset, and in additional publications that do not give a
detailed description of the mutations (many studies do not provide detailed information on each mutation detected
but rather report their results in the form of summary tables, preventing their inclusion in the somaticMutation dataset), or publications reporting negative results (no mutation found, thus not included in the somaticMutation dataset).
For each study, the total number of tumors or tissue samples analyzed and the number of these samples found to contain
a mutation are provided. The reference paper, method of mutation detection and country of origin of the patients are also indicated.
When the same research team published several papers that describe the same set of samples, data from the most recent or most complete paper are used.
Column head |
Description |
Prevalence_ID |
Unique entry identification number. |
Topography...Morpho_code |
see sample annotations |
Sample_analyzed |
Number of tumor samples analyzed for TP53 mutations. |
Sample_mutated |
Number of tumor samples with a mutation in TP53. |
Country...Development |
see patient annotations |
Comment |
Any relevant information. |
Ref_ID...PubMed |
see reference annotations |
Tissue_processing....exon 11 |
see method annotations |
Exclude_Analysis |
Studies that we recommend to exclude from any analysis because of data quality issues.
Studies are labeled as 'exclude' if: they report several samples with multiple mutations
in patients with no specific genetic background or exposure to mutagen;
they report more variants that are classified as functional or partially functional (based on TA class) than variants classified as non-functional;
mutations are not precisely described and can not be fully annotated in the database;
several mutations in the series are reported with errors (such as position and base that do not fit, or report of neutral polymorphisms as somatic mutations).
|
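A pooled mutation prevalence can be computed from the Sample_analyzed and Sample_mutated columns while skipping flagged studies. This is a minimal sketch: it assumes rows are dictionaries keyed by column name and that flagged studies carry the literal value 'exclude' in Exclude_Analysis; the function name is hypothetical.

```python
def pooled_prevalence(rows):
    """Percent of mutated samples across studies not flagged for exclusion."""
    kept = [
        r for r in rows
        if r.get("Exclude_Analysis", "").strip().lower() != "exclude"
    ]
    analyzed = sum(int(r["Sample_analyzed"]) for r in kept)
    mutated = sum(int(r["Sample_mutated"]) for r in kept)
    if analyzed == 0:
        raise ValueError("no samples left after exclusion")
    return 100.0 * mutated / analyzed

# Toy rows (values are made up for illustration).
rows = [
    {"Sample_analyzed": "100", "Sample_mutated": "50", "Exclude_Analysis": ""},
    {"Sample_analyzed": "100", "Sample_mutated": "10", "Exclude_Analysis": ""},
    {"Sample_analyzed": "10", "Sample_mutated": "10", "Exclude_Analysis": "exclude"},
]
```

Pooling raw counts weights each study by its size, which differs from averaging the per-study percentages.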
Prevalence of the R249S TP53 mutations in liver cancer
This dataset contains data on the prevalence of the c.747G>T (p.R249S) mutation in liver cancers.
It includes studies that have screened at least exon 7 of TP53 by sequencing, and studies that have searched for this specific mutation by RFLP.
The presence of this mutation in hepatocellular carcinomas has been linked to exposure to aflatoxins and HBV, and may thus constitute a
biomarker of exposure. This dataset was released with the R15 version of the database and has not been updated since then.
The file is a tab-delimited text file that contains the following information:
Column head |
Description |
Ref_ID...PubMed |
see reference annotations. |
Country |
see patient annotations |
Sample_analyzed |
Number of tumor samples analyzed for the c.747G>T (p.R249S) TP53 mutation. |
Count_R249S |
Number of tumor samples containing the c.747G>T (p.R249S) TP53 mutation. |
Remark |
Any relevant information. |
Method |
Comment on method if different from sequencing. |
Prognostic value of TP53 somatic mutations
prognosisSomatic-IARC TP53 Database,R20
This dataset includes information on all studies that have analyzed the
relationship between p53 mutations and cancer prognosis. For each study,
the patient cohort, study settings and a summary of the results are
described. When the same research team published several papers with
increasing number of patients, the most recent paper with the largest
dataset is used.
Many of these studies do not provide detailed information on each
mutation detected but rather report their results in the form of summary
tables. Such publications have been included in the prognosis dataset but not in the somaticMutation dataset.
For some of them, the mutations have been published in a previous paper and can be retrieved with the
Cross_Ref_ID study identifier (see below).
The downloaded file contains the following information:
Column head |
Description |
Prognosis_ID |
Unique entry identification number. |
Topography |
see sample annotations |
Morphology |
see sample annotations |
Population |
see patient annotations |
Country |
see patient annotations |
Institution |
Name of the hospital(s) where the patients have been recruited. |
Period |
Time period (year) during which the patients have been recruited. |
Inclusion criteria |
ICD-O
(3rd edition) or SNOMED code for morphology |
Treatment |
Treatment protocol used for most of the patients.
SU, surgery; CX, chemotherapy; RX, radiotherapy; pre-op, pre-operative; CP, cyclophosphamide; CISP, cisplatin; doxo, doxorubicin; 5-FU, 5-fluorouracil
|
Median FU |
Median follow-up time of the patients in months. |
Range FU |
Range of follow-up time of the patients in months. |
Cohort |
Number of patients/tumors analyzed for TP53 mutations. |
p53 mutations |
Number of patients/tumors with a mutation in TP53. |
Percent mutated |
Proportion of mutated tumors (%). |
Parameter_analyzed |
Clinical parameter analyzed (patient survival and/or tumor response to treatment). |
Association |
Summary result: association with the presence of a TP53 mutation. |
Result |
Main findings. |
Ref_ID...PubMed |
see
annotations. |
Exclude_analysis |
Papers that we recommend to exclude from analysis because of dubious data quality (report several samples with multiple mutations, and/or a high proportion of rare variants or variants classified as neutral or functional). |
Germline mutations in LFS/LFL families
germlineMutationData-IARC TP53 Database,R20
Inherited TP53 mutations are associated with a rare autosomal dominant disorder,
the Li-Fraumeni syndrome (LFS).
This dataset contains information on individuals that are carriers of a TP53 germline mutation and families in which at least one family member
has been identified as a carrier of a germline mutation in the TP53 gene. Criteria for inclusion are the following:
a) individuals carrying a sequenced TP53 germline mutation, affected or not by a cancer, b) individuals affected by a cancer and
belonging to a family in which at least one family member has been identified as a carrier of a germline mutation in the TP53 gene.
Each row (record) in the downloaded file represents a tumor found in an individual having a TP53
germline mutation. The file contains the following information:
Column head |
Description |
Family_ID |
Unique family identification number. |
Family_Code |
Name or number given in the original
publication or an arbitrarily-assigned name, usually the 3 first
letters of the first author's name and the publication date.
|
Country |
see annotations |
Class |
Family classification:
LFS = strict clinical definition of Li-Fraumeni
syndrome (defined by Li and Fraumeni as a Proband with sarcoma
<45 years with a first degree relative with cancer at <45
and another first/second degree relative with cancer at <45 or
sarcoma at any age);
LFL = Li-Fraumeni like for the extended clinical
definition of Li-Fraumeni (including Birch definition:
proband with any childhood cancer or sarcoma, brain tumor or
adrenocortical carcinoma at <45 years, with one first or second
degree relative with sarcoma, breast cancer, brain tumor,
leukemia, or adrenocortical carcinoma at any age, plus one first
or second degree relative in the same lineage with any cancer
diagnosed under age 60; Eeles definition E1: two different tumors
which are part of extended LFS in first or second degree relatives
at any age (sarcoma, breast cancer, brain tumor, leukemia,
adrenocortical tumor, melanoma, prostate cancer, pancreatic
cancer); Eeles definition E2: sarcoma at any age in the proband
with two of the following (two of the tumors may be in the same
individual): breast cancer at <50 years and/or brain tumor,
leukemia, adrenocortical tumor, melanoma, prostate cancer,
pancreatic cancer at <60 years or sarcoma at any age).
FH = family history of cancer that does not
fulfil LFS or any of the LFL definitions (Birch, Eeles E1 or E2);
No FH (or No) = no family history of cancer;
? = unknown.
|
Generations_analyzed |
Number of generations analyzed in the family. |
Germline_mutation |
A TP53 germline mutation has been identified. |
MUT_ID...TAclass |
see mutation annotations |
Individual_ID |
Unique identification number for an individual included in the database.
It is automatically assigned by the database system. |
Individual_code |
Code or number given in the original publication or an arbitrarily assigned code,
usually the family code followed by the position of the individual in the family tree. |
FamilyCase |
Family case in the pedigree, such as proband (index case), mother, father,.... |
FamilyCase_group |
Degree of relationship to the proband. |
Sex |
Gender of the individual. |
Germline_carrier |
TP53 mutation status of the individual: confirmed= the individual has been tested for the presence of the mutation and the
mutation has been found; obligatory= the individual has not been tested for the presence of the mutation but must be carrier based on
the mutation status of the other individuals in the pedigree; 50%prob.= there is a chance of 50% that the individual is a mutation carrier;
negative= the individual has been tested for the presence of the mutation and the mutation has not been found; NA=
the individual has not been tested for the presence of the mutation.
|
Mode_of_inheritance |
Mode of mutation inheritance: P=paternal, M=maternal, M&P=maternal and paternal, de novo= mutation that has not been inherited, "de novo, mosaic"= mutation that has not been inherited and is present in a subpopulation of cells, na=not known.
|
Dead |
Living status of the individual at
time of follow-up. 0=alive; 1=dead
|
Unaffected |
Disease status of the individual at time of follow-up. 0 = affected by cancer; 1 = not affected by
cancer.
|
Age |
Age of the individual at the time of follow-up. |
Topography |
see annotations |
Morphology |
see annotations |
Age at diagnosis |
Age of the individual at the time of diagnosis of the tumor. |
Ref_ID |
Reference number indicating the
publication in which the mutation is described. This number
corresponds to the Ref_ID number in the GermlineRefR20 file.
|
germlineMutationReference-IARC TP53 Database,R20
Each row represents a reference identified by a unique identification
number (Ref_ID). See standard annotations
for a description of the column content.
germlinePrevalenceIARC TP53 Database, R20
This dataset includes studies reporting TP53 germline mutation screening in large cohorts of patients selected based on various criteria (family history of cancer, specific cancer diagnosis, ...)
Each row represents the result of the analysis of TP53 germline mutation status in a selected cohort.
Column head |
Description |
Diagnosis |
Tumor site or clinical description of the selected cohort. |
Cohort |
Detailed criteria for patient selection. |
Cases analyzed |
Number of patients included in the mutation screen. |
Cases mutated |
Number of patients found to carry a TP53 mutation.
Details on mutations can be found in the dataset of germline mutations when the information was provided, but many studies
do not provide a detailed list of mutations. |
Mutation prevalence |
Percent of mutated cases. |
Remark |
Any further information on the cohort or method. |
PubMed |
PubMed ID with link to the NCBI database. |
germlineFrequency_IARCTP53Database_R20
This new dataset includes studies reporting the frequency of individual TP53 germline mutations in case-control series using NGS for the screening of the entire coding regions of TP53.
Column head |
Description |
MUT_ID....DNEclass |
see mutation annotations |
Freq_cases |
Frequency of the TP53 variant in cases. |
Freq_controls |
Frequency of the TP53 variant in controls. |
Total_Cases |
Number of patients included in the "case" group. |
N_Cases |
Number of cases found to carry the specific TP53 variant.
Details on mutation phenotype may be found in the dataset of germline mutations if enough detail is provided on patients and tumor type. |
Total_Controls |
Number of patients included in the "control" group. |
N_Controls |
Number of controls found to carry the specific TP53 variant. |
DataSource |
PubMed ID of the paper from which data have been extracted. |
StudyDetails |
Description of the case and control groups. |
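The count columns can be used to recompute a carrier frequency for comparison with Freq_cases and Freq_controls. The sketch below assumes that the frequency is the number of carriers divided by the group size, which this description does not state explicitly; the function name is hypothetical.

```python
def carrier_frequency(n_carriers: int, total: int) -> float:
    """Fraction of a group (cases or controls) carrying the variant."""
    if total <= 0:
        raise ValueError("group size must be positive")
    if not 0 <= n_carriers <= total:
        raise ValueError("carrier count must lie between 0 and the group size")
    return n_carriers / total
```

For example, `carrier_frequency(N_Cases, Total_Cases)` could be checked against the reported Freq_cases value for the same row.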
Functional activities of missense mutations
Data on the biological properties of p53 mutant proteins in functional assays performed in yeast or human cells are provided in two datasets.
functionalAssessment-IARC TP53 Database, R20
 |
In this dataset, data were extracted from publications that report functional assessment of p53 mutant proteins
in human or yeast cells, assessed either by transfection and overexpression of mutant proteins,
or by assessment of endogenous mutants. Comparison between mutants requires caution since
functional assays differ from one study to the other, in particular with respect to
the expression vector (which influences the level of expression of the mutant protein),
the p53-responsive elements (generic consensus sequence versus gene-specific
response elements from WAF1, BAX or PIG3), and the recipient cells that have been used.
|
The functional properties of mutant proteins that are included in this dataset are:
- transcriptional activities on various well-described p53-RE,
- dominant negative effects on the activities of wild-type p53,
- capacity to induce apoptosis, cell-cycle arrest or checkpoints in human cells,
- capacity to transactivate promoters that are not induced by wild-type p53,
- ability to promote cell growth and confer tumorigenicity,
- sensitivity to temperature changes regarding their ability to transactivate specific promoters.
The functional results have been organised in 5 columns for (1) conserved
wild-type properties, (2) complete or partial loss of wild-type properties, (3) dominant-negative effects, (4) gain
of function and (5) temperature sensitivity. The cell system is indicated in two columns and a detailed reference to the
published report is given.
Column head |
Description |
Function_ID |
Unique identification number for each entry. |
ProtDescription...Structural_motif |
see mutation annotations |
Codon 72 |
Amino-acid at codon 72 of p53 (polymorphism) |
Conserved WT function |
Functional property of mutant that is similar to the activity of the wild-type protein.
- Activities of mutant proteins in human or yeast cells:
DNAb = DNA binding capacity tested by gel-shift or ChIP assay;
TA = transactivation of a reporter gene under the control of a p53-response element
(indicated in brackets, see list here);
TR = transrepression of a reporter gene under the control of a gene-specific response-element
(name of gene indicated in brackets);
TETR = capacity to form tetramers;
x binding = interaction with protein x;
drug sensitivity = conserved capacity to mediate cytotoxic effect of drug (specific drug used is indicated, see List of abbreviations).
- Activities of over-expressed mutant proteins in human cells:
APO = induction of apoptosis;
GS = growth suppression measured by colony forming assay (CFA), establishment of stable clones, or other proliferation assay;
GA = cell cycle arrest measured by FACS;
TUMOR- = inhibition of tumorigenicity in nude mice;
up/downregulation = induction or repression of an endogenous GENE (in upper-case letters) or protein (in lowercase letters);
HR repression = inhibition of Homologous Recombination;
- Biological effect after over-expression in mouse or rat embryonic fibroblasts:
TRANSF- = ability to counteract the transformation of primary cells induced by the co-transfection of ras or another
transforming oncogene, such as HPV E7;
"super" indicates that the activity of the mutant protein is higher than that of wtp53 (on transactivation, induction of apoptosis, DNA binding or growth suppression).
|
Loss of Function |
WTp53 functional property that is lost by the mutant protein.
Same annotations as in previous column, with "partial" indicating that the loss of function is partial (residual activity).
|
Dominant negative activity |
Inhibition of the wild-type protein by
mutant proteins in transactivation or cell growth assays.
- Yes = the mutant protein counteracts the activity of the wild-type
protein when the two proteins are co-expressed in human or yeast
cells (the p53-response element or cell growth assay performed is indicated in brackets);
- No = the mutant protein does not counteract the effects of the
wild-type protein.
"moderate" indicates that the mutant protein has a partial inhibiting effect on
the wild-type protein. |
Gain of Function |
Functional properties displayed by the
mutant but not by the wild-type protein.
- Activities of over-expressed mutant proteins in human or yeast cells:
same annotations as in column 9, plus:
TUMOR+ = confers tumorigenic properties (in nude mice) on transfected cells;
p73 interference = ability to counteract p73 activity when both proteins are expressed in a cell system;
Drug resistance = confers resistance to a cytotoxic drug (see List of abbreviations);
Growth advantage = increased growth rate.
- Biological effect after over-expression in mouse or rat embryonic fibroblasts:
TRANSF+ = ability to cooperate with ras or another transforming oncogene, such as HPV E7, in the transformation of
primary cells.
"moderate" indicates that the mutant protein has a partial effect on the activity studied;
"no" indicates that the mutant protein has no effect on the activity studied.
|
Temperature sensitivity |
Sensitivity of mutant to temperature changes in transactivation assays
(the p53-RE is indicated in brackets), and in other experimental assays (specified in brackets).
Yes = the activity of the mutant protein is affected by the
temperature at which the test is performed;
mut_H = the protein is inactive (mutant) at higher temperatures;
mut_L = the protein is inactive (mutant) at lower temperatures;
No = the activity of the mutant protein is NOT affected by the temperature at which the test is performed.
Note that functional tests are performed at different temperatures in yeast (30°C) and human (37°C) cells.
Detailed annotation rules are available here. |
Temp_ref |
Temperature at which experiments have been performed or which has been used as reference for temperature sensitivity assays. |
Cell assay |
Human = the activity of the mutant
protein has been tested in human cells.
Yeast = the activity of the mutant protein has been tested in the
yeast.
|
cellLines |
Name of cell-line(s) that have been
used for testing mutant activities. "(endo)" indicates that activities have been tested on endogenous mutants.
|
Assay design |
Indicates if the assay has been performed with or without wtp53 as control, or if activity has been tested on endogenous mutant. |
Method |
Details on type of experimental assay that was performed to assess function. |
FRef_ID...PubMed |
see reference annotations.
|
systematicFunctionalAssessment-IARC TP53 Database, R20
The functional data that are included in this dataset were provided by Chikashi Ishioka and have been published in
Kato et al., Kakudo et al., and
Kawaguchi et al.
Column head |
Description |
ProtDescription...codon number |
see mutation annotations |
WAF1nWT, MDM2nWT, BAXnWT,... |
Promoter-specific transcriptional activity measured in yeast functional assays
and expressed as percent of wild-type activity. |
WAF1nWT_Saos2, MDM2nWT_Saos2,... |
Promoter-specific transcriptional activity measured in the human cell-line Saos-2. Values are normalized with p53-null vector values and expressed as percent of wild-type activity. |
SubG1nWT_Saos2 |
Induction of apoptosis by overexpression in Saos-2 cells expressed as percent of wild-type activity. |
Oligomerisation_yeast |
Capacity of the mutant protein to form oligomers:
TETR = can form tetramers,
DIM = can form dimers but not tetramers,
MON = cannot oligomerize. |
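To make the "percent of wild-type activity" values more concrete, here is a minimal Python sketch of how such a value could be computed. It assumes that normalization with the p53-null vector means subtracting the null-vector signal from both the mutant and the wild-type readings; the dataset description does not give the formula, so this is only one plausible reading. The function name and the sample readings are hypothetical.

```python
def percent_of_wild_type(mutant_signal, wt_signal, null_signal):
    """Express a reporter signal as a percentage of wild-type activity.

    Assumption (not stated in the dataset description): the p53-null
    vector signal is background and is subtracted from both readings.
    """
    wt_net = wt_signal - null_signal
    if wt_net <= 0:
        raise ValueError("wild-type signal must exceed the p53-null background")
    return 100.0 * (mutant_signal - null_signal) / wt_net


# Hypothetical readings in arbitrary units, for illustration only.
example = percent_of_wild_type(mutant_signal=30.0, wt_signal=110.0, null_signal=10.0)
print(example)
```

Under this assumption a mutant with no activity above background scores 0 and a mutant matching wild-type scores 100; values above 100 would correspond to activity higher than wild-type.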
TP53 status of Human Cell-Lines
CellLinesMutationStatus-IARC TP53 Database,R20
This dataset includes cell-lines that have been screened for TP53 mutation and have been
published in the scientific literature, or in the
Sanger cell-line database or the
Broad Cancer cell-line Encyclopedia.
Column head |
Description |
Sample_ID |
Unique sample identification number. |
Sample_name |
Name of the cell-line. |
ATCC_ID |
Identification number of the ATCC database. |
Cosmic_ID |
Link to sample cell-line data in the Cancer Cell Line Project
of COSMIC databases of the Sanger Institute.
|
depmap_ID |
Link to sample cell-line data in the depmap project.
|
Short_topo...Tumor_origin |
see sample annotations |
Add_info |
|
Sex |
Gender of the patient from whom the cell-line has been isolated. |
Age |
Age at cancer diagnosis of the patient from whom the cell-line has been isolated. |
Country...Population |
see patient annotations. |
Germline_mutation |
Germline mutation in TP53 or any other gene carried by the individual from whom the cell-line has been isolated. |
Infectious_agent |
Infectious agent (virus or bacteria) detected in the individual from whom the cell-line has been isolated. |
Tobacco |
Smoking habit of the individual from whom the cell-line has been isolated. |
Alcohol |
Drinking habit of the individual from whom the cell-line has been isolated. |
Exposure |
Reported exposure of the individual from whom the cell-line has been isolated. |
KRAS_status |
Status of KRAS gene. WT=
wild-type; MUT=mutant (base change indicated in brackets)
|
Other_mutations |
Name of other genes in which a mutation has been identified. |
TP53status |
Status of TP53 gene. WT= wild-type gene sequence;
MUT= mutated gene sequence;
NULL= entire gene deletion;
LOE= loss of gene expression without gene mutation.
|
p53_IHC |
p53 immuno-staining status. |
p53_LOH |
Loss of heterozygosity at the p53 locus. Yes = LOH, No = no LOH, NI = non-informative, NA = no information |
MUT_ID... |
TP53 mutation description and functional properties, see
mutation and function annotations.
|
Ref_ID... |
Same as somatic Ref_ID, see
reference annotations.
|
Tissue_processing... |
see method annotations.
|
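The status codes listed for TP53status and p53_LOH can be turned into readable labels when processing a downloaded file. The following Python sketch assumes that each cell holds exactly one of the codes described above; real cells may carry extra text, in which case the value is returned unchanged. The function name and the sample row are hypothetical.

```python
# Code-to-label maps taken from the column descriptions above.
TP53_STATUS = {
    "WT": "wild-type gene sequence",
    "MUT": "mutated gene sequence",
    "NULL": "entire gene deletion",
    "LOE": "loss of gene expression without gene mutation",
}
P53_LOH = {
    "YES": "LOH",
    "NO": "no LOH",
    "NI": "non informative",
    "NA": "no information",
}


def decode_status(row):
    """Return readable labels for the TP53status and p53_LOH cells of a row.

    Codes are matched case-insensitively; unknown or empty values are
    passed through unchanged.
    """
    status = row.get("TP53status", "").strip()
    loh = row.get("p53_LOH", "").strip()
    return {
        "TP53status": TP53_STATUS.get(status.upper(), status),
        "p53_LOH": P53_LOH.get(loh.upper(), loh),
    }


# Hypothetical row, for illustration only.
sample_row = {"Sample_name": "example-line", "TP53status": "MUT", "p53_LOH": "Yes"}
print(decode_status(sample_row))
```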
Mouse models with engineered TP53
The dataset contains mouse models with engineered p53 that are compiled in the
caMOD database or reported in the scientific literature.
Data curated at caMOD were kindly provided by the caMOD team. A direct link to the caMOD web site is available for each model for a detailed
description of model genetics and phenotypes. Data reported in the literature but not compiled in caMOD are curated at IARC and a link to the PubMed
abstract is provided. For a detailed description of model genetics and phenotypes, please refer to caMOD and/or the original publication.
Mouse ModelsIARC TP53 Database, R20
Column head |
Description |
Model descriptor |
Model name as indicated in caMOD or original publication. |
Affected organs |
List of organs affected or targeted by transgene. |
AA change in human |
Amino-acid substitution. Note that amino-acids are numbered according to the human sequence. |
caMOD link |
Model ID from caMOD database. |
PubMed |
PMID link to original publication. |
Experimentally induced mutations
This dataset contains a list of mutations in the human TP53 gene obtained from
mutagenicity assays in the Hupki mouse model (MEF cells treated with the indicated carcinogenic agent) or in
a yeast assay. See original papers for detailed methods.
inducedMutationIARC TP53 Database, R20
Column head |
Description |
MUT_ID |
Unique ID for the mutation, used across datasets. |
Exposure |
Agents to which the cells were exposed. |
c_description |
Mutation described on the cDNA sequence. See mutation annotations |
g_description |
Mutation described at the genome level. See mutation annotations |
Model |
Experimental assay/model used. |
Clone_ID |
ID of cell clone isolated from the exposed cell population. |
Add info |
Additional details provided on assay or cell clone as derived from original publication. |
PubMed |
PMID with link to the PubMed abstract that describes the model. |
Tumor samples are classified according to standards of the International Classification of Diseases
for Oncology (ICD-O 3rd Edition, World Health Organization, Geneva, 2000) and SNOMED.
For information on tumor classification, grading and staging, check out
ICD-O training at SEER,
Collaborative staging initiative,
Cancer Information at NCI, and Oncologychannel.com.
Column head |
Description |
Sample_source |
Nature of the sample from which the
mutation has been identified: cell-line, surgery (surgical or
autopsy specimen, including fresh samples and archival, pathology
specimen), biopsy, xenograft, body fluid (blood, saliva,
urine...).
|
Tumor_Origin |
Origin of the tumor sample. Terms
occurring in this column are: primary, secondary (second primary
tumor in the same patient), metastasis (with the localisation of
the metastasis in brackets), recurrent (tumor recurrence).
|
Topography |
Site of the tumor defined by organ or
group of organs, according to the ICD-O nomenclature. (examples:
"colon", "brain", "bronchus and
lung"). Note that some tumors are annotated
"Head&Neck,NOS" or "Colorectum,NOS"
because no detail is given in the original publication (NOS= not
otherwise specified).
For the database search tool, a short name is used in place of the
ICD-O name (example: "Lung" for "bronchus and
lung"). See a
numerical list of topographies.
For metastasis, the topography corresponds to the primary
site of the tumor and the site of metastasis is indicated in
brackets in the tumor_origin field.
|
Short_topography |
For the database search tool, a short
name is used in place of the ICD-O name (example: "Lung"
for "bronchus and lung"). See
a numerical list of topographies.
|
Topo_code |
ICD-O code for topography. |
Sub_topography |
Precise identification of anatomic
site, organ or tissue. The description given in the publication is
translated to ICD-O nomenclature.
|
Morphology |
Tumor type, including morphology
and/or histologic type. The terminology used is based on ICD-O (2nd
and 3rd editions) and SNOMED classifications. Terms have been
added, such as 'normal tissue' or 'na'. See
alphabetical list of morphologies.
|
Morpho_code |
ICD-O or SNOMED codes for morphology. |
Grade |
Information on tumor grade, as given in the cited publication. |
Stage |
Information on tumor stage, as given in the cited publication. |
TNM |
TNM classification (Tumor size, Node
status, Metastasis status) for staging. For information on this
classification system, click here.
|
Column head |
Description |
Sex |
Sex of the patient (M for male, F for female). |
Age |
Age of the patient at the time of
diagnosis.
|
Ethnicity |
Ethnicity of the patient (when
available). Groups are defined as: Asian, Black, Caucasian...
|
Country |
Country/Region in which the patient was living at the time of surgery.
When not otherwise specified in the original publication, the country corresponding to the address of
the hospital is entered.
|
Population |
Grouping by population. To see the country/population classification click
here.
|
Region |
Grouping by region. To see the country/region classification click
here.
|
Development |
Grouping by development status. To see the country/development classification click
here.
|
Geo_area |
City or region within the patient's country
of residence. When not specified in the original
publication, the city where the surgery was performed is entered. |
The same references (same Ref_ID) are used for the somatic,
prevalence and prognosis data sets. Independent references are used for
the Function and Germline data sets.
Column head |
Description |
Ref_ID |
Unique identification number for a reference.
|
Cross_Ref_ID |
Ref_ID of a reference containing related data or additional information.
|
Title |
Title of the publication. |
Authors |
List of authors. |
Year |
Year of publication. |
Journal |
Name of the journal (PubMed catalogue)
|
Volume |
Volume number. |
Start_page |
First page number. |
End_page |
Last page of article. |
PubMed_entry |
PubMed identification number from NCBI. |
Comment |
Any relevant information |
Exclude_analysis |
Papers that we recommend to exclude from analysis because of dubious data quality (report several samples with multiple mutations, and/or a high proportion of rare variants or variants classified as neutral or functional). |
WGS_WXS |
Whole genome or whole exome sequencing study. |
Mutation detection method
Column head |
Description |
Tissue_processing |
Indicates if the sample analysed was
fresh, fixed or frozen.
|
Start_material |
Indicates if DNA or RNA was screened for mutations.
|
Prescreening/Method |
Prescreening method used to select samples to be sequenced:
‘SSCP’ for single-strand conformation polymorphism, ‘DGE’ for
denaturing gel electrophoresis, ‘FASAY’ for yeast assay,
‘none’ if no prescreening was done, etc.
|
Material_sequenced |
Indicates if the DNA or RNA was cloned or not (direct) before sequencing.
|
Exon2-11 |
Exons that have been screened for mutation. In the downloaded file,
"-1" or "TRUE" indicate that the exon has been screened and "0" or "FALSE" indicate that it has not been screened. |
Graphs and search options
- Gene variations.
This option allows the functional and structural analysis of all possible single nucleotide substitutions in TP53
exonic sequences (including those that have never been reported in cancer). In addition, all other types of mutations that have
been reported in human samples and validated polymorphisms are included in this dataset. Functional and structural
annotations and frequency statistics for these gene variations can be retrieved with this search option.
Each dataset entry corresponds to a unique gene variation.
- Somatic mutations.
This option allows the retrieval and analysis of TP53 mutations reported as somatic events in tumor samples and cell-lines.
Each dataset entry corresponds to a mutation identified in a human sample.
- Somatic mutation prevalence.
This option allows the analysis of the prevalence of TP53 somatic mutations by cancer type and population groups.
Each dataset entry corresponds to the prevalence of TP53 mutation for a specific type of cancer in a defined human population.
- Germline mutations.
This option allows the retrieval and analysis of TP53 mutations reported as germline events in human individuals.
Each dataset entry corresponds to a tumor identified in an individual carrier of a TP53 germline mutation.
The searchable dataset only includes cancer-affected individuals who are confirmed or obligate carriers of a TP53 mutation
(data on non-affected carriers or non-confirmed carriers can be retrieved by downloading the full dataset with the 'data downloads' option).
- Germline mutations prevalence.
This table lists different studies reporting the prevalence of TP53 germline mutations in selected groups of individuals.
- Cell-lines.
This option allows the retrieval and analysis of TP53 mutations reported in human cell-lines.
Each dataset entry corresponds to a mutation identified in a cell-line.
- Mouse models.
This option allows the display or download of the description of mouse models with engineered p53 that are compiled in
the caMOD database or reported in the scientific literature. Links to caMOD database are available for further details on the model phenotypes.
Mutation distribution graphs
- Mutation type. Proportion of mutations classified by their nature (base change, insertions, deletions, ...):
number of mutations of each class divided by the total number of mutations selected (% is shown).
- Codon distribution. Proportion of exonic point mutations at each codon position:
number of mutations at each codon position divided by the total number of exonic mutations selected (% is shown).
- Exon/intron distribution. Proportion of mutations in each exon/intron:
number of mutations within each Exon/intron divided by the total number of mutations selected (% is shown).
- 3D JMOL graph. Residues (within the central domain of the p53 protein, codons 96 to 289) are highlighted according to the proportion of exonic mutations at this position (start site of mutation) among all selected mutations:
number of mutations at each codon position divided by the total number of exonic mutations selected. Residues colored red are the most frequently mutated, yellow the least frequently mutated, and orange intermediate.
- Mutation effect. Proportion of mutations classified according to their predicted effect on protein sequence (missense, nonsense, frameshift ins/del, …):
number of mutations of each class divided by the total number of mutations selected (% is shown).
- Point mutation. Proportion of single amino-acid substitutions classified according to their predicted effect on protein sequence (missense, nonsense, silent):
number of mutations of each class divided by the total number of point mutations selected (% is shown).
- Point mutation dot-plot. Each dot represents a specific point mutation, colored according to its predicted effect on protein sequence (missense in blue, nonsense in red and silent in green);
the x axis shows the proportion of the specific mutation in the selected dataset (% of total point mutations in the selected dataset); the Y axis shows the predicted mutation rate for the particular point mutation (see mutation annotations).
- SIFT. Proportion of missense mutations classified according to their predicted deleterious/damaging or neutral/tolerated effect based on SIFT algorithm:
number of mutations of each class divided by the total number of missense mutations selected (% is shown).
- SIFT dot-plot. Each dot represents a specific point mutation, colored according to its predicted deleterious/damaging or neutral/tolerated effect based on the SIFT algorithm;
the x axis shows the proportion of the specific mutation in the selected dataset (% of total point mutations in the selected dataset); the Y axis shows the predicted mutation rate for the particular point mutation (see mutation annotations).
- Transactivation. Proportion of missense mutations classified according to their experimentally measured transactivation activities (based on FASAY):
number of mutations of each class divided by the total number of missense mutations selected (% is shown).
- Transactivation dot-plot. Each dot represents a specific point mutation, colored according to its experimentally measured transactivation activity;
the x axis shows the proportion of the specific mutation in the selected dataset (% of total point mutations in the selected dataset); the Y axis shows the predicted mutation rate for the particular point mutation (see mutation annotations).
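The proportion graphs above all follow the same arithmetic: count the selected mutations in each class and divide by the total number selected. As a minimal sketch of that calculation in Python (the function name and the sample labels are hypothetical, and the database's own graphing code is not shown on this page):

```python
from collections import Counter


def class_percentages(labels):
    """Return the percentage of items falling in each class.

    `labels` holds one class label per selected mutation, for example the
    mutation effect or the codon number of each record.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical selection of four point mutations, for illustration only.
effects = ["missense", "missense", "nonsense", "silent"]
print(class_percentages(effects))
```

The same function covers the codon and exon/intron distributions if the labels are codon numbers or exon/intron names, provided the input list has already been restricted to the relevant denominator (for example exonic mutations only).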
Tumor distribution graphs
- Germline data. Distribution of tumor sites associated with the selected mutations;
number of tumors classified by tumor site divided by total number of tumors observed in individuals carrying the selected mutations (% is shown).
- Somatic data. Distribution of tumor sites associated with the selected mutations;
number of mutations classified by tumor site divided by total number of mutations observed in tumors carrying the selected mutations (% is shown).
- Gene variation data, somatic graph. Proportion of the selected mutations among all mutations reported in the database by tumor sites;
number of selected mutations classified by tumor site divided by total number of mutations in the database for each tumor site (% is shown).
- Gene variation data, germline graph. Distribution of tumor sites associated with the selected mutations;
number of tumors classified by tumor site divided by total number of tumors observed in individuals carrying the selected mutations (% is shown).
- Mutation prevalence. Proportion of mutated samples by cancer site (topography graph), cancer type (morphology graph), or by country of origin of the patients (country graph);
number of mutated samples divided by total number of samples analyzed (% is shown).
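The mutation prevalence described in the last item is the number of mutated samples divided by the number of samples analyzed. A short Python sketch of that calculation, pooled by group, is given here. It assumes that counts from several studies of the same group are summed before dividing, which this page does not state; the function name and the sample counts are hypothetical.

```python
def prevalence_by_group(records):
    """Return percent of mutated samples per group.

    `records` is an iterable of (group, mutated, analyzed) tuples, where
    group could be a topography, a morphology or a country.
    Assumption: counts from the same group are pooled before dividing.
    """
    totals = {}
    for group, mutated, analyzed in records:
        if analyzed <= 0 or not 0 <= mutated <= analyzed:
            raise ValueError(f"inconsistent counts for {group!r}")
        m, a = totals.get(group, (0, 0))
        totals[group] = (m + mutated, a + analyzed)
    return {group: 100.0 * m / a for group, (m, a) in totals.items()}


# Hypothetical study counts, for illustration only.
studies = [("Lung", 30, 100), ("Lung", 20, 100), ("Breast", 5, 50)]
print(prevalence_by_group(studies))
```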