Tohoku University Tohoku Medical Megabank Organization (ToMMo) has sequenced the genomes of participants in the cohort studies of the Tohoku Medical Megabank Project. In November 2013, we announced the completion of whole genome sequencing of 1,070 healthy participants. In August 2014, we released data on single nucleotide polymorphisms (SNPs) with allele frequencies of 5% or greater on our website, iJGVD (integrative Japanese Genome Variation Database) *1. The same information is also available through NBDC (National Biosciences Database Center), a national database center of Japan.
In August 2015, we published a paper in Nature Communications on the Whole Genome Reference Panel of the Tohoku Medical Megabank Project, based on these data *2. In the genomes of the 1,070 participants, we found 21.2 million high-confidence single nucleotide variants (SNVs), of which 56.6% were novel when compared with dbSNP data at that time. We have now prepared a data set of the SNVs in the panel to be opened to researchers.
The data set comprises the positions, allele frequencies, and allele counts of all SNVs in the Whole Genome Reference Panel of the Tohoku Medical Megabank Project.
Users can download the data immediately upon agreeing to follow our user guidelines, which include: 1) use only for non-profit (academic) purposes; 2) prohibition of identification of the cohort participants; 3) prohibition of contact with the cohort participants; etc.
Release is planned after approval from the Sample and Data Access Committee of the Tohoku Medical Megabank Project, which is scheduled for December 15th, 2015.
Our change in the policy for releasing genomic analysis data of the cohort participants was made after the following careful discussions by the members of the Ethical, Legal, and Social Issues (ELSI) Committee and the Sample and Data Access Committee of the Tohoku Medical Megabank Project.
It is nearly impossible to identify participants of the cohort study through the allele frequencies of SNVs alone, because no individual's genotype data are disclosed on the iJGVD site. Further, users of the genomic data must agree to the following terms: prohibition of identifying or specifying participants by combining the data with other possibly available social, biological, and clinical data. We also prohibit commercial use of the data set.
To exclude the possibility that a participant could be identified by combining SNVs of very low allele frequency with epidemiological data (social, biological, or clinical information), epidemiological data will be released in such a way that no one can easily connect the genomic and epidemiological data.
Some of the SNVs released this time, especially those with low allele frequencies, may later prove to be strongly associated with certain diseases. At the same time, we recognize that the variant data in our panel inevitably include errors at a certain rate, because we use next-generation sequencers and statistical methods to identify SNVs. We consider that our current data set does not meet the quality required to estimate and report an individual's disease risk, as would be needed in clinical settings. We must ensure that users of the data set understand the current limitations of the genomic analysis data.
The Tohoku Medical Megabank Project is exploring the possibility of returning reliable and beneficial genetic analysis results to the cohort participants. However, we have to establish validation processes before we start informing participants of their genotype results because, as stated above, our current data were derived from a next-generation sequencing analysis pipeline and are not accurate enough for returning genotype data to individuals.
More details and updates on this subject will be announced on this website.
*1: iJGVD : http://ijgvd.megabank.tohoku.ac.jp/