In the Tohoku Medical and Megabank Project (TMM), we have constructed reference panels through large-scale whole-genome sequencing of DNA samples from participants in our cohort studies. We expanded the panel in stages, from the initial version, 1KJPN, which covered 1,070 people in 2015, to 2,049 people (2KJPN) in 2016 and 3,554 people (3.5KJPN) in 2017. Throughout this period, in response to steady progress in genome analysis methods, we have also worked continuously to improve the methods applied to our reference panels.
In the most recent update, 3.5KJPN, released in 2017, we published the variant information that had previously been excluded by some filters, alongside the filtered datasets. Here, we would like to inform users of our reference panel about the changes incorporated in the 3.5KJPN release. Below is a summary of these changes and of our future plans.
【Main changes in the release of 3.5KJPN】
We have released all variants, including those with three or more alleles, those deviating from Hardy-Weinberg equilibrium, and those that may contain errors.
Reference: <<After registration, you can download the file “MULTI-ALLELIC SNVS IN 3.5KJPN”, which lists the multi-allelic sites, from the download page of the iJGVD website.>>
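For users who want to screen the unfiltered data themselves, the sketch below shows two basic checks related to the categories above. It is only an illustration and not the TMM pipeline, whose filters are not described here. It assumes VCF-style REF/ALT fields, in which alternate alleles are comma-separated. It also uses a standard chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a biallelic site. The function names `allele_count` and `hwe_chi_square` are hypothetical.

```python
from math import erfc, sqrt


def allele_count(ref: str, alt: str) -> int:
    """Number of alleles at a VCF-style site: REF plus each comma-separated ALT.

    Assumes VCF conventions; "." in ALT means no alternate allele.
    """
    alts = [a for a in alt.split(",") if a not in (".", "")]
    return 1 + len(alts)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic site.

    Takes genotype counts (AA, AB, BB) and returns (chi2 statistic, p-value)
    with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic site) to avoid dividing by zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(allele_count("A", "G,T"))     # a site with three alleles
    print(hwe_chi_square(25, 50, 25))   # genotype counts matching HWE expectations
    print(hwe_chi_square(50, 0, 50))    # no heterozygotes: strong deviation from HWE
```

A site is multi-allelic when `allele_count` exceeds 2. The chi-square approximation is used here for brevity. For rare variants with small expected genotype counts, an exact test is generally preferred.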
【Reason for changes】
It is becoming clear that the human genome has accumulated a large number of mutations, owing to factors such as recent rapid population expansion. As a result, multiple mutations can occur at the same site, yielding three or four alleles. From a technical standpoint, such sites did not follow the traditional model of human genetics, so they were not analyzed and were excluded from international whole-genome resequencing projects. We also excluded them when constructing 1KJPN and 2KJPN. However, after our own careful analysis and examination, we concluded that such sites should not be excluded, and they are therefore included in the 3.5KJPN release.
Notably, the existence of sites that do not conform to the conventional model is giving rise to new theories in the field. We believe that constructing a whole-genome reference panel is a continuously evolving scientific challenge.
Please note that the current 3.5KJPN panel was constructed using the same method as described in the 1KJPN paper, which excluded a substantial number of variants. However, we have also released the unfiltered datasets at the same time.
Two and a half years have now passed since the release of 1KJPN, and various analyses have since been carried out throughout Japan. We have therefore concluded that additional panels produced with a common pipeline would make comparisons easier and open up many new ways to make more advanced use of our reference panel.
For this reason, we are currently re-analyzing the data using the standard analysis pipeline. This is our highest priority, and we hope to deliver a new 4,000-person version of the panel by June 2018.