In April 2022, the supercomputer system behind the Tohoku Medical Megabank Project underwent a wide-scale system expansion. The supercomputer system has only undergone one update (in June 2018, for the purposes of making it more widely accessible) since it was first put into operation in July 2014. This marks the second significant update since first update.
-The total number of CPU cores has increasedfrom 8,900 to 14,880, and the capacityof high-speed storage hasaugmentedfrom 29PB to 55PB,resulting in considerably enhanced computing power.
-The number of compute nodes per unit in the new system can be easily changed, allowing computing capacity to be dynamically redistributed from units with ample capacity to those with insufficient capacity, resulting in increased flexibility and reduced congestion.
-The public unit (Unit A) of the new system can now be accessed from the user's PC via a web browser, providing added convenience.
ToMMo’s supercomputer stores a variety of data, from highly-sensitive data such as individual-level genome data to low-sensitive data such as allele frequencies. Data must be managed according to its level of sensitivity. To accomplish this, ToMMo’s supercomputer is divided into three units, and the type of data that can be analyzed in each unit is defined. Unit A is for development publicly accessible via internet, Unit B is for data distribution by data visiting, and Unit C is for data preprocessing. The processed data in Unit C is transferred to Unit B for the use from all over Japan by data visiting.
In the previous system, different physical compute nodes were deployed for each unit, and each unit was operated independently. This resulted in issues, such as Unit A being congested when being utilized for large amounts of analysis, meanwhile Unit B was completely idle. The unused compute nodes in Unit B could not be used in Unit A, and therefore this raised the challenge of creating a system where nodes were efficiently used.
In the new system, all compute nodes are connected to the same physical network, where the isolation of units is implemented virtually. This allows the administrator to easily change the number of compute nodes per unit according to their congestion status. As a result, not only does it increase the flexibility of the supercomputer system, but it also reduces congestion.
Like the compute nodes, the storage system in the new system also employs virtual isolation mechanisms. This will lead to greater efficiency in terms of storage usage in the data distribution process at ToMMo. In the previous system, data analyzed in the unit for data preprocessing (Unit C) were copied to the unit for data distribution (Unit B) to provide the data to collaborators. This meant that the same file existed in both Unit B and Unit C, consuming twice as much storage space. In contrast, the new system allows the same file to be used from both Unit B and Unit C, resulting in more efficient use of storage capacity.
In the unit for data distribution (Unit B) and the unit for data preprocessing (Unit C) of the previous system, in order to achieve secure management of sensitive data (such as genome data), a two-factor authentication was required to log in to the system. First, the user was to log in to a Windows virtual desktop, and then the user was to log in to a Linux compute node to access the supercomputer. This two-step login mechanism was difficult to understand and unpopular with users. The new system allows users to log in directly to the compute node with a single authentication to improve usability.
Additionally, the previous system utilized a combination of fingerprint authentication and public key authentication to log in to the system, whereas the new system utilizes two-factor authentication with the user’s ID, password, and their smartphones.
To access Unit A of the previous system, users needed to install an SSH client or dedicated connection software on their PC, which made things complicated. On the other hand, the new system has saved users a lot of time and effort by letting them connect through a GUI that can be accessed by opening a URL for Unit A in a web browser.
The previous system was equipped with an approximately 29PB LTO tape library for data backup. In addition to the tape library, the new system also contains two new backup systems: an object storage system (rewritable) and an optical disc library (non-rewritable, for long-term storage use). The new system promotes efficient data backup by selecting the backup system based on differences such as whether the data to be backed up can be recreated and the storage period.