Most of the machines of the Sněhurka computing cluster in Karlín have been moved to a new server room in Troja and integrated into the Chiméra cluster there. Users can now take full advantage of the relocated cluster. We ask users to move their cluster data from Karlín to Troja by September 30, 2026.
Details of the relocation are provided below:
- Transfer of cluster data from Karlín to Troja
- Relocation schedule
- What is the Chimera cluster?
- Where and why is the Karlín cluster moving?
- How will the sharing of unused computing capacity work?
- Which Karlín machines will be moved to Troja?
- What is the RSE (Research Software Engineering) group?
Transfer of cluster data from Karlín to Troja
The current situation is as follows:
- The data for the Karlín cluster (both cluster "home" [/usr/users/login] and "work" [/usr/work/login]) is now stored on an old disk array (pole2018).
- We purchased the disk array in 2018 (i.e., 8 years ago), and it is no longer under warranty.
- The disk array has two controllers (components that manage the operation of the entire array) – one serves as a backup for the other. However, one controller is no longer functional, so while the array continues to operate, we no longer have a "backup" controller.
- The data in cluster "home" is backed up (usually every night), but the data in "work" is not backed up.
- We will need space for users who will continue to use GPU nodes g1–g6 (which will remain in Karlín).
- The storage capacity of the Troja cluster has recently been significantly increased, so there is now sufficient space available.
In light of the above:
- We ask Karlín users who do not use the Karlín GPU nodes (g1–g6) to move their cluster data from Karlín to Troja (not just "copy", but actually "move", to free up space in Karlín) by the end of the academic year (i.e., by September 30, 2026). After that, only the cluster data of users of the GPU nodes (g1–g6) should remain in Karlín (on the pole2018 disk array).
- Instructions on how to transfer data to the Chimera cluster are available on the Chimera (useful notes for migration) webpage.
- We ask users who wish to continue performing computations on the g1–g6 GPU nodes (i.e., whose cluster data will remain in Karlín) to contact me (so we are aware of them) at the email address martin.trcka@matfyz.cuni.cz.
- We then plan to move the remaining data (i.e., data from users of g1–g6) to a Karlín storage location that is better protected against failure (than the pole2018 disk array). Details are yet to be determined.
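The difference between "copy" and "move" can be illustrated with a minimal Python sketch that only builds an rsync command line and does not run it. The source paths are the ones named above. The remote host name and target directory are placeholders (assumptions), and the sketch assumes the same login on both clusters. The official instructions on the Chimera webpage take precedence over this sketch.

```python
from typing import List

# Source directories named in the announcement; "login" is the user's login name.
KARLIN_AREAS = {
    "home": "/usr/users/{login}",
    "work": "/usr/work/{login}",
}


def build_move_command(login: str, area: str, remote_host: str,
                       remote_dir: str, dry_run: bool = True) -> List[str]:
    """Build an rsync command that moves one Karlín data area to Troja.

    remote_host and remote_dir are placeholders (assumptions), not real
    Chimera names; take the real ones from the Chimera migration notes.
    """
    if area not in KARLIN_AREAS:
        raise ValueError(f"unknown area: {area!r}")
    # Trailing slash: transfer the directory contents, not the directory itself.
    source = KARLIN_AREAS[area].format(login=login) + "/"
    # -a keeps permissions and timestamps, --partial keeps interrupted files,
    # --remove-source-files deletes each source file after it was transferred.
    command = ["rsync", "-a", "--partial", "--remove-source-files"]
    if dry_run:
        command.append("--dry-run")  # only report what would be transferred
    command += [source, f"{login}@{remote_host}:{remote_dir}/"]
    return command


if __name__ == "__main__":
    # Hypothetical login, host, and target directory, for illustration only.
    print(" ".join(build_move_command("novak", "work", "chimera.example.org",
                                      "/path/on/troja/novak")))
```

The sketch defaults to a dry run so that nothing is deleted by accident. Note that `--remove-source-files` removes transferred files but leaves empty directories behind in Karlín, so those would still need to be cleaned up afterwards.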
Relocation schedule
The dates listed are tentative and subject to change depending on how the situation develops.
- ✅ Friday, February 6, 2026, 11:00 a.m.: cluster training (Jaroslav Hron, also introduction to the RSE group). Location: K7 (ground floor). Presentation and video recording from the training session
- ✅ April 8, 2026 (originally planned for mid-February): completion of the new server room in Troja (if completion is delayed, the relocation dates will also have to be postponed)
- ✅ April 9, 2026 (Thursday; originally planned for February 24, then March 2): first wave of relocation = transfer of the Troja Chiméra cluster from the old server room to the new one (i.e., Chiméra cluster downtime of a few days)
- April 15, 2026 (Wednesday; originally planned for March 10): second wave of relocation = transfer of the Karlín Sněhurka cluster (and also servers from other locations, e.g., Malá Strana, Jinonice, Ovocný trh) to the new server room in Troja
- ✅ April 14 (Tuesday), 12:00 p.m. (half a day earlier): Sněhurka cluster downtime for relocation preparations (disconnecting nodes and cabling)
- ✅ April 15 (Wednesday): relocation of most cluster nodes to Troja
- ✅ The following was moved to Troja: CPU nodes (r31–r50), InfiniBand 100 Gb switch (and related InfiniBand cabling: for all nodes + 2 more), Ethernet switch (48 x 1 Gb/s, karc)
- ✅ The following stays in Karlín: GPU nodes (g1–g6), InfiniBand 40 Gb switch, head nodes (r3d3, r0d0), r6 (for GitLab continuous integration), disk array ("home" and "work", pole2018)
- After the relocation of hardware (as soon as possible, but it may take a few days):
- ✅ Karlín: Reconnecting the remaining cluster components (g1–g6, switches, etc.) and bringing them back online. The head node (r3d3), disk array ("home" and "work"), and GPU nodes (g1–g6) will continue to function as before; however, all CPU nodes (r31–r50) will no longer be present.
- ✅ Notify users that the GPU nodes (g1–g6) and disk array are up and running again.
- Troja: Integrating all Karlín CPU nodes (r31–r50) into the Chimera cluster and setting up an environment similar to the one in Karlín:
- ✅ Installation of nodes and switches in a rack, and cabling. Connecting cluster nodes (r31–r50) to the original InfiniBand switch (100 Gb/s) and connecting that switch to the local InfiniBand infrastructure.
- ✅ All nodes: OS installation and configuration, integration with SLURM.
- ✅ Modules (software) – installation and verification
- ✅ Creating partitions (queues) similar to those in Karlín
- ✅ Create a user group (probably named "math") that will have priority access to the cluster nodes from Karlín
- ✅ Notify users that the Karlín cluster relocation is complete
- Troja: disk space expansion:
- ✅ New storage servers (with all those storage disks) arrived
- ✅ Get storage servers up and running
- ✅ Notify users that disk space has been expanded (and that they can start transferring data from Karlín)
What is the Chimera cluster?
The Chimera cluster is an HPC (high-performance computing) cluster currently located in the "old" server room in Troja (and will also be moved to the "new" server room in Troja). More detailed information about the cluster can be found on the website: https://www.mff.cuni.cz/en/hpc-cluster/ (login via CAS is required).
The cluster website also has a section dedicated to Introductory training (June 2022), which includes a link to slides [pptx, 17 slides, 9 MB] (the slides don't contain all the material covered in the hands-on session) and a recording of the entire training session [mp4, 2 h 25 min, 1.7 GB].
Where and why is the Karlín cluster moving?
There are several different computing clusters in various locations at the Faculty of Mathematics and Physics. A new server room is currently being completed in Troja, to which clusters from other locations will gradually be moved and unified under central administration.
Advantages of consolidating and unifying computing clusters:
- More efficient use of technical resources:
- Space, electricity, cooling, data network, data storage
- More efficient use of human resources:
- Hardware and software management, easier sharing of know-how
- It is difficult for users to work with multiple different clusters (different accounts, controls, settings, rules, etc.)
- The unified cluster will have support that local clusters do not have:
- In the first half of 2026, the cluster will be expanded with new computing nodes (CPU and GPU) and other equipment worth a total of approximately CZK 24 million (these new nodes will be available to all users)
- Users can get support in using the cluster (from the Research Software Engineering group)
- The university will contribute financially to the operation of the cluster
How will the sharing of unused computing capacity work?
The sharing of computing nodes will work on the following principle:
- Those who provided/financed the computing nodes (schools, departments, groups) will have priority access to these nodes (in the form of higher priority computing queues).
- However, when the nodes are not in use, anyone else will be able to use them. Each node in the cluster will be part of a queue that will have the lowest priority but will be of the FFA (free for all) type.
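The principle above can be illustrated with a toy Python model. It is not the cluster's actual SLURM configuration: the priority values, the `Job` class, and the group name "physics" are assumptions made for illustration only. It shows only the order in which waiting jobs would be considered for a node: jobs from the group that financed the node come first, and everyone else goes through the lowest-priority free-for-all queue.

```python
from dataclasses import dataclass
from typing import List

OWNER_PRIORITY = 100  # hypothetical value for the owners' queue
FFA_PRIORITY = 1      # hypothetical value for the free-for-all (FFA) queue


@dataclass
class Job:
    name: str
    group: str  # group of the submitting user, e.g. "math"


def queue_priority(job: Job, node_owner: str) -> int:
    """Priority of a job on a node financed by the group node_owner."""
    return OWNER_PRIORITY if job.group == node_owner else FFA_PRIORITY


def dispatch_order(jobs: List[Job], node_owner: str) -> List[str]:
    """Names of waiting jobs in the order they would be considered.

    sorted() is stable, so jobs with equal priority keep submission order.
    """
    ranked = sorted(jobs, key=lambda job: -queue_priority(job, node_owner))
    return [job.name for job in ranked]


if __name__ == "__main__":
    waiting = [Job("a", "physics"), Job("b", "math"),
               Job("c", "physics"), Job("d", "math")]
    # On a node financed by "math", the "math" jobs b and d are listed first.
    print(dispatch_order(waiting, node_owner="math"))
```

A real scheduler takes more factors into account (for example, requested resources and past usage), and the text above does not say whether running FFA jobs are interrupted when an owner's job arrives, so the model leaves that question out.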
Which Karlín machines will be moved to Troja?
The following will be moved to Troja:
- all CPU nodes (r31–r50)
- InfiniBand 100 Gb switch
In addition, two new CPU compute nodes purchased from the UNCE project (Prof. J. Málek) will be delivered directly to Troja (expected delivery date is in the first half of 2026).
The following will not be moved to Troja:
- all GPU nodes (g1–g6), because their form factor (dimensions and other physical characteristics, including cooling requirements) is not suitable for mounting in server room racks
- InfiniBand 40 Gb switch
What is the RSE (Research Software Engineering) group?
The RSE group is a university group of people (currently approx. 3–4 employees) who can help users with the use of the cluster. It serves as an interface between the latest technologies and the academic environment.
The main goal of RSE is to reduce barriers to computing resources, for example:
- Assistance in the development of scientific code (new features, optimization, parallelization, version control)
- Deployment on new computing infrastructure
- Commissioning of complex computing pipelines, selection of suitable tools
Website: https://rse.cuni.cz/
Information about RSE was also presented at Jaroslav Hron's training course.
