02. March 2026 · Comments Off on April 15, 2026: Sněhurka cluster relocation to Troja HPC facility · Categories: Downtime, News, Novinky

Last update: April 13, 2026 (changelog: more detailed Relocation schedule, new subsection Transfer of cluster data, added RSE presentation and link to RSE webpage, status update)

Most of the machines in the Sněhurka computing cluster in Karlín will be moved to a new server room in Troja and integrated into the Chiméra cluster there. We plan to move the cluster on Wednesday, April 15, 2026, with the cluster outage beginning half a day earlier, at 12:00 on Tuesday, April 14, 2026.

This change will affect Sněhurka cluster users in two ways:

  1. During the move (expected to take a few days), the relevant Sněhurka cluster computing nodes will be unavailable. Chiméra cluster nodes will be available during this time.
  2. The control and rules of the Chiméra cluster differ in some respects from those of the Sněhurka cluster. On February 6, 2026, Jaroslav Hron organized a cluster training session where these changes were explained (see below for more detailed information about the training).

Details of the relocation are provided below:

Relocation schedule

The dates listed are tentative and subject to change depending on how the situation develops.

  • April 8, 2026: completion of the new server room in Troja (if completion is delayed, the relocation dates will also have to be postponed)
  • April 9, 2026 (Thursday): first wave of relocation = transfer of the Troja Chiméra cluster from the old server room to the new one (i.e., Chiméra cluster downtime – a few days)
  • April 15, 2026 (Wednesday): second wave of relocation = transfer of the Karlín Sněhurka cluster (and also servers from other locations, e.g., Malá Strana, Jinonice, Ovocný trh) to the new server room in Troja
    • April 14 (Tuesday), 12:00 p.m. (half a day earlier): Sněhurka cluster downtime begins – preparations for the cluster relocation (disconnecting nodes and cabling)
    • April 15 (Wednesday): relocation of most cluster nodes to Troja
      • The following will be moving to Troja: CPU nodes (r31-r50), InfiniBand 100 Gb switch (and related InfiniBand cabling: for all nodes + 2 more), Ethernet switch (48 x 1 Gb/s, karc)
      • The following will stay in Karlín: GPU nodes (g1–g6), InfiniBand 40 Gb switch, head nodes (r3d3, r0d0), r6 (for GitLab continuous integration), disk array ("home" and "work", pole2018)
    • After the relocation of hardware (as soon as possible, but it may take a few days):
      • Karlín: Reconnecting the remaining cluster components (g1–g6, switches, etc.) and bringing them back online. The head node (r3d3), disk arrays ("home" and "work"), and GPU nodes (g1–g6) will continue to function as before; however, all CPU nodes (r31–r50) will no longer be present.
      • Troja: Integrating all Karlín CPU nodes (r31–r50) into the Chimera cluster and setting up an environment similar to the one in Karlín:
        • Connecting cluster nodes (r31–r50) to the original InfiniBand switch (100 Gb/s) and connecting that switch to the local InfiniBand infrastructure.
        • All nodes: OS installation and configuration, integration with SLURM.
        • Installing and verifying modules (software)
        • Creating partitions (queues) similar to those in Karlín
        • Creating a user group (probably named "math") that will have priority access to the cluster nodes from Karlín
        • Notifying users that the Karlín cluster relocation is complete
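The SLURM setup sketched in the steps above could look roughly like the following slurm.conf fragment. This is only an illustration of the mechanism: the group name "math" and the node range r31–r50 come from the list above, but the partition names, CPU counts, and priority values are assumptions, not the actual Chimera configuration.

```
# Hypothetical slurm.conf fragment – a sketch, not the real Chimera config.
# CPU nodes moved from Karlín (CPU count is a placeholder):
NodeName=r[31-50] CPUs=64 State=UNKNOWN

# Priority partition restricted to the "math" group of Karlín users:
PartitionName=karlin Nodes=r[31-50] AllowGroups=math PriorityTier=10 State=UP
```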

Transfer of cluster data from Karlín to Troja

Instructions on how to transfer data to the Chimera cluster are available on the Chimera (useful notes for migration) webpage.

The current situation is as follows:

  1. The data for the Karlín cluster (both cluster "home" and "work") is now stored on an old disk array (pole2018).
    • We purchased the disk array in 2018 (i.e., 8 years ago), and it is no longer under warranty.
    • The disk array has two controllers (components that manage the operation of the entire array) – one serves as a backup for the other. However, one controller is no longer functional, so while the array continues to operate, we no longer have a "backup" controller.
    • The data in cluster "home" is backed up (usually every night), but the data in "work" is not backed up.
  2. Recently, user demand for cluster storage space has increased, so we're struggling with a shortage of available space.
  3. We will need space for users who will continue to use nodes g1–g6 (which will remain in Karlín).
  4. The Chimera cluster in Troja is also currently struggling a bit with a lack of space. They’ve already purchased additional storage, but the priority right now is to complete relocation of all clusters to Troja and ensure that everything is fully operational. The disk space expansion will take place afterward; we expect it to take a matter of weeks.
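Given the shortage of space described above, it may help to check how much data you actually hold before the migration. A generic sketch with standard tools (on the cluster you would point it at your "home" or "work" directory; the default path here is just for illustration):

```shell
#!/bin/sh
# Report how much space a directory tree occupies (human-readable).
dir="${1:-.}"        # defaults to the current directory

# Total size of the tree:
du -sh "$dir"

# Free and used space on the filesystem holding that directory:
df -h "$dir"
```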

In light of the above:

  1. Once the storage space in Troja has been expanded (i.e., once there is sufficient space there), we will ask Karlín users who do not use the Karlín GPU nodes (g1–g6) to move their cluster data from Karlín to Troja (not just "copy", but actually "move" – to free up space in Karlín). After that, only the data of GPU node (g1–g6) users should remain in Karlín (on the pole2018 disk array).
  2. We then plan to move the remaining data (i.e., data from users of g1–g6) to a Karlín storage location that is better protected against failure (than the pole2018 disk array). Details are yet to be determined.

What is the Chimera cluster?

The Chimera cluster is an HPC (high-performance computing) cluster currently located in the "old" server room in Troja (and will also be moved to the "new" server room in Troja). More detailed information about the cluster can be found on the website: https://www.mff.cuni.cz/en/hpc-cluster/ (login via CAS is required).

The cluster website also has a section dedicated to Introductory training (June 2022), which includes a link to slides [pptx, 17 slides, 9 MB] (the slides don't contain all the material covered in the hands-on session) and a recording of the entire training session [mp4, 2 h 25 min, 1.7 GB].

Where and why is the Karlín cluster moving?

There are several different computing clusters in various locations at the Faculty of Mathematics and Physics. A new server room is currently being completed in Troja, to which clusters from other locations will gradually be moved and unified under central administration.

Advantages of consolidating and unifying computing clusters:

  • More efficient use of technical resources:
    • Space, electricity, cooling, data network, data storage
  • More efficient use of human resources:
    • Hardware and software management, easier sharing of know-how
  • It is difficult for users to work with multiple different clusters (different accounts, controls, settings, rules, etc.)
  • The unified cluster will have support that local clusters do not have:
    • In the first half of 2026, the cluster will be expanded with new computing nodes (CPU and GPU) and other equipment worth a total of approximately CZK 24 million (these new nodes will be available to all users)

How will the sharing of unused computing capacity work?

The sharing of computing nodes will work on the following principle:

  1. Those who provided/financed the computing nodes (schools, departments, groups) will have priority access to these nodes (in the form of higher priority computing queues).
  2. However, when the nodes are not in use, anyone else will be able to use them. Each node in the cluster will be part of a queue that will have the lowest priority but will be of the FFA (free for all) type.
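From the user's side, the two-tier scheme above might translate into job scripts like the following sketch. The partition names ("karlin", "ffa") are placeholders, not confirmed queue names, and the resource values are arbitrary:

```
#!/bin/bash
# Hypothetical SLURM job script – queue names are placeholders.
# Use your group's priority partition when you funded the nodes,
# or the low-priority free-for-all partition to soak up idle capacity
# (such jobs may wait longer when the owners need their nodes):
#SBATCH --partition=ffa
#SBATCH --ntasks=1
#SBATCH --time=04:00:00

srun ./my_computation
```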

Which Karlín machines will be moved to Troja?

The following will be moved to Troja:

  • all CPU nodes (r31–r50)
  • InfiniBand 100 Gb switch

In addition, two new CPU computing nodes purchased under the UNCE project (prof. J. Málek) will be delivered directly to Troja (expected delivery date is in the first half of 2026).

The following will not be moved to Troja:

  • all GPU nodes (g1–g6), because their form factor (dimensions and other physical characteristics, including cooling requirements) is not suitable for mounting in server room rack stands
  • InfiniBand 40 Gb switch

What is the RSE (Research Software Engineering) group?

The RSE group is a university team (currently approx. 3–4 employees) that can help users with the use of the cluster. It serves as an interface between the latest technologies and the academic environment.

The main goal of RSE is to reduce barriers to computing resources, for example:

  • Assistance in the development of scientific code (new features, optimization, parallelization, version control)
  • Deployment on new computing infrastructure
  • Commissioning of complex computing pipelines, selection of suitable tools

Website: https://rse.cuni.cz/

Information about RSE was also presented at Jaroslav Hron's training course.

25. February 2026 · Comments Off on Chimera · Categories: News, Novinky, Tutorials

Collection of useful notes for migration from Sněhurka cluster to Chimera cluster.

Moving data to/from the cluster

For details, see Jaroslav Hron's presentation, page 12 / video recording, time 1 h 07 min.

# Copy a file from local machine to the cluster
scp local.dat <name>@hpc.troja.mff.cuni.cz:/path/to/

# Sync a directory (recommended for repeated transfers)
rsync -avzP ./results/ <name>@hpc.troja.mff.cuni.cz:/path/to/results/

If you encounter any problems, please contact us at clusteradmin@karlin.mff.cuni.cz.

30. July 2015 · Comments Off on Free disk space · Categories: News, Novinky

Recently, the disk reserved for /usr/nobackup has been slowly filling up. If you have data that you know you will not need, now is the time to delete it 🙂

Information and recommendations on how to handle larger volumes of data can be found here.

A new disk space with larger capacity is being prepared and will be put into operation soon.