1,609 questions
1 vote · 1 answer · 96 views
Best practices for SLURM job pipeline with wrapper scripts - avoiding complex job ID extraction
I'm building a SLURM pipeline where each stage is a bash wrapper script that generates and submits SLURM jobs. Currently I'm doing complex job ID extraction, which feels clunky:
# Current approach
...
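A common, cleaner pattern for this (a minimal sketch; the stage script names are placeholders) is `sbatch --parsable`, which prints only the job ID, combined with `--dependency` to chain stages:

```bash
#!/bin/bash
set -euo pipefail

# --parsable makes sbatch print just the job ID (plus ";cluster" on
# multi-cluster setups), so no regex extraction is needed.
jid1=$(sbatch --parsable stage1.sh)

# Chain the next stage on successful completion of the first.
jid2=$(sbatch --parsable --dependency=afterok:"$jid1" stage2.sh)

echo "submitted stage1=$jid1 stage2=$jid2"
```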
1 vote · 0 answers · 39 views
How to run the Neo4j Docker container using Singularity on HPC without it shutting down during data import?
I'm trying to run the Neo4j Docker container using Singularity on an HPC system. The container starts successfully, but it shuts down automatically when I try to add data to the database (e.g., via ...
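One thing worth checking (a hedged sketch; the bind paths, image tag, and instance name below are assumptions): Singularity containers are read-only by default, so Neo4j can die the moment it tries to write. Bind writable host directories over the image's data paths and run it as an instance:

```bash
# Writable host dirs for Neo4j's state (paths are assumptions).
mkdir -p "$HOME/neo4j/data" "$HOME/neo4j/logs"

# Start Neo4j as a background instance; the official image writes
# to /data and /logs, which must be writable on the host side.
singularity instance start \
  --bind "$HOME/neo4j/data":/data \
  --bind "$HOME/neo4j/logs":/logs \
  docker://neo4j:5 neo4j_inst

# Follow the instance's log to see why it shuts down, if it still does.
singularity exec instance://neo4j_inst tail -f /logs/neo4j.log
```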
1 vote · 1 answer · 56 views
Debugging a parallel Python program in interruptible sleep
I have an mpi4py program, which runs well with mpiexec -np 30 python3 -O myscript.py at 100% CPU usage on each of the 30 CPUs.
Now I am launching 8 instances with mpiexec -np 16 python3 -O myscript.py. ...
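To see where ranks sitting in interruptible sleep (S state) are blocked, py-spy can snapshot a running Python process without stopping it (a sketch; the install method and process-matching pattern are assumptions):

```bash
# Install py-spy into the user environment (assumption: pip is available).
pip install --user py-spy

# Dump the current Python stack of every matching rank; sleeping ranks
# typically show a blocking MPI call or a lock wait. Attaching may
# require ptrace permission (same-user processes are usually fine).
for pid in $(pgrep -f "python3 -O myscript.py"); do
    echo "=== PID $pid ==="
    py-spy dump --pid "$pid"
done
```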
1 vote · 0 answers · 73 views
Slurm: salloc gets allocated then fails immediately with ExitCode=1:0 (Start=End same second), while equivalent sbatch works
I’ve been using salloc to allocate compute nodes without issues before. Recently, after switching to another user account (same .bashrc config, only the conda path changed), salloc stopped working. I ...
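A quick way to narrow this down (a diagnostic sketch; `<jobid>` and the time limit are placeholders): an allocation that starts and ends in the same second with ExitCode=1:0 usually means the launched shell itself failed, so compare the accounting record and exercise the shell startup files by hand:

```bash
# Inspect the failed allocation's accounting record.
sacct -j <jobid> --format=JobID,State,ExitCode,Elapsed,NodeList

# Reproduce what salloc launches: a fresh login shell. If .bashrc or
# the conda hook exits non-zero here, salloc will fail the same way.
bash -lc 'echo startup ok'

# Bypass the startup files entirely to isolate them as the cause.
salloc -N1 -t 10 /bin/bash --norc --noprofile
```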
0 votes · 0 answers · 41 views
PostgreSQL, PostGIS, QGIS in a container launched from Charliecloud
I need to migrate my work for geospatial processing (using mainly QGIS processing and PostGIS functions from Python scripts) to an HPC cluster. As neither QGIS nor PostGIS is installed on the HPC I ...
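For what it's worth, a rootless Charliecloud workflow along these lines might look like the sketch below (the image name, Dockerfile contents, and bind paths are assumptions):

```bash
# Build an image containing PostgreSQL/PostGIS and QGIS from a
# Dockerfile (unprivileged; ch-image is Charliecloud's builder).
ch-image build -t geostack -f Dockerfile .

# Unpack it to a directory tree that ch-run can execute.
ch-convert geostack /var/tmp/geostack

# Run PostgreSQL inside the container with a writable, bind-mounted
# data directory on the cluster filesystem (path is an assumption).
ch-run --bind "$HOME/pgdata":/var/lib/postgresql \
    /var/tmp/geostack -- pg_ctl -D /var/lib/postgresql start
```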
0 votes · 1 answer · 142 views
Spack `spack load` not setting LD_LIBRARY_PATH or CPATH environment variables as expected
I'm using Spack on Linux Mint to manage scientific libraries, including Armadillo. I have installed Armadillo and its dependencies via Spack in an environment.
Problem:
When I run spack load armadillo, ...
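As a stopgap, the install prefix can be queried and exported by hand (a sketch; whether `spack load` should set these variables at all depends on the Spack version and its modules configuration):

```bash
# Ask Spack where Armadillo landed, then export the paths that
# spack load did not set (variable names are the conventional ones).
ARMA_PREFIX=$(spack location -i armadillo)
export LD_LIBRARY_PATH="$ARMA_PREFIX/lib:${LD_LIBRARY_PATH:-}"
export CPATH="$ARMA_PREFIX/include:${CPATH:-}"
```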
0 votes · 0 answers · 50 views
slurmstepd: error: execve(): mkdir: No such file or directory
I tried to use the sbatch file from this link (Running WindNinja on an HPC Cluster) to run the WindNinja software (WindNinja introduction) installed on HPC. However, it always produces the "...
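Two common causes of this exact message (a hedged checklist, not a definitive diagnosis) are DOS line endings in the batch script, which make slurmstepd try to exec `mkdir\r`, and a PATH that is empty or different on the compute node:

```bash
# "file" reports "CRLF line terminators" for scripts edited on Windows.
file job.sbatch
sed -i 's/\r$//' job.sbatch   # strip carriage returns in place

# Use absolute paths inside the script so PATH differences on the
# compute node cannot break the exec ($SCRATCH is an assumption).
srun /usr/bin/mkdir -p "$SCRATCH/windninja_run"
```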
0 votes · 0 answers · 63 views
How to force Slurm to pack GPU jobs onto partially occupied nodes to free full nodes?
When users request 1-2 GPUs via sbatch --gres=gpu:1, Slurm locks the entire 8-GPU node. This fragments our cluster:
Multiple small requests spread across nodes (e.g., four 1-GPU jobs occupy four ...
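The knobs that usually matter here (a hedged sketch of slurm.conf fragments; exact behavior depends on the Slurm version, so treat these as starting points) are per-GPU consumable scheduling and not forcing whole-node allocations:

```
# slurm.conf fragments (a sketch; assumes Slurm >= 19.05).
# cons_tres tracks GPUs as consumable resources, so a 1-GPU request
# no longer reserves the whole node the way select/linear or an
# OverSubscribe=EXCLUSIVE partition does.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu

# An EXCLUSIVE partition hands out whole nodes; NO still lets
# distinct jobs share a node under cons_tres.
PartitionName=gpu Nodes=gpunode[01-08] OverSubscribe=NO
```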
0 votes · 1 answer · 48 views
How to use mkl_dcsrgemv or other oneAPI functions to calculate the scalar product between a large-dimension sparse matrix and a vector?
I program in Fortran with the Intel oneAPI compiler ifx and the MKL packages.
I want to calculate the scalar product between a large-dimension sparse matrix and a vector.
When the dim of the sparse matrix could be ...
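On the build side (a sketch; the source file name is a placeholder), ifx can pull in MKL's sparse BLAS, including the deprecated mkl_dcsrgemv and its inspector-executor replacements, with the -qmkl flag:

```bash
# Compile and link against MKL with the oneAPI Fortran compiler.
# -qmkl=sequential links the single-threaded MKL; use -qmkl=parallel
# for the threaded one.
ifx sparse_matvec.f90 -qmkl=sequential -o sparse_matvec
```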
0 votes · 1 answer · 76 views
How can I run snakemake jobs 'remotely'?
I love snakemake and have used it locally as well as on HPC with SLURM!
However, now we have a particular setup where it is not as easy to use snakemake as before:
We need to run some ...
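One pattern that may fit this kind of setup (a sketch; the resource numbers are assumptions, and it presumes snakemake >= 8 with the slurm executor plugin installed) is to submit snakemake itself as a long-running job and let it dispatch rule jobs from there:

```bash
#!/bin/bash
#SBATCH --job-name=smk-driver
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=1

# The driver job runs snakemake, which in turn submits one SLURM job
# per rule via the slurm executor plugin.
snakemake --executor slurm --jobs 100
```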
0 votes · 0 answers · 45 views
Sample UCP AM client failing with error "Destination is unreachable" for localhost
I'm learning UCX by creating a basic wrapper for both the client and server. I am using AM communication. When I run my client, I get the error below:
[1749297901.816001] [prateek:19822:0] ...
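"Destination is unreachable" often just means no UCX transport matched between the two endpoints. Before touching the AM code, it may help to list which transports UCX detected and to pin it to TCP over loopback (a sketch; the server and client binary names are placeholders):

```bash
# List the transports and devices UCX detected on this machine.
ucx_info -d

# Force plain TCP and verbose logging for a localhost test; if this
# works, the failure is transport selection, not the AM wrapper code.
UCX_TLS=tcp UCX_LOG_LEVEL=debug ./server &
UCX_TLS=tcp UCX_LOG_LEVEL=debug ./client 127.0.0.1
```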
0 votes · 0 answers · 85 views
Can I use MPI_File_read_all to read non contiguous datatypes directly (as opposed to setview)?
I'm trying to read different subsets of non-contiguous data from a file to different processes.
I.e.:
I have a file with the data:
a b c d e f g h i j
and two processes who want to read the data from ...
1 vote · 2 answers · 88 views
What is the difference between an MPI nonblocking collective write (iwrite_all) and a "nonblocking" noncollective iwrite combined with a file sync?
I'm setting up IO for a large-scale CFD code using the MPI library, and the file IO is starting to eat into computation time as my problems scale.
As far as I can find, the "done" thing in the ...
0 votes · 0 answers · 44 views
Slurm partitions on same node overallocating CPUs
I have a single computation node with 32 CPUs. I have defined two different partitions that both use this node. If, for example, I send two jobs on partition A requesting 20 CPUs and 25 CPUs, the second ...
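For this symptom it is worth checking whether the node's CPUs are tracked as consumable resources at all; with select/linear, two partitions on one node can each count the node's CPUs separately and double-book it. A hedged slurm.conf sketch (node and partition names are placeholders):

```
# slurm.conf fragments (a sketch, not a definitive fix).
# Consumable-core tracking makes the 20- and 25-CPU requests compete
# for the same 32 cores instead of being counted per partition.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core

NodeName=compute01 CPUs=32 State=UNKNOWN
PartitionName=A Nodes=compute01 OverSubscribe=NO Default=YES
PartitionName=B Nodes=compute01 OverSubscribe=NO
```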
0 votes · 1 answer · 59 views
Snakemake access snakemake.config in profile config.yaml file
I want to run a pipeline on a cluster where the names of the jobs are of the form: smk-{config["simulation"]}-{rule}-{wildcards}. Can I just do:
snakemake --profile slurm --configfile ...
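Profiles are typically parsed before the workflow config, so `{config[...]}` is generally not available inside the profile's config.yaml. A workaround sketch (assuming a pre-v8 snakemake whose --cluster command supports the {rule} and {wildcards} placeholders; the simulation name is passed explicitly):

```bash
SIM=mysim   # placeholder standing in for config["simulation"]

# Bake the config value into the job name on the command line instead
# of referencing it from the profile.
snakemake --configfile config.yaml --config simulation="$SIM" \
  --cluster "sbatch --job-name=smk-${SIM}-{rule}-{wildcards}"
```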