0

I am trying to install Slurm on an Ubuntu PC, so I followed the instructions given here.

I did the following -

  1. sudo apt update -y
  2. sudo apt install slurmd slurmctld -y
  3. sudo mkdir /etc/slurm-llnl (FYI: I came up with step 3 myself)
  4. sudo chmod 777 /etc/slurm-llnl
  5. sudo cat << EOF > /etc/slurm-llnl/slurm.conf
ClusterName=localcluster
SlurmctldHost=localhost
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
# COMPUTE NODES
NodeName=localhost CPUs=12 RealMemory=8000 State=UNKNOWN
PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
  6. sudo systemctl start slurmctld
  7. sudo systemctl start slurmd

Now, when I do this -

  1. sudo scontrol update nodename=localhost state=idle

I get the error -

scontrol: error: resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
scontrol: error: fetch_config: DNS SRV lookup failed
scontrol: error: _establish_config_source: failed to fetch config
scontrol: fatal: Could not establish a configuration source
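
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: scontrol appears to fall back to a DNS SRV lookup when it finds no slurm.conf in the directory it was built to use. On some Ubuntu packages that directory may be /etc/slurm rather than /etc/slurm-llnl. The sketch below uses a hypothetical helper, find_slurm_conf, to report which candidate directory actually holds the file.

```shell
# Hypothetical helper: print the first slurm.conf found in the given directories.
# The candidate directories passed in are assumptions, not a definitive list.
find_slurm_conf() {
    for dir in "$@"; do
        if [ -f "$dir/slurm.conf" ]; then
            printf '%s\n' "$dir/slurm.conf"
            return 0
        fi
    done
    return 1
}

# Check the two locations discussed here.
find_slurm_conf /etc/slurm /etc/slurm-llnl || echo "no slurm.conf found in the checked directories"
```

If the file turns up in a directory other than the one the tools read, moving it or pointing the SLURM_CONF environment variable at it may be worth trying.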

Edit 1 -

I followed the instructions given by Pau. Now, I get the following outputs -

(base) thoma@thoma-Lenovo-Legion-5-15IMH05H:/$ systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
     Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-03-05 05:57:17 CST; 2h 42min ago
       Docs: man:slurmctld(8)
   Main PID: 6509 (slurmctld)
      Tasks: 10
     Memory: 4.3M
        CPU: 2.378s
     CGroup: /system.slice/slurmctld.service
             ├─6509 /usr/sbin/slurmctld -D -s
             └─6517 "slurmctld: slurmscriptd" "" ""

Mar 05 05:58:27 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: Invalid node state transition requested for node localhost from=INVAL to=IDLE
Mar 05 05:58:27 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: _slurm_rpc_update_node for localhost: Invalid node state specified
Mar 05 06:00:07 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: Invalid node state transition requested for node localhost from=INVAL to=IDLE
Mar 05 06:00:07 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: _slurm_rpc_update_node for localhost: Invalid node state specified
Mar 05 06:01:30 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: Invalid node state transition requested for node localhost from=INVAL to=RESUME
Mar 05 06:01:30 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: _slurm_rpc_update_node for localhost: Invalid node state specified
Mar 05 06:02:13 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: Invalid node state transition requested for node localhost from=INVAL to=RESUME
Mar 05 06:02:13 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: _slurm_rpc_update_node for localhost: Invalid node state specified
Mar 05 06:02:20 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: Invalid node state transition requested for node localhost from=INVAL to=IDLE
Mar 05 06:02:20 thoma-Lenovo-Legion-5-15IMH05H slurmctld[6509]: slurmctld: _slurm_rpc_update_node for localhost: Invalid node state specified
(base) thoma@thoma-Lenovo-Legion-5-15IMH05H:/$ systemctl status slurmd
● slurmd.service - Slurm node daemon
     Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-03-05 05:57:17 CST; 2h 42min ago
       Docs: man:slurmd(8)
   Main PID: 6514 (slurmd)
      Tasks: 1
     Memory: 316.0K
        CPU: 22ms
     CGroup: /system.slice/slurmd.service
             └─6514 /usr/sbin/slurmd -D -s

Mar 05 05:57:17 thoma-Lenovo-Legion-5-15IMH05H systemd[1]: Started Slurm node daemon.
Mar 05 05:57:17 thoma-Lenovo-Legion-5-15IMH05H slurmd[6514]: slurmd: error: Node configuration differs from hardware: CPUs=12:12(hw) Boards=1:1(hw) SocketsPerBoard=12:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore>
Mar 05 05:57:17 thoma-Lenovo-Legion-5-15IMH05H slurmd[6514]: slurmd: slurmd version 21.08.5 started
Mar 05 05:57:17 thoma-Lenovo-Legion-5-15IMH05H slurmd[6514]: slurmd: slurmd started on Tue, 05 Mar 2024 05:57:17 -0600
Mar 05 05:57:17 thoma-Lenovo-Legion-5-15IMH05H slurmd[6514]: slurmd: CPUs=12 Boards=1 Sockets=12 Cores=1 Threads=1 Memory=7838 TmpDisk=1252975 Uptime=372 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(>
lines 1-16/16 (END)


2
  • Hello, what do you get when you run systemctl status slurmctld and systemctl status slurmd? Commented Mar 5, 2024 at 4:38
  • @Marius_Couet I have updated my question with answers to your question. Commented Mar 5, 2024 at 14:41

2 Answers

1

Have you also started the munge service?

Make sure to start it as well by using systemctl as follows.

sudo systemctl start munge
sudo systemctl status munge

I recommend you follow this guide I wrote on how to install Slurm in a single node environment for Ubuntu 22.04.

Cheers.

1
  • I followed your instructions. When I came to this step - (base) thoma@thoma-Lenovo-Legion-5-15IMH05H:/$ scontrol update nodename=localhost state=idle, I got this error - slurm_update error: Invalid user id Commented Mar 5, 2024 at 14:44
1

From the systemctl output you've provided, I can tell the following:

1- As for slurmd, the HW configuration you defined in slurm.conf is not correct. What are the HW specifications of the node this configuration will run on?

 (Mar 05 05:57:17 thoma-Lenovo-Legion-5-15IMH05H slurmd[6514]: slurmd: error: Node configuration differs from hardware: CPUs=12:12(hw) Boards=1:1(hw) SocketsPerBoard=12:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore>) 

According to this output, your values for SocketsPerBoard and CoresPerSocket should be 1 and 6, respectively.
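
To make the mismatch easier to read, here is a minimal sketch that picks the Key=configured:detected(hw) pairs out of such a log line and prints only those that differ. The function name show_mismatches is hypothetical, and the pattern assumes the log format shown above.

```shell
# Hypothetical helper: list the fields where the configured value differs
# from the value slurmd detected, given a "differs from hardware" log line.
show_mismatches() {
    printf '%s\n' "$1" |
        grep -oE '[A-Za-z]+=[0-9]+:[0-9]+\(hw\)' |
        while IFS='=:' read -r key cfg hw; do
            hw=${hw%%\(*}   # strip the trailing "(hw)" marker
            if [ "$cfg" != "$hw" ]; then
                printf '%s: configured=%s hardware=%s\n' "$key" "$cfg" "$hw"
            fi
        done
}

# The line from the question, truncated exactly as it was posted.
show_mismatches 'CPUs=12:12(hw) Boards=1:1(hw) SocketsPerBoard=12:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore>'
```

The truncated ThreadsPerCore field is skipped because its value is not visible in the paste. Running slurmd -C on the node prints the detected hardware in slurm.conf form, which can be compared against the NodeName line.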

2- Regarding slurmctld, the initial node status should be UNKNOWN, like this.

 NodeName=localhost CPUs=12 RealMemory=30517 State=UNKNOWN
 PartitionName=localhost Nodes=ALL Default=YES MaxTime=INFINITE State=UP

NOTE: I see you have set "8000" as your RealMemory value. Try using the value "8192" instead, as Slurm uses MiB values :)

Try changing these, then restart both slurmd and slurmctld and let me know if that helps.
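
Before restarting, it may also help to compare the configured RealMemory with what slurmd itself reported. The slurmd log in the question shows Memory=7838, and my understanding (an assumption worth verifying against the Slurm documentation) is that a node can be flagged as invalid when RealMemory is larger than the detected amount. The sketch below uses two hypothetical helpers, get_field and memory_fits, to run that comparison on the lines from this thread.

```shell
# Hypothetical helper: extract the integer value of KEY from a line of KEY=value tokens.
get_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Hypothetical helper: succeed if RealMemory in the NodeName line ($1)
# does not exceed the Memory value in the slurmd log line ($2).
memory_fits() {
    configured=$(get_field RealMemory "$1")
    detected=$(get_field Memory "$2")
    [ "$configured" -le "$detected" ]
}

node_line='NodeName=localhost CPUs=12 RealMemory=8000 State=UNKNOWN'
log_line='CPUs=12 Boards=1 Sockets=12 Cores=1 Threads=1 Memory=7838 TmpDisk=1252975'

if memory_fits "$node_line" "$log_line"; then
    echo "RealMemory is within the detected memory"
else
    echo "RealMemory is larger than the detected memory"
fi
```

With the values from the question, the configured 8000 is above the detected 7838, so a RealMemory at or below the detected figure may be the safer choice.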

Cheers!

2
  • I am not sure about the hardware configurations. Since this is my personal laptop, can't we configure Slurm to use what's present? Commented Mar 5, 2024 at 22:23
  • You can check your computer's memory by running "sudo lshw -c memory" and its CPUs by running "sudo lshw -c processor" or "lscpu". Then adjust the slurm.conf file accordingly. Adjusting slurm.conf parameters dynamically is technically possible by using the Slurm API, but I don't recommend doing that for the hardware info. I would suggest setting the hardware parameters manually. I highly recommend you read this page: slurm.schedmd.com/dynamic_nodes.html. Hope this helps! Commented Mar 6, 2024 at 10:43
