Hello everyone, I'm trying to set up a new HPC cluster. I created an account, added users, and I'm using a partition, but whenever I run a job it fails with the error that the requested node configuration is not available. I checked my slurm.conf but it looks fine to me, so I need some help.
The error:

Batch job submission failed: Requested node configuration is not available
This is my slurm.conf:

#
# See the slurm.conf man page for more information.
#
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
SlurmdSpoolDir=/cm/local/apps/slurm/var/spool
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
#ProctrackType=proctrack/pgid
ProctrackType=proctrack/cgroup
#PluginDir=
#FirstJobId=
ReturnToService=2
#MaxJobCount=
#PlugStackConfig=
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
TaskPlugin=task/cgroup
#TrackWCKey=no
#TreeWidth=50
#TmpFs=
#UsePAM=
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
#
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd
#JobCompType=jobcomp/filetxt
#JobCompLoc=/cm/local/apps/slurm/var/spool/job_comp.log
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
#JobAcctGatherType=jobacct_gather/cgroup
#JobAcctGatherFrequency=30
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
# AccountingStorageLoc=slurm_acct_db
# AccountingStoragePass=SLURMDBD_USERPASS
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
# Server nodes
SlurmctldHost=omics-master
AccountingStorageHost=master
# Nodes
NodeName=omics[01-05] Procs=48 Feature=local
# Partitions
PartitionName=defq Default=YES MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO PreemptMode=OFF AllowAccounts=ALL AllowQos=ALL Nodes=omics[01-05]
ClusterName=omics
# Scheduler
SchedulerType=sched/backfill
# Statesave
StateSaveLocation=/cm/shared/apps/slurm/var/cm/statesave/omics
PrologFlags=Alloc
# Generic resources types
GresTypes=gpu
# Epilog/Prolog section
Prolog=/cm/local/apps/cmd/scripts/prolog
Epilog=/cm/local/apps/cmd/scripts/epilog
# Power saving section (disabled)
# END AUTOGENERATED SECTION -- DO NOT REMOVE
And this is my sinfo output:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*        up   infinite      5   idle omics[01-05]
And this is my test script:
#!/bin/bash
#SBATCH --nodes=2              # Number of nodes
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --output=std.out
#SBATCH --error=std.err
#SBATCH --mem-per-cpu=1gb

echo "hello from:"
hostname; pwd; date;
sleep 10
echo "going to sleep for 10 seconds"
echo "wake up, exiting"
Thanks in advance.
Answer
In the node definition, you do not specify RealMemory,
so Slurm assumes the default of 1 MB (!) per node. Your script requests --mem-per-cpu=1gb with four tasks per node, i.e. 4 GB per node, which can never be satisfied by a node that Slurm believes has only 1 MB, hence the "Requested node configuration is not available" error.
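As a quick sanity check (using one of the node names from your configuration; adjust as needed), you can ask Slurm how much memory it currently credits a node with:

$ scontrol show node omics01 | grep -i RealMemory

If that reports RealMemory=1, the nodes are indeed advertising the 1 MB default.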
You should run slurmd -C
on the compute nodes; it prints the line to insert in the slurm.conf
file so that Slurm correctly knows the hardware resources it can allocate, for example:
$ slurmd -C | head -1
NodeName=node002 CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=128547
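In your case the node definition would then look something like the line below; the RealMemory value here is only a placeholder, use the number that slurmd -C reports on your own omics nodes:

NodeName=omics[01-05] Procs=48 RealMemory=191000 Feature=local

Since that line lives in the cmd-managed autogenerated block of your slurm.conf, you may need to make the change through your cluster manager rather than by hand. After slurm.conf is updated on the controller and compute nodes, reload the configuration with:

$ scontrol reconfigure

and restart slurmctld/slurmd if the new memory value does not show up in scontrol show node.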