I’m trying to run the following command on a large number of samples.
java -jar GenomeAnalysisTK.jar \
    -R scaffs_HAPSgracilaria92_50REF.fasta \
    -T HaplotypeCaller \
    -I assembled_reads/{sample_name}.sorted.bam \
    --emitRefConfidence GVCF \
    -ploidy 1 \
    -nt {number of cores} \
    -nct {number of threads} \
    -o {sample_name}.raw.snps.indels.g.vcf
I have:
3312 cores,
20 PB of RAM,
110 TFLOPS of compute power
but I have thousands of these samples to process.
Each sample takes about a day or two to finish on my local computer.
I’m using a shared Linux cluster with a job scheduling system called Slurm, if that helps.
Answer
Write a submission script such as the following and submit it with the sbatch command.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=<nb of threads your Java application is able to use>
#SBATCH --mem=<number of MB of RAM your job needs>
#SBATCH --time=<duration of your job>
#SBATCH --array=0-<number of samples minus 1>

# Each array task processes one BAM file; bash arrays are 0-indexed, hence the 0-based range above.
FILES=(assembled_reads/*.sorted.bam)
INFILE=${FILES[$SLURM_ARRAY_TASK_ID]}
OUTFILE=$(basename "$INFILE" .sorted.bam).raw.snps.indels.g.vcf

srun java -jar GenomeAnalysisTK.jar -R scaffs_HAPSgracilaria92_50REF.fasta -T HaplotypeCaller \
    -I "$INFILE" --emitRefConfidence GVCF -ploidy 1 \
    -nt 1 -nct "$SLURM_CPUS_PER_TASK" -o "$OUTFILE"
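Submission could then look something like this. This is only a minimal sketch of my own; the script name gatk_array.sbatch and the idea of computing the array size from the number of BAM files are assumptions, not part of the script above. Note that an --array option given on the sbatch command line overrides the #SBATCH --array directive in the script.

# Count the BAM files and submit one array task per sample (hypothetical script name).
NSAMPLES=$(ls assembled_reads/*.sorted.bam | wc -l)
sbatch --array=0-$((NSAMPLES - 1)) gatk_array.sbatch

# Check the state of the array tasks.
squeue -u $USER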
This is totally untested and only aims to give you a first direction.
I am sure the administrators of the cluster you use have written some documentation; the first step would be to read it cover to cover.
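To fill in the --mem and --time placeholders, one approach (my suggestion, not part of the original answer) is to run a single sample first and then inspect what it actually used with sacct:

# Show state, elapsed time and peak memory of a finished test job (replace <jobid>).
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS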