I have 70 input files with names like slurm-22801576.out, slurm-22801573.out, slurm-26801571.out, and so on. I want to extract the desired strings from all of them into one file. I tried the following, but it only works for a single file. How can I do this for multiple files?
awk 'BEGIN{printf "file,reads,file,sample\n"}NR==32{printf "%s,%s,",FILENAME,$3}NR==2{printf "%s,%s,",FILENAME,$18}' slurm-22801576.out > summary/total_reads.csv
But my output file has only one row:
file,reads,file,sample
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
The text inside each input file looks like this:
job starting at 23:42:13
java -ea -Xmx57039m -Xms57039m -cp /sw/bioinfo/bbmap/38.61b/rackham/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 pairedonly=t ambiguous=toss secondary=t killbadpairs=t perfectmode=t minid=1 mappedonly=t outm=2006_40_aligned.sam scafstats=2006_40_fulllength.scafstats in=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R1.fq.gz in2=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R2.fq.gz threads=auto
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, pairedonly=t, ambiguous=toss, secondary=t, killbadpairs=t, perfectmode=t, minid=1, mappedonly=t, outm=2006_40_aligned.sam, scafstats=2006_40_fulllength.scafstats, in=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R1.fq.gz, in2=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R2.fq.gz, threads=auto]
Version 38.61

Set OUTPUT_MAPPED_ONLY to true
Scaffold statistics will be written to 2006_40_fulllength.scafstats
Set threads to 10
Ambiguously mapped reads will be considered unmapped.
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 1.000
Set genome to 1

Loaded Reference: 2.275 seconds.
Loading index for chunk 1-1, build 1
Generated Index: 4.174 seconds.
Analyzed Index: 3.394 seconds.
Started output stream: 0.264 seconds.
Creating scaffold statistics table: 0.064 seconds.
Cleared Memory: 1.216 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 10 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

------------------ Results ------------------

Genome: 1
Key Length: 13
Max Indel: 0
Minimum Score Ratio: 1.0
Mapping Mode: perfect
Reads Used: 461344228 (57756075474 bases)

Mapping: 595.769 seconds.
Reads/sec: 774367.86
kBases/sec: 96943.77
The expected output file should look like this:
file,reads,file,sample
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
Answer
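You may use this awk. It tests FNR, the line number within the current file, instead of NR, which keeps counting across all input files, and it passes every file through the slurm-*.out glob: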
awk '
BEGIN{print "file,reads,file,sample"}
FNR==2 {printf "%s,%s,", FILENAME, $18}
FNR==32 {printf "%s,%s,\n", FILENAME, $3}
' slurm-*.out > summary/total_reads.csv
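To see why switching from NR to FNR matters, here is a small sketch you can run anywhere. The files a.txt and b.txt and the variable names are made up for the demo and have nothing to do with the slurm logs; the point is only that NR==2 matches once across all inputs, while FNR==2 matches once per file.

```shell
#!/bin/sh
# Hypothetical demo inputs: two tiny three-line files in a temp directory.
tmpdir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmpdir/a.txt"
printf 'b1\nb2\nb3\n' > "$tmpdir/b.txt"

# NR keeps counting across files (1..6 here), so NR==2 matches only once.
nr_out=$(cd "$tmpdir" && awk 'NR==2 {print FILENAME ":" $0}' a.txt b.txt)

# FNR restarts at 1 for every file, so FNR==2 matches once per file.
fnr_out=$(cd "$tmpdir" && awk 'FNR==2 {print FILENAME ":" $0}' a.txt b.txt)

printf 'NR==2 matched:\n%s\n\nFNR==2 matched:\n%s\n' "$nr_out" "$fnr_out"

# Clean up the demo files.
rm -rf "$tmpdir"
```

The same reasoning applies to the FNR==32 rule: with NR, line 32 would only ever be found in the first file.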