I have a list of 15,000 folder names (file name: uniq-compounds), one per compound I am interested in. Each folder contains a file, out.pdbqt, which has the compound name in its 3rd row (Name = 1-tert-butyl-5-oxo-N-[2-(3-pyridinyl)ethyl]-3-pyrrolidinecarboxamide). I want to extract all 15,000 of those names, out of 50,000 folders, by providing the uniq-compound file (it contains folder names, e.g. ligand_*).
Directory and subfiles:
sidra contains 50,000 folders (ligand_00001 - ligand50,000). Each folder contains a file (out.pdbqt) that contains the compound name (shown below). Another file (uniq-compound) contains the 15,000 folder names whose compound names I want.
out.pdbqt
MODEL 1
REMARK VINA RESULT: -6.0 0.000 0.000
REMARK Name = 1-tert-butyl-5-oxo-N-[2-(3-pyridinyl)ethyl]-3-pyrrolidinecarboxamide
REMARK 8 active torsions:
REMARK status: ('A' for Active; 'I' for Inactive)
REMARK 1 A between atoms: N_1 and C_7
Answer
Assuming uniq-compound.txt contains the folder names, each folder contains an out.pdbqt file, and the compound name appears in the 3rd row of out.pdbqt, the script below will work:
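#!/bin/bash
# Read one folder name per line from the list file
while IFS= read -r line; do
    # Print the 4th field of the 3rd line (the compound name)
    awk 'FNR == 3 {print $4}' "$line/out.pdbqt"
done < uniq-compound.txt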
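The loop reads uniq-compound.txt one line at a time. For each line (i.e. each folder name), it uses awk to print the 4th whitespace-separated field of the 3rd line of the out.pdbqt file inside that folder. In the sample shown in the question, that field is the compound name. Note that this only works if the name contains no spaces, and the script must be run from the directory that holds the ligand_* folders.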
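If the name is not always on the 3rd row, or if some names contain spaces, a slightly more defensive variant may help. The following is only a sketch under a few assumptions: the name line always starts with "REMARK Name =", the list file has one folder name per line, and the function name extract_names and its two arguments (list file, base directory) are hypothetical names chosen for illustration. It prints the folder name and the compound name separated by a tab, and reports missing out.pdbqt files on stderr instead of stopping.

```shell
#!/bin/bash
# Sketch: print "<folder><TAB><compound name>" for each folder in a list file.
# extract_names is a hypothetical helper, not part of any existing tool.
#   $1 = file with one folder name per line (e.g. uniq-compound.txt)
#   $2 = directory holding the ligand_* folders (defaults to the current one)
extract_names() {
    local list=$1 base=${2:-.}
    local folder file
    # "|| [ -n "$folder" ]" also handles a last line without a trailing newline
    while IFS= read -r folder || [ -n "$folder" ]; do
        # Skip blank lines in the list
        [ -z "$folder" ] && continue
        file="$base/$folder/out.pdbqt"
        if [ ! -f "$file" ]; then
            echo "missing: $file" >&2
            continue
        fi
        # Find the first "REMARK Name = ..." line, strip the prefix,
        # and print whatever remains (so names with spaces stay intact)
        awk -v f="$folder" '
            /^REMARK +Name *=/ {
                sub(/^REMARK +Name *= */, "")
                print f "\t" $0
                exit
            }' "$file"
    done < "$list"
}
```

After sourcing or pasting the function, it could be called as: extract_names uniq-compound.txt sidra > names.tsv (the output file name names.tsv is just an example). Keeping the folder name next to each compound name makes it easier to spot which folders were skipped.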