Skip to content
Advertisement

How to extract names of compound present in sub files?

I have a list of 15000 compound names (file name: uniq-compounds) which contains names of 15000 folder. the folder have sub files i.e. out.pdbqt which contains names of compound in 3rd Row. (Name = 1-tert-butyl-5-oxo-N-[2-(3-pyridinyl)ethyl]-3-pyrrolidinecarboxamide). I want to extract all those 15000 names by providing uniq-compound file (it contain folder names e.g ligand_*) out of 50,000 folder.

directory and subfiles

sidra---50,000folder (ligand_00001 - ligand50,000)--each contains subfiles (out.pdbqt)--that conatins names.(mention below)
another file (uniq-compound) contains 15000 folder names (that compound names i want).

out.pdbqt

MODEL 1
REMARK VINA RESULT:      -6.0      0.000      0.000
REMARK  Name = 1-tert-butyl-5-oxo-N-[2-(3-pyridinyl)ethyl]-3-pyrrolidinecarboxamide
REMARK  8 active torsions:
REMARK  status: ('A' for Active; 'I' for Inactive)
REMARK    1  A    between atoms: N_1  and  C_7

Advertisement

Answer

Assuming, uniq-compound.txt contains the folder names and each folder contains an out.pdbqt. Also, the compound name appears in the 3rd row of the file out.pdbqt. If that is the case below script will work:

#!/bin/bash
while IFS= read -r line; do
    awk 'FNR == 3 {print $4}' $line/out.pdbqt 
done < uniq-compound.txt

Loop will iterate through the uniq-compound.txt one by one, for each line in the file (i.e folder), it uses awk to display the 4th column in the 3rd line of the file out.pdbqt inside that folder.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement