I am learning Perl and am trying to figure out how to do this task. I have a folder with a bunch of text files, and a file ions_solvents_cofactors that contains a list of three-letter codes.
I wrote a script that opens and reads each file in the folder and should delete the lines whose column [3] matches an element of that list. It is not working: something is wrong at the end of the script and I can't figure out what it is.
The error I get is: rm: invalid option -- '5'
My input file looks like this:
```
ATOM 1592 HD13 LEU D 46 11.698 -10.914 2.183 1.00 0.00 H
ATOM 1593 HD21 LEU D 46 11.528 -8.800 5.301 1.00 0.00 H
ATOM 1594 HD22 LEU D 46 12.997 -9.452 4.535 1.00 0.00 H
ATOM 1595 HD23 LEU D 46 11.722 -8.718 3.534 1.00 0.00 H
HETATM 1597 N1 308 A 1 0.339 6.314 -9.091 1.00 0.00 N
HETATM 1598 C10 308 A 1 -0.195 5.226 -8.241 1.00 0.00 C
HETATM 1599 C7 308 A 1 -0.991 4.254 -9.133 1.00 0.00 C
HETATM 1600 C1 308 A 1 -1.468 3.053 -8.292 1.00 0.00 C
```
Here is the script:
```perl
#!/usr/bin/perl -w

$dirname = '.';
opendir( DIR, $dirname ) or die "cannot open directory";
@files = grep( /.txt$/, readdir( DIR ) );

foreach $files ( @files ) {

    open( FH, $files ) or die "could not open $files\n";
    @file_each = <FH>;
    close FH;
    close DIR;

    my @ion_names = ();
    my $ionfile = 'ions_solvents_cofactors';
    open( ION, $ionfile ) or die "Could not open $ionfile, $!";
    my @ion = <ION>;
    close ION;

    for ( my $line = 0; $line <= $#file_each; $line++ ) {

        chomp( $file_each[$line] );

        if ( $file_each[$line] =~ /^HETATM/ ) {
            @is = split '\s+', $file_each[$line];
            chomp $is[3];
        }

        foreach ( $file_each[$line] ) {    #line 39

            if ( "@ion" =~ $is[3] ) {
                system( "rm $file_each[$line]" );
            }
        }
    }
}
```
So, for example, if 308 from the input file matches an entry in the file ions_solvents_cofactors, then delete all the lines in which it matches.
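Note that rm deletes whole files, not lines: system("rm ...") hands each whitespace-separated field of the line to rm as a command-line argument, so negative coordinates such as -10.914 are most likely being read as option flags, which matches the invalid-option error. Deleting lines instead means keeping the lines you want and writing them back. A minimal sketch of the per-line rule on a few sample records, assuming a hypothetical in-memory list of codes (308 and HOH) in place of the real ions_solvents_cofactors file:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of ions_solvents_cofactors.
my %unwanted = map { $_ => 1 } qw(308 HOH);

# A few records taken from the sample input above.
my @records = (
    'ATOM 1595 HD23 LEU D 46 11.722 -8.718 3.534 1.00 0.00 H',
    'HETATM 1597 N1 308 A 1 0.339 6.314 -9.091 1.00 0.00 N',
    'HETATM 1598 C10 308 A 1 -0.195 5.226 -8.241 1.00 0.00 C',
);

# Keep a record unless its fourth whitespace-separated field (index 3)
# is one of the unwanted codes.
my @kept = grep {
    my $code = ( split ' ', $_ )[3];
    !( defined $code && $unwanted{$code} );
} @records;

print "$_\n" for @kept;    # prints only the ATOM record
```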
Answer
I would make use of the Tie::File module, which allows you to tie an array to a file so that any changes you make to the array are reflected in the file.
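As a minimal illustration of that behavior (the file demo.txt inside a temporary directory exists only for this example): each array element is one line of the file without its newline, and splicing or assigning to the array rewrites the file on disk.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Tie::File;

my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'demo.txt' );

# Create a three-line file the ordinary way.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "first\nsecond\nthird\n";
close $out or die "Cannot close $path: $!";

# Tie an array to the file: element 0 is line 1, and so on.
tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";
splice @lines, 1, 1;    # delete the second line
$lines[0] = 'FIRST';    # rewrite the first line
untie @lines;           # release the file

# Read the file back to confirm the edits landed on disk.
open my $in, '<', $path or die "Cannot read $path: $!";
my $contents = do { local $/; <$in> };
close $in;

print $contents;    # prints "FIRST" and "third" on separate lines
```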
I’ve used glob to find all the .txt files, with the option :bsd_glob so as to support spaces in the file paths.
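The space only matters when the glob pattern itself contains one, for example when the .txt files live under a directory whose path has a space in it. A small sketch, using a hypothetical subdirectory named "pdb files" inside a temporary directory: with :bsd_glob the space is a literal part of the pattern, whereas the default glob would split the pattern in two at the space.

```perl
use strict;
use warnings;
use File::Glob ':bsd_glob';
use File::Temp qw(tempdir);

# A temporary directory containing a subdirectory whose name has a space.
my $base = tempdir( CLEANUP => 1 );
my $sub  = "$base/pdb files";
mkdir $sub or die "Cannot create $sub: $!";

for my $name ( 'a.txt', 'b.txt' ) {
    open my $fh, '>', "$sub/$name" or die "Cannot create $sub/$name: $!";
    close $fh;
}

# Under :bsd_glob the whole string is one pattern, space included.
my @found = glob "$sub/*.txt";
print "$_\n" for @found;
```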
The first job is to build a hash %matches that maps all the values in ions_solvents_cofactors to 1. This makes it trivial to test the PDB files for the required values.
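The hash-building step on its own, assuming the file holds codes separated by whitespace (the contents below are hypothetical and are read from an in-memory string rather than the real file):

```perl
use strict;
use warnings;

# Hypothetical contents of ions_solvents_cofactors: codes separated by
# spaces and/or newlines.
my $text = "308 HOH\nNA  CL\n";

# Read through an in-memory filehandle, slurp it whole ($/ undefined),
# split on runs of whitespace and map every code to 1.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my %matches = do { local $/; map { $_ => 1 } split ' ', <$fh> };
close $fh;

for my $code (qw(308 LEU)) {
    printf "%s: %s\n", $code, $matches{$code} ? 'remove' : 'keep';
}
```

A hash lookup is a constant-time test per line, which is why it is preferable to matching against the joined list as in the question's "@ion" =~ $is[3].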
Then it’s just a matter of using tie on each .txt file, and testing each line to see whether the value in column 4 is represented in the hash.
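Column 4 is index 3 of the list returned by split ' ', which skips leading whitespace and splits on runs of whitespace. A quick check on two sample records plus a hypothetical short END line, which shows that a line with fewer than four fields yields undef, so the value should be tested with defined before it is looked up:

```perl
use strict;
use warnings;

my @records = (
    'ATOM 1592 HD13 LEU D 46 11.698 -10.914 2.183 1.00 0.00 H',
    'HETATM 1597 N1 308 A 1 0.339 6.314 -9.091 1.00 0.00 N',
    'END',    # hypothetical short record with no fourth field
);

my @col4;
for my $record (@records) {
    # Assigning a list slice to a scalar gives undef if index 3
    # is past the end of the split fields.
    my $value = ( split ' ', $record )[3];
    push @col4, $value;
    print defined $value ? "$value\n" : "(no column 4)\n";
}
```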
I use variable $i to index into the @file array, which maps the on-disk file. If a match is found then the array element is deleted with splice @file, $i, 1. (This naturally leaves $i indexing the next element in sequence without incrementing $i.) If there is no match then $i is incremented to index the next array element, leaving the line in place.
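The same index-walking pattern on an ordinary in-memory array, with a hypothetical code 308 to remove, shows why $i is incremented only when nothing is deleted: after splice the following element has already slid down into slot $i, so incrementing would skip it. The three consecutive matches below would expose that mistake.

```perl
use strict;
use warnings;

my %matches = ( 308 => 1 );    # hypothetical code to remove
my @file = (
    'ATOM 1595 HD23 LEU D 46',
    'HETATM 1597 N1 308 A 1',
    'HETATM 1598 C10 308 A 1',
    'HETATM 1599 C7 308 A 1',
);

for ( my $i = 0; $i < @file; ) {
    my $col4 = ( split ' ', $file[$i] )[3];
    if ( defined $col4 && $matches{$col4} ) {
        splice @file, $i, 1;    # the next line moves down into slot $i
    }
    else {
        ++$i;                   # keep this line and step past it
    }
}

print "$_\n" for @file;    # only the ATOM line remains
```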
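```perl
use strict;
use warnings 'all';

use File::Glob ':bsd_glob';
use Tie::File;

# Map every code listed in ions_solvents_cofactors to 1
my %matches = do {
    my $ionfile = 'ions_solvents_cofactors';
    open my $fh, '<', $ionfile or die qq{Unable to open "$ionfile": $!};
    local $/;
    map { $_ => 1 } split ' ', <$fh>;
};

for my $pdb ( glob '*.txt' ) {

    tie my @file, 'Tie::File', $pdb or die qq{Unable to tie "$pdb": $!};

    for ( my $i = 0; $i < @file; ) {

        # Column 4 is index 3; it is undef on lines with fewer than four fields
        my $col4 = ( split ' ', $file[$i] )[3];

        if ( defined $col4 and $matches{$col4} ) {
            printf qq{Removing line %d from "%s"\n}, $i + 1, $pdb;
            splice @file, $i, 1;    # the next line moves down to index $i
        }
        else {
            ++$i;                   # keep this line and move on
        }
    }

    untie @file;
}
```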
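If you would rather not use Tie::File, the same job can be done by reading each file into memory, keeping the non-matching lines with grep, and writing the file back. This is a swapped-in alternative, not the approach above, sketched under the same assumptions (codes file named ions_solvents_cofactors, data files matching *.txt in the current directory). The helper names load_codes and strip_lines are hypothetical, and holding a whole file in memory is assumed to be fine for files of PDB size.

```perl
use strict;
use warnings;
use File::Glob ':bsd_glob';

# Read whitespace-separated codes into a lookup hash.
sub load_codes {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    local $/;    # slurp the whole file
    return map { $_ => 1 } split ' ', <$fh>;
}

# Rewrite $path without the lines whose fourth field is in %$matches.
# Returns the number of lines removed.
sub strip_lines {
    my ( $path, $matches ) = @_;
    open my $in, '<', $path or die "Could not open $path: $!";
    my @lines = <$in>;
    close $in;

    my @kept = grep {
        my $col4 = ( split ' ', $_ )[3];
        !( defined $col4 && $matches->{$col4} );
    } @lines;

    my $removed = @lines - @kept;
    if ($removed) {
        open my $out, '>', $path or die "Could not write $path: $!";
        print {$out} @kept;
        close $out or die "Could not close $path: $!";
    }
    return $removed;
}

my $ionfile = 'ions_solvents_cofactors';
if ( -e $ionfile ) {
    my %matches = load_codes($ionfile);
    for my $pdb ( glob '*.txt' ) {
        my $n = strip_lines( $pdb, \%matches );
        printf qq{Removed %d line(s) from "%s"\n}, $n, $pdb if $n;
    }
}
else {
    warn "$ionfile not found in the current directory; nothing changed\n";
}
```

Unlike the Tie::File version, this rewrites each file once, and only if something was actually removed.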