
How to delete lines that match elements from another file

I am learning Perl and trying to figure out how to do the following task. I have a folder containing a number of text files, and a file named ions_solvents_cofactors that contains a list of three-letter codes.

I wrote a script that opens and reads each file in the folder and should delete every line whose value in a specific column ([3]) matches an element of the list. It is not working well: there is some problem at the end of the script and I can't figure out what it is.

The error I get is: rm: invalid option -- '5'

My input file looks like this:

ATOM   1592 HD13 LEU D  46      11.698 -10.914   2.183  1.00  0.00           H  
ATOM   1593 HD21 LEU D  46      11.528  -8.800   5.301  1.00  0.00           H  
ATOM   1594 HD22 LEU D  46      12.997  -9.452   4.535  1.00  0.00           H  
ATOM   1595 HD23 LEU D  46      11.722  -8.718   3.534  1.00  0.00           H  
HETATM 1597  N1  308 A   1       0.339   6.314  -9.091  1.00  0.00           N  
HETATM 1598  C10 308 A   1      -0.195   5.226  -8.241  1.00  0.00           C  
HETATM 1599  C7  308 A   1      -0.991   4.254  -9.133  1.00  0.00           C  
HETATM 1600  C1  308 A   1      -1.468   3.053  -8.292  1.00  0.00           C 

Here is the script:

#!/usr/bin/perl -w

$dirname = '.';
opendir( DIR, $dirname ) or die "cannot open directory";
@files = grep( /\.txt$/, readdir( DIR ) );

foreach $files ( @files ) {

    open( FH, $files ) or die "could not open $files\n";
    @file_each = <FH>;
    close FH;

    close DIR;

    my @ion_names = ();

    my $ionfile   = 'ions_solvents_cofactors';
    open( ION, $ionfile ) or die "Could not open $ionfile, $!";
    my @ion = <ION>;
    close ION;

    for ( my $line = 0; $line <= $#file_each; $line++ ) {

        chomp( $file_each[$line] );
        if ( $file_each[$line] =~ /^HETATM/ ) {
            @is = split '\s+', $file_each[$line];
            chomp $is[3];
        }

        foreach ( $file_each[$line] ) {    #line 39

            if ( "@ion" =~ $is[3] ) {
                system( "rm $file_each[$line]" );
            }
        }
    }
}

So, for example, if 308 from the input file appears in the file ions_solvents_cofactors, then all the lines in which it matches should be deleted.


Answer

I would make use of the Tie::File module, which lets you tie an array to a file so that any changes you make to the array are reflected in the file on disk.

I’ve used glob to find all the .txt files, with the option :bsd_glob so as to support spaces in the file paths.

The first job is to build a hash %matches that maps each value in ions_solvents_cofactors to 1. This makes it trivial to test the PDB files for the required values.
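
As a minimal sketch of the lookup-hash idea, assuming the list file holds whitespace-separated names: the helper build_lookup and the sample names HOH, 308 and SO4 are hypothetical and stand in for the real file contents.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a whitespace-separated string of residue
# names into a list of name => 1 pairs suitable for assigning to a hash.
sub build_lookup {
    my ($text) = @_;
    return map { $_ => 1 } split ' ', $text;
}

# Sample data standing in for the contents of ions_solvents_cofactors
my %matches = build_lookup("HOH 308\nSO4\n");

# A hash lookup answers "is this name in the list?" in one step
for my $name (qw/ 308 LEU /) {
    print "$name: ", ( $matches{$name} ? "listed" : "not listed" ), "\n";
}
```

Because split ' ' splits on any run of whitespace, including newlines, it should not matter whether the names are one per line or several per line.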

Then it’s just a matter of using tie on each .txt file, and testing each line to see whether the value in column 4 (index 3 after splitting on whitespace) is a key of the hash.
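
To illustrate how a tied array edits the file itself, here is a small sketch that runs against a temporary file rather than real PDB data. The use of File::Temp and the three sample lines are assumptions made only for this demonstration.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';
use Tie::File;

# Create a throwaway file with three lines (sample data only)
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print $fh "first\nsecond\nthird\n";
close $fh or die "Unable to close $path: $!";

# Each element of @file is one line of the file, without its newline
tie my @file, 'Tie::File', $path or die "Unable to tie $path: $!";
splice @file, 1, 1;    # delete the second line from the array and the file
untie @file;

# Read the file back normally to see what is left on disk
open my $in, '<', $path or die "Unable to open $path: $!";
my @remaining = <$in>;
close $in;
print @remaining;
```

After the splice the file on disk should hold only the first and third lines, with no separate write step.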

I use the variable $i to index into the @file array, which maps the on-disk file. If a match is found then the array element is deleted with splice @file, $i, 1. (This naturally leaves $i indexing the next element in sequence without incrementing $i.) If there is no match then $i is incremented to index the next array element, leaving the line in place.
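
The index handling can be tried on an ordinary in-memory array first. This is only a sketch: the helper remove_matching and the shortened sample lines are hypothetical. It also checks defined on the field, on the assumption that lines with fewer than four fields should simply be kept.

```perl
use strict;
use warnings;

# Hypothetical helper: remove, in place, every line whose fourth
# whitespace-separated field is a key of the %$matches hash.
sub remove_matching {
    my ( $lines, $matches ) = @_;

    for ( my $i = 0; $i < @$lines; ) {
        my $col4 = ( split ' ', $lines->[$i] )[3];    # undef on short lines

        if ( defined $col4 and $matches->{$col4} ) {
            splice @$lines, $i, 1;    # the next line slides into index $i
        }
        else {
            ++$i;                     # keep this line and move on
        }
    }
    return;
}

my @lines = (
    'ATOM   1595 HD23 LEU D  46',
    'HETATM 1597  N1  308 A   1',
    'HETATM 1598  C10 308 A   1',
    'TER',
);

remove_matching( \@lines, { 308 => 1 } );
print "$_\n" for @lines;
```

The two consecutive 308 lines are the interesting case: if $i were incremented after a splice, the second of them would be skipped.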

use strict;
use warnings 'all';

use File::Glob ':bsd_glob';
use Tie::File;

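# Name of the list file, as given in the question
my $ion_file = 'ions_solvents_cofactors';

# Build a lookup hash with one key per name in the list file
my %matches = do {
    open my $fh, '<', $ion_file or die qq{Unable to open "$ion_file": $!};
    local $/;    # slurp the whole file in a single read
    map { $_ => 1 } split ' ', <$fh>;
};

for my $pdb ( glob '*.txt' ) {

    tie my @file, 'Tie::File', $pdb or die qq{Unable to tie "$pdb": $!};

    for ( my $i = 0; $i < @file; ) {

        # Column 4 (index 3) is undef on lines with fewer than four fields.
        # Using "next" here would skip ++$i and loop forever on such a line.
        my $col4 = ( split ' ', $file[$i] )[3];

        if ( defined $col4 and $matches{$col4} ) {
            printf qq{Removing line %d from "%s"\n},
                    $i+1,
                    $pdb;
            splice @file, $i, 1;    # $i now indexes the following line
        }
        else {
            ++$i;
        }
    }

    untie @file;
}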
User contributions licensed under: CC BY-SA