Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

match search string with column header, do some filtering with matched column, print lines/rows that match filter criteria

by dkhalfe (Acolyte)
on Jul 26, 2012 at 20:42 UTC ( [id://983933]=perlquestion: print w/replies, xml ) Need Help??

dkhalfe has asked for the wisdom of the Perl Monks concerning the following question:

hi all

Gameplan: I am going to read in a filter_file and an input file.

Filter file will be tab-delim formatted as such:

column relationship value num_or_string filter_or_append order a <= 0.3 num filter 1 b eq abc string append 3 c <= 0.3 num filter 2

Input file will be tab-delim formatted as such:

a b c 0.2 abc 0.3 0.1 abd 0.3 0.4 abe 0.2 0.1 abc 0.5 0.7 abt 0.7 0.1 abd 0.8

My code:

#!/usr/bin/perl use warnings; use strict; use autodie; use Data::Dumper; @ARGV == 2 or die "Invalid number of arguments. Please re-run program +and suppy as arguments \n1) the filter file and \n2) the input file." +; my ($filter_file, $input_file) = @ARGV; open (my $FILTER,"<","$filter_file") or die "Cannot open filter file: +$!"; my @filter_array; <$FILTER>; while (my $line = <$FILTER>) { chomp $line; my @line_array = split(/\t/, $line); push (@filter_array, \@line_array); } @filter_array = sort { $a->[5] <=> $b->[5]} @filter_array; #print Dumper \@filter_array, "\n"; #[DEBUGGING] my $num_elements = (@filter_array-1); #print "$num_elements", "\n"; #[DEBUGGING] open (IN,"<","$input_file") or die "Cannot open input file: $!"; open (OUTFILE, ">>", "OUTPUT_$input_file") or die "Cannot create an ou +tput file: $!"; for (my $i=0; $i<=$num_elements; $i++) { my $search_string = $filter_array[$i][0]; # print "$search_string", "\n"; my $header = <IN>; my @header_titles = split /\t/, $header; my $extract_col = 0; for my $header_line (@header_titles) { last if $header_line =~ m/$search_string/; $extract_col++; } print "Extracting column $extract_col\n"; while (my $row = <IN>) { last unless $row =~ /\S/; chomp $row; my @cells = split /\t/, $row; # print "$cells[$extract_col]\n"; if ((eval "$cells[$extract_col] $filter_array[$i][1] $filt +er_array[$i][2]")) { print OUTFILE "$cells[$extract_col]", "\n"; } } }

The part I need help with concerns the bottom of my code:

if ((eval "$cells[$extract_col] $filter_array[$i][1] $filter_array[$i] +[2]")) { print OUTFILE "$cells[$extract_col]", "\n"; }

NOTE: above code translates to: if values in col 'a' are '<=' to '.3', print value (those that match the criteria).

I need to match the column header of my input file with one designated in the filter file. Which I do here:  last if $header_line =~ m/$search_string/; Then I need to make the comparison designated in the filter file. Ex: Match 'a' (column header) with 'a' in input file, if values in col 'a' are <= .3, print to OUTFILE. Which is done here:

if ((eval "$cells[$extract_col] $filter_array[$i][1] $filter_array[$i +] +[2]")) { print OUTFILE "$cells[$extract_col]", "\n"; }
My problem: If the values in col 'a' are <= .03, I need to print the entire row of the input file to the output file, not just the value from the column I am working with. Any help with this issue would be much appreciated. Thanks.

  • Comment on match search string with column header, do some filtering with matched column, print lines/rows that match filter criteria
  • Select or Download Code

Replies are listed 'Best First'.
Re: match search string with column header, do some filtering with matched column, print lines/rows that match filter criteria
by Corion (Patriarch) on Jul 26, 2012 at 20:54 UTC

    What part of the replies you got to Comparing a Hash key with a variable (if statement) have you applied?

    Also, if this is all you have problems with:

    My problem: If the values in col 'a' are <= .03, I need to print the entire row of the input file to the output file, not just the value from the column I am working with. Any help with this issue would be much appreciated. Thanks.

    ... why do you post the large program which has very little to do with your problem? Reduce your program, for example by hardcoding the filters and filter conditions.

    Also, you might want to look at your print statement and just print out all you want instead of only printing the matched cell.

Re: match search string with column header, do some filtering with matched column, print lines/rows that match filter criteria
by Anonymous Monk on Jul 27, 2012 at 07:25 UTC

    While I do not know the answer to your current question, I do think your code is irreparably broken as long as you insist on using eval like that. BrowserUK gave a very good suggestion here to invoke the operators using a dispatch table (a hash of function references, each taking two arguments). You should incorporate that into your code.

      Anyway, eval is so very rarely needed and should be shunned. But you wanted minor corrections. Let's see...

      Change

      if ((eval "$cells[$extract_col] $filter_array[$i][1] $filter_array[$i] +[2]"))
      to
      if ((eval '$cells[$extract_col] ' . $filter_array[$i][1] . ' $filter_a +rray[$i][2]'))

      I need to print the entire row of the input file to the output file

      Then why don't you do so? Change

      print OUTFILE "$cells[$extract_col]", "\n";

      to

      print OUTFILE $row, "\n";

      The clarity of your code would be somewhat enhanced if you gave @filter_array's columns names; e.g. $filter_array[$i]->{action} = 'append' or $filter_array[$i]->{column} = 'c'. Remember: with clear code, it is easier to be certain that it works correct.

      Oh, and please stop quoting "$var" since it is identical to plain $var -- barring a few exceptions having to do with references and objects.

Re: match search string with column header, do some filtering with matched column, print lines/rows that match filter criteria
by CountZero (Bishop) on Jul 27, 2012 at 19:45 UTC
    The logic of your program is all wrong.

    After extracting your filter preferences, you start a first ("outer") loop where you go through the filters one by one.

    Inside that loop you read the input file and check the filter condition for each line of the input file ("inner" loop).

    BUT when you end the "inner" loop you have exhausted the input file and when you start the next iteration of your "outer" loop, <IN> will return nothing and your "inner" loop gets skipped.

    I really doubt that is what you meant to do. Rather you should open the input file inside the outer loop and close it again after the "inner" loop is finished. The next iteration of the "outer" loop will then open the input file again and start reading it afresh and checking your next filter. Of course that is a slow and wasteful way of working: for every filter you walk the input file afresh. It is OK when the files are small, but it quickly will become unwieldly and slow when the files grow.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: match search string with column header, do some filtering with matched column, print lines/rows that match filter criteria
by CountZero (Bishop) on Jul 27, 2012 at 20:46 UTC
    What you are really doing is selecting records from a database with the criteria provided in another file. Everytime you think "database", think "DBI" and use the power of CPAN to make life easier for you.

    Look at the following:

    use Modern::Perl; use Data::Dump qw/dump/; use SQL::Abstract; use DBI; my $dbh = DBI->connect( 'dbi:CSV:', '', '', { f_dir => './DB/', csv_sep_char => "\t", } ) or die "Cannot connect: $DBI::errstr"; my @where; { my $query = 'SELECT * FROM filter_file ORDER BY orde'; my $sth = $dbh->prepare($query); $sth->execute(); while ( my $row = $sth->fetchrow_hashref ) { push @where, { $row->{'column'} => { $row->{'relationship'} => $row->{'va +lue'} } }; } $sth->finish(); } my $sql = SQL::Abstract->new; my ( $where, @bind ) = $sql->where( \@where ); { my $query = "SELECT a, b, c FROM input $where"; my $sth = $dbh->prepare($query); $sth->execute(@bind); while ( my $row = $sth->fetchrow_hashref ) { say dump($row); } $sth->finish(); }
    Output:
    { a => 0.2, b => "abc", c => 0.3 } { a => 0.1, b => "abd", c => 0.3 } { a => 0.4, b => "abe", c => 0.2 } { a => 0.1, b => "abc", c => 0.5 } { a => 0.1, b => "abd", c => 0.8 }
    This script fetches the filter conditions from the filter_file, stuffs it into a SQL::Abstract data-structure, constructs an SQL query out of it that combines all filters at once (the "where" part of the SQL is  WHERE ( ( a <= ? OR c <= ? OR b = ? ) )) and then runs that SQL query against your input file.

    No repeated going through your input file; clean and fast code; easy to understand, maintain and change for anyone who knows the basics of SQL.

    A few caveats though:

    • I had to change the name of your order column to orde since "order" is a reserved SQL keyword.
    • As we are now dealing with SQL, the "equals" test is always =, even for strings so I had to change your eq in the filter_file to =.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://983933]
Approved by chacham
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (5)
As of 2024-04-20 01:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found