PerlMonks  

I'm guessing that this is still in "test" stages... it does not look like you have 60000 elements in your regex yet. ;)

It looks like your patterns are supposed to match whole fields -- for example, "01005;11200" should match a line like this:

012345;23456;01005;11200;000111222;111222333
but it should not match a line like this:
012345;23456;02006;22300;000001005;112004444
The code you posted will match both lines, because the regex does not include ";" before and after the long alternation of field values, so a target can match in the middle of a longer field.
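To make the difference concrete, here is a minimal sketch that runs both forms of the regex over the two sample lines above. The one-entry target list and the variable names are only for illustration; they are not taken from your script.

```perl
use strict;
use warnings;

# The two sample rows from the discussion above.
my @rows = (
    "012345;23456;01005;11200;000111222;111222333",
    "012345;23456;02006;22300;000001005;112004444",
);

# Hypothetical one-entry target list; a real list would be much longer.
my $alternation = "01005;11200";

# Without delimiters the target can match inside a longer field.
my @loose_hits = grep { /(?:$alternation)/ } @rows;

# With ";" on both sides only whole fields can match.
my @strict_hits = grep { /;(?:$alternation);/ } @rows;

print "without delimiters: ", scalar(@loose_hits),  " row(s) matched\n";
print "with delimiters:    ", scalar(@strict_hits), " row(s) matched\n";
```

The second row is caught by the loose form because "000001005;112004444" contains "01005;11200" as a substring.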

Since you seem to be dealing with flat-table data, and your regex patterns involve matching certain combinations of third and fourth column values on each table row, you should consider handling things in a more table-like manner: read the target patterns into a hash, then read each row of the flat table file, pull out the 3rd and 4th fields, and see if they constitute an existing hash key.
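Reduced to its core, the hash lookup might look like the sketch below. The sub name row_matches and the hard-coded target values are hypothetical, chosen just to show the lookup on the two sample lines.

```perl
use strict;
use warnings;

# Hypothetical targets; in practice these would be read from a list file.
my %target = map { $_ => undef } ( "01005;11200", "01005;11400" );

# Return 1 if the 3rd and 4th fields of a row form a known target key.
sub row_matches {
    my ($row) = @_;
    my @fields = split /;/, $row;    # assumes no quoted ";" within fields
    return 0 if @fields < 4;         # too short to have a 3rd and 4th field
    return exists $target{ join ';', @fields[ 2, 3 ] } ? 1 : 0;
}

for my $row ( "012345;23456;01005;11200;000111222;111222333",
              "012345;23456;02006;22300;000001005;112004444" ) {
    print row_matches($row) ? "OK    " : "ERROR ", $row, "\n";
}
```

Each row costs one split and one hash lookup, no matter how many targets are in the list, which matters once the list grows toward 60000 entries.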

In any case, you do want to make sure your script will load your target patterns from a list file, rather than putting all the values in the perl code like you've done here. For example:

    use strict;
    use warnings;

    ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] )
        or die "Usage: $0 input.table target.list\n";

    my ( $infile, $targfile ) = @ARGV;
    my $outfile = "OK.txt";
    my $errfile = "ERROR.txt";

    open( IN,   '<', $infile )   or die "$infile: $!";
    open( OUT,  '>', $outfile )  or die "$outfile: $!";   # '>' opens for writing
    open( ERR,  '>', $errfile )  or die "$errfile: $!";
    open( TARG, '<', $targfile ) or die "$targfile: $!";

    my %target;
    while (<TARG>) {
        chomp;    # target.list has lines like "01005;11400"
        $target{$_} = undef;
    }
    close TARG;

    while (<IN>) {
        my @fields = split /;/;    # assuming no quoted ";" within fields
        # line-initial value is $fields[0],
        # so 3rd and 4th are @fields[2,3]
        my $check = join ';', @fields[2,3];
        if ( exists( $target{$check} )) {
            print OUT;
        }
        else {
            print ERR;
        }
    }
    close IN;
    close OUT;
    close ERR;

As for this comment of yours:

i use no struct strict ... etc because others need to change the script easyly and they have totaly no clue of perl
If others, with less knowledge of perl than you have, are going to be altering this script, then that's the most important reason to include "use strict; use warnings;" -- that way, when they screw something up, there's a much better chance that the problem will be caught (and explained) before things get worse.

(If these other people are just making adjustments to the list of target patterns, that is another very good reason for keeping that list in a separate file, so it can be updated without having to touch the perl script.)

One last point: if your target patterns are not always being sought in the same columns of the table -- e.g. sometimes your target string is expected to match columns 3 and 4, and other times it is expected to match columns 5 and 6 -- then you might need to go back to the regex approach. In that case, you should join the target strings into a single alternation held in a scalar, and form the regex like this:

    my @targ_strings = <TARG>;
    chomp @targ_strings;
    # quotemeta keeps any regex metacharacters in the list literal
    my $targ_regex = join "|", map { quotemeta } @targ_strings;

    while (<IN>) {
        if ( /;(?:$targ_regex);/ ) {
            print OUT;
        }
        else {
            print ERR;
        }
    }
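One caveat, which is an assumption about your data rather than something I can see in what you posted: a regex that requires ";" on both sides cannot find a target pair that starts or ends the line. If that can happen, a variant along these lines might help; the sample rows and the names $inner and $edges are made up for the illustration.

```perl
use strict;
use warnings;

my @targ_strings = ("01005;11200");    # hypothetical target list
my $targ_regex = join "|", map { quotemeta } @targ_strings;

# Requires ";" on both sides, so it only sees interior columns.
my $inner = qr/;(?:$targ_regex);/;

# Also accepts start of line on the left and end of line on the right.
my $edges = qr/(?:^|;)(?:$targ_regex)(?:;|$)/;

my %rows = (
    first  => "01005;11200;000111222;111222333\n",
    middle => "012345;23456;01005;11200;000111222\n",
    last   => "012345;23456;000111222;01005;11200\n",
    none   => "012345;23456;02006;22300;000001005;112004444\n",
);

for my $name (qw(first middle last none)) {
    printf "%-6s inner=%d edges=%d\n", $name,
        ( $rows{$name} =~ $inner ? 1 : 0 ),
        ( $rows{$name} =~ $edges ? 1 : 0 );
}
```

The "$" in the second pattern matches just before the trailing newline, so the rows do not need to be chomped first.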

In reply to Re^2: Filter script with pattern and an array by graff
in thread Filter script with pattern and an array by ultibuzz
