Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Filter script with pattern and an array

by ultibuzz (Monk)
on Oct 03, 2005 at 11:18 UTC ( #496871=perlquestion: print w/ replies, xml ) Need Help??
ultibuzz has asked for the wisdom of the Perl Monks concerning the following question:

hi guys im quit new to perl and have some problems here ive a csv file wich i need to sort all mathing lines go to another file ( this works), but i cant get it working that the array is counted so every element of the array gets in the math for 1 line. this is my test code where i need to add the count. @filter is my test array @ the end it will contain 60000 elements
#!/usr/bin/perl $infile = @ARGV[0]; $outfile = "OK.txt"; $outfile2 = "ERROR.txt"; @Filter = ('01005;01000','01005;01001','01005;01002','01005;01003'); $zaehler = 0; open(INFILE, "< $infile") or die "\nDatei $infile konnte nicht geoeffnet werden\n"; open(OUTFILE, "> $outfile") or die "\nDatei $outfile konnte nicht geoeffnet werden\n"; open(OUTFILE2, "> $outfile2") or die "\nDatei $outfile2 konnte nicht geoeffnet werden\n"; while (<INFILE>) { print OUTFILE if /@Filter[0]/; } close(INFILE); close(OUTFILE);
i tryd somthing with an foreach loop but i cant get it to work as u see above i specived an element of the arrys and it works fine, but i need to count trough all elements ;D UPDATE: here is a line from a file 120227714;520226667;03015;03200;0000031685; this string 03015;03200 need to be filterd and if its ok then the hole line needs to go to a new file

Comment on Filter script with pattern and an array
Download Code
Re: Filter script with pattern and an array
by reneeb (Chaplain) on Oct 03, 2005 at 11:37 UTC
    You should use use strict and use warnings

    I would suggest, that you make one regex, which includes all elements of @Filter... So you just have to test one time instead of many iterations...

    #!/usr/bin/perl use strict; use warnings; my $infile = @ARGV[0]; my $outfile = "OK.txt"; my @Filter = ('01005;01000','01005;01001','01005;01002','01005;01003') +; my $zaehler = 0; open(INFILE, "< $infile") or die "\nDatei $infile konnte nicht geoeffn +et werden\n"; open(OUTFILE, "> $outfile") or die "\nDatei $outfile konnte nicht geoe +ffnet werden\n"; my $filterregex = join('|',@Filter); while (<INFILE>) { if(/$filterregex/){ print OUTFILE $_; $zaehler++; } } close(INFILE); close(OUTFILE);
      can regex handel 60000 elements ? so it will look like this while (<INFILE>) { print OUTFILE if /01005;01000|01005;01001|01005;01002|01005;01003|...untill 60ooo element|03584;66689/ }
Re: Filter script with pattern and an array
by blazar (Canon) on Oct 03, 2005 at 11:53 UTC
    If you're dealing with csv, then I strongly recommend using Text::CSV_XS. Said this...
    #!/usr/bin/perl $infile = @ARGV[0]; $outfile = "OK.txt"; $outfile2 = "ERROR.txt"; @Filter = ('01005;01000','01005;01001','01005;01002','01005;01003'); $zaehler = 0;
    Hmmm no strict, no warnings. Now, that is too bad...
    open(INFILE, "< $infile") or die "\nDatei $infile konnte nicht geoeffnet werden\n";
    Better:
    open my $in, '<', $infile or die "Error message including \$!: $!\n";
    and ditto as above wrt the other opens.
    while (<INFILE>) { print OUTFILE if /@Filter[0]/; }
    Huh?!?
Re: Filter script with pattern and an array
by ultibuzz (Monk) on Oct 03, 2005 at 18:14 UTC
    thx ppl for ure help with the all in one pattern it works grade ;D i just filterd 1000000 lines and it filter everything correct ;D and yes i use no struct ,D etc because others need to change the script easyly and they have totaly no clue of perl,this is why i use the variables for the filenames as well ;D i striped some filter vars of because 60000 are to much ;D i also include an else print to sort the non maching out of the file ;D heres the code to share my xperiance with all ;D
    #!/usr/bin/perl $infile = @ARGV[0]; $outfile = "OK.txt"; $outfile2 = "ERROR.txt"; open(INFILE, "< $infile") or die "\nDatei $infile konnte nicht geoeffnet werden\n"; open(OUTFILE, "> $outfile") or die "\nDatei $outfile konnte nicht geoeffnet werden\n"; open(OUTFILE2, "> $outfile2") or die "\nDatei $outfile2 konnte nicht geoeffnet werden\n"; while (<INFILE>) { if($_ =~ /01005;01000|01005;01001|01005;01002|01005;01003|01005;01004| +01005;01005|01005;01006|01005;01007|01005;01008|01005;01009|01005;010 +10|01005;01011|01005;01012|01005;01013|01005;01014|01005;01015|01005; +01109|01005;01170|01005;01177|01005;01198|01005;01199|01005;01200|010 +05;01400|01005;01401|01005;01402|01005;01403|01005;01404|01005;01405| +01005;01406|01005;01407|01005;01408|01005;01409|01005;01410|01005;014 +11|01005;01412|01005;01443|01005;01576|01005;01577|01005;01597|01005; +01600|01005;01601|01005;01602|01005;01603|01005;01604|01005;01605|010 +05;01606|01005;01607|01005;01608|01005;01610|01005;01611|01005;01612| +01005;01613|01005;01614|01005;01615|01005;01616|01005;01617|01005;016 +18|01005;01619|01005;01620|01005;01621|01005;01622|01005;01623|01005; +01624|01005;01625|01005;01626|01005;01627|01005;01628|01005;01629|010 +05;01634|01005;01635|01005;01660|01005;01773|01005;01798|01005;01800| +01005;01801|01005;02000|01005;02001|01005;02002|01005;02003|01005;020 +04|01005;02005|01005;02006|01005;02045|01005;02046|01005;02196|01005; +02200|01005;02201|01005;02202|01005;02203|01005;02204|01005;02398|010 +05;02399|01005;02400|01005;02402|01005;02403|01005;02404|01005;02405| +01005;02406|01005;02407|01005;02408|01005;02409|01005;02598|01005;025 +99|01005;02600|01005;02601|01005;02603|01005;02800|01005;02801|01005; +02802|01005;02803|01005;02804|01005;02805|01005;02807|01005;02809|010 +05;02810|01005;02811|01005;02812|01005;02970|01005;02971|01005;02996| +01005;02997|01005;02998|01005;03000|01005;03001|01005;03002|01005;032 +00|01005;03201|01005;03202|01005;03203|01005;03204|01005;03205|01005; +03206|01005;03207|01005;03208|01005;03209|01005;03210|01005;03211|010 +05;03212|01005;03213|01005;03214|01005;03215|01005;03216|01005;03217| +01005;03218|01005;03219|01005;03220|01005;03221|01005;03222|01005;032 +23|01005;03224|01005;03225|01005;03226|01005;03227|01005;03228|01005; +03229|01005;03230|01005;03231|01005;03232|01005;03233|01005;03234|010 +05;03235|01005;03236|01005;03237|01005;03238|01005;03239|01005;03240| +01005;03241|01005;03242|01005;03243|01005;03244|01005;03245|01005;032 +46|01005;03247|01005;03248|01005;03249|01005;03250|01005;03252|01005; +03253|01005;03254|01005;03255|01005;03256|01005;03257|01005;03258|010 +05;03259|01005;03260|01005;03261|01005;03262|01005;03264|01005;03265| +01005;03266|01005;03267|01005;03268|01005;03269|01005;03270|01005;032 +71|01005;03272|01005;03273|01005;03274|01005;03275|01005;03276|01005; +03277|01005;03278|01005;03279|01005;03280|01005;03281|01005;03294|010 +05;03298|01005;03305|01005;03309|01005;03320|01005;03332|01005;03333| +01005;03337|01005;03338|01005;03339|01005;03340|01005;03341|01005;033 +42|01005;03343|01005;03344|01005;03345|01005;03347|01005;03350|01005; +03366|01005;03367|01005;03369|01005;03370|01005;03371|01005;03372|010 +05;03373|01005;03374|01005;03375|01005;03376|01005;03379|01005;03381| +01005;03382|01005;03383|01005;03384|01005;03385|01005;03386|01005;033 +87|01005;03388|01005;03389|01005;03392|01005;03393|01005;03394|01005; +03395|01005;03396|01005;03397|01005;03398|01005;03399|01005;03400|010 +05;03401|01005;03402|01005;03403|01005;03404|01005;03405|01005;03406| +01005;03407|01005;03408|01005;03409|01005;03410|01005;03411|01005;034 +12|01005;03413|01005;03414|01005;03415|01005;03416| 01005;03417|01005;03419|01005;03420|01005;03433|01005;03531|01005;0360 +0|01005;03601|01005;03602|01005;03800|01005;04000|01005;04001|01005;0 +4002|01005;04003|01005;04004|01005;04005|01005;04006|01005;04007|0100 +5;04008|01005;04009|01005;04010|01005;04011|01005;04012|01005;04070|0 +1005;04199|01005;04200|01005;04400|01005;04401|01005;04402|01005;0440 +3|01005;04404|01005;04405|01005;04600|01005;04601|01005;04602|01005;0 +4603|01005;04604|01005;04605|01005;04606|01005;04607|01005;04675|0100 +5;04676|01005;04773|01005;04800|01005;04801|01005;04802|01005;04803|0 +1005;04804|01005;04998|01005;04999|01005;10000|01005;10100|01005;1010 +1|01005;10200|01005;10201|01005;10300|01005;10400|01005;10500|01005;1 +0600|01005;10601|01005;10700|01005;10800|01005;10900|01005;10943|0100 +5;11000|01005;11100|01005;11101|01005;11200|01005;11201|01005;11202|0 +1005;11300|01005;11301|01005;11400/ ) { print OUTFILE; } else { print OUTFILE2; } } close(INFILE); close(OUTFILE); close(OUTFILE2);
      I'm guessing that this is still in "test" stages... it does not look like you have 60000 elements in your regex yet. ;)

      It's looks like your patterns are supposed to match whole fields -- for example, "01005;11200" should match a line like this:

      012345;23456;01005;11200;000111222;111222333
      but it should not match a line like this:
      012345;23456;02006;22300;000001005;112004444
      The code you posted will match both lines, because the regex does not include ";" before and after the long conjunction of field values.

      Since you seem to be dealing with flat-table data, and your regex patterns involve matching certain combinations of third and fourth column values on each table row, you should consider treating handling things in a more table-like manner: read the target patterns into a hash, then read each row of the flat table file, pull out the 3rd and 4th fields, and see if they consistute an existing hash key.

      In any case, you do want to make sure your script will load your target patterns from a list file, rather than putting all the values in the perl code like you've done here. For example:

      use strict; use warnings; ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] ) or die "Usage: $0 input.table target.list\n"; my ( $infile, $targfile ) = @ARGV; my $outfile = "OK.txt"; my $errfile = "ERROR.txt"; open( IN, $infile ) or die "$infile: $!"; open( OUT, $outfile ) or die "$outfile: $!"; open( ERR, $errfile ) or die "$errfile: $!"; open( TARG, $targfile ) or die "$targfile: $!"; my %target; while (<TARG>) { chomp; # target.list has lines like "01005;11400" $target{$_} = undef; } close TARG; while (<IN>) { my @fields = split /;/; # assuming no quoted ";" within fields my $check = join ';', @fields[2,3]; # line-initial value is $field +s[0] # so 3rd and 4th are @fields[ +2,3] if ( exists( $target{$check} )) { print OUT; } else { print ERR; } } close OUT; close ERR;

      As for this comment of yours:

      i use no struct strict ... etc because others need to change the script easyly and they have totaly no clue of perl
      If others, with less knowledge of perl than you have, are going to be altering this script, then that's the most important reason to include  use strict; use warnings; -- that way, when they screw something up, there's a much better chance that the problem will be caught (and explained) before things get worse.

      (If these other people are just making adjustments to the list of target patterns, that is another very good reason for keeping that list in a separate file, so it can be updated without having to touch the perl script.)

      One last point: if your target patterns are not always being sought in the same columns of the table -- e.g. sometimes your target string is expected to match columns 3 and 4, and other times it is expected to match columns 5 and 6 -- then you might need to revert back to the regex approach. In that case, you should assign the conjunction of strings to a scalar, and form the regex like this:

      my @targ_strings = <TARG>; chomp @targ_strings; my $targ_regex = join "|", @targs; while (<IN>) { if ( /;(?:$targ_regex);/ ) { print OUT; } else { print ERR; } }
        hi thx for this suggestion but there are only txt lines and some of them contain the paaterns i use ;D no others ;D and its running fine ;D in the above code i dont include all 60000 becouse of the readable ;D so far the script filters over 200 files and get out lots of wrong files so i can speed up the process near 80 % faster and this is greate ;D
      and yes i use no struct ,D etc because others need to change the script easyly and they have totaly no clue of perl
      (my $strict='struct') =~ s/u/i/; and yes, this is one more reason to use it, not just the opposite!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://496871]
Approved by muntfish
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (14)
As of 2014-07-14 16:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (268 votes), past polls