http://www.perlmonks.org?node_id=968970

perllearner007 has asked for the wisdom of the Perl Monks concerning the following question:

I have a tab-delimited file as follows:
#file1
#version1.1
#columns with the information as follows
state1 class1 report_version_1.1 9428 4567 . . call=1;times=5;toss=head->tail;sno=A1B1;effect=positive
state1 class1 report_version_1.1 3862 4877 . . call=1;times=5;toss=head->tail;sno=A1B2;effect=negative
state1 class1 report_version_1.1 2376 4567 . . call=1;times=5;toss=head->tail;sno=;effect=positive
state2 class1 report_version_1.1 4378 2345 . . call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive,negative, both
state2 class1 report_version_1.1 1289 4835 . . call=1;times=5;toss=head->tail;sno=;effect=positive

Note: There are no column headers in the file just the three top comments.
I am trying to parse the 8th column (starting from call=1), which is a string of entries separated by semicolons. I need to remove all rows that, part 1, have no sno value (e.g. the 3rd row has sno=; i.e. there is no record) and, part 2, have the same toss result on both sides (i.e. toss=tail->tail is not needed since both are tail). Since I am learning Perl, I try to divide such things into parts, so here is what I have come up with so far for part 1: sno…
#!usr/bin/perl
use warnings;

#inputfile
my $input_file = "/Users/myfolder/myfile.txt";
die "Cannot open $input_file \n" unless (open(IN, $input_file));

#open file
my @LINES =<IN>;

#output file
#Open output file and write the needed results
die "output1.txt" unless(open( OUT,"> output1.txt"));

my @infos;
while(<IN>){
    my @fields = split ';', $_;
    my $state = $fields[1];
    my $class1 = $fields[2];
    my $report_version_1.1 = $fields[3];
    my $value1 = $fields[4];
    my $value2 = $fields[5];
    my $dot1 = $fields[6];
    my $dot2 = $fields[7];
    my $info = $fields[8];
    if ( $fields[8] =~ /^[sno]/ =~ /^[sno]/ ) {
        push @infos, $_;
        print OUT " $state\ $class\ $report_version_1.1\ $value1\ $value2\ $dot1 \ $dot2 \ $info\n";
    }
}
exit;
Any other better ways to solve this for both sno and toss parts together?

Re: split and matching
by JavaFan (Canon) on May 04, 2012 at 19:51 UTC
    my @LINES =<IN>;
    This reads all the lines, and stuffs them into @LINES.
    while(<IN>){
    You've already read all the lines! What do you expect to read now?
    my @fields = split ';', $_;
    I thought you said it was a tab delimited file?

    if ( $fields[8] =~ /^[sno]/ =~ /^[sno]/ ) {
    What are you trying to do here? If you want to know that there is a value for sno, do something like:
    if ($fields[8] =~ /(?:^|;)sno=[^;]/) { ... }
    And if you have a string literal, there is really no need to backslash all your spaces (unless the literal is a regexp pattern under the /x modifier, and you want the regex engine to see the spaces). Your print statement will be much more readable if you get rid of them.
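
    Putting those points together, here is a minimal sketch rather than a definitive fix. It assumes the info string is the eighth tab-separated field (index 7, since Perl arrays start at 0) and uses two sample rows taken from the question; has_sno is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the info string carries a non-empty sno value.
sub has_sno {
    my ($info) = @_;
    return $info =~ /(?:^|;)sno=[^;]/ ? 1 : 0;
}

# Two sample rows in the layout the question describes (tab-delimited).
my @lines = (
    join("\t", qw(state1 class1 report_version_1.1 9428 4567 . .),
         'call=1;times=5;toss=head->tail;sno=A1B1;effect=positive'),
    join("\t", qw(state1 class1 report_version_1.1 2376 4567 . .),
         'call=1;times=5;toss=head->tail;sno=;effect=positive'),
);

for my $line (@lines) {
    next if $line =~ /^#/;               # skip the comment lines at the top
    my @fields = split /\t/, $line;      # split on tabs, not semicolons
    my $info   = $fields[7];             # assumed: eighth column, index 7
    print "$line\n" if has_sno($info);   # keep only rows with an sno value
}
```

    Reading the file line by line inside the while loop, instead of slurping it into @LINES first, lets the same loop body work on real input.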
Re: split and matching
by Kenosis (Priest) on May 04, 2012 at 22:50 UTC

    Unless the exclusionary items you're looking for in column 8 also appear in another column, it's not necessary to isolate column 8 to check for them. If you find them within an entry--as a whole--you can skip that entry:

    use strict;
    use warnings;

    my $regex = join '|', map qq|\Q$_\E|, qw(sno=; tail->tail head->head);

    open my $file, '<file1.txt' or die "$!";
    while( <$file> ) {
        next if /$regex/;
        print;
    }
    close $file;

    $regex is created by first using qw to make a list of quoted words from the space-delimited exclusionary items. This list is passed to map which uses qq|\Q$_\E| to escape all special characters in each list item ($_ contains a list item). The results of map are joined with '|' for use as an or between the exclusionary items in the subsequent pattern match.

    While we're reading each of the file's lines, if the pattern match finds an exclusionary item, the next line is requested; otherwise the line is printed, with both the match and the print operating on Perl's default scalar: $_.
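
    To see what the pattern actually looks like, the construction described above can be run on a few in-memory strings instead of a file. This is only a sketch: keep_line is a hypothetical helper name, and the sample strings are column-8 values from the question.

```perl
use strict;
use warnings;

# Build the alternation as described: quote each item, then join with '|'.
my @exclude = qw(sno=; tail->tail head->head);
my $regex   = join '|', map qq|\Q$_\E|, @exclude;

# Hypothetical helper: true when a line contains none of the excluded items.
sub keep_line {
    my ($line) = @_;
    return $line !~ /$regex/ ? 1 : 0;
}

print "pattern: $regex\n";

for my $info (
    'call=1;times=5;toss=head->tail;sno=A1B1;effect=positive',
    'call=1;times=5;toss=head->tail;sno=;effect=positive',
    'call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive',
) {
    printf "%-4s %s\n", ( keep_line($info) ? 'keep' : 'skip' ), $info;
}
```

    Printing the pattern shows that \Q...\E has backslashed every non-word character, so ';', '-', and '>' are matched literally.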

    Hope this helps!

      Thank you for the input. It is right that the exclusionary items are all in column 8 only. I tried inputting the file and running code with $regex and it runs. But my output file is still blank.
Re: split and matching
by Anonymous Monk on May 04, 2012 at 19:53 UTC

    Any other better ways to solve this for both sno and toss parts together?

    Seeing how that doesn't do what you're hoping it does, there surely must be.

    Start with

    use strict;
    use warnings;
    my @fields = ( 0 .. 7, "something" );
    if ( $fields[8] =~ /^[sno]/ =~ /^[sno]/ ) {
        print "uh oh\n";
    } else {
        print "oh no\n";
    }
    __END__
    oh no

    Then run as perl -Mre=debug uhohohno.pl and see what happens

    Then rewrite it to do what you want, see perlintro/perlrequick
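
    As a hedged illustration of why the test above prints "oh no", the sketch below separates the two things going on in that condition. chained_match and starts_with_class are hypothetical helper names introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the condition from the question.
sub chained_match {
    my ($s) = @_;
    # =~ is left-associative, so this parses as ($s =~ /^[sno]/) =~ /^[sno]/.
    # The second match is applied to the result of the first (1 or ''),
    # and neither of those starts with s, n, or o.
    return ( $s =~ /^[sno]/ =~ /^[sno]/ ) ? 1 : 0;
}

# [sno] is a character class: a single character that is s, n, or o.
# It does not mean the literal string "sno".
sub starts_with_class {
    my ($s) = @_;
    return $s =~ /^[sno]/ ? 1 : 0;
}

for my $s ( 'something', 'nothing', 'toss=head->tail' ) {
    printf "%-16s class=%d chained=%d\n",
        $s, starts_with_class($s), chained_match($s);
}
```

    The chained form is false for every input, which is consistent with the output file staying empty.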

Re: split and matching
by AnomalousMonk (Archbishop) on May 07, 2012 at 20:20 UTC

    The statement
        my $report_version_1.1 = $fields[3];
    in the OPed code and the subsequent statement
        print OUT "... $report_version_1.1 ...";
    lead me to suspect that we are not seeing the actual code that is being executed. The first statement will not compile under any circumstances, and the second will yield a warning if warnings are enabled as shown in the posted code.

    What's really going on here?

Re: split and matching
by aaron_baugher (Curate) on May 07, 2012 at 20:58 UTC

    I probably wouldn't use perl for this, since it's a straightforward grep task. Maybe this will help you see what you need to do:

    grep -v 'sno=;' <infile | grep -v 'tail->tail' | grep -v 'head->head' >outfile

    (Yes, I know those can be combined into a single grep with a more complicated test, but so can they be combined into a single regex in Perl. To me, a pipeline of simple greps is easier to create and to understand later.)
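
    For comparison, the same chain of simple tests can be sketched with Perl's built-in grep, one filter per condition. filter_lines is a hypothetical name, and the input here is an in-memory list standing in for infile.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the three exclusions one after another,
# mirroring the three grep -v stages of the shell pipeline.
sub filter_lines {
    my @lines = @_;
    return grep { !/head->head/ }
           grep { !/tail->tail/ }
           grep { !/sno=;/ } @lines;
}

my @sample = (
    'call=1;times=5;toss=head->tail;sno=A1B1;effect=positive',
    'call=1;times=5;toss=head->tail;sno=;effect=positive',
    'call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive',
);

print "$_\n" for filter_lines(@sample);
```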

    Aaron B.