http://www.perlmonks.org?node_id=1018295

brtch has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a file with lines in this format: xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx, where x is a letter and y is a digit. CATCH: the only things that are standard across the file are the spaces and the / character; i.e., the first word can be 3, 4, or 5 characters long. Can you help me find Perl code that takes each of the strings in a line and compares it with strings I have, e.g., checks whether the first string in the line == Feb?

Re: Processing words in a file.
by kcott (Archbishop) on Feb 12, 2013 at 07:24 UTC

    G'day brtch,

    Welcome to the monastery.

    Providing a little more context to your question would have been preferable. I'm assuming the first three fields are: month day hours:minutes:seconds. I'll leave you to extrapolate from there.

    Perl has different operators for string and numerical comparisons. '==' is the numerical equality operator; 'eq' is for strings. See perlop for all the different operators; perlop - Equality Operators specifically discusses '==' and 'eq'.
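    As a minimal sketch of why the distinction matters (the variable names and the 'Feb'/'Mar' values here are only illustrative assumptions):

```perl
use strict;
use warnings;

my $word = 'Feb';

# 'eq' compares the two operands as strings.
my $string_match = $word eq 'Feb' ? 'yes' : 'no';
print "eq: $string_match\n";

# '==' converts both operands to numbers first. Non-numeric strings
# such as 'Feb' and 'Mar' both become 0 (with a warning under
# 'use warnings'), so they wrongly compare as equal.
my $numeric_match = do {
    no warnings 'numeric';    # silenced here only to keep the demo quiet
    $word == 'Mar' ? 'yes' : 'no';
};
print "==: $numeric_match\n";
```

    Both comparisons report a match, even though 'Feb' and 'Mar' are different strings, which is why 'eq' is the operator to use for month names.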

    How you go about breaking up your line for comparison will depend on how much detail you want (e.g. do you want to look at 'yy:yy:yy' as a whole or are you interested in the subfields). I see two main options you might pursue: using the split function or using a regular expression.

    Using split can be as simple as:

    $ perl -Mstrict -Mwarnings -E '
        my $line = q{xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx};
        my @fields = split / / => $line;
        say $fields[2];
    '
    yy:yy:yy

    The problem with this level of simplicity is when further down your code you hit $fields[7] and have to backtrack to determine which field index 7 refers to. Ways around this include giving symbolic names to the indices or capturing each field into a meaningfully named variable:

    $ perl -Mstrict -Mwarnings -E '
        use constant { MONTH => 0, DAY => 1, TIME => 2, };
        my $line = q{xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx};
        my @fields = split / / => $line;
        say $fields[TIME];
    '
    yy:yy:yy

    $ perl -Mstrict -Mwarnings -E '
        my $line = q{xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx};
        my ($month, $day, $time, $rest) = split / / => $line;
        say $time;
    '
    yy:yy:yy

    If you want to get at the subfields, then a regular expression solution might be better:

    $ perl -Mstrict -Mwarnings -E '
        my $line = q{xxx 1 12:34:56 xxx/xxxxx xxx xxxx xxx xxx};
        my $line_re = qr{^(\w+) (\d+) (\d+):(\d+):(\d+) (.*)};
        my ($month, $day, $hour, $min, $sec, $rest) = $line =~ m{$line_re};
        say $hour;
    '
    12

    All of those parts in parentheses are called Capture Groups. The link I've provided discusses these (as well as Named Capture Groups which I'll leave you to research if you're interested).
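    For anyone curious, a rough sketch of the same match using Named Capture Groups might look like this. It assumes Perl 5.10 or later; the group names and the %field hash are my own choices for illustration:

```perl
use strict;
use warnings;

my $line = 'xxx 1 12:34:56 xxx/xxxxx xxx xxxx xxx xxx';

# Each (?<name>...) group stores its match under that name in %+.
my $line_re = qr{^(?<month>\w+) (?<day>\d+) (?<hour>\d+):(?<min>\d+):(?<sec>\d+) (?<rest>.*)};

my %field;
if ($line =~ $line_re) {
    # Copy %+ straight away: it is reset by the next successful match.
    %field = %+;
    print "$field{month} $field{hour}\n";
}
```

    The fields are then addressed by name ($field{hour}) rather than by position in a list.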

    -- Ken

      Thanks, Monks. The issue got resolved.
Re: Processing words in a file.
by vinoth.ree (Monsignor) on Feb 12, 2013 at 05:39 UTC

    I guess you need the first word of each line from the file:

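    use strict;
    use warnings;

    open my $fh, '<', 'filename.txt' or die "Cannot open file: $!";
    while (my $line = <$fh>) {
        # Capture the leading run of word characters; skip lines without one.
        my ($first_word) = $line =~ /^(\w+)/;
        next unless defined $first_word;
        if ($first_word eq 'your string') {
            # Your wish.
        }
    }
    close $fh;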
Re: Processing words in a file.
by frozenwithjoy (Priest) on Feb 12, 2013 at 07:10 UTC
    Are you interested in checking all strings or just the first string? Could you include a few lines from an actual file and indicate your desired output? Thanks!
      Interested in all strings.
        Also, I need a condition like this: if $first_word == xyz, followed by if $second_word == abc. As the lines all have the same format, I want the loop to run on a per-line basis.
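        A minimal sketch of such a per-line check, assuming whitespace-separated fields. The sample lines and the values 'Feb' and '12' are made up for illustration and stand in for xyz and abc; note 'eq' rather than '==' for string comparison:

```perl
use strict;
use warnings;

# Hypothetical sample data. With a real file, replace this array with
# open my $fh, '<', $path ... and loop with while (my $line = <$fh>).
my @lines = (
    'Feb 12 07:24:00 abc/defgh ijk lmno pqr stu',
    'Feb 13 09:00:01 abc/defgh ijk lmno pqr stu',
    'Mar 3 18:05:59 abc/defgh ijk lmno pqr stu',
);

my @matched;
for my $line (@lines) {
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my ($first_word, $second_word, @rest) = split ' ', $line;
    next unless defined $second_word;    # skip short or empty lines

    if ($first_word eq 'Feb') {
        if ($second_word eq '12') {
            push @matched, $line;
        }
    }
}

print "$_\n" for @matched;
```

        Each line is split and tested on its own, so the conditions are evaluated once per line.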