Processing words in a file.

brtch has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a file with lines in this format: xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx where x is a letter and y is a digit. CATCH: The only things that are standard across the file are the spaces and the / character; i.e., the first word can be 3, 4 or 5 characters long. Can you help me find Perl code that takes each of the strings in a line and compares it with strings I have, e.g., checks whether the first string in the line == Feb?


Replies are listed 'Best First'.
Re: Processing words in a file. by kcott (Archbishop) on Feb 12, 2013 at 07:24 UTC
G'day brtch,

Welcome to the monastery.

Providing a little more context to your question would have been preferable. I'm assuming the first three fields are: `month day hours:minutes:seconds`. I'll leave you to extrapolate from there.

Perl has different operators for string and numerical comparisons. '`==`' is the numerical equality operator; '`eq`' is for strings. See perlop for all the different operators; perlop - Equality Operators specifically discusses '`==`' and '`eq`'.

How you go about breaking up your line for comparison will depend on how much detail you want (e.g. do you want to look at '`yy:yy:yy`' as a whole or are you interested in the subfields). I see two main options you might pursue: using the split function or using a regular expression.

Using `split` can be as simple as:

```
$ perl -Mstrict -Mwarnings -E '
    my $line = q{xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx};
    my @fields = split / / => $line;
    say $fields[2];
'
yy:yy:yy
```

The problem with this level of simplicity is when further down your code you hit `$fields[7]` and have to backtrack to determine which field index `7` refers to. Ways around this include giving symbolic names to the indices or capturing each field into a meaningfully named variable:

```
$ perl -Mstrict -Mwarnings -E '
    use constant {
        MONTH => 0,
        DAY   => 1,
        TIME  => 2,
    };
    my $line = q{xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx};
    my @fields = split / / => $line;
    say $fields[TIME];
'
yy:yy:yy
```

```
$ perl -Mstrict -Mwarnings -E '
    my $line = q{xxx yy yy:yy:yy xxx/xxxxx xxx xxxx xxx xxx};
    my ($month, $day, $time, $rest) = split / / => $line, 4;
    say $time;
'
yy:yy:yy
```

(The limit of `4` makes `$rest` hold everything after the time field; without it, `$rest` would receive only the fourth field and the remaining fields would be discarded.)

If you want to get at the subfields, then a regular expression solution might be better:

```
$ perl -Mstrict -Mwarnings -E '
    my $line = q{xxx 1 12:34:56 xxx/xxxxx xxx xxxx xxx xxx};
    my $line_re = qr{^(\w+) (\d+) (\d+):(\d+):(\d+) (.*)};
    my ($month, $day, $hour, $min, $sec, $rest) = $line =~ m{$line_re};
    say $hour;
'
12
```

All of those parts in parentheses are called Capture Groups. The link I've provided discusses these (as well as Named Capture Groups, which I'll leave you to research if you're interested).

-- Ken
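For readers curious about the Named Capture Groups mentioned above, here is a minimal sketch rather than a definitive solution. It assumes the same `month day hours:minutes:seconds` layout for the first three fields; the helper name `parse_line` and the sample line are made up for illustration. Named captures and the `%+` hash need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): parse one line into named
# fields, or return nothing if the line does not fit the assumed layout.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ (?<month> \w+ ) [ ]
          (?<day>   \d+ ) [ ]
          (?<hour>  \d+ ) : (?<min> \d+ ) : (?<sec> \d+ ) [ ]
          (?<rest>  .*  )
    }x;                       # /x ignores layout whitespace; [ ] is a literal space
    my %fields = %+;          # copy named captures before another match resets them
    return \%fields;
}

# Made-up sample line in the format described in the question.
my $fields = parse_line('Feb 12 07:24:00 abc/defgh ijk lmno pqr stu');

if ($fields && $fields->{month} eq 'Feb') {    # 'eq' compares strings
    print "February entry at hour $fields->{hour}\n";
}
```

Compared with positional captures, the hash keys document themselves, so there is no need to remember which position holds which field.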
Re^2: Processing words in a file. by brtch (Initiate) on Feb 12, 2013 at 09:44 UTC
Thanks, Monks. The issue got resolved.
Re: Processing words in a file. by vinoth.ree (Monsignor) on Feb 12, 2013 at 05:39 UTC
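I guess you need the first word of each line from the file:

```perl
use strict;
use warnings;

# Open the file for reading with a lexical filehandle.
open my $fh, '<', 'filename.txt' or die "Cannot open file: $!";

while (my $line = <$fh>) {
    # Capture the first word; skip lines that do not start with one.
    my ($first_word) = $line =~ /^(\w+)/;
    next unless defined $first_word;

    if ($first_word eq 'your string') {
        # Your wish.
    }
}

close $fh;
```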
Re: Processing words in a file. by frozenwithjoy (Priest) on Feb 12, 2013 at 07:10 UTC
Are you interested in checking all strings or just the first string? Could you include a few lines from an actual file and indicate your desired output? Thanks!
Re^2: Processing words in a file. by brtch (Initiate) on Feb 12, 2013 at 07:16 UTC
Interested in all strings.
Re^3: Processing words in a file. by brtch (Initiate) on Feb 12, 2013 at 07:19 UTC
Also, I need a condition like this: if $first_word == xyz, followed by if $second_word == abc. As the lines all have the same format, I want the loop to run on a per-line basis.
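A rough sketch of such a per-line loop, under a few assumptions: the words are whitespace-separated, the comparisons are string comparisons (so `eq` rather than `==`), and the helper name `line_matches` plus the sample lines are invented for illustration. The sample data is read from an in-memory filehandle so the example is self-contained; a real script would open the actual file instead.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the first word of the line is $first
# and the second word is $second.
sub line_matches {
    my ($line, $first, $second) = @_;
    my @words = split ' ', $line;    # split on runs of whitespace
    return 0 unless @words >= 2;     # lines with fewer than two words never match
    return ($words[0] eq $first && $words[1] eq $second) ? 1 : 0;
}

# Made-up sample data standing in for the real file.
my $sample = <<'END';
xyz abc 12:34:56 foo/bar one two three four
xyz def 01:02:03 foo/bar one two three four
Feb abc 23:59:59 foo/bar one two three four
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

while (my $line = <$fh>) {           # one iteration per line
    chomp $line;
    print "match: $line\n" if line_matches($line, 'xyz', 'abc');
}

close $fh;
```

Only the first sample line satisfies both conditions. Further words can be tested the same way by indexing deeper into `@words`.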
