first two words-pattern matching

shamala has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have a file whose contents are as below:

<first line> :ctx     ctx_3   ctx_3   040425          0       0       
+07-May-2004 03:29      07-May-2004 03:29        0.00    hobbit       
+   N/A     failed          PLE-QA N/A      0
<second line>        Not Published   466008691
<third line>plsql   tkmain_ps4      tkmain_ps4      040425          0 
+      0       07-May-2004 03:29       07-May-2004 03:29       0.00   
+ hobbit          N/A     failed PLE-QA   N/A     0
<...>        Not Published   466008691
<nth line>plsql   tkmain_pu3      tkmain_pu3      040425          0   
+    0       07-May-2004 03:29       07-May-2004 03:29       0.00    h
+obbit          N/A     failed PLE-QA   N/A     0
        Not Published   466008691
rdbms   tkmain_2n1      tkmain_2n1      040425          0       0     
+  07-May-2004 01:37       07-May-2004 01:37       0.00    hobbit     
+     N/A     failed PLE-QA   N/A     0
        Not Published   465894765

I want only the first two words (for example: ctx ctx_3) from each of the lines that begin with similar kinds of words. The third line, for instance, begins with plsql tkmain_ps4, and that is what I want to match. I have written the following, but it isn't working. Could any of you please help? I am a beginner at Perl.

#! /usr/local/bin/perl
      #use strict;
       open(rerun,"rerunlrg")
       while(my $pattern=<rerun>){
        if(my $pattern=~ m/^(\w\s+\w\s).*?/){
        print "matched\n";
        print "$1\n";
        }

Thanks ppl

Edited by Chady -- added code tags around file contents.

Re: first two words-pattern matching by davido (Cardinal) on May 12, 2004 at 06:14 UTC
Can we assume that `<nth line>` really is just your notation for the beginning of a new line, where lines end with "\n"? If that's the case:

    use strict;     # Don't comment it out.

    open RERUN, "rerunlrg" or die "Couldn't open input file.\n$!";
    while ( my $line = <RERUN> ) {
        print "Matched:\n$1\n\n" if $line =~ m/^(\S+\s+\S+)/;
    }
    close RERUN;

Go ahead and give that a try.

One problem your regexp has is that you're not putting a quantifier after your \w metachars. That means that you're trying to match a single word character, followed by any positive amount of whitespace, followed by a single word character, followed by a single whitespace, followed by any amount of anything. You really want to be matching words of arbitrary length, presumably longer than a single character each.

Another problem your script has is that you're not checking for success when you open the file. You should probably be invoking die if there is a failure to open the file. See the example I provided. You can read up on this issue in perlopentut.

Dave
Re: first two words-pattern matching by matija (Priest) on May 12, 2004 at 06:19 UTC
`^(\w\s+\w\s)`

That matches a single character followed by one or more spaces, followed by another single character, followed by one space. What you probably want is something along the lines of

`^(\w+\s+\w+\b)`

Note that I removed the final space from your match by replacing it with \b. I assume you don't really need that final space.
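To see the difference between the patterns discussed so far, here is a minimal sketch that runs each of them against two sample lines modelled on the data in the question. The helper `first_capture` is a hypothetical name introduced only for this illustration, and the leading colon on the second sample is an assumption (it may just be part of the poster's annotation).

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture of $re against $text,
# or undef when the pattern does not match.
sub first_capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# Sample lines modelled on the question's data (assumed whitespace-separated).
my @samples = (
    "plsql   tkmain_ps4      tkmain_ps4      040425",
    ":ctx     ctx_3   ctx_3   040425",
);

my @patterns = (
    [ 'original'         => qr/^(\w\s+\w\s)/   ],  # single characters only
    [ 'with quantifiers' => qr/^(\w+\s+\w+\b)/ ],  # whole "word" characters
    [ 'non-whitespace'   => qr/^(\S+\s+\S+)/   ],  # anything but whitespace
);

for my $line (@samples) {
    print "Line: $line\n";
    for my $p (@patterns) {
        my ($name, $re) = @$p;
        my $got = first_capture($re, $line);
        printf "  %-18s %s\n", $name, defined $got ? "'$got'" : 'no match';
    }
}
```

The `\w+` version only accepts letters, digits and underscores, so a line starting with punctuation such as `:ctx` does not match it, while the `\S+` version does. Which one is preferable depends on what the real file contains.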
Re: first two words-pattern matching by TilRMan (Friar) on May 12, 2004 at 06:29 UTC
    use strict;
    use warnings;

    open RERUN, "rerunlrg" or die "Can't open rerunlrg: $!";
    while (<RERUN>) {
        next if $. % 2 == 0;    # Skip the second, fourth, ... line
        if (m[(\S+)\s+(\S+)]) {
            print "Matched: $1 $2\n";
        }
    }
    close RERUN;
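Skipping on `$.` relies on the file strictly alternating between a record line and a "Not Published" line. If that cannot be guaranteed, a possible variation is to decide by content instead: anchor the pattern at the start of the line so that lines beginning with whitespace are ignored. This is only a sketch under that assumption about the data; `record_key` is a hypothetical helper name and the sample lines are made up to resemble the question's file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first two words of a record line,
# or an empty list for lines that are empty or start with whitespace.
sub record_key {
    my ($line) = @_;
    return $line =~ /^(\S+)\s+(\S+)/ ? ($1, $2) : ();
}

# Made-up sample input resembling the file in the question.
my @lines = (
    "plsql   tkmain_ps4      tkmain_ps4      040425\n",
    "        Not Published   466008691\n",
    "rdbms   tkmain_2n1      tkmain_2n1      040425\n",
    "        Not Published   465894765\n",
);

for my $line (@lines) {
    # An empty list from record_key means "not a record line": skip it.
    my @key = record_key($line) or next;
    print "Matched: @key\n";
}
```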
Re: Re: first two words-pattern matching by shamala (Acolyte) on May 12, 2004 at 06:46 UTC
Hey TilRMan, that was neat!!! Thanks.
Re: first two words-pattern matching by saskaqueer (Friar) on May 12, 2004 at 06:18 UTC
    while (<DATA>) {
        # We could use split( /\s+/, $_, 3 ) to explicitly
        # set the max split limit to 3, but split() is smart
        # enough to see we are capturing into 2 variables and
        # automatically sets the limit to 3. Smart perl!
        my ($first, $second) = split( /\s+/ );
        print "Line $.: '$first $second'\n";
    }

    __DATA__
    this is the first line right here
    line two comes next way down here
    then third line is here
    fourth comes next
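One thing to watch with the split approach: with the pattern `/\s+/`, a line that starts with whitespace (like the "Not Published" lines in the question) produces an empty leading field, so the "first word" comes back empty. The special pattern `' '` (a single space string) skips leading whitespace. A small sketch of the difference, where `first_two_words` is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: first two whitespace-separated words of a line.
# split ' ' ignores leading whitespace, unlike split /\s+/.
sub first_two_words {
    my ($line) = @_;
    my ($first, $second) = split ' ', $line;
    return ($first, $second);
}

my $indented = "        Not Published   466008691\n";

# With /\s+/ the leading whitespace yields an empty first field.
my ($re_first, $re_second) = split /\s+/, $indented;
print "split /\\s+/ : '$re_first' '$re_second'\n";

# With ' ' the leading whitespace is skipped.
my ($sp_first, $sp_second) = first_two_words($indented);
print "split ' '   : '$sp_first' '$sp_second'\n";
```

For lines that start in the first column, both forms give the same two words.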
Re: first two words-pattern matching by snadra (Scribe) on May 12, 2004 at 10:52 UTC
Hello, your file seems to be a CSV file, one that is separated not by commas but by tabs. The separator is not important, though. Maybe the Text::CSV module or other CSV modules can help you as well.

snadra

