
Regular Expression problem when Extracting Start\ VALUE \End

by gasho (Beadle)
on Sep 30, 2005 at 14:19 UTC

gasho has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to come up with universal code :) that will extract value between $StartTag value $EndTag from single line. I am having problem when special characters are involved.
    my $line = <DATA>;
    my @wanted_substrings=();

    #No Problem
    #my $StartTag='START';
    #my $EndTag='END';

    #Error Unmatched ) in regex; marked by <-- HERE in m/TRicky\(.*?)
    #my $StartTag="TRicky\\";
    #my $EndTag="\\endTricky";

    #No Error but no value VALUE
    #my $StartTag="Next\$";
    #my $EndTag="\^Next";

    #No Error but no value VALUE
    my $StartTag="Last\+";
    my $EndTag="\+some";

    if ($line=~/$StartTag(.*?)$EndTag/g)
    {
        push(@wanted_substrings,$1) ;
    }
    print join "\n", @wanted_substrings;

    __DATA__
    CharSTARTanotherENDCharTRicky\VALUEE\endTrickyNext$VALUE^NextLast+VALUE+some
    #Forgot to mention: if I do not use $StartTag or $EndTag
    #and instead use the actual string, it works.
    #Instead of
    if ($line=~/$StartTag(.*?)$EndTag/g)
    #this one works
    if ($line=~/TRicky\\(.*?)\\endTricky/g)

    #The problem is that I have to use variables because
    #I am passing the tags as arguments to my sub
    sub getInfoFromSingleLineMultiLineFile {
        #$stag,$etag are used as arguments
        my ($InputFile,$stag,$etag)=@_;
        my ($line,@wanted_substrings);
        #Opening file for reading
        open(IFH,"$InputFile") || die "Can't open file: $InputFile\n";
        while($line=<IFH>)
        {
            if ($line =~ m/$stag(.*?)$etag/g)
            {
                push(@wanted_substrings,$1);
            }
        }
        return @wanted_substrings;
    }
Thanks in advance Gasho

Re: Regular Expression problem when Extracting Start\ VALUE \End
by japhy (Canon) on Sep 30, 2005 at 14:35 UTC
    Backslashes are a pain in the back. Slash. The problem is that your regex ends up being /TRicky\(.*?)\endTricky/ because your variables interpolate. When that gets compiled as a regex, it's a problem because the trailing backslash of "TRicky\" has escaped the opening parenthesis. I would suggest using

        my $StartTag = qr/TRicky\\/;
        my $EndTag = qr/\\endTricky/;

    The qr// operator will keep things properly backslashed later, because the content is treated like a regex.
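To make the qr// suggestion concrete, here is a minimal sketch. The sample line is made up for illustration (modelled on the data in the question), and the variable names simply follow the original post.

```perl
use strict;
use warnings;

# Hypothetical sample line, modelled on the data in the question.
# In single quotes, \V and \e stay as a literal backslash plus letter.
my $line = 'CharTRicky\VALUE\endTrickyChar';

# qr// compiles each tag as a regex, so \\ means one literal backslash.
my $StartTag = qr/TRicky\\/;
my $EndTag   = qr/\\endTricky/;

# Each compiled regex is interpolated as its own group, so the trailing
# backslash of the start tag can no longer escape the opening parenthesis.
my @wanted_substrings;
if ($line =~ /$StartTag(.*?)$EndTag/) {
    push @wanted_substrings, $1;
}

print join("\n", @wanted_substrings), "\n";   # prints "VALUE"
```

Note that with qr// the tags are regexes, not plain strings, so any other metacharacters in a tag (such as + or $) still have to be escaped by hand.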

      Thank you all for quick responses
      #Works fine
      my $StartTag = qr/TRicky\\/;
      my $EndTag = qr/\\endTricky/;
Re: Regular Expression problem when Extracting Start\ VALUE \End
by philcrow (Priest) on Sep 30, 2005 at 14:31 UTC
    Why not just make sure the string is a single line (say, with split), then use extract_tagged from Text::Balanced? Unless you are just trying to teach yourself regexes, this module is ideal.


      I got an error when I tried to use Text::Balanced. I verified that I have it under /lib/Text. Thanks
      use Text::Balanced;
      $text='blabla<Else><LogEntry message="FAIL TESTCASE "/><FailTestCase/></Else>blabla';
      ($extracted, $remainder) = extract_tagged($text);
      print $extracted;
      #Error
      #Undefined subroutine &main::extract_bracketed called at C:\InstallV3\
        Text::Balanced does not export functions into the main namespace by default. This means you have two options. First, you could ask for the function by name:
            use Text::Balanced qw( extract_tagged );
            # The rest of your code from above here.
        This will bring extract_tagged into your module's namespace.

        Alternatively, you could fully qualify the name:

            use Text::Balanced;
            my $text = 'sometexthere';
            my ($extracted, $remainder) = Text::Balanced::extract_tagged($text);
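Putting the import advice together with the text from the earlier post, a sketch might look like the following. The explicit '<Else>' and '</Else>' tags and the '[^<]*' prefix pattern are my assumptions about what is wanted here. By default extract_tagged only skips leading whitespace, so the leading 'blabla' needs a prefix pattern to be skipped.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_tagged );   # import the function by name

my $text = 'blabla<Else><LogEntry message="FAIL TESTCASE "/><FailTestCase/></Else>blabla';

# Assumptions for this sketch: explicit <Else>...</Else> tags, and a prefix
# pattern that skips any leading text up to the first "<".
my ($extracted, $remainder) =
    extract_tagged($text, '<Else>', '</Else>', '[^<]*');

print defined $extracted ? "extracted: $extracted\n" : "no match\n";
print "remainder: $remainder\n";
```

In list context the function leaves $text itself unmodified and returns the extracted substring (tags included) followed by the remaining text.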
Re: Regular Expression problem when Extracting Start\ VALUE \End
by salva (Canon) on Sep 30, 2005 at 14:41 UTC
    Use quotemeta to escape special regex chars in the start and end strings:

        my $StartTag = quotemeta("Last+");
        my $EndTag = quotemeta("+some");
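For context, a double-quoted "Last\+" ends up as the string Last+, so the + reaches the regex as a quantifier. Likewise "Next\$" and "\^Next" reach it as anchors. The sketch below runs quotemeta over all four tag pairs from the question. The helper name extract_between is hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first value found between two literal
# tags, or undef if the tags do not occur in that order on the line.
sub extract_between {
    my ($line, $start, $end) = @_;
    my $s = quotemeta($start);   # escape every regex metacharacter
    my $e = quotemeta($end);
    return $line =~ /$s(.*?)$e/ ? $1 : undef;
}

# The data line from the question (single quotes: nothing is interpolated).
my $line = 'CharSTARTanotherENDCharTRicky\VALUEE\endTrickyNext$VALUE^NextLast+VALUE+some';

my @pairs = (
    [ 'START',    'END'         ],
    [ 'TRicky\\', '\\endTricky' ],   # one literal backslash each
    [ 'Next$',    '^Next'       ],
    [ 'Last+',    '+some'       ],
);

for my $pair (@pairs) {
    my $value = extract_between($line, @$pair);
    print((defined $value ? $value : '(no match)'), "\n");
}
# prints: another, VALUEE, VALUE, VALUE (one per line)
```

Because the tags are passed as plain strings and escaped inside the helper, the caller never has to think about regex syntax.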
Re: Regular Expression problem when Extracting Start\ VALUE \End
by injunjoel (Priest) on Sep 30, 2005 at 17:15 UTC
    Greetings all,
    In the spirit of TIMTOWTDI my suggestion is to use \Q\E.
        sub getInfoFromSingleLineMultiLineFile {
            #args and file opening...
            while($line = <IFH>){
                if($line =~ /\Q$stag\E(.*?)\Q$etag\E/){
                    push(@wanted_substrings,$1);
                }
            }
            return @wanted_substrings;
        }
    That should get you what you want.
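A self-contained sketch of the \Q\E approach follows, under a few assumptions of mine: the sub is renamed (get_info_from_lines is hypothetical), it takes an already-open filehandle rather than a file name, the sample data is made up, and an in-memory filehandle stands in for the real file. It also loops with /g to collect every match on a line rather than only the first, which may or may not be what is wanted.

```perl
use strict;
use warnings;

# Hypothetical variant of the sub that reads from an already-open filehandle.
sub get_info_from_lines {
    my ($fh, $stag, $etag) = @_;
    my @wanted_substrings;
    while (my $line = <$fh>) {
        # \Q...\E quotes regex metacharacters in the interpolated tags.
        # The while/g loop picks up every match on the line in turn.
        while ($line =~ /\Q$stag\E(.*?)\Q$etag\E/g) {
            push @wanted_substrings, $1;
        }
    }
    return @wanted_substrings;
}

# In-memory file standing in for the real input file (made-up sample data).
my $data = "a[+]one[-]b[+]two[-]\nc[+]three[-]\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @found = get_info_from_lines($fh, '[+]', '[-]');
close $fh;

print join("\n", @found), "\n";   # prints one, two, three (one per line)
```

\Q\E does the same escaping as quotemeta, just inline in the pattern, so the tags can be passed around as ordinary strings.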

