PerlMonks  

Regular Expression problem when Extracting Start\ VALUE \End

by gasho (Beadle)
on Sep 30, 2005 at 14:19 UTC ( [id://496428]=perlquestion )

gasho has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to come up with universal code :) that will extract the value between $StartTag and $EndTag (as in $StartTag value $EndTag) from a single line. I run into problems when special characters are involved.

my $line = <DATA>;
my @wanted_substrings=();

#No Problem
#my $StartTag='START';
#my $EndTag='END';

#Error Unmatched ) in regex; marked by <-- HERE in m/TRicky\(.*?)
#my $StartTag="TRicky\\";
#my $EndTag="\\endTricky";

#No Error but no value VALUE
#my $StartTag="Next\$";
#my $EndTag="\^Next";

#No Error but no value VALUE
my $StartTag="Last\+";
my $EndTag="\+some";

if ($line=~/$StartTag(.*?)$EndTag/g)
{
    push(@wanted_substrings,$1) ;
}
print join "\n", @wanted_substrings;
__DATA__
CharSTARTanotherENDCharTRicky\VALUEE\endTrickyNext$VALUE^NextLast+VALUE+some

#Forgot to mention: if I do not use $StartTag or $EndTag
#and instead use the actual string, it works.
#Instead of
#    if ($line=~/$StartTag(.*?)$EndTag/g)
#this one works:
#    if ($line=~/TRicky\\(.*?)\\endTricky/g)
#The problem is that I have to use variables, because
#I am passing the tags as arguments to my sub
sub getInfoFromSingleLineMultiLineFile
{
    #$stag and $etag are passed in as arguments
    my ($InputFile,$stag,$etag)=@_;
    my ($line,@wanted_substrings);
    #Opening file for reading
    open(IFH,"$InputFile") || die "Can't open file: $InputFile\n";
    while($line=<IFH>)
    {
        if ($line =~ m/$stag(.*?)$etag/g)
        {
            push(@wanted_substrings,$1);
        }
    }
    return @wanted_substrings;
}

Thanks in advance Gasho

Replies are listed 'Best First'.
Re: Regular Expression problem when Extracting Start\ VALUE \End
by japhy (Canon) on Sep 30, 2005 at 14:35 UTC
    Backslashes are a pain in the back. Slash. The problem is that your regex ends up being /TRicky\(.*?)\endTricky/ because your variables interpolate. When that gets compiled as a regex, it's a problem because the trailing backslash of "TRicky\" has escaped the opening parenthesis. I would suggest using

        my $StartTag = qr/TRicky\\/;
        my $EndTag = qr/\\endTricky/;

    The qr// operator will keep things properly backslashed later, because the content is treated like a regex.
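A minimal sketch of the difference described above, run against the sample line from the question. The variable names ($plain_start, $plain_end, $start_re, $end_re) are made up for this illustration, and the failing pattern is wrapped in eval only so the script can report the error and carry on.

```perl
use strict;
use warnings;

# Sample line from the question; single quotes prevent any interpolation.
my $line = 'CharSTARTanotherENDCharTRicky\VALUEE\endTrickyNext$VALUE^NextLast+VALUE+some';

# As plain strings, each tag holds exactly one backslash.
my $plain_start = "TRicky\\";
my $plain_end   = "\\endTricky";

# Interpolating the plain strings builds /TRicky\(.*?)\endTricky/.
# The backslash now escapes the "(", so compilation fails at run time.
my $bad_re        = eval { qr/$plain_start(.*?)$plain_end/ };
my $compile_error = defined $bad_re ? '' : $@;

# With qr// the tags are written as regexes, so \\ means one literal backslash
# and stays that way when the compiled pieces are interpolated.
my $start_re = qr/TRicky\\/;
my $end_re   = qr/\\endTricky/;
my ($value)  = $line =~ /$start_re(.*?)$end_re/;

print "compile error: $compile_error" if $compile_error;
print "value: $value\n";
```

Note that qr// only helps when the tags are written as regexes in the first place; a caller passing arbitrary literal text still needs the characters escaped.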

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      Thank you all for quick responses
      #Works fine
      my $StartTag = qr/TRicky\\/;
      my $EndTag = qr/\\endTricky/;
Re: Regular Expression problem when Extracting Start\ VALUE \End
by philcrow (Priest) on Sep 30, 2005 at 14:31 UTC
    Why not just make sure the string is a single line (say, with split) and then use extract_tagged from Text::Balanced? Unless you are just trying to teach yourself regexes, this module is ideal.

    Phil

      I got an error when I tried to use Text::Balanced. I verified that I have Balanced.pm under /lib/Text. Thanks

      use Text::Balanced;
      $text='blabla<Else><LogEntry message="FAIL TESTCASE "/><FailTestCase/></Else>blabla';
      ($extracted, $remainder) = extract_tagged($text);
      print $extracted;
      #Error
      #Undefined subroutine &main::extract_bracketed called at C:\InstallV3\Test.pl
        Text::Balanced does not export functions into the main namespace by default. This means you have two options. First, you could ask for the function by name:

        use Text::Balanced qw( extract_tagged );
        # The rest of your code from above here.

        This will bring extract_tagged into your module's namespace.

        Alternatively, you could fully qualify the name:

        use Text::Balanced;
        my $text = 'sometexthere';
        ($extracted, $remainder) = Text::Balanced::extract_tagged($text);

        Phil
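A small sketch of both options side by side. The sample text is invented for this illustration and deliberately begins with the opening tag, since extract_tagged only matches at the current position of the string (after optional leading whitespace). Each call gets its own variable so that neither call can affect the other.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_tagged );

# Option 1: the name was requested in the import list above.
my $text = '<b>bold</b> rest';
my ($extracted, $remainder) = extract_tagged($text);

# Option 2: the fully qualified name, which needs no import list.
my $text_fq = '<b>bold</b> rest';
my ($extracted_fq, $remainder_fq) = Text::Balanced::extract_tagged($text_fq);

print "imported:  $extracted\n";
print "qualified: $extracted_fq\n";
```

In list context the first element returned is the tagged substring including its outer tags, and the second is whatever follows it.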
Re: Regular Expression problem when Extracting Start\ VALUE \End
by salva (Canon) on Sep 30, 2005 at 14:41 UTC
    use quotemeta to escape special regex chars on the start and end strings:
    my $StartTag = quotemeta("Last+");
    my $EndTag = quotemeta("+some");
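This also accounts for the "no error but no value" cases in the question: inside double quotes, "Last\+" is just Last+ and "Next\$" is just Next$, so the regex sees + as a quantifier and $ as an anchor. A sketch of quotemeta applied to all four tag pairs from the question follows; between_tags is a hypothetical helper, not something from the thread.

```perl
use strict;
use warnings;

# Sample line from the question; single quotes prevent any interpolation.
my $line = 'CharSTARTanotherENDCharTRicky\VALUEE\endTrickyNext$VALUE^NextLast+VALUE+some';

# Hypothetical helper: returns the text between two literal tags,
# or undef when the pair does not occur in $text.
sub between_tags {
    my ($text, $start, $end) = @_;
    # quotemeta backslash-escapes every regex metacharacter in the tags.
    my ($start_re, $end_re) = (quotemeta($start), quotemeta($end));
    return $text =~ /$start_re(.*?)$end_re/ ? $1 : undef;
}

# Single-quoted tags hold the literal text; '\\' is one backslash.
my @pairs = (
    [ 'START',    'END'         ],
    [ 'TRicky\\', '\\endTricky' ],
    [ 'Next$',    '^Next'       ],
    [ 'Last+',    '+some'       ],
);

my @found = map { between_tags($line, @$_) } @pairs;
print join("\n", @found), "\n";
```

Escaping inside the helper means callers can pass the tags as ordinary strings and never have to think about regex syntax.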
Re: Regular Expression problem when Extracting Start\ VALUE \End
by injunjoel (Priest) on Sep 30, 2005 at 17:15 UTC
    Greetings all,
    In the spirit of TIMTOWTDI my suggestion is to use \Q\E.
    sub getInfoFromSingleLineMultiLineFile
    {
        #args and file opening...
        while($line = <IFH>){
            if($line =~ /\Q$stag\E(.*?)\Q$etag\E/){
                push(@wanted_substrings,$1);
            }
        }
        return @wanted_substrings;
    }
    That should get you what you want.
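For a version that can be run without a file on disk, here is a sketch along the same lines. It assumes a hypothetical variant of the sub, get_info_from_handle, that takes an already-open filehandle, and it reads from an in-memory string whose contents are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical variant of the sub above: reads from an open filehandle
# and returns every value found between the two literal tags.
sub get_info_from_handle {
    my ($fh, $stag, $etag) = @_;
    my @wanted_substrings;
    while (my $line = <$fh>) {
        # \Q...\E quotes regex metacharacters in the interpolated tags.
        if ($line =~ /\Q$stag\E(.*?)\Q$etag\E/) {
            push @wanted_substrings, $1;
        }
    }
    return @wanted_substrings;
}

# In-memory "file" with made-up contents; only two lines carry the tags.
my $data = "one Last+VALUE+some\n"
         . "two TRicky\\first\\endTricky\n"
         . "three Last+other+some\n";

open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @values = get_info_from_handle($fh, 'Last+', '+some');
close $fh;

print join("\n", @values), "\n";
```

\Q...\E applies the same escaping as quotemeta, just inline in the pattern, so the tags can stay as plain strings all the way from the caller.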

    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
