
Regex match last

by Anonymous Monk
on Aug 20, 2012 at 10:09 UTC (node #988422)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have the following text:

    <c:t="AD2343"/><c:p>65677676</c:p>

I want to match only <c:anything/>, i.e. only when / comes right before >. Please suggest a regex. The regex I tried was:


But it does not match. Help!

Re: Regex match last
by ww (Archbishop) on Aug 20, 2012 at 10:38 UTC


    C:\>perl -E "my $str='<c:t="AD2343"/><c:p>65677676</c:p>'; if ( $str =~ m|(<c:.*)?(?:[/>]{2})| ) {say $1;}"
    <c:t=AD2343

    Your code asks the regex engine to match a 'c', a colon, and any number of any characters after that (except newlines).
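    A small sketch of that difference, using the sample string from the thread (the variable names are illustrative): letting .* run on swallows the rest of the line, whereas [^>]* cannot step past a >, so the /> has to close the very tag that <c: opened.

```perl
use strict;
use warnings;

my $str = '<c:t="AD2343"/><c:p>65677676</c:p>';

# '.*' consumes everything after '<c:' up to the end of the line.
my ($runaway) = $str =~ m{(<c:.*)};

# '[^>]*' stops at the first '>', so '/>' must end this same tag.
my ($tag) = $str =~ m{(<c:[^>]*/>)};

print "runaway: $runaway\n";
print "tag:     $tag\n";
```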

    Because you didn't provide the actual code, we can't be sure what other issues may be in play... such as the previously mentioned use of an alternate regex delimiter.
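    For reference, a sketch of why the delimiter matters: with the default / delimiter, every literal / inside the pattern must be escaped, or Perl ends the pattern early. The two matches below should be equivalent; the braces just avoid the escape (@ or | work the same way).

```perl
use strict;
use warnings;

my $str = '<c:t="AD2343"/><c:p>65677676</c:p>';

# Default delimiter: the slash in "/>" has to be written as "\/".
my ($escaped) = $str =~ /(<c:[^>]*\/>)/;

# Alternate delimiter: with braces, "/" can appear unescaped.
my ($braced) = $str =~ m{(<c:[^>]*/>)};

print "$escaped\n$braced\n";
```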

    Update (based on the redefined problem in Re^2: Regex match last): You'll probably have fewer problems in the long run if you use an HTML/XML parser of one flavor or another, rather than trying to parse HTML with regexen.

      I am just capturing a group in the XML and replacing it with another group. I didn't think that was parsing. Is it?

        It's called parsing whenever you want to do anything with (specific bits of) the data you read :)
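    A minimal sketch of the capture-and-replace described above. The replacement is hypothetical (it just renames the c: prefix to d: on self-closing tags), since the thread never shows the real target.

```perl
use strict;
use warnings;

my $xml = '<c:t="AD2343"/><c:p>65677676</c:p>';

# Capture what lies between '<c:' and '/>' inside one tag, then rebuild it.
# The 'd:' prefix is a made-up example replacement.
(my $out = $xml) =~ s{<c:([^>]*)/>}{<d:$1/>}g;

print "$out\n";
```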
Re: Regex match last
by moritz (Cardinal) on Aug 20, 2012 at 10:18 UTC
      I want to match only <c/>, not </c> or <c>.
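    One way to express exactly that requirement, sketched against an illustrative set of sample strings: the tag name must follow < directly (which rules out </c>), and the tag must end in /> (which rules out <c>).

```perl
use strict;
use warnings;

# \b keeps 'c' from also matching longer names such as <cat/>;
# [^>]* keeps the match inside a single tag.
my $self_closing = qr{<c\b[^>]*/>};

my @samples = ('<c/>', '<c:t="AD2343"/>', '</c>', '<c>', '<cat/>');
my @matched = grep { /$self_closing/ } @samples;

print "$_\n" for @matched;
```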
Re: Regex match last
by Neighbour (Friar) on Aug 20, 2012 at 10:14 UTC
    Try this. It uses @ as an alternate regex delimiter (instead of /) and captures a single self-closing <c:.../> tag within a line.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data = '<c:t="AD2343"/><c:p>65677676</c:p>';
    if ($data =~ m@.*(<c:[^>]*/>).*@) {
        print("Match: [$1]\n");
    }
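    A note on the leading .* in that pattern: because it is greedy, it pushes the capture to the last self-closing <c:.../> on the line. A sketch with a made-up two-tag string (the <c:u="X1"/> tag is invented for the demonstration):

```perl
use strict;
use warnings;

my $data = '<c:t="AD2343"/><c:p>65677676</c:p><c:u="X1"/>';

# The greedy leading .* gives back only as much as it must: the LAST tag wins.
my ($last) = $data =~ m@.*(<c:[^>]*/>)@;

# Without it, the engine returns the FIRST (leftmost) tag it can match.
my ($first) = $data =~ m@(<c:[^>]*/>)@;

print "last:  $last\nfirst: $first\n";
```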

      This does not match the required text in the following code:

      <w:body><w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7"><w:pPr><w:pStyle w:val="Standard"/><w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r w:rsidRPr="00AD741F"/></w:body>

      I want only these to be matched:

      1. <w:pStyle w:val="Standard"/>
      2. <w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>
      3. <w:sz w:val="20"/>
      4. <w:r w:rsidRPr="00AD741F"/>

        #! perl -slw
        use strict;

        m[(<[^/>]+/>)] and print "'$1'" while <DATA>;

        __DATA__
        <w:body>
        <w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7">
        <w:pPr>
        <w:pStyle w:val="Standard"/>
        <w:rPr>
        <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>
        <w:sz w:val="20"/>
        <w:szCs w:val="20"/>
        </w:rPr>
        </w:pPr>
        <w:r w:rsidRPr="00AD741F"/>
        </w:body>


        C:\test>junk23
        '<w:pStyle w:val="Standard"/>'
        '<w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>'
        '<w:sz w:val="20"/>'
        '<w:szCs w:val="20"/>'
        '<w:r w:rsidRPr="00AD741F"/>'
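    A caveat about the [^/>]+ class in the code above: because it excludes every /, it also rejects a self-closing tag whose attribute value contains a slash. A sketch with an invented tag (<w:x w:src="a/b"/> is not from the OP's data) compares it with [^>]*, which in turn assumes no literal > inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical tag whose attribute value contains a '/'.
my $line = '<w:x w:src="a/b"/>';

my $strict_ok = $line =~ m{<[^/>]+/>} ? 1 : 0;   # fails: the '/' inside the value stops it
my $loose_ok  = $line =~ m{<[^>]*/>}  ? 1 : 0;   # matches the whole tag

print "[^/>]+ : $strict_ok\n[^>]*  : $loose_ok\n";
```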

        BTW: Your sample XML is broken. The second-level tag, <w:p ...>, is never closed, which will break strict XML parsers.


        No kidding. Your original post specified <c:.../>, and your current 'does not match' code uses <w:.../>. You asked for a tool to turn a cross-head screw into a board, and when given a cross-head screwdriver, you say 'ok, but I cannot use it to turn this slotted screw'.


        In that case, you'll want:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $data = '<w:body><w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7"><w:pPr><w:pStyle w:val="Standard"/><w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r w:rsidRPr="00AD741F"/></w:body>';
        print "Matches found:\n" . join(",\n", $data =~ m@(<[^>]*/>)@g) . "\n";
        Matches found:
        <w:pStyle w:val="Standard"/>,
        <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>,
        <w:sz w:val="20"/>,
        <w:szCs w:val="20"/>,
        <w:r w:rsidRPr="00AD741F"/>
        Show your code.
Re: Regex match last
by BillKSmith (Vicar) on Aug 20, 2012 at 13:08 UTC

    Your regex does match your sample string! (But not in the way you want.) That is a very good reason to use a module.

    A simple fix is:

    use strict;
    use warnings;
    use English;    # provides $MATCH, a readable alias for $&

    my $html = '<c:t="AD2343"/><c:p>65677676</c:p>';
    if ( $html =~ m{<c:.*/>} ) {
        print $MATCH, "\n";
    }
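    That fix works on this sample because the string contains only one />. A sketch with a hypothetical longer string shows how .* (greedy) and even .*? (non-greedy) can run across tag boundaries, while [^>]* cannot:

```perl
use strict;
use warnings;

# Hypothetical input: a non-self-closing tag followed by two self-closing ones.
my $html = '<c:p>1</c:p><c:t="A"/><c:u="B"/>';

my ($greedy) = $html =~ m{(<c:.*/>)};     # starts at <c:p>, runs to the LAST "/>"
my ($lazy)   = $html =~ m{(<c:.*?/>)};    # still starts at <c:p>, stops at the first "/>"
my ($tight)  = $html =~ m{(<c:[^>]*/>)};  # cannot cross a '>', so <c:p> is skipped

print "greedy: $greedy\nlazy:   $lazy\ntight:  $tight\n";
```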
Re: Regex match last
by Anonymous Monk on Aug 20, 2012 at 12:45 UTC
    How many times will the substring appear? Not many? Great. Put it in a while-loop with the 'g' modifier and be done.
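    A sketch of that suggestion: in scalar context, m//g resumes from where the previous match ended, so a while loop visits each self-closing tag in turn, and the core pos() function reports that position. The input is a shortened excerpt of the OP's WordprocessingML sample.

```perl
use strict;
use warnings;

my $xml = '<w:pPr><w:pStyle w:val="Standard"/><w:rPr>'
        . '<w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr>';

my @found;
# Each iteration continues the search at pos($xml), just past the previous match.
while ( $xml =~ m{(<[^>]*/>)}g ) {
    push @found, $1;
    print "$1 (ends at offset ", pos($xml), ")\n";
}
```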

Node Type: perlquestion [id://988422]
Approved by moritz