 
PerlMonks  

Regex match last

by Anonymous Monk
on Aug 20, 2012 at 10:09 UTC ( #988422=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have the following text:

<c:t="AD2343"/><c:p>65677676</c:p>

I want to match only tags of the form <c:anything/>, i.e. only when / comes immediately before >. Please suggest a regex. The regex I tried was:

<c:.*[/$]>

But it is not matching. Help

Re: Regex match last
by Neighbour (Friar) on Aug 20, 2012 at 10:14 UTC
    Try this. It uses @ as the regex delimiter (instead of /), so the / inside the pattern needs no escaping, and it captures a single tag within a line.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data = '<c:t="AD2343"/><c:p>65677676</c:p>';
    if ($data =~ m@.*(<c:[^>]*/>).*@) {
        print("Match: [$1]\n");
    }

      This does not match the required text in the following XML:

      <w:body><w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7"><w:pPr><w:pStyle w:val="Standard"/><w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r w:rsidRPr="00AD741F"/></w:body>

      I want only these to be matched:

      1. <w:pStyle w:val="Standard"/>
      2. <w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>
      3. <w:sz w:val="20"/>
      4. <w:r w:rsidRPr="00AD741F"/>
        show your code

        #! perl -slw
        use strict;

        m[(<[^/>]+/>)] and print "'$1'" while <DATA>;

        __DATA__
        <w:body>
        <w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7">
        <w:pPr>
        <w:pStyle w:val="Standard"/>
        <w:rPr>
        <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>
        <w:sz w:val="20"/>
        <w:szCs w:val="20"/>
        </w:rPr>
        </w:pPr>
        <w:r w:rsidRPr="00AD741F"/>
        </w:body>

        Outputs:

        C:\test>junk23
        '<w:pStyle w:val="Standard"/>'
        '<w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>'
        '<w:sz w:val="20"/>'
        '<w:szCs w:val="20"/>'
        '<w:r w:rsidRPr="00AD741F"/>'

        BTW: Your sample XML is broken. The second-level tag, <w:p ...>, is never closed, which will break strict XML parsers.



        In that case, you'll want:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $data = '<w:body><w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7"><w:pPr><w:pStyle w:val="Standard"/><w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r w:rsidRPr="00AD741F"/></w:body>';
        print "Matches found:\n" . join (",\n", $data =~ m@(<[^>]*/>)@g) . "\n";
        Output:
        Matches found:
        <w:pStyle w:val="Standard"/>,
        <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>,
        <w:sz w:val="20"/>,
        <w:szCs w:val="20"/>,
        <w:r w:rsidRPr="00AD741F"/>
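
The two patterns posted in this sub-thread, m[(<[^/>]+/>)] and m@(<[^>]*/>)@, give the same results on the sample data, but they are not equivalent. A minimal sketch of where they part ways, assuming a tag with a slash inside an attribute value (the <w:link .../> tag below is hypothetical and does not appear in the thread):

```perl
use strict;
use warnings;

# Hypothetical tag (not from the thread) with a '/' inside an attribute value.
my $tag = '<w:link w:target="docs/readme"/>';

# [^/>]+ cannot cross any '/', so the only '/' allowed in the tag
# is the one directly before the closing '>'.
my $strict = $tag =~ m{(<[^/>]+/>)} ? $1 : 'no match';

# [^>]* may cross '/' characters and backtracks to the final "/>".
my $loose = $tag =~ m{(<[^>]*/>)} ? $1 : 'no match';

print "class without '/' and '>': $strict\n";
print "class without '>' only:    $loose\n";
```

Which behaviour is preferable depends on the data. Neither pattern knows about quoting, so a literal > inside an attribute value would confuse both.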

        No kidding. Your original post specified <c:.../> and your current 'does not match' code uses <w:.../>. You asked for a tool to turn a cross-head screw into a board, and when given a cross-head screwdriver say 'ok, but I cannot use it to turn this slotted screw'.

        --MidLifeXis

Re: Regex match last
by moritz (Cardinal) on Aug 20, 2012 at 10:18 UTC
      I want to match only <c/>, not </c> or <c>.
Re: Regex match last
by ww (Bishop) on Aug 20, 2012 at 10:38 UTC

    Alternate:

    C:\>perl -E "my $str='<c:t="AD2343"/><c:p>65677676</c:p>'; if ( $str =~ m|(<c:.*)?(?:[/>]{2})| ) {say $1;}"
    <c:t=AD2343

    Your code asks the regex engine to match a 'c', a colon and any number of anything thereafter (except newlines).
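
To make the "any number of anything" point concrete, here is a small sketch using the sample string from the original question. It only contrasts a greedy .* with a negated character class; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $data = '<c:t="AD2343"/><c:p>65677676</c:p>';

# Greedy: .* first runs to the end of the string, then backtracks only
# as far as it must, so the match ends at the LAST '>' it can reach.
my ($greedy) = $data =~ m{(<c:.*>)};

# Negated class: [^>]* can never cross a '>', so the match stays
# inside a single tag and must end in "/>".
my ($one_tag) = $data =~ m{(<c:[^>]*/>)};

print "greedy:  $greedy\n";
print "one tag: $one_tag\n";
```

On this input the greedy pattern swallows the whole string, while the negated class stops at <c:t="AD2343"/>.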

    Because you didn't provide the actual code, we can't be sure just what other issues may be in play... such as the previously mentioned use of an alternate regex marker.

    Update (based on the redefined problem in Re^2: Regex match last): You'll probably have fewer problems in the long run if you use an html parser of one flavor or another, rather than trying to parse html with regexen.

      I am just capturing a group in the XML and replacing it with another group. I didn't think that was parsing. Is it?

        It's called parsing whenever you want to do anything with (specific bits of) the data you read :)
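
For the capture-and-replace case described above, a rough sketch with s///g might look like the following. The replacement (renaming the c: prefix to d:) is purely hypothetical, since the thread never says what the replacement should be, and the caveats about using regexes on XML still apply.

```perl
use strict;
use warnings;

my $xml = '<c:t="AD2343"/><c:p>65677676</c:p>';

# Copy first, then substitute on the copy so $xml stays untouched.
# $1 holds everything between "<c:" and the closing "/>".
# The "d:" prefix is a made-up replacement for illustration only.
(my $out = $xml) =~ s{<c:([^>]*)/>}{<d:$1/>}g;

print "$out\n";
```

Only the self-closing tag is rewritten; <c:p> and </c:p> are left alone because neither ends in />.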
Re: Regex match last
by Anonymous Monk on Aug 20, 2012 at 12:45 UTC
    How many times will the substring appear? Not many? Great. Put it in a while-loop with the 'g' modifier and be done.
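
A minimal sketch of that while-loop approach, assuming the <c:.../> form from the original question. The trailing <c:q/> in the test string is an addition made here so that the loop has more than one match to find.

```perl
use strict;
use warnings;

# Sample from the question, plus a hypothetical extra <c:q/> at the end.
my $data = '<c:t="AD2343"/><c:p>65677676</c:p><c:q/>';

my @found;

# In scalar context, /g resumes from where the previous match ended,
# so each pass through the loop picks up the next self-closing tag.
while ($data =~ m{(<c:[^>]*/>)}g) {
    push @found, $1;
}

print "$_\n" for @found;
```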
Re: Regex match last
by BillKSmith (Hermit) on Aug 20, 2012 at 13:08 UTC

    Your regex does match your sample string! (But not in the way you want.) A very good reason to use a module.

    A simple fix is:

    use strict;
    use warnings;
    use English;

    my $html = '<c:t="AD2343"/><c:p>65677676</c:p>';
    $html =~ m{<c:.*/>};
    print $MATCH, "\n";
    Bill
