Regex problem

by mboudreau (Acolyte)
on Jun 10, 2010 at 21:27 UTC (#844127)
mboudreau has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have been staring at this problem all day, and my error refuses to get up and wave its arms at me.

I'm writing a Perl script to munge an XML DTD (for reasons we needn't go into). One of the script's tasks is to find parameter entity definitions that are empty and remove the references to those entities from any element definitions.

    # $entity = the name of the entity (already initialized)
    # $text = the full DTD (already initialized)

    # comment out the empty entity definition
    # e.g., <!ENTITY % foo " " >
    # so far this section works fine, wrapping the empty entity definition
    # in comment tags
    if ( $text =~ /(<!ENTITY\s+%\s+$entity\s+\"\s+\"\s+>)/ ) {
        print "Commenting out empty entity '$entity'\n";
        my $entity_def = $1;
        $text =~ s/$entity_def/<!--\n$entity_def\n-->/;
    }

    # here's the problem: I want to find
    # <!ELEMENT bar (#PCDATA %foo;)* >
    # and remove the "%foo;"
    if ( $text =~ /(<!ELEMENT\s+(\S+)\s+\(.*?$entity.+?\).+?>)/ ) {
        my $element_def = $1;
        my $element_name = $2;
        print "Found empty entity '$entity' in $element_name "
            . "content model: $element_def\n";
        my $new_element_def = $element_def;
        $new_element_def =~ s/\|?\s*%$entity;//;
        print "Looking for $element_def\n";
        # this NEVER PRINTS--WHY?
        print "Found it!\n" if $text =~ /$element_def/;
    }

I always manage to capture the element definition in $element_def, but my final regex never works, and I can't figure out why.

Re: Regex problem (XML)
by toolic (Bishop) on Jun 10, 2010 at 23:17 UTC
    Update: Jenda is correct in that I jumped the gun on this one. I presumed an XML parser would be able to handle this, but I have no proof. I will defer to Jenda's expertise in this matter. I apologize for making an irresponsible assertion.

    Use an XML parser.

    XML parsing vs Regular expressions

      Now I would be most interested in seeing how you would use an XML parser to modify a DTD, especially considering that a DTD is not XML. Most interested.

      Sometimes shouting "use an XML parser" whenever you notice the "XML" keyword doesn't cut it.

      Enoch was right!
      Enjoy the last years of Rome.

Re: Regex problem
by choroba (Bishop) on Jun 10, 2010 at 21:50 UTC
    I am not sure what element definitions look like in DTDs, but wouldn't /\Q$element_def/ solve your problem?
      Bingo! I was ignoring the fact that the element definition contains regex metacharacters. I haven't had an occasion to use \Q before. Thanks!
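
A minimal sketch of why the unquoted match fails and how \Q changes it. The sample values of $entity, $element_def, and $text are made up for illustration, modelled on the <!ELEMENT bar ...> example in the question; they are not the poster's real DTD. Interpolated without quoting, the "(...)*" in the element definition is read as a repeated group rather than as literal parentheses and an asterisk, so the pattern cannot match its own source text.

```perl
use strict;
use warnings;

# Hypothetical sample data modelled on the example in the question.
my $entity      = 'foo';
my $element_def = '<!ELEMENT bar (#PCDATA %foo;)* >';
my $text        = "<!-- sample DTD -->\n$element_def\n";

# Unquoted: "(...)*" is treated as a repeated group, not as literal text.
my $plain_match = $text =~ /$element_def/ ? 1 : 0;

# Quoted: \Q...\E backslash-escapes the metacharacters in the
# interpolated string, so it is matched literally.
my $quoted_match = $text =~ /\Q$element_def\E/ ? 1 : 0;

# Build the new definition as in the question, then swap it into a copy
# of the DTD, again quoting the search string.
( my $new_element_def = $element_def ) =~ s/\|?\s*%$entity;//;
( my $new_text = $text ) =~ s/\Q$element_def\E/$new_element_def/;

print "plain: $plain_match, quoted: $quoted_match\n";
print $new_text;
```

With this sample data the plain match fails and the quoted one succeeds, and the element definition in $new_text loses its "%foo;" reference. The built-in quotemeta function does the same escaping as \Q if you would rather prepare the string before building the pattern.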
