Regex problem

by mboudreau (Novice)
on Jun 10, 2010 at 21:27 UTC (#844127)
mboudreau has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have been staring at this problem all day, and my error refuses to get up and wave its arms at me.

I'm writing a Perl script to munge an XML DTD (for reasons we needn't go into). One of the script's tasks is to find parameter entity definitions that are empty and remove the references to those entities from any element definitions.

    # $entity = the name of the entity (already initialized)
    # $text = the full DTD (already initialized)

    # comment out the empty entity definition
    # e.g., <!ENTITY % foo " " >
    # so far this section works fine, wrapping the empty entity definition
    # in comment tags
    if ( $text =~ /(<!ENTITY\s+%\s+$entity\s+\"\s+\"\s+>)/ ) {
        print "Commenting out empty entity '$entity'\n";
        my $entity_def = $1;
        $text =~ s/$entity_def/<!--\n$entity_def\n-->/;
    }

    # here's the problem: I want to find
    # <!ELEMENT bar (#PCDATA %foo;)* >
    # and remove the "%foo;"
    if ( $text =~ /(<!ELEMENT\s+(\S+)\s+\(.*?$entity.+?\).+?>)/ ) {
        my $element_def = $1;
        my $element_name = $2;
        print "Found empty entity '$entity' in $element_name "
            . "content model: $element_def\n";
        my $new_element_def = $element_def;
        $new_element_def =~ s/\|?\s*%$entity;//;
        print "Looking for $element_def\n";
        # this NEVER PRINTS--WHY?
        print "Found it!\n" if $text =~ /$element_def/;
    }

I always manage to capture the element definition in $element_def, but my final regex never works, and I can't figure out why.

Re: Regex problem
by choroba (Abbot) on Jun 10, 2010 at 21:50 UTC
    I am not sure what element definitions look like in DTDs, but would not /\Q$element_def/ solve your problem?
      Bingo! I was ignoring the fact that the element definition contains regex metacharacters. I haven't had an occasion to use \Q before. Thanks!
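For readers who have not used \Q either, here is a minimal sketch of the difference it makes. The sample DTD text is made up for illustration (it reuses the foo/bar example from the question), and the variable names simply mirror the original script; it is not the poster's actual code. The idea is that the captured element definition contains ( ) and *, which are regex syntax unless they are quoted.

```perl
use strict;
use warnings;

# Hypothetical sample data; variable names mirror the original post.
my $entity = 'foo';
my $text   = qq{<!ENTITY % foo " " >\n<!ELEMENT bar (#PCDATA %foo;)* >\n};

# Capture the element definition, much as the question does.
my ($element_def) = $text =~ /(<!ELEMENT\s+\S+\s+\(.*?\Q$entity\E.+?\).+?>)/
    or die "no element definition references '$entity'\n";

# Unquoted: the ( ) and * in the captured text are treated as regex syntax.
my $found_raw = $text =~ /$element_def/ ? 1 : 0;

# Quoted with \Q...\E: every character is matched literally.
my $found_quoted = $text =~ /\Q$element_def\E/ ? 1 : 0;

# Build the new definition and swap it into the DTD text.
( my $new_element_def = $element_def ) =~ s/\|?\s*%\Q$entity\E;//;
$text =~ s/\Q$element_def\E/$new_element_def/;

print "raw match: $found_raw, quoted match: $found_quoted\n";
print $text;
```

With this sample data the unquoted match fails and the quoted one succeeds. The same quoting is probably worth applying to $entity_def in the first substitution of the question; it only works there because that particular string happens to contain no metacharacters.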
Re: Regex problem (XML)
by toolic (Chancellor) on Jun 10, 2010 at 23:17 UTC
    Update: Jenda is correct in that I jumped the gun on this one. I presumed an XML parser would be able to handle this, but I have no proof. I will defer to Jenda's expertise in this matter. I apologize for making an irresponsible assertion.

    Use an XML parser.

    XML parsing vs Regular expressions

      Now I would be most interested in seeing how you would use an XML parser to modify a DTD. Especially considering the fact that a DTD is not XML. Most interested.

      Sometimes shouting "use an XML parser" whenever you notice the "XML" keyword doesn't cut it.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.
