Search and replace again

by Anonymous Monk
on Apr 19, 2010 at 21:06 UTC (#835594=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I thought I had my regular expressions and one-liners under control, but I found that I couldn't solve this relatively simple problem. Back to newbie status!

I have a file with lots of XML in it. I am looking for a particular string that might occur inside a tag, and I need to delete that string if it occurs. For example, if the XML reads like this:

    <image attrib="one" attrib2="two" scope="local"/>

I want to delete the scope="local" string. If this string occurs outside the context of the <image> tag, then I do NOT want to touch it. Ideas?

Replies are listed 'Best First'.
Re: Search and replace again
by kennethk (Abbot) on Apr 19, 2010 at 21:19 UTC
    What have you tried? What didn't work? See "How do I post a question effectively?".

    When dealing with XML, it's generally easier to use pre-rolled solutions like XML::Simple or XML::Twig. However, you can accomplish your goal with a negated character class. Note that this assumes > does not occur inside an attribute value; I don't remember which characters are valid in attribute strings:

    my $string = '<image attrib="one" attrib2="two" scope="local"/>';
    $string =~ s/(<image\s[^>]*?)scope="local"/$1/g;
    print "$string\n";
    __END__
    <image attrib="one" attrib2="two" />

    See perlretut for more info on character classes.
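The substitution above leaves the space that preceded the attribute, and it would also cut into an attribute whose name merely ends in "scope" (for example myscope="local"). A minimal sketch of a slightly stricter variant follows. It assumes, like the original, that > never occurs inside an attribute value and that the value is double-quoted; the helper name strip_local_scope is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes scope="local" only inside <image ...> tags.
# Still a regex sketch, not an XML parser.
sub strip_local_scope {
    my ($xml) = @_;
    # (?=\s) : the tag name must be exactly "image" (not "images", etc.)
    # [^>]*? : stay inside the current tag (assumes no ">" in attribute values)
    # \s+    : require whitespace before "scope" and remove it with the attribute
    $xml =~ s/(<image(?=\s)[^>]*?)\s+scope="local"/$1/g;
    return $xml;
}

print strip_local_scope('<image attrib="one" attrib2="two" scope="local"/>'), "\n";
print strip_local_scope('<other scope="local"/>'), "\n";
print strip_local_scope('<image myscope="local"/>'), "\n";
```

Only the first line printed differs from its input; the other two are passed through untouched because the attribute is either outside an image tag or is not named exactly scope.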

      use strict;
      use warnings;
      use XML::Twig qw( );

      binmode STDOUT;

      my $t = XML::Twig->new(
          twig_handlers => {
              # Called for every <image> element whose scope attribute is "local"
              'image[@scope="local"]' => sub {
                  $_->del_att('scope');
              },
          },
      );
      $t->parsefile($ARGV[0]);
      $t->flush();
Re: Search and replace again
by choroba (Bishop) on Apr 19, 2010 at 21:19 UTC
      use strict;
      use warnings;
      use XML::LibXML qw( );

      my $parser = XML::LibXML->new();
      my $doc    = $parser->parse_file($ARGV[0]);
      my $root   = $doc->documentElement();

      # Select every <image> element whose scope attribute is "local"
      for my $node ($root->findnodes('//image[@scope="local"]')) {
          $node->removeAttribute('scope');
      }

      binmode STDOUT;
      print $doc->toString();

      I'm not even gonna try an XML::Simple solution.

        I was curious why you binmode STDOUT in your code (and in your other example) - is it to ensure you have UNIX line endings (and not CRLFs) in the output if Perl is running on Windows?

Re: Search and replace again
by Anonymous Monk on Apr 19, 2010 at 23:50 UTC
    Back to newbie status!

    A newbie generally accepts the advice NOT to use regex to parse xml.
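To make that advice concrete, here is a small sketch of inputs on which the regex from the earlier reply goes wrong. The helper name regex_fix and the sample inputs are made up for illustration. In XML, > is legal inside an attribute value (only < and & must be escaped), attribute values may use single quotes, and markup inside a comment is not an element at all.

```perl
use strict;
use warnings;

# The substitution from the earlier reply, wrapped in a hypothetical helper.
sub regex_fix {
    my ($xml) = @_;
    $xml =~ s/(<image\s[^>]*?)scope="local"/$1/g;
    return $xml;
}

# Illustrative inputs only; each one is well-formed XML content.
my @cases = (
    [ 'gt in value'   => '<image title="a > b" scope="local"/>' ],
    [ 'single quotes' => q{<image scope='local'/>} ],
    [ 'in a comment'  => '<!-- <image scope="local"/> -->' ],
);

for my $case (@cases) {
    my ($label, $input) = @$case;
    printf "%-14s %s\n", $label, regex_fix($input);
}
```

The first two inputs come back unchanged even though a parser would see a scope attribute with the value local, and the third is edited even though it is only a comment. The parser-based replies in this thread do not have these problems.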

Re: Search and replace again
by Jenda (Abbot) on Apr 20, 2010 at 11:22 UTC
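    use strict;
    use XML::Rules;

    # Assumption: input and output file names are given on the command line
    my ($input_file, $output_file) = @ARGV;

    my $parser = XML::Rules->new(
        style => 'filter',
        rules => {
            _default => 'raw',   # copy everything else through unchanged
            image    => sub {
                my ($tag, $attr) = @_;
                # Guard against <image> tags that have no scope attribute
                delete $attr->{scope}
                    if defined $attr->{scope} and $attr->{scope} eq 'local';
                return $tag => $attr;
            },
        },
    );
    $parser->filterfile($input_file, $output_file);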

    Enoch was right!
    Enjoy the last years of Rome.

Node Type: perlquestion [id://835594]
Approved by kennethk