PerlMonks  

Re: how to strip XML into Plain Text file

by sleepingsquirrel (Hermit)
on Jan 26, 2005 at 00:21 UTC (#425086)


in reply to how to strip XML into Plain Text file

perl -p -e 's/<[^>]*>//g' <foo.xml


-- All code is 100% tested and functional unless otherwise noted.


Re^2: how to strip XML into Plain Text file
by Fletch (Chancellor) on Jan 26, 2005 at 01:00 UTC

    ... <img alt="Next >>" src="../next_button.jpg" />*Boom*

    And this is why you use a real parser, not just a regex . . .

    Update: Just to clarify, the above is a pathological case. If you're reasonably sure it won't occur, go ahead and use the simple s///; but be aware that it's not bulletproof, and know where to find the right tool when the sledgehammer doesn't cut it any more.

      Since we're being pedantic about it, is '>' actually allowed inside attribute values in XML?

        xmllint doesn't gripe about it:

        freebie:~ 677> cat foo.xml
        <?xml version="1.0" encoding="utf8" ?>
        <testing>
        <img alt="Next >>" src="../next_button.jpg" />
        </testing>
        freebie:~ 678> xmllint --noout foo.xml
        freebie:~ 679>

        Yes. Only < is not.

        Makeshifts last the longest.
