PerlMonks  

Re: problem with removing something in XML file

by Sandy (Deacon)
on Sep 18, 2009 at 16:47 UTC (#796161)


in reply to problem with removing something in XML file

Normally, one should follow the advice already given before asking for more answers, but nonetheless...

I don't know why your regular expression is so complicated.

Assuming that every <REF ...> tag is always complete on a single line, a simple substitution will do.

XML File Before

</S></TEXT><TEXT><S Entail="142" s_id="0"> Annan urges return to democracy in <REF C-ENTID="Nepal" EXT="Nepal" ID="104" S&#1058;YPE="PROPNAME">Nepal</REF></S>
<S Entail="138-139-142" s_id="1"> UN Secretary General Kofi Annan on Tuesday expressed deep concern over events in <REF A-CLASS="No-Reference" A-REFTYPE="Entity" C-ENTID="Nepal" EXT="Nepal" ID="105" S&#1058;YPE="PROPNAME">Nepal</REF> and urged a return to democracy, after <REF C-ENTID="King Gyanendra Bir Bikram" COMMENT="Coref direction is forward" EXT="King Gyanendra Bir Bikram" ID="100" S&#1058;YPE="APNAME"> King Gyanendra Bir Bikram</REF> dismissed <REF A-CLASS="Entity-Entity" A-DIR="Backward" A-RELTYPE="Identity" A-RESTYPE="Intra" A-TYPE="Referential" ANT-ID="105" ID="101"> the country</REF> 's coalition government and imposed an indefinite state of emergency. </S><S Entail="138-139-143" s_id="2">
perl one-liner (on DOS)
perl -pi.bak -e "s/<\/?REF.*?>//ig" junk.txt
Result:
</S></TEXT><TEXT><S Entail="142" s_id="0"> Annan urges return to democracy in Nepal</S>
<S Entail="138-139-142" s_id="1"> UN Secretary General Kofi Annan on Tuesday expressed deep concern over events in Nepal and urged a return to democracy, after King Gyanendra Bir Bikram dismissed the country 's coalition government and imposed an indefinite state of emergency. </S><S Entail="138-139-143" s_id="2">
Sandy

UPDATE: This also assumes that there is no embedded ">" inside a REF tag (for example, within an attribute value).
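
If either assumption does not hold (a REF tag spans several lines, or an attribute value contains ">"), a slightly stricter pattern may help. The following is only a minimal sketch, not a replacement for a real XML parser: the helper name strip_ref_tags and the sample string are made up for illustration, and it assumes the whole input is held in one string and that attribute values are properly quoted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): removes opening and
# closing REF tags from a string and keeps the text between them.
sub strip_ref_tags {
    my ($xml) = @_;
    $xml =~ s{
        </?REF\b              # "<REF" or "</REF", but not e.g. "<REFERENCE"
        (?: "[^"]*"           # a double-quoted attribute value, may contain ">"
          | '[^']*'           # a single-quoted attribute value
          | [^>"']            # any other character before the closing ">"
        )*
        >
    }{}gxi;
    return $xml;
}

# Made-up sample: a REF tag split over two lines, with ">" inside a value.
my $sample = qq{events in <REF ID="105"\n COMMENT="a > b">Nepal</REF> and};
print strip_ref_tags($sample), "\n";
```

Because the quoted alternatives also match newlines, this only works across line breaks when the input is read as one string, for instance by running perl with -0777 so that each file is slurped whole instead of being processed line by line.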

