Re^3: Need a regex to replace incomplete html entities

by Laurent_R (Canon)
on Nov 20, 2016 at 12:02 UTC


in reply to Re^2: Need a regex to replace incomplete html entities
in thread Need a regex to replace incomplete html entities

If I understand you correctly, the important difference is the semicolon: you want to replace &#38, but not when it is followed by a semicolon (i.e. you don't want to replace &#38;). The poor formatting in your post made that difficult to understand.

The easy solution is to use a negative look-ahead, as already suggested in other posts, but I doubt that sed supports look-ahead assertions: POSIX basic and extended regular expressions do not define them, though it may depend on which version you have.

Besides, a 200 MB file should not be a problem for Perl. The last time I compared Perl and sed, I found no significant performance difference between them, although this may again depend on which sed implementation you are using.
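For a large file, Perl's -p switch processes the input one line at a time, so memory use should stay small as long as the file has ordinary line breaks. A sketch of an in-place run, assuming a bare &#38 should be completed to &#38; (the helper name fix_file_in_place and the demo file are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: fix one file in place and keep a backup copy
# with a .bak suffix. -p loops over the input line by line, and
# -i.bak writes the result back under the original file name.
fix_file_in_place() {
    perl -i.bak -pe 's/&#38(?!;)/&#38;/g' "$1"
}

# Demo on a small temporary file standing in for the real 200 MB one.
demo_file=$(mktemp)
printf '%s\n' '<Remarks>Tom &#38 Jerry</Remarks>' > "$demo_file"
fix_file_in_place "$demo_file"
cat "$demo_file"
rm -f "$demo_file" "$demo_file.bak"
```

Keeping the .bak copy makes it easy to compare the original and the fixed file before handing the result to the XML parser.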

Replies are listed 'Best First'.
Re^4: Need a regex to replace incomplete html entities
by Chris Daniel (Novice) on Nov 20, 2016 at 14:45 UTC
    You got it right, Laurent. Thanks for the update and the look-ahead assertion.

    The reason I am focusing on a sed command is that I want to parse an XML file that contains multiple similar <Remarks> tags.

    But since the file contains incomplete HTML entities, the parser is not able to parse it.
    Hence I was planning to use a sed command to replace those entities and then parse the file.
