
Re: scalable chomping

by ccn (Vicar)
on Oct 29, 2008 at 14:02 UTC (#720229)

in reply to scalable chomping

Read the file record by record by setting $/ = '+', so that each read returns one '+'-terminated record rather than one line.

    Apply regexps to each record to remove \n and to replace the record separators.

Here is a one-liner as an example:

perl -l -0x2B -pe 's/\n//g;s/[XYZ]/;/g' corruptedfile > recoveredfile
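where 'X', 'Y', and 'Z' stand for the characters to be replaced with the record separator ';'. Here -0x2B sets the input record separator to '+' (hex 2B), and -l, because it is given before -0, chomps that '+' from each record and prints each record followed by a newline.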
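For anyone who prefers a script to a one-liner, the same steps can be sketched roughly as below. The function name recover_stream and the sample data are made up for illustration, and 'X', 'Y', 'Z' are still only stand-ins for the real corrupted characters.

```perl
use strict;
use warnings;

# Read '+'-terminated records from $in, clean each one, and write it to
# $out followed by a newline (roughly what -l -0x2B -p does above).
sub recover_stream {
    my ($in, $out) = @_;
    local $/ = '+';                  # input record separator, as with -0x2B
    while (my $record = <$in>) {
        chomp $record;               # strips the trailing '+', if present
        $record =~ s/\n//g;          # remove embedded newlines
        $record =~ s/[XYZ]/;/g;      # X, Y, Z are placeholders
        print {$out} $record, "\n";
    }
    return;
}

# Demo on an in-memory sample (made-up data, not from the original file).
my $sample = "aXb\nYc+dZ\ne+";
open my $in, '<', \$sample or die "cannot open in-memory input: $!";
my $recovered = '';
open my $out, '>', \$recovered or die "cannot open in-memory output: $!";
recover_stream($in, $out);
close $out;
print $recovered;                    # prints "a;b;c" and "d;e" on separate lines
```

To run it on real files, open the corrupted file for reading and the recovered file for writing, and pass those two handles instead of the in-memory ones.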

Re^2: scalable chomping
by TGI (Parson) on Oct 29, 2008 at 17:14 UTC

    If X, Y, and Z can legitimately appear in the file, you are going to have to do more work. Keep track of the values in which you have "fixed" a substitution, and of what the original character was. You will then have a list of 'known suspect values' as well as a way to get back the original value.
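One possible way to keep such a record is sketched below; it is an illustration only, not something taken from the thread. The names fix_and_log and restore_original and the shape of the log entries are hypothetical. Offsets refer to the record as it is passed in, so the record should go through the other cleanup steps first.

```perl
use strict;
use warnings;

# Replace X, Y, Z with ';' and remember where each change was made and
# what the original character was.
sub fix_and_log {
    my ($record, $record_no, $log) = @_;
    $record =~ s{([XYZ])}{
        push @$log, { record => $record_no, offset => $-[0], original => $1 };
        ';'
    }ge;
    return $record;
}

# Undo the logged substitutions for one record.
sub restore_original {
    my ($fixed, $record_no, $log) = @_;
    for my $entry (grep { $_->{record} == $record_no } @$log) {
        substr($fixed, $entry->{offset}, 1) = $entry->{original};
    }
    return $fixed;
}

# Demo with made-up data.
my @log;
my $fixed    = fix_and_log("aXbYc", 1, \@log);
my $restored = restore_original($fixed, 1, \@log);
print "$fixed\n";                    # prints "a;b;c"
print "$restored\n";                 # prints "aXbYc"
```

Because each substitution swaps one character for one character, the logged offsets stay valid in the fixed record, which is what makes the restore step this simple.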

    The best approach (short of retrieval from a backup) would be to do as much parsing and sanity checking on the data as you can while you process the file. Trivial or obvious fixes can be automated, but anything questionable needs to be flagged for human intervention.
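A minimal sketch of that kind of triage follows. The function name triage and the "printable ASCII only" rule are assumptions made up for illustration; a real check would depend on what the records are supposed to contain.

```perl
use strict;
use warnings;

# Split records into those that pass a caller-supplied check and those
# that should be looked at by a human.
sub triage {
    my ($records, $looks_sane) = @_;
    my (@accepted, @flagged);
    for my $i (0 .. $#$records) {
        if ($looks_sane->($records->[$i])) {
            push @accepted, $records->[$i];
        }
        else {
            push @flagged, { record => $i + 1, text => $records->[$i] };
        }
    }
    return (\@accepted, \@flagged);
}

# Hypothetical rule: a sane record is non-empty and printable ASCII only.
my $looks_sane = sub { $_[0] =~ /\A[\x20-\x7E]+\z/ };

# Demo with made-up records: one clean, one with a NUL byte, one empty.
my ($accepted, $flagged) = triage(["a;b;c", "d\x00e", ""], $looks_sane);
printf "%d accepted, %d flagged for review\n",
    scalar @$accepted, scalar @$flagged;   # prints "1 accepted, 2 flagged for review"
```

Keeping the record number with each flagged entry makes it easier to find the questionable spot in the original file later.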

    Good luck. I think you'll need it :/.

    TGI says moo
