
Re: Dirtiest Data

by jesuashok (Curate)
on Jun 13, 2006 at 13:27 UTC ( #555035=note )

in reply to Dirtiest Data

Hi monks,

I have also had a terrible experience with dirty data. These are the issues I faced:
1) Records were split across multiple lines: sometimes 3 lines, sometimes more. I solved this by making my script smarter: it finds a unique identifying feature in the data file and uses it to put each record back together.

2) The date field in the data file was horrible. Sometimes it was mmddyy, sometimes yymmdd, and so on. This data file drove us mad, and we found it very difficult to load the data into Oracle, because Oracle will not load invalid dates. Perl helped a lot in solving all of these issues.
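Both fixes can be sketched roughly as below. This is only an illustration under assumptions, not the script I actually used: the record-start pattern (`ID` plus digits and a pipe), the sub names `join_records`, `valid_ymd` and `normalize_date`, and the century pivot (00-49 means 20xx, 50-99 means 19xx) are all hypothetical. A six-digit date that is valid in both readings cannot be resolved by the script alone, so the sketch returns undef for it instead of guessing.

```perl
use strict;
use warnings;

# Assumption: every record starts with a recognizable key, here a
# hypothetical "ID" followed by digits and a pipe. The real marker
# depends on the data file.
my $RECORD_START = qr/^ID\d+\|/;

# Join physical lines into logical records. A line matching the
# record-start pattern opens a new record; any other line is appended
# to the record before it.
sub join_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;    # strip the line ending
        if ($line =~ $RECORD_START || !@records) {
            push @records, $line;
        }
        else {
            $records[-1] .= $line;
        }
    }
    return @records;
}

# Return 1 if ($y, $m, $d) is a real calendar date, else 0.
sub valid_ymd {
    my ($y, $m, $d) = @_;
    return 0 if $m < 1 || $m > 12 || $d < 1;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    $days[1] = 29 if ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    return $d <= $days[$m - 1] ? 1 : 0;
}

# Read a six-digit date as mmddyy and as yymmdd. Return YYYY-MM-DD when
# the valid readings agree on exactly one date; return undef when no
# reading is valid or when the two readings give different dates.
sub normalize_date {
    my ($raw) = @_;
    return undef unless defined $raw && $raw =~ /^(\d\d)(\d\d)(\d\d)$/;
    my ($p, $q, $r) = ($1, $2, $3);
    my %seen;
    for my $ymd ([$r, $p, $q],      # mmddyy reading
                 [$p, $q, $r]) {    # yymmdd reading
        my ($yy, $m, $d) = @$ymd;
        # Assumed century pivot: 00-49 => 20xx, 50-99 => 19xx.
        my $y = $yy < 50 ? 2000 + $yy : 1900 + $yy;
        next unless valid_ymd($y, $m, $d);
        $seen{ sprintf '%04d-%02d-%02d', $y, $m, $d } = 1;
    }
    my @candidates = keys %seen;
    return @candidates == 1 ? $candidates[0] : undef;
}
```

Rows where `normalize_date` returns undef can go to a reject file for manual review rather than being sent to the database with a guessed date.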

"Keep pouring your ideas"
