PerlMonks  

Re: Regex with malformed CSV files

by bluto (Curate)
on Jan 10, 2006 at 20:33 UTC (#522309)


in reply to Regex with malformed CSV files

I'd probably just document the fact that Outlook is broken and not supported, or do what jZed suggests. You may be able to do a better job if you parse the problem lines yourself and examine individual fields for clues about where you are in a "line".

For example, most fields will probably not have embedded newlines in them; a zip code will probably not have embedded quotes or commas and will probably be short; email addresses will tend to have '@' in them; quoted fields will probably not stretch for hundreds of characters; etc.

If you enforce some rules like this, your parser may be able to determine most of the time where it is. Of course, you could go stark raving mad in a futile effort trying to figure out the perfect ruleset...
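A minimal sketch of what such rules might look like in Perl. The field types (`zip`, `email`, `text`), the thresholds, and the `plausible_record` helper are all illustrative assumptions, not something from the original data or thread; a real ruleset would be tuned to the actual export.

```perl
use strict;
use warnings;

# Hypothetical per-field plausibility checks. Names and limits are
# assumptions made up for illustration.
my %looks_ok = (
    zip   => sub { $_[0] =~ /\A[\w -]{3,10}\z/ },        # short, no quotes or commas
    email => sub { $_[0] =~ /\A[^\s",]+@[^\s",]+\z/ },   # has an '@', no quotes or commas
    text  => sub { length($_[0]) < 200 && $_[0] !~ /\n/ }, # not huge, no embedded newline
);

# Return true when a candidate split has the expected number of fields
# and every field passes the check for its type.
sub plausible_record {
    my ($layout, $fields) = @_;    # array refs: field types, field values
    return 0 unless @$layout == @$fields;
    for my $i (0 .. $#$layout) {
        return 0 unless $looks_ok{ $layout->[$i] }->( $fields->[$i] );
    }
    return 1;
}
```

A parser could try several candidate splits of a problem line and keep the first one for which `plausible_record` returns true, flagging the line for manual review when none does.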


Re^2: Regex with malformed CSV files
by Anonymous Monk on Jan 10, 2006 at 21:16 UTC
    Well, so far the above solution has solved the malformed Outlook output as far as newlines go. To solve the problem of the embedded quotes I'm using...
    $line =~ s/(?<!,)"{1,2}(?!,)/""/g;
    Basically it just substitutes two quotes wherever it finds one or two quotes that are neither preceded nor followed by a comma. This still won't handle a case such as

    "my address is ",320 main street" virginia". But in all honesty... how often is it going to happen? :) Thanks for the help, everyone.
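One thing worth checking with the substitution above: a quote at the very start or end of the line is also neither preceded nor followed by a comma, so if the line still carries its outer quotes they get doubled too. The sketch below is a variant, not the poster's code, that requires an actual non-comma character on both sides of the quote. It assumes the line has already been chomped; the `escape_inner_quotes` name is made up for illustration.

```perl
use strict;
use warnings;

# Variant of the substitution above (an assumption, not the original):
# the lookarounds demand a real non-comma character before and after,
# so quotes at the start or end of a chomped line are left alone.
sub escape_inner_quotes {
    my ($line) = @_;
    $line =~ s/(?<=[^,])"{1,2}(?=[^,])/""/g;
    return $line;
}

# Stray quotes around Johnny get doubled; field delimiters are untouched.
my $fixed = escape_inner_quotes('"John "Johnny" Smith","x"');
```

Like the original, this does nothing for a stray quote that sits right next to a comma, so the `"my address is ",320 main street" virginia"` case still comes out wrong.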
