PerlMonks  

Re^2: CSV headers. Feedback wanted

by Tux (Canon)
on Feb 11, 2016 at 07:22 UTC ( [id://1154920]=note )


in reply to Re: CSV headers. Feedback wanted
in thread CSV headers. Feedback wanted

As you already state yourself, this is not standard CSV, and no auto-detection would work. The new header method is intended for the majority of CSV data files that contain a sane header.

As this will be a new auxiliary / helper method that is not integrated in the normal flow, it is completely optional. Let's say it is just as optional as fragment or callbacks are, which are also there just to ease attacking specific problems in CSV parsing.

Your example is clear and comes from real-life data, not just some data invented to point at the weaknesses of new functionality. With this data, though, it is obvious that the new method would not help you at all. It was not created to help in cases like this.

In your case fragment could help a lot: read the first line, detect the headers and the header count, and then set fragment to read only the data part before continuing with the rest of the data.
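
A minimal sketch of that idea, not a definitive recipe. The sample data and the rule "the real columns end at the lone `;` field" are assumptions made up for illustration, since the actual file layout is not shown in this node. The code reads the header line with getline, counts the usable columns, and lets fragment keep only those columns for the remaining rows:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample: three real columns, then a stray ";" field
# followed by data we do not want.
my $data = <<'EOD';
id,name,value,;,extra
1,foo,10,;,a
2,bar,20,;,b
EOD

open my $fh, "<", \$data or die "open: $!";
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# Read the header line by hand instead of relying on auto-detection
my $hdr = $csv->getline ($fh) or die "No header line found\n";

# Assumed rule: the usable header ends just before the lone ";" field
my $n = 0;
$n++ while $n < @$hdr && $hdr->[$n] ne ";";

# Continue from the current position, keeping only columns 1 .. $n
my $rows = $csv->fragment ($fh, "col=1-$n");

print join (",", @$_), "\n" for @$rows;
```

Here `$rows` is a reference to a list of array references, one per data row, each holding only the first `$n` fields.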


Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re^3: CSV headers. Feedback wanted
by bitingduck (Chaplain) on Feb 11, 2016 at 14:57 UTC
    It is legal CSV - the only thing goofy about it is that it has a semicolon in one of the places between commas, rather than useful header or data information. If I'm blindly using your new approach to read CSV files because, for example, I'm an unwitting recipient of the file, or because (more likely) I've forgotten about the format, I'd like the autodetect to warn me "Hey, this isn't quite standard CSV". I don't have a problem reading the data - we just read a whole line, use a regex to split it at the semicolon and away we go. It's intended as an example of something that will trip up a user of your autodetect. The lines in the file are long enough that a lazy user would look at the upper left and say "looks like CSV, I'll apply my generic CSV reader that uses Tux's nifty autodetect".
Re^3: CSV headers. Feedback wanted
by RonW (Parson) on Feb 12, 2016 at 23:57 UTC

    First, I agree that guessing separators in a weird situation is too risky. Also, I think that detecting the presence of multiple possible separators and throwing an error makes sense.

    Yes, I know that the data resulting from choosing the wrong separator should look weird, but a person might not realize what they are seeing. Throwing an error when an easily detectable "weirdness" is seen will help alert the user to the situation.
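
A rough sketch of such a check in plain Perl, with no CSV module involved. The function names `candidate_separators` and `check_header_line` and the candidate list are hypothetical, and the sketch assumes quotes inside quoted fields are escaped by doubling. It strips quoted fields from the header line, looks for each candidate separator in what remains, and dies when more than one is present:

```perl
use strict;
use warnings;

# Return the candidate separators that occur outside double-quoted fields.
sub candidate_separators {
    my ($line, @candidates) = @_;
    # Drop quoted fields so separators inside them are not counted
    (my $bare = $line) =~ s/"(?:[^"]|"")*"//g;
    return grep { index ($bare, $_) >= 0 } @candidates;
}

# Return the single separator found, or die when the line is ambiguous.
sub check_header_line {
    my ($line) = @_;
    my @found = candidate_separators ($line, ",", ";", "|", "\t");
    @found > 1 and die sprintf "Ambiguous separators in header: %s\n",
        join " " => map { $_ eq "\t" ? "TAB" : "'$_'" } @found;
    return $found[0];    # undef when no candidate was seen
}
```

Calling `check_header_line` on the first line of a file before handing the file to the parser would turn a silent mis-parse into an explicit error. A separator that appears only inside a quoted field is ignored.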

      Your point about looking weird to the reader reminds me of an incident with these files that I had forgotten about. There's a notes field that happens to be too small to see much of anything, and one user put in a note that accidentally embedded a linefeed rather than entering the note, due to differences in "return" vs "enter" behavior of LabVIEW. It confused people for about a day until they got someone to put in a thumb drive at the source and extract a copy of the original data (most users didn't have much understanding of the data flow). So I'd definitely prefer a way to throw an error or warning for multiple separators.
