PerlMonks
CSV headers. Feedback wanted
by Tux (Canon) on Feb 10, 2016 at 13:18 UTC ([id://1154857]=perlmeditation)
Given small CSV data files, or big(ger) CSV data files with a filter so that all of the data fits into memory, the csv function of Text::CSV_XS will most likely accommodate the common usage:
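As a minimal sketch of that usage, assuming made-up sample data held in a scalar (a file name works for `in` just as well), the `filter` attribute keeps only the rows of interest before they are stored:

```perl
use strict;
use warnings;
use Text::CSV_XS qw( csv );

# Hypothetical sample data, invented for illustration.
my $data = join "\n", "1,apple,0.50", "2,pear,1.20", "3,plum,0.80", "";

# Keep only the rows whose third field is below 1. Filter keys are
# 1-based column numbers and $_ holds the value of that field.
# The result is a reference to an array of array references.
my $aoa = csv (
    in     => \$data,
    filter => { 3 => sub { $_ < 1 } },
    );

print join (",", @$_), "\n" for @$aoa;
```

Because the filter runs while reading, rejected rows never take up memory in the result.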
This function also supports the common attributes for new:
or even with shortcuts and aliases:
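A small sketch of both forms, again on invented sample data: `sep_char` is the attribute as known from `new`, and `sep` is its shorter alias. Both calls are expected to give the same result.

```perl
use strict;
use warnings;
use Text::CSV_XS qw( csv );

# Hypothetical semicolon-separated data with a comma inside a field.
my $data = "1;a,b;x\n2;c;y\n";

# Attribute spelled as for Text::CSV_XS->new ...
my $long  = csv (in => \$data, sep_char => ";", auto_diag => 1);
# ... and the same call using the alias.
my $short = csv (in => \$data, sep      => ";", auto_diag => 1);

print join ("|", @$_), "\n" for @$long;
```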
If there is a lot to process inside each row, if not all rows would fit into memory, or if the callback structure and options for csv would obscure the code, reverting to the low-level interface is the only way to go:
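A minimal sketch of such a low-level loop, assuming invented sample data read through an in-memory file handle; only one row is held in memory at a time:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data; a real program would open a file instead.
my $data = "1,apple,0.50\n2,pear,1.20\n3,plum,0.80\n";
open my $fh, "<", \$data or die "open: $!";

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# Process row by row; getline returns an array reference per record
# and a false value at end of stream.
my $total = 0;
while (my $row = $csv->getline ($fh)) {
    $total += $row->[2];
}
close $fh;

printf "%.2f\n", $total;
```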
Quite often a CSV data source has one header line that holds the column names, which is easy to ask for in the csv function:
Or with the low-level interface:
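Both variants can be sketched on the same invented sample data. With `headers => "auto"` the csv function uses the first line as column names and returns hashes; in the low-level interface the same is done by feeding the first row to `column_names` and reading the rest with `getline_hr`:

```perl
use strict;
use warnings;
use Text::CSV_XS qw( csv );

# Hypothetical sample data with one header line.
my $data = "id,name,price\n1,apple,0.50\n2,pear,1.20\n";

# High level: a reference to an array of hash references.
my $aoh = csv (in => \$data, headers => "auto");

# Low level: read the header row ourselves, then fetch hashes.
open my $fh, "<", \$data or die "open: $!";
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
$csv->column_names ($csv->getline ($fh));
my @names;
while (my $rec = $csv->getline_hr ($fh)) {
    push @names, $rec->{name};
}
close $fh;

print "$_->{id}: $_->{name}\n" for @$aoh;
```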
This week I was confronted with a set of CSV files where the separator character changed based on the content of the file. Oh, the horror! If the CSV file was expected to contain amounts, the program that did the export chose to use a ; separator, and in other cases it used the default ,. IMHO the person that decided to do this should be fired without even blinking an eye.
This implied that on opening the CSV data stream, I - as a consumer - had to know in advance what this specific file would be like. Which made me come up with a new thought:
"If a CSV stream is supposed to have a header line that defines the column names, it is (very) unlikely that the column names will contain unpleasant characters like embedded newlines, semi-colons, or commas. Remember, these are column names, not data rows. Not that it is prohibited to have header fields that have commas or other non-word characters, but let us assume that it is uncommon enough to warrant support for ease of use."
So I wanted to convert this:
where the $csv instance has to know what the separator is, to
which will do the same, but also detect and set the separator. The new header method will read the first line of the already opened stream, detect the separator based on a default list of allowed separators, use the detected separator to set sep_char for the given $csv instance, use it to parse the line, and return the result as a list.
As this struck me as common practice before parsing the rest of a CSV file, I came up with a local method (not (yet) in Text::CSV_XS) that does this for me:
It even has some documentation :)
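The detection steps described above can be sketched as a plain function. This is not the author's method: `detect_header` is a hypothetical name, the default candidate list of `,` and `;` is an assumption, and dying on an ambiguous header is just one possible policy.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper, not part of Text::CSV_XS: read the first line
# of $fh, find which candidate separator occurs in it, set that as
# sep_char on $csv and return the parsed header fields as a list.
sub detect_header {
    my ($csv, $fh, $seps) = @_;
    $seps ||= [ ",", ";" ];    # assumed default candidates

    defined (my $line = <$fh>) or die "No header line found\n";
    $line =~ s/\r?\n\z//;

    my @found = grep { index ($line, $_) >= 0 } @$seps;
    @found > 1 and die "Ambiguous separator in header: @found\n";
    # A single-column header has no separator: keep the current one.
    $csv->sep_char ($found[0]) if @found;

    $csv->parse ($line) or die "Cannot parse header line\n";
    return $csv->fields;
}

# Hypothetical sample: amounts with decimal commas, so ; is used.
my $data = "id;name;amount\n1;apple;0,50\n2;pear;1,20\n";
open my $fh, "<", \$data or die "open: $!";

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
my @hdr = detect_header ($csv, $fh);
$csv->column_names (@hdr);

my @rows;
while (my $rec = $csv->getline_hr ($fh)) {
    push @rows, $rec;
}
close $fh;

print "$_->{name}: $_->{amount}\n" for @rows;
```

The sketch relies on the assumption stated in the post: header fields do not themselves contain any of the candidate separators.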
After two days of intensive use, I thought this might be useful to add to Text::CSV_XS so we all can profit, but I want to get it right from the start, so I ask for feedback (I already got some from our local PM group). Let the bikeshedding commence ...
Things I envision for this function are to also auto-detect the encoding when the line includes a BOM and apply it to the stream using binmode, or to have some option that allows this new method to not only return the headers, but also use them to set the column names:
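The BOM part of that idea could look roughly like the sketch below. `split_bom` is a hypothetical helper working on the raw bytes of the first line; how the detected encoding is then applied to the stream is left as a comment, since that part is still open for discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: check the raw bytes of a line for a byte order
# mark. Returns the encoding name (or undef) and the line without BOM.
sub split_bom {
    my ($line) = @_;
    # UTF-32LE must be tested before UTF-16LE: both start with FF FE.
    my @bom = (
        [ "UTF-32LE" => "\xFF\xFE\x00\x00" ],
        [ "UTF-32BE" => "\x00\x00\xFE\xFF" ],
        [ "UTF-8"    => "\xEF\xBB\xBF"     ],
        [ "UTF-16LE" => "\xFF\xFE"         ],
        [ "UTF-16BE" => "\xFE\xFF"         ],
        );
    for my $b (@bom) {
        my ($enc, $mark) = @$b;
        index ($line, $mark) == 0
            and return ($enc, substr ($line, length ($mark)));
    }
    return (undef, $line);
}

my ($enc, $rest) = split_bom ("\xEF\xBB\xBFid;name\n");
# When $enc is defined, the remaining reads could be decoded with
# something like: binmode $fh, ":encoding($enc)";
print defined $enc ? "BOM: $enc\n" : "no BOM\n";
```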
Enjoy, Have FUN! H.Merijn