Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
  1. Localize *READ before opening a file on it.
  2. Do your own error processing like it says to in perlstyle. (eg ...or die "Cannot read $file: $!")
  3. Use the third -1 argument to split as documented in "perldoc -f split".
  4. I have been bitten moving between Linux and Windows by DOS endings. Instead of chomp I like to s/\r?\n$//. YMMV.
  5. Should two rows have the same first entry you are silently overwriting data. This can be a Bad Thing. Either put in an error check or use an array instead of a hash for the rows.
  6. If you are incrementing a variable, use a for() loop instead of while(). It may work the same, but the person maintaining it finds the looping construct more obvious.
  7. Switch the order in the hash. You said elsewhere you think about it one way. My experience says that that decision will come back to bite you. The only way that you will find yourself wanting to access that data structure is row by row. If that is not a good enough reason for you, then let me tell you that if you reverse it then you can later choose to extract out references to each row and access them directly. The double hash lookup is slower by a factor of 2 and forces you to write more code everywhere.
  8. Document how this function works. A short comment helps immensely.
  9. The idea behind this code will never support the full CSV spec or anything close to it. Document that limitation.

OK, let me put my wallet where my mouth is and show you a hasty rewrite that takes all of that into account:

=pod =item open_flatfile Takes the name of a pseudo-CSV flatfile and an optional delimiter as a +rgs. (The delimiter can be a regular expression created with qr//.) It ope +ns the file, uses the first line a a header, and returns the data as an array of hashes. This will not handle CSV files with escaped fields. =cut sub open_flatfile { my $file = shift; my $delim = shift || "\t"; local *FH; open (FH, "<$file") or die "Cannot read $file: $!"; my @contents = <FH>; close (FH); s/\r?\n$// foreach @contents; my @header = split ($delim, (shift @contents), -1); # Create an anonymous sub to do the work my $extract_row = sub { my @cols = split($delim, shift(@_), -1); my %row = map {($header[$_], $cols[$_])} 0..$#header; return \%row; }; return map {$extract_row->($_)} @contents; }
Cheers,
Ben

In reply to A few style suggestions by tilly
in thread Another flatfile thingy by BBQ

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (5)
As of 2024-04-24 22:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found