- Localize *READ before opening a file on it.
- Do your own error processing like it says to in perlstyle. (eg ...or die "Cannot read $file: $!")
- Use the third -1 argument to split as documented in "perldoc -f split".
- I have been bitten moving between Linux and Windows by DOS endings. Instead of chomp I like to s/\r?\n$//. YMMV.
- Should two rows have the same first entry you are silently overwriting data. This can be a Bad Thing. Either put in an error check or use an array instead of a hash for the rows.
- If you are incrementing a variable, use a for() loop instead of while(). It may work the same, but the person maintaining it finds the looping construct more obvious.
- Switch the order in the hash. You said elsewhere you think about it one way. My experience says that that decision will come back to bite you. The only way that you will find yourself wanting to access that data structure is row by row. If that is not a good enough reason for you, then let me tell you that if you reverse it then you can later choose to extract out references to each row and access them directly. The double hash lookup is slower by a factor of 2 and forces you to write more code everywhere.
- Document how this function works. A short comment helps immensely.
- The idea behind this code will never support the full CSV spec or anything close to it. Document that limitation.
OK, let me put my wallet where my mouth is and show you a hasty rewrite that takes all of that into account:
=pod
=item open_flatfile
Takes the name of a pseudo-CSV flatfile and an optional delimiter as a
+rgs.
(The delimiter can be a regular expression created with qr//.) It ope
+ns
the file, uses the first line a a header, and returns the data as an
array of hashes. This will not handle CSV files with escaped fields.
=cut
sub open_flatfile {
my $file = shift;
my $delim = shift || "\t";
local *FH;
open (FH, "<$file") or die "Cannot read $file: $!";
my @contents = <FH>;
close (FH);
s/\r?\n$// foreach @contents;
my @header = split ($delim, (shift @contents), -1);
# Create an anonymous sub to do the work
my $extract_row = sub {
my @cols = split($delim, shift(@_), -1);
my %row = map {($header[$_], $cols[$_])} 0..$#header;
return \%row;
};
return map {$extract_row->($_)} @contents;
}
Cheers,
Ben
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|