No qualms at all. I'm trying to get some existing data
into a database, and I've got a lot of it. The file structure
is a bit odd: I need to read N lines of header,
then M lines of secondary data, before looping through
the bulk line by line for a while and then going back to the
header structure.
I don't want to read it all into memory, because
the file is about 160 MB with about 8 million lines. The
header is always a fixed number of lines, the secondary
data is optional but also a fixed number of lines, and the
bulk of the data is usually somewhere between 100 and
10,000 lines.
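The shape I'm dealing with is roughly this - a minimal sketch, not my real code ($n, $m, and the helper subs like has_secondary are stand-ins for format details I haven't shown):

    open my $in, '<', $file or die "can't open $file: $!";
    my $pending;                                   # one-line pushback buffer
    my $read = sub {
        return scalar <$in> unless defined $pending;
        my $l = $pending; undef $pending; return $l;
    };
    until ( eof($in) && !defined $pending ) {
        my @header    = map { $read->() } 1 .. $n;     # N fixed header lines
        my @secondary = has_secondary( \@header )      # optional, fixed size
                      ? map { $read->() } 1 .. $m
                      : ();
        while ( defined( my $line = $read->() ) ) {
            if ( is_header($line) ) {                  # next group begins
                $pending = $line;                      # push it back
                last;
            }
            process_line($line);                       # 100 .. 10,000 of these
        }
    }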
I was just surprised that this wasn't as easy or neat
to do as I expected. I'm quite pleased with my original
map one-liner, but nobody has really commented on whether
it was all that bad.
Have fun,
rdw
I was just surprised that this wasn't as easy or neat to do as I expected. I'm quite pleased with my original map one-liner, but nobody has really commented on whether it was all that bad.
OK, I'll comment on that. In my mind, map is a way to go from X to f(X) for a bunch of X's. If f(X) doesn't depend on X, it makes my brain go tilt a bit, but I can probably get used to it. Hence, I'll almost certainly try a different solution before I accept the void-arg map alternative.
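For concreteness, I'm guessing the construct in question is something like this (your original one-liner isn't shown here, so this is an assumption):

    # read exactly $n lines; the 1..$n list is never looked at
    my @header = map { scalar <IN> } 1 .. $n;

    # the same thing spelled as a plain loop
    my @header2;
    push @header2, scalar <IN> for 1 .. $n;

The block's result never mentions $_ or the loop list at all, which is exactly what triggers the tilt.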
Hmm. What you probably have, really, is a state machine. I could see a big switch statement based on state (reading header A, reading header B, in the body) with eof(IN) at the top, and if eof is detected while in header A or B, then carp out.
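Something like this, with an if/elsif chain standing in for the switch and croak to bail out (the state names, $n, $m, and the save_*/is_*/process_line subs are all invented for the illustration; it assumes IN is already open and $n > 1):

    use Carp;

    my $state = 'header';            # header | secondary | body
    my $count = 0;                   # lines consumed in the current state
    while (1) {
        if ( eof(IN) ) {                             # eof(IN) at the top
            croak "file ended while reading $state" if $state ne 'body';
            last;
        }
        my $line = <IN>;
        if ( $state eq 'header' ) {
            save_header($line);
            ( $state, $count ) = ( 'secondary', 0 ) if ++$count == $n;
        }
        elsif ( $state eq 'secondary' ) {
            if ( $count == 0 && !is_secondary($line) ) {
                $state = 'body';                     # optional block is absent
                process_line($line);
            }
            else {
                save_secondary($line);
                ( $state, $count ) = ( 'body', 0 ) if ++$count == $m;
            }
        }
        else {                                       # body
            if ( is_header($line) ) {                # next record group begins
                ( $state, $count ) = ( 'header', 1 );
                save_header($line);
            }
            else {
                process_line($line);
            }
        }
    }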
See, I get worried about when the unusual happens. Maybe it's just my 30 years
of programming, but any time I see someone write a "read 10 lines here" loop,
I think "what if there aren't 10 lines?" That's what makes me good at QA. :)
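In that spirit, even a plain "read N lines" step can check its own assumption - a tiny sketch:

    use Carp;

    sub read_lines {                 # read exactly $n lines or complain
        my ( $fh, $n ) = @_;
        my @lines;
        for my $i ( 1 .. $n ) {
            my $line = <$fh>;
            croak "wanted $n lines, got only " . ( $i - 1 )
                unless defined $line;
            push @lines, $line;
        }
        return @lines;
    }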
-- Randal L. Schwartz, Perl hacker
Thanks for that - I appreciate your comments. I'll
confess to being a bit of a map abuser - I
tend to use it whenever I need to create a list or
hash, although I do try to comment serious abuse
whenever possible.
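(The sort of thing I mean - trivial made-up examples, not code from the site:)

    my %seen   = map { $_ => 1 }      @words;   # list -> set-style lookup hash
    my %length = map { $_ => length } @words;   # list -> derived-value hash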
Your QA point is a good one too - I work for a well-known
and well-established website built almost entirely
with Perl; going through some old code I've found
all sorts of mistakes, and now I'm trying to import all
the millions of lines of bad data it's created into
a replacement system.
Most of the mistakes are due to
bad assumptions - often using regexps to pull parts out of
strings but never testing whether the match
succeeded, so the code picks up a leftover value of $1 from
some earlier match. I sometimes wish there were many more
warnings about that sort of thing.
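To make the trap concrete, here's a made-up example of the pattern I keep finding:

    "id=42" =~ /id=(\d+)/;      # succeeds; $1 is now "42"
    "no id" =~ /id=(\d+)/;      # fails, but $1 keeps its old value...
    print "got $1\n";           # ...so this happily prints "got 42"

    # the safe version tests the match before touching $1
    if ( "no id" =~ /id=(\d+)/ ) {
        print "got $1\n";
    }

The second match fails silently, so $1 still holds the value from the first one.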
Have fun,
rdw
Would it be faster to just use the Linux/Unix command line?
tail -n +$start $fileName | head -n $length
where $start = 100 and $length = 10000 - 100.
That way the file is never read into Perl itself, so very little memory is used. I've used this to fetch blocks of lines from a file with more than 100 million lines; the average time to get results was about a minute.
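If you'd rather drive it from inside Perl, a piped open does the same thing (a sketch; it assumes GNU tail's -n +N form and a made-up process_line sub):

    # stream lines $start .. $start + $length - 1 without slurping the file
    open my $fh, '-|', "tail -n +$start $fileName | head -n $length"
        or die "can't start pipeline: $!";
    process_line($_) while <$fh>;
    close $fh;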