PerlMonks  

Re: Reading multiple lines?

by lolindrath (Scribe)
on Nov 29, 2000 at 05:19 UTC


in reply to Reading multiple lines?

Hmm, it kind of seems like you're trying to reinvent a database system. Databases read only one record at a time from disk. Any qualms about using a database?

--=Lolindrath=--


Re: Re: Reading multiple lines?
by rdw (Curate) on Nov 29, 2000 at 14:42 UTC

    No qualms at all; I'm trying to get some existing data into a database, and I've got a lot of it. The file structure is a bit odd: I need to read in N lines of header, then M lines of secondary data, then loop through the data line by line for a while before going back to the header structure.

    I don't want to read it all into memory because the file is about 160 MB, with about 8 million lines. The header is always a fixed number of lines; the secondary data is optional but, when present, is also a fixed number of lines; and the bulk of the data is usually somewhere between 100 and 10,000 lines.
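Since the end goal is a database load and the file is too big to slurp, one option is to stream rows straight into the database instead of collecting them first. The sketch below uses Python's standard-library `sqlite3` rather than Perl, and the table name `raw_lines` and the one-row-per-line layout are hypothetical stand-ins for whatever schema the real import would use.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple


def rows_from_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    # Turn each input line into one (line_number, text) row, lazily.
    for number, line in enumerate(lines, start=1):
        yield number, line.rstrip("\n")


def load(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    # Hypothetical staging table; a real import would parse records first.
    conn.execute("CREATE TABLE IF NOT EXISTS raw_lines (num INTEGER, body TEXT)")
    # executemany() pulls rows from the generator as it goes, so the
    # whole file is never held in memory at once.
    conn.executemany(
        "INSERT INTO raw_lines (num, body) VALUES (?, ?)",
        rows_from_lines(lines),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM raw_lines").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # With a real file: with open("data.txt") as fh: load(fh, conn)
    print(load(["header\n", "data 1\n", "data 2\n"], conn))  # prints 3
```

Because an open file object is itself a line iterator, passing it directly to `load` keeps memory use flat regardless of file size.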

    I was just surprised that this wasn't as easy or neat to do as I expected. I'm quite pleased with my original map one-liner, but nobody has really commented on whether it was all that bad.

    Have fun,

    rdw

      I was just surprised that this wasn't as easy / neat to do as I expected. I'm quite pleased with my original map one liner, but nobody has really commented on whether it was really all that bad.
      OK, I'll comment on that. In my mind, map is a way to go from X to f(X) for a bunch of X's. If f(X) doesn't depend on X, my brain goes tilt a bit, though I could probably get used to it. Hence, I'll almost certainly try a different solution before I accept the void-arg map alternative.

      Hmm. What you probably really have is a state machine. I could see a big Switch statement based on the current state (reading header A, reading header B, in the body) with an eof(IN) check at the top; if eof is detected while in header A or B, then carp out.
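A minimal sketch of that state machine, written in Python rather than Perl. The thread doesn't say how a header or the optional secondary block is recognised, so this assumes a hypothetical format: a record's first header line starts with `HDR`, the secondary block (if present) starts with `SEC`, and body lines run until the next `HDR` line or end of file. `HEADER_LINES` and `SECONDARY_LINES` stand in for N and M. Hitting end of file inside the header or secondary block is treated as an error, which is the "carp out" case.

```python
from typing import Dict, Iterable, Iterator, List

HEADER_LINES = 3        # hypothetical N
SECONDARY_LINES = 2     # hypothetical M
HEADER_TAG = "HDR"      # hypothetical marker on the first header line
SECONDARY_TAG = "SEC"   # hypothetical marker on the first secondary line


def parse(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    state = "start"
    record: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # A new header ends the current record (even one with an empty body).
        if state in ("after_header", "body") and line.startswith(HEADER_TAG):
            yield record
            state = "start"
        if state == "start":
            if not line.startswith(HEADER_TAG):
                raise ValueError(f"expected a header line, got {line!r}")
            record = {"header": [], "secondary": [], "body": []}
            state = "header"
        if state == "header":
            record["header"].append(line)
            if len(record["header"]) == HEADER_LINES:
                state = "after_header"
            continue
        if state == "after_header":
            # Decide whether the optional secondary block is present.
            state = "secondary" if line.startswith(SECONDARY_TAG) else "body"
        if state == "secondary":
            record["secondary"].append(line)
            if len(record["secondary"]) == SECONDARY_LINES:
                state = "body"
            continue
        record["body"].append(line)  # state == "body"
    # End of file: only legal once the fixed-size blocks are complete.
    if state in ("header", "secondary"):
        raise ValueError(f"unexpected end of file while in state {state!r}")
    if state in ("after_header", "body"):
        yield record
```

Being a generator, `parse` hands back one record at a time, so only the current record is ever held in memory.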

      See, I get worried about when the unusual happens. Maybe it's just my 30 years of programming, but any time I see someone write a "read 10 lines here" loop, I think "what if there aren't 10 lines?". That's what makes me good at QA. :)
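The "what if there aren't 10 lines?" question can be answered by making the fixed-size read itself refuse a short result. A small Python sketch (not Perl), where `read_exactly` is a hypothetical helper name:

```python
from itertools import islice
from typing import Iterator, List


def read_exactly(lines: Iterator[str], n: int) -> List[str]:
    # islice consumes up to n items from the iterator; if the input runs
    # out early we get fewer, which is exactly the case worth catching.
    chunk = list(islice(lines, n))
    if len(chunk) != n:
        raise EOFError(f"wanted {n} lines, only {len(chunk)} left")
    return chunk
```

It needs a real iterator, such as an open file handle or `iter(some_list)`: given a plain list, `islice` would start from the beginning on every call instead of continuing where the last read stopped.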

      -- Randal L. Schwartz, Perl hacker

        Thanks for that - I appreciate your comments. I'll confess to being a bit of a map abuser - I tend to use it whenever I need to create a list or hash, although I do try to comment serious abuse whenever possible.

        Your QA point is a good one too. I work for a well-known, well-established website built almost entirely in Perl, and going through some old code I've found all sorts of mistakes; now I'm trying to import the millions of lines of bad data it has created into a replacement system.

        Most of the mistakes are due to bad assumptions, often using regexps to pull parts out of strings but never testing whether the match succeeded, and so getting a stale value of $1 from some earlier match. I sometimes wish there were more warnings about that sort of thing.
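In Perl, a failed match leaves `$1` holding whatever the last successful match captured, so skipping the success test silently reuses old data. The defensive habit is to test the match before reading any capture. Sketched here in Python, where `search()` returns `None` on failure; the `price=` pattern and `extract_price` name are purely hypothetical:

```python
import re
from typing import Optional

PRICE_RE = re.compile(r"price=(\d+)")  # hypothetical field pattern


def extract_price(line: str) -> Optional[int]:
    m = PRICE_RE.search(line)
    if m is None:
        # No match: report "missing" rather than reusing an earlier value.
        return None
    return int(m.group(1))
```

In Perl the same idea is to use the capture only inside the branch where the match succeeded, e.g. `if ($line =~ /price=(\d+)/) { ... $1 ... }`.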

        Have fun,

        rdw

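      Would it be faster to just use the Linux/Unix command line? tail -n +$start "$fileName" | head -n $length, where $start = 100 and $length = 10000 - 100. (Note the +: plain tail -n $start prints the last $start lines instead of starting at line $start.) That way the file is streamed rather than read into memory, so not much memory is used. I used this to fetch a block of lines from a file with more than 100 million lines; the average time to get results was about 1 minute.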
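When the block of lines is needed inside a program rather than on the command line, the same tail/head idea can be expressed as a streaming slice. A Python sketch, with `line_block` as a hypothetical name and `start` counted from 1 as in the shell version:

```python
from itertools import islice
from typing import Iterable, Iterator


def line_block(lines: Iterable[str], start: int, length: int) -> Iterator[str]:
    # Skip the first start-1 lines, then yield the next `length` lines.
    # Earlier lines are still read (as tail also has to read them) but are
    # discarded immediately instead of being kept in memory.
    return islice(lines, start - 1, start - 1 + length)
```

With `start = 100` and `length = 10000 - 100`, this yields lines 100 through 9999, the same range the corrected pipeline prints; passing an open file handle as `lines` keeps it streaming.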