Re: CSV Manipulation

by CountZero (Bishop)
on Nov 16, 2011 at 19:19 UTC


in reply to CSV Manipulation

Or if you want to go a bit more "high level", look into DBD::CSV.
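A minimal sketch of what that can look like (the file name "data.csv" and the columns "name" and "amount" are just placeholders, not taken from the OP's data):

    use strict;
    use warnings;
    use DBI;

    # Connect to the directory holding the CSV files; each file becomes a table
    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_dir        => ".",        # where the CSV files live
        csv_sep_char => ",",
        RaiseError   => 1,
        }) or die $DBI::errstr;

    # Map the table name "data" to the (hypothetical) file data.csv
    $dbh->{csv_tables}{data} = { f_file => "data.csv" };

    # Plain SQL against the CSV file
    my $sth = $dbh->prepare ("SELECT name, amount FROM data WHERE amount > 100");
    $sth->execute;
    while (my ($name, $amount) = $sth->fetchrow_array) {
        print "$name: $amount\n";
        }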

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Replies are listed 'Best First'.
Re^2: CSV Manipulation
by Tanktalus (Canon) on Nov 16, 2011 at 20:09 UTC

    This. If I have a choice, and I usually do, I use DBD::CSV. It makes me think about the data as data rather than as a string, and has the added advantage of making the transition to SQLite or a full-fledged RDBMS (DB2, Oracle, etc.) easier.
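    To illustrate: with the DBD::CSV sketch above, moving to SQLite is (in principle) only a different connect call; the query code can stay the same. The database file name below is of course made up:

        # Same DBI code as before; only the DSN changes when moving to SQLite
        my $dbh = DBI->connect ("dbi:SQLite:dbname=data.db", undef, undef,
            { RaiseError => 1 });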

      One note here. The OP did not mention the size of the CSV data file. DBD::CSV uses Text::CSV_XS under the hood, but it has to read the complete file into memory before it can do any database-like operations. With a 2 GB file, that might result in, say, 20 GB of memory use (perl overhead). When files are that big - again, I don't know how large the OP's file is - switching to basic streamed IO processing is usually a lot easier.
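      For the record, streamed processing with Text::CSV_XS itself keeps only one record in memory at a time. A rough sketch (again with made-up file and column names):

          use strict;
          use warnings;
          use Text::CSV_XS;

          my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
          open my $fh, "<", "data.csv" or die "data.csv: $!";
          my $hdr = $csv->getline ($fh);       # first line: column names
          $csv->column_names (@$hdr);
          while (my $row = $csv->getline_hr ($fh)) {
              # only the current record is held in memory
              $row->{amount} > 100 or next;
              print "$row->{name}\n";
              }
          close $fh;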

      I fully agree though that DBD::CSV is the best step towards RDBMSes, where those memory limits do not apply (for the end-user script).

      YMMV

      update: I just did a quick test with the OP data extended to a 1 MB CSV file. Reading that into memory using getline_all() resulted in a 10 MB data structure (as reported by Devel::Size::total_size()).
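      Roughly what such a measurement looks like (the file name is just an example, not the OP's data); getline_all() slurps every row into one big array of arrays, which is what total_size() then measures:

          use strict;
          use warnings;
          use Text::CSV_XS;
          use Devel::Size qw( total_size );

          my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
          open my $fh, "<", "big.csv" or die "big.csv: $!";
          my $aoa = $csv->getline_all ($fh);   # all rows end up in memory at once
          close $fh;
          printf "%d rows take %d bytes in memory\n",
              scalar @{$aoa}, total_size ($aoa);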


      Enjoy, Have FUN! H.Merijn
