http://www.perlmonks.org?node_id=938408

packetstormer has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I am looking for some advice on the best method to use to make changes to a CSV file. A few example lines of the source:

john smith,York,22,leaf,second,901,enter
john doe,Oxford,45,elm,first,901,enter

I will need to read through the file (which may be quite big) and check, say, the fourth, fifth, and sixth fields in each line. Then, if they match certain values, replace them with other values.

I am not looking for any code, just advice on the best way to go about it. For example, I think I could easily read each line into an array, run an if block, and then use a regex to replace the values on a line-by-line basis. I am concerned that this may not be the best approach.
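To sketch the sort of thing I mean (the values to match are taken from the sample above; the replacement values are invented):

    use strict;
    use warnings;

    open my $in,  '<', 'source.csv'  or die "source.csv: $!";
    open my $out, '>', 'updated.csv' or die "updated.csv: $!";

    while ( my $line = <$in> ) {
        chomp $line;
        my @fields = split /,/, $line;   # only safe if no quoted/embedded commas
        # fields 4-6 are indices 3-5
        if ( $fields[3] eq 'leaf' && $fields[4] eq 'second' && $fields[5] eq '901' ) {
            @fields[ 3 .. 5 ] = ( 'oak', 'third', '902' );
        }
        print {$out} join( ',', @fields ), "\n";
    }

    close $in;
    close $out or die "updated.csv: $!";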
Any suggestions?

Replies are listed 'Best First'.
Re: CSV Manipulation
by Tux (Canon) on Nov 16, 2011 at 16:12 UTC
Re: CSV Manipulation
by CountZero (Bishop) on Nov 16, 2011 at 19:19 UTC
    Or if you want to go a bit more "high level", look into DBD::CSV.
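    A minimal sketch of that (the file, table, and column names here are invented, and the file is assumed to have no header row):

        use strict;
        use warnings;
        use DBI;

        # Treat the directory holding the CSV file as the "database".
        my $dbh = DBI->connect( 'dbi:CSV:', undef, undef,
            { f_dir => '.', RaiseError => 1 } );

        # No header line in the file, so supply column names ourselves.
        $dbh->{csv_tables}{data} = {
            file           => 'source.csv',
            skip_first_row => 0,
            col_names      => [qw( name city age tree grade num action )],
        };

        # Rewrite fields 4-6 wherever all three match the given values.
        $dbh->do( 'UPDATE data SET tree = ?, grade = ?, num = ? '
                . 'WHERE tree = ? AND grade = ? AND num = ?',
            undef, 'oak', 'third', '902', 'leaf', 'second', '901' );

        $dbh->disconnect;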

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      This. If I have a choice, and I usually do, I use DBD::CSV. It makes me think about the data as data rather than as a string, and it has the added advantage of making the transition to SQLite or a full-fledged RDBMS (DB2, Oracle, etc.) easier.

        One note here. The OP did not mention the size of the CSV data file. DBD::CSV uses Text::CSV_XS under the hood, but it will have to read the complete file into memory to be able to do any database-like operations. With a file of 2 GB, that might result in, say, 20 GB of memory use (perl overhead). When files are that big - again, I don't know how large the OP's file is - switching to basic streamed IO processing is usually a lot easier.

        I fully agree, though, that DBD::CSV is the best stepping stone towards RDBMSes, where those memory limits do not apply (for the end-user script).

        YMMV

        Update: I just did a quick test with the OP data extended to a 1 MB CSV file. Reading that into memory using getline_all () resulted in a 10 MB data structure (reported by Devel::Size::total_size ()).
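        Something along these lines (the file name is invented):

            use strict;
            use warnings;
            use Text::CSV_XS;
            use Devel::Size qw( total_size );

            my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );

            open my $fh, '<', 'big.csv' or die "big.csv: $!";
            my $rows = $csv->getline_all( $fh );   # slurps every row into memory
            close $fh;

            printf "%d rows, %d bytes\n", scalar @$rows, total_size( $rows );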


        Enjoy, Have FUN! H.Merijn
Re: CSV Manipulation
by jandrew (Chaplain) on Nov 16, 2011 at 16:44 UTC

    In the spirit of TIMTOWTDI

    Since your file seems to contain only seven fields per line, with no extended free-form text, you may want to look at Tie::File. I have found this module helpful when parsing files big enough to hit memory limits. You could then simply split each line to test and manipulate the values, as in the sketch below.
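    A rough sketch, assuming plain comma-separated fields with no quoting (the match values come from your sample; the replacements are invented):

        use strict;
        use warnings;
        use Tie::File;

        # Each array element maps to one line of the file; writes go to disk.
        tie my @lines, 'Tie::File', 'source.csv' or die "source.csv: $!";

        for my $line ( @lines ) {
            my @fields = split /,/, $line;
            next unless @fields == 7;
            if ( $fields[3] eq 'leaf' && $fields[4] eq 'second' && $fields[5] eq '901' ) {
                @fields[ 3 .. 5 ] = ( 'oak', 'third', '902' );
                $line = join ',', @fields;   # assigning rewrites the line in place
            }
        }

        untie @lines;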

Re: CSV Manipulation
by furry_marmot (Pilgrim) on Nov 16, 2011 at 19:17 UTC
    If you're just processing each line separately, I would think the easiest way, especially for large files, would be:
    1. open the input file
    2. open an output file
    3. read a line from the input file
    4. process the line, converting it to fields or editing the fields in place
    5. write the line to the new file, then repeat from step 3 until done

    If there are no errors, the new file contains all the processed data.

    Text::CSV shows an example of testing a field in place right at the top of the perldoc. It's also the best tool for CSV files.
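    A minimal version of that read-process-write loop with Text::CSV (file names and replacement values are placeholders):

        use strict;
        use warnings;
        use Text::CSV;

        my $csv = Text::CSV->new( { binary => 1, eol => "\n" } )
            or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

        open my $in,  '<', 'source.csv'  or die "source.csv: $!";
        open my $out, '>', 'updated.csv' or die "updated.csv: $!";

        while ( my $row = $csv->getline( $in ) ) {
            # Fields 4-6 are indices 3-5.
            if ( $row->[3] eq 'leaf' && $row->[4] eq 'second' && $row->[5] eq '901' ) {
                @{$row}[ 3 .. 5 ] = ( 'oak', 'third', '902' );
            }
            $csv->print( $out, $row );
        }

        close $in;
        close $out or die "updated.csv: $!";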

    --marmot