http://www.perlmonks.org?node_id=1058403

rjbioinf has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I frequently write scripts that parse simple tab-delimited or comma-delimited files that have been created by other people. Typically they are created in Excel, and they can come from Windows or OS X platforms. I often run into the problem that a simple filehandle loop ( while(<FH>) ) will only read either the first or the final line of the file. This is because the newline characters are not always \n, as Perl expects: files from OS X often use \r and files from Windows use \r\n. Preferably I would like to not have to work out what the problem is for each file and change the characters to \n using dos2unix or whatever, so I was wondering if anyone can suggest a good solution for this. Is there a recommended module that has a method that performs this function?


Re: Best practice for reading delimited file
by Tux (Canon) on Oct 16, 2013 at 08:04 UTC

    Simple answer! Text::CSV_XS or Text::CSV. Both automatically deal with line endings and have an attribute (sep_char) to use TAB as the separator.
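A minimal sketch of that approach, assuming Text::CSV is installed from CPAN. The helper name read_records and the in-memory sample data are made up for illustration; eol is deliberately left unset so the parser can recognise the line endings itself.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not core; Text::CSV_XS offers the same interface

# Hypothetical helper: parse every record from an open filehandle.
# $sep is the field separator, e.g. "," or "\t".
sub read_records {
    my ($fh, $sep) = @_;
    my $csv = Text::CSV->new({ binary => 1, sep_char => $sep, auto_diag => 1 });
    my @rows;
    while (my $row = $csv->getline($fh)) {    # one array ref per record
        push @rows, $row;
    }
    return \@rows;
}

# Demo: tab-separated data with Windows line endings, read from memory.
my $data = "id\tname\r\n1\tfoo\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $rows = read_records($fh, "\t");
close $fh;
print join(' | ', @$_), "\n" for @$rows;
```

For a real file, open it as usual and pass the handle in with "," or "\t" as the separator.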


    Enjoy, Have FUN! H.Merijn
Re: Best practice for reading delimited file
by farang (Chaplain) on Oct 16, 2013 at 19:37 UTC

    For v5.10 or later, generic newlines can be normalized with \R in a substitution expression.

    s/\R/\n/;
    See perlrebackslash under Misc section.
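One caveat worth keeping in mind: inside a plain while(<FH>) loop, a file that uses bare \r arrives as a single "line", so a substitution without /g would only change the first terminator. A rough sketch that sidesteps this by slurping the file and splitting on \R is shown here; the helper names split_lines and read_lines are hypothetical, and it assumes the file fits comfortably in memory.

```perl
use strict;
use warnings;
use 5.010;    # \R requires Perl v5.10 or later

# Hypothetical helper: split a string into lines on any newline style
# (\n, \r\n, or a bare \r). Terminators are not included in the result.
sub split_lines {
    my ($text) = @_;
    return split /\R/, $text;
}

# Hypothetical helper: slurp a whole file and return its lines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };    # undefined $/ reads everything at once
    close $fh;
    return split_lines($text // '');
}

# Demo on in-memory data that uses bare \r line endings.
for my $line (split_lines("id\tname\r1\tfoo\r2\tbar\r")) {
    my @fields = split /\t/, $line;
    print join(' | ', @fields), "\n";
}
```

Note that split discards trailing empty fields, so blank lines at the very end of the file are dropped while blank lines in the middle are kept.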

      Thank you all for the suggestions. s/\R/\n/; looks like the simplest solution.
Re: Best practice for reading delimited file
by Lennotoecom (Pilgrim) on Oct 16, 2013 at 09:32 UTC
    ./script.pl file1 file2 file3 file4

    while(<>){ print $` if /$/; }
    100% working on unixes and windowses:)

      Consider print ${^PREMATCH} if /$/p; instead (Perl v5.10+), as using $` anywhere in a program imposes a considerable performance penalty on all regular expression matches. Source: perlvar.
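A small sketch of the /p form, for what it is worth. It swaps the /$/ pattern for /\R?\z/ so that a trailing \r\n or bare \r is dropped as well as \n; that change and the helper name strip_eol are assumptions for illustration, not part of the suggestion above.

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^PREMATCH} need Perl v5.10 or later

# Hypothetical helper: return the line without its trailing terminator,
# whether that is \n, \r\n, or a bare \r.
sub strip_eol {
    my ($line) = @_;
    # \R?\z can only match at the end of the string; /p makes the text
    # before the match available as ${^PREMATCH}.
    return $line =~ /\R?\z/p ? ${^PREMATCH} : $line;
}

print strip_eol("a\tb\r\n"), "\n";
```

Only one terminator is removed per call, so a line ending in two newlines keeps the first one.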

        thank you sir

        prematch is just a wordier name for $`

        A considerable performance penalty by any other name doth make thine system slow all the same!

        Edit: The documentation apparently doesn't mean what it appears to say.