Best practice for reading delimited file

by rjbioinf (Acolyte)
on Oct 16, 2013 at 07:57 UTC
rjbioinf has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I frequently write scripts that parse simple tab-delimited or comma-delimited files created by other people. The files are typically created in Excel, on either Windows or OS X. I often run into the problem that a simple filehandle loop ( while(<FH>) ) reads only the first or the final line of the file. This is because the newline characters are not always the \n that Perl expects: Excel on OS X often writes \r, and Windows uses \r\n. I would prefer not to have to work out what the problem is for each file and convert the line endings to \n with dos2unix or similar, so I was wondering if anyone can suggest a good solution. Is there a recommended module with a method that does this?

Replies are listed 'Best First'.
Re: Best practice for reading delimited file
by Tux (Abbot) on Oct 16, 2013 at 08:04 UTC

    Simple answer! Text::CSV_XS or Text::CSV. Both deal with line endings automatically, and both have a sep_char attribute that can be set to TAB.


    Enjoy, Have FUN! H.Merijn
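A minimal sketch of the Text::CSV approach suggested above, assuming the module is installed from CPAN. The sub name read_delimited and its arguments are hypothetical, chosen only for illustration. No eol attribute is set, so the parser is left to recognize the line endings on its own.

```perl
use strict;
use warnings;
use Text::CSV;    # non-core module, assumed to be installed from CPAN

# Hypothetical helper: read every row of a delimited file into an
# array of array refs. $sep is the field separator, e.g. "\t" or ",".
sub read_delimited {
    my ($path, $sep) = @_;

    # binary => 1 allows non-ASCII data in fields;
    # auto_diag => 1 reports parse errors instead of failing silently.
    my $csv = Text::CSV->new({ binary => 1, sep_char => $sep, auto_diag => 1 });

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @rows;
    while (my $row = $csv->getline($fh)) {
        push @rows, $row;    # $row is an array ref of fields
    }
    close $fh;
    return \@rows;
}
```

Called as read_delimited('data.tsv', "\t"), this returns one array ref per row, so the third field of the first row is $rows->[0][2].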
Re: Best practice for reading delimited file
by farang (Chaplain) on Oct 16, 2013 at 19:37 UTC

    For v5.10 or later, generic newlines can be normalized with \R in a substitution expression.

    s/\R/\n/;
    See perlrebackslash under Misc section.

      Thank you all for the suggestions.  s/\R/\n/; looks like the simplest solution.
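One caveat with the substitution: when a file uses bare \r line endings, while(<FH>) with the default $/ returns the whole file as a single record, and s/\R/\n/ without /g changes only the first line break in it. A possible way around this, sketched below, is to slurp the file and split on \R. The sub name read_lines_any_eol is hypothetical, and the UTF-8 decoding is an assumption about the input; without decoding, \R can also match a raw 0x85 byte inside multi-byte characters.

```perl
use strict;
use warnings;
use 5.010;    # \R needs Perl v5.10 or later

# Hypothetical helper: return the lines of a file as an array ref,
# whatever mix of \n, \r\n, or \r line endings it contains.
sub read_lines_any_eol {
    my ($path) = @_;

    # Assumption: the file is UTF-8 (plain ASCII is a subset of that).
    open my $fh, '<:raw:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };    # slurp the whole file
    close $fh;

    return [] unless defined $content && length $content;

    # \R matches \r\n as one unit, as well as a lone \r or \n.
    # split drops trailing empty fields, so a final newline adds no extra line.
    return [ split /\R/, $content ];
}
```

Each returned line can then be split on the delimiter as usual, for example my @fields = split /\t/, $line, -1; for simple tab-delimited data without quoting.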
Re: Best practice for reading delimited file
by Lennotoecom (Pilgrim) on Oct 16, 2013 at 09:32 UTC
    ./script.pl file1 file2 file3 file4
    while(<>){ print $` if /$/; }
    100% working on unixes and windowses:)

      Consider print ${^PREMATCH} if /$/p; (Perl v5.10+), as using $` anywhere in a program imposes a considerable performance penalty on all regular expression matches. Source: perlvar.
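A small sketch of the /p form, to show what it returns. The sub name before_eol is hypothetical. Note that /$/ matches just before a final \n, so the prematch of a Windows-style line still carries its \r.

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^PREMATCH} need Perl v5.10 or later

# Hypothetical helper: return the text before the point where /$/ matches,
# i.e. the line without one trailing "\n".
sub before_eol {
    my ($line) = @_;
    return $line =~ /$/p ? ${^PREMATCH} : undef;
}

# A Unix-style line loses its newline; a \r\n line keeps the \r.
my $unix_line = before_eol("abc\n");      # "abc"
my $dos_line  = before_eol("abc\r\n");    # "abc\r"
```

So this idiom only strips the \n; a leftover \r still has to be removed separately, for example with s/\r\z// on the result.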

        thank you sir

        prematch is just a wordier name for $`

        A considerable performance penalty by any other name doth make thine system slow all the same!

        Edit: The documentation apparently doesn't mean what it appears to say.
