PerlMonks  

Best practice for reading delimited file

by rjbioinf (Acolyte)
on Oct 16, 2013 at 07:57 UTC

rjbioinf has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I frequently write scripts that parse simple tab-delimited or comma-delimited files created by other people. The files are typically created in Excel, on either Windows or OS X. I often run into the problem that a simple filehandle loop ( while(<FH>) ) will only read either the first or the final line of the file. This is because the newline characters are not always \n, as Perl expects: they can be \r for files from OS X and \r\n for files from Windows. I would prefer not to have to work out what the problem is for each file and convert the line endings to \n with dos2unix or similar, so I was wondering if anyone can suggest a good solution. Is there a recommended module with a method that handles this?

Replies are listed 'Best First'.
Re: Best practice for reading delimited file
by Tux (Abbot) on Oct 16, 2013 at 08:04 UTC

    Simple answer: Text::CSV_XS or Text::CSV. Both deal with line endings automatically and have an attribute (sep_char) for using TAB as the separator.


    Enjoy, Have FUN! H.Merijn
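
    A minimal sketch of the suggestion above, assuming Text::CSV_XS is installed from CPAN. The helper name read_delimited and the sample data are hypothetical, not from the thread. The eol attribute is deliberately left unset so that the parser accepts the common line endings on input; per the module's documentation this should cover \n, \r\n and bare \r, though only the first two are exercised here.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper (not from the thread): parse every record from an
# open filehandle, using $sep as the field separator.
sub read_delimited {
    my ($fh, $sep) = @_;
    my $csv = Text::CSV_XS->new({
        binary    => 1,       # accept non-ASCII data and embedded newlines in quoted fields
        auto_diag => 1,       # report parse errors instead of failing silently
        sep_char  => $sep,    # "\t" for tab-delimited, "," for comma-delimited
    });
    my @rows;
    while (my $row = $csv->getline($fh)) {
        push @rows, $row;     # each row is an array reference of fields
    }
    return \@rows;
}

# Usage: tab-delimited sample data with Windows line endings,
# read through an in-memory filehandle.
my $data = "id\tname\r\n1\tAda\r\n2\tGrace\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $rows = read_delimited($fh, "\t");
close $fh;
print join('|', @$_), "\n" for @$rows;
```

    For a real file, open it by name instead of using the in-memory handle. Text::CSV offers the same interface and falls back to a pure-Perl parser when the XS module is not available.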
Re: Best practice for reading delimited file
by farang (Chaplain) on Oct 16, 2013 at 19:37 UTC

    For v5.10 or later, generic newlines can be normalized with \R in a substitution expression.

    s/\R/\n/;
    See perlrebackslash under Misc section.

      Thank you all for the suggestions. s/\R/\n/; looks like the simplest solution.
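
    One caveat when applying \R: with the default input record separator, while(<FH>) returns a file that uses bare \r as a single line, and s/\R/\n/ without /g then replaces only the first line ending in it. A sketch that avoids this, assuming the file is small enough to hold in memory, is to slurp the content and split on \R. The helper name read_lines_any_eol and the sample data are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # \R requires Perl 5.10 or later

# Hypothetical helper (not from the thread): read the whole handle at once
# and split it into lines on any generic newline (\n, \r\n or \r).
sub read_lines_any_eol {
    my ($fh) = @_;
    local $/;                  # undefined separator: the next read returns everything
    my $content = <$fh>;
    return [] unless defined $content && length $content;
    return [ split /\R/, $content ];   # trailing empty lines are dropped by split
}

# Usage: tab-delimited sample data with bare \r line endings.
my $data = "id\tname\r1\tAda\r2\tGrace\r";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $lines = read_lines_any_eol($fh);
close $fh;

for my $line (@$lines) {
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    print scalar(@fields), " fields: @fields\n";
}
```

    Note that \R also matches other vertical whitespace such as form feed and vertical tab; if only the three common conventions should count as line breaks, /\r\n?|\n/ can be used in the split instead. A plain split on the delimiter does not handle quoted fields, which is where the CSV modules suggested above are the safer choice.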
Re: Best practice for reading delimited file
by Lennotoecom (Pilgrim) on Oct 16, 2013 at 09:32 UTC
    ./script.pl file1 file2 file3 file4

    while (<>) { print $` if /$/; }
    100% working on unixes and windowses:)

      Consider print ${^PREMATCH} if /$/p; (Perl v5.10+), as using $` anywhere in a program imposes a considerable performance penalty on all regular expression matches. Source: perlvar.

        thank you sir

        prematch is just a wordier name for $`

        A considerable performance penalty by any other name doth make thine system slow all the same!

        Edit: The documentation apparently doesn't mean what it appears to say.
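
        A small sketch of the /p form discussed above. It is a variation rather than the code from the thread: the pattern is changed from /$/ to /\R?\z/ so that a trailing \r\n or \r is stripped as well as \n. The helper name strip_eol is hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # /p, ${^PREMATCH} and \R all require Perl 5.10 or later

# Hypothetical helper (not from the thread): return the line without its
# final line terminator, whichever convention was used.
sub strip_eol {
    my ($line) = @_;
    # The /p flag makes ${^PREMATCH} available for this match only, so the
    # punctuation variable $` is not needed anywhere in the program.
    return $line =~ /\R?\z/p ? ${^PREMATCH} : $line;
}

# Usage: the same text with three different line endings.
print strip_eol($_), "\n" for "unix\n", "windows\r\n", "old mac\r";
```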

Node Type: perlquestion [id://1058403]
Approved by Corion