Excel File Dos2Unix not working

by ZWcarp (Beadle)
on Jul 12, 2011 at 10:39 UTC
ZWcarp has asked for the wisdom of the Perl Monks concerning the following question:

Glorious Monks, I come before you humbled seeking wisdom.

A coworker sent me an Excel file, which I need to do some simple parsing on before I can use it. I have run into issues with Excel files before, but I have usually gotten around them with a couple of dos2unix-style commands. I'm not sure what is different about this file, but my usual methods just aren't working. First I saved the Excel file as a tab-delimited file, and I've noticed that if I run, say,

cut -f1 Filename.txt

I only get the very first row. I first tried the command

perl -i -pe 'tr/\015/\n/d' Filename.txt

to replace any carriage returns with \n. Usually this works; this time, however, for whatever reason I'm still having issues. I've also run od -tc on the file to look for any weird characters that might be screwing up my line reads. Does anyone have any ideas about what might be causing the issue?
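One way to rule out line endings as the culprit is to normalize all of them in a single pass. A file whose lines end in bare carriage returns looks like one long line to tools such as cut, which would fit the "only the first row" symptom, though that cause is only an assumption here. A minimal sketch, where normalize_newlines is a hypothetical helper name and the sample data is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF pairs and lone CRs to a single LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012?/\012/g;    # CR, optionally followed by LF, becomes LF
    return $text;
}

# Made-up sample mixing CR-only, CRLF, and LF line endings.
my $sample = "a\tb\015c\td\015\012e\tf\012";
my @rows   = split /\012/, normalize_newlines($sample);
print scalar(@rows), " rows after normalizing\n";
```

The same substitution should also work as an in-place one-liner, perl -i -pe 's/\015\012?/\012/g' Filename.txt, since it handles CRLF and CR-only files alike.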

Update: The problem was due to color markup in the Excel file, which left hidden characters behind. I solved this by using text edit to get rid of the "rich text". I'm not sure how the same thing would be done in Perl. Thanks, all, for your responses.
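As for doing something similar in Perl: the sketch below is not a rich-text parser. It assumes the leftover "hidden characters" are control or non-ASCII bytes, and it reports or deletes every byte that is not a tab, a line feed, or printable ASCII. The helper names find_odd_bytes and strip_odd_bytes and the sample string are hypothetical. Note that it also drops carriage returns and any legitimate non-ASCII text, so it is worth inspecting the report before stripping.

```perl
use strict;
use warnings;

# Hypothetical helper: return "offset:0xHH" for each byte that is not
# a tab, a line feed, or printable ASCII.
sub find_odd_bytes {
    my ($bytes) = @_;
    my @hits;
    while ($bytes =~ /([^\t\n\x20-\x7E])/g) {
        push @hits, sprintf('%d:0x%02X', pos($bytes) - 1, ord $1);
    }
    return @hits;
}

# Hypothetical helper: delete the same set of bytes.
sub strip_odd_bytes {
    my ($bytes) = @_;
    $bytes =~ tr/\t\n\x20-\x7E//cd;    # complement + delete: keep only the listed bytes
    return $bytes;
}

# Made-up sample: a CR before the first LF and a NUL inside a cell.
my $sample = "id\tname\r\n1\t\x00A\n";
print "suspicious bytes: ", join(', ', find_odd_bytes($sample)), "\n";
print strip_odd_bytes($sample);
```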

by Tux (Monsignor) on Jul 12, 2011 at 10:45 UTC

    Reading your post, I guess you mean "data exported from Excel files", as a .txt file is something completely different from an Excel file.

    Perl has an excellent module for parsing native Excel files: Spreadsheet::ParseExcel. If the API is too difficult (for you), you could consider the wrapper module Spreadsheet::Read, which uses Spreadsheet::ParseExcel under the hood. With both, you are the one controlling what data in the spreadsheet(s) you deal with and how.
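A rough sketch of what reading the original file directly with Spreadsheet::ParseExcel might look like, skipping the tab-delimited export altogether. The helper names tsv_line and dump_xls are hypothetical, and the file path is taken from the command line. The module comes from CPAN and reads the binary .xls format, so a file saved as .xlsx would need a different module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one row of cell values into a tab-delimited line,
# printing empty cells as empty fields.
sub tsv_line {
    my @cells = @_;
    return join("\t", map { defined $_ ? $_ : '' } @cells) . "\n";
}

# Hypothetical helper: print every worksheet of an .xls file as tab-delimited text.
sub dump_xls {
    my ($path) = @_;
    require Spreadsheet::ParseExcel;    # CPAN module, loaded only when needed

    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($path);
    die $parser->error(), "\n" unless defined $workbook;

    for my $sheet ($workbook->worksheets()) {
        my ($row_min, $row_max) = $sheet->row_range();
        my ($col_min, $col_max) = $sheet->col_range();
        for my $row ($row_min .. $row_max) {
            my @values = map {
                my $cell = $sheet->get_cell($row, $_);
                $cell ? $cell->value() : undef;    # undef for cells with no data
            } $col_min .. $col_max;
            print tsv_line(@values);
        }
    }
}

# Only runs when a file name is given on the command line.
dump_xls($ARGV[0]) if @ARGV;
```

Because value() returns the cell's text rather than its styling, color markup in the spreadsheet should not end up in the output.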

    Excel uses a binary format that is very portable across architectures, making it relatively easy to read those files on Windows as well as on AIX, HP-UX, Mac OS X, etc.

    Enjoy, Have FUN! H.Merijn

by choroba (Abbot) on Jul 12, 2011 at 11:42 UTC

    Can you further specify the "issues" you are having?

