PerlMonks  

Perl Issue - File downloaded in windows has a Control M character in it??

by venky144 (Initiate)
on Oct 24, 2012 at 18:31 UTC (#1000682)
venky144 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, For the last few months, we have been facing issues with the Perl script given below.

We have two different versions of ActivePerl here:

Newer version (downloaded file has an error): ActivePerl 5.10.0.1005, PPM version 4.05
Older version (downloaded file was correct): ActivePerl 5.10.0.1004, PPM version 4.03

The upstream job that works on the files ran fine with the file downloaded by the older version (ActivePerl 5.10.0.1004), but fails when it uses the file downloaded by the newer version (ActivePerl 5.10.0.1005).

After troubleshooting, we found that the file downloaded by the newer version comes with empty rows between the data rows. Because the .csv files contain these empty rows, they are not processed and the job fails. This happens because Perl is generating the output file in Unix format, which has a Control-M (^M) character at the end of each line. To convert the file from Unix format to DOS format, I need to use a command called "unix2dos", which is not available on Windows. I could download the utility and put it in system32, but my admin doesn't allow it; he wants me to do everything in Perl. Many scripts fail with the same issue.

This is a production issue we have been facing for the past 8-9 months. The error came up while we were migrating and testing all the scripts from the old server to a new one under the company's hardware recycle policy. We didn't have ActivePerl 5.10.0.1004, so we installed ActivePerl 5.10.0.1005 on the new server. Once this issue is solved, we will migrate the scripts remaining on the old server to current production.

Can you please help with this issue?
use WWW::Mechanize;
use Storable;

#------------------------------------------------------------------------------
# Define Variables for this process
#------------------------------------------------------------------------------
# File download path is used as a variable: $DownloadFilePath=<Path name>
$DownloadFilePath="D:\\LIPAPSM\\DSSourcefiles\\";

#==============================================================================
# START OF PROGRAM
#==============================================================================
$one_day = 60*60*24 ;
my($day, $month, $year) = (localtime time+$one_day)[3,4,5];
$month = sprintf '%02d', $month+1;
$day = sprintf '%02d', $day;
$Datetotal=($year+1900).$month.$day;
$filedate=join "", $Datetotal."damlbmp_zone.csv";
$url = join "",'http://mis.nyiso.com/public/csv/damlbmp/'.$filedate;
$m = WWW::Mechanize->new(autocheck => 1);
$m->get($url);
$filename=join "",$DownloadFilePath.$filedate;
$m->save_content($filename);

Re: Perl Issue - File downloaded in windows has a Control M character in it??
by Anonymous Monk on Oct 24, 2012 at 19:06 UTC
    The effect of unix2dos can be effortlessly achieved by an s/// regular-expression substitution, which is left as an exercise to the reader.
      What regular expression? I am new to Perl and I have been assigned to do this... I am an ETL guy.
        s/\r\n/\n/g should do it.

        See perldoc perlop and search for "Regex Quote-Like Operators." Also see perldoc perlre.

        This command replaces the \r\n sequence with \n for every occurrence in the string.

        Commands such as unix2dos and dos2unix apply the same kind of substitution to the content of a file, much as a sed script applying a regular expression would.
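For what it's worth, both directions of the conversion can be sketched in a few lines. The helper names to_unix and to_dos are made up for this illustration and do not come from the thread; the first wraps the exact substitution suggested above, the second is one possible reverse (unix2dos-style) substitution.

```perl
use strict;
use warnings;

# Hypothetical helpers, not from the thread: convert line endings in a string.

# dos2unix direction: CRLF -> LF (this is what s/\r\n/\n/g does).
sub to_unix {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# unix2dos direction: LF -> CRLF, leaving alone any LF already preceded by CR.
sub to_dos {
    my ($text) = @_;
    $text =~ s/(?<!\r)\n/\r\n/g;
    return $text;
}
```

When applying either helper to a file on Windows, the file would need to be opened with the :raw layer for both reading and writing; otherwise Perl's own text-mode newline translation gets mixed in with the substitution.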
Re: Perl Issue - File downloaded in windows has a Control M character in it??
by aitap (Deacon) on Oct 24, 2012 at 19:08 UTC

    After our troubleshooting, we found out that the file downloaded by newer version come with empty rows between data, since the .csv files came with spaces (empty rows), it is not being processed and fails.
    If a UNIX-newlined text file is read as a DOS-newlined text file, it looks as if there are no newlines at all, because the DOS newline is "\r\n" while the UNIX newline is just "\n".

    So empty lines cannot appear merely because a UNIX-newlined file is viewed as DOS-newlined. Can you provide a small hexdump of a damaged file?

    Sorry if my advice was wrong.
      What is a hexdump? We are working on Windows... When I used the unix2dos command, I got the output I needed, but I can't use the unix2dos utility. I need to do this using Perl only.
        Hexdump is a hexadecimal representation of bytes in a file. You can obtain it by running something like this:
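        # Print the first 8 x 16 bytes of the file as hex.
        # "or" is used instead of "||" so the checks apply to open/read themselves.
        open my $f, "<:raw", "file.csv" or die $!;
        my $c;
        for (1..8) {
            for (1..16) {
                read($f, $c, 1) or exit 0;   # stop at end of file
                print unpack "H2", $c;
            }
            print "\n";
        }

        You can rewrite unix2dos in Perl using the PerlIO :crlf output layer.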
        Sorry if my advice was wrong.
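A minimal sketch of the :crlf idea mentioned above, assuming the input really is a UNIX-newlined file (only "\n" line endings). The helper name unix2dos_file is made up for this illustration; the thread itself gives no implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src to $dst, turning "\n" line endings into "\r\n".
sub unix2dos_file {
    my ($src, $dst) = @_;

    # Read the bytes as they are, with no newline translation on input.
    open my $in, '<:raw', $src or die "Cannot read $src: $!";

    # The :crlf layer writes every "\n" as "\r\n".
    open my $out, '>:crlf', $dst or die "Cannot write $dst: $!";

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $dst: $!";
    return;
}
```

If the input already contains "\r\n", this sketch would write "\r\r\n", so it is only suitable for files known to be in UNIX format.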
Re: Perl Issue - File downloaded in windows has a Control M character in it??
by fishmonger (Pilgrim) on Oct 24, 2012 at 20:14 UTC
    This problem has nothing to do with the version of perl; it's due to the way the file was transferred from the *nix system to the Windows system. The unix2dos program does not need to be put into the system32 directory. You can download it to your home directory and use it from there, or you can simply edit the file with Perl or with a decent text editor.

    C:\>perl -pi.bak -e "s/\n/\r\n/g" yourfile
      Can you please let me know where to put that in my script? As I told you guys, I am completely new to Perl; I am an ETL developer. There are many such scripts that download files, which are then used by upstream processes. It's all automated, so I can't manually go to each and every file.
Re: Perl Issue - File downloaded in windows has a Control M character in it??
by bart (Canon) on Oct 24, 2012 at 20:15 UTC
    I can't help wondering why you are using WWW::Mechanize where LWP::Simple with getstore would likely do. After all, you seem to be only fetching a file from a URL... No login, no navigating towards a download page.

    I am guessing WWW::Mechanize assumes HTML, thus, text contents, and treats it that way. getstore, on the other hand, saves (all) files as binary by default.

    Update: From the WWW::Mechanize docs:

    $mech->save_content( $filename )

    Dumps the contents of $mech->content into $filename. $filename will be overwritten. Dies if there are any errors.

    If the content type does not begin with "text/", then the content is saved in binary mode.

    And no way to override it. Nice.
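To illustrate how a text-mode save could produce the blank rows described in the question, here is a small sketch. It assumes (the thread does not confirm this) that the server already sends the CSV with "\r\n" line endings. The helpers write_with_layer and slurp_raw are made up for this example; the :crlf layer stands in for what text mode does on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: write $bytes to $path through the given PerlIO layer,
# e.g. ':crlf' (text mode on Windows) or ':raw' (binary mode).
sub write_with_layer {
    my ($path, $layer, $bytes) = @_;
    open my $fh, ">$layer", $path or die "Cannot write $path: $!";
    print {$fh} $bytes;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read a file back byte for byte.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    local $/;                 # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}
```

Writing "x,1\r\n" through ':crlf' leaves "x,1\r\r\n" on disk, because each "\n" gains another "\r"; a reader that treats the stray "\r" as a line break would then see an empty row. Writing the same bytes through ':raw' leaves them unchanged, which is the behaviour a binary-mode save gives.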
      Thanks, your solution worked well. LWP::Simple worked. Here is the modified code, and it worked well. Perl seems interesting.
      use Storable;
      use LWP::Simple;

      #------------------------------------------------------------------------------
      # Define Variables for this process
      #------------------------------------------------------------------------------
      # File download path is used as a variable: $DownloadFilePath=<Path name>
      $DownloadFilePath="D:\\LIPAPSM\\DSSourcefiles\\";

      #==============================================================================
      # START OF PROGRAM
      #==============================================================================
      $one_day = 60*60*24 ;
      my($day, $month, $year) = (localtime time+$one_day)[3,4,5];
      $month = sprintf '%02d', $month+1;
      $day = sprintf '%02d', $day;
      $Datetotal=($year+1900).$month.$day;
      $filedate=join "", $Datetotal."damlbmp_zone.csv";
      $url = join "",'http://mis.nyiso.com/public/csv/damlbmp/'.$filedate;
      $filename=join "",$DownloadFilePath.$filedate;
      mirror($url, $filename);
        Now let's clean up that messy filedate code.
        use POSIX qw(strftime);
        my $Datetotal = strftime("%Y%m%d", localtime(time() + 86400));

        And these lines are odd - you join strings with 'join' and '.':

        $filedate=join "", $Datetotal."damlbmp_zone.csv";
        $url = join "",'http://mis.nyiso.com/public/csv/damlbmp/'.$filedate;
        $filename=join "",$DownloadFilePath.$filedate;

        You can simply quote them as one:

        $filedate = "${Datetotal}damlbmp_zone.csv";
        $url = "http://mis.nyiso.com/public/csv/damlbmp/$filedate";
        $filename = "${DownloadFilePath}$filedate";
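Putting the cleanup suggestions together, the script might be sketched as below. The sub names target_for and fetch_tomorrow are made up for this illustration, and the status check via getstore and is_success is an addition the thread does not show (the original poster used mirror without checking its result).

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build the URL and local path for the day after $epoch.
# $dir is expected to end with a path separator, as in the original script.
sub target_for {
    my ($epoch, $dir) = @_;
    my $date     = strftime('%Y%m%d', localtime($epoch + 86400));
    my $filedate = "${date}damlbmp_zone.csv";
    my $url      = "http://mis.nyiso.com/public/csv/damlbmp/$filedate";
    return ($url, "$dir$filedate");
}

# Hypothetical wrapper: download tomorrow's file and die on an HTTP error.
# LWP::Simple is loaded lazily so target_for() works without it installed.
sub fetch_tomorrow {
    my ($dir) = @_;
    require LWP::Simple;
    my ($url, $filename) = target_for(time(), $dir);
    my $status = LWP::Simple::getstore($url, $filename);
    die "Download of $url failed with status $status\n"
        unless HTTP::Status::is_success($status);
    return $filename;
}
```

A call such as fetch_tomorrow("D:\\LIPAPSM\\DSSourcefiles\\") would then replace the body of the script above.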
