
Code Misses a Replacement

by monger (Friar)
on Jul 07, 2005 at 15:32 UTC ( #473203=perlquestion )

monger has asked for the wisdom of the Perl Monks concerning the following question:

I've written a quick and dirty script to change a comma-separated DB dump to a tab-delimited file for importing. Here's the code:
    my $log_file = "/c/mysql.out";
    my $out_file = "/c/mysql.dshield";
    open LOG, "$log_file" || die "Can't open log file: $!";
    open OUT, ">$out_file" || die "Can't open output file: $!";
    while (<LOG>) {
        s/,/\t/g;
        print OUT $_;
    }
    close OUT || die "Can't close the output file: $!";
    close LOG || die "Can't close the log file: $!";
Here's a snippet of what it's parsing:

2005-07-06 00:00:00-05:00,85099794,1,,1038,, 1434,udp,

Here's what is dumped out after the script chews it up:

2005-07-06 00:00:00 -05:00 85099794 1 1038192.168.1.20 1434 udp

So, why would this miss replacing the third from last comma with a tab? It simply deletes the comma without replacement. I can't figure this out! Help please?? Monger

Monger +++++++++++++++++++++++++ Munging Perl on the side

Replies are listed 'Best First'.
Re: Code Misses a Replacement
by dbwiz (Curate) on Jul 07, 2005 at 15:45 UTC

    I agree with ww. The tab is there.

    Try this:


    Then, you will see the tab even if your settings prevent it.
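The snippet that followed "Try this:" did not survive in this copy of the thread. Going by monger's later thanks for "the bracket tip", it presumably wrapped each field in brackets. The sketch below is only a guess at that idea, not dbwiz's actual code; the helper name `bracket_fields` and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Wrap every tab-separated field in brackets so field boundaries are
# visible no matter how the terminal renders the tab character.
sub bracket_fields {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    return join '', map { "[$_]" } split /\t/, $line, -1;
}

# Made-up stand-in for one converted line (not the poster's real data).
my $converted = "2005-07-06\t1038\t\t1434\tudp\t";
print bracket_fields($converted), "\n";
# [2005-07-06][1038][][1434][udp][]
```

An empty pair of brackets marks an empty field, so a tab that looked like a single space on screen shows up as a field boundary.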

    On a side note, I would do the whole affair with a one-liner:

    perl -pe 's/,/\t/g' < /c/mysql.out > /c/mysql.dshield
      dbwiz, Thanks for the bracket tip. That got it. And I'll likely use the one liner for an eventual multi-lang batch job.


      Monger +++++++++++++++++++++++++ Munging Perl on the side
Re: Code Misses a Replacement
by ww (Archbishop) on Jul 07, 2005 at 15:36 UTC
    Can't tell for sure without verbatim of output, but since there appears to be a \s in the output in the spot where the third from last comma was in the original, suspect the issue is appearance, ONLY. Look at the output with an editor (hex, whatever) that lets you see the actual bytes...

    A tab can appear to be a single space, depending on its location, tabwidth, etc.
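To make that concrete, here is a small sketch of how a display with fixed tab stops turns a tab into spaces. It assumes tab stops every 8 columns (a common default, but an assumption here); `expand_tabs` is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Expand tabs the way a terminal or editor with fixed tab stops would.
# $width is the tab-stop interval (defaults to 8, an assumed setting).
sub expand_tabs {
    my ($line, $width) = @_;
    $width ||= 8;
    my $out = '';
    for my $char (split //, $line) {
        if ($char eq "\t") {
            # Pad with spaces up to the next multiple of $width.
            my $pad = $width - (length($out) % $width);
            $out .= ' ' x $pad;
        }
        else {
            $out .= $char;
        }
    }
    return $out;
}

# A 7-character field followed by a tab: the next stop is 1 column away.
print expand_tabs("1234567\tudp"), "\n";   # 1234567 udp
# A 4-character field followed by a tab: the next stop is 4 columns away.
print expand_tabs("1038\t1434"), "\n";     # 1038    1434
```

The first line renders exactly as if the tab were one space, which is the effect ww describes.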

Re: Code Misses a Replacement
by Xaositect (Friar) on Jul 07, 2005 at 15:48 UTC

    This may make things more complicated than you need, but I should point out that most CSV dumps use quotation marks to escape strings that have commas in them. This is something to watch for:

    some,comma-delniated,"file with a, comma",in the data

    You might take a look at Text::CSV. You could do something like this (untested):

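    use Text::CSV;

    my $csv = Text::CSV->new();
    while (<>) {
        chomp;
        # parse() returns false on malformed input
        $csv->parse($_) or die "Can't parse line $.: $_";
        # the parsed fields carry no line ending, so add one back
        print join("\t", $csv->fields()), "\n";
    }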
    That's pretty simplistic, and won't handle tabs in the data, but you get the idea.
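To see the quoted-comma problem without installing anything, the sketch below compares a plain `split` with the core module Text::ParseWords on the sample line from this reply. This is a swapped-in technique for illustration, not what the reply proposes: `parse_line` is not a full CSV parser (it also treats apostrophes as quotes and backslashes as escapes, and it drops a trailing empty field), and `csv_line_to_tsv` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split one CSV line into fields, honouring double-quoted fields that
# contain commas, then join the fields with tabs.
sub csv_line_to_tsv {
    my ($line) = @_;
    chomp $line;
    # Second argument 0 means "strip the quotes from quoted fields".
    my @fields = parse_line(',', 0, $line);
    return join "\t", @fields;
}

my $line = 'some,comma-delniated,"file with a, comma",in the data';

# Naive approach: the comma inside the quotes is treated as a separator.
my @naive = split /,/, $line;
print scalar(@naive), " fields from split\n";        # 5 fields from split

# Quote-aware approach: the quoted field stays in one piece.
my @aware = split /\t/, csv_line_to_tsv($line);
print scalar(@aware), " fields from parse_line\n";   # 4 fields from parse_line
```

For real dumps a dedicated CSV module remains the safer choice, for the reasons given in the replies here.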

    Xaositect -

      I would agree that the dump may do many interesting things when it goes to CSV, including escaping certain chars. I suggest the following code (which I use a variation of to convert Semi-Colon SV files to CSV files):

      use IO::File;
      use Text::CSV_XS;

      for (@ARGV) {
          my $out_fname = $_ . '.dshield';
          my $inf  = IO::File->new($_, '<')         or die "Cannot read $_";
          my $outf = IO::File->new($out_fname, '>') or die "Cannot write $out_fname";
          my $csv_in = Text::CSV_XS->new;   # defaults work for most CSV's
          # use tabs; eol is set so print() ends each record with a newline
          my $csv_out = Text::CSV_XS->new({ sep_char => "\t", eol => "\n" });
          until ($inf->eof) {
              my $line = $csv_in->getline($inf);
              $csv_out->print($outf, $line);
          }
      }
      ## IO::File objects close automatically when they go out of scope

      This gets used as: file1.out {file2.out} {...}, and writes the results to file1.out.dshield, etc. By using the Text::CSV_XS module, you will be certain of processing CSV and Tab-SV files correctly. Though it's more code, it performs quite well and it will likely save you grief in the future.
      Larry Wall is Yoda: there is no try{}
      The Code that can be seen is not the true Code
Re: Code Misses a Replacement
by Roy Johnson (Monsignor) on Jul 07, 2005 at 16:36 UTC
    tr/,/\t/ is probably a better choice than s/,/\t/g, just because it's more tuned for the job.
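A minimal sketch of the `tr` variant, for comparison with the substitution in the original script. `tr` maps characters one-for-one without involving the regex engine and returns how many characters it translated; the helper name `commas_to_tabs` is hypothetical.

```perl
use strict;
use warnings;

# Turn every comma into a tab with tr///.
# Returns the converted line and the number of characters translated.
sub commas_to_tabs {
    my ($line) = @_;
    my $count = ($line =~ tr/,/\t/);
    return ($line, $count);
}

my ($converted, $count) = commas_to_tabs("1038,,1434,udp,\n");
print "replaced $count commas\n";   # replaced 4 commas
```

Inside the original loop this amounts to replacing `s/,/\t/g;` with `tr/,/\t/;`.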

    Caution: Contents may have been coded under pressure.
Re: Code Misses a Replacement
by Transient (Hermit) on Jul 07, 2005 at 15:39 UTC
    I can't speak as to why it might be missing the third to last tab (but it probably isn't, it's probably just spacing it as one space).

    But, I would suggest doing a s/\|\|/or/g on your source code (that is, replacing the ||'s with or's) or else you won't know when you have a failure in your open or close functions. (||'s precedence is higher than what you want in these cases)
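A sketch of what that change looks like in practice. In the original, `open LOG, "$log_file" || die ...` parses as `open(LOG, ("$log_file" || die ...))`, so `die` could only fire if the file name itself were false. The version below uses `or`, and additionally switches to three-argument `open` with lexical filehandles, which is a style choice of this sketch rather than something the reply asks for. The sub name `convert_file` and the missing path are hypothetical.

```perl
use strict;
use warnings;

# Copy $in_path to $out_path, turning commas into tabs.
# "or" binds more loosely than the list operators open and close,
# so each die fires when the call itself fails.
sub convert_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Can't open $in_path: $!";
    open my $out, '>', $out_path or die "Can't open $out_path: $!";
    while (my $line = <$in>) {
        $line =~ s/,/\t/g;
        print {$out} $line;
    }
    close $out or die "Can't close $out_path: $!";
    close $in  or die "Can't close $in_path: $!";
    return;
}

# A missing input file is now reported instead of being silently ignored.
# The path below is a placeholder assumed not to exist.
eval { convert_file('/no/such/dir/mysql.out', '/no/such/dir/mysql.dshield') };
print "caught: $@" if $@;
```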
Re: Code Misses a Replacement
by samizdat (Vicar) on Jul 07, 2005 at 15:39 UTC
    Try replacing the comma with \x2C . It also seems to have inserted an extra \t after the main portion of the timestamp.

Node Type: perlquestion [id://473203]
Approved by coreolyn
Front-paged by cchampion