Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear Monks!
I have a file in the form of:
nick 5
nick 10
george 2
peter 3
george 14
nick 20
and I want to output:
george:2,14
nick:5~~10~~20
peter:3

I had an old script that used to work:
use strict;
use warnings;

my %res;
while (<>) {
    chomp;
    my ( $name, $rest ) = split /\t/;
    push @{ $res{$name} }, $rest;
}
for $a( sort keys %res ) {
    print "$a:". join( "~~", @{ $res{$a} } );
    print "\n";
}
Now it produces weird results, can you please help me fix it? Perhaps I introduced a bug at some point and now I can't get it to work.
Current output (weird):
~~14ge:2
~~20:5
peter:3

Replies are listed 'Best First'.
Re: Why is my code producing weird output?
by hv (Prior) on Aug 30, 2023 at 23:29 UTC

    I think it is possible that your input file has at some point acquired Windows CR-LF line endings, but you are running it on a Unixish system that expects LF line endings. So the chomp() is stripping the LF but leaving the CR, which then gets included in the joined string that you output.

    Without changing the input file, one option to fix that would be to replace the chomp with a substitution that removes all trailing whitespace:

    while (<>) { s{\s+$}{}; ...

    It is common that a string with a lone CR ("carriage return") is displayed to the terminal by "returning the carriage" as a typewriter would to the first column without starting a new line (which is what LF, "line feed", does). This could explain the output you are seeing. Piping the output to a hex dump program such as od -x would help to confirm that.
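
To see what chomp leaves behind without reaching for a hex dump, here is a minimal sketch. The helper show_invisibles is a made-up name for illustration only, and the sketch assumes the default input record separator ($/ set to "\n").

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $str with CR, LF and TAB made
# visible, so a stray CR prints as \r instead of moving the cursor.
sub show_invisibles {
    my ($str) = @_;
    $str =~ s/\r/\\r/g;
    $str =~ s/\n/\\n/g;
    $str =~ s/\t/\\t/g;
    return $str;
}

my $line = "george\t2\r\n";    # one line as read from a CRLF-terminated file

# chomp removes only the trailing LF; the CR stays behind.
my $chomped = $line;
chomp $chomped;

# Stripping all trailing whitespace removes the CR as well.
my $stripped = $line;
$stripped =~ s{\s+$}{};

print show_invisibles($chomped),  "\n";    # george\t2\r
print show_invisibles($stripped), "\n";    # george\t2
```

The leftover \r in the first line of output is the character that ends up inside the joined string.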

      When you know that you have Windows line separators, I prefer to handle them the same way that Perl on Windows does: use the "IO layer" :crlf. Add the statement

      binmode STDIN, ':crlf';

      before your input loop.

      Thank you all! It seems that my input file had some CRs after all! Now it works fine :)
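
For a file that is opened by name rather than read from STDIN, the :crlf layer mentioned above can also be given in the open mode. The following is only a sketch under that assumption: read_crlf_lines is a hypothetical helper name, and the temporary file exists purely so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read all lines of $path through the :crlf layer,
# which translates CR-LF to LF on input, so a plain chomp is enough.
sub read_crlf_lines {
    my ($path) = @_;
    open( my $fh, '<:crlf', $path ) or die "error: open '$path': $!";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

# Demonstration: write a small CRLF-terminated file, then read it back.
my ( $out, $tmpname ) = tempfile( UNLINK => 1 );
binmode $out;    # raw bytes, so the CRs are written exactly as given
print {$out} "nick\t5\r\n", "george\t2\r\n";
close $out;

my @lines = read_crlf_lines($tmpname);
print join( '|', @lines ), "\n";
```

Each element of @lines then holds just the name, a tab and the number, with no trailing CR.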
Re: Why is my code producing weird output?
by GrandFather (Saint) on Aug 30, 2023 at 23:30 UTC

    Maybe your input data is not what you think it is? Instead of piping the data in, try using an external file or baking the data into a test script:

    use strict;
    use warnings;

    my %res;
    while (<DATA>) {
        chomp;
        my ( $name, $rest ) = split /\t/;
        push @{ $res{$name} }, $rest;
    }
    for $a( sort keys %res ) {
        print "$a:". join( "~~", @{ $res{$a} } );
        print "\n";
    }
    __DATA__
nick	5
nick	10
george	2
peter	3
george	14
nick	20


    george:2~~14
    nick:5~~10~~20
    peter:3

    Note that that result is not what you say you want, but your code suggests the comma you show as a separator for george is a typo anyway.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Why is my code producing weird output?
by NetWallah (Canon) on Aug 30, 2023 at 23:23 UTC
    The code works fine for me. Output:
    $ perl < pm1-dat.txt
    george:2~~14
    nick:5~~10~~20
    peter:3
    My perl: This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-gnu-thread-multi

    Your code uses an undeclared $a, which is also used by sort.
    Your perl seems to encounter side effects of that.
    I recommend you use a declared variable not named $a or $b.

                    "These opinions are my own, though for a small fee they be yours too."

      Note that in the line for $a( sort keys %res ), the work for the sort is completed before the first value is assigned to $a. So while this might not be best practice, I do not think it is causing the problem in this case.
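
Even if $a is not the culprit here, the recommendation above is cheap to follow. A minimal sketch of the output loop with a lexical loop variable; format_groups is a hypothetical name, and the hash is filled by hand with the values from the question instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: format a hash of name => [values] as "name:v1~~v2"
# lines, sorted by name. The lexical $name avoids reusing the package
# variable $a that sort reserves for its comparison block.
sub format_groups {
    my ($res) = @_;
    my @out;
    for my $name ( sort keys %$res ) {
        push @out, "$name:" . join( '~~', @{ $res->{$name} } );
    }
    return @out;
}

my %res = ( nick => [ 5, 10, 20 ], george => [ 2, 14 ], peter => [3] );
print "$_\n" for format_groups( \%res );
# prints:
# george:2~~14
# nick:5~~10~~20
# peter:3
```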

Re: Why is my code producing weird output?
by eyepopslikeamosquito (Archbishop) on Sep 03, 2023 at 09:46 UTC

    Given that this question, asked anonymously the day before, contains test data with identical first two lines, namely:

    nick 5
    nick 10
    I'm guessing you are the same anonymonk. If so, it would've been good to mention that.

    Further to hv's excellent suggestion of using s{\s+$}{} (that presumably fixed your problem), you might consider writing a standalone program that does nothing more than verify that your input data files are well-formed.

    Though you didn't rigorously define the format of your input files in either of your questions, I'm guessing that to be well-formed, each line in your data files must match:

    ^[a-z]+\t\d+$

    Is that right? If so, to avoid future pain, you might consider writing a simple data validation program, for example:

    use strict;
    use warnings;

    my $fname = shift or die "usage: $0 file\n";
    open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
    my $lcnt = 0;
    my $line;
    while ( defined($line = <$fh>) ) {
        ++$lcnt;
        chomp $line;
        $line =~ /^\s+/ and die "error: line $. contains leading whitespace\n";
        $line =~ /\s+$/ and die "error: line $. contains trailing whitespace\n";
        length($line)   or die "error: line $. is empty\n";
        $line =~ /^[a-z]+\t\d+$/
            or die "error: line $. ($line) does not match word TAB number\n";
    }
    close $fh;
    warn "file '$fname': $lcnt lines, no data format errors detected\n";

    Running this program on Linux against a CRLF-terminated Windows file produces:

    error: line 1 contains trailing whitespace
    Obviously, you could make the crude data validation program above more elaborate. Alternatively, you might add more rigorous file format checks to your original program.
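
As one possible shape for folding such checks into the original program, here is a rough sketch. It assumes the "word TAB number" format guessed at above; group_checked is a hypothetical name, and the choice to strip trailing whitespace and collect errors (rather than die on the first one) is an assumption of this sketch, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: group "word TAB number" lines by word. Trailing
# whitespace (including a stray CR) is stripped first; lines that still
# do not match are collected in the second return value.
sub group_checked {
    my (@lines) = @_;
    my ( %res, @errors );
    my $lnum = 0;
    for my $line (@lines) {
        ++$lnum;
        ( my $clean = $line ) =~ s{\s+$}{};
        if ( $clean =~ /^([a-z]+)\t(\d+)$/ ) {
            push @{ $res{$1} }, $2;
        }
        else {
            push @errors, "line $lnum ($clean) does not match word TAB number";
        }
    }
    return ( \%res, \@errors );
}

# The second line uses a space instead of a TAB, so it is reported.
my ( $res, $errors ) = group_checked( "nick\t5\r\n", "george 2\n", "nick\t10\n" );
warn "warning: $_\n" for @$errors;
for my $name ( sort keys %$res ) {
    print "$name:", join( '~~', @{ $res->{$name} } ), "\n";
}
```

With this input the CR on the first line is tolerated, while the malformed second line produces a warning on STDERR and is left out of the grouped output.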