PerlMonks  

Leading Characters Not Visible for Certain Array Elements While Printing

by justsomeguy (Novice)
on Oct 30, 2013 at 20:59 UTC ( [id://1060447]=perlquestion )

justsomeguy has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks:

In a program that parses a text file by opening and splitting a simple two-column file into array elements, I find that about 50% of the 3700 or so entries print out with the first element visually truncated. I don't mean that the value has changed, just that anything I try to print out in that first position is missing the first 2 characters on about half of the lines as they are displayed. If I 'pad' the line with 2 spaces on the print line, visibility is restored. I can print out the values separately in the program without difficulty. This seems to affect any array values or other variables (or literals) I try to print.

This has me stumped. I've searched Google on strings like "perl print not visible" or "perl print truncates" to no avail. Keep in mind that about half of the entries print with no drama at all. I have scrubbed the source data for unprintable characters, etc. Anything--and I mean anything (even the word RECORD) is truncated on the output of the program for those certain records.

Here's a sample of the code with some redactions:

while (defined(my $line = <FH>)) {
    chomp $line;
    my @hostpair = split(' ',$line);
    my $host = $hostpair[0];
    my @FQDN = split(/ /,`host $host`,2);
    my $OWNER;
    if (exists $MYDB{uc($host)}) {
        $OWNER = $MYDB{uc($host)}
    } else {
        $OWNER = "OTHER";
    }
    print "$host \t $FQDN[0]\t $hostpair[1] \t $OWNER \n";
}

The source data is a simple two-column, space-delimited file showing the short hostname in column 1 and a three-letter identifier in column 2.

Your help is much appreciated!


Replies are listed 'Best First'.
Re: Leading Characters Not Visible for Certain Array Elements While Printing
by Laurent_R (Canon) on Oct 30, 2013 at 21:55 UTC

    Hi, maybe you could try complementing the following line of code:

    chomp $line;

    with this additional one:

    $line =~ s/\r//g;

    The reason I am suggesting this is that, if I understand correctly what you get, you may be using a Windows file under Unix/Linux or another similar partly incompatible combination.

    In this case, chomp is not enough to clean your data of additional invisible characters. Without going into all the details, Windows newlines are composed of two characters (\r\n) and Unix newlines of only one (\n). If you process a Windows file under Unix, Perl believes it is a Unix file and chomp removes the \n character, leaving the \r. In terms of old typewriters or teletypes, \n is the line feed character, whereas \r is the carriage return character. If you are left with only the \r character, the cursor returns to the start of the line, so whatever is printed next may partly overwrite what was already printed there. The ASCII codes are 10 for \n and 13 for \r.
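A minimal sketch of the effect described above, assuming a record that ends in \r\n is read on a Unix-like system. The sample record and the helper name clean_line are made up for illustration; they are not from the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the newline and any carriage returns.
sub clean_line {
    my ($line) = @_;
    chomp $line;          # removes only the trailing "\n" (default $/)
    $line =~ s/\r//g;     # removes any leftover "\r"
    return $line;
}

my $raw = "web01 ABC\r\n";    # made-up record with a Windows line ending

my $chomped = $raw;
chomp $chomped;
# $chomped still ends in "\r" (ASCII 13), so it is 10 characters long, not 9.
printf "after chomp: %d chars, last char code %d\n",
    length($chomped), ord(substr($chomped, -1));

my $clean = clean_line($raw);
printf "after clean_line: %d chars, last char code %d\n",
    length($clean), ord(substr($clean, -1));
```

If a value that still carries the \r is interpolated into the middle of a print statement, a terminal moves the cursor back to column 0 at that point, which would be consistent with leading characters appearing to vanish.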

    This is just one hypothesis, but your description of the problem makes it quite similar to problems I have encountered a number of times when dealing with different platforms. Normal FTP in ASCII mode between platforms usually handles the conversion, but binary FTP mode or SFTP often sc*w things up.

    I have based my description on the differences between Unix and Windows, but you can get similar issues between Mac and Windows, Unix and Mac, VMS and Unix, etc.

      I'll give it a shot, though the data came from the output of bash shell scripts running on the same Linux host. I know what you mean, but haven't run into the linefeed/return issue when Windows or DOS isn't involved.

Re: Leading Characters Not Visible for Certain Array Elements While Printing
by graff (Chancellor) on Oct 31, 2013 at 02:02 UTC
    The first reply seems plausible - I can't imagine any other cause, based on the information you've given so far. When in doubt about line terminations on input, I just do:
    s/\s+$//; # instead of chomp
    Apart from that, I have to ask: where does your input file come from? How confident are you that it would never contain something that you wouldn't want included in a shell command line?

    If you were to add "-T" (taint mode) on the shebang line of your script, it would die rather than do `host $host`, and with good reason.
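One way to address that concern while still using the command-line "host" utility is sketched below. It assumes that a conservative character whitelist is acceptable for your hostnames. The helper names is_safe_hostname and host_lookup are hypothetical, and the pattern is not a full hostname validator.

```perl
use strict;
use warnings;

# Hypothetical check: accept only letters, digits, dots and hyphens,
# starting with a letter or digit. This is a conservative assumption,
# not a complete hostname grammar.
sub is_safe_hostname {
    my ($host) = @_;
    return 0 unless defined $host;
    return ( $host =~ /\A[A-Za-z0-9][A-Za-z0-9.-]*\z/ ) ? 1 : 0;
}

# Run "host" without a shell by using the list form of a piped open,
# so shell metacharacters in $host are never interpreted.
# Returns the output lines, or an empty list if the name is rejected
# or the command cannot be started.
sub host_lookup {
    my ($host) = @_;
    return unless is_safe_hostname($host);
    open( my $fh, '-|', 'host', $host ) or return;
    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

Under -T you would additionally need to untaint the value through a regex capture and set a safe $ENV{PATH}; the sketch above only covers the shell-metacharacter side of the problem.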

    Now, if you were to use a module, like Net::DNS::Nslookup (and there may well be others), you could get the IP addresses you want without having to run a shell command in back-ticks for each one. And you wouldn't need to worry about pesky little details like an input file that might contain shell-magic characters in the first column. Instead, your script could look like this:

    use strict;
    use Net::DNS::Nslookup;

    # ... do whatever you do open FH and load MYDB
    # (I'll fake it for now...)
    my %MYDB = ( 'WWW.PERLMONKS.ORG' => 'TPF' );

    while (<DATA>) {
        s/\s+$//;
        my @hostpair = split ' ';
        my $HOST = uc( $hostpair[0] );
        my $nslookup = Net::DNS::Nslookup->get_ips( $hostpair[0] );
        for my $addr ( split( /\n/, $nslookup )) {
            my ( $name, $ip ) = split( /,/, $addr );
            my $OWNER = ( exists( $MYDB{$HOST} )) ? $MYDB{$HOST} : "OTHER";
            print "$hostpair[0]\t$ip\t$OWNER\n";
        }
    }
    __DATA__
    www.perlmonks.org
    google.com
    Maybe that doesn't do exactly what you intended, and maybe you'd rather just use the command-line "host" utility - in that case, do be careful about your input...

      I'd love to use a module like that, but unfortunately I'm kinda limited on what modules I can add to the standard distribution. I'm also running "host" just to get the FQDN for what amounts to a giant hostfile for a software asset scanning program. I'll check to see if there's a module that can quickly provide that, one I can possibly sneak into my libpath somewhere. Thanks!
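If adding modules is the obstacle, Perl's built-in gethostbyname may be enough, since it needs no extra installation. The following is only a sketch: whether the first returned value is a fully qualified name depends on how the system resolver is configured, and the helper name fqdn_or_short and its optional $resolver argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the canonical name reported by the system
# resolver, or the original short name if the lookup fails.
# $resolver is an optional code ref, mainly so the logic can be tried
# without network access.
sub fqdn_or_short {
    my ( $host, $resolver ) = @_;
    # In list context the built-in gethostbyname returns
    # ($name, $aliases, $addrtype, $length, @addrs), or an empty list on failure.
    $resolver ||= sub { my ($name) = gethostbyname( $_[0] ); return $name };
    my $fqdn = $resolver->($host);
    return ( defined $fqdn && length $fqdn ) ? $fqdn : $host;
}
```

In the loop from the original question, a call such as fqdn_or_short($host) could stand in for the back-tick call to "host", assuming the resolver returns the form of name you need.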

Re: Leading Characters Not Visible for Certain Array Elements While Printing
by Laurent_R (Canon) on Oct 31, 2013 at 10:10 UTC

    I would suggest that you carefully examine your output file with an editor which is able to display hexadecimal encodings to find out what you have at the beginning and the end of your lines. My earlier hypothesis on a Windows file being used under Unix or Linux may be wrong, but there can be other reasons why you would get similar effects. Or just try the remedy I suggested in my earlier post, and just forget about the problem if it solves the issue.
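If no hex-capable editor is at hand, a few lines of Perl can make the invisible characters visible. This is a rough sketch with a hypothetical helper name, show_invisibles, and a made-up sample string.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside printable ASCII
# (0x20-0x7E) with a \xNN escape so it shows up on screen.
sub show_invisibles {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord($1))/ge;
    return $text;
}

# Made-up sample: a record with a Windows line ending.
print show_invisibles("web01 ABC\r\n"), "\n";
```

Applying the same helper to each value just before it is printed (for example $host, $FQDN[0] and $hostpair[1]) should show which one carries the stray character. Running od -c on the input and output files is another option.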

Re: Leading Characters Not Visible for Certain Array Elements While Printing
by Lennotoecom (Pilgrim) on Oct 30, 2013 at 23:38 UTC
    Try the 'auto' button on your monitor.
    I suppose that the missing two characters in some lines
    are just off screen due to incorrect monitor settings,
    whereas the completely visible lines have two characters ahead of them.
    Please respond if that was the case. ^ ^

    P.S. there is no point in downvoting me
    better upload the part of your original file
    so it would be possible to mull it over.

      Thought of that myself, but I'm using various PuTTY windows on the desktop and there's no horizontal scrolling evident. Thanks.
