PerlMonks  

different length of a line from linux and windows textfile?

by Microcebus (Beadle)
on Mar 17, 2014 at 14:35 UTC

Microcebus has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, using the following code:
    open(FILE,"textfile.txt");
    while(1) {
        $line=<FILE>||last;
        @line=split("\t",$line);
        $last=0;
        if($line[0]ne$previous_value) {
            # do something
            $last=1
        }
        elsif($line[1]>=1) {
            # do something
        }
        $previous_value=$line[0];
        if($last==1) {
            seek(FILE,-length$line,1);
            last;
        }
    }
    close FILE;
    exit;
Will I obtain different results for length($line) when I use files that come from Linux and Windows machines (line feed vs. carriage return plus line feed)?

I want to go back one line while reading a file if a certain condition is true. Will this code work on Linux and Windows machines using files from a different OS?

Re: different length of a line from linux and windows textfile?
by LanX (Saint) on Mar 17, 2014 at 14:45 UTC
    Most probably a problem with trailing newlines.

    Try to chomp your input $line.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      A complaint about different line length from Windows files is likely due to "\r", which chomp will do nothing to.

      s/\s+$//

      Will trim all trailing whitespace, which is what I recommend over chomp. It catches "\n" and "\r" as well as spaces and tabs (which should never be allowed to have significance at the end of a line where they are extra invisible).
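A minimal Perl sketch of the difference, assuming a line with Windows (CR LF) endings that reached Perl without any :crlf translation, as happens when a Windows file is read on Linux. The helper names chomped and trimmed are hypothetical.

```perl
use strict;
use warnings;

# A Windows-style line as it arrives when no :crlf layer translates it
# (for example, a Windows file read on Linux).
my $raw_line = "foo\tbar\r\n";

# chomp removes only the current $/ (by default "\n"), so "\r" survives.
sub chomped {
    my ($line) = @_;
    chomp $line;
    return $line;
}

# s/\s+$// strips all trailing whitespace: "\r", "\n", spaces and tabs.
sub trimmed {
    my ($line) = @_;
    $line =~ s/\s+$//;
    return $line;
}

printf "chomp: %d chars\n", length(chomped($raw_line));   # 8 ("foo\tbar\r")
printf "s///:  %d chars\n", length(trimmed($raw_line));   # 7 ("foo\tbar")
```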

      - tye        

        You are right that chomp won't pick up a lone \r on its own. However, when the file is read on Windows, Perl's default :crlf layer turns each \r\n into \n before chomp sees it, so chomp effectively removes both characters; there, the only time this would be an issue is if your file is corrupted, or specifically crafted to utilise \r in some way.

Re: different length of a line from linux and windows textfile?
by no_slogan (Deacon) on Mar 17, 2014 at 16:28 UTC

    seek works with byte positions; length counts characters. If your file is in text mode, a newline is two bytes on windows, but one byte on linux; it's one character in either case.

    Technically, when your file is not in binary mode, you shouldn't pass anything to seek that's not the result of tell. So you might a) do a tell before you read the line, or b) binmode the file.
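Option a) can be sketched as follows. This is a simplified, hypothetical version of the loop in the question (the helper stop_at_new_key and the sample data are illustrative, not from the thread): it records tell before each read and, when the first field changes, seeks back to that recorded position so the next read returns the same line.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Illustrative sample data with Windows line endings; the technique does
# not depend on which line-ending convention the file uses.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "a\t1\r\na\t2\r\nb\t3\r\n";
close $tmp or die "close failed: $!";

# Hypothetical helper: read until the first field changes, then seek back
# to the start of that line so the next read returns it again.
sub stop_at_new_key {
    my ($fh) = @_;
    my $previous;
    while (1) {
        my $pos  = tell $fh;             # where this line starts
        my $line = <$fh>;
        last unless defined $line;       # end of file: nothing to rewind to
        my ($key) = split /\t/, $line;
        if (defined $previous && $key ne $previous) {
            seek($fh, $pos, SEEK_SET) or die "seek failed: $!";
            return $line;
        }
        $previous = $key;
    }
    return;
}

open my $in, '<', $path or die "Cannot open $path: $!";
my $found  = stop_at_new_key($in);   # first line of the "b" group
my $reread = <$in>;                  # the same line, read again after seek
close $in;
```

Because the offset handed to seek comes from tell, this sketch should not care whether the file uses CR LF or LF endings, or whether the handle is in text or binary mode.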

      If your file is in text mode, a newline is two bytes on windows, but one byte on linux; it's one character in either case.

      No. "\r\n" is two characters. And length("\r\n") is indeed 2, even in Perl, even on Windows. And bytes::length("\n") is indeed 1, even in Perl, even on Windows.
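A quick check of these claims in Perl. The last two lines go slightly beyond the thread and use a character above U+00FF (an illustrative choice) to show where length (characters) and bytes::length (bytes of Perl's internal representation) actually diverge.

```perl
use strict;
use warnings;
use bytes ();    # makes bytes::length available without enabling bytes semantics

printf "length(\"\\r\\n\")           = %d\n", length("\r\n");             # 2
printf "bytes::length(\"\\n\")       = %d\n", bytes::length("\n");        # 1
printf "length(\"\\x{263A}\")        = %d\n", length("\x{263A}");         # 1 character
printf "bytes::length(\"\\x{263A}\") = %d\n", bytes::length("\x{263A}");  # 3 bytes (UTF-8)
```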

      What binmode (usually) does is prevent read operations on Windows from converting the two character string "\r\n" (from the file) into the single character "\n" when storing the results in your Perl string.
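The translation described here is done by Perl's :crlf PerlIO layer, which is the default on Windows; binmode (equivalently, the :raw layer) turns it off. As a sketch that should behave the same on any platform, the layer is requested explicitly below; the temporary file, its contents and the helper first_line_via are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative file: one line ending in CR LF, written as raw bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "abc\r\n";
close $tmp or die "close failed: $!";

# Hypothetical helper: read the first line back through the given layer.
sub first_line_via {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "Cannot open $path: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $via_crlf = first_line_via(':crlf');   # "abc\n":   CR LF folded into "\n"
my $via_raw  = first_line_via(':raw');    # "abc\r\n": bytes left untouched
printf "crlf: %d chars, raw: %d chars\n", length($via_crlf), length($via_raw);
```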

      Unix actually does very similar things; it is just that these changes are done at the device boundary rather than at the file-system boundary. Sending "\n" to a device like a TTY in Unix usually causes the two characters "\r\n" to be sent to the device instead (just like writing the single character "\n" to an ordinary file in Windows).

      But you are correct in suggesting that mixing seek and length can run into such problems. You should indeed use binmode and bytes::length() when figuring out where to seek. But the reasons for that have nothing to do with "\n" being a single character of two bytes on any platform that I am aware of.
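Using binmode plus bytes::length might look like this minimal sketch (the file contents are illustrative). With binmode in effect nothing is translated, so the byte count of the line just read is exactly how far seek has to move back. It relies on an ordinary Unix or Windows file system, where arbitrary byte offsets are meaningful.

```perl
use strict;
use warnings;
use bytes ();
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Illustrative file with Windows line endings.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "first\r\nsecond\r\n";
close $tmp or die "close failed: $!";

open my $in, '<', $path or die "Cannot open $path: $!";
binmode $in;                            # no CR LF translation on any platform
my $line   = <$in>;                     # "first\r\n": 7 characters, 7 bytes
my $nbytes = bytes::length($line);      # byte count, which is what seek() uses
seek($in, -$nbytes, SEEK_CUR) or die "seek failed: $!";
my $again  = <$in>;                     # reads "first\r\n" again
close $in;
```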

      Of course, doing that isn't sufficient to make such a use of seek actually fully portable. The only fully portable way to use seek with a non-zero offset is to feed it a value you previously got from tell. For example, using seek with non-zero offsets on VMS can be quite surprising, depending on the type of file involved (VMS's file system layer is called RMS for Record Management System and most files are not streams of bytes but streams of records where byte offsets are hard to interpret). But you can get away with seeking by arbitrary byte offsets when dealing with ordinary Unix and Windows file systems.

      - tye

        "\n" is a newline: one character that's represented by two bytes (CR LF) in Windows text files, one byte (LF) in Unix text files, and one byte (CR) in Mac OS 9 text files. That's all I was saying.

        (Also, Unix input devices usually convert CR to LF.)

        "\r\n" isn't really a thing. It's either CR LF or LF CR. If you really need a CR LF (as in some network protocols), you should probably write it as "\x0d\x0a".

        But we're kind of splitting hairs here.

        And bytes::length("\n") is of course the "byte representation" of newline, not the "text file representation."
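To see the on-disk difference discussed in this sub-thread, the sketch below writes the same one-line Perl string through the :crlf and :raw layers and reads the resulting bytes back unchanged. The helper bytes_written is hypothetical; on an ASCII platform the CR LF produced by :crlf is exactly "\x0d\x0a".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a Perl string through the given output layer,
# then return the bytes that actually landed on disk (read back with :raw).
sub bytes_written {
    my ($layer, $text) = @_;
    my (undef, $path) = tempfile(UNLINK => 1);
    open my $out, ">$layer", $path or die "Cannot open $path: $!";
    print {$out} $text;
    close $out or die "close failed: $!";
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                    # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $text = "line\n";                          # 5 characters in Perl
my $crlf = bytes_written(':crlf', $text);     # "line\x0d\x0a", 6 bytes
my $raw  = bytes_written(':raw',  $text);     # "line\x0a",     5 bytes
printf "crlf: %d bytes, raw: %d bytes\n", length($crlf), length($raw);
```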
Re: different length of a line from linux and windows textfile?
by Marshall (Canon) on Mar 18, 2014 at 10:15 UTC
    Using seek() is in general not a good idea and particularly not for text input.

    seek() is byte-oriented, not character-oriented.
    There are many complications that can arise between bytes and characters.
    The translations between "\n", "\r" and "\r\n" on different operating systems are well known, but "messy".

    open(FILE, "<", "textfile.txt") or die "Cannot open textfile: $!";
    while (<FILE>)
    {
        print;    # just to get started...
        # something..
    }
    # using a seek() is almost never "right".
    Show some example input and what you want to do.
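One way to follow this advice for the task in the question, sketched under an assumption the original post does not state (that the goal is to process runs of lines sharing the same first tab-separated field): carry the state forward instead of seeking back. group_by_key is a hypothetical helper, and the in-memory file is illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: group consecutive lines that share the same first
# tab-separated field. The line that starts a new group is handled in the
# same loop iteration, so there is never a need to seek() back to it.
sub group_by_key {
    my ($fh) = @_;
    my (@groups, $previous_key);
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;              # strips "\n", "\r\n" and trailing blanks
        next unless length $line;       # skip blank lines
        my ($key) = split /\t/, $line;
        if (!defined $previous_key || $key ne $previous_key) {
            push @groups, [];           # first field changed: start a new group
        }
        push @{ $groups[-1] }, $line;
        $previous_key = $key;
    }
    return @groups;
}

# Usage with an in-memory file mixing Windows and Unix line endings.
my $data = "a\t1\r\na\t2\nb\t3\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @groups = group_by_key($fh);
print scalar(@groups), " groups\n";    # 2 groups
```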
Re: different length of a line from linux and windows textfile?
by sundialsvc4 (Abbot) on Mar 17, 2014 at 17:33 UTC
    Hmmm.... is this a problem? Heretofore, we have always innocently assumed that a Perl build "for Windows/Linux" will only encounter text files that conform to the conventions of the same (build ...) system. But is this, in fact, a serious limitation of this tool that all of us use every day? (And, if this be so, what should we do about it :-O that won't break all of the software that is out there right now?)

Node Type: perlquestion [id://1078608]
Front-paged by Corion