
chop vs chomp

by Moron (Curate)
on May 10, 2007 at 14:10 UTC (#614637=perlmeditation)

I am getting a bit fed up with people telling me to use chomp instead of chop. I know the manual says it's "safer", but I beg to differ. (Update: for portability to, e.g., Windows, I've added something special at the end. Oops, no: chop is in fact portable.)

"So what is wrong with chomp()?" I can already hear many of you say.

If we try to spell out a functional description of chomp() in the common-or-garden context, the problem might begin to become clear:

"Read in each line of a file (a line being delimited by "\n"); test whether there really is a "\n" there; if so, chop it off, else carry on regardless."

I can't imagine anyone asking for this. And if an analyst knew you were interpreting their spec that way, you'd be apt to be corrected. There are two possibilities: a spec either contains provisions for handling data quality or it doesn't, and 99 times out of 100 it doesn't, because systems analysts prefer (quite rightly) to focus on positive rather than negative functionality.

The reason chomp() is a no-no is that negative functionality tends to be introduced to handle known and/or anticipated functional problems. In one mission-critical system, incomplete files were detected by checking whether a trailer record was present at the end. If a file passes that check, every record between the header and the trailer must be terminated with "\n". A trailer is a proper functional remedy, because a file might accidentally break just after a "\n", so you can't use the presence of "\n" alone to test for file completeness.
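A minimal sketch of the trailer-record check described above. The record layouts are assumptions, not taken from the system mentioned: the header is assumed to start with `HDR` and the trailer with `TRL`. What it illustrates is that completeness is judged from the content of the first and last records, not from whether the file happens to end in "\n".

```perl
use strict;
use warnings;

# ASSUMPTION: hypothetical layouts; header records start with "HDR",
# trailer records start with "TRL".
sub has_header_and_trailer {
    my ($fh) = @_;
    my ( $first, $last );
    while ( my $rec = <$fh> ) {
        $first = $rec unless defined $first;
        $last  = $rec;
    }
    return 0 unless defined $first;    # empty file
    return ( $first =~ /^HDR\b/ && $last =~ /^TRL\b/ ) ? 1 : 0;
}

# A complete file, and a copy cut off just after a "\n", which a
# newline-based check would wrongly accept.
open my $complete,  '<', \"HDR\nrow 1\nrow 2\nTRL\n" or die $!;
open my $truncated, '<', \"HDR\nrow 1\n"             or die $!;
printf "complete: %d, truncated: %d\n",
    has_header_and_trailer($complete), has_header_and_trailer($truncated);
```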

Having established completeness of a file, the only way a record can lose its "\n" unexpectedly is by means of a programming error, such as making a conditional substitution that happens implicitly to remove it but only where the pattern matches. Perhaps chomp() is popular with sloppy people who do that kind of thing, patch it up with chomp() first, and ask questions never. Such patching up is grossly negligent because it confuses the testing process needed to find mistakes: using chomp() can make it harder to detect the real fault, which chomp() does nothing to fix.
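To make that failure mode concrete, a minimal sketch. The clean-up step `strip_trailing_comma` and the data are hypothetical; the substitution drops the "\n" only because `\s` also matches it, and only on lines where a trailing comma is present. chomp quietly absorbs the damage, while chop visibly eats a real character.

```perl
use strict;
use warnings;

# HYPOTHETICAL buggy clean-up step: strip a trailing comma. Because \s also
# matches "\n", the newline is removed too, but only on lines that match.
sub strip_trailing_comma {
    my ($line) = @_;
    $line =~ s/,\s*$//;
    return $line;
}

for my $raw ( "alpha,\n", "beta\n" ) {
    my $via_chomp = strip_trailing_comma($raw);
    chomp $via_chomp;    # "alpha" and "beta": the lost "\n" goes unnoticed

    my $via_chop = strip_trailing_comma($raw);
    chop $via_chop;      # "alph" on the damaged line: the bug becomes visible

    print "chomp: $via_chomp   chop: $via_chop\n";
}
```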

The advantage of using chop(), which may indeed chop a character off the end of a \w+, is that the damage will show up in testing as a programming error that needs investigation, whereas chomp() tends to hide the error until the system or acceptance testing phase. I'd hate to mistakenly hire people who allowed that to happen out of a bad programming habit!

If we modify $/, e.g. to ';' to parse a Perl program, then of course the last line won't generally terminate with $/ but with "\n". In that case, it is clearly wrong to chomp regardless because the presence of $/ is a syntax requirement. In such cases, chomp() is no good anyway because it returns the length rather than the content of what is chopped. Instead we need to do something like:

( chop() eq ';' ) or SomeErrorHandling();
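A runnable sketch of the check above, assuming the input is read with `$/ = ';'` from an in-memory string (hypothetical data) and with `SomeErrorHandling()` replaced by collecting the numbers of the offending records. It relies on chop returning the character it removed, whereas chomp returns only a count of removed characters.

```perl
use strict;
use warnings;

# Return the numbers of the ';'-separated records that do not end in ';'.
sub records_missing_terminator {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = ';';
    my @bad;
    while ( my $record = <$fh> ) {
        # chop returns the character it removed, so it can be compared
        # directly; chomp would only return a count (1 or 0).
        ( chop($record) eq ';' ) or push @bad, $.;
    }
    close $fh;
    return @bad;
}

# HYPOTHETICAL input: two statements and a trailing newline.
my @bad = records_missing_terminator("my \$x = 1;\nprint \$x;\n");
print "Records without a trailing ';': @bad\n";
```

With this input the third record is just the "\n" after the final ';', so it is the one reported.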
The greatest benefit of chomp(), therefore, is that it makes an easy test for sloppy programming: ask a candidate to write a simple program that reads in a file which, you tell them in the spec, always has "\n" at the end of every line including the last. If they use chomp(), you already know enough about how they work, and about the quality of unit testing they are capable of applying to their own code before it gets inflicted on others....

Update: unless, of course, you are writing code that is also supposed to be portable, including to Windows. The exception is setting $/ to some multi-character value like EOL; it ISN'T multi-character on Windows (test it!). Only in such very isolated cases do you need a special version, e.g.:

sub Chonk {                       # $/-aware chop
    my $sref = shift() || $$_;    # parm by ref; default $_
    $$sref = substr( $$sref, 0, length( $$sref ) - length( $/ ) );
    return substr( $$sref, -length( $/ ) );
}
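For comparison, a corrected sketch of the Chonk() idea. It is not the code posted above, and the changes are assumptions about the intent: the default argument is `\$_` (a reference to `$_`) rather than `$$_`, which dereferences `$_` itself; the removed characters are captured while they are cut out (the version above returns the tail of the already-shortened string); and 4-argument `substr` does the cutting in place. Like chop, it removes `length($/)` characters whether or not they are the separator, and it still handles only one scalar.

```perl
use strict;
use warnings;

# HYPOTHETICAL corrected "$/-aware chop": unconditionally removes the last
# length($/) characters (as chop does with one) and returns them, so the
# caller can compare them with $/. Takes a scalar reference; defaults to $_.
sub Chonk {
    my $sref = @_ ? shift() : \$_;
    my $len  = defined $/ ? length($/) : 0;            # undef $/ (slurp mode): remove nothing
    $len = length($$sref) if $len > length($$sref);    # never run past the start of the string
    # 4-argument substr returns the removed part and deletes it in place.
    return substr( $$sref, length($$sref) - $len, $len, '' );
}

my $line = "total=42;";
{
    local $/ = ';';
    ( Chonk( \$line ) eq $/ ) or die "record not terminated by \$/\n";
}
print "$line\n";
```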
Hmm, chop @array returns only what was chopped off the last element, even in list context, but I haven't decided what to do about that in this Chonk(), which only came about because of this topic but might survive, who knows. Suggestions? I suppose I also expected someone to say: "chomp() or die; should take care of your woes." It would at least reduce some of my objections about lifecycle issues.
____________________________________________

^M Free your mind!

Re: chop vs chomp
by merlyn (Sage) on May 10, 2007 at 19:07 UTC
    Having established completeness of a file, the only way a record can lose its "\n" unexpectedly is by means of a programming error, such as making a conditional substitution that happens implicitly to remove it but only where the pattern matches.
    You are apparently unfamiliar with the idea that Unix files can have a missing newline at the end. As a programmer, I must deal with files like that.

    Are you suggesting that instead of writing a simple chomp in each program that reads possibly-newline-terminated strings, I explicitly put the code in there for that? I hope not. If Perl were that way, I'd be quickly writing "randalchomp" that acts like chomp does now, and figuring out a way to include it in every program.

    Chomp is there because it perfectly fills a need. That's why we use it.

      I explained before that point already in the OM that the proper approach is something else e.g. to put a trailer on the end. Some established services create a separate .done file to signal completeness.
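A minimal sketch of the `.done` convention, assuming the marker is named after the data file plus `.done` (conventions differ between services) and that the sender creates it only after the data file is fully written. The receiver looks only for the marker, never at how the data file ends.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# ASSUMPTION: the marker is the data file's name plus ".done"; real
# services use a variety of naming conventions.
sub ready_to_process {
    my ($data_path) = @_;
    # The sender creates the marker only after the data file is complete,
    # so the data file's own last byte is never used as the signal.
    return ( -e $data_path && -e "$data_path.done" ) ? 1 : 0;
}

# Demonstration in a throwaway directory (hypothetical file name).
my $dir  = tempdir( CLEANUP => 1 );
my $data = "$dir/feed.csv";

open my $out, '>', $data or die "Cannot write $data: $!";
print {$out} "id,qty\n1,5\n";
close $out or die "close: $!";
my $before = ready_to_process($data);    # 0: data present, no marker yet

open my $done, '>', "$data.done" or die "Cannot write marker: $!";
close $done or die "close: $!";
my $after = ready_to_process($data);     # 1: marker present
```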

      Anyone who tried to rely on carriage return (line completion) as an indication of file completion for commercial data provision would simply lose the business: a file can be incomplete but happen to end just after a "\n", giving a false impression of completeness, and that stands to happen too regularly, in about 1 in n cases of incomplete files, where n is the average length of a line.
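The "1 in n" figure can be checked by brute force. This sketch assumes every line has the same length (including its "\n") and that a truncation is equally likely after any byte; it counts how many proper prefixes of such a file still end in "\n" and would therefore pass a newline-only check.

```perl
use strict;
use warnings;

# ASSUMPTIONS: all lines are $line_len bytes long (counting the "\n"), and a
# truncation is equally likely to happen after any byte except the last one.
sub fraction_ending_in_newline {
    my ( $line_len, $lines ) = @_;
    my $line = ( 'x' x ( $line_len - 1 ) ) . "\n";
    my $file = $line x $lines;

    my $cuts   = length($file) - 1;    # prefix lengths 1 .. length-1
    my $fooled = 0;                    # prefixes that still end in "\n"
    for my $cut ( 1 .. $cuts ) {
        $fooled++ if substr( $file, $cut - 1, 1 ) eq "\n";
    }
    return $fooled / $cuts;
}

printf "%.3f\n", fraction_ending_in_newline( 10, 1000 );    # prints 0.100
```

With 10-byte lines, 999 of the 9999 possible truncations end in "\n", i.e. about 1 in 10.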

      Such problems should be allowed to surface and be fixed another way entirely.

      "instead of a simple chomp". There is no instead of - chomp's functionality is professionally unacceptable from the start. I am saying: let it break, and fix the cause of the data quality issue instead of hiding it with chomp so that you never find it in the first place; otherwise you are second-guessing the testing and correction process.

      chomp() or die; would resolve some of the issues, but it's probably better to use chop as a habit in case you forget: the impact of chop on an unterminated line might be harder to find than a die(), but at least you are letting it have an impact that can be picked up during testing.


      ^M Free your mind!

        I explained before that point already in the OM that the proper approach is something else e.g. to put a trailer on the end.

        You have three possibilities then.

        • Use your time machine to go back some thirty-five years to the start of Unix and force all text files to end with a trailing newline.
        • Fix all of the files in the past thirty-five plus years of Unix to include the trailing newline, plus all of the utilities that manipulate those files, to match your idea of reality.
        • Admit that your idea doesn't match reality.

        Maybe all progress depends on the unreasonable man, but you can't turn it around to say that all unreasonable ideas imply progress.

        Now if you do have a time machine, I apologize profusely in the hope that you'll let me borrow it briefly. I have a really good business plan that depends on having a working time machine.

        (I'm also still waiting for you to fix the bugs in Chump or Chimp or Chorq or whatever your "replacement" is.)

        Anyone who tried to rely on carriage return (line completion) as an indication of file completion for commercial data provision would simply lose the business

        Could you please point out where in the perl documentation it is stated that chop or chomp has anything to do with file completeness validation?

        There is no instead of - chomp's functionality is professionally unacceptable from the start.

        Sure - if you use it for the wrong purpose, and if you expect it to do things it doesn't. If a file has been checked for completeness and its lines were found suitable for chomp, then chomp is the right tool to use. It is not the other way round.

        I'm looking forward to your rant on autovivification as another "professionally unacceptable functionality" in perl...


        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: chop vs chomp
by philcrow (Priest) on May 10, 2007 at 14:50 UTC
    As Fletch alludes to, there's a more important safety mechanism in chomp than silently skipping the action when there is nothing to chomp: it also gives operating system independence, since it eats the line ending for your OS. If you ask an applicant to write a script, they may choose chomp merely because they don't know which operating systems you'll eventually ask them to deploy on.

    Sorry, but porting to a new OS should not require chop hacking on code when a chomp in the first place would have handled the problem.
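A small sketch of what "the line ending for your OS" means in practice: the PerlIO `:crlf` layer, which is the default on Windows, translates "\r\n" to "\n" on input, so chomp with the default `$/` then removes a single character. Here the layer is requested explicitly on a temporary file so the effect can be seen on any platform; the file contents and the `read_lines` helper are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a file with Windows-style CRLF line endings, bypassing any translation.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} "first\r\nsecond\r\n";
close $out or die "close: $!";

# Read it back through the given layer and chomp each line.
sub read_lines {
    my ( $file, $layer ) = @_;
    open my $in, "<$layer", $file or die "Cannot open $file: $!";
    my @lines;
    while ( my $line = <$in> ) {
        chomp $line;
        push @lines, $line;
    }
    close $in;
    return @lines;
}

my @crlf = read_lines( $path, ':crlf' );    # ("first", "second")
my @raw  = read_lines( $path, ':raw' );     # ("first\r", "second\r")
```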

    The Gantry Web Framework Book is now available.

      It's more than just OS line-ending issues: sometimes a logical record is more than one line. I've dealt with files in the past where the records consisted of several newline-terminated lines with a four-character record separator along the lines of "EOR\n". chomp can handle removing this transparently (local $/ = "EOR\n"; while( <IN> ) { chomp; _handle_rec( $_ ) }); chop can't.
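Fletch's one-liner, expanded into a self-contained sketch. The data and the `read_records` helper are hypothetical (standing in for `_handle_rec`); the point is that chomp removes the whole multi-character `$/`, here "EOR\n", in one call.

```perl
use strict;
use warnings;

# HYPOTHETICAL data: multi-line records, each terminated by "EOR\n".
my $data = "name: a\nqty: 1\nEOR\nname: b\nqty: 2\nEOR\n";

# Stand-in for the _handle_rec loop: collect the records instead.
sub read_records {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "EOR\n";    # a four-character record separator
    my @records;
    while ( my $rec = <$in> ) {
        chomp $rec;        # removes all of "EOR\n", not just one character
        push @records, $rec;
    }
    close $in;
    return @records;
}

print "[$_]\n" for read_records($data);
```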

      And that's the important distinction: chomp deals with removing the current logical record ending, chop deals with removing a single trailing character.

        OK, but just because chop isn't right for that, it doesn't follow that chomp is; see the proposed Chonk() replacement in the updated OM.

        ^M Free your mind!

      No, "\n" gets chopped or chomped whatever the OS thinks "\n" is.

      ^M Free your mind!

Re: chop vs chomp
by chromatic (Archbishop) on May 10, 2007 at 18:05 UTC

    Chonk is ridiculous. I count at least three misfeatures in the first line alone. Meanwhile, chomp has worked for years without those issues.

    You're spending a lot of time justifying a poor decision.

Re: chop vs chomp
by eric256 (Parson) on May 10, 2007 at 19:20 UTC

    It would seem to me (and I don't know the history) that this is probably the best reason I've seen why chomp returns what it does instead of the modified string: this way you can test its return value and decide whether it worked as expected or whether it is an error. Assuming that we all have your same need and care whether there was a line ending is ridiculous. By your logic, if I wanted only to remove the line ending, I would have to first check whether it had an ending and only then proceed to remove it. It seems like you are putting the burden on the normal case instead of on your special case. I've worked with many data files and formats, and not having to care about the line endings is a blessing, not a curse. In addition, if you are counting on the presence of a line ending to signify a valid record instead of checking the actual structure of the record, I would think you are going to have more trouble, not less, in the future.
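A sketch of testing chomp's return value, which is the number of characters it removed (0 when the line had no `$/` at the end). The helper and data are hypothetical; `chomp($line) or die ...` is the stricter variant discussed in this thread.

```perl
use strict;
use warnings;

# Return the line numbers of lines that did not end with $/ ("\n" by default).
sub unterminated_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @missing;
    while ( my $line = <$fh> ) {
        chomp($line) or push @missing, $.;    # 0 removed => no "\n" on this line
    }
    close $fh;
    return @missing;
}

my @m = unterminated_lines("one\ntwo\nthree");
print "Unterminated line(s): @m\n";
```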

    Eric Hodges
      Yes, I was expecting someone to realise this earlier. chomp() or die would resolve a lot of the problem. chomp() on its own is unlikely to be a good habit.

      ^M Free your mind!


        chomp() on its own is the normal mode of use. "or die" is only useful if you *must* die when the last line does not end with a newline, *and* that absence is material.

        As a (nominally) professional programmer, I take exception to your repeated insistence that such use is Not The Way Of The Professional Programmer. It does not help make your case, and it leads one to wonder what you mean when you say that. (cue Princess Bride quote).

Re: chop vs chomp
by Fletch (Chancellor) on May 10, 2007 at 14:42 UTC

    Because I'm sure no one ever has to deal with more than single character long record separator. chop handles those just fine right?

      Okay, I'll make a recommendation for that in an update to the original meditation, but it won't be to use chomp()!

      ^M Free your mind!

        The example you've added makes very little sense; all you've done is reimplemented Perl's built-in "remove logical line ending" operator (chomp) in a less flexible way (yours won't work on a LIST of lines, nor will it work nicely on lvalues) with a nastier calling convention (leaving aside the matter of what you think $$_ would do).

        At this point I think I'm just going to have to mutter "It's the eponymy, stupid" to myself and just leave the thread.
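On the LIST and lvalue points, a short sketch of what the built-ins actually return (as documented in perlfunc): chop on a list returns only the last character removed, while chomp on a list returns the total number of characters removed, and both accept lvalues such as `my @lines = ...`. The data is hypothetical.

```perl
use strict;
use warnings;

# chop on a list: every element loses its last character, whatever it is,
# but only the character removed from the last element is returned.
my @chopped = ( "ab;", "cd;", "ef\n" );
my $last    = chop @chopped;     # $last is "\n"; @chopped is ("ab", "cd", "ef")

# chomp on a list: each element loses a trailing $/ if it has one, and the
# total number of characters removed is returned.
my @chomped = ( "ab\n", "cd", "ef\n" );
my $count   = chomp @chomped;    # $count is 2; @chomped is ("ab", "cd", "ef")

# chomp also works directly on an lvalue such as an assignment.
chomp( my @lines = ( "x\n", "y\n" ) );    # @lines is ("x", "y")
```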

Re: chop vs chomp
by graff (Chancellor) on May 11, 2007 at 07:52 UTC
    If we modify $/, e.g. to ';' to parse a Perl program, then of course the last line won't generally terminate with $/ but with "\n". In that case, it is clearly wrong to chomp regardless because the presence of $/ is a syntax requirement. In such cases, chomp() is no good anyway because it returns the length rather than the content of what is chopped.

    Either I'm not understanding what you are trying to say in that part, or else you don't understand what $/ and chomp are really doing.

    Setting aside the issue of how foolish it would be to "parse" a perl script by setting $/ = ';' , let's look at what actually happens, given a perl script like this:

    #!/usr/bin/perl
    use strict;
    for my $letter ( qw/h e l l o , w o r l d/ ) {
        $_ .= $letter;
    }
    print;
    while (1) {
        last if ( /^o/ );
        $_ =~ s/.//
    }
    If some other perl script sets $/=";" and reads this one as data in a  while(<>) loop, it will go through five iterations; the first four records will have ";" as the very last character, and chomp, if applied, will remove it. The fifth record will end in "}", possibly followed by any sort and amount of whitespace, and chomp will have no effect on that.
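The five-record behaviour described above can be reproduced with a sketch that holds the script in a heredoc and reads it from an in-memory filehandle with `$/ = ';'`. The helper name is hypothetical, and the script's line layout is reconstructed from the post.

```perl
use strict;
use warnings;

# The script from the post, held in a string (no interpolation).
my $script = <<'END_SCRIPT';
#!/usr/bin/perl
use strict;
for my $letter ( qw/h e l l o , w o r l d/ ) {
    $_ .= $letter;
}
print;
while (1) {
    last if ( /^o/ );
    $_ =~ s/.//
}
END_SCRIPT

# Read ';'-terminated records; count how many separators chomp removed.
sub split_on_semicolons {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = ';';
    my @records;
    my $removed = 0;
    while ( my $rec = <$fh> ) {
        $removed += chomp $rec;    # removes ';' only when it is really there
        push @records, $rec;
    }
    close $fh;
    return ( \@records, $removed );
}

my ( $records, $removed ) = split_on_semicolons($script);
printf "%d records, %d separators removed\n", scalar @$records, $removed;
```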

    Do you think that's a problem? I don't. That is the documented behavior. To the best of my knowledge, and based on all my own experience, that is the most desirable behavior, when the objective is to simply remove the input record separator when one is present.

    There are other, less common situations where the old "chop" (remove last character, no matter what it is) is still useful, and I'm glad it's still available for that purpose.

    If you are complaining about the fact that many novice perl users don't understand $/ and chomp, I can understand your frustration, and I agree more people should know more about it. But if you are complaining that you're having trouble using chomp, I have to wonder why. It's not that complicated.

    When there are issues like a single script meant to be used on any OS and having to handle a mixture of CRLF and LF line terminations (in different files or even in a single file), then I agree that chomp is maybe not the best approach, because $/ does not allow that sort of flexibility. For that I would do something like  s/[\r\n]+$// instead.
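A sketch of the `s/[\r\n]+$//` approach wrapped in a hypothetical helper. It strips "\n", "\r\n" and a lone trailing "\r" alike, regardless of `$/`; note that it only strips endings, so a file that uses bare "\r" as its line separator would still need to be split some other way.

```perl
use strict;
use warnings;

# Strip any run of trailing CR/LF characters, independent of $/.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;
    return $line;
}

print join( '|', map { strip_eol($_) } "unix\n", "dos\r\n", "none" ), "\n";
```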

    (In fact, I've done that in a number of scripts, and it has served me quite well.)

Re: chop vs chomp
by hardburn (Abbot) on May 10, 2007 at 20:03 UTC

    Having the insight to see that a common piece of functionality is insufficient for the task at hand is good. It's unnecessary, though, to extend that to the general case of, say, parsing a random CSV file.

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      Fair point. But a CSV file needs to be parsed, albeit perhaps rudimentarily, so it implies a lexical analyser and your problem should be addressed by tokenising EOF and EOL. I've seen many people try to parse without a lexical analyser. I wonder how many got stuck precisely because they used chomp() - as an experienced parser-writer I can see a whole minefield of traps waiting for people who depart from the tried and tested ways of old.

      ^M Free your mind!


        If you are reading a CSV file, the general case would allow newlines to be embedded in fields. chomp() is silly. Rolling your own is silly unless you need to do it as an exercise. Use the Module, Luke! Raising the CSV specter is a red herring in the context of discussing the appropriate use of chomp.

Re: chop vs chomp
by shmem (Chancellor) on May 11, 2007 at 04:54 UTC
    *shrug* nuts vs. bolts?
    So what is wrong with chomp()?

    Nothing is wrong with chomp, and there's nothing wrong with chop either. Both work as documented, and both have their uses.


    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      OK, then read it as "the habitual use of chomp() without an or die". Update: and for chop being used for the same purpose: ( chop() eq $/ ) or die;

      ^M Free your mind!


        How are you going to distinguish the normal use of chomp() from "habitual" use? Under most circumstances, I'm using chomp() to remove the EOL from lines read via the diamond operator. In normal use, it Just Doesn't Matter if the last line has an EOL or not. Using chomp() means I don't have to care, nor do I have to jump through extra hoops to not care.

        I don't understand why you are so adamant about this.

Re: chop vs chomp
by samizdat (Vicar) on May 10, 2007 at 16:25 UTC
    I vote that we go back to shell scripts with IFS changes on the fly... ;^)

    Don Wilde
    "There's more than one level to any answer."
Re: chop vs chomp
by akho (Hermit) on May 10, 2007 at 14:25 UTC
    Or they may be guided by an idiom they've seen a million times.

    Don't think that's a good test.

      Any good professional programmer will have become aware of the impact of chomp() on development lifecycles long ago, notwithstanding its popularity.

      Update: Although I have to agree after all, because they might be doing it out of e.g. Windows experience.


      ^M Free your mind!

Re: chop vs chomp
by DrHyde (Prior) on May 11, 2007 at 09:00 UTC
    In all your drivel about records, you seem to have forgotten that perl is really really good at handling TEXT. Text doesn't have to finish with \n so it's perfectly acceptable to say "read all the lines in this file and if they have a trailing \n strip it off". chomp($line) is a rather handy way of expressing that. I use it most days, and always without error.
      Syntax doesn't matter in text. That doesn't justify developing irresponsible behaviour that would impact quality if you were hired to do something where it did matter. If you were writing Perl in Perl, for example, you might want to have $/ = ';'. Undisciplined use of chomp() is the fastest way to prove yourself unsuited to this task - sure you won't see the error - doesn't mean you did it right though!

      ^M Free your mind!

        I'm sorry to say this, but... if you were writing Perl in Perl, setting $/=";" is the fastest way to prove yourself unsuited to this task. not chomp.

        He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.
        Chady |
        Are you a Linux user in Lebanon? join the Lebanese GNU/Linux User Group.
Re: chop vs chomp
by monarch (Priest) on May 11, 2007 at 00:39 UTC
    I've never used chomp() or chop().

    I usually put portability as one of my top priorities when I program. Thus, I fear mishandling of different end-of-line sequences. So I almost always use the following skeleton code:

    while ( defined( my $line = <HANDLE> ) ) {
        $line =~ s/\s*[\n\r]*\Z//s;
        # ... process $line ...
    }
Re: chop vs chomp
by Moron (Curate) on May 11, 2007 at 12:16 UTC
    It's a pity so many are willing to sacrifice professionalism (whether they have the experience or not to realise it) for the sake of clinging to a habit. I had hoped to be hiring from this site one day soon, but judging by the reactions it looks like I will have to be very careful how I scrutinise candidates.

    ^M Free your mind!

      Yup, because no one else here knows what they're doing at all.

      Nobody at all. Not a one. merlyn, chromatic, the lot of them: complete amateurs.

      But you know The One True Way and all who oppose this righteous knowledge must be purged in the Cleansing Fires of Chop.


        Well that is funny I admit. I suppose if I did want to pick on something worthy of religious fanaticism it would be individual versus groupthink - the fact that you even thought of the idea (albeit sarcastically implied) of validity being dependent on the reputation of those supporting it rather than being able to take an objective view should be something for you to take as a warning. (Hehe I'd send round a masked robed troupe of antigroupthink police - trouble is they all wandered off in different directions ;))

        ^M Free your mind!

      Well, personally I don't do much applying for jobs these days, but just in case I do, maybe you could put up a marker here if you're involved in any jobpostings? That would save me the bother of even considering them, because I sure as eff would not want to work for someone with your asinine concept of professionalism.

      All dogma is stupid.

Node Type: perlmeditation [id://614637]
Approved by ChemBoy