PerlMonks  

Redefining chomp()

by Anonymous Monk
on Mar 24, 2004 at 13:37 UTC ( #339389=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a large application that makes heavy use of chomp(), and I am currently moving it from Windows to Linux.

The data will be a mixture of files created on Windows and files created on Linux. What I need to do is modify chomp() globally so that it removes all trailing \r's and \n's, if they exist.

I'm trying to avoid having to find every occurrence of chomp() and either replace it with a custom function or add some code at each occurrence.

Replies are listed 'Best First'.
Re: Redefining chomp()
by Limbic~Region (Chancellor) on Mar 24, 2004 at 14:09 UTC
    Anonymous Monk,
    As has already been pointed out, chomp probably does not work the way you think.
    • It removes trailing $/ from the end of the line
    • It returns the number of chars removed
    • It can work on a list
    • In the case of a list, it will return total number removed
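    The points above can be checked with a short script. This is only an illustrative sketch: the variable names and sample strings are made up for the example, and it assumes $/ still has its default value of "\n" when the script starts.

```perl
use strict;
use warnings;

# Default $/ is "\n": one trailing newline is removed.
my $line    = "foo\n";
my $removed = chomp $line;      # $line is "foo", $removed is 1

# On a list, the return value is the total across all elements.
my @lines = ("a\n", "b\n", "c");
my $total = chomp @lines;       # @lines is ("a", "b", "c"), $total is 2

# With $/ set to "\r\n", the whole CRLF pair is the terminator.
my ($dos, $dos_removed) = ("bar\r\n");
{
    local $/ = "\r\n";
    $dos_removed = chomp $dos;  # $dos is "bar", $dos_removed is 2
}
```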
    I do not believe just modifying $/ will work for you. For one, it will likely mess up reading in new files. Secondly, I am under the impression you want it to auto-detect if $/ should be "\n" or "\r\n" depending on what it is working on.
    Here is a start:
    BEGIN {
        *CORE::GLOBAL::chomp = sub {
            my $count;
            for ( @_ ) {
                $count += $_ =~ s/[\r\n]$//g;
            }
            return $count;
        }
    }
    This breaks in a lot of ways.
    • It will remove all trailing newlines instead of just one in the case of my $string = "foo\n\n\n"
    • It doesn't see where it is supposed to stop working (without helping parens) in the case of print chomp $foo, "\n";
    • Probably a lot of others I didn't find
    Once fixed appropriately, this could be stuck in a module and then you could just use Chomp;

    Cheers - L~R

      As there is no quantifier on your character class, I have trouble understanding how this will remove multiple newlines as you say. Am I missing something?

      Also, if the Windows newline is in fact "\r\n", this will not work, again because there is no quantifier.

      It seems to me if you instead do

      $count += $_ =~ s/\r?\n$//g;

      ... then you alleviate the problem of killing all trailing newlines and, assuming "\r\n" is what all Windows newlines are, you match both Windows and Unix newlines.
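      For comparison, here is a minimal sketch of the same idea as a standalone function. The name chomp_any is hypothetical, and two choices are my own assumptions rather than anything from the posts above: it anchors with \z (the true end of the string) instead of $, and it drops /g so that at most one line ending is removed per string. It returns the number of characters removed, as chomp does.

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing "\r\n" or "\n" from each argument.
sub chomp_any {
    my $count = 0;
    for (@_) {                  # $_ aliases the caller's variables
        next unless defined;
        # \z matches only at the very end of the string
        if (s/(\r?\n)\z//) {
            $count += length $1;
        }
    }
    return $count;
}

my ($win, $unix, $multi) = ("foo\r\n", "bar\n", "baz\n\n\n");
my $removed = chomp_any($win, $unix, $multi);
# $win is "foo", $unix is "bar", $multi is "baz\n\n", $removed is 4
```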

        ryantate,
        My regex fu is nonexistent, as you can see. That does not matter much, since I said it would need to be fixed. I was on my way to a meeting, so I didn't get to spend a lot of time on it. After thinking about it some more, I think the following would work a lot better.
        package Chomp;
        use Scalar::Readonly ':all';
        BEGIN {
            *CORE::GLOBAL::chomp = sub {
                readonly_on( $/ );
                my ($count, $fix) = (0, '');
                local $/ = "\r\n";
                for ( @_ ) {
                    my ($first, $second) = (0, 0);
                    eval { $first = chomp };
                    if ( $@ ) {
                        die $@ if $_ !~ /^\r?\n$/;
                        $fix = $_;
                        last;
                    }
                    if ( ! $first ) {
                        local $/ = "\n";
                        $second = chomp;
                    }
                    $count += $first + $second;
                }
                readonly_off( $/ );
                return $fix ? ($count , $fix) : $count;
            };
        }
        42;

        # Then a script that uses it
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Chomp;
        my $foo = "foo\n\r\n";
        my $bar = "bar\n\n\n";
        print chomp $foo, $/; # prints 2
        print chomp $bar, "\n"; # prints 1
        print chomp ($foo, $bar), "\n"; # prints 2
        This does have the unfortunate side effect of not allowing someone to do:
        chomp($/); # $/ = undef;
        I know this is ugly and there are probably a few more gotchas in there, but it was kind of fun to work on. Note: This can be done without a module, but it is much uglier. Anyone wanting to see that should say so.

        Cheers - L~R

        I have trouble understanding how this will remove multiple newlines as you say. Am I missing something?
        If the trailing /g on an anchored s///g actually had some effect, maybe it would strip off all trailing newlines. But as it is, it doesn't.

        The PerlMonk tr/// Advocate
      This was exactly what I was looking for. I don't anticipate that it will break anything that I am doing, though testing will be in order. Modifying $/ globally would, indeed, be a very bad thing to do. I knew this was a possible solution, but I would rather have done it locally for each chomp().
        If that's all you wanted to do, a simple perl -pi -e 's/chomp([^;]*);/{local $/="\r\n";chomp$1;}/gm;' <your files here> would have sufficed ... wouldn't it?

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        Wouldn't this be a good place to use local? Something like
        { local $/ = "\r\n"; chomp; }
        would protect the global state of $/ and let you change its behavior right before the chomp. The scope could be widened until you've captured all of your chomp calls. Any new chomps you write would use the default value of $/, unless they're in the same scope as the local call.
        code is untested.
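        A small sketch of how the local approach behaves, with made-up sample records and assuming $/ starts at its default "\n". It also shows the catch: while $/ is "\r\n", a line ending in a bare "\n" is left unchanged.

```perl
use strict;
use warnings;

my @records = ("alpha\r\n", "beta\r\n", "gamma\n");
my $removed;
{
    local $/ = "\r\n";          # applies only inside this block
    $removed = chomp @records;  # 4: two CRLF pairs; "gamma\n" is untouched
}
# Outside the block, $/ is back to its previous value.
my $restored = $/ eq "\n" ? 1 : 0;
```

        Since the Unix-style record keeps its newline here, mixed input would still need a second chomp under the default $/, or a different approach.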
Re: Redefining chomp()
by dragonchild (Archbishop) on Mar 24, 2004 at 13:45 UTC
    If you read the description of chomp, you will notice that ... it deletes the terminating string corresponding to the current value of $/ ....

    Further down, it says

    With version 5.6, the meaning of chomp changes slightly in that input disciplines are allowed to override the value of the $/ variable and mark strings as to how they should be chomped. This has the advantage that an input discipline can recognize more than one variety of line terminator ...

    If you can, I'd look at that. If you can't, look at CORE::chomp().
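    One related mechanism that can be tried today is the :crlf PerlIO layer, which translates "\r\n" to "\n" as a file is read. This is my own suggestion rather than the feature the quoted documentation describes, and the sample data and temporary file below are made up for the sketch. With the layer in place, the ordinary chomp with the default $/ handles both kinds of line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a throwaway file with mixed line endings (hypothetical sample data).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;                       # write the bytes exactly as given
print {$tmp} "windows line\r\nunix line\n";
close $tmp or die "close failed: $!";

# The :crlf layer turns "\r\n" into "\n" while reading, so the default
# chomp (with $/ eq "\n") works for both lines.
open my $in, '<:crlf', $path or die "cannot open $path: $!";
my @lines = <$in>;
close $in;
chomp @lines;                       # @lines is ("windows line", "unix line")
```

    This still means touching every open call (or adding a use open pragma), so it only helps if the file handles are opened in fewer places than chomp is called.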


Re: Redefining chomp()
by jaa (Friar) on Mar 24, 2004 at 14:08 UTC

    Why not pass your data files through dos2unix before letting them near your script?

    or transfer them into the *nix world with ASCII mode ftp?
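    If dos2unix is not available, the same normalization can be sketched in Perl itself. The function name to_unix_newlines is hypothetical, and this version only rewrites CRLF pairs; a lone "\r" is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the text with CRLF turned into LF.
sub to_unix_newlines {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;   # lone "\r" or "\n" characters are not changed
    return $text;
}

my $converted = to_unix_newlines("one\r\ntwo\nthree\r\n");
# $converted is "one\ntwo\nthree\n"
```

    The in-place one-liner form would be something like perl -pi -e 's/\r\n/\n/' <your files here>, run on a copy of the data first.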

Re: Redefining chomp()
by matija (Priest) on Mar 24, 2004 at 13:46 UTC
    chomp removes any trailing string that corresponds to $/ (or, with use English;, $INPUT_RECORD_SEPARATOR), so my guess would be that you just need to modify that...

Node Type: perlquestion [id://339389]
Approved by Limbic~Region
Front-paged by broquaint