Rhodium has asked for the wisdom of the Perl Monks concerning the following question:

Hi all
I'm reading in a file. Each line consists of some data, and if a line is longer than 80 chars it continues onto the next line, which is marked with a leading +.
foo bar foo bar foo bar...bar
+foo bar
I want to reconstruct this into a single line so I can work with it later. Here is what I came up with, but it doesn't work nicely. Can anyone help me out?
#each element in the array is a single line
my @netlist = <NETLIST>;
close(NETLIST);
#while it is not the end of the file
while($line = shift @netlist){
    chomp $line;
    my $tmp = $line;
    $line = shift @netlist;
    chomp $line;
    if ($line =~ /^\+/){ # Is this a continuation..
        s/^\+//; #remove the continuation
        $tmp = $tmp . $line;
        print "$tmp \n";
    }else{
        pop (@netlist, $line);
        print "$tmp";
    }
}
Thanks much. If there is a simpler, more perlish way to do this I am really open to suggestions. Additionally, at some point I am going to need to reconstruct this long line so it doesn't go over 80 chars and doesn't break up words. I am really open to suggestions here. Thanks so much!!

Replies are listed 'Best First'.
Re: Look ahead and join if the line begins with a +
by japhy (Canon) on Apr 11, 2002 at 04:20 UTC
    Here are several approaches; they are of varying degrees of difficulty.
    # simplistic
    my @lines = scalar <FILE>;
    while (<FILE>) {
        if (s/^\+//) { $lines[-1] .= $_ }
        else { push @lines, $_ }
    }

    # fancy
    my @lines;
    while (<FILE>) {
        if (s/^\+//) { $lines[@lines && -1] .= $_ }
        else { push @lines, $_ }
    }

    # compact (UPDATED: 1 -> -1)
    my @lines;
    $lines[s/^\+// ? @lines && -1 : @lines] .= $_ while <FILE>;

    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a (from-home) job
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Uhm, these are all broken. You are forgetting to remove the trailing newline before you concatenate. See Re: (MeowChow) Re2: Look ahead and join if the line begins with a +
      PS The compact example is even more broken - try running it on the sample data set.

      my @lines = s//join'',<DATA>/e ? s/\n\+//g ? split "\n" : ($_) :($_);
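For comparison, here is a minimal sketch of the line-by-line approach with the newline removed before concatenating. The sub name join_continuations and the sample data are made up for illustration. It assumes a continuation is marked by a leading + and that the pieces should be joined with a single space (drop the space if your data doesn't want it).

```perl
use strict;
use warnings;

# Hypothetical helper: takes raw lines, returns the joined logical lines.
sub join_continuations {
    my @joined;
    for my $raw (@_) {
        my $line = $raw;
        chomp $line;                        # drop the newline before joining
        if (@joined && $line =~ s/^\+//) {  # continuation of the previous line
            $joined[-1] .= " $line";
        }
        else {                              # new logical line; a + on the very
            push @joined, $line;            # first line is left untouched
        }
    }
    return @joined;
}

# Sample data; the last line deliberately has no trailing newline.
my @sample = ("This\n", "+is\n", "+one\n", "+line!\n", "Oops");
print "$_\n" for join_continuations(@sample);
```

Because chomp only removes a trailing newline when one is present, a final line without a newline keeps all of its characters.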




      # vulgar
      my @lines;
      $lines[@lines - s/^\+//] .= $_ while <FILE>;
                     s aamecha.s a..a\u$&owag.print


        # vulgar AND broken
        my @lines;
        $lines[@lines - s/^\+//] .= $_ while <DATA>;
        print @lines
        __DATA__
        This
        +is
        +supposed
        +to
        +be
        +one
        +line!
        Oops

        I'm sure that this is what you meant, but even this is broken, as it will possibly remove the last char in the file if there is no trailing newline.

        $lines[@lines - s/^\+//] .= substr $_,0,-1 while <DATA>;

        This will work:

        $lines[ @lines + 1 - s -^\053|\012--g ] .= $_ while <DATA>;




      Kudos japhy
      Your simplistic takes home the prize.. but you forgot to remove \n..
      @cleanet = scalar <NETLIST>;
      while (<NETLIST>) {
          if (s/^\+// or s/^\*\+//) {
              $cleanet[-1] =~ s/\n/ /;
              $cleanet[-1] .= $_
          }
          else { push @cleanet, $_ }
      }
      Thanks so much. Now how do I get this back to where I started from? After 80 characters, and NOT in the middle of a word (something separated by a space), break it into two lines and put back the "+"?
      Thanks again.
        Of course, that simplifies to   $cleanet[-1] =~ s/\n/ $_/;,  but see merlyn's solution below.

Re: Look ahead and join if the line begins with a +
by belg4mit (Prior) on Apr 11, 2002 at 03:57 UTC
    my @list;
    #Read first, ask questions later is probably not optimal for this
    #Scope our variable, merge, read, and assign all at once
    while(chomp(my $line = <NETLIST>)){
        #Save a match, substitution returns true iff we substituted
        if ($line =~ s/^\+//){
            #Append to previous line (last line in the array)
            $list[-1] .= $line;
        }
        else{
            #Save the line
            push(@list, $line);
        }
    }
    #Map is good, nice way to get all the newlines back
    #join would work if the file need not end in a newline
    print map {"$_\n"} @list;

    my @list;
    while(chomp(my $line = <NETLIST>)){
        #Trinary is good ;-)
        $line =~ s/^\+// ? $list[-1] .= $line : push(@list, $line);
    }
    print map {"$_\n"} @list;

    my @list;
    #for is very perlish, so is using $_
    chomp && $_ =~ s/^\+// ? $list[-1] .= $_ : push(@list, $_) for <NETLIST>;
    print map {"$_\n"} @list;

    perl -pe "s/\b;([mnst])/'\1/mg"

      You wrote:
      while(chomp(my $line = <NETLIST>)){
      What if the last line is missing its \n?

      Since chomp will return 0 if no characters are removed, it's probably safer to go for the less fun

      while(defined(my $line = <NETLIST>)){
          chomp $line;


      Edit: Forgot the defined, needed in case the last line contains only a perlish false value, like 0...
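A small sketch of the difference, reading from an in-memory string instead of NETLIST. The variable names and the sample data are made up for illustration.

```perl
use strict;
use warnings;

my $data = "first\n+second\nlast";   # note: no newline after "last"

# Loop conditioned on chomp's return value. chomp returns 0 for the
# unterminated final line, so the loop ends before that line is stored.
open my $fh, '<', \$data or die $!;
my @via_chomp;
while (chomp(my $line = <$fh>)) { push @via_chomp, $line }
close $fh;

# Loop conditioned on defined(), with chomp done inside the body.
open $fh, '<', \$data or die $!;
my @via_defined;
while (defined(my $line = <$fh>)) {
    chomp $line;
    push @via_defined, $line;
}
close $fh;

printf "chomp-conditioned loop kept %d lines\n",   scalar @via_chomp;
printf "defined-conditioned loop kept %d lines\n", scalar @via_defined;
```

With this data the first loop keeps two lines and the second keeps all three.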

        Simple, you don't do that ;-). Seriously, you have a valid point but I don't think it's unreasonable to know your data. It's also not unreasonable to code in a seemingly data independent manner either, but it may just be a throw away tool.

        You could also test $_ upon exiting the loop and act accordingly, probably undef'ing $_ within the loop on each execution.

        perl -pe "s/\b;([mnst])/'\1/mg"

           #Map is good, nice way to get all the newlines back
        {local $"="\n"; print "@list\n"}

        Cool, but does the interpolation cost much (for a large list)?

        perl -pe "s/\b;([mnst])/'\1/mg"

Re: Look ahead and join if the line begins with a +
by emilford (Friar) on Apr 11, 2002 at 04:10 UTC
    This works as well...
    #!/usr/bin/perl -w
    use strict;

    my $last_line;
    my $i;
    my $num_lines;

    open(FILE, "<test.txt") || die "Unable to open file: $!";
    chomp(my @lines = <FILE>);   # get the lines
    close(FILE);

    $num_lines = @lines;
    for($i=0; $i < $num_lines; $i++) {
        if($i == $num_lines-1) {     # make sure we don't access
                                     # uninitialized index of array
            print "$lines[$i]\n";    # print the last line
        }
        elsif($lines[$i+1] =~ m/^\+/) {   # look ahead for a leading +
            $lines[$i+1] =~ s/^\+//;
            print "$lines[$i] $lines[++$i]\n";
        }
        else{
            print "$lines[$i]\n";
        }
    }
    Hope this helps. -Eric
      I like this solution. It somewhat does what I want, but what about multiple "+" lines?
      This is a test
      + this
      + that
      So this won't work nicely. But I worked with this one for a while and tried to get it right..
      Thanks anyway.
Re: Look ahead and join if the line begins with a +
by stephen (Priest) on Apr 11, 2002 at 04:36 UTC

    The other methods listed here are probably more efficient, especially if you're going to do something with the lines individually later. However, since you're preloading the file into memory as it is, here's something simpler:

    use strict;
    my $all_text;
    {
        local($/);
        $all_text = <NETLIST>;
    }
    $all_text =~ s/\n\+/ /gs;
    print $all_text;

    You might want to check out Text::Wrap for reflowing it.
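As a rough sketch of how that reflow could look: Text::Wrap's wrap() takes a prefix for the first line and a prefix for every following line, so passing '+' as the second prefix puts the continuation marker back. The helper name wrap_with_plus and the sample line are made up for illustration, and the sketch assumes the words are separated by single spaces.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: re-wrap one logical line, marking continuations with "+".
sub wrap_with_plus {
    my ($line, $max) = @_;
    $max = 80 unless defined $max;
    # Text::Wrap keeps every line to at most $columns - 1 characters,
    # so ask for one more column than the desired maximum.
    local $Text::Wrap::columns  = $max + 1;
    local $Text::Wrap::unexpand = 0;    # leave spaces alone, no tab conversion
    return wrap('', '+', $line);
}

my $long    = join ' ', ('foo', 'bar') x 30;   # one long single-spaced line
my $wrapped = wrap_with_plus($long);
print "$wrapped\n";

# Joining it back up is the same substitution as in the code above.
(my $rejoined = $wrapped) =~ s/\n\+/ /g;
print "round trip ok\n" if $rejoined eq $long;
```

Note that the + counts towards the line length here, and that a word which itself starts with + at a wrap point would be indistinguishable from a continuation marker.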


    Update: Thanks to japhy for pointing out a mental misstep on my part...

    Update 2: Thanks to broquaint for a reminder on local()...

      The $/ variable is not filehandle-based (yet -- wait for Perl 6). You'll want to local()ise it in a block, read from the filehandle, and then leave the scope.

      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a (from-home) job
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Just a small note - local() automatically undef()ines whatever gets passed to it.
      $foo = "I am a package var, yes I am";
      {
          local $foo;
          print "\$foo is no more\n" if not defined $foo;
      }
      print qq(\$foo returns!\n) if defined $foo;

      __output__

      $foo is no more
      $foo returns!



      If you can fit all the lines and actually want to process them as separate lines then you can use a negative lookahead assertion to split just on linebreaks that aren't followed by plus signs:

      {
          local $/;
          foreach (split /\n(?!\+)/, <>) {
              s/\n\+/ /g;
              print "$_\n";
          }
      }

      Then you just have to substitute out the internal linebreaks and pluses that are left. (You may not want the space in there depending on your data.)

      Note that as a side-effect, the split has chomped the input, so a \n may be needed in the output.

Re: Look ahead and join if the line begins with a +
by Juerd (Abbot) on Apr 11, 2002 at 06:40 UTC

    Nice job for MJD's Tie::File.

    use Tie::File;
    tie my @lines, 'Tie::File', 'filename' or die $!;
    for (my $i = 1; $i <= $#lines; $i++) {
        if ($lines[$i] =~ /^\+/) {
            $lines[$i - 1] =~ s/\n/substr $lines[$i], 1/e;
            splice @lines, $i, 1;
            redo;
        }
    }


      I see you promoting the fairly expensive Tie::File repeatedly, without paying much attention to much simpler solutions:
      my @lines = do {
          local (*ARGV, $/);
          @ARGV = 'filename';
          (my $s = <>) =~ s/\n\+//g;
          $s =~ /(.*\n?)/g;
      };
      Please be aware of the expense of Tie::File.

      -- Randal L. Schwartz, Perl hacker

        I see you promoting the fairly expensive Tie::File repeatedly

        Yes, and just like you, I use dot star, and more inefficient functions. Why? Because sometimes, I just don't care about speed. I am quite sure I won't notice a few milliseconds in execution time, or even a few seconds with larger files (you would need VERY (100 MB+) large files to notice the difference when you're replacing continuation characters). I would however notice the time spent on thinking, if I code it.

        Of course, a regex solution is faster. I guess that in this case it will be approximately three times as fast as the Tie::File solution I gave. However, TIMTOWTDI, and slow isn't necessarily bad.

        A regex-solution given by me would probably have been something like:

        perl -i -pe'BEGIN { undef $/ } s/\n\+//g' filename

        without paying much attention to much simpler solutions

        Tie::File is new for me, as it is for many other people. I have found it to be very efficient in certain cases, and will continue to use it. To learn to use something, you must use it for real life problems. So I did, and I use this monastery for that. In this case it's not the most efficient method of dealing with the problem, but if we're going to talk about efficiency, I'd like to ask why you repeatedly promote (if that word is appropriate, and I think it is not) CGI.pm, instead of more efficient alternatives. I think it has to do with programming style. To me, style is much more important than efficiency (unless something is done many, many times in a loop). And maybe you think using diamonds, substitution, m//g in list context, and local are "simpler", but I tend to disagree.

        Please be aware of the expense of Tie::File.

        I am. Constantly, and have been ever since I first saw it.

        Yes, I reinvent wheels.

Re: Look ahead and join if the line begins with a +
by Maclir (Curate) on Apr 11, 2002 at 15:06 UTC
    Some good solutions presented here - but - experience shows that the area of greatest difficulty in designing working algorithms is in dealing with "edge conditions".

    Ask yourself - those who proposed things like

    What happens if the first line of the file starts with a continuation character? Sure, this is logically an invalid case, but . . .

      My three examples protect against that, by the way.

      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a (from-home) job
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
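A short sketch of that edge case, using the `@lines && -1` subscript from the "fancy" version earlier in the thread. The sample input is made up, and unlike the original it chomps each line before appending.

```perl
use strict;
use warnings;

# Hypothetical input whose very first line is (wrongly) a continuation.
my @input = ("+orphan\n", "next\n", "+more\n");

my @lines;
for (@input) {
    chomp(my $line = $_);           # work on a copy, newline removed
    if ($line =~ s/^\+//) {
        # @lines && -1 is 0 while @lines is empty and -1 afterwards, so the
        # append never targets index -1 of an empty array (a fatal error).
        $lines[@lines && -1] .= $line;
    }
    else {
        push @lines, $line;
    }
}
print "$_\n" for @lines;
```

Here the stray leading continuation simply becomes the first logical line, which is one reasonable policy; warning about it or rejecting the file would be others.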