PerlMonks  

removing lines that are in the end of a file

by bhargavkanakiya (Initiate)
on Apr 05, 2013 at 12:06 UTC (#1027128=perlquestion)
bhargavkanakiya has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have thousands of files with the following format:
0.642375 125 SIL
1.0705 125 ઔર્
1.3651875 125 આત્
1.519875 125 મ
...
...
...
7.2140627 125 સે
7.478125 125 હટ્
7.622625 125 જા
7.956125 125 ઓ
8.192375 125 SIL
8.252 125 SIL
8.464 125 SIL
8.706 125 SIL
... and so on
Also, the number of lines is not the same in all the files. I want to keep just one SIL line at the end (the 8.192375 125 SIL line, which I highlighted in red) and get rid of the SIL lines that follow it.
Can anyone help me solve this with Perl? It would be great if I could pass in all the files at once, process them, and save each one back with only the required lines. I am comfortable with file handling; all I need help with is the processing itself (i.e., getting rid of those extra lines). Thank you.
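A minimal sketch of the requirement as a Perl function over the lines of one file, assuming a file fits in memory and that a "SIL line" is one whose last field is SIL (trailing blanks allowed). The name trim_trailing_sil is hypothetical, and the ASCII "x" line is a placeholder for the Gujarati syllables. The function walks backwards over the final run of SIL lines and keeps only the first line of that run, so the SIL line at the top of the file is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: given the lines of one file, return them with the
# trailing run of SIL lines reduced to its first line.
sub trim_trailing_sil {
    my @lines  = @_;
    my $is_sil = qr/\sSIL\s*\z/;    # last field is SIL, trailing blanks allowed

    # Walk backwards over the final run of SIL lines.
    my $i = $#lines;
    $i-- while $i >= 0 && $lines[$i] =~ $is_sil;

    # $i is now the last non-SIL line (or -1). If the file ended with SIL
    # lines, keep exactly one of them; otherwise keep everything.
    my $last_kept = $i < $#lines ? $i + 1 : $i;
    return @lines[ 0 .. $last_kept ];
}

my @sample = (
    "0.642375 125 SIL\n",
    "1.0705 125 x\n",
    "8.192375 125 SIL\n",
    "8.252 125 SIL\n",
    "8.706 125 SIL\n",
);
print trim_trailing_sil(@sample);
```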

Re: removing lines that are in the end of a file
by hdb (Parson) on Apr 05, 2013 at 12:19 UTC

    In your loop over the lines of each file, you say

    while( my $line = <FILE> ) {
        chomp($line);
        print $line, "\n";           # copy every line through ...
        last if $line =~ / SIL$/;    # ... up to and including the first one ending in " SIL"
    }

    UPDATE: Removed ~ in assignment to $line, thanks to Loops!

Re: removing lines that are in the end of a file
by trizen (Friar) on Apr 05, 2013 at 12:21 UTC
    Is this what you are looking for?
    use strict;
    use warnings;
    use Tie::File;

    # usage: perl script.pl [file1] [file2] [...]
    foreach my $file (grep { -f } @ARGV) {
        tie my @file, 'Tie::File', $file
            or die "Can't tie into file $file: $!";

        my $regex = qr/\sSIL\s*\z/;

        foreach my $i (reverse 0 .. $#file) {
            if ($file[$i] =~ /$regex/) {
                1 while ($file[--$i] =~ /$regex/);
                print "$file[$i + 1]\n";
                $#file = $i + 1;
                last;
            }
        }

        untie @file;
    }
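The key step in trizen's script is $#file = $i + 1: with Tie::File (a core module), each array element is one line of the file, so shrinking the array truncates the file on disk. A minimal sketch of just that mechanism, using a throwaway temporary file from the core File::Temp module; the five one-letter lines are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Write a small throwaway file with five lines.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "$_\n" for qw(a b c d e);
close $fh or die "close: $!";

# Tie the file: $lines[0] is the first line, and so on (without newlines).
tie my @lines, 'Tie::File', $path or die "Can't tie $path: $!";
$#lines = 2;    # keep indices 0..2, i.e. truncate the file after its third line
untie @lines;

open my $in, '<', $path or die "open: $!";
print <$in>;    # the remaining lines: a, b, c
close $in;
```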
Re: removing lines that are in the end of a file
by thundergnat (Deacon) on Apr 05, 2013 at 13:59 UTC
    If it is exactly that line in every file, and not just "the first value larger than 8", you could just play games with the input record separator.

    UPDATE: Sorry, I misread the question, so I changed the program. Note that your data set has a line ending in "SIL" before the one you highlighted in red. Since you want everything up to the SECOND line ending in "SIL", the program reads two records; that is what the "for 1..2" on the "print" line does.

    {
        local $/ = "SIL\n";
        #open my $in,  '<', $whatever or die "$!";
        #open my $out, '>', $output   or die "$!";
        #print $out scalar <$in>;
        print scalar <DATA> for 1..2;
    }
    __DATA__
    0.642375 125 SIL
    1.0705 125 ઔર્
    1.3651875 125 આત્
    1.519875 125 મ
    ...
    ...
    ...
    7.2140627 125 સે
    7.478125 125 હટ્
    7.622625 125 જા
    7.956125 125 ઓ
    8.192375 125 SIL
    8.252 125 SIL
    8.464 125 SIL
    8.706 125 SIL
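The trick works because reading a line returns everything up to and including the next occurrence of $/. A small sketch of the same idea reading from an in-memory filehandle (a core Perl feature) instead of DATA; the five-line string is a shortened stand-in for the sample data, with "x" in place of the Gujarati syllables.

```perl
use strict;
use warnings;

my $text = "0.642375 125 SIL\n"
         . "1.0705 125 x\n"
         . "8.192375 125 SIL\n"
         . "8.252 125 SIL\n"
         . "8.706 125 SIL\n";

open my $in, '<', \$text or die "Can't open in-memory handle: $!";
my $kept = do {
    local $/ = "SIL\n";      # a "record" now ends at the next SIL\n
    my $first  = <$in>;      # "0.642375 125 SIL\n"
    my $second = <$in>;      # "1.0705 125 x\n8.192375 125 SIL\n"
    $first . $second;
};
close $in;
print $kept;
```

As the update above says, the number of records to read depends on how many SIL lines come before the trailing run, so this only fits files laid out like the sample.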
Re: removing lines that are in the end of a file
by BrowserUk (Pope) on Apr 05, 2013 at 15:20 UTC

    The answers so far do not seem to have noticed the first line, which also contains "SIL".

    This is pretty trivial using File::ReadBackwards:

    #! perl -slw
    use strict;
    use File::ReadBackwards;

    my $file = $ARGV[ 0 ];
    tie *BW, 'File::ReadBackwards', $file;
    my $lastpos = -s( $file );
    while( <BW> ) {
        last unless /SIL/;
        $lastpos = tell( BW )
    }
    truncate $file, $lastpos;
    close BW;

    A sample run:


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks a lot. This also removes the last line with SIL. This is not how I wanted it, but you have still made my task a lot easier. I can add the last SIL line back using other software that I am currently working on.

        This also removes the last line with SIL. This is not how I wanted

        ReadBackwards() has to do strange things with tell, so a little extra work is involved. Try this version:

        #! perl -slw
        use strict;
        use File::ReadBackwards;

        my $file = $ARGV[ 0 ];
        tie *BW, 'File::ReadBackwards', $file;
        my $lastpos = -s( $file );
        my $len = 0;
        while( <BW> ) {
            $len = length() +1;
            next unless length >1;
            last unless /SIL/;
            $lastpos = tell( BW );
        }
        truncate $file, $lastpos + $len;
        close BW;

        Note: You won't need the +1 in the $len = length() +1; line if you are not on Windows.


Re: removing lines that are in the end of a file
by davido (Archbishop) on Apr 05, 2013 at 15:40 UTC

    truncate combined with tell seems to be pretty straightforward:

    open my $fh, '+<', 'test.txt' or die $!;
    my $last_matched = 0;
    my $told = 0;
    while( my $line = <$fh> ) {
        my $matched = $line =~ m/SIL$/;
        truncate $fh, $told and last if $matched && $last_matched;
        $last_matched = $matched;
        $told = tell $fh;
    }
    close $fh or die $!;

    Dave

      Your solution would truncate the OP's sample data before the first line (which also contains "SIL").

      I.e., it would empty the file.



        No, it would truncate after the first line that ends in SIL and is followed by another line ending in SIL; that much I did test. It works as it should for the data the OP posted, but your read-backwards solution is better because it doesn't assume that, in the real data set, SIL lines repeat in succession only at the end of the file. Had I seen yours before posting, I wouldn't have bothered, since it's a more reliable variation on the tell-and-truncate theme. ;)
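The assumption davido mentions (his loop stops at the first pair of consecutive SIL lines) can be dropped while keeping the same forward tell/truncate idea: read the whole file, remember the offset just past the first SIL line of the current run, forget the run whenever a non-SIL line appears, and truncate only if the file ends in a run. A minimal sketch; the subroutine name truncate_after_first_trailing_sil and the SIL pattern are assumptions, and the file is modified in place with no backup.

```perl
use strict;
use warnings;

# Hypothetical helper: truncate $path just after the first line of the
# final run of SIL lines. SIL lines elsewhere in the file are left alone.
sub truncate_after_first_trailing_sil {
    my ($path) = @_;
    open my $fh, '+<', $path or die "Can't open $path: $!";

    my $cut;               # offset just past the first SIL line of the current run
    my $last_was_sil = 0;
    while (my $line = <$fh>) {
        if ($line =~ /\sSIL\s*\z/) {
            $cut = tell $fh unless $last_was_sil;   # a new SIL run starts here
            $last_was_sil = 1;
        }
        else {
            $last_was_sil = 0;                      # run broken by a data line
        }
    }

    # Only a run that reaches the end of the file is trimmed.
    if ($last_was_sil) {
        truncate $fh, $cut or die "Can't truncate $path: $!";
    }
    close $fh or die "Can't close $path: $!";
}

# usage: perl trim_sil.pl file1 file2 ...
truncate_after_first_trailing_sil($_) for grep { -f } @ARGV;
```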


        Dave

Re: removing lines that are in the end of a file
by kcott (Abbot) on Apr 05, 2013 at 19:02 UTC

    G'day bhargavkanakiya,

    You say you're OK with the file handling. Here's my solution for the data handling.

    1. Read a line and put it in a buffer.
    2. If line ends with "SIL", go back to 1 (i.e. read another line).
    3. Output all lines in the buffer.
    4. Clear the buffer.
    5. Go back to 1 (i.e. read another line).
    6. When all lines have been read, output the first line in the buffer.

    Here it is on the command line:

    $ perl -Mstrict -Mwarnings -E '
        my @buffer;
        my @input = <>;
        say "========\n Output\n========";
        for (@input) {
            push @buffer, $_;
            next if /SIL\s*$/m;
            print shift @buffer while @buffer;
        }
        print $buffer[0] if @buffer;
    '
    0.642375 125 SIL 
    1.0705 125 ઔર્ 
    1.3651875 125 આત્ 
    1.519875 125 મ
    7.2140627 125 સે 
    7.478125 125 હટ્ 
    7.622625 125 જા 
    7.956125 125 ઓ 
    8.192375 125 SIL 
    8.252 125 SIL 
    8.464 125 SIL 
    8.706 125 SIL
    ========
     Output
    ========
    0.642375 125 SIL 
    1.0705 125 ઔર્ 
    1.3651875 125 આત્ 
    1.519875 125 મ
    7.2140627 125 સે 
    7.478125 125 હટ્ 
    7.622625 125 જા 
    7.956125 125 ઓ 
    8.192375 125 SIL 
    

    I found some lines had additional whitespace at the end: this may be valid data; a result of your cut-n-paste operation; related to all those special characters; or something else. Anyway, I used the /SIL\s*$/m regexp to get around this; you may want to modify it for your real-world application.
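Since the OP wants to process thousands of files and save each one back, kcott's buffer algorithm can be wrapped in a per-file routine and applied to every file named on the command line. A minimal sketch, assuming each file is small enough to hold in memory and is rewritten in place with no backup (so try it on copies first); trim_file is a hypothetical name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical per-file wrapper around kcott's buffer algorithm.
sub trim_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't read $path: $!";
    my (@out, @buffer);
    while (my $line = <$in>) {
        push @buffer, $line;
        next if $line =~ /SIL\s*$/;     # hold SIL lines back for now
        push @out, @buffer;             # a non-SIL line: release everything held
        @buffer = ();
    }
    close $in;
    push @out, $buffer[0] if @buffer;   # keep only the first trailing SIL line

    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} @out;
    close $out or die "Can't close $path: $!";
}

# usage: perl trim_sil.pl file1 file2 ...
trim_file($_) for grep { -f } @ARGV;
```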

    -- Ken

Re: removing lines that are in the end of a file
by gsiems (Chaplain) on Apr 05, 2013 at 19:40 UTC
    Another approach using split and join:
    #!/usr/bin/env perl
    use strict;
    use warnings;

    foreach my $file (grep { -f } @ARGV) {
        my $data = slurp_file ($file);
        my $new_data = join ('', (split /(SIL\n)/, $data, 5)[ 0 .. 3 ]);
        open my $OUT, '>', $file or die $!;
        print $OUT $new_data;
        close $OUT;
    }

    sub slurp_file {
        local (*ARGV, $/);
        @ARGV = shift;
        <>
    }

    Update

    This only works if there are no additional SIL lines interspersed between the first line and the SIL lines at the end.
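gsiems's one-liner relies on a detail of split: when the pattern contains capturing parentheses, each captured separator is returned as an extra list element, and LIMIT bounds the number of fields, not counting those separators. A small sketch showing what the slice [0 .. 3] picks out; the string is a shortened stand-in for the sample data, with "x" in place of the Gujarati syllables.

```perl
use strict;
use warnings;

my $data = "0.642375 125 SIL\n1.0705 125 x\n8.192375 125 SIL\n8.252 125 SIL\n8.706 125 SIL\n";

# The capture group keeps each "SIL\n" separator in the result list.
my @parts = split /(SIL\n)/, $data, 5;
# $parts[0] is "0.642375 125 ",                $parts[1] is "SIL\n",
# $parts[2] is "1.0705 125 x\n8.192375 125 ",  $parts[3] is "SIL\n".

my $kept = join '', @parts[0 .. 3];
print $kept;    # everything up to and including "8.192375 125 SIL\n"
```

This is also why the update above matters: one extra SIL line before the trailing run shifts which separator lands at index 3.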

Re: removing lines that are in the end of a file
by pvaldes (Chaplain) on Apr 05, 2013 at 20:25 UTC

    Slurp the entire file and substitute \n.*?$

    UPDATED: OK, I see; this is a different question from "remove the last line of the file".

    Slurp the entire file and print to $OUTFILE only the part that matches ^.*?125\sSIL
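One way to spell out the slurp-and-substitute idea, as a sketch rather than pvaldes's exact regex: a single substitution, anchored at the end of the string with \z, that keeps the first line of the final SIL run and removes the SIL lines after it. It assumes Unix (\n) line endings and that SIL is preceded by a space or tab; the name drop_extra_trailing_sil is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: works on the whole file content as one string.
sub drop_extra_trailing_sil {
    my ($text) = @_;
    # /m lets ^ match at each line start; "." never crosses a newline, so each
    # piece below matches exactly one line. The first line of the final SIL
    # run is captured and kept; the SIL lines after it, up to the end of the
    # string (\z), are removed.
    $text =~ s/^(.*[ \t]SIL[ \t]*\n)(?:.*[ \t]SIL[ \t]*(?:\n|\z))+\z/$1/m;
    return $text;
}

my $slurped = "0.642375 125 SIL\n1.0705 125 x\n8.192375 125 SIL\n8.252 125 SIL\n8.706 125 SIL\n";
print drop_extra_trailing_sil($slurped);
```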

Node Type: perlquestion [id://1027128]
Approved by Ratazong