Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

unhead and untail

by halley (Prior)
on Apr 11, 2003 at 22:14 UTC ( [id://249987]=sourcecode: print w/replies, xml ) Need Help??
Category: Utility Scripts
Author/Contact Info Ed Halley <ed@halley.cc> This program is free software; you can redistribute it and/or modify it under the same terms as Perl
Description: The historically common head(1) and tail(1) commands are for keeping the head or tail of stream input, usually by a count of lines. This pair of scripts differ in three respects:
  • these scripts don't show the head or tail, they show everything except the head or tail,
  • these scripts don't count lines but instead work on a single regex (regular expression) to find a matching "cut here" point in the text,
  • these scripts can actually modify/edit text files in place if filenames are given.

Check out the pod output for a complete man-page.

#!/usr/bin/perl -w

#---------------------------------------------------------------------
+-------
# Copyright (C) 2001-2003 Ed Halley
#---------------------------------------------------------------------
+-------

=head1 NAME

unhead - remove the lines before a match from the input stream

untail - remove the lines following a match from the input stream

=cut

# Both 'unhead' and 'untail' are identical; they determine the task fr
+om
# the $0 perl variable.  The scripts can be linked or symlinked or sim
+ply
# copied, and will still work as long as they have the correct filenam
+es.

=head1 SYNOPSIS

    unhead  '--BEGIN MESSAGE--'  *.txt

    untail  '--END MESSAGE--'  *.txt

=head1 DESCRIPTION

The historically common C<head>(1) and C<tail>(1) commands are for
keeping the head or tail of stream input, usually by a count of lines.
This pair of scripts differ in two respects: these scripts work on a
single regex (regular expression) to find a matching "cut here" point 
+in
the text, and these scripts edit text files in place.

By default, the first argument should be a regular expression that sho
+uld
match at least once in each subsequent file.  If no additional argumen
+ts
are given, or if the filename is a hyphen (-), then the standard input
and standard output streams are assumed.  For filenames, each file is
processed in turn, creating a backup file with a tilde (~) appended to
the original name.

The C<unhead> variant will remove the header above the given match, an
+d
write out the tail (including the first matching line).  The C<untail>
variant will remove the trailing below the given match, and only write
out the head (not including the first matching line).

=cut

#---------------------------------------------------------------------
+-------

use warnings;
use strict;

my $want = ($0 =~ /untail/)? 'head' : 'tail';
my $suffix = '~';

my $pattern = shift(@ARGV);
die "First argument should be a regular expression string" if not $pat
+tern;
$pattern = qr/$pattern/;
@ARGV = ('-') if not @ARGV;

exit(main(@ARGV));

#---------------------------------------------------------------------
+-------

sub main
{
    while (@_)
    {
    my $inp = *STDIN;
    my $outp = *STDOUT;
    my $file = shift;
    if ($file ne '-')
    {
        die if not -w $file;
        $inp = *FIN; $inp = *FIN; # hush hush warning warning
        $outp = *FOUT; $outp = *FOUT;
        unlink($file.$suffix) if -f $file.$suffix;
        rename($file, $file.$suffix);
        open($inp, $file.$suffix)
        or die "Cannot open $file";
        open($outp, '>'.$file)
        or die "Cannot open $file";
    }

    my $have = 'head';
    while (<$inp>)
    {
        $have = 'tail' if m{$pattern};
        print $outp $_ if $want eq $have;
    }

    if ($file ne '-')
    {
        close($outp);
        close($inp);
    }
    }
}

__END__

#---------------------------------------------------------------------
+-------

=head1 IDEAS

    # chop off email routing header info (first empty line)
    cat email.msg | unhead '^$'  

    # just students from "Danziger, Jane" to "Funicello, Thomas"
    query-students | sort | unhead '^D' | untail '^G'

    # keep the center section of a YACC grammar
    unhead '^%%$' *.y ; untail '^%%$' *.y

=head1 BUGS

The C<unhead> variety will discard everything and leave an empty file 
+if
no lines match the given pattern.  This is not a bug, but worth a
warning.  In the case of actual files, you still have the tilde backup
+.

The script for both C<unhead> and C<untail> are identical, and one can
+ be
a link to the other.  The script determines which part to keep based o
+n
the script's name.

=head1 LICENSE

Copyright (C) 2001-2003  Ed Halley  <ed@halley.cc>

This program is free software; you can redistribute it and/or modify i
+t
under the same terms as Perl itself.  For details on the Perl Artistic
License, read the F<http://www.perl.com/language/misc/Artistic.html>
page.

=cut
Replies are listed 'Best First'.
Re: unhead and untail
by dmitri (Priest) on Apr 15, 2003 at 19:54 UTC
    A couple of thoughts:

    The cutting up of input stream as you describe can be much done much easier with sed, for instance, instead of

    query-students | sort | unhead '^D' | untail '^G'
    do
    query-students | sort | sed '/^D/,/^G/p; d'
    Also, neither head nor tail modify files they operate on. I understand, you were probably solving a specific problem, but this script can be generalized even more.
      unhead is superfluous even with inplace editing given recent versions of GNU sed, as they sport an -i switch for inplace editing just like Perl has. What you still can't do with sed is specify "X lines before end of file" addresses.

      Makeshifts last the longest.

        > What you still can't do with sed is specify "X lines before end of file" addresses.

        Neither can this utility, read the description.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://249987]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2025-06-12 16:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.