Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

unhead and untail

by halley (Prior)
on Apr 11, 2003 at 22:14 UTC ( #249987=sourcecode: print w/ replies, xml ) Need Help??

Category: Utility Scripts
Author/Contact Info Ed Halley <ed@halley.cc> This program is free software; you can redistribute it and/or modify it under the same terms as Perl
Description: The historically common head(1) and tail(1) commands are for keeping the head or tail of stream input, usually by a count of lines. This pair of scripts differ in three respects:
  • these scripts don't show the head or tail, they show everything except the head or tail,
  • these scripts don't count lines but instead work on a single regex (regular expression) to find a matching "cut here" point in the text,
  • these scripts can actually modify/edit text files in place if filenames are given.

Check out the pod output for a complete man-page.

#!/usr/bin/perl -w

#---------------------------------------------------------------------
+-------
# Copyright (C) 2001-2003 Ed Halley
#---------------------------------------------------------------------
+-------

=head1 NAME

unhead - remove the lines before a match from the input stream

untail - remove the lines following a match from the input stream

=cut

# Both 'unhead' and 'untail' are identical; they determine the task fr
+om
# the $0 perl variable.  The scripts can be linked or symlinked or sim
+ply
# copied, and will still work as long as they have the correct filenam
+es.

=head1 SYNOPSIS

    unhead  '--BEGIN MESSAGE--'  *.txt

    untail  '--END MESSAGE--'  *.txt

=head1 DESCRIPTION

The historically common C<head>(1) and C<tail>(1) commands are for
keeping the head or tail of stream input, usually by a count of lines.
This pair of scripts differ in two respects: these scripts work on a
single regex (regular expression) to find a matching "cut here" point 
+in
the text, and these scripts edit text files in place.

By default, the first argument should be a regular expression that sho
+uld
match at least once in each subsequent file.  If no additional argumen
+ts
are given, or if the filename is a hyphen (-), then the standard input
and standard output streams are assumed.  For filenames, each file is
processed in turn, creating a backup file with a tilde (~) appended to
the original name.

The C<unhead> variant will remove the header above the given match, an
+d
write out the tail (including the first matching line).  The C<untail>
variant will remove the trailing below the given match, and only write
out the head (not including the first matching line).

=cut

#---------------------------------------------------------------------
+-------

use warnings;
use strict;

my $want = ($0 =~ /untail/)? 'head' : 'tail';
my $suffix = '~';

my $pattern = shift(@ARGV);
die "First argument should be a regular expression string" if not $pat
+tern;
$pattern = qr/$pattern/;
@ARGV = ('-') if not @ARGV;

exit(main(@ARGV));

#---------------------------------------------------------------------
+-------

sub main
{
    while (@_)
    {
    my $inp = *STDIN;
    my $outp = *STDOUT;
    my $file = shift;
    if ($file ne '-')
    {
        die if not -w $file;
        $inp = *FIN; $inp = *FIN; # hush hush warning warning
        $outp = *FOUT; $outp = *FOUT;
        unlink($file.$suffix) if -f $file.$suffix;
        rename($file, $file.$suffix);
        open($inp, $file.$suffix)
        or die "Cannot open $file";
        open($outp, '>'.$file)
        or die "Cannot open $file";
    }

    my $have = 'head';
    while (<$inp>)
    {
        $have = 'tail' if m{$pattern};
        print $outp $_ if $want eq $have;
    }

    if ($file ne '-')
    {
        close($outp);
        close($inp);
    }
    }
}

__END__

#---------------------------------------------------------------------
+-------

=head1 IDEAS

    # chop off email routing header info (first empty line)
    cat email.msg | unhead '^$'  

    # just students from "Danziger, Jane" to "Funicello, Thomas"
    query-students | sort | unhead '^D' | untail '^G'

    # keep the center section of a YACC grammar
    unhead '^%%$' *.y ; untail '^%%$' *.y

=head1 BUGS

The C<unhead> variety will discard everything and leave an empty file 
+if
no lines match the given pattern.  This is not a bug, but worth a
warning.  In the case of actual files, you still have the tilde backup
+.

The script for both C<unhead> and C<untail> are identical, and one can
+ be
a link to the other.  The script determines which part to keep based o
+n
the script's name.

=head1 LICENSE

Copyright (C) 2001-2003  Ed Halley  <ed@halley.cc>

This program is free software; you can redistribute it and/or modify i
+t
under the same terms as Perl itself.  For details on the Perl Artistic
License, read the F<http://www.perl.com/language/misc/Artistic.html>
page.

=cut

Comment on unhead and untail
Download Code
Re: unhead and untail
by dmitri (Curate) on Apr 15, 2003 at 19:54 UTC
    A couple of thoughts:

    The cutting up of input stream as you describe can be much done much easier with sed, for instance, instead of

    query-students | sort | unhead '^D' | untail '^G'
    do
    query-students | sort | sed '/^D/,/^G/p; d'
    Also, neither head nor tail modify files they operate on. I understand, you were probably solving a specific problem, but this script can be generalized even more.
      unhead is superfluous even with inplace editing given recent versions of GNU sed, as they sport an -i switch for inplace editing just like Perl has. What you still can't do with sed is specify "X lines before end of file" addresses.

      Makeshifts last the longest.

        > What you still can't do with sed is specify "X lines before end of file" addresses.

        Neither can this utility, read the description.

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://249987]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2014-07-11 23:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (236 votes), past polls