For how many tasks have you wanted a sampling of every Nth line of a file?
- selecting a "random" subset before running on all five million lines
- getting a flavor of what's in a line-oriented database
- holding out test data
Well, for me, it's nearly every line-based text-processing tool I write -- even when it's not a formal requirement, testing on every 50th line of my test corpus is usually much more informative than using the first 50 lines as test data.
In fact, I find it very frustrating that there's no Unix power tool a la grep or tail that does this.
So, per is an addition to the Unix power-tool library -- it's sort of like head or tail except that it takes every Nth line instead of the first or last N. Save it as ~/bin/per (or /usr/bin/per) and use it every day, like me.
Windows users can run pl2bat on this and put the result somewhere in their path -- my NT box happily uses a variant of it.
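For example (pl2bat ships with Perl on Windows; this invocation is from memory, so check pl2bat's docs):

pl2bat per # wraps the script in a batch file, producing per.bat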
Usage info is in the POD in the script, but here it is in HTML anyway (I love pod2html):
per - return one line per N lines
per -oOFFSET -N files
per -90 -o2 file.txt # every 90th line starting with line 2
per -o500 -3 file.txt # every 3rd line starting with line 500
per -o1 -2 file.txt # every other line, starting with the first
per -2 file.txt # same as above
It can also read from STDIN, for pipelining:
tail -5000 bigfile.txt | per -100   # show every 100th line for
                                    # the last 5000 in the file
per writes every Nth line, starting with line OFFSET, to STDOUT (a sketch of the logic follows the option descriptions below).
- -N
  The integer value N provided (e.g. -50, -2) is used to decide which lines to return -- every Nth.
- -oOFFSET
  The value OFFSET provided says how far down in the input to proceed before beginning. The output will begin at line number OFFSET. Default is 1.
- files
  Note that per works on files specified on the command line, or on STDIN if no files are provided. The special input file - indicates that remaining data should be read from STDIN.
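The core logic is easy to sketch. Here's a minimal, hypothetical reimplementation -- a sketch only, not the script itself (its option handling is more careful, and the default of N=2 when no -N flag is given is my assumption):

#!/usr/bin/perl
# Sketch of per's selection logic -- hypothetical, not the original script.
use strict;
use warnings;

my ($n, $offset) = (2, 1);                 # assumed defaults: every other line, from line 1
for (@ARGV) {                              # crude parsing of -N and -oOFFSET
    if    (/^-o(\d+)$/) { $offset = $1 }
    elsif (/^-(\d+)$/)  { $n      = $1 }
}
@ARGV = grep { !/^-o?\d+$/ } @ARGV;        # leave filenames (and "-") for <>

my $line = 0;
while (<>) {                               # named files, or STDIN if none given
    $line++;
    next if $line < $offset;               # skip until we reach OFFSET...
    print if ($line - $offset) % $n == 0;  # ...then print every Nth line
}

With this sketch, per -o500 -3 file.txt prints lines 500, 503, 506, and so on.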