Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

get string/array slices using cut-like specifications

by jdporter (Paladin)
on Mar 10, 2011 at 03:50 UTC ( [id://892335]=CUFP: print w/replies, xml ) Need Help??

The other day in the cb kailas asked how to reproduce the following cut command in perl:

cut -c2-11,12-61,62-63,64-76,90-92 $str
Well, reproducing cut in perl is actually not quite as trivial as it seems on first look, because cut can take some pretty hairy specifications, such as descending indices, e.g. 12-8,5,4.

Here's an attempt to do at least part of what cut can do, in a native perly context of extracting either sub-strings or sub-arrays. It can handle any amount of overlapping and descending ranges. However, it does not do argument validation. If you attempt to get string/array elements beyond the range of the input, ugly things may happen.

The string-oriented solution uses unpack, and makes the optimization of calling unpack only once. The array-oriented solution has to return arrays, and since there's no way (afaik, in perl 5) to get multiple slices of an array, separately, in a single slicing operation, it can't make a similar optimization: it has to get as many distinct slices as there are "ranges" in the spec. Consequently, that solution is more elegant-looking. We could take the same approach for strings, using substr, and it would look about as elegant, but clearly not as optimized.

Note that indexing starts at 1 in both cases, in accordance with cut.

{ package Cut; sub from_list { my( $spec ) = @_; map [ $_->[1] < $_->[0] ? reverse @_[ $_->[1] .. $_->[0] ] : @_[ $_->[0] .. $_->[1] ] ], map [ /(.*)\s*-\s*(.*)/ ? ( $1, $2 ) : ( $_, $_ ) ], split /\s*,\s*/, $spec; } sub from_string { my( $spec, $input ) = @_; my @spec = map [ /(.*)\s*-\s*(.*)/ ? ( $1, $2 ) : ( $_, $_ ) ], split /\s*,\s*/, $spec; my $ofs=0; my %reverse; my @pat; for ( 0 .. $#spec ) { my( $lo, $hi ) = @{ $spec[$_] }; if ( $hi < $lo ) { $reverse{$_} = 1; ( $lo, $hi ) = ( $hi, $lo ); } my $move = $lo - $ofs - 1; my $len = $hi - $lo + 1; $ofs = $hi; $pat[$_] = ( $move > 0 ? 'x'.$move : $move < 0 ? 'X'.(-$move) : '' ) . 'a'.$len; } my @result = unpack "@pat", $input; $result[$_] = reverse $result[$_] for keys %reverse; @result } } # some test cases: my @a = Cut::from_string( '1,3-4,6-10,12-8,1,1,1', join '', 'a'..'z' ) +; print "'$_'\n" for @a; my @b = Cut::from_list( '1,3-4,6-10,12-8,1,1,1', 'a'..'z' ); print "> @$_\n" for @b;
I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.

Replies are listed 'Best First'.
Re: get string/array slices using cut-like specifications
by repellent (Priest) on Mar 10, 2011 at 06:15 UTC
      ... because cut can take some pretty hairy specifications, such as descending indices, e.g. 12-8,5,4.

    Which version of cut?
    $ cat b.txt 123456789 $ cut -c 6-4,2,8,8,8,8 b.txt 28 $ gcut -c 6-4,2,8,8,8,8 b.txt gcut: invalid decreasing range Try `gcut --help' for more information. $ gcut --version cut (GNU coreutils) 7.5 Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gp +l.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Written by David M. Ihnat, David MacKenzie, and Jim Meyering.

    My cut came with FreeBSD 6.4.

      Hm. Right. I was going by the man page linked at cut. It says, "selected input is written in the same order that it is read." I took that to mean that arbitrary order is supported and respected. What else could it mean? But I don't see any explicit mention of non-increasing order. And it does explicitly state that each "selected input... is written exactly once." I find that either vague or redundant. What is a "selected input"? If "2" occurs in the spec twice, doesn't that make the second element a selected input twice, both of which are to be written exactly once? In any case, my perl version is at least as general, if not more so, than the cut version. :-)

      Obviously there are other things cut can do which I have not (yet) implemented here.

      I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.
Re: get string/array slices using cut-like specifications
by Anonymous Monk on Mar 10, 2011 at 03:57 UTC
Re: get string/array slices using cut-like specifications
by Argel (Prior) on Mar 10, 2011 at 22:42 UTC
    <facetious>
    What am I missing here? Wouldn't The One True Perl Way to implement a UNIX command be to take the C source code, integrate that into Perl, and then add a built-in 'cut' function to perl that "just" calls the C code?!? ;-)
    </facetious>

    (nice cut implementation, btw)

    Elda Taluta; Sarks Sark; Ark Arks

Re: get string/array slices using cut-like specifications
by educated_foo (Vicar) on Apr 04, 2011 at 17:22 UTC
    Sounds like golf! First, a not-that-obfuscated version that emulates cut -cSPEC:
    #!perl -lp BEGIN { $s = shift } sub cut { my ($s, $r) = pop; for (split /,/, shift) { $r .= /(\d+)-(\d+)/ ? substr $s,$1-1,$2-$1+1 : substr $s, $_-1 +,1; } $r; } chomp; $_=cut $s, $_;
    Alright, now let's get started with a 104 96:
    #!perl -lp BEGIN{@s=split/,/,shift} chomp; $x=$_; /(\d+)-?/, $r.=substr$x,$1-1,($'||$1)-$1+1 for@s,$r=''; $_=$r
    ...or 73:
    #!perl -ln BEGIN{($s=shift)=~s/-/../g} chomp; $x=$_; print+map{substr$x,$_-1,1}eval$s
Re: get string/array slices using cut-like specifications
by cavac (Parson) on Apr 04, 2011 at 16:34 UTC
    Thanks. You just gave me an idea how to get the "partial content" handling in my webserver off the ground!
    Don't use '#ff0000':
    use Acme::AutoColor; my $redcolor = RED();
    All colors subject to change without notice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://892335]
Approved by ww
Front-paged by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2024-04-18 13:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found