(Golf) Character windows

by japhy (Canon)
on Aug 24, 2001 at 03:53 UTC ( #107557=perlmeditation )

For my regex book, I have an exercise in the chapter about look-ahead that asks for a regex to split a string into groups of N characters, as though a window were being passed over the string. That is, from "regexes" with N = 3, you get "reg", "ege", "gex", "exe", "xes". This is rather simple once the chapter has been read -- at least, that's the purpose of the chapter!
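A minimal sketch of that simpler exercise, assuming the usual look-ahead idiom is what the chapter has in mind (the sub name `windows` is hypothetical, not from the book). Because the look-ahead matches without consuming anything, each successive match starts one character further along, so the captured groups overlap:

```perl
use strict;
use warnings;

# Hypothetical helper: every full window of $n characters, left to right.
sub windows {
    my ($str, $n) = @_;
    # The look-ahead captures $n characters but consumes none, so /g
    # moves forward one position per match and the captures overlap.
    my @found = $str =~ /(?=(.{$n}))/gs;
    return @found;
}

print join(' ', windows('regexes', 3)), "\n";   # reg ege gex exe xes
```

A string shorter than the window simply yields an empty list here, which is why the harder variant needs extra work at both ends.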

But there is a trickier question (not in my text) which is to have the window start BEFORE the beginning of the string and end AFTER the end of the string. That means, with the same string and N, you would get "r", "re", "reg", "ege", "gex", "exe", "xes", "es", "s". That is more difficult, and sadly, bleadperl gave me grief when I tried to solve this problem with a regex using (??{ ... }). So I extend this challenge to you.

Given: string in $_, chunk size in $s (or any other variable you wish)
Golf: extract all overlapping substrings of at least 1 and at most $s characters from the string
Known: the string is at least 1 character long, but MAY be shorter than $s characters
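For checking golf entries against the rules above, an ungolfed reference can help. This is only a sketch: the sub name `padded_windows` is hypothetical, and clamping the window to the string length is my reading of the "MAY be shorter" rule (it makes a short string come back as its prefixes followed by its suffixes, with the whole string listed once).

```perl
use strict;
use warnings;

# Hypothetical reference implementation, not a golf entry.
sub padded_windows {
    my ($str, $size) = @_;
    my $len = length $str;
    # Assumption: a window wider than the string acts like one
    # exactly as wide as the string.
    $size = $len if $size > $len;
    my @out;
    # $start is the window's left edge; negative values mean the
    # window still hangs off the front of the string.
    for my $start (1 - $size .. $len - 1) {
        my $from = $start < 0 ? 0 : $start;
        my $to   = $start + $size;       # one past the right edge
        $to = $len if $to > $len;        # window hangs off the end
        push @out, substr($str, $from, $to - $from);
    }
    return @out;
}

print join(' ', padded_windows('regexes', 3)), "\n";
# r re reg ege gex exe xes es s
```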

Here is my attempt:
# 64
@s=grep''ne$_,/^@{[map"(?=(.{$_}))?",1..$s]}.|(?=(.{1,$s}))/gsx;

Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Replies are listed 'Best First'.
Re (tilly) 1: (Golf) Character windows
by tilly (Archbishop) on Aug 24, 2001 at 04:58 UTC
    #23456789_123456789_123456789_123456789_123456789_1234567
    $"='';@s=/./gs;@s=map"@s[($_<$s?0:$_-$s)..$_]",0..$s+$#s;
    #23456789_123456789_123456789_123456789_123456789_123456789
    $"='';@s=/./gs;@s=map"@s[($_<$s?0:$_-$s)..$_-1]",1..$s+$#s;
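Both versions lean on `$"`, the separator Perl puts between elements when an array or slice is interpolated into a double-quoted string. A small illustration of that one detail, with hypothetical variable names and `local` added so the change does not leak out of the block:

```perl
use strict;
use warnings;

my @chars = 'regexes' =~ /./gs;      # one element per character
my ($with_default, $with_empty);

$with_default = "@chars[0..2]";      # default separator is one space
{
    local $" = '';                   # empty separator inside this block only
    $with_empty = "@chars[0..2]";
}
print "$with_default\n";             # r e g
print "$with_empty\n";               # reg
```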
    Trying to shorten this, I assumed that the string was in a variable $c, and did this:
    This fails. And it fails for reasons that illustrate very clearly why I think that prototypes are officially a Bad Thing. And so my substr based approach comes out at 58 with:
    #23456789_123456789_123456789_123456789_123456789_12345678
    @s=map{substr$c,$_<0?0:$_,$_<0?$s+$_:$s}1-$s..-1+length$c;
    UPDATE 2
    Oops, my initial solution produced an extra character. I am up to 58.
      You still have empty strings returned. Here's my best:
      #23456789_123456789_123456789_123456789_123456789_123456789
      $"='';@s=/./gs;@s=map"@s[($_<$s?0:$_-$s)..$_-1]",1..$s+$#s;

      Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

(MeowChow) Re: (Golf) Character windows
by MeowChow (Vicar) on Aug 24, 2001 at 22:01 UTC
    Messy in the extreme. Without really looking, I came up with several nearly identical solutions:
    57 chars:
    $t=$_;@s=(map({substr$t,0,$_}1..$s-1),/(?=(.{1,$s}))/gs);
    57 chars, string in $t:
    @s=map{substr$t,$_>0&&$_,$_<0?$s+$_:$s}1-$s..-1+length$t;
    55 chars, string in $t:
    @s=(map({substr$t,0,$_}1..$s-1),$t=~/(?=(.{1,$s}))/gs);
    55 chars:
    @s=/./gs;@s=map{join'',@s[$_>0&&$_..$_+$s]}- --$s..$#s;
    update: do we have to initialize @s, and can we assume use re 'eval' is in effect? :)
    52 chars:
    for$i(1..$s){push@s,/(?(?{$i<$s})^)(?=(.{1,$i}))/sg}
    update2: A failed attempt at code reuse...
    55 chars, string in $t:
    @s=map{eval'$t=~/(?=(.{1,$_}))/s'.($_<$s?'':"g")}1..$s;
                   s aamecha.s a..a\u$&owag.print
Re: (Golf) Character windows
by chipmunk (Parson) on Aug 25, 2001 at 01:26 UTC
    51 characters:
    $S=$_;map($S=~/^.{$_}/sg,1..$s-1),/(?=(.{1,$s}))/sg
    P.S. Oh, you didn't say how the results should be returned... I guess I'm supposed to assign to an array, so this solution is actually 54 characters.

    I'm used to the golf challenges where you define a sub that returns the desired results.

      This is a lot like my first solution, but it doesn't properly deal with windows that are larger than the string length... and it's actually 57 chars - you need to add two more characters to parenthesize the comma'd rvalue, and add a semicolon (since it's not in a sub).
                     s aamecha.s a..a\u$&owag.print
      56 chars satisfy all the reqs (I think):

Re: (Golf) Character windows
by dfog (Scribe) on Aug 28, 2001 at 03:29 UTC
    Here is my solution of 44 chars.
    #23456789_123456789_123456789_123456789_1234
    map{$y=$_;map{$s[$_].=$y}$c..$s+$c++}/(.)/g;
    It still has the problem of repeating the entire phrase if the window ($s) is larger than the length of the string. In order to correct that, I came up with a 65 character solution of
    #23456789_123456789_123456789_123456789_123456789_123456789_12345
    @t=/(.)/g;$s=@t<$s?@t:$s;map{$y=$_;map{$f[$_].=$y}$c..$s+$c++}@t;
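The idea behind this approach, as I read it, is to turn the problem around: rather than cutting windows out of the string, append each character to every window that contains it. An ungolfed sketch of that idea follows; the names `bucket_windows` and `@bucket` are hypothetical, and the index arithmetic is my own rather than a line-by-line expansion of the golfed code.

```perl
use strict;
use warnings;

# Hypothetical ungolfed take on the "append each character to its
# windows" idea.
sub bucket_windows {
    my ($str, $size) = @_;
    my @chars = split //, $str;
    # Same correction as the 65-character version: never let the
    # window be wider than the string.
    $size = @chars if @chars < $size;
    my @bucket;
    for my $i (0 .. $#chars) {
        # The character at index $i belongs to the $size windows whose
        # right edge falls at positions $i .. $i + $size - 1.
        $bucket[$_] .= $chars[$i] for $i .. $i + $size - 1;
    }
    return @bucket;
}

print join(' ', bucket_windows('regexes', 3)), "\n";
# r re reg ege gex exe xes es s
```

Since `.=` on an undefined element does not warn, the buckets need no initialisation even under `use warnings`.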
