Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

check for square-number with a regex

by Ratazong (Monsignor)
on Oct 23, 2009 at 12:57 UTC ( #802896=perlquestion: print w/replies, xml ) Need Help??
Ratazong has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks!

The node The Oldest Plays the Piano inspired me a bit playing with regular expressions on numbers represented by '1'x$nr. Something works fine (see below: checking for "dividable by 5" or the famous prime-check). Unfortunately I don't manage to check if the number is a square - at least not with pure regexes (see my tries below) - and my google-Fu is not good enough to find a hint online.

Can you please enlighten me?
Thanks, Rata
my $nr = 25; my $n = 5-1; print "dividable by 5 :-)\n" if (1 x $nr) =~ /^(1+)\1{$n}$/; + # works print "prime\n" if (1 x $nr) !~ /^(11+)\1+$/; + # works (1 x $nr) !~ /^(1+)(1*)$ (?{ sq($1)}) (0) /x; + # works - but not what I want ... sub sq { print "SQUARE\n" if ((length($_[0]) * length( +$_[0])) == $nr) ; } my $len = 'length($1)-1'; print "square2 :-)\n" if (1 x $nr) =~ /^(1+)\1{$len}$/; + # this won't work print "square3 :-)\n" if (1 x $nr) =~ /^(1+)\1{length($1)-1}$/ +; # this won't work .. but what will?!?!?!

Replies are listed 'Best First'.
Re: check for square-number with a regex
by moritz (Cardinal) on Oct 23, 2009 at 13:18 UTC

    Just an idea, not a solution:

    Every square number can be written as sum of consecutive odd numbers:

    1 = 1 4 = 1 + 3 9 = 1 + 3 + 5 ...

    Maybe you can write a regex that starts by matching one 1, and then always twice more than the previous capture?

    Update: I was thinking along the lines of $str =~ m/^(1)((?{$^N})11)*$/ But that doesn't work, because $^N always gets the value of the previous capture, never that of the last match of the current capture... maybe one has to make another capture with a look-around and refer to that? don't know if that works somehow...

    Second update

    Now this probably counts as cheating, but using an external variable for tracking the captures make it work:

    use strict; use warnings; use 5.010; for (1..100) { my $str = '1' x $_; my $prev; say $_, ' ', $str =~ m/^(1)(?{$prev=$^N})(?>((??{$prev})11)(?{$pre +v=$^N}))*$/ ? 'yes' : 'no'; }
    Perl 6 - links to (nearly) everything that is Perl 6.
Re: check for square-number with a regex
by Anonymous Monk on Oct 23, 2009 at 14:55 UTC

    If such a regex exists, the regex is a finite automaton. So what it will do is accept strings of length that is a square number.

    Such an automata does not exist.


    It can be proven that no finite automaton can recognize this language. Intuitively, the idea behind the proof is that any such automaton would have to count an arbitrary number of 1's, say m of them and check that m is a perfect square. Not machine with a finite number of states can do this for arbitrarily large m. Although a finite automaton is not up to the job, a Turing machine can be designed to recognize this language.

    Quote was taken from here.

    The following is quoted from here although the article describes a way to build a 4-band Turing machine that accepts square length strings. A set which cannot be accepted by pushdown machines (this is shown in the material on formal languages) is the set of strings whose length is a perfect square. In symbols this is: {a^n | n is a perfect square }.

      When we talk about a regex in the context of Perl, we don't mean Regular Expression in the sense that a computer scientist would use (ie Typ-3 language of the Chomsky hierarchy), but rather "the thing that you but in m/... in Perl".

      These regexes are much more powerful than regular expressions, in fact they can do things that even Typ-2 langauges (context-free languages) can't. I've not yet seen a proof that they are Turing complete, so I don't know if they are.

      Update: fixed Chomsky classification, thanks jakobi

      Perl 6 - links to (nearly) everything that is Perl 6.
        These regexes are much more powerful than regular expressions, in fact they can do things that even Typ-1 langauges (context-free languages) can't. I've not yet seen a proof that they are Turing complete, so I don't know if they are.
        In a sense they're ‘obviously’ Turing-complete, since they can contain embedded Perl code; but I assume that's not what you meant. What's another example of Perl regexes recognising a language that a CFG can't? (It sounds vaguely like a challenge, but I'm just politely curious.)

      Albeit that argument assumes "classical" regexes. No backrefs like \1 nor any of the 'comfort' in egrep, PCRE, Perl or - even worse - Perl's (?{}), all of which violate Chomsky-3 AFAIR. Then again even Chomsky-3 can be surprising powerful compared to other 'machines' - just limit the others' allowed tape and/or word size... .

      (Moritz: small typo: DFA's ch-3, with ch-0 being TM)

      Wasn't there some recent discussion or article on Perl6 extensions to regexes (or just on Parsing Expression Grammars/PEG; LUA version maybe?) that contained some comparison to the Chomsky Hierarchy?

      Note that it's only ‘pure’ regexes (for which Ratazong did indicate a preference) that are equivalent to finite automata. I know that back-references make regexes essentially push-down automata. Are Perl's regexes still ‘only’ that powerful? I guess that they're not, really, because they can execute embedded code; but that's not playing fair.
        Backreferences are neither enough, nor necessary to make up the difference between languages that are recognized by FAs ("regular expressions") and those that are recognized by PDAs ("context-free grammars").

        As I pointed out, a classical context-free language that isn't a regular expression is anbn. You cannot match that with 5.8 regular expressions (assuming no jumping back to the interpreter using (?{}) or (??{})), proving that backreferences aren't enough. But in 5.10, you can write /^(a(?1)b)$/, which doesn't use backreferences. It does use recursion, but recursion is the difference between an FA and a PDA.

        Now, I think that due to backreferences, Perl regular expressions are more powerful than context-free grammars, but I cannot think of a language that is matched by a Perl regular expression, and not by a context-free grammar. And I don't think Perl regular expressions are as powerful as context sensitive grammars (which requires a TM). For instance, I don't think there's a Perl regular expression to match anbncn.

      But Perl regexes have backreferences. Which makes them more powerful than finite automata. Note that your first quote could also be used to prove you cannot accept strings that are a composite number. But /^(..+)\1+$/ proves that you can.

      Note also that Perl 5.10 regexes have named rules, allowing recursion. Which means, IMO, that Perl regexes are at least as powerful as PDA (push down automata). Which means they should be able to accept any context insensitive language.

      The classical example is that with a "regular expression" you cannot match strings for the form anbn, but that you need a PDA. /^(a(?1)?b)$/ matches such strings.

      I forgot to mention another source that gives as exercise to prove that {a^n| with n a square} is not recognizable as a language by a pushdown automaton.

      You might be right about Perl regexes being more powerful. I honestly don't know so I'll call it a day.

Re: check for square-number with a regex
by spx2 (Deacon) on Oct 24, 2009 at 15:31 UTC
    UPDATE: this solution is wrong(I would delete it but I can't :)

    I have thought a bit more on this problem and was able to come up with a partial solution.

    I would like to ask for some help in order to generalize it.

    (It seems that I can't generalize it because I don't yet know well enough regexes in Perl)

    MAIN IDEA: if n is a perfect square then n = p12kp22kp32k...pt2k(as you can see t is the number of prime divisors of n)

    Now, for perfect squares whos prime divisors are only 2 and 3 then n=22k32k


    sub test { print $_[0] =~ $_[1] ? "matched\n" : "not matched\n"; } $p2 = qr/(1{4})*/; $p3 = qr/(1{9})*/; $issquare = qr/^(1{4})*(1{9})*$/; #for square numbers of form 2^2k * 3^2k test(('1'x36), $issquare);#2^2*3^2 <-- perfect square test(('1'x37), $issquare);#2^2*3^2+1 <-- not perfect square test(('1'x324), $issquare);#2^2*3^4 <-- perfect square test(('1'x325), $issquare);#2^2*3^4+1 <-- not perfect square test(('1'x9216), $issquare);#2^10*3^4 <-- perfect square test(('1'x9217), $issquare);#2^10*3^4+1 <-- not perfect square

    The last test takes quite a bit and from now I can tell this is not a very efficient way to test for perfect squares(but this was pretty obvious from the very start, using a regex to check arithmetical properties of a number is not a problem where efficiency is desired, or is it ?)

    RATAZONG's prime regex

    The OP has provided a regex to check if a number is composite /^(11+)\1+$/

    and this is a nice regex that works. However, in order to get a regex to check for primes with this we need to negate the regex(I failed while trying to use zero-width look-ahead look-behind negative in order to negate this,so if anyone has any suggestions or ideas on how to negate this ?). But to continue to a final solution let's suppose such a regex for checking if a string of 1s is of prime length and let's denote it $prime.


    A general solution to the problem could be something like this

    /^( ((?<prime2>$prime)\k<prime2>)* )*$/

    I've used named backreferences because if I used the numbered there would be some confusions between the current captures and the ones withing the $prime regex.

    Again, it's still unclear to me(or I don't remember) if + or * mean repeat the same sequence multiple times or repeat a sequence that matches the sub-expression multiple times(that's because I'm still a noob).

      I failed while trying to use zero-width look-ahead look-behind negative in order to negate this,so if anyone has any suggestions or ideas on how to negate this
      I think a general negation of /PATTERN/ is something like /(?:(PATTERN)(*COMMIT)(*FAIL))|$/.

      this solution is wrong because of two reasons:

      • the powers for each prime divisors are not necessarily the same
      • I am describing the numbers of the form p*22+q*32 instead of 22k32k

      any other arguments in the post are flawed because of this.and mainly the fact that an arbitrary power can't really be expressed by the regex(except for hardcoded values like

      (1{4}){9}/ means 36 but that's far from what is needed here )

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://802896]
Approved by planetscape
Front-paged by Corion
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2017-02-28 01:37 GMT
Find Nodes?
    Voting Booth?
    Before electricity was invented, what was the Electric Eel called?

    Results (394 votes). Check out past polls.