Re: check for squarenumber with a regex
by moritz (Cardinal) on Oct 23, 2009 at 13:18 UTC

Just an idea, not a solution:
Every square number can be written as sum of consecutive odd numbers:
1 = 1
4 = 1 + 3
9 = 1 + 3 + 5
...
Maybe you can write a regex that starts by matching one 1, and then always twice more than the previous capture?
Update: I was thinking along the lines of
$str =~ m/^(1)((?{$^N})11)*$/
But that doesn't work, because $^N always gets the value of the previous capture, never that of the last match of the current capture... maybe one has to make another capture with a lookaround and refer to that? don't know if that works somehow...
Second update
Now this probably counts as cheating, but using an external variable for tracking the captures make it work:
use strict;
use warnings;
use 5.010;
for (1..100) {
my $str = '1' x $_;
my $prev;
say $_, ' ', $str =~ m/^(1)(?{$prev=$^N})(?>((??{$prev})11)(?{$pre
+v=$^N}))*$/ ? 'yes' : 'no';
}
Perl 6  links to (nearly) everything that is Perl 6.
 [reply] [d/l] [select] 
Re: check for squarenumber with a regex
by Anonymous Monk on Oct 23, 2009 at 14:55 UTC

If such a regex exists, the regex is a finite automaton.
So what it will do is accept strings of length that is a square number.
Such an automata does not exist.
Quote:
It can be proven that no finite automaton can recognize this language.
Intuitively, the idea behind the proof is that any such automaton would have to count an arbitrary number of 1's, say m of them and check that m is a perfect square. Not machine with a finite number of
states can do this for arbitrarily large m. Although a finite automaton is not up to the job, a Turing machine can be designed to recognize this language.
Quote was taken from here.
The following is quoted from here although the article describes a way to build a 4band Turing machine that accepts square length strings.
A set which cannot be accepted by pushdown machines (this is shown in the material on formal languages) is the set of strings whose length is a perfect square. In symbols this
is: {a^n  n is a perfect square }.
 [reply] 

When we talk about a regex in the context of Perl, we don't mean Regular Expression in the sense that a computer scientist would use (ie Typ3 language of the Chomsky hierarchy), but rather "the thing that you but in m/... in Perl".
These regexes are much more powerful than regular expressions, in fact they can do things that even Typ2 langauges (contextfree languages) can't. I've not yet seen a proof that they are Turing complete, so I don't know if they are.
Update: fixed Chomsky classification, thanks jakobi
Perl 6  links to (nearly) everything that is Perl 6.
 [reply] [d/l] 

These regexes are much more powerful than regular expressions, in fact they can do things that even Typ1 langauges (contextfree languages) can't. I've not yet seen a proof that they are Turing complete, so I don't know if they are.
In a sense they're ‘obviously’ Turingcomplete, since they can contain embedded Perl code; but I assume that's not what you meant. What's another example of Perl regexes recognising a language that a CFG can't? (It sounds vaguely like a challenge, but I'm just politely curious.)
 [reply] 



Albeit that argument assumes "classical" regexes. No backrefs like \1 nor any of the 'comfort' in egrep, PCRE, Perl or  even worse  Perl's (?{}), all of which violate Chomsky3 AFAIR. Then again even Chomsky3 can be surprising powerful compared to other 'machines'  just limit the others' allowed tape and/or word size... .
(Moritz: small typo: DFA's ch3, with ch0 being TM)
Wasn't there some recent discussion or article on Perl6 extensions to regexes (or just on Parsing Expression Grammars/PEG; LUA version maybe?) that contained some comparison to the Chomsky Hierarchy?
 [reply] 

Note that it's only ‘pure’ regexes (for which Ratazong did indicate a preference) that are equivalent to finite automata. I know that backreferences make regexes essentially pushdown automata. Are Perl's regexes still ‘only’ that powerful? I guess that they're not, really, because they can execute embedded code; but that's not playing fair.
 [reply] 

Backreferences are neither enough, nor necessary to make up the difference between languages that are recognized by FAs ("regular expressions") and those that are recognized by PDAs ("contextfree grammars").
As I pointed out, a classical contextfree language that isn't a regular expression is a^{n}b^{n}. You cannot match that with 5.8 regular expressions (assuming no jumping back to the interpreter using (?{}) or (??{})), proving that backreferences aren't enough. But in 5.10, you can write /^(a(?1)b)$/, which doesn't use backreferences. It does use recursion, but recursion is the difference between an FA and a PDA.
Now, I think that due to backreferences, Perl regular expressions are more powerful than contextfree grammars, but I cannot think of a language that is matched by a Perl regular expression, and not by a contextfree grammar. And I don't think Perl regular expressions are as powerful as context sensitive grammars (which requires a TM). For instance, I don't think there's a Perl regular expression to match a^{n}b^{n}c^{n}.
 [reply] [d/l] 



But Perl regexes have backreferences. Which makes them more powerful than finite automata. Note that your first quote could also be used to prove you cannot accept strings that are a composite number. But /^(..+)\1+$/ proves that you can.
Note also that Perl 5.10 regexes have named rules, allowing recursion. Which means, IMO, that Perl regexes are at least as powerful as PDA (push down automata). Which means they should be able to accept any context insensitive language.
The classical example is that with a "regular expression" you cannot match strings for the form a^{n}b^{n}, but that you need a PDA. /^(a(?1)?b)$/ matches such strings.
 [reply] [d/l] [select] 

 [reply] 
Re: check for squarenumber with a regex
by spx2 (Deacon) on Oct 24, 2009 at 15:31 UTC

UPDATE: this solution is wrong(I would delete it but I can't :)
I have thought a bit more on this problem and was able to come up with a partial solution.
I would like to ask for some help in order to generalize it.
(It seems that I can't generalize it because I don't yet know well enough regexes in Perl)
MAIN IDEA: if n is a perfect square then n = p1^{2k}p2^{2k}p3^{2k}...pt^{2k}(as you can see t is the number of prime divisors of n)
Now, for perfect squares whos prime divisors are only 2 and 3 then n=2^{2k}3^{2k}
CODE:
sub test {
print
$_[0] =~ $_[1]
? "matched\n"
: "not matched\n";
}
$p2 = qr/(1{4})*/;
$p3 = qr/(1{9})*/;
$issquare = qr/^(1{4})*(1{9})*$/;
#for square numbers of form 2^2k * 3^2k
test(('1'x36), $issquare);#2^2*3^2 < perfect square
test(('1'x37), $issquare);#2^2*3^2+1 < not perfect square
test(('1'x324), $issquare);#2^2*3^4 < perfect square
test(('1'x325), $issquare);#2^2*3^4+1 < not perfect square
test(('1'x9216), $issquare);#2^10*3^4 < perfect square
test(('1'x9217), $issquare);#2^10*3^4+1 < not perfect square
The last test takes quite a bit and from now I can tell this is not a very efficient way to test for perfect squares(but this was pretty obvious from the very start, using a regex to check arithmetical properties of a number is not a problem where efficiency is desired, or is it ?)
RATAZONG's prime regex
The OP has provided a regex to check if a number is composite /^(11+)\1+$/ and this is a nice regex that works. However, in order to get a regex to check for primes with this we need to negate the regex(I failed while trying to use zerowidth lookahead lookbehind negative in order to negate this,so if anyone has any suggestions or ideas on how to negate this ?).
But to continue to a final solution let's suppose such a regex for checking if a string of 1s is of prime length and let's denote it $prime.
(PROBABLY) A GENERAL SOLUTION
A general solution to the problem could be something like this
/^( ((?<prime2>$prime)\k<prime2>)* )*$/
I've used named backreferences because if I used the numbered there would be some confusions between the current captures and the ones withing the $prime regex.
Again, it's still unclear to me(or I don't remember) if + or * mean repeat the same sequence multiple times or repeat a sequence that matches the subexpression multiple times(that's because I'm still a noob).
 [reply] [d/l] [select] 

I failed while trying to use zerowidth lookahead lookbehind negative in order to negate this,so if anyone has any suggestions or ideas on how to negate this
I think a general negation of /PATTERN/ is something like /(?:(PATTERN)(*COMMIT)(*FAIL))$/.
 [reply] [d/l] [select] 

this solution is wrong because of two reasons:
 the powers for each prime divisors are not necessarily the same
 I am describing the numbers of the form p*2^{2}+q*3^{2} instead of 2^{2k}3^{2k}
any other arguments in the post are flawed because of this.and mainly the fact that an arbitrary power can't really be expressed by the regex(except for hardcoded values like (1{4}){9}/ means 36 but that's far from what is needed here )
 [reply] [d/l] 