### Re^3: Analysis of Regular Expressions

by PetaMem (Priest)
 on Mar 17, 2010 at 16:46 UTC ( #829218=note: print w/replies, xml ) Need Help??

in reply to Re^2: Analysis of Regular Expressions
in thread Analysis of Regular Expressions

Uh.. boys... (JavaFan and LanX) gimme a break...

I mean yes, I got my master in CS 13 years ago and there's no need to remind me of Gödel, the halting problem and similar.

And YES, you are theoretically right. And NO, you are practically (especially JavaFan) way off. There are no countable infinite sets on finite computers, and if you have a FINITE SET of input strings, the cardinality of the sets that represent the results of the presented regexs may very well differ and thus be sortable. So your ivory tower arguments are mostly for the dustbin...

I know, that for a Java Fan it might be very hard to adopt the Perl-inherent DWIM scheme when interpreting questions, but why LanX jumps on that bandwaggon is beyond me.

Fotunately I got already some interesting hints in this thread and am working on a SOLUTION. Yes - will be presented for discussion.

Bye
PetaMem
All Perl:   MT, NLP, NLU

Replies are listed 'Best First'.
Re^4: Analysis of Regular Expressions
by LanX (Bishop) on Mar 17, 2010 at 22:18 UTC
> I know, that for a Java Fan it might be very hard to adopt the Perl-inherent DWIM scheme when interpreting questions, but why LanX jumps on that bandwaggon is beyond me.

Come on, keep cool! I just can't resist a good pun! =)

Just think of me as too ignorant as to fully understand your question... ;-)

> and if you have a FINITE SET of input strings, the cardinality of the sets that represent the results of the presented regexs may very well differ and thus be sortable. So your ivory tower arguments are mostly for the dustbin...

true, but you only limited the sets of input strings and not the sets of regexes or allowed regex commands.

IIRC Friedl (or perldocs?) describes in his book a regex with only two nested parens and quantifiers which would recursively backtrack for years!

Without a detailed description we have to answer the question in generality, and fast solutions are not really likely!

Cheers Rolf

Re^4: Analysis of Regular Expressions
by JavaFan (Canon) on Mar 18, 2010 at 00:27 UTC
There are no countable infinite sets on finite computers, and if you have a FINITE SET of input strings, the cardinality of the sets that represent the results of the presented regexs may very well differ and thus be sortable.
True.

Let's do some arithmetic. Let's take a computer with 1 kB of memory - not unreasonable in 2010. For argument sakes, ignore any other memory, including disk space.

Limiting the problem space to all strings that can be stored on disk, you'd have 2561024 finite strings to work with. 2561024 is a finite number. Let's assume that each yoctosecond (10-24s), you can analyse yotta-strings (1024 strings). In other words, each second, you'd be able to determine of 1048 ≅ 2160 strings whether they match the given regexes. Then you still need about 102418 seconds to examine them all. The age of the universe is estimated to be about 13.75 billion years, which is about 4.4 * 1017 seconds.

So, yeah, there are no countable infinite sets on computers. But I don't think it's really practical to do a brute-force calculation of all the possible FINITE strings a computer can hold.

Re^4: Analysis of Regular Expressions (counterexample of slow regexes)
by LanX (Bishop) on Mar 19, 2010 at 13:14 UTC
OK...

lets go explicit, the following (halting!) counterexamples use fairly short input strings and fairly simple regexes, nevertheless resulting in exploding run times...

```perl -e '
for \$n(10..14) {
\$str = "x"x(\$n*2+1);
\$t=time;
\$str =~ /(x+x+){\$n}y/;
print "\\$str:\$str n:\$n time:",time-\$t,"sec\n"
}'
\$str:xxxxxxxxxxxxxxxxxxxxx n:10 time:0sec
\$str:xxxxxxxxxxxxxxxxxxxxxxx n:11 time:3sec
\$str:xxxxxxxxxxxxxxxxxxxxxxxxx n:12 time:9sec
\$str:xxxxxxxxxxxxxxxxxxxxxxxxxxx n:13 time:39sec
\$str:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx n:14 time:156sec

I hope it's now obvious that an empiric brute force approach can only (if ever) work with rigid restrictions!

(credits go to ratazong for finding the base principle in this example after some /msging and credits to his unwillingness to post it afterwards ... ;)

Cheers Rolf

UPDATE: improved code...

UPDATE just for fun :)

```perl -e '
for \$n(1..14) {
\$str = "x"x(\$n*3+1);
\$t=time;
\$str =~ /(x+x+x+){\$n}y/;
print "\\$str:\$str n:\$n time:",time-\$t,"sec\n"
}'
\$str:xxxx n:1 time:0sec
\$str:xxxxxxx n:2 time:0sec
\$str:xxxxxxxxxx n:3 time:0sec
\$str:xxxxxxxxxxxxx n:4 time:0sec
\$str:xxxxxxxxxxxxxxxx n:5 time:0sec
\$str:xxxxxxxxxxxxxxxxxxx n:6 time:0sec
\$str:xxxxxxxxxxxxxxxxxxxxxx n:7 time:1sec
\$str:xxxxxxxxxxxxxxxxxxxxxxxxx n:8 time:9sec
\$str:xxxxxxxxxxxxxxxxxxxxxxxxxxxx n:9 time:73sec
\$str:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx n:10 time:555sec
^C

Create A New User
Node Status?
node history
Node Type: note [id://829218]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (7)
As of 2018-04-23 12:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
My travels bear the most uncanny semblance to ...

Results (84 votes). Check out past polls.

Notices?