Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

I have solved it but I have passed 3 stages in order to make it run fast. The code has been around 60 lines in all of the stages. I have decided to play on the strong sides of perl and I have used regular expressions.

So my first version took me one and a half hour to write and debug the logic. The code was rewriting regexes and it was pathetically slow. It was running on the set for 5 minutes on my home PC (Pentium 4).

Next version took me another one and an half hour to convert it to use pre-compiled regular expressions. I have used some black magic in order to make exhaustive pattern matching. The interesting line is this :

 qr{^($re1)(?{ push @match, [$-[0], $+[0]] })(*FAIL)};

where $re1 is gigantic alternative (123|2322|...). It was running on the set for 32 seconds - not bad but I was expecting more.

I have read that there is "trie optimization" of the regexes in last version of perl. It take me another 2 hours to make optimizations and to debug why the hell I am not getting trie optimized regexes for the gigantic alternative I was constructing from the dictionary. Experimentally I have found that there is a limit between 11000 and 13000 alternatives so I decided to partition the constructed regex in 10 smaller ones. And this one worked gaining much more speed. It was running on the set of numbers for 2.2 seconds. Now I an happy.

In summary I was developing the code for 5 hours. It consists of 65 lines of code that follows. It runs on my PC (Pentium 4 @ 3GHz) for 2.2 seconds and 1.6 sec on my office PC (Intel Dual Core E2160 @ 1.80GHz) and eats around 22M of ram

#!/usr/bin/perl use strict; my %dic; #load dictionary open DIC,"< dictionary.txt" or die "$!"; while (<DIC>){ chomp; my $w = uc($_); $w =~ s/"//g; if ($w =~ y/EJNQRWXDSYFTAMCIVBKULOPGHZ/01112223334455666777888999/ +){ $w =~ s/[^0-9]//; push @{$dic{$w}},$_; } } close DIC; # Partition the dictionary in 10 regular expressions. # I do not know why the trie regex optimisation is not # supported for very large alternations (> 12000 entries). # Changing ${^RE_TRIE_MAXBUFF} does not help my @rex; my @match; for my $i (0..9){ use re 'eval'; my $re1 = join ('|', grep { /^$i/ } keys %dic); $rex[$i] = qr{^($re1)(?{ push @match, [$-[0], $+[0]] })(*FAIL)}; } my $num; sub print_num { my $n = shift; $n =~ s/\ +/ /g; # remove duplicate spaces $n =~ s/^\ //; # remove leading space $n =~ s/\ $//; # remove trailing space print "$num: $n\n"; } sub process_num { my ($n,$rest,$last) = @_; if (length($rest)==0){ print_num($n); return; } my $i = 0; @match=(); for my $r (@rex){ # match all regexes $rest =~ /$r/; # the result is stored in } # @match global variable my @matches = @match; # and is copied locally for my $x (@matches){ my $match = substr($rest,$x->[0],$x->[1]-$x->[0]); my $tail = substr($rest,$x->[1]); for my $w (@{$dic{$match}}){ # we could have multiple words my $m = $n; # for one number $m =~ s/$match/ $w /; process_num($m,$tail,0); } $i++; } # permit only one digit if there is no match if (($i==0)&&($last==0)) { $rest=~s/.//; process_num($n,$rest,1); } } # Load input numbers while (<>){ chomp; my $n = $_; $n =~ s/[^0-9]//g; # we are interested only in didits if (length($n)){ # if we have number here $num = $_; # num is the original number process_num($n,$n,0); } }

I have learned a little bit more about regular expressions, their optimization and extension.

In reply to Re: One for the weekend: challenge by karavelov
in thread One for the weekend: challenge by BrowserUk

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others pondering the Monastery: (7)
    As of 2018-06-25 08:21 GMT
    Find Nodes?
      Voting Booth?
      Should cpanminus be part of the standard Perl release?

      Results (126 votes). Check out past polls.