Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

I have solved it but I have passed 3 stages in order to make it run fast. The code has been around 60 lines in all of the stages. I have decided to play on the strong sides of perl and I have used regular expressions.

So my first version took me one and a half hour to write and debug the logic. The code was rewriting regexes and it was pathetically slow. It was running on the set for 5 minutes on my home PC (Pentium 4).

Next version took me another one and an half hour to convert it to use pre-compiled regular expressions. I have used some black magic in order to make exhaustive pattern matching. The interesting line is this :

 qr{^($re1)(?{ push @match, [$-[0], $+[0]] })(*FAIL)};

where $re1 is gigantic alternative (123|2322|...). It was running on the set for 32 seconds - not bad but I was expecting more.

I have read that there is "trie optimization" of the regexes in last version of perl. It take me another 2 hours to make optimizations and to debug why the hell I am not getting trie optimized regexes for the gigantic alternative I was constructing from the dictionary. Experimentally I have found that there is a limit between 11000 and 13000 alternatives so I decided to partition the constructed regex in 10 smaller ones. And this one worked gaining much more speed. It was running on the set of numbers for 2.2 seconds. Now I an happy.

In summary I was developing the code for 5 hours. It consists of 65 lines of code that follows. It runs on my PC (Pentium 4 @ 3GHz) for 2.2 seconds and 1.6 sec on my office PC (Intel Dual Core E2160 @ 1.80GHz) and eats around 22M of ram

#!/usr/bin/perl use strict; my %dic; #load dictionary open DIC,"< dictionary.txt" or die "$!"; while (<DIC>){ chomp; my $w = uc($_); $w =~ s/"//g; if ($w =~ y/EJNQRWXDSYFTAMCIVBKULOPGHZ/01112223334455666777888999/ +){ $w =~ s/[^0-9]//; push @{$dic{$w}},$_; } } close DIC; # Partition the dictionary in 10 regular expressions. # I do not know why the trie regex optimisation is not # supported for very large alternations (> 12000 entries). # Changing ${^RE_TRIE_MAXBUFF} does not help my @rex; my @match; for my $i (0..9){ use re 'eval'; my $re1 = join ('|', grep { /^$i/ } keys %dic); $rex[$i] = qr{^($re1)(?{ push @match, [$-[0], $+[0]] })(*FAIL)}; } my $num; sub print_num { my $n = shift; $n =~ s/\ +/ /g; # remove duplicate spaces $n =~ s/^\ //; # remove leading space $n =~ s/\ $//; # remove trailing space print "$num: $n\n"; } sub process_num { my ($n,$rest,$last) = @_; if (length($rest)==0){ print_num($n); return; } my $i = 0; @match=(); for my $r (@rex){ # match all regexes $rest =~ /$r/; # the result is stored in } # @match global variable my @matches = @match; # and is copied locally for my $x (@matches){ my $match = substr($rest,$x->[0],$x->[1]-$x->[0]); my $tail = substr($rest,$x->[1]); for my $w (@{$dic{$match}}){ # we could have multiple words my $m = $n; # for one number $m =~ s/$match/ $w /; process_num($m,$tail,0); } $i++; } # permit only one digit if there is no match if (($i==0)&&($last==0)) { $rest=~s/.//; process_num($n,$rest,1); } } # Load input numbers while (<>){ chomp; my $n = $_; $n =~ s/[^0-9]//g; # we are interested only in didits if (length($n)){ # if we have number here $num = $_; # num is the original number process_num($n,$n,0); } }

I have learned a little bit more about regular expressions, their optimization and extension.


In reply to Re: One for the weekend: challenge by karavelov
in thread One for the weekend: challenge by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others studying the Monastery: (7)
    As of 2015-07-06 16:35 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









      Results (77 votes), past polls