Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses

Re: Efficient code.

by holli (Monsignor)
on Jan 30, 2006 at 17:34 UTC ( #526498=note: print w/replies, xml ) Need Help??

in reply to Efficient code.

A group of 400 alternations is something Perl's regex engine can surely handle. Remember to quotemeta your lines, when it's possible they contain regex relevant chars (braces for example). Then you can have a look at Regex::Assemble which will do the regex construction and optimize the result.

Personally I'd go for a hashed based solution and simply use the lines from "a" as keys for a hash to look up the contents of "b". Alternatively you can sort the lines and use a binary search algorithm.

holli, /regexed monk/

Replies are listed 'Best First'.
Re^2: Efficient code.
by Aristotle (Chancellor) on Jan 30, 2006 at 17:55 UTC

    400 alternations is manageable, but don’t forget the match is executed 50,000 times.

    Regexp::Assemble helps by reducing the necessary backtracking, but don’t forget that Regexp::Assemble itself doesn’t work by magic and will need to do a fair amount of string munging. Most likely, that effort is easily amortised during the actual tests, but you don’t know that yet. Then the question that remains is how similar the lines are; if they all differ greatly, then Regexp::Assemble can’t do much.

    Without benchmarks I wouldn’t go on the record with any prediction as to what’s going to be faster.

    And the problem that the code does not match intention closely remains. The hash-based approach will be trivial to get right, and it will be much easier to understand in six months.

    Makeshifts last the longest.

      Juts FYI: In Perl 5.10 alternations not containing regex metachars will not backtrack at all as they are represented by a trie node that can traverse all of the alternations in a single pass. Id say which version of perl contains this but embarrasingly I appear to have forgotten.



        Old geezer moment: sometimes these optimisations make me feel like I did when I was writing assembly code (inner loops of texture mappers slinging pixels… unrolling carefully, rejoicing when a clever trick got your loop down from 11 to 9 cycles per pixel – those were the days) and went from Pentium to PentiumPro – suddenly it was no longer possible to measure how many clock cycles any one instruction took: an innocuous change somewhere up or down the line would sometimes dramatically change the time response of an unrelated instruction, occasionally a loop iteration would randomly take more or less cycles than average, and sometimes instructions would appear to take a fractional amount of cycles. It was like wading through quicksand.

        Of course, there was no way to inspect the result of the PPro’s µ-Op translator. Thank goodness for use re 'debug'; :-)

        Makeshifts last the longest.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://526498]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (3)
As of 2018-10-21 07:51 GMT
Find Nodes?
    Voting Booth?
    When I need money for a bigger acquisition, I usually ...

    Results (119 votes). Check out past polls.