
Re: Re: Re:{2} Getting impossible things right (behaviour of keys)

by demerphq (Chancellor)
on Oct 24, 2001 at 14:45 UTC (#121071)

in reply to Re: Re:{2} Getting impossible things right (behaviour of keys)
in thread Getting impossible things right (behaviour of keys)

If this is going to be at all robust (umm, and work as desired, sorry Blakem), I would change the sort to the following:
my $regex=join '|', map {substr $_,2} sort {$a cmp $b} map {pack "SA*",length($_),quotemeta($_)} keys %sufdata;
Your code doesn't actually sort the words by length. (Yes, I _am_ deliberately storing the length before I quotemeta it.)


Thanks to Amoe I reexamined this and realized I had missed an opportunity for laziness, that geeky virtue:

my $regex=join '|', map {substr $_,2} sort map {pack "SA*",length($_),quotemeta($_)} keys %sufdata;
Although IIRC perl will optimize the first into the second anyway, it does save about 10 characters.
Oh, also, for the curious: this is a more modern form of the Schwartzian Transform, which is a very cool trick. Unfortunately I can't remember the name of this version, nor the link to the excellent document I read about it. Hopefully someone who does will post a reply.

Tilly kindly supplied the link (see replies to this post). However the name I had in mind is the GRT or Guttman Rosler Transform.
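
For anyone who wants to run the packed-key sort, here is a minimal sketch rather than a drop-in replacement. The contents of %sufdata are hypothetical sample data, since the real hash is not shown in this thread. The sketch uses the "n" pack format (16-bit big-endian) where the one-liner uses native-endian "S": with "S" on a little-endian machine the low byte comes first, so the string order stops matching the numeric order once a key is 256 characters or longer. It also reverses the result, on the assumption that longest-first is what an unanchored alternation would want.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the thread's %sufdata hash.
my %sufdata = map { $_ => 1 } qw(a ba cba dcba x.y);

# Guttman-Rosler Transform: pack the sort key (the length) in front of the
# payload, sort with the default string comparison, then strip the key.
# "n" is a 16-bit big-endian integer, so byte order matches numeric order.
my @by_length = map  { substr $_, 2 }
                sort
                map  { pack "nA*", length($_), quotemeta($_) }
                keys %sufdata;

# Longest alternatives first.
my $regex = join '|', reverse @by_length;
print "$regex\n";    # prints: dcba|x\.y|cba|ba|a
```

Keys of equal length fall back to plain string order on the quotemeta'd text, which is why x\.y lands next to cba.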

DeMerphq / Yves
Have you registered your Name Space?

Replies are listed 'Best First'.
Re (tilly) 6: Getting impossible things right (behaviour of keys)
by tilly (Archbishop) on Oct 24, 2001 at 15:34 UTC
    I think the phrase you want is, "packed default".

    It is discussed in this paper on efficient sorting in Perl.

Re: Re: Re: Re:{2} Getting impossible things right (behaviour of keys)
by blakem (Monsignor) on Oct 24, 2001 at 21:35 UTC
    Ah, but you don't *need* to sort by length... the regex is anchored at the end, so the pattern that matches first from left to right will *already be* the longest match. For instance, look at the following code:
    #!/usr/bin/perl -wT
    use strict;
    my $text = 'fedcba';
    $text =~ (/(a|ba|cba|dcba)$/);
    print "Match: $1\n";
    =OUTPUT
    Match: dcba
    It matches the longest one, even though it's at the end of the alternation... that's because the regex engine works through the string from left to right, and the first starting position that yields a match wins. With the pattern anchored at the end, the earliest starting position is also the longest match. That was the whole point of my post, sorry I wasn't more explicit.
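
    A small sketch of the same point, with the alternatives tried in both orders. The last two lines are an added contrast that is not from the original script: when the alternatives share a starting position, as prefixes anchored at the start do, the first alternative listed wins and order matters again.

```perl
use strict;
use warnings;

my $text = 'fedcba';
my @anchored;

# The same suffixes in two different orders, both anchored at the end.
for my $alternatives ('a|ba|cba|dcba', 'dcba|cba|ba|a') {
    push @anchored, $1 if $text =~ /($alternatives)$/;
}
print "Anchored at end: @anchored\n";    # prints: Anchored at end: dcba dcba

# Contrast: prefixes all start at the same position, so the first
# alternative that matches there is taken, not the longest one.
my ($prefix) = 'abcd' =~ /^(a|ab|abc)/;
print "Anchored at start: $prefix\n";    # prints: Anchored at start: a
```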


      Yes, I see it now: leftmost, and therefore longest. I overlooked the implication of the $. I should have caught the hint.

      Good one. :-)

      Yves / DeMerphq
      Have you registered your Name Space?

        Well, it was my last post of the night, and I skimped on the explanation so I could get some sleep ;-P

        This thread illustrates why I try to post complete, self-contained scripts (e.g. sample input via __DATA__, printed output samples, etc.). I think much of the confusion could have been avoided if the original script had included a set of expected inputs and outputs.

