PerlMonks  

Re: improving speed in ngrams algorithm

by tybalt89 (Monsignor)
on Jun 12, 2019 at 09:02 UTC


in reply to improving speed in ngrams algorithm

Benchmarking left to someone who cares :)

#!/usr/bin/perl

# https://perlmonks.org/?node_id=11101225

use strict;
use warnings;

my $sentence = "this is the text to play with";
my $ngramWindow_MIN = 2;
my $ngramWindow_MAX = 3;

my ($low, $high) = ($ngramWindow_MIN - 1, $ngramWindow_MAX - 1);
$sentence =~ /(?<!\S)\S+(?: \S+){$low,$high}?(?!\S)(?{ print "START INDEX: @{[$` =~ tr| || ]} : $&\n" })(*FAIL)/;

Output (the same lines, in a slightly different order):

START INDEX: 0 : this is
START INDEX: 0 : this is the
START INDEX: 1 : is the
START INDEX: 1 : is the text
START INDEX: 2 : the text
START INDEX: 2 : the text to
START INDEX: 3 : text to
START INDEX: 3 : text to play
START INDEX: 4 : to play
START INDEX: 4 : to play with
START INDEX: 5 : play with
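
The regex works because (*FAIL) rejects every candidate match, so the engine backtracks through each start position and each allowed window length, and the embedded (?{ ... }) block prints as a side effect along the way. For anyone who does want to benchmark or cross-check it, here is a rough loop-based sketch that should produce the same lines. The sub name ngram_lines is made up for illustration and is not from the post, and it assumes words are separated by single spaces, as in the sample sentence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the post): builds the same lines the
# regex prints, using split and array slices instead of backtracking.
# Assumes words are separated by single spaces.
sub ngram_lines {
    my ($sentence, $min, $max) = @_;
    my @words = split ' ', $sentence;
    my @lines;
    for my $start (0 .. $#words) {
        # Shorter windows first, mirroring the lazy {$low,$high}? quantifier.
        for my $len ($min .. $max) {
            my $end = $start + $len - 1;
            last if $end > $#words;    # window would run past the last word
            push @lines, "START INDEX: $start : @words[$start .. $end]";
        }
    }
    return @lines;
}

print "$_\n" for ngram_lines("this is the text to play with", 2, 3);
```

Returning a list rather than printing inside the loop keeps I/O out of the way, which makes it easier to compare against the regex version with the core Benchmark module.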
