PerlMonks  

Re: match sequences of words based on number of characters

by frozenwithjoy (Priest)
on Feb 17, 2013 at 21:37 UTC (node #1019219)


in reply to match sequences of words based on number of characters

Like this?
```
perl -E '
    my $string = "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq";
    my @array = $string =~ /\b(\w{2})\b.+?\b(\w{4})\b.+?\b(\w{3})\b/g;
    say "@array";
'
```

Output:

```
yy xxxx qqq yy xxxx qqq
```
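If you want each 2/4/3 sequence kept together instead of one flat list, one way is to call m//g in scalar context inside a while loop, so each iteration leaves one complete sequence in $1, $2 and $3. A minimal sketch, reusing the sample string and pattern from the one-liner; @sequences is just an illustrative name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Same pattern as the one-liner: a 2-letter word, then a 4-letter word,
# then a 3-letter word, with anything in between.
my $string = "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq";
my $re     = qr/\b(\w{2})\b.+?\b(\w{4})\b.+?\b(\w{3})\b/;

# In scalar context, m//g resumes from pos($string) on each call,
# so every iteration sees exactly one complete sequence.
my @sequences;
while ( $string =~ /$re/g ) {
    push @sequences, [ $1, $2, $3 ];
}

# One line per matched sequence.
say join ' ', @$_ for @sequences;
```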
Edit: here is an approach that builds the regex for you from a list of word lengths.
```
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my $regex = build_regex( 2, 4, 3 );
say "Regex: $regex";

my $string = "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq";
my @match  = $string =~ /$regex/g;
say "Match: @match";

# Build a pattern that captures whole words of the given lengths, in order,
# with at least one character of anything between consecutive words.
sub build_regex {
    my ( $first, @others ) = @_;
    my $regex = qr{\b(\w{$first})\b};
    $regex .= qr{.+?\b(\w{$_})\b} for @others;
    return $regex;
}

__END__
Regex: (?^:\b(\w{2})\b)(?^:.+?\b(\w{4})\b)(?^:.+?\b(\w{3})\b)
Match: yy xxxx qqq yy xxxx qqq
```
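Note that the .= in build_regex stringifies the qr// pieces, so it returns a plain string rather than a qr// object; that still works when interpolated into /$regex/g. If you would rather get back a single compiled object, and also want the flat match list split into one group per sequence, something like the following should work. It is only a sketch: build_regex_qr is a hypothetical name (not part of the code above), and it assumes the lengths are positive integers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Hypothetical variant of build_regex: builds the same pattern as a string,
# then compiles it once into a single qr// object.
sub build_regex_qr {
    my ( $first, @others ) = @_;
    my $pattern = '\b(\w{' . $first . '})\b';
    $pattern .= '.+?\b(\w{' . $_ . '})\b' for @others;
    return qr/$pattern/;
}

my @lengths = ( 2, 4, 3 );
my $regex   = build_regex_qr(@lengths);
my $string  = "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq";

# List-context m//g returns every capture flattened into one list;
# splice it back into groups of scalar(@lengths) words each.
my @flat = $string =~ /$regex/g;
my @groups;
push @groups, [ splice @flat, 0, scalar @lengths ] while @flat;

say join ' ', @$_ for @groups;
```

Because the group size comes from @lengths, the same code handles any number of word lengths without hard-coding $1, $2, $3.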
