Generate random strings from regular expression

by bliako (Abbot)
on Jul 02, 2024 at 10:41 UTC ( [id://11160309]=perlmeditation )

I needed to generate random strings from a regular expression and found (*) a C++ library, regxstring by daidodo, which does just that and works fine for my use case. It claims to support most Perl 5 regexp syntax, and it is quite fast too. So I have ported it into a Perl module: String::Random::Regexp::regxstring.

perl -MString::Random::Regexp::regxstring -e 'print @{generate_random_strings(q{^\d{3}[-,.][A-V][a-z]{3}\d{2}})};'

use String::Random::Regexp::regxstring;
my $strings = generate_random_strings('^[a-c]{2}[-,.]\d{3}-[A-Z]{2}$', 10);
print "@$strings\n";
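A quick sanity check for any generator of this kind is to confirm that every returned string really matches the pattern it came from. The sketch below assumes only what the snippet above shows, namely that generate_random_strings returns an array reference of strings. The helper name non_matching is made up for illustration, and hard-coded sample strings stand in for the module's return value so the code runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the strings that do NOT match the pattern.
sub non_matching {
    my ( $pattern, @strings ) = @_;
    my $re = qr/$pattern/;
    return grep { $_ !~ $re } @strings;
}

my $pattern = '^[a-c]{2}[-,.]\d{3}-[A-Z]{2}$';

# Stand-in for: my $strings = generate_random_strings($pattern, 10);
my $strings = [ 'cc-249-NP', 'ab,430-GE', 'ba.476-RJ' ];

my @bad = non_matching( $pattern, @$strings );
if (@bad) {
    print "mismatches: @bad\n";
}
else {
    print "all ", scalar(@$strings), " strings match\n";
}
```

Swapping the stand-in array for the real call turns this into a cheap regression test for the generator.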

The XS and C++ code bridging Perl to the C++ library is very simple and can serve as an example for doing the same with other libraries. Go forth and multiply.

(*) Via this old thread I found Regexp::Genex and there is also String::Random. Neither worked for my use case.

bw, bliako

Replies are listed 'Best First'.
Re: Generate random strings from regular expression
by tybalt89 (Monsignor) on Jul 02, 2024 at 21:00 UTC

    This was fun.
    At least it passes all your test cases :)

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11160313
    use warnings;
    use List::AllUtils qw( sample );

    $SIG{__WARN__} = sub { die @_ };

    for my $regex ( split /\n/, <<'END' )
    \d{3}[-,.][A-V][a-z]{3}\d{2}
    [a-c]{2}[-,.]\d{3}-[A-Z]{2}
    abc*de?
    a?b?c?d?e?
    (ab?c){3}
    END
      {
      print "\nThe Regex: $regex\n\n";
      my $strings = generate_random_strings( $regex, 10 );
      print "@$strings\n";
      }

    sub generate_random_strings
      {
      my ($regex, $count) = @_;
      eval
        {
        my $tree = parse( $regex );
        # print $tree->show, "\n"; # NOTE uncomment to see parse tree
        return [ map $tree->run, 1 .. $count ];
        } or print " $@\n";
      }

    sub CHAR::run { $_[0][0] }
    sub ONEOF::run { sample 1, @{$_[0]} }
    sub SEQ::run { join '', map $_->run, $_[0]->@* }
    sub REPEAT::run
      {
      join '', map $_[0][2]->run, 1 .. sample 1, $_[0][0] .. $_[0][1];
      }

    sub CHAR::show { "char @{$_[0]}\n" }
    sub ONEOF::show { "oneof @{$_[0]}\n" }
    sub SEQ::show { $_[0][0]->show . $_[0][1]->show }
    sub REPEAT::show
      {
      my ($min, $max, $body) = $_[0]->@*;
      "repeat from $min to $max times\n" . $body->show =~ s/^/ /gmr
      }

    sub node { bless [ @_[1..$#_] ], $_[0] }
    sub error { die "ERROR: @_ \n" }
    sub want { /\G$_[1]/gc ? shift : error pop }

    sub parse
      {
      local $_ = shift;
      my $tree = expr();
      pos($_) == length or error "incomplete parse";
      return $tree;
      }

    sub withranges
      {
      split //, shift =~ s/(.)-(.)/ join '', map chr, ord($1) .. ord($2) /gesr;
      }

    sub expr
      {
      my $tree =
        /\G\\d/gc ? node ONEOF => 0 .. 9 :
        /\G\\w/gc ? node ONEOF => withranges '0-9A-Za-z_' :
        /\G\[(.+?)\]/gc ? node ONEOF => withranges $1 :
        /\G([- !"#\$%&',.\/:;<=>\@^_`~0-9A-Za-z])/gc ? node CHAR => "$1" :
        /\G\((?:\?:)?/gc ? want expr(), qr/\)/, 'missing right paren' :
        error 'operand expected';
      $tree =
        /\G\{(\d+)\}/gc ? node REPEAT => "$1", "$1", $tree :
        /\G\{(\d+),(\d)\}/gc ? node REPEAT => "$1", "$2", $tree :
        /\G\?/gc ? node REPEAT => 0, 1, $tree :
        /\G\*/gc ? node REPEAT => 0, 5, $tree :
        /\G\+/gc ? node REPEAT => 1, 5, $tree :
        /\G(?=[^)])/gc ? node SEQ => $tree, expr() :
        return $tree while 1;
      }

    Outputs:

    The Regex: \d{3}[-,.][A-V][a-z]{3}\d{2}

    175-Mucn14 959-Qznv29 167.Rrfx39 554-Ibhv74 095.Pcwu22 659-Knno96 438.Qiou66 730.Cirx72 201.Rcmm97 823,Cade99

    The Regex: [a-c]{2}[-,.]\d{3}-[A-Z]{2}

    cc-249-NP ab-910-DK ab,430-GE bb-928-YG cb,283-BN ba.476-RJ cb-026-PC cb-799-FW ba-301-TE bc-159-EP

    The Regex: abc*de?

    abcccde abccd abcccde abccccd abcccccd abccde abccccd abccd abd abcccccd

    The Regex: a?b?c?d?e?

    d abde ad abc a bcde abd ce de be

    The Regex: (ab?c){3}

    abcabcac abcacabc abcabcac acacac acacabc abcacac acacac abcacabc abcacac abcabcac
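One idiom the parser above leans on heavily is matching with \G and the /gc flags: \G anchors each attempt at the current pos() of the string, and /c keeps that position when an attempt fails, so a chain of alternatives can be tried in turn. Here is a minimal sketch of just that idiom. The tokenize function and its token names are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: consume the input left to right, always matching
# at the current position (\G) and keeping pos() on a failed attempt (/c).
sub tokenize {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    while ( pos($input) < length($input) ) {
        if    ( $input =~ /\G\\d/gc )           { push @tokens, 'DIGIT' }
        elsif ( $input =~ /\G\{(\d+)\}/gc )     { push @tokens, "REPEAT($1)" }
        elsif ( $input =~ /\G\[(.+?)\]/gc )     { push @tokens, "ONEOF($1)" }
        elsif ( $input =~ /\G([A-Za-z0-9])/gc ) { push @tokens, "CHAR($1)" }
        else  { die "unexpected input at offset " . pos($input) . "\n" }
    }
    return @tokens;
}

print join( ' ', tokenize('\d{3}[-,.]x') ), "\n";
# prints: DIGIT REPEAT(3) ONEOF(-,.) CHAR(x)
```

Without /c, the first failed alternative would reset pos() and every later attempt would start again from the beginning of the string.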

      cool. thanks! this is a treasure trove of cryptic goodies.

Re: Generate random strings from regular expression
by LanX (Saint) on Jul 02, 2024 at 13:54 UTC
    Very cool!

    May I meditate about another approach? :)

    I'm a big fan of introspection. Suppose it were possible to capture the output of

    use re 'debug'; qr(^\d{3}[-,.][A-V][a-z]{3}\d{2});
    One gets (among other things)
    Final program:
       1: SBOL /^/ (2)
       2: CURLY{3,3} (5)
       4: POSIXU[\d] (0)
       5: ANYOFR[,\-.] (7)
       7: ANYOFR[A-V] (9)
       9: CURLY{3,3} (12)
      11: POSIXA[:lower:] (0)
      12: CURLY{2,2} (15)
      14: POSIXU[\d] (0)
      15: END (0)

    Now imagine a parser translating this to a DSL which generates your output... :)

    Edit
    Trouble is, according to the re documentation, the format of the debug output isn't guaranteed to be stable :/
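To make the idea concrete, here is a rough sketch of such a translator. It does not capture anything from `use re 'debug'` itself: the "Final program" listing quoted above is pasted in as a string, precisely because the format may change between Perl versions. Only the node types that appear in that listing are handled, POSIXU[\d] is simplified to ASCII digits, and the function names (expand_class, compile_program, generate) are invented for this sketch.

```perl
use strict;
use warnings;

# The listing from the post, hard-coded (the debug format is not stable).
my $program = <<'PROGRAM';
1: SBOL /^/ (2)
2: CURLY{3,3} (5)
4: POSIXU[\d] (0)
5: ANYOFR[,\-.] (7)
7: ANYOFR[A-V] (9)
9: CURLY{3,3} (12)
11: POSIXA[:lower:] (0)
12: CURLY{2,2} (15)
14: POSIXU[\d] (0)
15: END (0)
PROGRAM

# Expand the body of a bracketed class: escaped chars, ranges, single chars.
sub expand_class {
    my ($body) = @_;
    my @chars;
    while ( $body =~ /\G(?:\\(.)|(.)-(.)|(.))/gcs ) {
        if    ( defined $1 ) { push @chars, $1 }
        elsif ( defined $2 ) { push @chars, map { chr } ord($2) .. ord($3) }
        else                 { push @chars, $4 }
    }
    return @chars;
}

# Turn the listing into steps of [ min, max, \@chars ].
sub compile_program {
    my ($text) = @_;
    my @steps;
    my ( $min, $max ) = ( 1, 1 );    # repeat counts for the next character node
    for my $line ( split /\n/, $text ) {
        my ( $op, $arg ) = $line =~ /^\s*\d+:\s+([A-Z]+)(\S*)/ or next;
        next if $op eq 'SBOL' || $op eq 'END';
        if ( $op eq 'CURLY' ) {
            ( $min, $max ) = $arg =~ /^\{(\d+),(\d+)\}$/
                or die "unsupported quantifier: $arg\n";
            next;
        }
        my @chars;
        if    ( $op eq 'POSIXU' && $arg eq '[\d]' )       { @chars = ( 0 .. 9 ) }    # ASCII only: a simplification
        elsif ( $op eq 'POSIXA' && $arg eq '[:lower:]' )  { @chars = ( 'a' .. 'z' ) }
        elsif ( $op eq 'ANYOFR' && $arg =~ /^\[(.*)\]$/ ) { @chars = expand_class("$1") }
        else  { die "unsupported node: $op$arg\n" }
        push @steps, [ $min, $max, \@chars ];
        ( $min, $max ) = ( 1, 1 );
    }
    return \@steps;
}

# Walk the steps, picking a random repeat count and random characters.
sub generate {
    my ($steps) = @_;
    my $out = '';
    for my $step (@$steps) {
        my ( $min, $max, $chars ) = @$step;
        my $count = $min + int rand( $max - $min + 1 );
        $out .= $chars->[ rand @$chars ] for 1 .. $count;
    }
    return $out;
}

my $steps = compile_program($program);
print generate($steps), "\n" for 1 .. 5;
```

Each printed line should have the same shape as the strings the original regex accepts, for example three digits, one of `,-.`, a letter from A to V, three lowercase letters and two digits. Anything beyond this listing (alternation, groups, open-ended quantifiers) would need real work.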

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

Re: Generate random strings from regular expression
by NERDVANA (Priest) on Jul 06, 2024 at 00:44 UTC
    (*) Via this old thread I found Regexp::Genex and there is also String::Random. Neither worked for my use case.
    How about Mock::Data::Regex?
    perl -MMock::Data::Regex -E 'say Mock::Data::Regex->new( qr/^[a-c]{2}[-,.]\d{3}-[A-Z]{2}$/a )->generate;'
    bb-157-JA
    Or is the real use case more complicated?

      My searches failed to find Mock::Data::Regex. This module works faultlessly for my use case, except that I wasted some time finding out about the qr//a requirement. Without it, even \d may return a Unicode numeral! That's what I call being hit on the head by the Unicode bag (re: Re^5: Unicode infinity) hehehe.

      Added this solution to the Re: Regexp generating strings?.

        finding out about the qr//a requirement.

        There's also the max_codepoint option.

        Without it, even \d may return a unicode numeral!

        But! that's because your \d also matches Unicode numerals, which might be one of the things you need to test when validating that your code works as expected for all possible inputs. People can get a false sense of security by comparing an HTML form input against /^\d+$/. Likewise, if you end a regex with $ instead of \z, your regex will tolerate a trailing \n, so that is one of the things this module can generate.
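Both points are easy to check with core Perl alone. The sketch below uses Arabic-Indic digits as the non-ASCII sample and a small helper, matches, that is made up for illustration. Note that the anchor which rejects a trailing newline is lowercase \z; uppercase \Z, like $, still allows one.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a string matches a compiled regex.
sub matches {
    my ( $string, $re ) = @_;
    return $string =~ $re ? 'matches' : 'no match';
}

my $arabic_digits = "\x{0661}\x{0662}\x{0663}";    # ARABIC-INDIC DIGITS ONE, TWO, THREE
my $with_newline  = "123\n";

printf "%-32s %s\n", 'Unicode digits vs /^\d+$/',  matches( $arabic_digits, qr/^\d+$/ );
printf "%-32s %s\n", 'Unicode digits vs /^\d+$/a', matches( $arabic_digits, qr/^\d+$/a );
printf "%-32s %s\n", '"123\n" vs /^\d+$/',         matches( $with_newline,  qr/^\d+$/ );
printf "%-32s %s\n", '"123\n" vs /^\d+\z/',        matches( $with_newline,  qr/^\d+\z/ );
```

The first and third lines report "matches", the second and fourth "no match", which is why a generator that follows the regex faithfully can hand back such surprising strings.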

Re: Generate random strings from regular expression
by cavac (Prior) on Jul 05, 2024 at 12:04 UTC

Node Type: perlmeditation [id://11160309]
Approved by marto
Front-paged by hippo