Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

The Monastery Gates

( #131=superdoc: print w/replies, xml ) Need Help??

Donations gladly accepted

If you're new here please read PerlMonks FAQ
and Create a new user.

New Questions
Converting utf-8 to base64 and back
2 direct replies — Read more / Contribute
by LanX
on Jan 21, 2017 at 18:21
    The following code produces the original utf-8 string after converting to base64.

    But it seems overly complicated. What am I missing ?

    Additionally: I can probably understand that I need to do encode_utf8 step, but why do I need to set the utf8 flag manually after explicitly encoding to utf8?

    use strict; use warnings; use utf8 ; use Encode; use Data::Dump qw/dd pp/; use Devel::Peek; use MIME::Base64 ; my $str =''; warn "\n*** orig str='$str'"; Dump $str; our $encoded = MIME::Base64::encode_base64($str); warn "\n*** encode_b64 encoded='$encoded'"; Dump $encoded; warn "\n*** decode_b64"; our $decoded = MIME::Base64::decode_base64($encoded); Dump $decoded; warn "\n*** encode_utf8"; $decoded = Encode::encode_utf8($decoded); Dump $decoded; warn "\n*** _utf8_on"; Encode::_utf8_on($decoded); Dump $decoded;

    *** orig str='' at c:/tmp/ line 12. SV = PV(0x54c9d8) at 0x5a8b00 REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0x5471d8 "\303\204"\0 [UTF8 "\x{c4}"] CUR = 2 LEN = 16 *** encode_b64 encoded='xA== ' at c:/tmp/ line 17. SV = PV(0x54caa8) at 0x26dbcb8 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x547478 "xA==\n"\0 CUR = 5 LEN = 16 *** decode_b64 at c:/tmp/ line 21. SV = PV(0x54cb08) at 0x26da130 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x28d2b98 "\304"\0 CUR = 1 LEN = 16 *** encode_utf8 at c:/tmp/ line 25. SV = PV(0x54cb08) at 0x26da130 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x28d2c28 "\303\204"\0 CUR = 2 LEN = 16 *** _utf8_on at c:/tmp/ line 29. SV = PV(0x54cb08) at 0x26da130 REFCNT = 1 FLAGS = (POK,pPOK,UTF8) PV = 0x28d2c28 "\303\204"\0 [UTF8 "\x{c4}"] CUR = 2 LEN = 16

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Is this a sane/safe way to pass an aref into a C function?
3 direct replies — Read more / Contribute
by stevieb
on Jan 21, 2017 at 13:54

    Given that weekends are typically slower than through the week, I thought I'd ask for a quick code review of those who understand the perl internals. This is my real first attempt at doing anything regarding manipulating and managing perl data types in C/XS, and am just wondering if what I've got below is reasonable, sane and safe, or if there are better ways to do it.

    Instead of passing an integer into the C function then doing a bunch of bit shifting to get out the required number of bytes, I wanted to pass in an array reference so that the number of bytes can be dynamic (otherwise with an int, I'm limited to a maximum of four bytes in a call to testing().

    use warnings; use strict; use Inline 'C'; my $channel = 0; my @bytes = (0x00, 0x01, 0x02, 0x03); testing($channel, \@bytes, 4); __END__ __C__ int testing(int channel, SV* byte_ref, int len){ if (channel != 0 && channel != 1){ croak("channel param must be 0 or 1\n"); } if (! SvROK(byte_ref) || SvTYPE(SvRV(byte_ref)) != SVt_PVAV){ croak("not an aref\n"); } AV* bytes = (AV*)SvRV(byte_ref); int num_bytes = av_len(bytes) + 1; if (len != num_bytes){ croak("$len param != elem count\n"); } unsigned char buf[num_bytes]; int i; for (i=0; i<len; i++){ SV** elem = av_fetch(bytes, i, 0); buf[i] = (int)SvNV(*elem); } /* * here, I'll be passing the char buffer and len * to an external C function. For display, I'll * just print the elements (the return is from ioctl() * * if ((spiDataRW(channel, buf, len) < 0){ * croak("failed to write to the SPI bus\n"); * } */ int x; for (x=0; x<len; x++){ printf("%d\n", buf[x]); } }
Detecting 'our $foo => 1' mistake?
3 direct replies — Read more / Contribute
by perlancar
on Jan 21, 2017 at 04:58

    Dear monks,

    I just wasted half an hour trying to find a cause why a variable is undefined (and chasing the wrong rabbit of trying to figure out why a particular module won't load but require() doesn't raise an error), before finally realizing that I made a typo:

    our $foo => { a=>'blah', b=>'blah', ... };

    instead of:

    our $foo = { a=>'blah', b=>'blah', ... };

    And come to think of it, this is perhaps my third or fourth time being bitten by this. This mistake is not caught by warnings, so any suggestion on how to detect this? Perhaps there's a Perl::Critic policy somewhere and I need to just (re-)plunge and use Perl::Critic (again).

using waitpid() with signals
2 direct replies — Read more / Contribute
by ristov
on Jan 20, 2017 at 14:59

    I am trying to write the code which would wait for the child process to exit -- but if the code gets the TERM signal while waiting, the child process should be terminated with the same signal. While waiting for the child can be easily done with waitpid(pid, 0), signals do not interrupt waitpid() with EINTR error code. To solve this problem, one could use the following code with non-blocking waitpit():

    $term = 0; $SIG{TERM} = sub { $term = 1; }; for (;;) { $p = waitpid($pid, WNOHANG); # where did the child disappear? if ($p == -1) { exit(1); } # child has terminated if ($p == $pid && (WIFEXITED($?) || WIFSIGNALED($?))) { exit($?>>8); } # TERM has arrived, forward it to child and exit if ($term) { kill('TERM', $pid); exit(0); } # check the child again after 1 second sleep(1); }

    However, I am interested whether the same task can be accomplished with blocking waitpid() which consumes less CPU time. There is one very interesting recipe which involves the use of eval:

    Nevertheless, I am wondering whether it would be OK to use the blocking waitpid() with a different signal handler for the same purpose:

    # waitpid($pid,WNOHANG) returns 0 if the child process # exists and has not terminated $SIG{TERM} = sub { waitpid($pid,WNOHANG) || kill('TERM', $pid); exit(0 +); } # waitpid loop for (;;) { $p = waitpid($pid, 0); # where did the child disappear? if ($p == -1) { exit(1); } # child has terminated if ($p == $pid && (WIFEXITED($?) || WIFSIGNALED($?))) { exit($?>>8); } }

    In the signal handler, waitpid($pid, WNOHANG) is used for checking if the child process exists, in order to avoid sending TERM to a non-existing process. Since I am not too familiar with Perl internals, I am not sure if it is OK if the signal handler is invoked in the middle of blocking waitpid(), in order to call waitpid() again in non-blocking mode from the handler. Can anyone provide some insights? If this approach has flaws, I would go with previous code examples.

    regards, risto
Issues with Array of Hashes
2 direct replies — Read more / Contribute
by ciscomonkey
on Jan 20, 2017 at 10:54
    Having an issue with Text::CSV::Slurp not seeing my array of hashes. Here's some code that generates a hash just as my production script does:
    use Modern::Perl; use Text::CSV::Slurp; use Data::Dumper; sub generate { my %inline; $inline{'total'} = 2; $inline{'items'} = [ { 'name' => 'item1' }, { 'name' => 'item2' } ]; return \%inline; } my $ref = &generate(); print Dumper( $ref->{'items'} ); my $csv = Text::CSV::Slurp->create( input => $ref->{'items'} );
    I have multiple steps doing this same thing with no issue, and a quick Dumper output from those looks to be the exact same as the Dumper output from this code above. e.g. $VAR1 = [ {'key' => 'val' },{ 'key' => 'val' } ]; however it keeps coming back with "Need an an array of hashes input to create CSV from" and I'm a little lost as to why. I've even compared against those that work with the following:
    print "\$ref->{'items'} is " . ref( $ref->{'items'} ) . "\n"; foreach my $item ( @{ $ref->{'items'} } ) { print $item->{'name'} . " is " . ref( $item ) . "\n"; }
    Which yields:
    $ref->{'items'} is ARRAY item1 is HASH item2 is HASH
    And both look the same. I'm hoping I'm just missing something really simple here, but any help is appreciated.
Using an IO::File object stored in a Hash.
2 direct replies — Read more / Contribute
by jjs04
on Jan 20, 2017 at 10:47

    I have found something that I think should work, but does not work as expected. When I use Path::Class to open a file and save its handle to a variable, it works as expected.

    Sample 1:
    use Path::Class; my $filename = "data.txt"; my $handle = file($filename)->open('<:encoding(UTF-8)'); while (<$handle>) { chomp($_); print $_,"\n"; } close($handle);
    Sample 1 Output:
    1 2 3 4
    However, if I save the same in a hash, it does not. Sample 2:
    use Path::Class; my $filename = "data.txt"; my %hash = ( 'handle' => file($filename)->open('<:encoding(UTF-8)') ); while(<$hash{'handle'}>) { chomp($_); print $_,"\n"; } close($hash{'handle'});
    Sample 2 Output:
    What am I missing? I suspect it is something quite simple that I have not experienced before. Thank you for the assistance.


    Edit: As first indicated by BrowserUk, readline should be used. Because the diamond operator is used with readline and glob, there may be some ambiguity as to how the parser should proceed. When a hash element is used within the diamond operator (<$hash{$key}>), it is a glob, causing my difficulties.

    This following sample works as expected. Sample 4:
    use Path::Class; my $filename = "data.txt"; my %hash = ( 'handle' => file($filename)->open('<:encoding(UTF-8)') ); while(readline($hash{'handle'})) { chomp($_); print $_,"\n"; } close($hash{'handle'});
    Sample 4 Output:
    1 2 3 4
Constant code
4 direct replies — Read more / Contribute
by philiprbrenan
on Jan 20, 2017 at 10:26

    When I try:

    #!/usr/bin/perl sub aa{2} sub bb{2} my ($a, $b) = (aa, bb); my ($s, $t) = ((2*aa+bb), (2*$a+$b)); say STDERR "$a==", aa; say STDERR "$b==", bb; say STDERR "$s==$t";

    I get:

    2==2 2==2 4==6

    Please tell me what I am doing wrong, as I expected $s and $t to be equal

    It can't be perl It must be me Yet what it is I cannot see


Requring minimum perl version in a cpanfile
2 direct replies — Read more / Contribute
by Anonymous Monk
on Jan 19, 2017 at 17:12
    I am using cpanfile with my app to try to make it easier to install for non-perl users.

    I added a line in cpanfile specifying the minimum perl version, but I was expecting it to bail if the requirement wasn't met.

    Instead, it only prints a line after it tries to install everything, saying the perl version isn't in the specified range.

    I also added the requires line inside an

    on 'configure' => sub { }
    , but it behaves the same.

    Is there a way to make it bail on the perl version, before it wastes time trying to install a bunch of modules?

Unknown charnames when building Encode
2 direct replies — Read more / Contribute
by yulivee07
on Jan 19, 2017 at 03:23
    Hello fellow Perlmonks, I am trying to build Encode 2.88-3 from CPAN on AIX 7.2. During the make test phase I receive various errors about unknown characters:
    Use of uninitialized value $txt in pattern match (m//) at /usr/opt/per +l5/lib/5.20.1/ line 499. Unknown charname 'alpha' at t/Encode.t line 44, within string BEGIN not safe after errors--compilation aborted at t/Encode.t line 14 +8. t/Encode.t ................. t/encoding-locale.t ........ ok Use of uninitialized value $txt in pattern match (m//) at /usr/opt/per +l5/lib/5.20.1/ line 459. Unknown charname 'LATIN SMALL LETTER SHARP S' at t/encoding.t line 77, + within string BEGIN not safe after errors--compilation aborted at t/encoding.t line +165. Use of uninitialized value $txt in pattern match (m//) at /usr/opt/per +l5/lib/5.20.1/ line 459. Unknown charname 'POUND SIGN' at t/mime-header.t line 166, within stri +ng Execution of t/mime-header.t aborted due to compilation errors. # Looks like your test exited with 2 just after 1. t/mime-header.t ............
    To test whether this is an AIX problem or a perl problem, I tried to build the same version on my linux system, where Encode installs just fine.
    To pick the first error:
    is "\N{alpha}",substr($uni,0,1),"alpha does not map to symbol 'a'";
    It seems AIX perl is unable to find the \N{alpha} character. I am a bit lost here - where does perl usually search for characters like this?
    I need a hint into the direction I have to search for with this problem. Can someone provide some debugging tips?
    perl -E 'use charnames (); say $charnames::VERSION' 1.40 perl -C -E 'say "\x{3b1} - \x{df} - \x{a3}"' Wide character in say at -e line 1. &#945; - - perl -E 'use Unicode::UCD; say $Unicode::UCD::VERSION' 0.58 perl -MUnicode::UCD=charprops_all -E 'say charprops_all("U+$_")->{Age} + for qw{3b1 df a3}' "charprops_all" is not exported by the Unicode::UCD module Can't continue after import errors at -e line 0. perl -C -E 'say "\N{greek:alpha}"' Use of uninitialized value $txt in pattern match (m//) at /usr/opt/per +l5/lib/5.20.1/ line 459. Use of uninitialized value $txt in pattern match (m//) at /usr/opt/per +l5/lib/5.20.1/ line 499. Unknown charname 'greek:alpha' at -e line 1, within string Execution of -e aborted due to compilation errors.
    The build process is using CPAN (perl -MCPAN -eshell) to install modules. We use local::lib to install to a specific directory, rather than the system perl path. The perl we are using is the one coming with AIX 7.2, so we did not build perl ourself.
    perl -V Summary of my perl5 (revision 5 version 20 subversion 1) configuration +: Platform: osname=aix, osvers=, archname=aix-thread-multi uname='aix blade08 1 6 00003c3ad100 ' config_args='-d -Dprefix=/usr/opt/perl5 -Dcc=xlc_r -Duseshrplib -D +usethreads' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='xlc_r -q32', ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX +_SOURCE -qmaxmem=-1 -qnoansialias -qlanglvl=extc99 -DUSE_NATIVE_DLOPE +N -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES', optimize='-O', cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem= +-1 -qnoansialias -qlanglvl=extc99 -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_ +INIT' ccversion='', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', + lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='ld', ldflags =' -brtl -bdynamic -b32' libpth=/lib /usr/lib /usr/ccs/lib libs=-lbind -lnsl -ldbm -ldl -lld -lm -lcrypt -lpthreads -lc perllibs=-lbind -lnsl -ldl -lld -lm -lcrypt -lpthreads -lc libc=, so=a, useshrplib=true, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -bE:/usr/ +opt/perl5/lib/5.20.1/aix-thread-multi/CORE/perl.exp' cccdlflags=' ', lddlflags='-bhalt:4 -G -bI:$(PERL_INC)/perl.exp -b +E:$(BASEEXT).exp -bnoentry -lpthreads -lc -lm' Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_DONT_CREATE_GVSV PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF USE_REENTRANT_API Built under aix Compiled at Feb 6 2015 14:54:29 %ENV: PERL5LIB="/home/perl_ss/perl5/lib/perl5/aix-thread-multi:/home/per +l_ss/perl5/lib/perl5:/usr/local/lib/site_perl/5.8.8:/usr/local/site_p +erl/common" PERL5OPT="" PERL5_CPANPLUS_IS_RUNNING="9961732" PERL5_CPAN_IS_RUNNING="9961732" PERL_LOCAL_LIB_ROOT="/home/perl_ss/perl5" PERL_MB_OPT="--install_base /home/perl_ss/perl5" PERL_MM_OPT="INSTALL_BASE=/home/perl_ss/perl5" @INC: /home/perl_ss/perl5/lib/perl5/aix-thread-multi /home/perl_ss/perl5/lib/perl5/aix-thread-multi /home/perl_ss/perl5/lib/perl5 /usr/local/lib/site_perl/5.8.8/aix-thread-multi /usr/local/lib/site_perl/5.8.8 /usr/local/site_perl/common /usr/opt/perl5/lib/site_perl/5.20.1/aix-thread-multi /usr/opt/perl5/lib/site_perl/5.20.1 /usr/opt/perl5/lib/5.20.1/aix-thread-multi /usr/opt/perl5/lib/5.20.1 /usr/opt/perl5/lib/site_perl/5.8.8 /usr/opt/perl5/lib/site_perl
mysql_connect_timeout laughs at me
1 direct reply — Read more / Contribute
by bbarnett
on Jan 18, 2017 at 18:42
    Nothing I do, seems to make mysql_connect_timeout actually function.

    I've read endless posts, via Google and otherwise on the subject. I've appended in many forms, mysql_connect_timeout=1 to my connect line.



    and, I've tried adding port=, tried not using identifiers (eg, just db;;3306;mysql_connect_timeout=1) and about 17 other variations.

    In all of these instances, I can hear the deep, evil laugh of mysql_connect_timeout in the depths of my mind.

    I was going to move on to using some form of timeout, but *then* read about 17.3 thousand posts about "don't do that, you'll get a segfault!', or -- 'use this method with perl 5.x or you will feel extreme pain!'.

    I believe I'm running 5.18 of perl, and it's ubuntu. I'll be able to check more directly when at work tomorrow -- that evil laugh, that never ending laugh kept me distracted, and I did not check versions prior to leaving.

    Any immediate suggestions? Ideas? A "why are you doing that, now I'm laughing too!" thoughts?

    Any help mucho appreciated.


Feature Idea: qr//e (updated with solutions)
4 direct replies — Read more / Contribute
by haukex
on Jan 18, 2017 at 08:47

    Dear Monks,

    As I was thinking about this node about dynamically building regexes, I had the idle thought that it might be nice if qr// supported qr//e, analogous to s///e it would eval the inside of the construct before parsing it as a regex. Now this is really just a very minor itch, and I don't yet have any idea of how much sense it makes or how difficult it might be to implement, but I still thought I'd bounce it off of you.

    Thoughts? Maybe the "normal" and/or "hacked" solutions below are good enough, and the effort required to implement qr//e isn't worth it? Other potential problems I haven't noticed yet?

    Update 2017-01-19: TIMTOWTDI has already provided plenty of possible solutions, and I just wasn't feeling creative enough at the moment to see them :-) Thank you very much, LanX, Haarg, and vr! I updated the code with your solutions, and added Test::More and sub testre.


    -- Hauke D

ActiveState new vs old?
4 direct replies — Read more / Contribute
by enrgyxprt
on Jan 18, 2017 at 07:59
    Its been a while since I installed ActiveState perl, perl -V shows I have V5 something... So I updated all using PPM.. Now I am thinking about updating to the newest activestate community version... Any downsides ?
New Meditations
Improve readability of Perl code. Naming reference variables.
6 direct replies — Read more / Contribute
by hakonhagland
on Jan 19, 2017 at 15:05
    Hello Monks!

    I've been learning Perl for some years now. At the same time, moving from writing awk scripts to writing Perl scripts, I have found Perl to be an amazing resource for getting things done.

    Still, I have some minor issues with the language design that I have not yet been able to understand/resolve. This is what I want to discuss here.


    It sometimes bugs me that it is so difficult to write Perl code that is readable (easy to follow) when working with references. For example, if I see a variable $var in the middle of some code, it can be a scalar variable, a scalar reference, an array reference, a hash reference, and so on. Hence, I often end up guessing or having to scan source code nearby in order to determine the type of the variable. I find this workflow less than optimal. Would it not be better if the variable could (optionally) be made self-documenting with respect to reference type?

    In the book Perl Best Practices, the problem is mentioned in another setting, and the solution suggested is to add the suffix _ref to the variable name. So one could write,

    $var_href = { a => 1 };
    to create a hash ref, and
    $var_aref = [ 1, 2, 3];
    to create an array reference.

    However, a problem with this convention could be that the suffix is not optional. You should not be forced to used the more verbose form of the variable name. I think, the programmer should have a choice to decide whether he finds it advantageous to include the suffix at given place or not. For example, when declaring the variable as

    $var = [ 1, 2, 3 ];
    it is rather obvious that it is an array reference, and there is no need to write:
    $var_aref = [ 1, 2, 3 ];
    The latter is in my opinion too verbose. However, if the reference is just defined as
    my $var;
    it would often be better to include the suffix. If there is no indication on the next lines or so whether $var will be used as an array reference or not, it would be more readable to define it as
    my $var_aref;

    A new idea for reference variable naming syntax

    So this lead me to an idea: Could the postfix dereferencing syntax be extended for this use case?

    The Postfix Dereferening Syntax (PDS) was introduced as experimental in 5.20. And starting from 5.24 it is included in the Perl language by default.

    Currently PDS is used for dereferencing:

    my @array = $var->@*;
    Notice that the PDS includes a star after the sigil. It is a syntax error not to include the star. But let's say for the moment that if the star was omitted, the dereferencing was to be simply ignored instead. So
    my $var->@;
    would mean the same as
    my $var;
    and produce no syntax error.

    Let's denote this new syntax by Optional Postfix Reference Declaration Syntax (OPRDS). So when using OPRDS, should it be entirely up to the user to ensure that he used the correct sigil. For example, if I write

    $var->@ = 12;
    when I really meant
    $var->@ = [ 12 ];
    should it produce a compile time error? I think it would be very helpful if the compiler could use OPRDS to check for consistency. But it might be difficult to implement? I do not know. If it is difficult to implement, some alternatives might be used instead? I don't know much of Perl internals, so this is a point where I need help.

    When I started out with this idea, compile time type-checking was not on my mind at all. But I see now that OPRDS would offer the opportunity for stricter type checking.

    But type checking was not the main issue I wanted to discuss. What I would like to discuss is how to deal with reference variable names. Reading and understanding written Perl code can be difficult since the $ sigil can be used for many data types. How could this situation be improved?

RFC: Module for testing asynchronous event series
No replies — Read more | Post response
by Dallaylaen
on Jan 18, 2017 at 16:57

    Let's say we are going to test a module that is supposed to be run asynchronously - using threads, AnyEvent, or Coro, or some other means. And we need to check that certain events happen in certain sequence, because some of them depend on the others.

    Probably the best way to achieve this would be of course to minimize interdependencies and use mathematically correct synchronization for whatever is left. Of course, that is not always achievable, due to limited time.

    So I'm going to propose a primitive that I think should deal with a huge subclass of such tasks.

    The code goes as follows:

    use Test::AsyncSeq; my $id = Test::AsyncSeq->get_sequence_id; my $id2 = Test::AsyncSeq->get_sequence_id( "frobnicate" ); # would be "frobnicate1" or smth # somewhere in threads/callbacks is_after( $id, "start" ); # elsewhere is_after( $id, "event2", "start" ); # more elsewhere is_after( $id, "event3", "start" ); # finally is_after( $id, "finish", "event2", "event3" );

    The is_after( $id, $event, @dependencies ); passes if and only if:

    • sequence named $id was created;
    • $event was not seen in the sequence yet;
    • all of the @dependencies have been seen at the moment of the call.

    The id is just a string, and is required since Perl is not very good at passing blessed references across threads. And multiple tests MAY be needed in the same script, say to catch a race condition.

    Does such interface make sense? Would it be of use to anyone?

[RFC] Building Regex Alternations Dynamically
5 direct replies — Read more / Contribute
by haukex
on Jan 18, 2017 at 07:57

    Dear Monks, this is a suggestion for a tutorial, any comments or suggestions are welcome. Update 1: Fixed up explanation of metacharacters a bit. Update 2: Implemented some things from kcott's comments. Update 3: Added TL;DR, inspired by LanX.

    TL;DR: The two code samples below are working pieces of code that can be copied into your Perl script and adapted for your purposes.

    I thought it might be useful to explain the technique of building regular expressions dynamically from a set of strings. Let's say you have a list of strings, like ("abc", "def", "ghi"), and you want to build a regex that matches any of them, like /(?:abc|def|ghi)/. This also works well with s/search/replacement/ if you have a hash where the keys are the search strings and the values are the replacements, as I'll show below. If you're uncertain on some of the regex concepts used here, like alternations a|b and non-capturing groups (?:...), I recommend perlretut.

    First, the basic code, which I explain below - note the numbering on the lines of code.

    my @values = qw/ a ab. d ef def g|h /; my $regex_str = join '|', # 4. map {quotemeta} # 3. sort { length $b <=> length $a } # 2. @values; # 1. my $regex = qr/$regex_str/; # 5. print "$regex\n"; # 6.
    1. We begin with the list of strings stored in the array @values. This could be any list, such as a literal qw/.../, or return values from functions, including keys or values.
    2. We sort the list so that the longer strings appear first. This is necessary because if our regular expression was /foo|foobar/, then applied to the string "foobarfoofoobar", it would only match "foo" three times, and never "foobar". But if the regex is /foobar|foo/, then it would correctly match "foobar", "foo", and again "foobar".
    3. Next, we apply the quotemeta function to each string, which escapes any metacharacters that might have special meaning in a regex, such as . (dot, matches anything) or | (alternation operator). In our example, we want the string "g|h" to be matched literally, and not to mean "match g or h". Unescaped metacharacters can also break the syntax of the regex, like stray opening parentheses or similar. Note that quotemeta is the same as using \Q...\E in a regex. As discussed here, you should only drop \Q...\E or quotemeta in the case that you explicitly want metacharacters in your input strings to be special, they come from a trusted source, and you are certain that your strings don't contain any characters that would break your regular expression or expose security holes!
    4. Then, we join the strings into one long string using the regex alternation operator |. If you want to use this string without the qr// of step 5, note this potential pitfall: For example, if your input is qw/a b c/, then at this point your string will look like $regex_str="a|b|c". Then, saying /^$regex_str$/ will be interpolated to /^a|b|c$/, which means "match a only at the beginning of the string, or b anywhere in the string, or c only at the end of the string", which is probably not what you meant, you probably meant /^(?:a|b|c)$/, that is /^(?:$regex_str)$/!
    5. Finally, we compile the regular expression using qr//. This is not strictly necessary, you could just interpolate the string you've just created into a regex, but I prefer to turn them into regex objects explicitly. It also has the advantages that you can apply modifiers such as /i to the regex in a (IMO) more natural way, and that qr// implicitly adds a non-capturing group (?:...) around the regex, which takes care of the problem described in step 4 above.
    6. When we print the regular expression, we see that it has become this:
      You can now use this precompiled regular expression anywhere, as explained in Compiling and saving regular expressions and perlop, such as if ($input=~$regex) { ... } or while ($input=~/$regex/g) { ... }.

    Search and Replace Using a Hash

    my %map = ( a=>1, ab=>23, cd=>45 ); # 1. my $regex_str = join '|', # 2. map {quotemeta} sort { length $b <=> length $a or $a cmp $b } # 3. keys %map; my $regex = qr/$regex_str/; print "$regex\n"; # 4. # Now, use the regex my @strings = qw/ abcd aacd abaab /; # 5. for (@strings) { my $before = $_; s/($regex)/$map{$1}/g; # 6. print "$before -> $_\n"; # 7. }
    1. This is the hash in which the keys are the search strings, and the values are the replacements. As above, this can come from any source.
    2. This code to build the regex is mostly the same as the above, with differences noted here.
    3. Instead of only sorting by length, this sort first sorts by length, and sorts values with the same length with a stringwise sort. While not strictly necessary, I would recommend this because hashes are unordered by default, meaning that your regex would be in a different order across different runs of the program. Sorting the hash keys like this causes the regex to be in the same order in every run of the program.
    4. We print the regex for debugging, and see that it looks like this: (?^:ab|cd|a)
    5. These are the test strings we will apply the regular expression against.
    6. This is the search and replace operation that matches the keys of the hash, and as a replacement value gets the corresponding value from the hash. Note that the /g modifier is not strictly required (s///g will replace all matches in the string, not just the first), and you can adapt this regex any way you like. So for example, to only make one replacement anchored at the beginning of the string, you can say s/^($regex)/$map{$1}/;.
    7. The output of the code is:
      abcd -> 2345 aacd -> 1145 abaab -> 23123

    Hope this helps,
    -- Hauke D

Log In?

What's my password?
Create A New User
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (7)
As of 2017-01-22 01:42 GMT
Find Nodes?
    Voting Booth?
    Do you watch meteor showers?

    Results (186 votes). Check out past polls.