http://www.perlmonks.org?node_id=853599


in reply to expand unicode property (eg \p{Print}) to regex character class range

I got
Use of uninitialized value $minbits in numeric ne (!=) at C:/Perl/lib/utf8_heavy.pl line 225.
 at C:/Perl/lib/utf8_heavy.pl line 225
    utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
    eval {...} called at unicode_printable.pl line 11
Use of uninitialized value $minbits in numeric lt (<) at C:/Perl/lib/utf8_heavy.pl line 225.
 at C:/Perl/lib/utf8_heavy.pl line 225
    utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
    eval {...} called at unicode_printable.pl line 11
Use of uninitialized value $bits in numeric lt (<) at C:/Perl/lib/utf8_heavy.pl line 237.
 at C:/Perl/lib/utf8_heavy.pl line 237
    utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
    eval {...} called at unicode_printable.pl line 11
main::(unicode_printable.pl:11): unless( eval { $swash = utf8->SWASHNEW( $item ); 1; } ){
unicode_printable.pl is the name I saved your script under when I stepped through it. It does work. I like the trick where you pull the ranges from after __END__ with @ARGV, but I cannot find that documented anywhere in the perldocs. Maybe I'm not looking hard enough?
SSF

Re^2: expand unicode property (eg \p{Print}) to regex character class range
by Anonymous Monk on Aug 07, 2010 at 23:56 UTC
    See what ikegami said. Usage of the program is:
    perl fileyousaveditas.pl Propertyname propertyname propertyname
    For example (I'm omitting Print):
    $ perl unicode-regex-range.pl PerlSpace Title Bopo Dingbats
    PerlSpace => [\u0009-\u000A\u000C-\u000D\u0020]
    Title => [\u01C5\u01C8\u01CB\u01F2\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]
    Bopo => [\u3105-\u312D\u31A0-\u31B7]
    Dingbats => [\u2700-\u27BF]
    For a list of properties see perluniprops. This program works only for \w+ properties; it won't work for compound ones like Script: something or Block: something.
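
For readers who hit the utf8_heavy.pl warnings above, here is a rough alternative sketch that produces the same kind of output without calling utf8::SWASHNEW. It is not the original script's method: it simply tests every BMP code point against \p{...} and collapses the matches into ranges. The sub name property_to_class is made up for this example, and the restriction to U+0000..U+FFFF (surrogates skipped) is an assumption chosen to match the \uXXXX output format.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a single-word Unicode property into a
# \uXXXX character class by brute-force testing each BMP code point.
# Returns undef for compound names or properties this Perl rejects.
sub property_to_class {
    my ($prop) = @_;
    return undef unless defined $prop && $prop =~ /\A\w+\z/;

    # Compile and trial-match inside eval so an unknown property
    # yields undef instead of killing the program.
    my $re = eval {
        my $pat = '\p{' . $prop . '}';
        my $compiled = qr/$pat/;
        my $dummy = ( 'a' =~ $compiled );
        $compiled;
    };
    return undef unless defined $re;

    my @ranges;    # list of [first, last] code point pairs
    for my $cp ( 0 .. 0xFFFF ) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        next unless chr($cp) =~ $re;
        if ( @ranges && $ranges[-1][1] == $cp - 1 ) {
            $ranges[-1][1] = $cp;                  # extend current range
        }
        else {
            push @ranges, [ $cp, $cp ];            # start a new range
        }
    }

    my $class = join '', map {
        my ( $first, $last ) = @$_;
        $first == $last
            ? sprintf( '\u%04X', $first )
            : sprintf( '\u%04X-\u%04X', $first, $last );
    } @ranges;
    return "[$class]";
}

# Same command-line usage as above: one property name per argument.
for my $prop (@ARGV) {
    my $class = property_to_class($prop);
    print "$prop => ", ( defined $class ? $class : '(not expanded)' ), "\n";
}
```

The ranges you get depend on the Unicode tables shipped with your Perl, so the output may differ from the listing above on other versions.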

    I suppose Unicode::UCD ought to provide this functionality, or really JavaScript ought to provide \p{} and \P{} ...

Re^2: expand unicode property (eg \p{Print}) to regex character class range
by ikegami (Patriarch) on Aug 07, 2010 at 23:10 UTC

    I like the trick where you pull the ranges from after __END__ with @ARGV

    He does no such thing. He simply placed the output he got after __END__ so you could see it without breaking the program.

    If you want what's after __END__, you can read it from the DATA file handle.
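
A minimal sketch of reading from DATA, assuming the code lives in the top-level script (where __END__ leaves the rest of the file readable through the main::DATA handle; see perldata). The sub name read_data_lines and the two sample lines are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line that follows the __END__
# marker, read through the DATA file handle.
sub read_data_lines {
    my @lines;
    while ( my $line = <DATA> ) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

my @lines = read_data_lines();
print "$_\n" for @lines;

__END__
PerlSpace => [\u0009-\u000A\u000C-\u000D\u0020]
Dingbats => [\u2700-\u27BF]
```

Note that @ARGV plays no part here: @ARGV holds the command-line arguments, while DATA is a separate handle onto the text after the marker.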