PerlMonks  

Re: expand unicode property (eg \p{Print}) to regex character class range

by sflitman (Hermit)
on Aug 07, 2010 at 22:54 UTC


in reply to expand unicode property (eg \p{Print}) to regex character class range

I got

    Use of uninitialized value $minbits in numeric ne (!=) at C:/Perl/lib/utf8_heavy.pl line 225.
     at C:/Perl/lib/utf8_heavy.pl line 225
            utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
            eval {...} called at unicode_printable.pl line 11
    Use of uninitialized value $minbits in numeric lt (<) at C:/Perl/lib/utf8_heavy.pl line 225.
     at C:/Perl/lib/utf8_heavy.pl line 225
            utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
            eval {...} called at unicode_printable.pl line 11
    Use of uninitialized value $bits in numeric lt (<) at C:/Perl/lib/utf8_heavy.pl line 237.
     at C:/Perl/lib/utf8_heavy.pl line 237
            utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
            eval {...} called at unicode_printable.pl line 11
    main::(unicode_printable.pl:11): unless( eval { $swash = utf8->SWASHNEW( $item ); 1; } ){
unicode_printable.pl is the name under which I saved and stepped through your script. It does work. I like the trick where you pull the ranges from after __END__ with @ARGV, but I cannot find that documented anywhere in the perldocs. Maybe I'm not looking hard enough?
SSF


Re^2: expand unicode property (eg \p{Print}) to regex character class range
by ikegami (Pope) on Aug 07, 2010 at 23:10 UTC

    I like the trick where you pull the ranges from after __END__ with @ARGV

    He does no such thing. He simply placed the output he got after __END__ so you could see it without breaking the program.

    If you want what's after __END__, you can read it from the DATA file handle.
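To make the DATA point concrete, here is a minimal sketch, not taken from the script under discussion. It assumes a top-level script file (the __END__ token exposes DATA only there, not in files loaded with require or do), and the two lines after __END__ are sample data copied from the output quoted elsewhere in this thread.

```perl
use strict;
use warnings;

# Everything after __END__ in the top-level script is readable through
# the special DATA filehandle. @ARGV plays no part in this.
my @lines = <DATA>;
chomp @lines;

print "Read ", scalar(@lines), " line(s) after __END__\n";
print "  $_\n" for @lines;

__END__
PerlSpace => [\u0009-\u000A\u000C-\u000D\u0020]
Dingbats => [\u2700-\u27BF]
```

If the script never reads DATA, the text after __END__ is simply ignored, which is why output can be pasted there without breaking the program.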

Re^2: expand unicode property (eg \p{Print}) to regex character class range
by Anonymous Monk on Aug 07, 2010 at 23:56 UTC
    See what ikegami said. Usage of the program is:
    perl fileyousaveditas.pl Propertyname propertyname propertyname
    For example (I'm omitting Print):
    $ perl unicode-regex-range.pl PerlSpace Title Bopo Dingbats
    PerlSpace => [\u0009-\u000A\u000C-\u000D\u0020]
    Title => [\u01C5\u01C8\u01CB\u01F2\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]
    Bopo => [\u3105-\u312D\u31A0-\u31B7]
    Dingbats => [\u2700-\u27BF]
    For a list of properties, see perluniprops. This program works only for \w+ properties; it won't work for compound ones like Script: something or Block: something.

    I suppose Unicode::UCD ought to provide this functionality, or really JavaScript ought to provide \p{} and \P{}...
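For anyone who would rather not depend on utf8->SWASHNEW, the same kind of expansion can be sketched by brute force: test every code point against \p{...} and collapse the hits into ranges. This is not the original program's method. The helper name property_to_js_class is hypothetical, and the sketch assumes only the BMP (U+0000 to U+FFFF) matters, since the \uXXXX notation used above cannot express anything higher.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): expand a Unicode
# property into a JavaScript-style character class by brute force.
sub property_to_js_class {
    my ($prop) = @_;
    no warnings 'utf8';    # surrogates and noncharacters are probed too

    # Compile inside eval and run one probe match, so an unknown
    # property is reported here whether perl checks it at compile
    # time or at first match.
    my $re = eval {
        my $r     = qr/\A\p{$prop}\z/;
        my $probe = "a" =~ $r;
        $r;
    };
    die "Cannot use property '$prop': $@" unless $re;

    my ( @ranges, $start, $prev );
    for my $cp ( 0 .. 0xFFFF ) {    # assumption: BMP only
        my $ch = chr $cp;
        utf8::upgrade($ch);         # force character semantics below U+0100
        next unless $ch =~ $re;
        if ( defined $prev && $cp == $prev + 1 ) {
            $prev = $cp;            # extend the current run
            next;
        }
        push @ranges, [ $start, $prev ] if defined $start;
        ( $start, $prev ) = ( $cp, $cp );    # begin a new run
    }
    push @ranges, [ $start, $prev ] if defined $start;

    # Single code points print as \uXXXX, runs as \uXXXX-\uXXXX.
    my $body = join '', map {
        my ( $lo, $hi ) = @$_;
        $lo == $hi
            ? sprintf( '\u%04X', $lo )
            : sprintf( '\u%04X-\u%04X', $lo, $hi );
    } @ranges;
    return "[$body]";
}

# Same calling convention as above: property names on the command line.
print "$_ => ", property_to_js_class($_), "\n" for @ARGV;
```

Because the pattern is whatever \p{} accepts, compound forms such as Script=Greek should also work here, at the cost of 65,536 matches per property. Results for some properties depend on the perl and Unicode versions installed, so they may not match the listing above exactly.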
