PerlMonks  

Re: expand unicode property (eg \p{Print}) to regex character class range

by sflitman (Hermit)
on Aug 07, 2010 at 22:54 UTC ( [id://853599] )


in reply to expand unicode property (eg \p{Print}) to regex character class range

I got these warnings:
Use of uninitialized value $minbits in numeric ne (!=) at C:/Perl/lib/utf8_heavy.pl line 225.
 at C:/Perl/lib/utf8_heavy.pl line 225
    utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
    eval {...} called at unicode_printable.pl line 11
Use of uninitialized value $minbits in numeric lt (<) at C:/Perl/lib/utf8_heavy.pl line 225.
 at C:/Perl/lib/utf8_heavy.pl line 225
    utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
    eval {...} called at unicode_printable.pl line 11
Use of uninitialized value $bits in numeric lt (<) at C:/Perl/lib/utf8_heavy.pl line 237.
 at C:/Perl/lib/utf8_heavy.pl line 237
    utf8::SWASHNEW('utf8', 'Print') called at unicode_printable.pl line 11
    eval {...} called at unicode_printable.pl line 11
main::(unicode_printable.pl:11):    unless( eval { $swash = utf8->SWASHNEW( $item ); 1; } ){
unicode_printable.pl is the file I saved your script as when I stepped through it. It does work. I like the trick where you pull the ranges from after __END__ with @ARGV, but I cannot find that documented anywhere in the perldocs. Maybe I'm not looking hard enough?
SSF

Replies are listed 'Best First'.
Re^2: expand unicode property (eg \p{Print}) to regex character class range
by Anonymous Monk on Aug 07, 2010 at 23:56 UTC
    See what ikegami said. Usage of program is
    perl fileyousaveditas.pl Propertyname propertyname propertyname
    For example (I'm omitting Print):
    $ perl unicode-regex-range.pl PerlSpace Title Bopo Dingbats
    PerlSpace => [\u0009-\u000A\u000C-\u000D\u0020]
    Title => [\u01C5\u01C8\u01CB\u01F2\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]
    Bopo => [\u3105-\u312D\u31A0-\u31B7]
    Dingbats => [\u2700-\u27BF]
    For a list of properties see perluniprops. This program works only for \w+ properties; it won't work for compound ones like Script: something or Block: something.

    I suppose Unicode::UCD ought to provide this functionality, or really JavaScript ought to provide \p{} and \P{} ...
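The script under discussion is not reproduced in this thread, so the following is only a rough sketch of how such an expansion could be done. It does not call utf8->SWASHNEW. Instead it brute-forces the Basic Multilingual Plane with an ordinary \p{} match and collapses consecutive matches into ranges. The helper name property_to_class is made up for illustration, and limiting the scan to U+0000..U+FFFF is an assumption based on the four-digit \uXXXX output format.

```perl
use strict;
use warnings;

# property_to_class is a hypothetical helper, not taken from the original script.
# It scans only U+0000..U+FFFF because \uXXXX escapes carry four hex digits.
sub property_to_class {
    my ($prop) = @_;

    # Same limitation as described in the thread: simple \w+ names only.
    die "Only simple \\w+ property names are handled: $prop\n"
        unless $prop =~ /\A\w+\z/;

    my $pattern = '\p{' . $prop . '}';
    my $re      = qr/$pattern/;    # an unknown property dies here or at first match

    no warnings 'utf8';            # surrogates and noncharacters are scanned too

    my @ranges;                    # list of [first, last] code point pairs
    for my $cp ( 0 .. 0xFFFF ) {
        next unless chr($cp) =~ $re;
        if ( @ranges && $ranges[-1][1] == $cp - 1 ) {
            $ranges[-1][1] = $cp;          # extend the current run
        }
        else {
            push @ranges, [ $cp, $cp ];    # start a new run
        }
    }

    # Render each run as \uXXXX or \uXXXX-\uXXXX.
    my $class = join '', map {
        my ( $lo, $hi ) = @$_;
        $lo == $hi
            ? sprintf( '\u%04X', $lo )
            : sprintf( '\u%04X-\u%04X', $lo, $hi );
    } @ranges;

    return "[$class]";
}

# Usage mirrors the thread: one property name per command-line argument.
print "$_ => ", property_to_class($_), "\n" for @ARGV;
```

The ranges printed for a given property depend on the Unicode version bundled with the perl that runs it, so they may not match the 2010 output quoted above.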

Re^2: expand unicode property (eg \p{Print}) to regex character class range
by ikegami (Patriarch) on Aug 07, 2010 at 23:10 UTC

    I like the trick where you pull the ranges from after __END__ with @ARGV

    He does no such thing. He simply placed the output he got after __END__ so you could see it without breaking the program.

    If you want what's after __END__, you can read it from the DATA file handle.
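As a small illustration of the DATA file handle (documented under "Special Literals" in perldata), the sketch below reads saved "Name => [class]" lines placed after __END__ in the top-level script. The helper name read_saved_classes and the choice to skip non-matching lines are assumptions made for this example; the original script does nothing of the kind.

```perl
use strict;
use warnings;

# read_saved_classes is a hypothetical helper, not part of the original script.
# It collects "Name => [class]" lines from any file handle.
sub read_saved_classes {
    my ($fh) = @_;
    my %class_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        # Skip anything that is not a "Name => [ ... ]" line.
        next unless $line =~ /^\s*(\w+)\s*=>\s*(\[.*\])\s*$/;
        $class_for{$1} = $2;
    }
    return %class_for;
}

# In the top-level script, DATA is positioned just after the __END__ line.
my %class_for = read_saved_classes( \*DATA );
print "$_ => $class_for{$_}\n" for sort keys %class_for;

__END__
PerlSpace => [\u0009-\u000A\u000C-\u000D\u0020]
Dingbats => [\u2700-\u27BF]
```

Nothing after __END__ is compiled, which is why output can be pasted there without breaking the program, and @ARGV plays no part in reading it.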
