PerlMonks  

Re: Regular expressions and accents

by graff (Chancellor)
on Dec 22, 2004 at 04:31 UTC (#416690)


in reply to Regular expressions and accents

Larry Wall recently posted this nifty little script on the perl-unicode mail list -- here it is, pretty much verbatim (I added the "S" on the shebang line, to make STDIN/STDOUT/STDERR be utf8):
#!/usr/bin/perl -CS

$pat = shift;
if (ord $pat > 256) {
    $pat = sprintf("%04x", ord $pat);
}
elsif (ord $pat > 128) {     # arg in sneaky UTF-8
    $pat = sprintf("%04x", unpack("U0U",$pat));
}

@names = split /^/, do 'unicore/Name.pl';
for (@names) {
    if (/$pat/io) {
        $hex = hex($_);
        print chr($hex),"\t",$_;
    }
}
The idea is to output a list of Unicode code points (if any) whose entries in the character-name table match the expression you put into $ARGV[0] -- here's a relevant command-line usage example (Larry had this script in a file named "uni"):
uni "latin (?:small|capital) letter A with"
(update: if you try this, you'll want to be running in a terminal window that handles utf8 characters!)

So, all you need for what you want is the part that assigns the output of "unicore/Name.pl" to an array -- this gives you the names from the Unicode character database -- and grep through the array to get the set of vowels you want. Then, convert the first token of each matching element (the hex code point at the start of the line) to a character, and put those characters into a character-class expression. Something like:

my @names = split /^/, do 'unicore/Name.pl';
#...
my @vowelsets;
for my $v ( qw/A E I O U/ ) {
    # \b stops "A" from also matching names such as "... LETTER AE"
    push( @vowelsets,
          join( '',
                map  { chr hex( substr $_, 0, 4 ) }
                grep /LATIN (?:SMALL|CAPITAL) LETTER $v\b/, @names ));
}
# now you can use each element of @vowelsets as a character class
# (similarly for consonants...)
(updated this snippet: changed the map block from a regex to substr; updated a second time to use "chr hex()" in the map block -- each element of @names begins with a four-digit hex code-point value, which needs to be converted to a character.)

Still a bit cumbersome, I suppose, but quite manageable and not that bulky.
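
For what it's worth, here is a minimal sketch of how such a character class might then be used in a match. It is not from the original post and it swaps in a different lookup: instead of reading unicore/Name.pl directly (an internal file whose layout may differ between Perl versions), it asks the core charnames module for each name via charnames::viacode. The helper name letter_class and the scan range 0x0041..0x024F are assumptions made for illustration, and the // operator means it needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ();    # core module; provides charnames::viacode

# Hypothetical helper: build a bracketed character class for one base letter.
# The scan range (Basic Latin through Latin Extended-B) is an assumption made
# for this sketch; widen it if your text uses characters from other blocks.
sub letter_class {
    my ($letter) = @_;
    my $chars = join '',
        map  { chr($_) }
        grep { ( charnames::viacode($_) // '' )
                   =~ /^LATIN (?:SMALL|CAPITAL) LETTER $letter\b/ }
        0x0041 .. 0x024F;
    return "[$chars]";
}

# One class per vowel, keyed by the unaccented capital letter.
my %class = map { $_ => letter_class($_) } qw/A E I O U/;

# "caf\x{e9}" is "cafe" with e-acute, written as an escape so this file
# stays pure ASCII.
my @words = ( "cafe", "caf\x{e9}", "cufe" );
for my $i ( 0 .. $#words ) {
    my $hit = $words[$i] =~ /^c$class{A}f$class{E}$/
            ? "matches"
            : "does not match";
    print "word $i $hit\n";
}
```

The first two words should match and the third should not, since "u" is in neither the A class nor the E class. Interpolating the classes into a larger pattern this way keeps the accent handling in one place; the same helper could be called for consonants.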
