User since: Dec 27, 2010 at 04:38 UTC (12 years ago)
Last here: May 04, 2017 at 08:04 UTC (5 years ago)
Experience: 1475
Level:Hermit (10)
Writeups: 233
User's localtime: Sep 27, 2022 at 03:16 JST

hello monks.

Documents for Unicode
perlunitut 6 pages Very short overview of Unicode in Perl, plus a FAQ.
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) 8 pages About character sets, code pages, and Unicode itself. Short history of Unicode.
perluniintro 12 pages This is the first thing to read (I think).
Character Encodings in Perl 7 pages All-in-one doc for encoding. Written by a German author.
perlunicode 20 pages Main document for Perl's Unicode support. Thorough and precise, but too much for a beginner.
Perl Programming/Unicode UTF-8 15 pages This document explains Perl's internal encodings (N8CS, utf-8) and also describes other problems. If you have stumbled over the 0x80-0xFF problem, this document explains the reason.
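
The 0x80-0xFF problem mentioned in the last entry can be sketched as follows. This is a minimal illustration, not taken from any of the linked documents: the variable names are made up, and the last block assumes Perl 5.14 or later, where use feature 'unicode_strings' also affects regex matching.

```perl
use strict;
use warnings;

# A one-character string holding code point 0xE9 (e with acute).
# It starts out in Perl's internal single-byte representation.
my $s = "\xE9";
my $before = $s =~ /\w/ ? 1 : 0;   # byte semantics: not a word character

# Changing only the internal representation changes the answer.
utf8::upgrade($s);
my $after = $s =~ /\w/ ? 1 : 0;    # Unicode semantics: a word character

# With the feature enabled, the internal representation no longer matters.
my $fixed;
{
    use feature 'unicode_strings';
    my $t = "\xE9";
    $fixed = $t =~ /\w/ ? 1 : 0;
}

print "before=$before after=$after fixed=$fixed\n";
```

The same character gives different answers depending only on how Perl stores the string internally, which is why the behavior is so surprising.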
Others for Unicode
ikegami explains use feature 'unicode_strings' 1 page It exists to fix a bug.
Unicode::UCD 19 pages Unicode Character Database. Script, block, and properties of a Unicode character.
perluniprops 49 pages Reference for the character properties that can be used in constructs such as \p{Greek}.
\p{Print} to code points ... not yet read
Unicode support in perlguts not yet read
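
A property such as \p{Greek} from perluniprops can be used directly in a pattern. The following is a small sketch with a made-up sample string; it counts the characters that belong to the Greek script.

```perl
use strict;
use warnings;

# Sample text (hypothetical): ASCII words around three Greek letters
# (alpha, beta, gamma) written as code point escapes.
my $text = "alpha \x{3B1}\x{3B2}\x{3B3} beta";

# Count the matches by assigning the list of matches to an empty list
# and taking that assignment in scalar context.
my $greek_count = () = $text =~ /\p{Greek}/g;

print "Greek characters: $greek_count\n";
```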

regex memo

Replace the nth occurence
\K works like a zero-width lookbehind: whatever is matched to the left of \K is kept in the string and excluded from $&.

my $nth = 4 - 1;    # replace the 4th ',' with '|'
my $str = 'a,bb,ccc,dddd,eeeee,ffffff';
$str =~ s{ (?: , [^,]* ){$nth} \K , }{|}xms;

Error in my Regular expression pattern
pos() advances each time a /g match succeeds in scalar context. To reset it, use pos($_) = undef;
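
A minimal sketch of this behavior, using a made-up string in $s rather than $_:

```perl
use strict;
use warnings;

my $s = 'aXbXc';

# Each successful scalar-context /g match moves pos() past the match.
my $ok = $s =~ /X/g;
my $first = pos($s);        # offset just after the first X

$ok = $s =~ /X/g;
my $second = pos($s);       # offset just after the second X

# Resetting pos() makes the next /g match start from the beginning again.
pos($s) = undef;
$ok = $s =~ /X/g;
my $after_reset = pos($s);

print "first=$first second=$second after_reset=$after_reset\n";
```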

RegEx related line split
A good example of zero-width lookahead; it acts like a placeholder.
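
The placeholder idea can be sketched with split. This is not the example from the linked thread; the input string is made up. Because the lookahead consumes no characters, split cuts at the position in front of each capital letter and nothing is lost.

```perl
use strict;
use warnings;

my $line = 'fooBarBaz';

# (?=[A-Z]) matches the empty position before a capital letter,
# so the capital letter itself stays in the following field.
my @parts = split /(?=[A-Z])/, $line;

print join('|', @parts), "\n";
```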

regex: negative lookahead
Negative lookahead
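
A small sketch of negative lookahead, with a made-up word list: (?!bar) succeeds only where "bar" does not follow, so "foo" is matched unless it is immediately followed by "bar".

```perl
use strict;
use warnings;

my @words = qw(foobar foobaz food);

# Keep the words in which "foo" is NOT directly followed by "bar".
my @kept = grep { /foo(?!bar)/ } @words;

print "@kept\n";
```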

Perl Regex Repeating Patterns
Regex Repeating Patterns, \G anchor
regexp: removing extra whitespace
Why do these regex variants behave as they do?
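
The \G anchor noted under "Perl Regex Repeating Patterns" is often used to walk through a string piece by piece. The following is a minimal tokenizer sketch, not code from that thread; the input and the WORD/NUM labels are made up. The /c flag keeps pos() in place when one alternative fails, so the next pattern can try at the same position.

```perl
use strict;
use warnings;

my $input = 'ab12cd';
my @tokens;

pos($input) = 0;    # start scanning at the beginning
while (pos($input) < length $input) {
    if ($input =~ /\G([a-z]+)/gc) {
        push @tokens, "WORD:$1";    # a run of lowercase letters
    }
    elsif ($input =~ /\G(\d+)/gc) {
        push @tokens, "NUM:$1";     # a run of digits
    }
    else {
        die "unexpected character at offset " . pos($input);
    }
}

print join(' ', @tokens), "\n";
```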

Re^3: Retain first 4 characters of a string Various ways of turning "Apple iPhone 4 Black Cover" into "Appl-iPho-4-Blac-Cove" (space-separated words cut to at most 4 letters and joined with hyphens).
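
Two possible ways to do this are sketched below. They are my own illustrations and not necessarily the ones from the thread.

```perl
use strict;
use warnings;

my $title = 'Apple iPhone 4 Black Cover';

# Way 1: split on whitespace, cut each word with substr, join with hyphens.
my $by_substr = join '-', map { substr $_, 0, 4 } split ' ', $title;

# Way 2: a regex that captures up to 4 word characters at each word start.
my $by_regex = join '-', $title =~ /\b(\w{1,4})/g;

print "$by_substr\n$by_regex\n";
```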

Limiting number of regex matches three dogs of marshall