Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^2: The unicode / utf8 struggle, part 2: regexes

by graff (Chancellor)
on May 17, 2007 at 18:59 UTC ( [id://616085]=note: print w/replies, xml ) Need Help??


in reply to Re: The unicode / utf8 struggle, part 2: regexes
in thread The unicode / utf8 struggle, part 2: regexes

And output should also be utf8 encoded unicode. Which it already is so I modified the step to skip the wrong encode step (new step 3) - am I doing it right now?

It would be easier to answer that if you showed us a relevant code snippet. And if you try the snippet yourself, that will probably answer the question. Check out this little unicode tool (shameless plug for a prog I posted recently), in case that helps to validate your data.

For the interested reader: in fact I use storable to serialize my resulting data structure as whole, then I gzip the freeze'd data and write it to disk with a simple binmode (and thus not :utf8) filehandle. Any problems here? utf8 data and utf8-flag should stay intact over the pipeline.

The utf8 flag is strictly a perl-internal attribute of scalar values. Once data is written to any sort of file (including any pipe), it's just data, and what happens to it after that point depends on what sort of process is reading it, and how that process chooses to interpret what is being read.

There is a section of the Storable man page about utf8 (under the heading "FORWARD COMPATIBILITY"), which you should consult. It looks like it will "do the right thing" for you by default (retain the utf8 flag as part of the "freeze"d data structure so that a downstream "thaw" gets it), but it'll be worth testing to be sure. (I haven't used it, so I don't know.)

  • Comment on Re^2: The unicode / utf8 struggle, part 2: regexes

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://616085]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (4)
As of 2024-04-26 00:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found