Re: Creating (and using) a custom encoding. (SOLUTION)

by davido (Cardinal)
on May 31, 2013 at 17:01 UTC ( [id://1036270] )


in reply to Creating (and using) a custom encoding.

It turns out that most of the confusion was due to File::Slurp RT#84918, "File-Slurp: read_file() ignores binmode option for short files", submitted by our friend corion. If only I had suspected File::Slurp earlier, I could have saved myself (and others) some time.

Here's a complete working example. Note that you must binmode the filehandle with ":encoding(rot13)", not the terser ":rot13", which simply won't work: ":rot13" would name a PerlIO layer called "rot13", and no such layer exists. Also, there's no need to explicitly call define_encoding from Encode within the calling package; the line __PACKAGE__->Define('rot13'); does that for us.

package Encode::ROT13;
use strict;
use warnings;
use parent qw( Encode::Encoding );

sub encode($$;$){
    my( $obj, $str, $chk ) = @_;
    $str =~ tr/A-Za-z/N-ZA-Mn-za-m/;
    $_[1] = '' if $chk; # $_[1] is aliased through the call. In-place edit.
                        # (Remove whole string unless there's an error.)
    return $str;
}

no warnings 'once';
*decode = \&encode; # Because rot13( rot13() ) is a round-trip.

__PACKAGE__->Define( 'rot13' );

1;

package main;
use strict;
use warnings;

binmode \*DATA, ':encoding(rot13)';
chomp( my @words = <DATA> );
print "$_\n" for @words;

__DATA__
Apple
cat
dog
strawberry
watermelon

...and the output...

Nccyr
png
qbt
fgenjoreel
jngrezryba
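
Once the encoding is registered, it should also be reachable through Encode's functional interface, not only through a PerlIO layer. What follows is a minimal sketch, assuming a trimmed-down copy of the class above: the $chk handling is left out for brevity, so treat it as an illustration rather than a drop-in replacement.

```perl
package Encode::ROT13;
use strict;
use warnings;
use parent qw( Encode::Encoding );

# Simplified for illustration: the $chk argument is accepted but ignored.
sub encode {
    my ( $obj, $str, $chk ) = @_;
    $str =~ tr/A-Za-z/N-ZA-Mn-za-m/;
    return $str;
}

no warnings 'once';
*decode = \&encode;    # rot13 is its own inverse

__PACKAGE__->Define('rot13');

package main;
use strict;
use warnings;
use Encode qw( encode decode find_encoding );

# Look the encoding up by name; this returns the object registered by Define().
my $enc = find_encoding('rot13');
print ref($enc), "\n";    # Encode::ROT13

# Call the encoding directly, without any filehandle involved.
my $scrambled = encode( 'rot13', 'strawberry' );
my $restored  = decode( 'rot13', $scrambled );
print "$scrambled\n";     # fgenjoreel
print "$restored\n";      # strawberry
```

Checking find_encoding first is a cheap way to confirm that Define really registered the name before blaming the I/O layer (or File::Slurp) for unexpected results.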

...now on to learn how to use the enc2xs tool.


Dave

Replies are listed 'Best First'.
Re^2: Creating (and using) a custom encoding. (SOLUTION)
by graff (Chancellor) on Jun 01, 2013 at 02:38 UTC
    In case you haven't gotten all the way yet with enc2xs, the only "hard" part is to build the appropriate "Unicode Character Map" (ucm) file to describe the relationship between Unicode and your specialized character encoding.

    In case it helps, you might want to look at Encode::Buckwalter, which includes a ucm file to define a specialized ASCII "alphabet" for transliterating Arabic characters. It's fairly simple, except that some character relations only work in one direction (e.g. when going from Unicode to "Buckwalter Transliteration", U+0030 and U+0660 will both map to ASCII "0", but when going from transliteration to Unicode, ASCII "0" will only map to U+0030, and likewise for other digits).
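
The one-way relations described above can be illustrated without a ucm file at all. The sketch below is not how Encode::Buckwalter or enc2xs work internally; it uses two plain Perl hashes (the names %to_translit and %to_unicode are made up for this example) covering only the digit zero, just to show why the round trip is lossy.

```perl
use strict;
use warnings;

# Unicode -> transliteration: two code points collapse to the same ASCII digit.
my %to_translit = (
    "\x{0030}" => '0',    # DIGIT ZERO
    "\x{0660}" => '0',    # ARABIC-INDIC DIGIT ZERO
);

# Transliteration -> Unicode: ASCII "0" maps back to U+0030 only.
my %to_unicode = ( '0' => "\x{0030}" );

for my $char ( "\x{0030}", "\x{0660}" ) {
    my $ascii = $to_translit{$char};
    my $back  = $to_unicode{$ascii};
    printf "U+%04X -> '%s' -> U+%04X (%s)\n",
        ord $char, $ascii, ord $back,
        $back eq $char ? 'round-trips' : 'does not round-trip';
}
```

U+0030 survives the trip, while U+0660 comes back as U+0030, which is the asymmetry a ucm file has to record for each such pair.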

      Thanks. I appreciate the links.

      What first motivated this investigation was a quest for alternatives to automatically apply fold case (fc) to an incoming file. I'm well aware that this is a road less traveled. Certainly it violates "the principle of least surprise", and as such I wouldn't consider it for production code. But it's been an interesting investigation so far. :)


      Dave
