PerlMonks  

(RFC) PostScript::Glyph::MapToUnicode - my first (intended-to-be) CPAN module

by Aristotle (Chancellor)
on Aug 18, 2003 at 22:46 UTC ( [id://284750]=perlmeditation )

This is a small utility module I wrote for my work. I've tried to make it reasonably DWIMmish and equip it with enough convenience features that it gets out of your face as quickly as possible.

#!/usr/bin/perl -w

=pod

head1 NAME

PostScript::Glyph::MapToUnicode - PostScript glyph name to Unicode conversion

=head1 SYNOPSIS

    use PostScript::Glyph::MapToUnicode file => '/usr/doc/PostScript/aglfn13.txt';

    print PostScript::Glyph::MapToUnicode::map('Euro'), "\n";

=head1 DESCRIPTION

This module implements (most of - see L</"BUGS">) the PostScript glyph name
to Unicode code point conversion algorithm described by Adobe at
L<http://partners.adobe.com/asn/tech/type/unicodegn.jsp>.

To do something more than marginally useful with this module you should
download the B<Adobe Glyph List> from
L<http://partners.adobe.com/asn/tech/type/aglfn13.txt>.

=head1 INTERFACE

=over 4

=item parse_adobeglyphlist()

This function parses an B<Adobe Glyph List> file and returns true on success.
On failure, it returns false and supplies an error message in the package
variable C<$ERROR>. It expects its first argument to specify how to retrieve
the data. The following options exist:

=over 4

=item file

Takes the name of a file containing the B<Adobe Glyph List>.

=item fh

Takes a filehandle reference that should be open on a file containing the
Adobe Glyph List.

=item array

Takes an array reference. Each array element is expected to contain one line
from the B<Adobe Glyph List>.

=item data

Takes a scalar that is expected to contain the entire B<Adobe Glyph List> file.

=back

For convenience, you can pass the same parameter to the module's C<import()>
function, as exemplified in L</"SYNOPSIS">. It will croak if it encounters
any errors.

=item map()

Takes a list of strings containing whitespace separated PostScript glyphs and
returns them concatenated as a single string in Unicode encoding. You may
want to memoize this function when processing large PostScript documents.

=back

=head1 BUGS

C<map()> does not take the font into account, so it will produce incorrect
results for glyphs from the B<ZapfDingbats> font.

=head1 AUTHOR

Aristotle Pagaltzis L<mailto:pagaltzis@gmx.de>

=head1 COPYRIGHT

This program is Copyright (c)2003 Aristotle Pagaltzis. All rights reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of either: a) the GNU General Public License as published by the
Free Software Foundation; either version 1, or (at your option) any later
version, or b) the "Artistic License" which comes with Perl.

=head1 DISCLAIMER

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See either the GNU General Public License or the
Artistic License for more details.

=cut

package PostScript::Glyph::MapToUnicode;

use strict;
use vars qw($ERROR);

my $uni_notation = qr{
    \A uni
    (
        (?:
            [0-9ABCEF] [\dA-F] {3}
            |
            D [0-7] [\dA-F] {2}
        )+
    )
    \z
}x;

my $u_notation = qr{
    \A u
    (
        [0-9ABCEF] [\dA-F] {3,5}
        |
        (?:
            D [0-7] [\dA-F] {2,3}
            |
            D [8-9A-F] [\dA-F] {3}
        )
    )
    \z
}x;

my %agl;

sub map {
    my $digits;
    return join '', map {
        exists $agl{$_} ? $agl{$_} :
        (($digits) = m/$uni_notation/) ? map { pack "U", hex } $digits =~ /(....)/g :
        (($digits) = m/$u_notation/) ? pack "U", hex $digits :
        do { '' };
    }
    map { split /_/ }
    map { /\A(.+?)\./ ? $1 : $_ }
    map { split }
    @_;
}

sub parse_adobeglyphlist {
    my $method = shift;

    my $data =
        $method eq 'array' ? do {
            my $array = shift;
            unless(ref $array eq 'ARRAY') {
                $ERROR = "Expected array reference in '$array'";
                return;
            }
            $array;
        } :
        $method eq 'data' ? [ split /^/m, shift ] :
        ($method eq 'file' or $method eq 'fh') ? do {
            my $fh = $method eq 'fh' ? shift : do {
                open my $fh, '<', shift or ($ERROR = "$!", return);
                $fh;
            };
            [ <$fh> ];
        } :
        ($ERROR = "Unknown parsing interface '$method'", return);

    %agl = do {
        @$data = grep !/\A (?: \# | \s* \z)/x, @$data;
        chomp @$data;
        map {
            my ($code_pt, $glyph) = split /;/;
            ($glyph => pack "U", hex $code_pt);
        } @$data;
    };

    delete $agl{'.notdef'};

    return 1;
}

sub import {
    shift;
    unless(&parse_adobeglyphlist) {
        require Carp;
        Carp::croak("Failed to parse AdobeGlyphList: $ERROR");
    }
}

1;
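To make the lookup order in map() easier to follow, here is a minimal standalone sketch of the same steps: drop any suffix after the first period, split on underscores, then resolve each component. It is not the module's code. The two-entry %agl table and the names components(), component_to_unicode() and glyphs_to_unicode() are made up for illustration, the uni pattern is simplified (no surrogate filtering), and the u notation is left out. It also shows the memoization the POD suggests, using the core Memoize module.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical two-entry stand-in for the parsed Adobe Glyph List.
my %agl = ( A => "A", Euro => "\x{20AC}" );

# Split a glyph name into the components that get looked up:
# keep only what precedes the first period, then split on underscores.
sub components {
    my ($name) = @_;
    my $base = $name =~ /\A(.+?)\./ ? $1 : $name;
    return split /_/, $base;
}

# Resolve one component: glyph list first, then the uniXXXX notation.
# Simplified assumption: any run of 4-digit uppercase hex groups is accepted.
sub component_to_unicode {
    my ($component) = @_;
    return $agl{$component} if exists $agl{$component};
    if ( my ($digits) = $component =~ /\Auni((?:[0-9A-F]{4})+)\z/ ) {
        return join '', map { chr hex $_ } $digits =~ /(....)/g;
    }
    return '';    # unknown components map to the empty string
}

# Accept whitespace separated glyph names, like the module's map().
sub glyphs_to_unicode {
    return join '',
        map { component_to_unicode($_) }
        map { components($_) }
        map { split ' ', $_ } @_;
}

# Cache results per argument list; worthwhile when the same glyph
# names recur many times in a large document.
memoize('glyphs_to_unicode');

printf "U+%04X\n", ord($_) for split //, glyphs_to_unicode('A_Euro.alt');
```

With the table above, the last line prints U+0041 and U+20AC, one per line.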

Not using any exports was a conscious choice. One big factor was that I absolutely despise the Exporter interface. Considering how little there is to possibly export in the first place, I don't want to introduce a dependency on a non-core exporter either. And lastly, writing my own import() gives me a nice opportunity to be convenient - however, it would be difficult to bend its semantics far enough sideways to allow the user to specify when s/he doesn't actually want to import anything (not that I could be bothered writing a sufficiently flexible exporter anyway).
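One related detail about a hand-written import(): a bare "use Module;" still calls import(), just with no arguments, while "use Module ();" skips the call entirely. A rough sketch of an import() that treats the no-argument case as "load nothing" follows. Demo::Glyphs, parse() and $LOADED are invented stand-ins, not part of the module above.

```perl
package Demo::Glyphs;    # hypothetical stand-in, not the real module
use strict;
use warnings;

our $LOADED = 0;

# Placeholder for the real parser; it only records that it was called.
sub parse {
    my ( $method, $source ) = @_;
    $LOADED = 1;
    return 1;
}

sub import {
    my $class = shift;
    return unless @_;    # bare "use Demo::Glyphs;" loads nothing
    unless ( parse(@_) ) {
        require Carp;
        Carp::croak("Failed to parse glyph list");
    }
}

package main;

Demo::Glyphs->import;                        # what "use Demo::Glyphs;" does
my $loaded_after_bare_use = $Demo::Glyphs::LOADED;

Demo::Glyphs->import( data => "0041;A\n" );  # what "use Demo::Glyphs data => ...;" does
my $loaded_after_args = $Demo::Glyphs::LOADED;
```

Here $loaded_after_bare_use stays 0 and $loaded_after_args becomes 1, so a caller who passes nothing gets the module without any parsing side effects.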

Other than that, I don't really have strong opinions on any of my choices.

What do people think of the name? The POD? The code?

Originally, I was going to distribute the glyph list as an appendix in the module's __DATA__, but its license is unclear, so I opted to tell users where to get it from and put the module under the same terms as Perl instead.

Also, I'm unsure whether the specification on the Adobe site should be interpreted such that it allows glyph names like u00D7FF, in which case my $u_notation would require far more contortions.
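One way to sidestep the regex contortions, sketched here under the lenient reading (leading zeros allowed, so u00D7FF is accepted), is to match only the shape of the name and check the range numerically. The function name u_notation_code_point() is made up for this example, and whether the lenient reading is what Adobe intends remains the open question.

```perl
use strict;
use warnings;

# Return the code point for a name of the form "u" + 4 to 6 uppercase
# hex digits, or undef if the name is malformed or the value is a
# surrogate or lies beyond U+10FFFF.
sub u_notation_code_point {
    my ($name) = @_;
    my ($digits) = $name =~ /\A u ([0-9A-F]{4,6}) \z/x
        or return;
    my $code_point = hex $digits;
    return if $code_point >= 0xD800 && $code_point <= 0xDFFF;    # surrogates
    return if $code_point > 0x10FFFF;                            # out of range
    return $code_point;
}

for my $name (qw(u20AC u00D7FF uD800 u110000)) {
    my $code_point = u_notation_code_point($name);
    printf "%-8s => %s\n", $name,
        defined $code_point ? sprintf( "U+%04X", $code_point ) : "rejected";
}
```

The pattern stays trivial and the surrogate and upper-bound rules live in two comparisons. Under a strict reading the only change would be an extra check that rejects superfluous leading zeros.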

Any comments would be gladly welcome.

Update: pack "U", $digits now correctly says pack "U", hex $digits. I need to write a test suite..

Makeshifts last the longest.

Replies are listed 'Best First'.
Re: (RFC) PostScript::Glyph::MapToUnicode - my first (intended-to-be) CPAN module
by PodMaster (Abbot) on Aug 19, 2003 at 09:24 UTC
    You've got a typo (head1 ne =head1). Also, I prefer =head(2|3|4) instead of =item for function listings ;)

    You're missing a $VERSION variable (but I assume you're gonna add this as soon as you create a distribution -- when you do, it'd be a good idea to include a LICENSE file containing the licenses -- ExtUtils::ModuleMaker ).

    I'd like to point out to you PostScript::ISOLatin9Encoding and PostScript::ISOLatin1Encoding, which are similar to your module (PostScript::UnicodeEncoding?).

    However you decide to deal with the actual glyph file, it might be a good idea to attempt to download it during make time, and append it to your module. If you choose to prompt the user, be sure to use ExtUtils::MakeMaker's prompt and not roll your own (because it detects whether Makefile.PL is being run interactively or not -- very important).
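A minimal sketch of that prompt, as it might appear in a Makefile.PL. The helper name wants_glyph_list_download() and the question text are invented for illustration, and the download step itself is left out. prompt() returns the default when PERL_MM_USE_DEFAULT is set or when there is no interactive terminal, which is what keeps automated installs from hanging.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker qw(prompt);

# Hypothetical helper: ask whether to fetch the glyph list, defaulting
# to "n" so unattended installs never attempt a download.
sub wants_glyph_list_download {
    my $answer = prompt( 'Download the Adobe Glyph List during the build?', 'n' );
    return $answer =~ /\A y/xi ? 1 : 0;
}

# In a real Makefile.PL this would be called before WriteMakefile(), e.g.
#   my $download = wants_glyph_list_download();
```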

    And on a final note, I'd like to suggest that you remove your email and make a note of http://rt.cpan.org like so.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      Thanks for the pointers.

      =head2 would indeed be more appropriate than =item. I don't know why that didn't occur to me.

      I know the PostScript encoding modules, but they don't do the same. They offer the reverse mapping only, and only as a static table. Due to the sheer mass of glyph names possible with the uni- and u-notation it is infeasible to do it that way for this module. Besides, I need the opposite mapping direction from what they do.

      I'll have to think some more about the glyph list. I'll probably put that off for the first revision of the module.

      Good point about CPAN's RT.

      Makeshifts last the longest.

Re: (RFC) PostScript::Glyph::MapToUnicode - my first (intended-to-be) CPAN module
by valdez (Monsignor) on Aug 19, 2003 at 00:01 UTC

    Nice module! May I suggest a feature? Add an option to the installation script to download and/or set the path for the glyph list. This way, you don't break any licence (the file is available to everyone) and installation can be automated.

    HTH, Valerio

      I've thought of that, but how do I integrate that with the CPAN.pm installation process when I want the default to be not to do it?

      Actually, thinking about it, I might provide an additional distribution whose installation always downloads the file, containing a stub module whose presence is detected by this one.

      Then a bundle which installs both distributions would solve things nicely. You either fire-and-forget CPAN.pm at the bundle or just get this module alone.
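Detecting such a stub at runtime could look roughly like this. The stub's name, PostScript::Glyph::MapToUnicode::GlyphList, is purely hypothetical, as is the helper module_available().

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical name for the stub that the companion distribution installs.
my $stub = 'PostScript::Glyph::MapToUnicode::GlyphList';

if ( module_available($stub) ) {
    # The stub could expose the location of the downloaded list here.
    print "Bundled glyph list found\n";
}
else {
    print "No bundled glyph list; pass file => ... yourself\n";
}
```

The main module keeps working either way, so installing it alone stays the default and the bundle only adds the convenience.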

      How does that sound?

      Makeshifts last the longest.

Node Type: perlmeditation [id://284750]
Approved by ybiC
Front-paged by hsmyers