This is a small utility module I wrote for my work. I've tried to make it reasonably DWIMmish and equip it with enough convenience features that it gets out of your face as quickly as possible.

#!/usr/bin/perl -w

=pod

=head1 NAME

PostScript::Glyph::MapToUnicode - PostScript glyph name to Unicode conversion

=head1 SYNOPSIS

    use PostScript::Glyph::MapToUnicode file => '/usr/doc/PostScript/aglfn13.txt';
    print PostScript::Glyph::MapToUnicode::map('Euro'), "\n";

=head1 DESCRIPTION

This module implements (most of - see L</"BUGS">) the PostScript glyph name
to Unicode code point conversion algorithm described by Adobe at L<>.

To do something more than marginally useful with this module you should
download the B<Adobe Glyph List> from L<>.

=head1 INTERFACE

=over 4

=item parse_adobeglyphlist()

This function parses an B<Adobe Glyph List> file and returns true on success.
On failure, it returns false and supplies an error message in the package
variable C<$ERROR>. It expects its first argument to specify how to retrieve
the data. The following options exist:

=over 4

=item file

Takes the name of a file containing the B<Adobe Glyph List>.

=item fh

Takes a filehandle reference that should be open on a file containing the
Adobe Glyph List.

=item array

Takes an array reference. Each array element is expected to contain one line
from the B<Adobe Glyph List>.

=item data

Takes a scalar that is expected to contain the entire B<Adobe Glyph List> file.

=back

For convenience, you can pass the same parameter to the module's C<import()>
function, as exemplified in L</"SYNOPSIS">. It will croak if it encounters any
errors.

=item map()

Takes a list of strings containing whitespace separated PostScript glyphs and
returns them concatenated as a single string in Unicode encoding. You may want
to memoize this function when processing large PostScript documents.

=back

=head1 BUGS

C<map()> does not take the font into account, so it will produce incorrect
results for glyphs from the B<ZapfDingbats> font.

=head1 AUTHOR

Aristotle Pagaltzis L<>

=head1 COPYRIGHT

This program is Copyright (c)2003 Aristotle Pagaltzis. All rights reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of either:

a) the GNU General Public License as published by the Free Software Foundation;
either version 1, or (at your option) any later version, or

b) the "Artistic License" which comes with Perl.

=head1 DISCLAIMER

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See either the GNU General Public License or the
Artistic License for more details.

=cut

package PostScript::Glyph::MapToUnicode;

use strict;
use vars qw($ERROR);

my $uni_notation = qr{
    \A uni
    (
        (?:
            [0-9ABCEF] [\dA-F] {3}
            |
            D [0-7] [\dA-F] {2}
        )+
    )
    \z
}x;

my $u_notation = qr{
    \A u
    (
        [0-9ABCEF] [\dA-F] {3,5}
        |
        (?:
            D [0-7] [\dA-F] {2,3}
            |
            D [8-9A-F] [\dA-F] {3}
        )
    )
    \z
}x;

my %agl;

sub map {
    my $digits;
    return join '', map {
        exists $agl{$_} ? $agl{$_} :
        (($digits) = m/$uni_notation/) ? map { pack "U", hex } $digits =~ /(....)/g :
        (($digits) = m/$u_notation/) ? pack "U", hex $digits :
        do { '' };
    }
    map { split /_/ }
    map { /\A(.+?)\./ ? $1 : $_ }
    map { split }
    @_;
}

sub parse_adobeglyphlist {
    my $method = shift;

    my $data =
        $method eq 'array' ? do {
            my $array = shift;
            unless(ref $array eq 'ARRAY') {
                $ERROR = "Expected array reference in '$array'";
                return;
            }
            $array;
        } :
        $method eq 'data' ? [ split /^/m, shift ] :
        ($method eq 'file' or $method eq 'fh') ? do {
            my $fh = $method eq 'fh' ? shift : do {
                open my $fh, '<', shift or ($ERROR = "$!", return);
                $fh;
            };
            [ <$fh> ];
        } :
        ($ERROR = "Unknown parsing interface '$method'", return);

    %agl = do {
        @$data = grep !/\A (?: \# | \s* \z)/x, @$data;
        chomp @$data;
        map {
            my ($code_pt, $glyph) = split /;/;
            ($glyph => pack "U", hex $code_pt);
        } @$data;
    };

    delete $agl{'.notdef'};

    return 1;
}

sub import {
    shift;
    unless(&parse_adobeglyphlist) {
        require Carp;
        Carp::croak("Failed to parse AdobeGlyphList: $ERROR");
    }
}

1;
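To make the order of operations in map() easier to follow, here is a minimal standalone sketch of its stages. It is not the module itself: the three-entry %agl table and the helper names components() and decode() are hypothetical stand-ins, only the uni notation fallback is shown, and the simplified pattern does not exclude surrogates the way $uni_notation does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the table that parse_adobeglyphlist() builds.
my %agl = ( Euro => "\x{20AC}", f => "f", i => "i" );

# Preprocessing, in the same order as map(): split on whitespace, keep only
# the part before the first period, then split ligature names on underscores.
sub components {
    return map { split /_/ }
           map { /\A(.+?)\./ ? $1 : $_ }
           map { split } @_;
}

# Lookup (simplified): the table wins; otherwise "uniXXXX..." is decoded in
# groups of four hex digits; anything else becomes the empty string.
sub decode {
    my ($part) = @_;
    return $agl{$part} if exists $agl{$part};
    if ( my ($digits) = $part =~ /\A uni ( (?: [0-9A-F]{4} )+ ) \z/x ) {
        return join '', map { pack "U", hex } $digits =~ /(....)/g;
    }
    return '';
}

my @parts = components('f_f_i.small uni20AC0308 Euro');
print join(',', @parts), "\n";    # prints f,f,i,uni20AC0308,Euro
```

Here decode('uni20AC0308') yields the two characters U+20AC and U+0308, and a name that is neither in the table nor in uni notation yields an empty string, which matches the fall-through branch of the posted map().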

Not using any exports was a conscious choice. One big factor was that I absolutely despise the Exporter interface. Considering how little there is to possibly export in the first place, I don't want to introduce a dependency on a non-core exporter either. And lastly, writing my own import() gives me a nice opportunity to be convenient - however, it would be difficult to bend its semantics far enough sideways to allow the user to specify when s/he doesn't actually want to import anything (not that I could be bothered writing a sufficiently flexible exporter anyway).
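For readers who have not written an import() by hand, the mechanism can be sketched as follows. This is an illustration under assumptions, not the module above: the package My::GlyphTable and its load() and lookup() functions are hypothetical, and the early return for an empty argument list is a variation the posted import() does not have.

```perl
use strict;
use warnings;

package My::GlyphTable;    # hypothetical package, for illustration only

our $ERROR;
my %table;

# Hypothetical loader: true on success, false with $ERROR set otherwise.
sub load {
    my ($method, $arg) = @_;
    unless (defined $method and $method eq 'array' and ref $arg eq 'ARRAY') {
        $ERROR = "Unsupported arguments";
        return;
    }
    %table = map { my ($code, $name) = split /;/; ($name => $code) } @$arg;
    return 1;
}

sub lookup { $table{ $_[0] } }

# import() receives the class name followed by whatever was written after
# the module name on the "use" line. Nothing is exported into the caller.
sub import {
    my $class = shift;
    return unless @_;    # variation: a bare "use" loads nothing
    load(@_) or do {
        require Carp;
        Carp::croak("Failed to load glyph table: $ERROR");
    };
}

package main;

# What "use My::GlyphTable array => [...]" would trigger at compile time.
My::GlyphTable->import(array => ['20AC;Euro']);
print My::GlyphTable::lookup('Euro'), "\n";    # prints 20AC
```

Since nothing is exported, callers always use the fully qualified function name, as in the SYNOPSIS. Writing `use Module ();` skips import() entirely, which is one existing way to load a module without handing it any arguments.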

Other than that, I don't really have strong opinions on any of my choices.

What do people think of the name? The POD? The code?

Originally, I was going to distribute the glyph list as an appendix in the module's __DATA__, but its license is unclear, so I opted to tell users where to get it from and put the module under the same terms as Perl instead.

Also, I'm unsure whether the specification on the Adobe site should be interpreted such that it allows glyph names like u00D7FF, in which case my $u_notation would require far more contortions.
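One way to sidestep the contortions is to let the pattern check only the shape of the name and to test the numeric value separately. The sketch below assumes the specification means 4 to 6 uppercase hex digits whose value is at most 10FFFF and outside the surrogate range D800 to DFFF; that reading, and the function name decode_u_notation(), are assumptions rather than anything confirmed by the post. Read literally, the posted pattern appears to accept u00D7FF already, but also u00D800 and uFFFFFF, which a numeric check rejects.

```perl
use strict;
use warnings;

# Decode a "u" + 4..6 hex digit glyph name. Returns the character, or
# undef (empty list in list context) if the name has a different shape
# or names a surrogate or out-of-range code point.
sub decode_u_notation {
    my ($name) = @_;
    my ($digits) = $name =~ /\A u ( [0-9A-F]{4,6} ) \z/x or return;
    my $cp = hex $digits;
    return if $cp >= 0xD800 && $cp <= 0xDFFF;    # surrogate range
    return if $cp > 0x10FFFF;                    # beyond Unicode
    return pack "U", $cp;
}

for my $name (qw(u20AC u00D7FF u00D800 uD8000 u110000)) {
    my $char = decode_u_notation($name);
    printf "%-8s %s\n", $name, defined $char ? sprintf('U+%04X', ord $char) : 'rejected';
}
```

With this approach leading zeros need no special handling, because the range test works on the value rather than on the digit string.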

Any comments are welcome.

Update: the code that said pack "U", $digits now correctly says pack "U", hex $digits. I need to write a test suite.

Makeshifts last the longest.

In reply to (RFC) PostScript::Glyph::MapToUnicode - my first (intended-to-be) CPAN module by Aristotle
