http://www.perlmonks.org?node_id=742338

CColin has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to parse some HTML with TreeBuilder, which returns the fractional values '1/2', '1/4', '3/4', and I would like to represent these as decimals. I hoped I could go into HTML::Entities, see how the HTML values are mapped to fractions, and change the mapping to decimals. I can see in the source that the HTML entities are converted to the following "frac" values (and vice versa):
frac14 => chr(188), frac12 => chr(189), frac34 => chr(190),
But I can't see how the "frac" values are converted to display fractions, or how I can convert these fractions to decimals instead. Ideally I want frac14, frac12 and frac34 to convert to the values
.25, .5, .75
respectively. I suppose I could ignore the modules and try to change all the HTML with regexes, but that rather misses the point and makes everything else harder to maintain. Thanks for any guidance.

Re: frac12 to decimal
by karoshi (Novice) on Feb 09, 2009 at 03:52 UTC

    Behold, the hash holding those is declared with "use vars":

    use vars qw(%entity2char %char2entity);

    So, after that package has been loaded and compiled, you could reach into it and change just the values of those keys:

    # switch package
    package HTML::Entities;
    $entity2char{frac14} = '.25';
    $entity2char{frac12} = '.5';
    $entity2char{frac34} = '.75';
    # switch back to your package
    package main;
    # go on with your code
      Hi, thanks. I guess that answers the question of how to change the behaviour of the module without changing the module itself, but this particular code did not bring about the desired behaviour. That is still the piece I can't figure out. The hash in the module:
      %entity2char
      Contains the following key value pairs that seem to govern this behaviour.
      frac14 => chr(188), frac12 => chr(189), frac34 => chr(190),
      Presumably chr(188) and its brethren are taken straight from the HTML, and the fracNN entity to which each is mapped is then somehow translated for display as the tiny one-character-wide fraction symbol. But stating:
      # switch package
      package HTML::Entities;
      $entity2char{frac14} = '.25';
      $entity2char{frac12} = '.5';
      $entity2char{frac34} = '.75';
      # switch back to your package
      package main;
      # go on with your code
      Does not seem to change that behaviour?

        From HTML/Entities.pm:

        # Make the opposite mapping
        while (my($entity, $char) = each(%entity2char)) {
            $entity =~ s/;\z//;
            $char2entity{$char} = "&$entity;";
        }

        So try changing %char2entity as well:

        package HTML::Entities;
        $char2entity{'.25'} = '&frac14;';
        $char2entity{'.5'}  = '&frac12;';
        $char2entity{'.75'} = '&frac34;';
        package main;
        ...

        or post some code to look at.
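To illustrate why the reverse table needs its own update, here is a minimal sketch that uses only core Perl. The two lexical hashes are stand-ins for a small slice of the module's tables, not the real %HTML::Entities::entity2char and %HTML::Entities::char2entity; the loop is the one quoted above. It suggests that the opposite mapping is built once, so a later change to the forward table is not reflected in it.

```perl
use strict;
use warnings;

# Stand-in for a slice of the module's forward table (assumption: not the real hash).
my %entity2char = (
    frac14 => chr(188),
    frac12 => chr(189),
    frac34 => chr(190),
);

# Build the opposite mapping once, the same way the quoted loop does.
my %char2entity;
while (my ($entity, $char) = each %entity2char) {
    $entity =~ s/;\z//;
    $char2entity{$char} = "&$entity;";
}

# Change the forward table afterwards; the reverse table is left as it was.
$entity2char{frac14} = '.25';

print exists $char2entity{'.25'}
    ? "reverse table has an entry for .25\n"
    : "reverse table has no entry for .25\n";
print "chr(188) still maps to $char2entity{ chr(188) }\n";
```

Whether this matters for a given program depends on whether it encodes as well as decodes; the sketch only shows that the two tables are independent once built.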

        &frac14; is an HTML entity. It maps directly to chr(188)/U+00BC, which in ISO-8859-1 is "vulgar fraction one quarter", and any program that knows how to display ISO-8859-1 will draw it on the screen as ¼
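As a quick check of that claim, the following sketch asks Perl for the Unicode names of the three code points mentioned in the thread. It relies on the core charnames module; the variable names are only for illustration.

```perl
use strict;
use warnings;
use charnames ();

# Code points of the three Latin-1 fraction characters named in the thread.
my @code_points = (188, 189, 190);

# Look up the Unicode name of each code point (viacode accepts a decimal number).
my %name_for = map { $_ => charnames::viacode($_) } @code_points;

for my $cp (@code_points) {
    printf "chr(%d) = U+%04X = %s\n", $cp, $cp, $name_for{$cp};
}
```

The first line printed should name chr(188) as VULGAR FRACTION ONE QUARTER, which is the single character the browser draws, not the three characters "1/4".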
Re: frac12 to decimal
by kennethk (Abbot) on Feb 09, 2009 at 03:47 UTC
    Any time you see chr, you are dealing with a character code in Perl's internal character set. The display values are therefore not 1/2, etc., but a single character such as ½ that looks similar. To convert to a decimal, you need to associate that fraction character with the appropriate decimal value. The inverse function of chr is ord, so you could go through the Unicode character set (Unicode tables) to find all the appropriate values. I would recommend checking the CPAN modules HTML::Fraction and String::Fraction for some mappings you could "borrow".
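A minimal sketch of that association, assuming the text has already been extracted from the tree and contains the fraction characters themselves. The table %decimal_for and the helper fractions_to_decimals are hypothetical names invented for this example; they are not part of HTML::Entities, HTML::Fraction or String::Fraction, and the table covers only the three characters from the question.

```perl
use strict;
use warnings;

# Hypothetical lookup table: fraction character => decimal string.
my %decimal_for = (
    chr(188) => '.25',   # one quarter
    chr(189) => '.5',    # one half
    chr(190) => '.75',   # three quarters
);

# Hypothetical helper: replace each of the three fraction characters in a string.
sub fractions_to_decimals {
    my ($text) = @_;
    $text =~ s/([\x{BC}-\x{BE}])/$decimal_for{$1}/g;
    return $text;
}

my $converted = fractions_to_decimals('Width: 2' . chr(189) . ' in');
print "$converted\n";
```

Doing the substitution on the extracted text leaves the module's tables untouched, so other entity handling keeps working as before.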