http://www.perlmonks.org?node_id=848781

punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Mercantile Monks,

Could someone please give me a clue on using the decode_entities() function of the HTML::Entities module?

I just want to decode &amp; to & and &#59; to ;

Based on the documentation's example, I've tried many variations on:

decode_entities($mystring, { amp => "&", 59 => ";" }, 0);
but it keeps decoding characters I don't want decoded, like apostrophes.

Also, the docs show it as having a leading underscore (_decode_entities), but it seems to work without.

Thanks




Time flies like an arrow. Fruit flies like a banana.

Replies are listed 'Best First'.
Re: Help with HTML::Entities - decode_entities
by ikegami (Patriarch) on Jul 09, 2010 at 00:23 UTC

    Even if it were possible to do what you want with HTML::Entities (and it looks like you can't), it would make the most sense to just use

    $mystring =~ s/&#59;/;/g;
    $mystring =~ s/&amp;/&/g;

    That said, decoding "&amp;" without decoding all other entities creates garbage. For example,

    &amp;eacute;

    and

    &eacute;

    are clearly not equivalent, but they produce the same string under your requirements.
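A minimal sketch of that ambiguity, using only the two substitutions above and core Perl. The sub name `decode_amp_and_semicolon` is made up for illustration and is not part of HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical helper: applies only the two substitutions discussed above.
sub decode_amp_and_semicolon {
    my ($s) = @_;
    $s =~ s/&#59;/;/g;    # numeric entity for ';'
    $s =~ s/&amp;/&/g;    # named entity for '&'
    return $s;
}

# A double-encoded e-acute and a singly encoded one.
my $double = decode_amp_and_semicolon('&amp;eacute;');
my $single = decode_amp_and_semicolon('&eacute;');

print "$double\n";    # &eacute;
print "$single\n";    # &eacute;
print $double eq $single ? "same\n" : "different\n";    # same
```

Once both inputs have collapsed to the same string, nothing downstream can tell which one was meant as a literal "&eacute;" and which as an accented character.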

      You've inferred the problem I'm trying to solve. Recursive use of encode_entities has filled a db with entries like &amp;eacute;, which display, as you say, like garbage in the client. Was hoping to just decode them before output, but we'll go regex.

      Thanks again.




Re: Help with HTML::Entities - decode_entities
by Your Mother (Archbishop) on Jul 08, 2010 at 23:46 UTC

    I think the leading underscore is necessary--best not to second guess the documents when you're having problems--and it seems to do it in place so you have to check the variable, not the return from the function.

    perl -MHTML::Entities -le '$s = "&Ouml;H &amp; HA&Igrave;"; _decode_entities($s, { amp => "&" }, 0); print $s'
    --
    &Ouml;H & HA&Igrave;
      Thanks, Mom, for the help.

      Operates in place, right. OK, so I took it out, like this:

      _decode_entities($mystring, { amp => "&", 59 => ";" }, 0);
      print $mystring;
      but when $mystring = "Fred&#39;s shoe", I'm still getting "Fred's shoe" output.



Re: Help with HTML::Entities - decode_entities
by Anonymous Monk on Jul 08, 2010 at 23:49 UTC
    Also, the docs show it as having a leading underscore (_decode_entities), but it seems to work without.

    decode_entities and _decode_entities are two different functions with two different interfaces.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use HTML::Entities;
    use Test::More tests => 3;

    my $str = q!amp &amp; 59 &#59; quot &quot;!;

    is( decode_entities($str), 'amp & 59 ; quot "' );

    is(
        join( '-', decode_entities( $str, $str ) ),
        join( '-', 'amp & 59 ; quot "', 'amp & 59 ; quot "' )
    );

    {
        my $str = $str;
        _decode_entities( $str, { amp => "&", 59 => ";" }, 0 );
        is( $str, 'amp & 59 ; quot &quot;' );
    }
    __END__
    http://search.cpan.org/dist/HTML-Parser/MANIFEST
    t/entities.t    Test encoding/decoding of entities
    t/entities2.t   Test _decode_entities()
    How (Not) To Ask A Question

      Thanks for this. But I'm not familiar with this notation. I tried running it to see what it outputs, but it won't run for me.




        *gasp* :D
        $ perl -MTest::More=tests,2 -e"is( 1, 2 )"
        1..2
        not ok 1
        #   Failed test at -e line 1.
        #          got: '1'
        #     expected: '2'
        # Looks like you planned 2 tests but ran 1.
        # Looks like you failed 1 test of 1 run.
Re: Help with HTML::Entities - decode_entities
by punch_card_don (Curate) on Jul 09, 2010 at 00:27 UTC
    For example, this test code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Entities;

    print "Content-type: text/html\n\n";

    my $str = "Fred&#39;s shoe";
    _decode_entities($str, { amp => "&", 59 => ";" }, 0);
    print "<p>$str\n";
    Outputs
    <p>Fred's shoe
    even though the docs say

    _decode_entities( $string, \%entity2char, $expand_prefix )
    This will in-place replace HTML entities in $string. The %entity2char hash must be provided. Named entities not found in the %entity2char hash are left alone.




      "&#39;" is not a named entity. It's numerical.
        *!*! SMACK !*!* You could have had a V8!

        OK, so you can't do this with this module. Back to the regexes.

        thanks.




      Named entities not found in the %entity2char hash are left alone. Numeric entities are expanded unless their value overflow.
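
For the regex route the thread settles on, here is one possible sketch in core Perl that decodes only a whitelist of entities, named or numeric, and leaves everything else (including &#39;) untouched. The hash `%wanted` and the sub `decode_only_wanted` are hypothetical names, not part of HTML::Entities.

```perl
use strict;
use warnings;

# Only these entities get decoded; keys are the text between '&' and ';'.
my %wanted = (
    'amp' => '&',
    '#59' => ';',
);

# Hypothetical helper, not part of HTML::Entities.
sub decode_only_wanted {
    my ($s) = @_;
    # Build an alternation from the keys, quoting each one.
    my $alt = join '|', map { quotemeta($_) } sort keys %wanted;
    # Single pass: replaced text is not rescanned, so only one
    # level of encoding is removed.
    $s =~ s/&($alt);/$wanted{$1}/g;
    return $s;
}

print decode_only_wanted('Fred&#39;s shoe &amp; sock&#59;'), "\n";
# prints: Fred&#39;s shoe & sock;
```

Because the substitution runs in a single pass, a double-encoded value such as &amp;eacute; comes out as &eacute; and is not decoded any further, which matches the goal of undoing one level of encode_entities. The caveat raised earlier in the thread still applies: a literal &eacute; stored in the data is indistinguishable from one produced this way.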