 
PerlMonks  

Help with HTML::Entities - decode_entities

by punch_card_don (Curate)
on Jul 08, 2010 at 23:29 UTC (#848781=perlquestion)
punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Mercantile Monks,

Could someone please give me a clue on using the decode_entities() function of the HTML::Entities module?

I just want to decode &amp;amp; to & and &amp;#59; to ;

Based on the documentation's example, I've tried many variations on:

decode_entities($mystring, { amp => "&", 59 => ";" }, 0);
but it keeps decoding characters I don't want decoded, like apostrophes.

Also, the docs show it as having a leading underscore (_decode_entities), but it seems to work without.

Thanks




Time flies like an arrow. Fruit flies like a banana.

Re: Help with HTML::Entities - decode_entities
by Your Mother (Chancellor) on Jul 08, 2010 at 23:46 UTC

    I think the leading underscore is necessary--best not to second guess the documents when you're having problems--and it seems to do it in place so you have to check the variable, not the return from the function.

    perl -MHTML::Entities -le '$s = "&Ouml;H &amp; HA&Igrave;"; _decode_entities($s, { amp => "&" }, 0); print $s'
    &Ouml;H & HA&Igrave;
      Thanks, Mom, for the help.

      Operates in place, right. OK, so I took it out, like this:

      _decode_entities($mystring, { amp => "&", 59 => ";" }, 0); print $mystring;
      but when $mystring = "Fred&#39;s shoe", I'm still getting "Fred's shoe" output.



Re: Help with HTML::Entities - decode_entities
by Anonymous Monk on Jul 08, 2010 at 23:49 UTC
    Also, the docs show it as having a leading underscore (_decode_entities), but it seems to work without.

    decode_entities and _decode_entities are two different functions with two different interfaces.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use HTML::Entities;
    use Test::More tests => 3;

    my $str = q!amp &amp; 59 &#59; quot &quot;!;

    is( decode_entities($str), 'amp & 59 ; quot "' );
    is(
        join( '-', decode_entities( $str, $str ) ),
        join( '-', 'amp & 59 ; quot "', 'amp & 59 ; quot "' )
    );
    {
        my $str = $str;
        _decode_entities( $str, { amp => "&", 59 => ";" }, 0 );
        is( $str, 'amp & 59 ; quot &quot;' );
    }
    __END__
    http://search.cpan.org/dist/HTML-Parser/MANIFEST
    t/entities.t     Test encoding/decoding of entities
    t/entities2.t    Test _decode_entities()
    How (Not) To Ask A Question

      Thanks for this. But I'm not familiar with this notation, so I tried running it and it won't run for me to see what it outputs.




        *gasp* :D
        $ perl -MTest::More=tests,2 -e"is( 1, 2 )"
        1..2
        not ok 1
        #   Failed test at -e line 1.
        #          got: '1'
        #     expected: '2'
        # Looks like you planned 2 tests but ran 1.
        # Looks like you failed 1 test of 1 run.
Re: Help with HTML::Entities - decode_entities
by ikegami (Pope) on Jul 09, 2010 at 00:23 UTC

    Even if it were possible to do what you want with HTML::Entities (and it looks like you can't), it would make the most sense to just use

    $mystring =~ s/&#59;/;/g;
    $mystring =~ s/&amp;/&/g;

    That said, decoding "&amp;" without decoding all other entities creates garbage. For example

    &eacute;

    and

    &amp;eacute;

    are clearly not equivalent, but they produce the same string under your requirements.
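
A minimal sketch of the collision described above, assuming the two inputs are `&eacute;` and `&amp;eacute;`. The helper name `decode_amp_and_semicolon` is hypothetical; it only wraps the two substitutions suggested in this reply and is not part of HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical helper: decode only &amp; and &#59;, as the question asks.
sub decode_amp_and_semicolon {
    my ($s) = @_;
    $s =~ s/&#59;/;/g;    # numeric entity for the semicolon
    $s =~ s/&amp;/&/g;    # named entity for the ampersand
    return $s;
}

# Two different inputs: a real entity and an escaped (literal) entity.
for my $in ( '&eacute;', '&amp;eacute;' ) {
    print "$in -> ", decode_amp_and_semicolon($in), "\n";
}
```

Both lines end in the same string, so after this partial decoding there is no way to tell which input was meant as an accented character and which as literal text.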

      You've inferred the problem I'm trying to solve. Recursive use of encode_entities has filled a db with entries like &amp;eacute;, which display, as you say, like garbage in the client. Was hoping to just de-decode them before output, but we'll go regex.

      Thanks again.




Re: Help with HTML::Entities - decode_entities
by punch_card_don (Curate) on Jul 09, 2010 at 00:27 UTC
    For example, this test code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Entities;

    print "Content-type: text/html\n\n";
    my $str = "Fred&#39;s shoe";
    _decode_entities($str, { amp => "&", 59 => ";" }, 0);
    print "<p>$str\n";
    Outputs
    <p>Fred's shoe
    even though the docs say

    _decode_entities( $string, \%entity2char, $expand_prefix )
    This will in-place replace HTML entities in $string. The %entity2char hash must be provided. Named entities not found in the %entity2char hash are left alone.




      "&#39;" is not a named entity. It's numerical.
        *!*! SMACK !*!* You could have had a V8!

        Ok, so you can't do this with this module. Back to the regexes.

        thanks.
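
One way the regex route might look, as a sketch only: a single substitution driven by a whitelist, so that anything not listed (such as `&#39;`) is left untouched. The names `%decode_only` and `decode_selected` are hypothetical and not from the thread or from HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical whitelist: the only two entities the question wants decoded.
my %decode_only = (
    '&amp;' => '&',
    '&#59;' => ';',
);

# Build one alternation from the whitelist keys, quoting metacharacters.
my $pattern = join '|', map { quotemeta } sort keys %decode_only;

# Decode whitelisted entities in a single pass; everything else is kept.
sub decode_selected {
    my ($s) = @_;
    $s =~ s/($pattern)/$decode_only{$1}/g;
    return $s;
}

print decode_selected('Fred&#39;s shoe &amp; sock&#59;'), "\n";
```

Because the replacement happens in one pass, text produced by a substitution is not scanned again, so each call removes exactly one level of `&amp;` escaping. The ambiguity pointed out elsewhere in the thread still applies to whatever the result contains.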




      Named entities not found in the %entity2char hash are left alone. Numeric entities are expanded unless their value overflow.
