PerlMonks  

Convert &amp; to & etc.

by loris (Hermit)
on Feb 07, 2008 at 12:42 UTC (#666784)
loris has asked for the wisdom of the Perl Monks concerning the following question:

Dear All,

I am parsing some HTML with HTML::Parser and need to convert the ampersands and umlauts from entities like &amp; and &uuml; to something more reasonable (in my case, Excel-friendly).

Can anyone point me in the right direction?

Thanks,

loris


"It took Loris ten minutes to eat a satsuma . . . twenty minutes to get from one end of his branch to the other . . . and an hour to scratch his bottom. But Slow Loris didn't care. He had a secret . . ." (from "Slow Loris" by Alexis Deacon)

Re: Convert &amp; to & etc.
by poolpi (Hermit) on Feb 07, 2008 at 12:52 UTC

      Thanks, that works fine for the ampersands, but not for my umlauts. I assume this is because, say, ü is encoded not as &uuml;, but as ü, whatever that is. Do you know what sort of encoding this is and how I can deal with it?

      Thanks,

      loris
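To illustrate the distinction raised above, here is a minimal sketch, assuming the input really contains named entities. The sample string is made up for illustration. `decode_entities` from HTML::Entities only touches entity references such as `&amp;` and `&uuml;`; bytes that are already raw characters in some charset pass through unchanged and need separate charset handling.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical sample input containing only named entities.
my $html = 'M&uuml;ller &amp; S&ouml;hne';

# Called in non-void context, decode_entities returns a decoded copy
# and leaves $html untouched.
my $text = decode_entities($html);

# Show code points rather than raw characters, so the result does not
# depend on the terminal's encoding.
print join(' ', map { sprintf 'U+%04X', ord } split //, $text), "\n";
```

If the umlauts still come out wrong after this step, they were probably not entities in the first place, and the page's charset is the thing to look at.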


        When you parse websites you have to consult the HTTP headers (and perhaps the http-equiv meta tags) to find out which charset the page is in.

        Then you can use Encode::decode to transform it into something useful.

        (Perhaps inspecting a hexdump of the string helps you to find out which charset it is in).
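The steps above could be sketched roughly as follows. The header value, the fallback charset, and the sample bytes are assumptions made for illustration; in a real script the Content-Type string would come from the HTTP response.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical header value; in practice, read it from the HTTP response.
my $content_type = 'text/html; charset=UTF-8';

# Pull the charset name out of the header; fall back to an assumed default.
my ($charset) = $content_type =~ /charset\s*=\s*"?([\w.:-]+)/i;
$charset ||= 'ISO-8859-1';

# Raw octets as they might arrive from the server: "Müller" in UTF-8.
my $bytes = "M\xC3\xBCller";

# Hex dump of the octets. A ü that appears as "c3 bc" suggests UTF-8,
# while a single "fc" would suggest ISO-8859-1.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $bytes;

# Turn the octets into a Perl character string.
my $text = decode($charset, $bytes);

printf "%s\n%d characters\n", $hex, length $text;
```

Once the data is a proper character string, it can be re-encoded into whatever the target (here, Excel) expects.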

        If you are using Spreadsheet::WriteExcel, you can use its functionality directly:

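        use Spreadsheet::WriteExcel;
        use HTML::Entities;
        use Encode qw( from_to );

        # from_to converts its first argument in place, so it needs a variable,
        # not the copy returned by decode_entities
        decode_entities ($value);
        from_to ($value, "utf-8", "ucs2");
        # write_unicode takes the row first, then the column
        $wks->write_unicode ($row, $column, $value);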

        Enjoy, Have FUN! H.Merijn
Re: Convert &amp; to & etc.
by moritz (Cardinal) on Feb 07, 2008 at 12:52 UTC
