PerlMonks  

Convert &amp; to & etc.

by loris (Hermit)
on Feb 07, 2008 at 12:42 UTC ( #666784=perlquestion )
loris has asked for the wisdom of the Perl Monks concerning the following question:

Dear All,

I am parsing some HTML with HTML::Parser and need to convert the ampersands and umlauts from stuff like &amp; and &uuml; to something more reasonable (in my case, Excel-friendly).

Can anyone point me in the right direction?

Thanks,

loris


"It took Loris ten minutes to eat a satsuma . . . twenty minutes to get from one end of his branch to the other . . . and an hour to scratch his bottom. But Slow Loris didn't care. He had a secret . . ." (from "Slow Loris" by Alexis Deacon)

Re: Convert &amp; to & etc.
by poolpi (Hermit) on Feb 07, 2008 at 12:52 UTC

      Thanks, that works fine for the ampersands, but not for my umlauts. I assume this is because, say, ü is encoded not as &uuml;, but as Ã¼, whatever that is. Do you know what sort of encoding this is and how I can deal with it?

      Thanks,

      loris

        When you parse websites you have to consult the HTTP headers (and perhaps the http-equiv meta tags) to find out which charset the page is in.

        Then you can use Encode::decode to turn the raw bytes into a proper Perl character string.
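
A minimal sketch of that idea, using only the core Encode module. The helper name charset_from_content_type, the sample header and the sample bytes are all made up for illustration; real pages may announce the charset elsewhere or not at all, so treat the fallback as an assumption you choose yourself.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: pull the charset parameter out of a Content-Type
# header value, falling back to a default chosen by the caller.
sub charset_from_content_type {
    my ($content_type, $default) = @_;
    return $1 if defined $content_type
        && $content_type =~ /charset\s*=\s*"?([\w.:-]+)"?/i;
    return $default;
}

# Sample header and body octets (invented for this example).
my $header = 'text/html; charset=UTF-8';
my $octets = "M\xC3\xBCnchen";            # "München" as UTF-8 bytes

my $charset = charset_from_content_type ($header, 'iso-8859-1');
my $text    = decode ($charset, $octets); # now a Perl character string

printf "%s: %d octets -> %d characters\n",
    $charset, length $octets, length $text;
```

Once the bytes are decoded, the ü is a single character (code point 0xFC) rather than two bytes, and entity decoding or re-encoding for the target format can follow.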

        (Inspecting a hexdump of the string may help you work out which charset it is in.)
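
One way to get such a hexdump from inside Perl is unpack. This is only a sketch; the helper name hexdump is invented here, and it assumes the string still holds raw bytes (not yet decoded).

```perl
use strict;
use warnings;

# Return the octets of a byte string as space-separated hex pairs.
sub hexdump {
    my ($octets) = @_;
    return join ' ', unpack '(H2)*', $octets;
}

print hexdump ("\xFC"), "\n";       # ü in ISO-8859-1: fc
print hexdump ("\xC3\xBC"), "\n";   # ü in UTF-8:      c3 bc
```

A single byte fc where you expect ü points towards ISO-8859-1 or a close relative; the pair c3 bc points towards UTF-8.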

        If you are using Spreadsheet::WriteExcel, you can use its functionality directly:

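        use Spreadsheet::WriteExcel;
        use HTML::Entities;
        use Encode qw( decode encode );

        # Decode the UTF-8 octets first, then the entities.
        my $text = decode_entities (decode ("utf-8", $value));
        # write_unicode wants UTF-16BE octets; row comes before column.
        $wks->write_unicode ($row, $column, encode ("UTF-16BE", $text));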
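
To check the conversion step on its own, without a workbook, here is a small sketch that needs only the core Encode module. The helper name utf8_to_utf16be is invented for this example, and it assumes the input really is UTF-8.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Convert UTF-8 octets to UTF-16BE octets via a Perl character string.
sub utf8_to_utf16be {
    my ($octets) = @_;
    return encode ('UTF-16BE', decode ('UTF-8', $octets));
}

my $utf16 = utf8_to_utf16be ("\xC3\xBC");          # ü as UTF-8
print join (' ', unpack '(H2)*', $utf16), "\n";    # 00 fc
```

The explicit UTF-16BE name is used rather than plain UTF-16 so that no byte order mark is prepended to the cell contents.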

        Enjoy, Have FUN! H.Merijn
Re: Convert &amp; to & etc.
by moritz (Cardinal) on Feb 07, 2008 at 12:52 UTC
