PerlMonks  

Convert &amp; to & etc.

by loris (Hermit)
on Feb 07, 2008 at 12:42 UTC ( #666784=perlquestion )
loris has asked for the wisdom of the Perl Monks concerning the following question:

Dear All,

I am parsing some HTML with HTML::Parser and need to convert the ampersands and umlauts from stuff like &amp; and &uuml; to something more reasonable (in my case, Excel-friendly).

Can anyone point me in the right direction?

Thanks,

loris


"It took Loris ten minutes to eat a satsuma . . . twenty minutes to get from one end of his branch to the other . . . and an hour to scratch his bottom. But Slow Loris didn't care. He had a secret . . ." (from "Slow Loris" by Alexis Deacon)

Re: Convert &amp; to & etc.
by poolpi (Hermit) on Feb 07, 2008 at 12:52 UTC

      Thanks, that works fine for the ampersands, but not for my umlauts. I assume this is because an umlaut such as ü is encoded not as &uuml; but as some other byte sequence, whatever that is. Do you know what sort of encoding this is and how I can deal with it?

      Thanks,

      loris
        When you parse websites you have to consult the HTTP headers (and perhaps the http-equiv meta tags) to find out which charset the page is in.

        Then you can use Encode::decode to transform it into something useful.
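As a rough sketch of those two steps, assuming you already have the Content-Type header value and the raw body bytes in hand: the helper charset_from_content_type and the sample header and body are made up for illustration, and the UTF-8 fallback is only a guess for pages that declare nothing.

```perl
use strict;
use warnings;
use Encode qw( decode find_encoding );

# Hypothetical helper: pull the charset parameter out of a Content-Type value.
sub charset_from_content_type {
    my ($header) = @_;
    return $1 if defined $header && $header =~ /;\s*charset\s*=\s*"?([^\s;"]+)/i;
    return;
}

# Made-up sample response: header value and body bytes as they come off the wire.
my $content_type = 'text/html; charset=ISO-8859-1';
my $body         = "M\xFCller &amp; S\xF6hne";

# Fall back to UTF-8 when no known charset is declared (an assumption).
my $charset = charset_from_content_type($content_type);
$charset = 'UTF-8' unless defined $charset && find_encoding($charset);

# Bytes in, Perl character string out.
my $text = decode($charset, $body);
```

After this, $text is a character string in which the umlauts are single characters. Entities such as &amp; are untouched by decode and still need a separate step, for example HTML::Entities.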

        (Perhaps inspecting a hexdump of the string helps you to find out which charset it is in).
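A minimal way to get such a hexdump from within Perl might look like this. The hexdump sub and the two sample strings are illustrative only; they show the same word as it would arrive in ISO-8859-1 and in UTF-8.

```perl
use strict;
use warnings;

# Return the bytes of a string as space-separated two-digit hex values.
sub hexdump {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02x", ord($_) } split //, $bytes;
}

# Hypothetical samples: the same word in two different charsets.
print hexdump("M\xFCller"), "\n";       # ISO-8859-1: the umlaut is the single byte fc
print hexdump("M\xC3\xBCller"), "\n";   # UTF-8: the umlaut is the two bytes c3 bc
```

A single byte fc where you expect ü points to ISO-8859-1 (or a close relative such as Windows-1252), while the pair c3 bc points to UTF-8.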

        If you are using Spreadsheet::WriteExcel, you can use its functionality directly:

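        use Spreadsheet::WriteExcel;
        use HTML::Entities;
        use Encode qw( decode encode );

        # Assumes $value holds UTF-8 bytes: decode to characters, then expand the entities
        my $text = decode_entities (decode ("utf-8", $value));
        # write_unicode expects UTF-16BE ("ucs2") bytes
        $wks->write_unicode ($column, $row, encode ("UTF-16BE", $text));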

        Enjoy, Have FUN! H.Merijn
Re: Convert &amp; to & etc.
by moritz (Cardinal) on Feb 07, 2008 at 12:52 UTC
