PerlMonks  

Re^2: Convert &amp; to & etc.

by loris (Hermit)
on Feb 07, 2008 at 14:05 UTC


in reply to Re: Convert &amp; to & etc.
in thread Convert &amp; to & etc.

Thanks, that works fine for the ampersands, but not for my umlauts. I assume this is because, say, ü is encoded not as &uuml;, but as Ã¼, whatever that is. Do you know what sort of encoding this is and how I can deal with it?

Thanks,

loris


"It took Loris ten minutes to eat a satsuma . . . twenty minutes to get from one end of his branch to the other . . . and an hour to scratch his bottom. But Slow Loris didn't care. He had a secret . . ." (from "Slow Loris" by Alexis Deacon)


Re^3: Convert &amp; to & etc.
by moritz (Cardinal) on Feb 07, 2008 at 14:21 UTC
    When you parse web pages, you have to consult the HTTP headers (and perhaps the http-equiv meta tags) to find out which charset the page is in.

    Then you can use Encode::decode to turn the raw bytes into a proper Perl character string.

    (Inspecting a hexdump of the string may also help you work out which charset it is in.)
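
As a rough illustration of the advice above, here is a minimal sketch that takes a charset from a Content-Type header, hexdumps the bytes, and decodes them with Encode::decode. The sample bytes and the header value are made up for the example; in a real script both would come from the HTTP response. It also shows why a UTF-8 "ü" turns up as "Ã¼" when the bytes are read as Latin-1.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the raw bytes of "für" as served in UTF-8.
my $bytes = "f\xC3\xBCr";

# Hypothetical Content-Type value; a real one comes from the HTTP response headers.
my $content_type = 'text/html; charset=UTF-8';

# Pull the charset out of the header, falling back to ISO-8859-1 if none is given.
my ($charset) = $content_type =~ /charset=["']?([\w.:-]+)/i;
$charset = 'ISO-8859-1' unless defined $charset;

# Hex dump of the bytes: "c3 bc" is the UTF-8 encoding of U+00FC (ü).
my $hexdump = join ' ', map { sprintf '%02x', ord } split //, $bytes;

# Decoding with the right charset gives three characters, the middle one U+00FC.
my $text = decode($charset, $bytes);

# Decoding the same bytes as Latin-1 gives four characters: "f", "Ã", "¼", "r".
my $mojibake = decode('ISO-8859-1', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print  "charset:    $charset\n";
print  "hexdump:    $hexdump\n";
printf "decoded:    %s (%d characters)\n", $text,     length $text;
printf "as Latin-1: %s (%d characters)\n", $mojibake, length $mojibake;
```

If the hexdump shows two bytes such as c3 bc where you expect a single umlaut, the data is most likely UTF-8 that something downstream is reading as a single-byte charset.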

      Thanks for the advice.

      Unfortunately, there didn't seem to be a charset declaration or any other information about the encoding in the HTML. Luckily (since I am a bit of an encoding wimp), it turned out that I just had to choose 'UTF8' instead of the default 'Windows ANSI' as the 'file origin' when importing into Excel, and everything was fine. Doh!

      loris
Re^3: Convert &amp; to & etc.
by Tux (Monsignor) on Feb 07, 2008 at 15:15 UTC

    If you are using Spreadsheet::WriteExcel, you can use its functionality directly:

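    use Spreadsheet::WriteExcel;
    use HTML::Entities;
    use Encode qw( from_to );

    # $value holds UTF-8 encoded bytes; $wks is a worksheet object created earlier
    decode_entities ($value);           # void context: decodes entities in place
    from_to ($value, "utf-8", "ucs2");  # converts in place, so it needs a variable
    $wks->write_unicode ($row, $column, $value);  # WriteExcel takes row first, then column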

    Enjoy, Have FUN! H.Merijn
