PerlMonks  

Re^5: HTML::Parser, file, print to Terminal

by moritz (Cardinal)
on Jul 13, 2010 at 14:38 UTC ( [id://849288]=note )


in reply to Re^4: HTML::Parser, file, print to Terminal
in thread HTML::Parser, file, print to Terminal

The snippet you show is encoded in UTF-8.

Next step: determine the encoding of the file in which umlauts display correctly on your terminal.

Or even better: configure a clean UTF-8 environment.
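For the "determine the encoding" step, one rough check is whether the file's bytes are well-formed UTF-8. The sketch below is only an illustration: `looks_like_utf8` is a hypothetical helper name, not something from this thread. A positive result is a hint rather than proof, because pure ASCII also passes and some Latin-1 byte sequences happen to be valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if the byte string is well-formed UTF-8, else 0.
sub looks_like_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# "\xC3\xBC" is u-umlaut encoded as UTF-8; a lone "\xFC" is the same letter in Latin-1.
for my $sample ("\xC3\xBC", "\xFC") {
    print looks_like_utf8($sample) ? "valid UTF-8\n" : "not valid UTF-8\n";
}
```

If a file fails this check and the umlauts still display correctly, it is probably in a single-byte encoding such as Latin-1, but that remains a guess until you confirm it.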

I suppose the confusion lies in => if I create the file, I get my Latin-1. If I didn't create the file, there is only ASCII.

I'm confused indeed. If you don't create a file, it doesn't exist, neither with ASCII nor with UTF-8.

Speaking of confusion, I think you are trying to achieve too much in one step. For example, the title of your question mentions HTML::Parser, which doesn't appear in the posting at all.

So, small steps:

  • Make sure you know which encoding your terminal understands. There's no point in proceeding before you have done this step.
  • Find out which encoding your source files use. It seems to be UTF-8.
  • In your perl scripts, decode everything coming from the outside (except when a module does it for you), and encode everything going out. Put use utf8; in your program files and save them as UTF-8.
  • If something doesn't work, find out where you violated one of the previous steps.
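The decode-on-input, encode-on-output rule from the steps above might look like this as a minimal sketch. The two encoding names are assumptions for illustration (a Latin-1 input and a UTF-8 terminal); substitute whatever you found in the first two steps. The script itself must be saved as UTF-8 because of `use utf8;`.

```perl
use strict;
use warnings;
use utf8;                        # this source file itself is saved as UTF-8
use Encode qw(decode encode);

# Assumptions for illustration only: Latin-1 input, UTF-8 terminal.
my $input_encoding    = 'ISO-8859-1';
my $terminal_encoding = 'UTF-8';

# Bytes as they might arrive from a Latin-1 file: "für", with ü as the single byte 0xFC.
my $raw_bytes = "f\xFCr";

# 1. Decode at the boundary: bytes in, characters out.
my $text = decode($input_encoding, $raw_bytes);

# 2. Work with characters. length() counts characters here, and the
#    literal 'für' is read as characters because of "use utf8".
my $chars   = length $text;
my $matches = $text eq 'für' ? 'yes' : 'no';

# 3. Encode at the boundary: characters in, bytes out.
my $out_bytes = encode($terminal_encoding, "$text ($chars chars, matches literal: $matches)\n");
print $out_bytes;
```

Keeping the decode and encode calls at the edges of the program means everything in between deals only with characters, which makes the last step (finding where a rule was violated) much easier.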

Re^6: HTML::Parser, file, print to Terminal
by victor_charlie (Novice) on Jul 13, 2010 at 19:37 UTC

    Okay, this DOES work...

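    #!/usr/bin/perl -w
    # legaget.pl
    use strict;
    use Encode;

    my $filename = "engleword.html";
    open FILE, "<", $filename or die $!;    # $! holds the OS error message
    while ( my $line = <FILE> ) {
        # $line is undecoded bytes, so encode() treats each byte as a Latin-1 character
        print encode( "utf8", $line );
    }
    close(FILE);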

    What I have learned...

    • use utf8; tells perl that the source code itself is written in UTF-8; it does not encode or decode input or output.
    • I still have to grab the HTML and write it to a file, and I would still like to encode the string in place. Maybe later.
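One way to avoid calling encode() on every line is to let PerlIO layers do the conversion while reading and printing. This is only a sketch under assumptions: `copy_as_utf8` is a hypothetical helper name, and the file's encoding is passed in by the caller because nothing in this thread settles what it actually is.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a file to an output handle, converting from
# the file's encoding (supplied by the caller) to UTF-8.
sub copy_as_utf8 {
    my ($filename, $file_encoding, $out) = @_;

    # The :encoding layer decodes each line as it is read.
    open my $in, "<:encoding($file_encoding)", $filename
        or die "Cannot open $filename: $!";

    # The output layer encodes each line as it is printed.
    binmode $out, ':encoding(UTF-8)';

    while ( my $line = <$in> ) {
        print {$out} $line;
    }
    close $in;
    return;
}

# Usage, assuming the file from the script above is Latin-1:
# copy_as_utf8( 'engleword.html', 'ISO-8859-1', \*STDOUT );
```

With the layers in place, the strings inside the loop are already decoded characters, so any in-place editing can happen there before the print.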

    I have come across this encoding problem as a graphic artist. Customers used MSWord to generate text and then pasted the resulting text into HTML, Adobe PageMaker, PDF, etc. Everything was just hunky-dory on a WinBox, but on a Mac or Linux the results had missing characters. MS was late adopting Unicode. MS thought they had another answer with OpenType (I think it was), a font technology in partnership with Adobe. That fell apart. But in pre-XP MS text products the first byte set the encoding for the text file. I used to have a freeware program on the PC that manually changed that byte.

    Forgive me, I worked on this silly problem all day, but I'm loving Perl.
