PerlMonks  

Re: How do you open/read a text, without knowing its encoding, and remove any BOM if it's UTF, what do you use?

by Khen1950fx (Canon)
on Apr 21, 2013 at 08:55 UTC


in reply to How do you open/read a text, without knowing its encoding, and remove any BOM if it's UTF, what do you use?

To remove a BOM from a file, use String::BOM.

#!/usr/bin/perl -l
use strict;
use warnings;
use String::BOM qw(strip_bom_from_file);

my $file = '/path/to/file';
# Remove the BOM from $file, if it has one; -l makes print add a newline
print strip_bom_from_file($file);
It prints 1 on success; on failure, the reason is available in $!.
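If installing String::BOM is not an option, a similar effect can be approximated in core Perl. The sketch below is an illustration under stated assumptions, not String::BOM's own code: `strip_bom_in_place` and the script name are hypothetical, the whole file is assumed to fit in memory, and the file is rewritten in place rather than through a temporary copy. Note that stripping a UTF-16 or UTF-32 BOM throws away the only byte-order marker, so in practice you may want to limit this to the UTF-8 BOM.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not String::BOM's implementation: if $file starts with
# a UTF-8, UTF-16 or UTF-32 byte order mark, rewrite the file without it.
# Returns 1 if a BOM was removed, 0 if there was none; dies on I/O errors.
sub strip_bom_in_place {
    my ($file) = @_;
    open my $in, '<:raw', $file or die "Cannot read $file: $!";
    my $bytes = do { local $/; <$in> };    # slurp the raw bytes
    close $in;
    $bytes = '' unless defined $bytes;

    # Alternatives are tried left to right, so the 4-byte UTF-32LE mark
    # (FF FE 00 00) is matched before the 2-byte UTF-16LE mark (FF FE).
    $bytes =~ s/\A(?:\x00\x00\xFE\xFF|\xFF\xFE\x00\x00|\xEF\xBB\xBF|\xFE\xFF|\xFF\xFE)//
      or return 0;

    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} $bytes or die "Cannot write $file: $!";
    close $out or die "Cannot close $file: $!";
    return 1;
}

# Usage (hypothetical script name): perl strip_bom.pl file1.txt file2.txt ...
for my $file (@ARGV) {
    print "$file: ", ( strip_bom_in_place($file) ? 'BOM removed' : 'no BOM' ), "\n";
}
```

Running it a second time on the same file reports "no BOM", since the mark is gone after the first pass.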


Re^2: How do you open/read a text, without knowing its encoding, and remove any BOM if it's UTF, what do you use?
by Anonymous Monk on Apr 21, 2013 at 10:15 UTC
    Typical Khen1950fx: ignores the answer in the question, ignores the question, posts broken links.
      Khen's link may be broken, but String::BOM is a good solution.

      Your criticism is easy, but not helpful.

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics

        Khen's link may be broken, but String::BOM is a good solution.

        It's more typing than using :via(File::BOM), and it isn't any kind of solution to the question I asked.

        Your criticism is easy, but not helpful.

        Yes, naturally, Khen1950fx keeps making the same mistake over and over again. His responses are rarely helpful to anyone, but they're never helpful to me.
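The `:via(File::BOM)` layer mentioned above chooses the decoding for you based on the BOM. As a rough, core-only illustration of that idea (not File::BOM's implementation), the hypothetical `open_with_bom` below reads the first four bytes, seeks past any BOM it recognises, and pushes a matching `:encoding()` layer. Falling back to UTF-8 when there is no BOM is an assumption; a file without one still needs some other way of guessing its encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: open $file, check it for a byte order mark, position
# the handle just past the BOM and push a matching :encoding() layer.
# Files without a BOM are decoded as $default ('UTF-8' if omitted, an assumption).
sub open_with_bom {
    my ( $file, $default ) = @_;
    $default = 'UTF-8' unless defined $default;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $head = '';
    defined read( $fh, $head, 4 ) or die "Cannot read $file: $!";

    # Longer signatures first, so a UTF-32LE mark is not taken for UTF-16LE.
    my @boms = (
        [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
        [ "\xEF\xBB\xBF",     'UTF-8' ],
        [ "\xFE\xFF",         'UTF-16BE' ],
        [ "\xFF\xFE",         'UTF-16LE' ],
    );
    my ( $enc, $skip ) = ( $default, 0 );
    for my $bom (@boms) {
        my ( $sig, $name ) = @$bom;
        if ( substr( $head, 0, length $sig ) eq $sig ) {
            ( $enc, $skip ) = ( $name, length $sig );
            last;
        }
    }

    # Skip the BOM bytes (or rewind to 0), then decode everything after them.
    seek( $fh, $skip, 0 ) or die "Cannot seek in $file: $!";
    binmode( $fh, ":encoding($enc)" ) or die "Cannot push :encoding($enc): $!";
    return ( $fh, $enc );
}

# Usage (hypothetical script name): perl read_bom.pl file.txt
if (@ARGV) {
    my ( $fh, $enc ) = open_with_bom( $ARGV[0] );
    binmode STDOUT, ':encoding(UTF-8)';
    print STDERR "Detected encoding: $enc\n";
    print while <$fh>;
}
```

Because the BOM is skipped before the layer is pushed, lines read from the handle never start with U+FEFF.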

Re^2: How do you open/read a text, without knowing its encoding, and remove any BOM if it's UTF, what do you use?
by Zzenmonk (Sexton) on Apr 23, 2013 at 07:45 UTC

    Hi,

    Encode::Guess does a fine job of detecting the encoding. Read its documentation on CPAN carefully. To detect the encoding, you can use something like:

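    use Encode::Guess;

    my $file = 'yourfile';
    open( my $in, '<:raw', $file ) or die "Cannot open $file: $!";
    my $bigstring = do { local $/; <$in> };    # slurp the raw bytes
    close $in;
    my $decoder = Encode::Guess->guess($bigstring);
    ref $decoder or die "Cannot guess encoding: $decoder";    # guess() returns a message string on failure
    print "My file content encoding is: ", $decoder->name, "\n";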

    Now you can decode your data and re-encode it in whatever encoding you want. You need a consistent strategy for this; I recommend keeping everything in UTF-8 or UTF-16, depending on the case. If you run into BOM issues, String::BOM is a good solution.
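Putting the detection and decoding steps together, here is a minimal sketch that guesses the encoding, decodes the bytes, and also drops a leading BOM, which is what the original question asked for. It assumes the whole file fits in memory; `decode_unknown` and the script name are hypothetical. By default Encode::Guess only considers ascii, utf8 and UTF-16/32 with a BOM. It does recognise a UTF-8 BOM, but the utf8 decoder keeps that BOM as the character U+FEFF, hence the explicit substitution.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Sketch: turn raw bytes of unknown encoding into a Perl character string.
# Extra candidate encodings, if any, are passed through as @suspects.
sub decode_unknown {
    my ( $octets, @suspects ) = @_;
    my $enc = guess_encoding( $octets, @suspects );
    ref $enc or die "Cannot guess encoding: $enc\n";    # failures come back as a message string
    my $text = $enc->decode($octets);
    $text =~ s/\A\x{FEFF}//;    # a UTF-8 BOM survives decoding as U+FEFF, so drop it
    return ( $enc->name, $text );
}

# Usage (hypothetical script name): perl guess.pl file.txt
if (@ARGV) {
    open my $in, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $octets = do { local $/; <$in> };
    close $in;
    my ( $name, $text ) = decode_unknown($octets);
    binmode STDOUT, ':encoding(UTF-8)';    # the "re-encode" step: write everything out as UTF-8
    print "Guessed encoding: $name\n", $text;
}
```

The output layer on STDOUT is where the re-encoding strategy lives; swap in a different `:encoding()` layer if you standardise on something other than UTF-8.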

    The following might help further: http://perldoc.perl.org/perluniintro.html

    K

    The best medicine against depression is a cold beer!
