MP3::Tag encoding problem

by mfearby (Initiate)
on Sep 21, 2008 at 07:00 UTC ( #712813=perlquestion )

mfearby has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a script to loop through and fix the tags in my MP3 files before adopting Songbird and have come across a character encoding problem. MP3::Info can get the tag successfully, but since it can't write ID3v2 tags, I'm now using MP3::Tag. Only problem is that European characters are printing as "�" (that's a black diamond with a question-mark inside, in case it doesn't print properly).

Here's the relevant bit of code I'm using:

# $File::Find::name in this code snippet is as follows:
# Mozart - Die Zauberflöte Act 2, 06. Arie. Der Hölle Rache kocht in meinem Herzen.mp3
my $mp3 = MP3::Tag->new($File::Find::name);
$mp3->get_tags;
if (exists $mp3->{ID3v2}) {
    my $name = $mp3->{ID3v2}->get_frame("TIT2");
    print "Title: $name\n";
}

The above print statement results in the following (the title tag is pretty much the same as the file name excluding "Mozart - "). I've put question-marks instead of the black diamond because the code tags are successfully substituting it for the correct entity:

Name: Die Zauberfl?te Act 2, 06. Arie. Der H?lle Rache kocht in meinem Herzen

I've also tried setting an environment variable, as per the documentation, in the hope that I might get it to use the correct encoding:

$ENV{'MP3TAG_DECODE_V2_DEFAULT'} = 'utf-8';

What I know about character encodings could be written in a very tiny text box, so maybe I'm doing something wrong. I have also tried setting the character encoding this way:

$mp3->config('decode_encoding_v2' => 'utf-8');

Which results in the following print-out:

Wide character in print at ./do.pl line 78.
Name: Die Zauberfl?te Act 2, 06. Arie. Der H?lle Rache kocht in meinem Herzen

I have also tried setting "decode_encoding_v2" and $ENV{'MP3TAG_DECODE_V2_DEFAULT'} to "iso-8859-1" and "iso-8859-2", being European encodings, which don't result in warnings, but they also still print out the black diamond with the question mark. I can also use "utf-8" and "utf8" and it doesn't seem to complain. Using "utf-16" results in the following error/warning in the console:

UTF-16:Unrecognised BOM 4469 at /usr/lib/perl5/5.10.0/i386-linux-thread-multi/Encode.pm line 162.
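One possible reading of the number in that message: a UTF-16 decoder looks for a byte-order mark in the first two bytes of the data, and 0x44 0x69 are the ASCII codes for "D" and "i", the first two letters of the title. That would suggest the tag holds single-byte text rather than UTF-16. A minimal sketch, assuming the raw tag begins with the bytes of "Die Zauberflöte" in a single-byte encoding (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Assumption: the raw tag starts with this single-byte text (\xF6 is "ö" in Latin-1).
my $bytes = "Die Zauberfl\xF6te";

# Read the first two bytes as one big-endian 16-bit value,
# which is where a UTF-16 decoder would look for a BOM.
my $bom = sprintf '%04x', unpack('n', $bytes);

print "First 16-bit unit: $bom\n";
```

The printed value can be compared with the number in the warning above.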

I've also tried setting both to "latin1" to no avail, too. I haven't got a clue why some of my <code> blocks seem to wrap text with a red plus sign at the beginning of the next line, and not in others.

Replies are listed 'Best First'.
Re: MP3::Tag encoding problem
by Anonymous Monk on Sep 21, 2008 at 08:39 UTC
    `perldoc diagnostics'
    `perldoc splain'
    $ echo Wide character in print at ./do.pl line 78. |splain
    Wide character in print at ./do.pl line 78. (#1)
        (W utf8) Perl met a wide character (>255) when it wasn't expecting
        one.  This warning is by default on for I/O (like print).  The easiest
        way to quiet this warning is simply to add the :utf8 layer to the
        output, e.g. binmode STDOUT, ':utf8'.  Another way to turn off the
        warning is to add no warnings 'utf8'; but that is often closer to
        cheating.  In general, you are supposed to explicitly mark the
        filehandle with an encoding, see open and perlfunc/binmode.
    Also see perluniintro

      Thanks for the info. I did try "binmode STDOUT, ':utf8';" at the top of my script but it made no difference. I didn't think a simple "o" with an umlaut was unicode anyway. Isn't it a simple ASCII character? Maybe I shouldn't even be specifying utf8 the way I am?

      Anyway, I'm obviously missing something or I'm on the wrong track, because with MP3::Tag, printing stuff with such characters results in black diamonds, whereas printing the same information, from the same MP3 file, retrieved using MP3::Info does not.

        I did try "binmode STDOUT, ':utf8';" at the top of my script but it made no difference.

        Yes it did, it got rid of Wide character in print at ./do.pl line 78.

Re: MP3::Tag encoding problem
by graff (Chancellor) on Sep 21, 2008 at 23:07 UTC
    According to this source: http://www.id3.org/id3v2.3.0, it seems most likely that your mp3 tag strings are iso-8859-1. To get them to appear properly in a text window, it depends on the nature of the text window.

    Try this little experiment on the command line in the same window where you want to see the tag text displayed correctly:

    perl -le 'print "\xa1"'
    If you see an inverted exclamation mark, your terminal window works with iso-8859-1. If you get a question mark instead, try this next:
    perl -CS -le 'print "\xa1"'
    If you now see the inverted exclamation mark, you now know that your terminal wants utf8.

    For an 8859-based display, perl should probably do nothing to the tag text before printing it. But I doubt this is the situation, because I don't think you would have been seeing "?" in your tag text if this were the case.

    For a utf8-based display, it should suffice to do binmode STDOUT, ":utf8"; which will automatically (and quietly) "upgrade" the 8859-1 text to utf8 when printing to STDOUT.

    If you are storing the tag text to a file, and are seeing question marks when looking at the file contents, it's the same basic issue. Use binmode on that file handle instead of STDOUT.
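The difference between the two display cases can be sketched without MP3::Tag at all. The following is a minimal illustration, assuming the tag value arrives as ISO-8859-1 bytes; the sample string and variable names are made up for the example and are not taken from the original script:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: these bytes stand in for an ISO-8859-1 encoded tag value.
my $latin1_bytes = "Die Zauberfl\xF6te";

# Turn the bytes into a Perl character string.
my $title = decode('iso-8859-1', $latin1_bytes);

# The bytes a UTF-8 output layer would send to the terminal for this string.
my $utf8_bytes = encode('UTF-8', $title);

printf "characters: %d, latin-1 bytes: %d, utf-8 bytes: %d\n",
    length($title), length($latin1_bytes), length($utf8_bytes);

# Mark STDOUT as UTF-8 so the character string is encoded on output.
binmode STDOUT, ':encoding(UTF-8)';
print "Title: $title\n";
```

On a utf8 terminal the last line should show the umlaut; on an 8859-1 terminal the binmode line would be the wrong choice, which is why the two one-liners above are worth running first.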

      Your knowledge of unicode is impressive, and I wish to subscribe to your newsletter :-)

      I saw the upside-down exclamation mark for the second command, confirming that my terminal (Konsole) wants utf8. It also seems that specifying the binmode and commenting-out my misguided attempts at trying to solve the problem now means I'm seeing umlauts above my o's.

      Thank you very much!

        Not sure if you have an answer to your question. I've run into the same situation with the 'black/inverse' question mark. I'm no perl guru, as you will see in the post: http://perlguru.com/gforum.cgi?post=34634 There you will see where I use MP3::Tag to write versions 1 and 2 mp3 tags. If you comment-out the id3v1 calls (three lines), tags for Artist, Albums and Tracks appear correctly in the tags. My issue was with items like Queensrÿche, Mötley Crüe, etc.
        I've had a similar problem and for me the solution was to identify binary/character data and handle it appropriately.
        my $mp3 = MP3::Tag->new($t);
        The module returns a character string (Dec '08); so that's OK. Make sure any applications writing the tags encode properly. For Linux, Easytag seems to work very well.
        my $tag_dir = "/music/$a_artist/$a_name";
        What you want as a human but no good for mkpath() et al
        my $binary_tag_dir = encode_utf8($tag_dir);
        ah, now this is mkpath()-able

        Now, File::Find (properly) returns a binary string so needs decoding. Assuming your filesystem uses utf8 encoding:

        my $char_file_find_dir = decode("utf8",$File::Find::dir);
        At this point you can print and compare $char_file_find_dir and $tag_dir.

        You can also compare and do filename tests etc with $binary_tag_dir and $File::Find::dir.
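The byte-string versus character-string comparison described above can be tried in isolation. This is only a sketch under stated assumptions: the artist and album values are hypothetical, the filesystem is assumed to use UTF-8, and $found_dir is a stand-in for what $File::Find::dir would contain:

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# Hypothetical tag values, held as character strings.
my $a_artist = "M\x{F6}tley Cr\x{FC}e";
my $a_name   = "Example Album";
my $tag_dir  = "/music/$a_artist/$a_name";

# Bytes suitable for filesystem calls, assuming a UTF-8 filesystem.
my $binary_tag_dir = encode_utf8($tag_dir);

# Stand-in for $File::Find::dir, which holds raw bytes.
my $found_dir = $binary_tag_dir;

# Decode the bytes before comparing with the character string.
my $char_file_find_dir = decode('utf8', $found_dir);

print "character strings match\n"  if $char_file_find_dir eq $tag_dir;
print "byte strings match\n"       if $found_dir eq $binary_tag_dir;
print "mixed comparison differs\n" if $found_dir ne $tag_dir;
```

The point of the last line is that comparing a byte string with a character string fails even when both represent the same path, so each comparison should stay on one side of the decode/encode boundary.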

        When printing (including debugging) I had:

        binmode STDOUT, ":utf8";
        This tells perl that my terminal is utf8 aware and to print accented characters appropriately.

        You should also decode() the binary strings before printing them if you want to read them (or not, if you want to 'od' them).

        HTH

        Corrections welcome ;)

Re: MP3::Tag encoding problem
by Anonymous Monk on Sep 21, 2008 at 08:36 UTC
    I haven't got a clue why some of my <code> blocks seem to wrap text with a red plus sign at the beginning of the next line, and not in others.
    It's a feature. See PerlMonks FAQ, Markup in the Monastery for details.
      Thanks. I found the section allowing me to disable code wrapping. I guess the fact that one of the lines didn't wrap like the rest made me think I somehow caused it.
Re: MP3::Tag encoding problem
by kubrat (Scribe) on Sep 21, 2008 at 20:16 UTC

    Just a stab in the dark. Do you have use utf8; at the beginning of your script?

      No. I just tried that (both with and without my two other methods of saying it's utf8) and get the same diamonds with question marks. Thanks, though.
