PerlMonks  

Win32::OLE with non-ANSI data

by freonpsandoz (Beadle)
on Mar 25, 2022 at 20:39 UTC ( #11142423=perlquestion )

freonpsandoz has asked for the wisdom of the Perl Monks concerning the following question:

I'm having trouble passing non-ANSI data from a Perl script to the dBpoweramp Music Converter using its COM interface. Perl strings from a UTF-8 script end up expanded octet by octet instead of being encoded as UTF-16 (e.g. \xc3 \xbc from the script appears in an MP3 tag as \xc3 \x00 \xbc \x00 instead of \xfc \x00). Can someone please suggest how I can test this further to see whether the problem is in Win32::OLE? Are there any standard COM objects other than Microsoft Excel that are commonly used for testing this module? Thanks.

    use v5.16;
    use strict;
    use warnings;
    use utf8;
    use Win32::OLE ();

    my $filename = shift or die "No file specified\n";
    my $dmcconverter = Win32::OLE->new('dMCScripting.Converter')
        or die "Can't create dMCScripting.Converter object: $!\n";
    $dmcconverter->WriteIDTag( $filename, 'artist', 'The Crüxshadows' );

UPDATE: I have verified that VBScript can read UTF-8 data from a file and set the ID tag correctly, so I believe the problem is in Win32::OLE. If someone can tell me how I might work around this problem (e.g. with Win32::OLE::Variant) I would be grateful. Thanks.
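
The byte pattern described in the question can be reproduced without COM at all. The following sketch assumes the failure is a UTF-8 octet sequence being read as single-byte characters before the conversion to UTF-16LE; that is a guess at the mechanism, not something confirmed in the thread. It uses only the core Encode module, and the helper name `hex_octets` is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_octets {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

# UTF-8 octets for U+00FC (u with diaeresis), as stored in a UTF-8 script.
my $utf8_octets = "\xc3\xbc";

# Misreading: each octet is treated as its own character, then widened.
my $widened = encode( 'UTF-16LE', $utf8_octets );

# Intended: decode the octets to one character first, then convert.
my $converted = encode( 'UTF-16LE', decode( 'UTF-8', $utf8_octets ) );

print "widened:   ", hex_octets($widened),   "\n";    # c3 00 bc 00
print "converted: ", hex_octets($converted), "\n";    # fc 00
```

If the tag written by the converter matches the "widened" line, the octets were most likely never decoded as UTF-8 on their way into the COM call.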

Replies are listed 'Best First'.
Re: Win32::OLE with non-ANSI data (use Win32::OLE qw(CP_UTF8);)
by Anonymous Monk on Mar 26, 2022 at 09:39 UTC

      Thanks, that seems to have worked.

      Reading the Win32::OLE CP option documentation more carefully, I see why I was confused. It refers to "translations between Perl strings and Unicode strings," but it really means "translations between octets and OLE Unicode strings," doesn't it? Does the CP_UTF8 option mean "strings that are in Perl's internal representation," or should Perl strings be encoded to and decoded from UTF-8 octets when passing data to/from this interface? Thanks.
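
For readers following along, the change being discussed is presumably the `CP` option. A minimal sketch, assuming the suggestion amounts to selecting the UTF-8 code page through `Win32::OLE->Option` (the same call that appears later in this thread). The wrapper `select_utf8_codepage` is a hypothetical name, and the `eval` guard is there only so the snippet loads on machines without Win32::OLE.

```perl
use strict;
use warnings;

# Hypothetical wrapper: switch Win32::OLE string conversion to UTF-8.
# Returns 1 if the option was set, 0 if Win32::OLE is not installed.
sub select_utf8_codepage {
    return 0 unless eval { require Win32::OLE; 1 };
    Win32::OLE->Option( CP => Win32::OLE::CP_UTF8() );
    return 1;
}

my $ok      = select_utf8_codepage();
my $message = $ok ? "CP set to CP_UTF8\n" : "Win32::OLE not available here\n";
print $message;
```

This only shows where the option is set. It does not settle whether the strings passed afterwards should be character strings or UTF-8 octets, which is the question asked above.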

Re: Win32::OLE with non-ANSI data
by ikegami (Patriarch) on Mar 27, 2022 at 03:48 UTC
Re: Win32::OLE with non-ANSI data
by freonpsandoz (Beadle) on Mar 29, 2022 at 21:35 UTC

    It gets weirder. The CP_UTF8 option seems to work for data input to the OLE object, but not for returned data. It appears that data is returned in CP_ACP if possible, and only returned as a Perl string if conversion to CP_ACP fails. Nothing seems to be returned to indicate to the caller how the returned data is encoded. Please check whether I'm missing something or whether this is a bug. In test1.mp3, the 'artist' tag is "The Crüxshadows" and in test2.mp3 it's the Cyrillic text "Издатель." Is there a way for me to supply the files I'm using for testing? Thanks.

        use strict;
        use warnings;
        use Encode qw( is_utf8 );
        use Win32::OLE ();

        Win32::OLE->Option ( CP => Win32::OLE::CP_UTF8 );
        binmode( STDOUT, ':raw' );

        my $filename = shift or die "No file specified\n";
        my $dmcconverter = Win32::OLE->new('dMCScripting.Converter')
            or die "Can't create dMCScripting.Converter object: $!\n";

        my $data = $dmcconverter->AudioProperties($filename);
        printf( STDERR "The UTF-8 flag for converter output is %d\n", is_utf8($data) // 0 );
        print "$data";

        d:\Mp3\Encode>perl -S test-ole-out.pl D:\Mp3\Encode\test1.mp3 >test1.txt
        The UTF-8 flag for converter output is 0

        d:\Mp3\Encode>perl -S test-ole-out.pl D:\Mp3\Encode\test2.mp3 >test2.txt
        The UTF-8 flag for converter output is 1
        Wide character in print at D:\Batch/test-ole-out.pl line 15.
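
One way to avoid the `Wide character in print` warning is to treat whatever the converter returns as a character string and let an output layer do the encoding, instead of printing through `:raw`. The sketch below assumes that reading of the return value; it is an assumption, not something the thread confirms. It uses an in-memory filehandle so it runs without COM, and `to_utf8_octets` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: print a character string through a UTF-8 layer
# and return the octets that would reach the file.
sub to_utf8_octets {
    my ($chars) = @_;
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer
        or die "Can't open in-memory handle: $!";
    print {$fh} $chars;
    close $fh or die "Can't close in-memory handle: $!";
    return $buffer;
}

# U+00FC fits in one byte, so this string is created with the UTF-8 flag off.
my $latin = "Cr\x{fc}x";

# U+0418 does not fit in one byte, so this string has the UTF-8 flag on.
my $cyrillic = "\x{418}";

print unpack( 'H*', to_utf8_octets($latin) ),    "\n";    # 4372c3bc78
print unpack( 'H*', to_utf8_octets($cyrillic) ), "\n";    # d098
```

Both strings come out as valid UTF-8 regardless of the flag, which is why the flag alone says nothing about how the data "is encoded".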

    UPDATE: I just realized that part of the weirdness is in how Perl represents strings internally. I had been led to believe that it was (almost) UTF-8, but that doesn't seem to be the case. If I read the string "The Crüxshadows" from a file in raw mode, the string data is UTF-8 octets, with the UTF-8 flag off. If I read the same data with an ":encoding(UTF-8)" layer specified, the string data is cp1252 octets with the UTF-8 flag on.

        I see now that "by default, the internal format is either ISO-8859-1 (latin-1), or utf8, depending on the history of the string." I hadn't seen that before. I was under the impression that if the utf8 flag is not set, the string consists of octets that should be decoded. It now appears that this impression was incorrect. That brings me back to the question: Exactly what does the Win32::OLE documentation mean when it talks about the CP option for "translations between Perl strings and Unicode strings"? Does the CP_UTF8 option actually mean "character strings in Perl's internal format"? Thanks.
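
The point about internal formats can be checked directly. The sketch below uses only core modules and suggests that the UTF-8 flag describes storage, not meaning: the same 15-character string compares equal whether it is stored one byte per character or as UTF-8, while the undecoded octets from a raw read are a different, 16-element string. Variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What a ':raw' read of the UTF-8 file yields: 16 octets, flag off.
my $octets = "The Cr\xc3\xbcxshadows";

# What an ':encoding(UTF-8)' read yields: 15 characters, flag on.
my $chars = decode( 'UTF-8', $octets );

# The same characters, stored one byte each (flag off) after a downgrade.
my $downgraded = $chars;
utf8::downgrade($downgraded);

for my $pair ( [ octets => $octets ], [ chars => $chars ], [ downgraded => $downgraded ] ) {
    my ( $name, $string ) = @$pair;
    printf "%-10s length %d, flag %d\n",
        $name, length($string), utf8::is_utf8($string) ? 1 : 0;
}

printf "character 6 of chars is U+%04X\n", ord( substr( $chars, 6, 1 ) );
print "chars eq downgraded: ", ( $chars eq $downgraded ? "yes" : "no" ), "\n";
```

So a string read through `:encoding(UTF-8)` holds code points such as U+00FC, not cp1252 octets; the code points below U+0100 simply coincide with Latin-1 byte values.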

Node Type: perlquestion [id://11142423]
Front-paged by Corion