http://www.perlmonks.org?node_id=989629


in reply to Re^3: how to output file in Unicode
in thread how to output file in Unicode

Hi remiah

I run a Perl script to extract user attribute values from LDAP.

Extracted output file contains First Name and Last Name of German users.

Supposed Last Name should be displayed as "Björn", but it is displayed as "Björn", which seems to be unrecognizable.

I learned that the output file was created in ASCII format should be created in Unicode UTF-8 format in order the German character to be visible.

thanks

Replies are listed 'Best First'.
Re^5: how to output file in Unicode
by remiah (Hermit) on Aug 25, 2012 at 04:13 UTC

    Here I suppose the LDAP module that returns last name, gives you "decoded utf8 string". What do you see in log.txt file? and any warning message?

    my $lastname=''; #here put the return value of LDAP open( my $fh, ">:encoding(UTF-8)", "log.txt") or die $!; print $fh utf8::is_utf8($lastname) ? 'utf8 flagged,' : 'not utf8 flagg +ed,'; print $fh $lastname; close $fh;

    Some string (in this case LDAP)
     |
    LDAP module?
     |        <--- here LDAP returns what? is important information
    perl
     |
    utf8 text
    
    Do you see the manual of LDAP module has 'encoding' or 'binmode' section for this information?

      I have written this script below,

      the output file generated but i still seeing, German "Umlaut" in the csv output file. Example : Björn

      why the csv output file is not encoded into encoding(UTF-8)??, am i missed any steps here!!!!

      #!/usr/bin/perl use Net::LDAP; use Encode; use Net::LDAP::Entry; use Net::LDAP::Control::Paged; use Net::LDAP::Constant qw( LDAP_CONTROL_PAGED ); #$outputfile = "LicMan1.csv"; $outputfile = 'C:\Meta\MDS\data\out\LicMan1.csv'; #$inputfile = 'D:\Meta\MDS\data\out\LicMan1.csv'; $outputfile2 = 'C:\Meta\MDS\data\out\LicMan.csv'; open (FH, ">$outputfile") or die "$!"; print FH "import_id;import_data_source_id;login;import_level_2_id;impo +rt_location_id;last_name;first_name;email;is_active;language;remarks; +\n"; my @args = ( base => $base, scope => 'sub', filter => $query, attrs => \@attrs, control => [ $page ],); my $cookie; while (1) { $mesg = $ldap->search ( @args ) or die $!; while (my $entry = $mesg->shift_entry()) { #uid,dpwncrestcode,dpwnc,sn,givenname,mail,dhlaccountstatus,preferredl +anguage,dhlemployeetype my $entrydn = $entry->dn(); my $uid = $entry->get_value('uid'); my $dpwncrestcode = $entry->get_value('dpwncrestcode'); my $dpwnc = $entry->get_value('dpwnc'); my $sn = $entry->get_value('sn'); if (($sn eq "") || ($sn eq " ")){ $sn = "-"; } else { $sn = "$sn";} my $givenname = $entry->get_value('givenname'); if (($givenname eq "") || ($givenname eq " ")){ $givenname = "-"; } else { $givenname = "$givenname";} my $mail = $entry->get_value('mail'); my $dhlaccountstatus = $entry->get_value('dhlaccountstatus'); if ($dhlaccountstatus == "enabled") { $dhlaccountstatus = "1"; } else { $dhlaccountstatus = "0";} my $preferredlanguage = $entry->get_value('preferredlanguage'); if ($preferredlanguage ne "de") { $preferredlanguage = "en"; } else { $preferredlanguage = "de";} my $dhlemployeetype = $entry->get_value('dhlemployeetype'); if ($dhlemployeetype eq "") { $dhlemployeetype = ";"; } else { $dhlemployeetype = "$dhlemployeetype";} print FH "$uid;dpdhl_eds;$uid;$dpwncrestcode;$dpwnc;$sn;$givenname;$ma +il;$dhlaccountstatus;$preferredlanguage;$dhlemployeetype\n"; } } $ldap->unbind; close(FH); open (INPUT, "<:encoding(UTF-8)", $outputfile) or die "Cannot open fil +ename for input: $!"; while (<INPUT>) { s/"//g; $replace .= $_;} close INPUT or die "Cannot close filename: $!"; open (OUTPUT, ">:encoding(UTF-8)",$outputfile2) or die "Cannot open fi +lename for output: $!"; print OUTPUT $replace; close OUTPUT or die "Cannot close filename: $!";
        So, what's going on if first open, let it open with utf-8 io layer?
        open (FH, ">$outputfile") or die "$!";
        revise to
        open (OUTPUT, ">:encoding(UTF-8)",$outputfile or die "$!";
        What is the result?

        update:
        Second question: Is that $givenname?