
Hello fellow Monks,

I am looking for your advice on improving a module I implemented for encoding and decoding multiple formats. I wrote the module and tried to include as many formats as I could. I know there are other formats that I have not added, but in my case the data also has to be converted to hex and vice versa during the encoding/decoding process, and I ran into problems with several formats that are not included in my sample code.
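For the formats not covered yet, Encode itself can report which encoding names it recognizes. The following is only a minimal sketch: `canonical_encoding` is a hypothetical helper name, not part of the module, and the names tested in the loop are arbitrary examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: return the canonical name Encode uses for $name,
# or undef if Encode does not know the encoding.
sub canonical_encoding {
    my ($name) = @_;
    my $enc = find_encoding($name);
    return defined $enc ? $enc->name : undef;
}

# Every encoding Encode can load, including those not loaded yet.
my @all = Encode->encodings(':all');
printf "%d encodings available\n", scalar @all;

# Arbitrary example names; the last one is deliberately invalid.
for my $name (qw(UTF-16LE UCS-2 iso-8859-7 no-such-encoding)) {
    my $canonical = canonical_encoding($name);
    printf "%-18s => %s\n", $name, $canonical // 'not supported';
}
```

Checking a name this way before calling `encode` or `decode` would let the module reject an unknown format with a clear message.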

The idea behind the module: I work for a telecommunications company, and part of my daily job is troubleshooting. Because it is a live network with live customers worldwide, the languages vary, and the data arrives as hex in a variety of encodings. In several cases I had to write small scripts to process the packets before and after the nodes so that I could check for encoding corruption. Earlier questions of mine dealt with similar problems (Chinese to Hex and Hex to Chinese, Arabic to Hex and Hex to Arabic). After seeing that I kept needing more encodings for more languages, I decided to write a simple module to do the job instead of writing more or less the same code again and again.
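One way to spot that kind of corruption is to decode strictly, so that malformed octets raise an error instead of being silently replaced. This is a rough sketch, not what the module does: `strict_hex_decode` is a hypothetical name, and the "after" capture is invented sample data with its last octet removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a hex dump strictly.
# Returns (text, undef) on success or (undef, error message) on failure.
sub strict_hex_decode {
    my ($encoding, $hex) = @_;
    $hex =~ s/\s+//g;    # accept "d9 87" as well as "d987"
    return (undef, 'not an even-length hex string')
        unless $hex =~ /\A(?:[0-9a-fA-F]{2})+\z/;
    my $octets = pack 'H*', $hex;
    # FB_CROAK makes decode die on malformed input instead of substituting U+FFFD.
    my $text = eval { decode($encoding, $octets, Encode::FB_CROAK()) };
    return defined $text ? ($text, undef) : (undef, $@);
}

binmode STDOUT, ':encoding(UTF-8)';

my %capture = (
    before => 'd9 87 d8 b0 d8 a7',    # U+0647 U+0630 U+0627 in UTF-8
    after  => 'd9 87 d8 b0 d8',       # invented capture: last octet lost
);

for my $point (qw(before after)) {
    my ($text, $err) = strict_hex_decode('UTF-8', $capture{$point});
    print "$point: ", ( defined $text ? "ok ($text)" : 'corrupt' ), "\n";
}
```

Comparing the two hex strings directly shows that something changed; the strict decode additionally shows whether the result is still valid in the expected encoding.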

That said, here is sample code showing how a user would call the module with the encodings it can handle:

#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Cwd qw();
use Data::Dumper;
use feature 'say';

=alternative
BEGIN {
    push ( @INC, Cwd::cwd());
}
=cut

# Cwd is already loaded above
use lib (Cwd::cwd());

use Foo::Bar qw(ascii2hexEncode hex2ascciiDecode hexOutput);

binmode( STDOUT, ':utf8' );

my @lanquages = qw(
    Chinese
    Japanese
    Russian
    Greek
    Arabic
    );

my @strs = (
    "這是一個測試",
    "これはテストです",
    "Это тест",
    "Αυτό είναι ένα τεστ",
    "هذا اختبار"
    );

my @flags = (
    'UCS-2',
    'UCS-2BE',
    'UCS-2LE',
    'UTF-7',
    'UTF-8',
    'utf-8-strict',
    'UTF-16',
    'UTF-16BE',
    'UTF-16LE',
    'UTF-32',
    'UTF-32BE',
    'UTF-32LE',
    );

my %hashOutput;
while ( defined ( my $flag = shift @flags ) ) {
    for ( 0 .. $#lanquages ) {
        my $hexEncoded = ascii2hexEncode($flag, $strs[$_]);
        say $lanquages[$_] . " " . $flag;
        print Dumper hexOutput($flag, $strs[$_]);
        say hex2ascciiDecode($flag, $hexEncoded);
        say "";
        # $hashOutput{$flag}{$lanquages[$_]} = {
        # 'hex' => hexOutput($flag, $strs[$_]),
        # 'ascci' => hex2ascciiDecode($flag, $hexEncoded),
        # }
    }

}
# print Dumper \%hashOutput;

__END__

Arabic UCS-2
$VAR1 = [
          '06 47 06 30 06 27 00 20 06 27',
          '06 2e 06 2a 06 28 06 27 06 31'
        ]
هذا اختبار

Arabic UTF-8
$VAR1 = [
          'd9 87 d8 b0 d8 a7 20 d8 a7 d8',
          'ae d8 aa d8 a8 d8 a7 d8 b1'
        ]
هذا اختبار

Arabic utf-8-strict
$VAR1 = [
          'd9 87 d8 b0 d8 a7 20 d8 a7 d8',
          'ae d8 aa d8 a8 d8 a7 d8 b1'
        ]
هذا اختبار

Arabic UTF-16
$VAR1 = [
          'fe ff 06 47 06 30 06 27 00 20',
          '06 27 06 2e 06 2a 06 28 06 27',
          '06 31'
        ]
هذا اختبار

Arabic UTF-16BE
$VAR1 = [
          '06 47 06 30 06 27 00 20 06 27',
          '06 2e 06 2a 06 28 06 27 06 31'
        ]
هذا اختبار

Arabic UTF-32
$VAR1 = [
          '00 00 fe ff 00 00 06 47 00 00',
          '06 30 00 00 06 27 00 00 00 20',
          '00 00 06 27 00 00 06 2e 00 00',
          '06 2a 00 00 06 28 00 00 06 27',
          '00 00 06 31'
        ]
هذا اختبار

Arabic UTF-32LE
$VAR1 = [
          '47 06 00 00 30 06 00 00 27 06',
          '00 00 20 00 00 00 27 06 00 00',
          '2e 06 00 00 2a 06 00 00 28 06',
          '00 00 27 06 00 00 31 06 00 00'
        ]
هذا اختبار
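
The output above shows that the endian-neutral names (UTF-16, UTF-32) prepend a byte order mark, while the BE/LE variants do not. A small sketch that isolates this for a single character follows; `hex_of` is a hypothetical helper, not one of the module's functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hex dump of $text encoded as $encoding,
# with the octets separated by single spaces.
sub hex_of {
    my ($encoding, $text) = @_;
    return join ' ', unpack '(H2)*', encode($encoding, $text);
}

my $text = "\x{0647}";    # ARABIC LETTER HEH, first letter of the Arabic test string

printf "%-9s %s\n", $_, hex_of($_, $text)
    for qw(UTF-16 UTF-16BE UTF-16LE UTF-32 UTF-32BE UTF-32LE);
```

This matters when comparing hex captured on the network with hex produced locally: the same text yields different octets depending on whether the BOM is present.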

Here is the actual module. I have not found a good name for it yet, so please feel free to propose one.

package Foo::Bar;

# use utf8;
use strict;
use warnings;
use Exporter qw(import);
use Encode qw(decode encode);

use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);

$VERSION   = 1.00;
@ISA       = qw(Exporter);
@EXPORT    = ();
@EXPORT_OK = qw(ascii2hexEncode hex2ascciiDecode hexOutput);

# binmode( STDOUT, ':utf8' );

sub _ascii2hex {
    return unpack("H*", $_[0]);
}

sub _hex2ascii {
    return pack("H*", $_[0]);
}

sub hexOutput {
    my ( $flag , $data ) = @_;
    my $octet = ascii2hexEncode( $flag , $data );

    # trim leading and trailing white space
    # split string every two characters
    # join the splitted characters with white space
    $octet = join(' ', split(/(..)/, $octet)) =~ s/^\s+|\s+$//r =~ y/ / /rs;

    # insert new line character every 30 characters
    # join("\n", unpack('(A30)*', $octet));
    my @aref = unpack('(A30)*', $octet);
    return \@aref;
}

sub ascii2hexEncode {
    my ( $flag , $data ) = @_;
    my $octet = encode( $flag , $data );
    return _ascii2hex( $octet );
}

sub hex2ascciiDecode {
    my ( $flag , $data ) = @_;
    my $hex2ascciiOctet = _hex2ascii( $data );
    return decode( $flag , $hex2ascciiOctet );
}

1;
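
A round-trip test is one way to check that each encoding gives back the original text. The sketch below is self-contained on purpose: `to_hex` and `from_hex` are hypothetical stand-ins that mirror `ascii2hexEncode` and `hex2ascciiDecode`, so it runs without the module on disk, and the sample strings are just the first few characters of the Russian and Greek test strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);
use Test::More;

# Hypothetical stand-ins for the module's exported functions,
# repeated here so the test runs on its own.
sub to_hex   { my ($enc, $text) = @_; return unpack 'H*', encode($enc, $text) }
sub from_hex { my ($enc, $hex)  = @_; return decode($enc, pack('H*', $hex)) }

# Short samples written as code points to keep the source ASCII-only.
my %sample = (
    Russian => "\x{42d}\x{442}\x{43e}",
    Greek   => "\x{391}\x{3c5}\x{3c4}\x{3cc}",
);

for my $enc (qw(UCS-2BE UTF-8 UTF-16 UTF-16LE UTF-32 UTF-32BE)) {
    for my $lang (sort keys %sample) {
        my $hex = to_hex($enc, $sample{$lang});
        is from_hex($enc, $hex), $sample{$lang},
            "$lang round-trips through $enc";
    }
}

done_testing;
```

With the real module, the same loop could call the exported functions directly and cover the full list of flags from the sample script.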

The module itself is extremely simple, but in my position it is extremely useful for me and my colleagues. Suggestions on the code or any other improvements are welcome.

I hope this tiny module helps others as well.

BR, Thanos

Seeking for Perl wisdom...on the process of learning...not there...yet!

In reply to Encoding Decoding on multiple formats RFC by thanos1983
