PerlMonks  


Hello fellow Monks,

I am looking for your advice on improving a module I have written for encoding and decoding text in multiple formats. I tried to include as many formats as I could. I know there are other formats that I have not added, but in my case the encoding and decoding process also has to convert to hex and vice versa, and I ran into problems with several formats, which is why they are not included in my sample code.

The idea behind the module: I work for a telecommunications company, and part of my daily job is troubleshooting. It is a live network with live customers around the world, so the languages vary, and the data arrives as hex in a variety of encodings. In some cases I had to write small scripts to process the packages before and after the nodes so that I could see whether the encoding had been corrupted. Previous questions of mine that relate to the module: Chinese to Hex and Hex to Chinese, Arabic to Hex and Hex to Arabic. After seeing that I needed more and more encodings for more and more languages, I decided to write a simple module to do this for me instead of writing more or less the same code again and again.
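To illustrate the kind of corruption check described above, here is a minimal sketch. It assumes only the core Encode module; the helper name hex_decodes_cleanly is hypothetical and is not part of the module in this post. It reports whether a hex payload decodes without errors in a given encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of the module in this post):
# returns 1 if the hex string decodes cleanly in the given encoding, 0 otherwise.
sub hex_decodes_cleanly {
    my ( $encoding, $hex ) = @_;

    # Reject anything that is not a byte-aligned hex string.
    return 0 if $hex =~ /[^0-9a-fA-F]/ or length($hex) % 2;

    my $octets = pack( 'H*', $hex );

    # FB_CROAK makes decode() die on malformed input instead of
    # silently substituting a replacement character.
    my $ok = eval { decode( $encoding, $octets, Encode::FB_CROAK() ); 1 };
    return $ok ? 1 : 0;
}

print hex_decodes_cleanly( 'UTF-8', 'd987d8b0d8a7' ) ? "clean\n" : "corrupt\n";
print hex_decodes_cleanly( 'UTF-8', 'd987ff' )       ? "clean\n" : "corrupt\n";
```

Running the same check on the payload captured before and after a node is one way to narrow down where the bytes were damaged. Note that a clean decode only shows the bytes are well formed, not that the text is what the sender intended.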

With that said, here is sample code showing how a user would call the module with the encodings it can handle:

#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Cwd qw();
use Data::Dumper;
use feature 'say';

=alternative
BEGIN {
    push ( @INC, Cwd::cwd());
}
=cut

use lib (Cwd::cwd());

use Foo::Bar qw(ascii2hexEncode hex2ascciiDecode hexOutput);

binmode( STDOUT, ':utf8' );

my @lanquages = qw(
    Chinese
    Japanese
    Russian
    Greek
    Arabic
    );

my @strs = (
    "這是一個測試",
    "これはテストです",
    "Это тест",
    "Αυτό είναι ένα τεστ",
    "هذا اختبار"
    );

my @flags = (
    'UCS-2',
    'UCS-2BE',
    'UCS-2LE',
    'UTF-7',
    'UTF-8',
    'utf-8-strict',
    'UTF-16',
    'UTF-16BE',
    'UTF-16LE',
    'UTF-32',
    'UTF-32BE',
    'UTF-32LE',
    );

my %hashOutput;
while ( defined ( my $flag = shift @flags ) ) {
    for ( 0 .. $#lanquages ) {
        my $hexEncoded = ascii2hexEncode($flag, $strs[$_]);
        say $lanquages[$_] . " " . $flag;
        print Dumper hexOutput($flag, $strs[$_]);
        say hex2ascciiDecode($flag, $hexEncoded);
        say "";
        # $hashOutput{$flag}{$lanquages[$_]} = {
        #     'hex'   => hexOutput($flag, $strs[$_]),
        #     'ascci' => hex2ascciiDecode($flag, $hexEncoded),
        # };
    }

}
# print Dumper \%hashOutput;

__END__

Arabic UCS-2
$VAR1 = [
          '06 47 06 30 06 27 00 20 06 27',
          '06 2e 06 2a 06 28 06 27 06 31'
        ]
هذا اختبار

Arabic UTF-8
$VAR1 = [
          'd9 87 d8 b0 d8 a7 20 d8 a7 d8',
          'ae d8 aa d8 a8 d8 a7 d8 b1'
        ]
هذا اختبار

Arabic utf-8-strict
$VAR1 = [
          'd9 87 d8 b0 d8 a7 20 d8 a7 d8',
          'ae d8 aa d8 a8 d8 a7 d8 b1'
        ]
هذا اختبار

Arabic UTF-16
$VAR1 = [
          'fe ff 06 47 06 30 06 27 00 20',
          '06 27 06 2e 06 2a 06 28 06 27',
          '06 31'
        ]
هذا اختبار

Arabic UTF-16BE
$VAR1 = [
          '06 47 06 30 06 27 00 20 06 27',
          '06 2e 06 2a 06 28 06 27 06 31'
        ]
هذا اختبار

Arabic UTF-32
$VAR1 = [
          '00 00 fe ff 00 00 06 47 00 00',
          '06 30 00 00 06 27 00 00 00 20',
          '00 00 06 27 00 00 06 2e 00 00',
          '06 2a 00 00 06 28 00 00 06 27',
          '00 00 06 31'
        ]
هذا اختبار

Arabic UTF-32LE
$VAR1 = [
          '47 06 00 00 30 06 00 00 27 06',
          '00 00 20 00 00 00 27 06 00 00',
          '2e 06 00 00 2a 06 00 00 28 06',
          '00 00 27 06 00 00 31 06 00 00'
        ]
هذا اختبار
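
A note on the output above: the UTF-16 and UTF-32 results start with extra bytes (fe ff and 00 00 fe ff) that the BE and LE variants do not have. That is the byte order mark which Encode prepends when no byte order is given. A minimal sketch that shows the difference for a single character, assuming only the core Encode module (hex_of is a hypothetical helper name used just for this example):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper for this sketch: hex string of a text in a given encoding.
sub hex_of {
    my ( $encoding, $text ) = @_;
    return unpack( 'H*', encode( $encoding, $text ) );
}

my $heh = "\x{0647}";    # ARABIC LETTER HEH, first character of the Arabic sample

printf "%-9s %s\n", $_, hex_of( $_, $heh ) for qw(UTF-16 UTF-16BE UTF-16LE);
# UTF-16    feff0647
# UTF-16BE  0647
# UTF-16LE  4706
```

If the hex is compared against traffic that carries no byte order mark, the BE or LE variant is probably the one to use.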

Here is the actual module, for which I still have not found a good name. If you have any ideas for a name, please feel free to propose them.

package Foo::Bar;

# use utf8;
use strict;
use warnings;
use Exporter qw(import);
use Encode qw(decode encode);

use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);

$VERSION = 1.00;
@ISA = qw(Exporter);
@EXPORT = ();
@EXPORT_OK = qw(ascii2hexEncode hex2ascciiDecode hexOutput);

# binmode( STDOUT, ':utf8' );

sub _ascii2hex {
    return unpack("H*", $_[0]);
}

sub _hex2ascii {
    return pack("H*", $_[0]);
}

sub hexOutput {
    my ( $flag , $data ) = @_;
    my $octet = ascii2hexEncode( $flag , $data );
    # split string every two characters
    # join the split characters with white space
    # trim leading white space and squeeze repeated spaces
    $octet = join(' ', split(/(..)/, $octet)) =~ s/^\s+|\s+$//r =~ y/ / /rs;
    # break the string into chunks of 30 characters (10 bytes per chunk)
    # join("\n", unpack('(A30)*', $octet));
    my @aref = unpack('(A30)*', $octet);
    return \@aref;
}

sub ascii2hexEncode {
    my ( $flag , $data ) = @_;
    my $octet = encode( $flag , $data );
    return _ascii2hex( $octet );
}

sub hex2ascciiDecode {
    my ( $flag , $data ) = @_;
    my $hex2ascciiOctet = _hex2ascii( $data );
    return decode( $flag , $hex2ascciiOctet );
}

1;

The module itself is extremely simple, but in my position it is extremely useful for me and my colleagues. If you have suggestions on the code or any other improvements, please share them.
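
One possible improvement, offered only as a sketch: resolve the encoding name once with Encode's find_encoding and reuse the returned object, so that a mistyped flag fails early with a clear message. The wrapper name checked_encoding is hypothetical and not part of the module above.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical wrapper: return the Encode object for a name,
# or die with a readable message if Encode does not know it.
sub checked_encoding {
    my ($name) = @_;
    my $enc = find_encoding($name)
        or die "Unknown encoding '$name'\n";
    return $enc;
}

my $enc = checked_encoding('UTF-16BE');

# The object can be reused for many strings without another name lookup.
print unpack( 'H*', $enc->encode("\x{0647}") ), "\n";    # 0647
```

Whether this is worth adding depends on how the flags reach the module. If they always come from a fixed list like the one in the sample script, the plain encode and decode calls may be enough.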

I hope this tiny module will help others as well.

BR, Thanos

Seeking for Perl wisdom...on the process of learning...not there...yet!

In reply to Encoding Decoding on multiple formats RFC by thanos1983
