Problem while Data Extraction

by jpvinny28 (Novice)
on Dec 04, 2012 at 02:59 UTC ( [id://1006981]=perlquestion )

jpvinny28 has asked for the wisdom of the Perl Monks concerning the following question:

I have patterns in a file which look like this:

|30 14 06 03 55 04 03 14 0D 2A|.dropbox.com

User-Agent|3A|Google|20 20|Desktop

The data between "|" need to be saved as it is without the spaces and data outside pipes need to be converted to ASCII. I tried to use regexes but wasn't successful, so I tried a character-by-character approach, but I'm still not able to achieve the goal. The code is below:

    #!/usr/bin/perl
    use strict;
    #use warnings;

    sub ascii_to_hex ($) {
        ## Convert each ASCII character to a two-digit hex number.
        (my $str = shift) =~ s/(.|\n)/sprintf("%02lx", ord $1)/eg;
        return $str;
    }

    my @a_out;
    my @a_str;
    my @b_str;
    my $str;
    my $index="0";
    my $count;
    my $flag;
    my $file= 'rules.txt';
    my $statenum="0";
    my $num="0";
    my $file1= 'XXX.txt';
    my $file2= 'happ.txt';
    my $file3= 'hex.txt';
    my $match=quotemeta ("|");
    my $space=quotemeta (" ");

    open INFILE, "<$file" or die "can't open file: $!";
    open OUTFILE2, ">$file3" or die "can't open file: $!";
    open OUTFILE, ">$file1" or die "can't open file: $!";
    open OUTFILE1, ">$file2" or die "can't open file: $!";

    while( my $line = <INFILE> ) {
        $num=$num+1;
        my $digits='0';
        my $index2=length($line);
        print ("Processing Rule",$num,"\n");
        @a_str = split(//, $line);
        print (OUTFILE1 @a_str,"\n");
        foreach(@a_str) {
            chomp($_);
            print(OUTFILE2 $a_str[$index],"\n");
            print(($a_str[$index]),"\n");
            if ($count=='0') {
                if (($a_str[$index])==$match) {
                    $count='1';
                    ("count=",$count,"START OF PIPE:",$a_str[$index],"\n");
                }
                elsif (($a_str[$index])==$space) {
                    print (OUTFILE $a_str[$index]);
                    print ("SAVING ASCII SPACE:",$a_str[$index],"\n");
                }
                else {
                    my $hex = ascii_to_hex($a_str[$index]);
                    print (OUTFILE $hex);
                    print ("SAVING ASCII ELEMENTS OUTSIDE PIPE",$hex,"\n");
                }
            }
            elsif($count=='1') {
                if (($a_str[$index])==$match){
                    $count='0';
                    print("count=",$count,"END OF PIPE:", $a_str[$index],"\n");
                }
                elsif (($a_str[$index])==$space){
                    print ($a_str[$index],"SKIPPING BINARY SPACE","\n");
                }
                else {
                    print (OUTFILE $a_str[$index]);
                    print ("SAVING ELEMENTS IN BETW PIPE",$a_str[$index],"\n");
                }
            }
            $index=$index+1;
            print($index,"\t" );
        }
        print ("\n");
        print (OUTFILE1 "\n");
        print (OUTFILE "\n");
        print (OUTFILE2 "\n");
    }

    close INFILE or die "can't close file: $!";
    close OUTFILE1 or die "can't close file: $!";
    close OUTFILE2 or die "can't close file: $!";
    close OUTFILE or die "can't close file: $!";

Replies are listed 'Best First'.
Re: Problem while Data Extraction
by GrandFather (Saint) on Dec 04, 2012 at 03:24 UTC

    Don't turn off warnings, instead understand and fix the warnings.
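As an illustration of what the warnings would have pointed at: the original script compares characters with ==, which is a numeric comparison. The sketch below (variable names are made up for the example) captures the warnings and shows that == treats two different non-numeric characters as equal, while eq compares them as strings.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $char  = 'a';   # hypothetical character read from the input
my $match = '|';   # the pipe character we are looking for

# Numeric comparison: both non-numeric strings are treated as 0,
# so the test is true, and Perl warns that the operands are not numeric.
my $numeric_equal = ($char == $match) ? 1 : 0;

# String comparison: compares the characters themselves.
my $string_equal = ($char eq $match) ? 1 : 0;

print "==: $numeric_equal, eq: $string_equal\n";
print "warning: $_" for @warnings;
```

With warnings switched off, the first comparison silently succeeds for almost every character, which is one way a character-by-character loop can go wrong without any visible error.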

    Don't use prototype subs - they don't do what you think and will probably bite you.
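A small sketch of one way a ($) prototype can surprise: it forces scalar context on the argument, so passing an array hands the sub the element count rather than the first element. The sub names here are hypothetical, chosen only for the demonstration.

```perl
use strict;
use warnings;

# With a ($) prototype the argument is evaluated in scalar context.
sub with_proto ($) { my ($arg) = @_; return $arg; }

# Without a prototype the array is flattened into @_ as usual.
sub without_proto { my ($arg) = @_; return $arg; }

my @chars = ('a', 'b', 'c');

my $from_proto = with_proto(@chars);      # receives scalar(@chars), i.e. 3
my $from_plain = without_proto(@chars);   # receives the first element, 'a'

print "with prototype: $from_proto, without: $from_plain\n";
```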

    If I understand what you want to do correctly the following should help:

        #!/usr/bin/perl
        use strict;
        use warnings;

        while (<DATA>) {
            my @parts = split /(\|[^\|]*\|)/;
            for my $part (@parts) {
                if ($part =~ /\|(.*)\|/) {
                    my @chars = $part =~ /([\da-f]+)/ig;
                    $part = join '', map{chr(hex($_))} @chars;
                }
                print $part;
            }
        }

        __DATA__
        |30 31 32 33 34|.dropbox.com
        User-Agent|3A|Google|20 20|Desktop

    Prints:

        01234.dropbox.com
        User-Agent:Google  Desktop
    True laziness is hard work
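The step doing most of the work in the code above is split with a capturing group: because the pattern is wrapped in parentheses, the piped sections are returned along with the text between them. A minimal sketch of what that returns for one of the sample lines:

```perl
use strict;
use warnings;

my $line = 'User-Agent|3A|Google|20 20|Desktop';

# The capturing parentheses make split keep the matched |...| sections
# in the returned list instead of discarding them.
my @parts = split /(\|[^|]*\|)/, $line;

print join(', ', map { "'$_'" } @parts), "\n";
# prints 'User-Agent', '|3A|', 'Google', '|20 20|', 'Desktop'
```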
Re: Problem while Data Extraction
by GrandFather (Saint) on Dec 04, 2012 at 03:42 UTC

    It may be that on my first reading of your node I understood it backwards, in which case the following code is more likely useful:

        #!/usr/bin/perl
        use strict;
        use warnings;

        while (<DATA>) {
            chomp;
            my @parts = split /(\|[^\|]*\|)/;
            for my $part (@parts) {
                if ($part =~ /\|(.*)\|/) {
                    $part = join '', $part =~ /([\da-f]+)/ig;
                }
                else {
                    $part = join '', map {sprintf '%02x', ord $_} split '', $part;
                }
                print $part;
            }
            print "\n";
        }

        __DATA__
        |30 31 32 33 34|.dropbox.com
        User-Agent|3A|Google|20 20|Desktop

    Prints:

        30313233342e64726f70626f782e636f6d
        557365722d4167656e743A476f6f676c6520204465736b746f70
    True laziness is hard work

      Thank You guys.. YOU GUYS ARE AWESOME.. :)

      I adapted your previous code (although it is not as efficient as yours) and was able to do it. THANKS A LOT.

          while (<INFILE>) {
              chomp($_);
              my @parts = split /(\|[^\|]*\|)/;
              for my $part (@parts) {
                  if ($part =~ /\|(.*)\|/) {
                      $part=~s/\|//gi;
                      $part=~ tr/ //ds;
                      print ($part);
                  }
                  else {
                      my $hex = ascii_to_hex($part);
                      print ($hex);
                  }
              }
              print("\n");
          }
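For anyone wanting to try the approach without the input files, here is a self-contained sketch of the same idea wrapped in a sub. The name encode_rule is made up for this example, and unpack 'H*' stands in for the ascii_to_hex helper, so treat it as one possible variant rather than the code actually used.

```perl
use strict;
use warnings;

# Hypothetical helper: hex digits between pipes are kept with the spaces
# removed; text outside the pipes becomes two hex digits per character.
sub encode_rule {
    my ($line) = @_;
    my $out = '';
    for my $part (split /(\|[^|]*\|)/, $line) {
        if ($part =~ /^\|([^|]*)\|$/) {
            (my $hex = $1) =~ tr/ //d;    # drop the spaces inside the pipes
            $out .= $hex;
        }
        else {
            $out .= unpack 'H*', $part;   # lowercase hex of each byte
        }
    }
    return $out;
}

print encode_rule('|14 0D 2A|abc'), "\n";   # prints 140D2A616263
```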
Re: Problem with Data Extraction
by Athanasius (Archbishop) on Dec 04, 2012 at 04:39 UTC

    Hello jpvinny28, and welcome to the Monastery!

    In addition to the excellent advice given by GrandFather, above (don’t turn off warnings, avoid subroutine prototypes), here is some general coding advice that will help you in the long run:

    • Declare variables as late as possible. Your code contains a number of variables that are never used at all. Moreover, @a_str is re-written at the start of each iteration of the while loop, so it would be better to declare it at that point:

      my @a_str = split //, $line;
    • Don’t stringify numbers unless you need to (and in this case, you don’t!):

      elsif ($count == 1) {
          if (...) {
              $count = 0;
              ...
    • Prefer the 3-argument form of open, and use lexical filehandles:

      open(my $INFILE, "<", $file) or die "can't open file '$file' for reading: $!";
    • Choose an indentation style, and stick to it! See perlstyle.

    • Use consistent variable names. For example, you have a filehandle OUTFILE2 opened for writing on a file named $file3. That’s a maintenance nightmare just waiting to bite you!
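Several of the points above can be put together in one short sketch. It creates a throwaway input file with File::Temp so that it runs on its own; the file contents and the names $rules_file, $rules_fh and $rule_count are assumptions made for the example, not taken from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file, created only so the sketch is runnable.
my ($tmp_fh, $rules_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "|14 0D 2A|abc\n";
close $tmp_fh or die "can't close '$rules_file': $!";

# Three-argument open with a lexical filehandle whose name matches the file.
open(my $rules_fh, '<', $rules_file)
    or die "can't open file '$rules_file' for reading: $!";

my $rule_count = 0;    # a number, not the string "0"
my @rules;
while (my $line = <$rules_fh>) {
    chomp $line;
    $rule_count++;
    my @chars = split //, $line;    # declared where it is first needed
    push @rules, $line;
    print "Rule $rule_count has ", scalar(@chars), " characters\n";
}

close $rules_fh or die "can't close '$rules_file': $!";
```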

    Hope that helps,

    Athanasius <°(((><contra mundum

      Thank You for the advice.. :)
Re: Problem while Data Extraction
by Anonymous Monk on Dec 04, 2012 at 03:24 UTC

    The data between "|" need to be saved as it is without the spaces and data outside pipes need to be converted to ASCII.

    Can you explain this better, like give an example?

      Input =|14 0D 2A|abc

      Output =140D2A616263

      The values between two pipes are hexadecimal data, so I need to keep them as they are in the output. The values outside the pipes need to be converted to their ASCII hex equivalents (abc = 61, 62, 63 in the above example).

            my @F = split(/\|/, $_, 0);
            foreach $_ (@F) {
                if (/^[\da-z]{2}(?:\s[\da-z]{2})*$/i) {
                    s/\s+//g;
                    print $_;
                }
                else {
                    print unpack('H*', $_), $/;
                }
            }

Node Type: perlquestion [id://1006981]
Approved by GrandFather