PerlMonks  

Re: Computing the percentage of certain characters in a file

by AnomalousMonk (Archbishop)
on Aug 04, 2014 at 23:38 UTC ([id://1096225])


in reply to Computing the percentage of certain characters in a file

How many 'ff' character-pair sequences are in the string qq{\x0f\xf0\xff} after you've unpack-ed it into a hex string?

c:\@Work\Perl\monks>perl -wMstrict -le "my $file = qq{\x0f\xf0\xff}; my $data = unpack( 'H*', $file ); my $count =()= $data =~ /ff/g; print $count; "
2
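The count is 2 because unpack 'H*' yields the single string "0ff0ff", and the low nybble of 0x0f plus the high nybble of 0xf0 form a spurious "ff" across a byte boundary. A small sketch (variable names are illustrative, not from the thread) comparing that with one two-digit hex string per byte, using the '(H2)*' group template of unpack:

```perl
use strict;
use warnings;

my $file = qq{\x0f\xf0\xff};

# One long hex string "0ff0ff": the match at offset 1 straddles
# the boundary between 0x0f and 0xf0, so /ff/g finds two matches.
my $hex   = unpack 'H*', $file;
my $naive = () = $hex =~ /ff/g;

# One two-digit hex string per byte: ("0f", "f0", "ff").
# A comparison against each element cannot straddle two bytes.
my @pairs    = unpack '(H2)*', $file;
my $per_byte = grep { $_ eq 'ff' } @pairs;

print "naive: $naive, per byte: $per_byte\n";   # naive: 2, per byte: 1
```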

If you want to count the number of 0xff characters in the raw file, it's probably better to match 0xff directly:

c:\@Work\Perl\monks>perl -wMstrict -le "my $file = qq{\x0f\xf0\xff}; my $count =()= $file =~ /\xff/g; print $count; "
1
Or, perhaps better, with tr/// (update: see Quote-Like Operators in perlop):
c:\@Work\Perl\monks>perl -wMstrict -le "my $file = qq{\x0f\xf0\xff}; my $count = $file =~ tr/\xff//; print $count; "
1
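Turning the tr/// count into the percentage the original question asks for is one division. A minimal sketch, assuming the buffer already holds raw bytes (percent_ff is a hypothetical name); it uses length() of the buffer as the denominator rather than the stat size:

```perl
use strict;
use warnings;

# Percentage of 0xFF bytes in $data (assumed to be raw bytes).
sub percent_ff {
    my ($data) = @_;
    return 0 unless length $data;          # avoid dividing by zero
    my $count = ($data =~ tr/\xff//);      # counts matches, leaves $data unchanged
    return $count / length($data) * 100;
}

printf "%.2f%%\n", percent_ff(qq{\x0f\xf0\xff\xff});   # 2 of 4 bytes: 50.00%
```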

Update: I haven't checked this, but if you're running under Windows, there may be a problem arising from the fact that Windows uses a \x0d\x0a character pair to represent a newline in a file, and this pair may be translated into a single \n (newline) character when the file is read, depending on the read mode being used (e.g., whether binmode is in effect). stat reports the number of bytes the operating system sees, i.e., the count before any read-time translation, so this may throw your calculation off a bit versus what the HxD hex editor (whatever that is and however it works) reports.
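A minimal sketch of the mismatch described above, using only core Perl (File::Temp): it writes two CRLF-terminated lines as raw bytes, then reads them back through the :raw and :crlf PerlIO layers. Forcing :crlf explicitly makes the demonstration behave the same on any OS; on Windows it is the default text-mode layer. The helper slurp_with_layer is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two CRLF-terminated lines as raw bytes: 6 bytes on disk.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "a\r\nb\r\n";
close $out or die "close: $!";

# Read the whole file through the given PerlIO layer.
sub slurp_with_layer {
    my ($file, $layer) = @_;
    open my $fh, "<$layer", $file or die "$file: $!";
    local $/;                      # slurp mode
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

my $on_disk = -s $path;                                  # same as (stat $path)[7]
my $raw     = length slurp_with_layer($path, ':raw');    # CRLF kept as 2 bytes
my $crlf    = length slurp_with_layer($path, ':crlf');   # CRLF read as "\n"
print "stat: $on_disk, :raw: $raw, :crlf: $crlf\n";      # stat: 6, :raw: 6, :crlf: 4
```

Reading with :raw (or calling binmode on the handle) keeps the byte count consistent with stat, which is what a hex editor would be counting.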

Re^2: Computing the percentage of certain characters in a file
by james28909 (Deacon) on Aug 05, 2014 at 02:24 UTC
    Yes, it works great. Actually, it seems HxD rounds to two decimal places (the nearest hundredth): I am computing 10.4279696941376% while HxD shows 10.43%.
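For reference, a tiny sketch of the two rounding levels applied to the value quoted above; two decimal places give HxD's figure, while three would give a different one:

```perl
use strict;
use warnings;

my $pct   = 10.4279696941376;        # full-precision value from the script
my $two   = sprintf '%.2f', $pct;    # hundredths:  10.43
my $three = sprintf '%.3f', $pct;    # thousandths: 10.428
print "$two $three\n";
```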

    Also, the tr/\xff// method is a lot faster, so thank you for sharing that little bit of info :)
      And I also want to thank you guys and gals for helping me through this. I have learned a good bit, but I am far from where I want to be. Thanks again for everyone's help :)
Re^2: Computing the percentage of certain characters in a file
by james28909 (Deacon) on Aug 06, 2014 at 00:02 UTC
    tr/// works very quickly and is what I need, but after some searching around, I found that it is not possible to use a variable with tr///, as in "tr/\$_//". That means that to get statistics for the whole file, I am going to have to write out =~ tr/\x00// all the way through =~ tr/\xff//, lol: 256 different instances. Not a big deal, but if that were possible I could have made what I have now into a subroutine and passed each element of an array (x00 - xff) to tr///.
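perlop explains why: the transliteration table is built at compile time, so neither list is interpolated, and the documented workaround is eval. A minimal sketch of that workaround as a subroutine (count_byte is a hypothetical name), building the tr/// source text for one byte value and compiling it with string eval:

```perl
use strict;
use warnings;

# Count how many times byte value $byte (0..255) occurs in $data.
# tr/// does not interpolate variables, so the statement is built as a
# string such as '$data =~ tr/\xff//' and compiled with string eval.
sub count_byte {
    my ($data, $byte) = @_;
    my $hex   = sprintf '%02x', $byte;
    my $count = eval "\$data =~ tr/\\x$hex//";
    die $@ if $@;                   # report compile errors instead of returning undef
    return $count;
}

print count_byte(qq{\x0f\xf0\xff\xff}, 0xff), "\n";   # 2
```

Each call compiles and runs a fresh tr///, so looping over 0 .. 255 still scans the data 256 times; the gain is in the amount of code, not necessarily in speed.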

      It sounds like this might have been an XY Problem: "How do I count the occurrences of each character in a string/file/etc?" (Caution: The following solution only works for byte characters (i.e., 1 byte == 1 character), not Unicode characters.)

      c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'A man, a plan, a canal: Panama!'; ;; my %freq; ++$freq{ substr $s, $_, 1 } for 0 .. length($s) - 1; ;; printf qq{'$_' (0x%02x) == $freq{$_} (%6.3f%%) \n}, ord, $freq{$_} / length($s) * 100 for sort { ord($a) <=> ord($b) } keys %freq; "
      ' ' (0x20) == 6 (19.355%)
      '!' (0x21) == 1 ( 3.226%)
      ',' (0x2c) == 2 ( 6.452%)
      ':' (0x3a) == 1 ( 3.226%)
      'A' (0x41) == 1 ( 3.226%)
      'P' (0x50) == 1 ( 3.226%)
      'a' (0x61) == 9 (29.032%)
      'c' (0x63) == 1 ( 3.226%)
      'l' (0x6c) == 2 ( 6.452%)
      'm' (0x6d) == 2 ( 6.452%)
      'n' (0x6e) == 4 (12.903%)
      'p' (0x70) == 1 ( 3.226%)
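A variation on the one-liner above, offered as a sketch under the same 1 byte == 1 character assumption: unpack 'C*' turns the string into byte values 0..255, which index a 256-slot array directly, so no hash and no sort by ord() are needed to print the table in byte order. byte_frequencies is a hypothetical name.

```perl
use strict;
use warnings;

# Frequency of each byte value 0..255 in $s (assumes byte characters only).
sub byte_frequencies {
    my ($s) = @_;
    my @freq = (0) x 256;
    $freq[$_]++ for unpack 'C*', $s;   # one element per byte value
    return @freq;
}

my $s    = 'A man, a plan, a canal: Panama!';
my @freq = byte_frequencies($s);

# Print only byte values that occur, already in ascending order.
for my $byte (grep { $freq[$_] } 0 .. 255) {
    printf "'%s' (0x%02x) == %d (%6.3f%%)\n",
        chr $byte, $byte, $freq[$byte], $freq[$byte] / length($s) * 100;
}
```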

        I want you to know... I wrote out 1853 lines to achieve what you were able to do in 7 lines, lol. Not only that, but on a larger 256 MB file your code is 3 seconds faster (39 s) than mine (42 s). I am not sure how HxD does it, though: it gets these same statistics on a 256 MB file in 3-4 seconds flat.
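For files in the 256 MB range, one way to avoid holding the whole file in memory (or building a 256-million-element list) is to read fixed-size chunks and accumulate byte counts per chunk. A sketch only, not benchmarked against either script or HxD; histogram_from_handle and the 1 MiB default chunk size are arbitrary choices for this example.

```perl
use strict;
use warnings;

# Byte-value histogram of everything readable from $fh, read in chunks.
# Returns (array ref of 256 counts, total bytes read).
sub histogram_from_handle {
    my ($fh, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;          # 1 MiB per read (arbitrary)
    binmode $fh;                      # raw bytes: no CRLF translation
    my @count = (0) x 256;
    my $total = 0;
    while (1) {
        my $got = read $fh, my $buf, $chunk_size;
        die "read failed: $!" unless defined $got;
        last unless $got;             # 0 means end of file
        $count[$_]++ for unpack 'C*', $buf;
        $total += $got;
    }
    return (\@count, $total);
}

# Usage (hypothetical): open my $fh, '<', $ARGV[0] or die "$ARGV[0]: $!";
#                       my ($count, $total) = histogram_from_handle($fh);
#                       printf "0xFF: %.2f%%\n", $count->[255] / $total * 100;
```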

        But I am going to study this code, because it would have saved me hours if I had known how to do it this way in the first place. Thanks for sharing :)
        use File::Slurp qw(read_file);

        my $file = read_file($ARGV[0], { binmode => ':raw' });
        my $x00 = $file =~ tr/\x00//;
        my $size = (stat($ARGV[0]))[7];
        my $dec = $x00/$size;
        my $pc = ($dec*100);
        my $percentage00 = sprintf("%.2f", $pc);
        ..... same code, but tr/\x01// thru tr/\xfe// .....
        my $xff = $file =~ tr/\xff//;
        $dec = $xff/$size;    # reuse $dec/$pc: another "my" in the same scope draws a "masks earlier declaration" warning
        $pc = ($dec*100);
        my $percentageff = sprintf("%.2f", $pc);
        # close($file) dropped: $file holds the file's contents, not a filehandle,
        # and read_file() does not leave a handle open.
        my $sum = $percentage01+$percentage02+$percentage03+$percentage04+$percentage05+$percentage06+$percentage07+$percentage08+
            $percentage09+$percentage0a+$percentage0b+$percentage0c+$percentage0d+$percentage0e+$percentage0f+$percentage10+
            $percentage11+$percentage12+$percentage13+$percentage14+$percentage15+$percentage16+$percentage17+$percentage18+
            $percentage19+$percentage1a+$percentage1b+$percentage1c+$percentage1d+$percentage1e+$percentage1f+$percentage20+
            $percentage21+$percentage22+$percentage23+$percentage24+$percentage25+$percentage26+$percentage27+$percentage28+
            $percentage29+$percentage2a+$percentage2b+$percentage2c+$percentage2d+$percentage2e+$percentage2f+$percentage30+
            $percentage31+$percentage32+$percentage33+$percentage34+$percentage35+$percentage36+$percentage37+$percentage38+
            $percentage39+$percentage3a+$percentage3b+$percentage3c+$percentage3d+$percentage3e+$percentage3f+$percentage40+
            $percentage41+$percentage42+$percentage43+$percentage44+$percentage45+$percentage46+$percentage47+$percentage48+
            $percentage49+$percentage4a+$percentage4b+$percentage4c+$percentage4d+$percentage4e+$percentage4f+$percentage50+
            $percentage51+$percentage52+$percentage53+$percentage54+$percentage55+$percentage56+$percentage57+$percentage58+
            $percentage59+$percentage5a+$percentage5b+$percentage5c+$percentage5d+$percentage5e+$percentage5f+$percentage60+
            $percentage61+$percentage62+$percentage63+$percentage64+$percentage65+$percentage66+$percentage67+$percentage68+
            $percentage69+$percentage6a+$percentage6b+$percentage6c+$percentage6d+$percentage6e+$percentage6f+$percentage70+
            $percentage71+$percentage72+$percentage73+$percentage74+$percentage75+$percentage76+$percentage77+$percentage78+
            $percentage79+$percentage7a+$percentage7b+$percentage7c+$percentage7d+$percentage7e+$percentage7f+$percentage80+
            $percentage81+$percentage82+$percentage83+$percentage84+$percentage85+$percentage86+$percentage87+$percentage88+
            $percentage89+$percentage8a+$percentage8b+$percentage8c+$percentage8d+$percentage8e+$percentage8f+$percentage90+
            $percentage91+$percentage92+$percentage93+$percentage94+$percentage95+$percentage96+$percentage97+$percentage98+
            $percentage99+$percentage9a+$percentage9b+$percentage9c+$percentage9d+$percentage9e+$percentage9f+$percentagea0+
            $percentagea1+$percentagea2+$percentagea3+$percentagea4+$percentagea5+$percentagea6+$percentagea7+$percentagea8+
            $percentagea9+$percentageaa+$percentageab+$percentageac+$percentagead+$percentageae+$percentageaf+$percentageb0+
            $percentageb1+$percentageb2+$percentageb3+$percentageb4+$percentageb5+$percentageb6+$percentageb7+$percentageb8+
            $percentageb9+$percentageba+$percentagebb+$percentagebc+$percentagebd+$percentagebe+$percentagebf+$percentagec0+
            $percentagec1+$percentagec2+$percentagec3+$percentagec4+$percentagec5+$percentagec6+$percentagec7+$percentagec8+
            $percentagec9+$percentageca+$percentagecb+$percentagecc+$percentagecd+$percentagece+$percentagecf+$percentaged0+
            $percentaged1+$percentaged2+$percentaged3+$percentaged4+$percentaged5+$percentaged6+$percentaged7+$percentaged8+
            $percentaged9+$percentageda+$percentagedb+$percentagedc+$percentagedd+$percentagede+$percentagedf+$percentagee0+
            $percentagee1+$percentagee2+$percentagee3+$percentagee4+$percentagee5+$percentagee6+$percentagee7+$percentagee8+
            $percentagee9+$percentageea+$percentageeb+$percentageec+$percentageed+$percentageee+$percentageef+$percentagef0+
            $percentagef1+$percentagef2+$percentagef3+$percentagef4+$percentagef5+$percentagef6+$percentagef7+$percentagef8+
            $percentagef9+$percentagefa+$percentagefb+$percentagefc+$percentagefd+$percentagefe;
        print "0x00 percentage: $percentage00\n";
        print "0xFF percentage: $percentageff\n";
        my $average = $sum/254;    # divided by 254 because i just wanted to average everything in between x00 and xff
        $average = sprintf("%.3f", $average);
        print "0x01 - 0xFE percentage: $average\n\n";
        I tried to use eval with tr///, but it was returning unexpected results, probably because of something I was doing wrong. I did think about how I could use a loop or something, but the only way I found was with s///g, and it took a while longer to accomplish what I was after. Though the script I made works great, yours is a lot shorter and faster.
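Picking up the eval idea, here is a minimal sketch (not benchmarked) of how the three figures printed by the script above could come from a loop instead of 256 hand-written tr/// statements. It assumes the file has already been read into a string in raw mode; @counter and byte_stats are names made up for this example. Since tr/// builds its table at compile time, one counting sub per byte value is compiled once with string eval and then reused. One difference from the original: it averages unrounded percentages, while the original rounds each one to two decimals first, so the last digit of the average can differ.

```perl
use strict;
use warnings;

# One compiled counter per byte value: $counter[0xff]->($data) runs
# $data =~ tr/\xff// (counting only; the data is not modified).
my @counter = map {
    my $hex = sprintf '%02x', $_;
    eval "sub { \$_[0] =~ tr/\\x$hex// }" or die $@;
} 0 .. 255;

# Returns (0x00 %, 0xFF %, average % of 0x01..0xFE), formatted like the original.
sub byte_stats {
    my ($data) = @_;
    my $size = length $data;              # bytes, assuming a raw-mode read
    die "empty input\n" unless $size;

    my @pct = map { $counter[$_]->($data) / $size * 100 } 0 .. 255;

    my $sum = 0;
    $sum += $_ for @pct[1 .. 254];        # 0x01 .. 0xFE, as in the original
    return (
        sprintf('%.2f', $pct[0]),         # 0x00
        sprintf('%.2f', $pct[255]),       # 0xFF
        sprintf('%.3f', $sum / 254),      # average of 0x01 .. 0xFE
    );
}

# Usage (hypothetical):
#   open my $fh, '<:raw', $ARGV[0] or die "$ARGV[0]: $!";
#   my $data = do { local $/; <$fh> };
#   my ($p00, $pff, $avg) = byte_stats($data);
```

Because the 256 percentages always total 100, the unrounded average of 0x01..0xFE is simply (100 - the 0x00 percentage - the 0xFF percentage) / 254, which gives a quick cross-check on either script.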
