PerlMonks  

Byte repetition check

by james28909 (Deacon)
on Dec 11, 2014 at 03:46 UTC

james28909 has asked for the wisdom of the Perl Monks concerning the following question:

I need a little help with this script. I wrote it to count repeating bytes in small files (200 KB or less), which it does, but I don't think it's counting them correctly.
use strict;
use warnings;

my $count = 0;
my @array;

while ( read( DATA, my $byte, 1 ) ) {
    if ( $byte ~~ @array ) {
        $count++;
        print @array;
        undef @array;
    }
    else {
        undef @array;
        push @array, $byte;
        print @array;
    }
}
print $count;

__DATA__
1112223333
In this code, it should match "1" 2 times, "2" 2 times, and "3" 3 times, which would be a total of 7 repeating bytes. The script only counts 5 repeating bytes.

What am I missing? I know it's probably something simple. Any help would be appreciated :)

Re: Byte repetition check
by BrowserUk (Patriarch) on Dec 11, 2014 at 04:02 UTC
    In this code, it should match "1" 2 times, "2" 2 times, and "3" 3 times, which would be a total of 7 repeating bytes. The script only counts 5 repeating bytes. What am I missing? I know its something simple probably. Any help would be appreciated :)

    You are smart matching a byte against an array.

    1. The array is initially empty, so you take the else branch and (re)empty it before pushing the character into it, and then print its content.
    2. On the second pass: the array will contain one character; so (if the new byte matches it) you increment your count; print the array -- which will still only have one character in it; and then empty it.
    3. The array is now empty (again), so goto step 1.

    So, step through the ten iterations of your loop on paper, recording the changes to @array and $count, and it will be very clear to you where you are going wrong.
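    If you would rather let Perl do the stepping, here is a minimal sketch that mirrors the loop from the question and logs @array and $count after each byte. It assumes the data is the string 1112223333 with no stray newline, uses grep instead of the smartmatch (equivalent for this data), and the sub name trace_loop is made up for illustration. The log makes the problem visible: a match empties @array, so in a run like 111 the second byte is counted but the third is not.

```perl
use strict;
use warnings;

# Hypothetical helper: re-creates the loop from the question, with grep in
# place of the smartmatch (~~), and records @array and $count after each byte.
sub trace_loop {
    my ($input) = @_;
    my $count = 0;
    my @array;
    my @log;
    for my $byte ( split //, $input ) {
        if ( grep { $_ eq $byte } @array ) {
            $count++;       # byte matched the one saved on the previous pass
            @array = ();    # ...but the saved byte is thrown away here
        }
        else {
            @array = ($byte);    # remember only this byte
        }
        push @log, sprintf "byte=%s array=(%s) count=%d",
            $byte, join( ',', @array ), $count;
    }
    return ( $count, @log );
}

my ( $count, @log ) = trace_loop('1112223333');
print "$_\n" for @log;
```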

    (You also appear to have a newline character as the first line of your file, which probably isn't meant to be there?)


      Nope, the first newline isn't supposed to be there. Thanks for pointing that out.
Re: Byte repetition check
by ikegami (Patriarch) on Dec 11, 2014 at 03:56 UTC
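    my $count = 0;    # start at 0 so printing it works even if nothing repeats
    my $last  = '';   # the previous byte ('' before the first read)
    while (read(DATA, my $byte, 1)) {
        ++$count if $byte eq $last;   # same as the byte before it: one repeat
        $last = $byte;
    }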
Re: Byte repetition check
by Anonymous Monk on Dec 11, 2014 at 05:05 UTC
    Just don't use smartmatch... And if files are small, I don't see any reason to read them byte by byte.
    $ perl -0777 -nE '
        my $count = 0;
        $count += length $2 while /(.)(\1+)/g;
        say $count;
    ' <<< "1112223333"
    7
    $ perldoc perlop | perl -0777 -nE '
        my $count = 0;
        $count += length $2 while /(.)(\1+)/g;
        say $count;
    '
    22788
    (just don't decode your strings, and you'll have bytes... pretty much)
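    Following the "don't decode" advice, here is a minimal sketch of the same regex count applied to a file opened with the :raw layer, so Perl sees undecoded bytes. Two assumptions: count_repeats and slurp_raw are hypothetical names, and the /s modifier is added so that (.) also matches "\n" and runs of newline bytes are counted (the one-liner above, without /s, skips them).

```perl
use strict;
use warnings;

# Hypothetical helper: count every byte that repeats the byte just before it.
# /s lets (.) match "\n" too, so runs of newlines are counted as well.
sub count_repeats {
    my ($bytes) = @_;
    my $count = 0;
    $count += length $2 while $bytes =~ /(.)(\1+)/gs;
    return $count;
}

# Hypothetical helper: slurp a whole file as raw, undecoded bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # undef record separator: read the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return defined $bytes ? $bytes : '';
}

print count_repeats( slurp_raw( $ARGV[0] ) ), "\n" if @ARGV;
```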
Re: Byte repetition check
by thezip (Vicar) on Dec 11, 2014 at 21:17 UTC

    Yet another WTDI:

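    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $data = '11122222344456788899';
    my @data = split(//, $data);

    my $accum  = {};    # byte => number of times it repeats the byte before it
    my @buffer = ();

    push(@buffer, shift(@data));
    while (@data) {
        push(@buffer, shift(@data));     # @buffer now holds (previous, current)
        if ($buffer[0] eq $buffer[1]) {  # eq rather than ==, so non-digit bytes work too
            $accum->{$buffer[0]}++;
        }
        shift(@buffer);                  # drop the previous byte
    }
    print Dumper($accum);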

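    The per-byte breakdown asked for in the original question (1 twice, 2 twice, 3 three times) can also be built with the regex from the Anonymous Monk reply instead of a two-element buffer. A minimal sketch, with the hypothetical sub name repeats_by_byte: for each run of identical bytes it credits that byte with the run length minus one, which for thezip's test string should give the same counts as the buffer version.

```perl
use strict;
use warnings;

# Hypothetical helper: for each run of identical bytes, add (run length - 1)
# to that byte's tally. Separate runs of the same byte are summed.
sub repeats_by_byte {
    my ($bytes) = @_;
    my %tally;
    while ( $bytes =~ /(.)(\1+)/gs ) {
        $tally{$1} += length $2;
    }
    return \%tally;
}

my $tally = repeats_by_byte('11122222344456788899');
printf "%s => %d\n", $_, $tally->{$_} for sort keys %$tally;
```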
Re: Byte repetition check
by james28909 (Deacon) on Dec 11, 2014 at 07:42 UTC
    It seems this approach won't work out in the end. The problem is that when I move from reading 1 byte to reading 2 or more bytes at a time, what you'd call alignment becomes an issue. Reading 2 bytes at a time from 011110, $byte becomes 01, then 11, then 10. Common sense tells you there are repeating byte pairs in there ('11' and '11'), but the 2-byte reads don't catch them, and the same goes for larger reads.

    Sooo... if I want to read 2 bytes at a time, will I have to read from the beginning of the file, count the repetitions, then seek forward 1 byte and do it again? Would that take care of this "alignment" issue?
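    Rather than seeking and re-reading, the "step forward one byte" idea can be done in memory: take the 2-byte (or n-byte) window at every offset with substr, so a sequence is found no matter how it lines up with a fixed-size read. A minimal sketch, assuming "repeating byte sets" means sequences that occur more than once anywhere in the data; count_windows is a hypothetical name, and overlapping occurrences are counted, so in 011110 the sequence 11 is seen three times (at offsets 1, 2 and 3).

```perl
use strict;
use warnings;

# Hypothetical helper: count every $width-byte sequence at every offset
# (step = 1 byte). Overlapping occurrences are all counted.
sub count_windows {
    my ( $bytes, $width ) = @_;
    my %seen;
    for my $offset ( 0 .. length($bytes) - $width ) {
        $seen{ substr( $bytes, $offset, $width ) }++;
    }
    return \%seen;
}

my $seen = count_windows( '011110', 2 );
for my $seq ( sort keys %$seen ) {
    print "$seq occurs $seen->{$seq} times\n" if $seen->{$seq} > 1;
}
```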

      Why not read the file (or blocks of it, if it's too large to conveniently fit into memory) into a buffer and then process that buffer?

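      This would add a couple of lines of code, but it would replace one read() call per byte (read() is buffered, so these aren't all system calls, but each call still has overhead) with a handful of calls, and should improve performance considerably.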

      Looping through a buffer could then be done using substr() or by split()ting the buffer into an @array which you can then foreach() through.

      -- FloydATC

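    A minimal sketch of the buffered approach for files too large to slurp, combined with the one-byte sliding window: read fixed-size blocks, scan each with substr, and carry the last n - 1 bytes into the next block so windows that straddle a block boundary are neither lost nor counted twice. The carry-over, the 64 KB block size and the name count_windows_in_blocks are assumptions for illustration, not part of the reply.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in $block_size chunks and count every
# $width-byte window at every offset, carrying the last $width - 1 bytes
# into the next chunk so boundary-straddling windows are still seen.
sub count_windows_in_blocks {
    my ( $fh, $width, $block_size ) = @_;
    my %seen;
    my $carry = '';
    my $keep  = $width - 1;
    while ( read( $fh, my $block, $block_size ) ) {
        my $buffer = $carry . $block;
        for my $offset ( 0 .. length($buffer) - $width ) {
            $seen{ substr( $buffer, $offset, $width ) }++;
        }
        # Only the tail shorter than one window can still start a new window.
        $carry = length($buffer) > $keep
            ? substr( $buffer, length($buffer) - $keep )
            : $buffer;
    }
    return \%seen;
}

# Example: 2-byte windows in a file named on the command line, shown in hex.
if (@ARGV) {
    open my $in, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $seen = count_windows_in_blocks( $in, 2, 64 * 1024 );
    for my $seq ( sort keys %$seen ) {
        next unless $seen->{$seq} > 1;
        printf "%s occurs %d times\n", unpack( 'H*', $seq ), $seen->{$seq};
    }
}
```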

Node Type: perlquestion [id://1110005]
Approved by BrowserUk