I think TedPride has the right idea. Setting $INPUT_RECORD_SEPARATOR to undef (undef $/) causes the while (<IN>) statement to slurp the entire 3+ GB file at once. From the docs:
"Entirely undefining $/ makes the next line input operation slurp in the remainder of the file as one scalar value"
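The sketch below makes the difference concrete. It reads a short in-memory string instead of the real file. When $/ is undef, the read returns everything as one record. When $/ is a reference to an integer, each read returns a record of at most that many characters. read_records is a hypothetical helper written only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read an in-memory "file" with the given $/ value
# and return every record that the <> operator hands back.
sub read_records {
    my ($data, $rs) = @_;                 # $rs: value to put in $/
    open my $fh, '<', \$data or die "Can't open in-memory handle: $!";
    local $/ = $rs;                       # undef = slurp, \N = N-char records
    my @records;
    while (defined(my $rec = <$fh>)) {
        push @records, $rec;
    }
    close $fh;
    return @records;
}

my $data    = '0101' x 5;                 # 20 characters, no newlines
my @slurped = read_records($data, undef); # one record: the whole string
my @blocks  = read_records($data, \8);    # records of 8, 8 and 4 characters
print scalar(@slurped), " vs ", scalar(@blocks), "\n";   # prints "1 vs 3"
```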
Instead, set $/ to a reasonable block size, one that fits within your memory limits. Then you can match blocks of the file much the way TedPride describes. Something like this:
use strict;
use warnings;

# undef $/;                 # would slurp the whole file at once
my $blocksz = 2048;
$/ = \$blocksz;             # read fixed 2K blocks instead

open my $in,  '<',  'tmp'    or die "Can't open tmp: $!";
open my $out, '>>', $ARGV[1] or die "Can't open $ARGV[1]: $!";
binmode $out;               # the packed output is binary

# Extract the necessary data bits, two blocks at a time, so that a
# record crossing a block boundary can still be matched.
# Caveat: a record is 8+8+1520+8+8+464+1056 = 3072 characters, longer
# than one block, so a record that starts in the half printed below
# but ends past the current window is not matched.
my $block01 = <$in> // '';
while (defined(my $block02 = <$in>)) {
    my $block = $block01 . $block02;
    $block =~ s/11110100.{8}(.{1520})11110100.{8}(.{464}).{1056}/$1$2/g;

    # Convert the lower block back to the original binary format
    print $out pack("B*", substr($block, 0, $blocksz));

    # Carry the rest of the (possibly shortened) window forward; taking
    # the last $blocksz characters would repeat data whenever s///g
    # shortened $block.
    $block01 = length($block) > $blocksz ? substr($block, $blocksz) : '';
}
print $out pack("B*", $block01);   # flush the final block

close $out or die "Can't close $ARGV[1]: $!";
close $in;
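Each record matched by the regex is 3072 characters (8+8+1520+8+8+464+1056), longer than a 2K block, and s///g shortens whatever it matches, so any fixed two-block window has to be sized and carried with care. One alternative, sketched here under stated assumptions rather than offered as the method above, is to keep a small buffer of not-yet-scanned characters and only decide about a position once a full record's worth of data after it is available. The sketch assumes the input is plain text made of '0'/'1' characters and that the desired output is what s///g on the whole file followed by pack("B*") would give. The names convert, $SYNC, $REC_LEN and $REC_RE are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Record layout assumed from the regex in the post:
# sync(8) + 8 + 1520 kept + sync(8) + 8 + 464 kept + 1056 dropped = 3072 chars.
my $SYNC    = '11110100';
my $REC_LEN = 3072;
my $REC_RE  = qr/\A$SYNC.{8}(.{1520})$SYNC.{8}(.{464}).{1056}/;

# Hypothetical helper: stream '0'/'1' text from $in in $chunk-sized reads,
# replace every record with its two kept fields (same result as s///g on
# the whole file), and print the packed bytes to $out.
sub convert {
    my ($in, $out, $chunk) = @_;
    local $/ = \$chunk;                   # fixed-size reads, never a slurp
    my ($buf, $bits, $eof) = ('', '', 0);

    until ($eof) {
        my $data = <$in>;
        if (defined $data) { $buf .= $data } else { $eof = 1 }

        my $p = 0;                        # scan position in $buf
        while (1) {
            my $left = length($buf) - $p;
            # Before EOF, a record starting here could still be incomplete.
            last if $left == 0 || (!$eof && $left < $REC_LEN);

            if (substr($buf, $p, $REC_LEN) =~ $REC_RE) {
                $bits .= $1 . $2;         # keep only the payload fields
                $p    += $REC_LEN;
                next;
            }
            # No record at $p: copy characters up to the next sync candidate,
            # but before EOF never past the last position whose full record
            # is already in $buf.
            my $stop = index($buf, $SYNC, $p + 1);
            $stop = length $buf if $stop < 0;
            if (!$eof) {
                my $limit = length($buf) - $REC_LEN + 1;
                $stop = $limit if $stop > $limit;
            }
            $bits .= substr($buf, $p, $stop - $p);
            $p = $stop;
        }
        substr($buf, 0, $p, '');          # drop the consumed characters

        # Pack only whole bytes so no zero padding lands mid-stream.
        my $whole = length($bits) - length($bits) % 8;
        print {$out} pack('B*', substr($bits, 0, $whole, '')) if $whole;
    }
    print {$out} pack('B*', $bits) if length $bits;   # last byte, zero-padded
}

# Usage (hypothetical file names): perl convert.pl input.txt output.bin
if (@ARGV == 2) {
    open my $in,  '<',      $ARGV[0] or die "Can't open $ARGV[0]: $!";
    open my $out, '>>:raw', $ARGV[1] or die "Can't open $ARGV[1]: $!";
    convert($in, $out, 2048);
    close $out or die "Can't close $ARGV[1]: $!";
}
```

Memory use stays bounded: the buffer never holds much more than one read plus one record, and at most seven unpacked bits are kept between reads.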
Update: Corrected a couple of lines in the code.
PJ
use strict; use warnings; use diagnostics;