Re: Masters of Loops and Filehandles

by graff (Chancellor)
on May 10, 2012 at 03:10 UTC (node #969730)

in reply to Masters of Loops and Filehandles

Isn't this like your third thread in SoPW about the same basic task? Could you have given some more detail this time about how those previous threads didn't solve your particular problem?

How about breaking things down: (1) process all files in a directory -- this just means get a suitable list of file names and do the same thing to each file; (2) in a given file, seek to a fixed position, read 10 characters (? or 10 bytes?) in order to get a string pattern that needs to be replaced with something else; (3) do a pattern replacement globally in the given file, and save the modified version of the file. Is that what you're trying to do?

There are a couple things you haven't described yet, which might be relevant:

  • How big are the files?
  • How do you determine what the replacement pattern should be?
The steps above could actually be separate operations (the first one doesn't even need to be a perl script). Suppose you were to write a short little script that just does step 2: it takes a list of file names as input, and for each file in the list, it outputs a single line of text, containing the file name plus the 10 characters (bytes?) that are found at your mystical fixed position in the file.
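A minimal sketch of that step-2 script follows. It assumes tab-separated output, a byte offset of 50 (a placeholder, since the real offset was never given), and that the 10 bytes never contain a tab or newline. The script name seek_read.pl and the helper read_fixed are hypothetical. binmode makes read count bytes rather than characters, so for plain ASCII serial numbers the "characters or bytes?" question goes away.

```perl
#!/usr/bin/perl
# seek_read.pl (hypothetical name): step 2 only.
# Reads file names from STDIN, one per line, and prints
# "filename<TAB>10_bytes" for each file (tab separator is an assumption).
use strict;
use warnings;

my $FIXED_OFFSET = 50;    # placeholder: put the real byte offset here
my $FIELD_LEN    = 10;    # number of bytes to read at that offset

# Return the $len bytes found at byte $offset in $file,
# or undef if the file can't be opened or is too short.
sub read_fixed {
    my ( $file, $offset, $len ) = @_;
    open( my $fh, '<', $file ) or do { warn "skipped $file: $!\n"; return };
    binmode $fh;                      # count bytes, not characters
    seek( $fh, $offset, 0 ) or return;
    my $got = read( $fh, my $buf, $len );
    close $fh;
    return ( defined $got && $got == $len ) ? $buf : undef;
}

sub main {
    while ( my $file = <STDIN> ) {
        chomp $file;
        next unless -f $file;         # skip directories, missing names, etc.
        my $str = read_fixed( $file, $FIXED_OFFSET, $FIELD_LEN );
        print "$file\t$str\n" if defined $str;
    }
}

main() unless caller;
```

For example, `ls | perl seek_read.pl` prints one line per file that is long enough to contain the field. Shorter files are skipped silently, so it is worth comparing the line count against the number of files.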

Once you confirm that this script does the right thing, write another little script that takes as input a list of lines containing "filename 10_char_string". If there's something special about setting a replacement string for the "10_char_string", this script simply appends that replacement to the line and prints it out.
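A sketch of the second script follows, assuming tab-separated "filename, string" lines as input. The replacement rule here ("foobar" plus a 4-digit counter, the same placeholder used later in this thread) is only a stand-in for however the real serial numbers should be built. make_replacement and set_replacement.pl are hypothetical names.

```perl
#!/usr/bin/perl
# set_replacement.pl (hypothetical name): reads "filename<TAB>old"
# lines and prints "filename<TAB>old<TAB>new".
use strict;
use warnings;

# Placeholder rule (assumption): "foobar" plus a zero-padded 4-digit
# counter, which is always 10 characters for counters up to 9999.
sub make_replacement {
    my ($n) = @_;
    return sprintf( "foobar%04d", $n );
}

sub main {
    my $counter = 0;
    while ( my $line = <STDIN> ) {
        chomp $line;
        my ( $file, $old ) = split /\t/, $line, 2;
        next unless defined $old and length $old;    # ignore malformed lines
        print join( "\t", $file, $old, make_replacement( ++$counter ) ), "\n";
    }
}

main() unless caller;
```

Keeping this rule in its own small step means it can be changed without touching the code that reads or edits files.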

Once you confirm that the second script does the right thing, the third script is very simple: read the output of the second script, and for each line, open the file whose name is at the start of the line, slurp it into a single scalar variable, do a global regex substitution using the 2nd and 3rd tokens on the line, and write the resulting string to a new file.
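A sketch of the third script follows, assuming tab-separated "filename, old, new" lines as input. \Q...\E makes the old string match literally, so a "." or "+" inside a serial number is not treated as a regex metacharacter. The empty-string check matters because an empty pattern in s/// reuses the last successful match instead of matching nothing. The script writes "filename.edited" rather than touching the original. edit_file and edit_files.pl are hypothetical names.

```perl
#!/usr/bin/perl
# edit_files.pl (hypothetical name): reads "filename<TAB>old<TAB>new"
# lines and writes an edited copy of each file as "filename.edited".
use strict;
use warnings;

# Slurp $file, replace every literal occurrence of $old with $new,
# and write the result to $outfile. Returns the number of replacements.
sub edit_file {
    my ( $file, $old, $new, $outfile ) = @_;
    die "empty search string for $file\n" unless length $old;
    my $data = do {
        local $/;                                 # slurp mode: read the whole file
        open( my $in, '<', $file ) or die "open $file: $!\n";
        binmode $in;
        <$in>;
    };
    my $count = ( $data =~ s/\Q$old\E/$new/g ) || 0;
    open( my $out, '>', $outfile ) or die "open $outfile: $!\n";
    binmode $out;
    print {$out} $data;
    close $out or die "close $outfile: $!\n";
    return $count;
}

sub main {
    while ( my $line = <STDIN> ) {
        chomp $line;
        my ( $file, $old, $new ) = split /\t/, $line, 3;
        next unless defined $new;                 # ignore malformed lines
        my $n = edit_file( $file, $old, $new, "$file.edited" );
        print STDERR "$file: $n replacement(s)\n";
    }
}

main() unless caller;
```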

Once you confirm that this last script does the right thing, you're done. Run something that prints a list of file names, one per line (e.g. "ls"), pipe its output to the "seek" script, pipe that one's output to the "set replacement string" script, and pipe that one's output to the "edit file data" script.
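If the first stage has to be Perl as well, a minimal stand-in for "ls" could look like the sketch below. list_files.pl and list_files are hypothetical names. The script prints only regular files, which skips "." and ".." and any subdirectories, and prepends the directory so the next stage can open the names from anywhere. Its output can feed the "seek" script exactly as ls would.

```perl
#!/usr/bin/perl
# list_files.pl (hypothetical name): print the regular files in a
# directory (default: the current one), one path per line.
use strict;
use warnings;

sub list_files {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "opendir $dir: $!\n";
    # readdir returns bare names, so test "$dir/$_" rather than $_
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } sort @files;
}

sub main {
    my $dir = @ARGV ? $ARGV[0] : '.';
    print "$_\n" for list_files($dir);
}

main() unless caller;
```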

Each of those scripts is very short and simple. If you have trouble with any one of them, POST THE CODE THAT YOU TRIED for that step, together with a small sample of data that demonstrates the problem, and give us some idea about how the actual result differs from the intended result.

Re^2: Masters of Loops and Filehandles
by pbyfire (Novice) on May 10, 2012 at 14:21 UTC

    Graff - thanks for your reply. The files are small: around 112k to 236k. The replacement pattern can be whatever I want it to be, but only 10 characters in length (no metacharacters). A Perl one-liner from the command line can do this easily, but the string to replace is unknown ahead of time, and a one-liner is not acceptable: it needs to be in a Perl script, not shell. I could easily do this with a small bash script using a for loop and awk, sed and grep, but Perl is the edict. My code so far works as far as finding the replacement string which is different in each file because it is a serial number, and I can easily change it in that one spot using print (FH $replacementvalue), but I cannot figure out how to do a global substitution, even after seeking back to the beginning of the file. Code below:

        opendir(TKS, $tktdir) || die "Oops ... $!";
        my @files = readdir TKS;
        closedir (TKS);
        chdir "$tktdir";
        my $cntr = "000";
        foreach my $file (@files) {
            unless ( ($file eq ".") || ($file eq "..") ) {
                my $sncount = "$tapeDev$cntr";
                open(FH,"+<$file") or warn "Oops - Cant open ticket $!";
                binmode FH;
                while (<FH>) {
                    #if (/SCSI:INQ:80/)
                    if (/INQ:B1/) {
                        $offset = tell (FH);
                        print (FH "$sncount");
                        $cntr++;
                    }
                }
            }
        }
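The question in this reply is how to do a global substitution on a file opened with "+<". Rather than printing into the handle while the while (<FH>) loop is still reading from it, one common approach is to slurp the whole file, substitute in memory, seek back to the start, write everything out, and truncate at the new end. A minimal sketch, assuming the old 10-byte serial is already known. replace_in_place is a hypothetical name.

```perl
use strict;
use warnings;

# Replace every literal occurrence of $old with $new inside $file,
# rewriting the file in place through one read/write handle.
# Returns the number of replacements (0 leaves the file untouched).
sub replace_in_place {
    my ( $file, $old, $new ) = @_;
    die "empty search string for $file\n" unless length $old;
    open( my $fh, '+<', $file ) or die "open $file: $!\n";
    binmode $fh;
    my $data  = do { local $/; <$fh> };           # slurp the whole file
    my $count = ( $data =~ s/\Q$old\E/$new/g ) || 0;
    if ($count) {
        seek( $fh, 0, 0 ) or die "seek $file: $!\n";           # back to the start
        print {$fh} $data or die "write $file: $!\n";
        truncate( $fh, tell($fh) ) or die "truncate $file: $!\n";  # drop any leftover tail
    }
    close $fh or die "close $file: $!\n";
    return $count;
}
```

In the posted loop this would be called once per file, with the serial found near the INQ:B1 marker as $old and $sncount as $new. When old and new are both 10 bytes the file length never changes. The truncate only matters if a shorter replacement is ever used.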
      Well, this is a little bit of progress... (but not much). As far as having a single perl script to do all this instead of a complicated shell command with multiple scripts: treat the various steps as a series of blocks or subroutines so that you put the whole sequence into a single perl script, and nothing else needs to change from the plan that I outlined.

      There's still a problem about setting the replacement string for the file updates. Why are you not able to explain this clearly? You start by saying "The replacement pattern can be whatever I want it to be", then you say your "code so far works as far as finding the replacement string which is different in each file because it is a serial number", but the code you posted doesn't really show anything of that sort.

      I see you have a variable called "$tapeDev" (which is not declared or given a value in the posted snippet), you are appending to that a counter number that (probably) increments with each file. Is this string supposed to end up being 10 bytes long, and is it supposed to replace the 10 bytes you read from the mysterious (as yet unspecified) fixed offset in each file?
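On the 10-byte question: the snippet builds the serial as "$tapeDev$cntr" with my $cntr = "000" and $cntr++. Perl's string increment keeps the zero padding ("000" becomes "001"), but "999" becomes "1000", so the total length can drift. A minimal sketch of one way to build the value and enforce the length follows, assuming a text prefix that stands in for $tapeDev (whose real value isn't shown) and a numeric counter. make_serial is a hypothetical helper.

```perl
use strict;
use warnings;

# Build "prefix + zero-padded counter" and die unless the result is
# exactly $width bytes (default 10).
sub make_serial {
    my ( $prefix, $n, $width ) = @_;
    $width //= 10;                                   # target length in bytes
    my $digits = $width - length($prefix);
    die "prefix '$prefix' too long\n" if $digits < 1;
    my $serial = sprintf( "%s%0*d", $prefix, $digits, $n );
    die "counter $n does not fit in $digits digits\n"
        if length($serial) != $width;
    return $serial;
}
```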

      I'll assume "yes" and "yes". Considering the code you just posted, I gather that you didn't really understand what I was saying above, so here's what I described as separate scripts, but implemented as steps in a single script (not tested):

          use strict;
          use warnings;

          # get the list of files to work on:
          my $tkdir = ".";    # put a real path here
          chdir $tkdir or die "chdir $tkdir: $!\n";    # this makes things easier below
          opendir( TKS, "." ) or die "opendir: $!\n";
          my @files = grep { -f } readdir TKS;    # only keep the things you want here
          closedir TKS;

          # read 10 bytes at fixed offset in each file, set replacement values:
          my %edit_list;
          my $counter = 0;
          my $fixed_offset = 50;    # put your real byte offset value here
          for my $file ( @files ) {
              open( FH, '<', $file ) or do { warn " skipped $file: $!\n"; next; };
              binmode FH;
              seek FH, $fixed_offset, 0;
              my $got = read( FH, $_, 10 );
              close FH;
              # a short read would leave an empty pattern, and s/// would then reuse the last match
              unless ( defined $got and $got == 10 ) { warn " skipped $file: too short\n"; next; }
              my $replace = sprintf( "foobar%04d", ++$counter );
              $edit_list{$file} = [ $_, $replace ];
          }

          # now, go through the files and edit each one
          for my $file ( keys %edit_list ) {
              local $/;    # sets input record separator to undef for slurp mode
              open( FH, '<', $file ) or die "$file: $!\n";
              binmode FH;
              $_ = <FH>;
              close FH;
              s/\Q$edit_list{$file}[0]/$edit_list{$file}[1]/g;
              open( FN, '>', "$file.edited.$$" )    # write to a different name, just to be safe
                  or die $!;
              binmode FN;
              print FN;
              close FN;
          }
      (Updated to include the chdir() step at the top, followed by opendir( TKS, "." ). Because readdir returns bare file names, if your previous attempts read a directory other than "." without first doing a chdir into it, -f and open would have looked for those names in the wrong directory, and that would have been a big part of your problem. I see you did cover that step in your snippet.)

      If that's the sort of thing you're trying to do, it shouldn't be a problem to use the different output file names, at least until you're confident that it really is doing the right thing. Then you can change the last open statement so that it uses each original file name (replacing the contents of the original files).
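When it is time to overwrite the originals, note that opening a file with '>' truncates it immediately, so a script that dies part-way can leave a ticket empty or half-written. A minimal sketch of a safer variant writes to a temporary name in the same directory and then renames it over the original. replace_file_contents and the "$file.tmp.$$" naming are hypothetical. On Unix-like systems, rename() within one filesystem replaces the target in a single step.

```perl
use strict;
use warnings;

# Write $data to a temporary file next to $file, copy over the original
# permission bits, then rename the temporary file onto $file.
sub replace_file_contents {
    my ( $file, $data ) = @_;
    my $tmp = "$file.tmp.$$";            # hypothetical temp-name scheme, same directory
    my @st  = stat $file;                # original permissions, if the file exists
    open( my $out, '>', $tmp ) or die "open $tmp: $!\n";
    binmode $out;
    print {$out} $data or die "write $tmp: $!\n";
    close $out         or die "close $tmp: $!\n";
    chmod( $st[2] & 07777, $tmp ) if @st;
    rename( $tmp, $file ) or die "rename $tmp -> $file: $!\n";
    return 1;
}
```

In the script above, this would replace the final open/print/close on "$file.edited.$$", called as replace_file_contents( $file, $_ ) after the substitution.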
