
Masters of Loops and Filehandles

by pbyfire (Novice)
on May 09, 2012 at 22:36 UTC ( #969713=perlquestion )
pbyfire has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I am at my wits' end trying to accomplish something that I am sure is simple to many of you. Although I have a million lines of code attempts at this patched together from examples all over the internet, I will not burden you with them.

Here is what I need to do:

Process all files in a single directory by seeking to a fixed position in each file and reading a string which will need to be replaced globally in each of the files read.

The problem I am having is that I only know that the string to be replaced is 10 characters long, and it is never the same from one file to the next. seek and read (or sysread) will discover what it is, but I have not been able to save it to a variable for use in a sed-like statement such as 's/$stringfound/$stringreplacement/g'. The global replacement does not work on an open filehandle within the loop.

Yes, I know that a global substitution is easy as a one-liner from the command line with redirects, but I need to do this within a loop to discover the string to replace and perform a global replace on multiple files.

All suggestions, examples etc are greatly appreciated.

Bless the monks for their patience with beginners - pbyfire

Replies are listed 'Best First'.
Re: Masters of Loops and Filehandles
by roboticus (Chancellor) on May 09, 2012 at 23:49 UTC


    Here's a quickie example that should get you on your way:

    #!/usr/bin/perl
    my $Gibberish=<<EOGibberish;
    This is a sentence. It's not a particularly great sentence, but it's a sentence nonetheless. It would suck if someone accidentally changed it!
    EOGibberish

    use strict;
    use warnings;
    use autodie;

    my $search;
    open my $FH, '<', $0;
    binmode $FH;
    seek $FH, 55, 0;
    read $FH, $search, 8;
    close $FH;

    $Gibberish =~ s/$search/XXXXXXXX/g;
    print $Gibberish;

    When I run it here, I get:

    marco@Boink:~ $ perl
    This is a XXXXXXXX. It's not a particularly great XXXXXXXX, but it's a XXXXXXXX nonetheless. It would suck if someone accidentally changed it!


    When your only tool is a hammer, all problems look like your thumb.

      roboticus - Thanks for your reply and the code. I will see if I can work this into my loop, since I need to process several files this way. I have already succeeded in replacing the pattern at a given location within a loop, but replacing it globally doesn't seem to work, even using variations of the $Gibberish sed line included in your example. I did attach an example of my existing code in this thread, below, if you care to review it. Thanks Again - pbyfire


        You can slurp the entire file into a variable by localizing the $/ variable:

        { local $/; $Gibberish = <$FH>; }

        If you do that after the open, and before the binmode & seek, you should be able to do a global search & replace on the entire file. Then just put it in a loop...
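To make the slurp-then-substitute idea concrete, here is a minimal sketch of it inside a directory loop. It is only an illustration: the sample file names, the offset of 4, and the `NEWSER%04d` replacement are made up for the demo, and the edited text is printed rather than written back. The search string is taken from the slurped data with `substr`, which gives the same bytes a `seek` plus `read` would.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read a whole file into one scalar by localizing $/ (slurp mode).
sub slurp {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    local $/;                        # undef record separator: read everything
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Take the $len bytes at $offset as the search string and replace every
# occurrence of it in $data. \Q...\E stops those bytes from being
# interpreted as regex metacharacters.
sub replace_at_offset {
    my ($data, $offset, $len, $new) = @_;
    die "data too short\n" if length($data) < $offset + $len;
    my $old = substr $data, $offset, $len;
    $data =~ s/\Q$old\E/$new/g;
    return ($old, $data);
}

# Demo on throwaway files. Names, contents and the offset (4) are invented.
my $dir = tempdir(CLEANUP => 1);
my %sample = (
    'a.tkt' => "HDR:SERIAL0001 body SERIAL0001 end\n",
    'b.tkt' => "HDR:SERIAL0002 body SERIAL0002 end\n",
);
for my $name (sort keys %sample) {
    open my $out, '>:raw', "$dir/$name" or die "open $dir/$name: $!";
    print {$out} $sample{$name};
    close $out;
}

opendir my $dh, $dir or die "opendir $dir: $!";
my @files = sort grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

my $counter = 0;
for my $name (@files) {
    my $new = sprintf 'NEWSER%04d', ++$counter;    # hypothetical scheme
    my ($old, $edited) = replace_at_offset(slurp("$dir/$name"), 4, 10, $new);
    print "$name: $old -> $new\n$edited";
}
```

Because the substitution runs on a string rather than on the open filehandle, the global replace behaves like any other `s///g`.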



Re: Masters of Loops and Filehandles
by graff (Chancellor) on May 10, 2012 at 03:10 UTC
    Isn't this like your third thread in SoPW about the same basic task? Could you have given some more detail this time about how those previous threads didn't solve your particular problem?

    How about breaking things down: (1) process all files in a directory -- this just means get a suitable list of file names and do the same thing to each file; (2) in a given file, seek to a fixed position, read 10 characters (? or 10 bytes?) in order to get a string pattern that needs to be replaced with something else; (3) do a pattern replacement globally in the given file, and save the modified version of the file. Is that what you're trying to do?

    There are a couple things you haven't described yet, which might be relevant:

    • How big are the files?
    • How do you determine what the replacement pattern should be?
    The steps above could actually be separate operations (the first one doesn't even need to be a perl script). Suppose you were to write a short little script that just does step 2: it takes a list of file names as input, and for each file in the list, it outputs a single line of text, containing the file name plus the 10 characters (bytes?) that are found at your mystical fixed position in the file.
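A rough sketch of what that step-2 script might look like follows. The offset of 50 is a placeholder, since the thread never states the real one, and the tab separator between file name and string is my own choice so that file names with spaces survive. `read_at` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $OFFSET = 50;    # placeholder: the real fixed offset is not given here
my $LENGTH = 10;

# Return $len bytes starting at byte $offset of $path, or undef if the
# file cannot be opened or holds fewer than $len bytes at that position.
sub read_at {
    my ($path, $offset, $len) = @_;
    open my $fh, '<:raw', $path or return;
    seek $fh, $offset, 0        or return;
    my $got = read($fh, my $buf, $len);
    close $fh;
    return unless defined $got && $got == $len;
    return $buf;
}

# One output line per file named on the command line: "filename<TAB>string".
for my $file (@ARGV) {
    my $str = read_at($file, $OFFSET, $LENGTH);
    if (defined $str) {
        print "$file\t$str\n";
    }
    else {
        warn "skipped $file: could not read $LENGTH bytes at offset $OFFSET\n";
    }
}
```

Run it as something like `perl seek10.pl *` (the script name is invented) and eyeball the output before moving on.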

    Once you confirm that this script does the right thing, write another little script that takes as input a list of lines containing "filename 10_char_string". If there's something special about setting a replacement string for the "10_char_string", this script simply appends that replacement to the line and prints it out.
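The second step could be sketched like this. Everything about the replacement scheme is an assumption: the `TAPEDV` prefix stands in for whatever `$tapeDev` really holds, and the input is taken to be tab-separated "filename, old string" lines. The demo feeds it an in-memory list so it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a 10-character replacement: a 6-character prefix plus a 4-digit
# counter. The prefix used below is a made-up stand-in.
sub make_replacement {
    my ($prefix, $n) = @_;
    my $new = sprintf '%s%04d', $prefix, $n;
    die "replacement '$new' is not 10 characters\n" unless length($new) == 10;
    return $new;
}

# Copy "filename<TAB>old" lines from $in to $out, appending "<TAB>new".
# Returns the number of lines written.
sub annotate {
    my ($in, $out, $prefix) = @_;
    my $n = 0;
    while (my $line = <$in>) {
        chomp $line;
        my ($file, $old) = split /\t/, $line, 2;
        next unless defined $old;               # ignore malformed lines
        print {$out} join("\t", $file, $old, make_replacement($prefix, ++$n)), "\n";
    }
    return $n;
}

# Demo on an in-memory list; in a pipeline, pass \*STDIN and \*STDOUT instead.
my $sample = "a.tkt\tSERIAL0001\nb.tkt\tSERIAL0002\n";
open my $in, '<', \$sample or die "in-memory open: $!";
annotate($in, \*STDOUT, 'TAPEDV');
```

The length check is there because the thread says the replacement must stay at exactly 10 characters.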

    Once you confirm that the second script does the right thing, the third script is very simple: read the output of the second script, and for each line, open the file whose name is at the start of the line, slurp it into a single scalar variable, do a global regex substitution using the 2nd and 3rd tokens on the line, and write the resulting string to a new file.
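For the third step, something along these lines would do. It is a sketch under assumptions: each input line is taken to be tab-separated (file name, old string, new string), the output goes to a hypothetical "$file.edited" name, and the demo builds its own throwaway file instead of reading a real list.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Slurp $file, replace every literal $old with $new, write the result to
# "$file.edited", and return the number of replacements made.
sub edit_file {
    my ($file, $old, $new) = @_;
    open my $in, '<:raw', $file or die "open $file: $!";
    my $data = do { local $/; <$in> } // '';    # slurp; '' for an empty file
    close $in;

    # s///g returns the number of substitutions (false if there were none).
    my $count = ($data =~ s/\Q$old\E/$new/g) || 0;

    open my $out, '>:raw', "$file.edited" or die "open $file.edited: $!";
    print {$out} $data;
    close $out or die "close $file.edited: $!";
    return $count;
}

# Demo: one throwaway file and one line shaped like the second step's output.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/a.tkt";
open my $fh, '>:raw', $file or die "open $file: $!";
print {$fh} "HDR:SERIAL0001 body SERIAL0001 end\n";
close $fh;

my $line = "$file\tSERIAL0001\tTAPEDV0001";
my ($name, $old, $new) = split /\t/, $line;
printf "%d replacement(s) written to %s.edited\n", edit_file($name, $old, $new), $name;
```

The `\Q...\E` matters even if the replacement has no metacharacters, because the 10 bytes read from the file might.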

    Once you confirm that this last script does the right thing, you're done. Run something that prints a list of file names, one per line (e.g. "ls"), pipe its output to the "seek" script, pipe that one's output to the "set replacement string" script, and pipe that one's output to the "edit file data" script.

    Each of those scripts is very short and simple. If you have trouble with any one of them, POST THE CODE THAT YOU TRIED for that step, together with a small sample of data that demonstrates the problem, and give us some idea about how the actual result differs from the intended result.

      Graff - thanks for your reply. The files are small, around 112k to 236k. The replacement pattern can be whatever I want it to be, but only 10 characters in length (no metacharacters). A Perl one-liner from the command line can do this easily, but the string to replace is unknown at the time, and a one-liner is not acceptable: it needs to be in a Perl script, not shell. I could easily do this with a small bash script with a for loop and awk, sed and grep, but Perl is the edict. My code so far works as far as finding the replacement string which is different in each file because it is a serial number, and I can easily change it in that one spot using print (FH $replacementvalue), but I cannot figure out how to do a global substitute, even setting seek back to the beginning of the file. Code Below:

      opendir(TKS, $tktdir) || die "Oops ... $!";
      my @files = readdir TKS;
      closedir (TKS);
      chdir "$tktdir";
      my $cntr = "000";
      foreach my $file (@files) {
          unless ( ($file eq ".") || ($file eq "..") ) {
              my $sncount = "$tapeDev$cntr";
              open(FH,"+<$file") or warn "Oops - Cant open ticket $!";
              binmode FH;
              while (<FH>) {
                  #if (/SCSI:INQ:80/)
                  if (/INQ:B1/) {
                      $offset = tell (FH);
                      print (FH "$sncount");
                      $cntr++;
                  }
              }
          }
      }

        Well, this is a little bit of progress... (but not much). As far as having a single perl script to do all this instead of a complicated shell command with multiple scripts: treat the various steps as a series of blocks or subroutines so that you put the whole sequence into a single perl script, and nothing else needs to change from the plan that I outlined.

        There's still a problem about setting the replacement string for the file updates. Why are you not able to explain this clearly? You start by saying "The replacement pattern can be whatever I want it to be", then you say your "code so far works as far as finding the replacement string which is different in each file because it is a serial number", but the code you posted doesn't really show anything of that sort.

        I see you have a variable called "$tapeDev" (which is not declared or given a value in the posted snippet), you are appending to that a counter number that (probably) increments with each file. Is this string supposed to end up being 10 bytes long, and is it supposed to replace the 10 bytes you read from the mysterious (as yet unspecified) fixed offset in each file?

        I'll assume "yes" and "yes". Considering the code you just posted, I gather that you didn't really understand what I was saying above, so here's what I described as separate scripts, but implemented as steps in a single script (not tested):

        use strict;

        # get the list of files to work on:
        my $tkdir = ".";  # put a real path here
        chdir $tkdir or die "chdir $tkdir: $!\n";  # this makes things easier below
        opendir( TKS, "." );
        my @files = grep { -f } readdir TKS ;  # only keep the things you want here
        closedir TKS;

        # read 10 bytes at fixed offset in each file, set replacement values:
        my %edit_list;
        my $counter = 0;
        my $fixed_offset = 50;  # put your real byte offset value here
        for my $file ( @files ) {
            open( FH, '<', $file ) or do { warn " skipped $file: $!\n"; next; };
            binmode FH;
            seek FH, $fixed_offset, 0;
            read FH, $_, 10;
            close FH;
            my $replace = sprintf( "foobar%04d", ++$counter );
            $edit_list{$file} = [ $_, $replace ];
        }

        # now, go through the files and edit each one
        for my $file ( keys %edit_list ) {
            local $/;  # sets input record separator to undef for slurp mode
            open( FH, '<', $file );
            $_ = <FH>;
            close FH;
            s/\Q$edit_list{$file}[0]/$edit_list{$file}[1]/g;
            open( FN, '>', "$file.edited.$$" )  # write to a different name, just to be safe
                or die $!;
            print FN;
            close FN;
        }

        (Updated to include the "chdir()" step at the top, followed by opendir( TKS, "." ); -- if your previous attempts had anything other than "." as the value for $tkdir, that would have been a big part of your problem -- I see you covered that step in your snippet.)

        If that's the sort of thing you're trying to do, it shouldn't be a problem to use the different output file names, at least until you're confident that it really is doing the right thing. Then you can change the last open statement so that it uses each original file name (replacing the contents of the original files).
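When the time comes to overwrite the originals, one cautious variation (not what the script above does, just a sketch of an alternative) is to keep writing to the temporary name and rename it over the original only after the write has closed cleanly. `replace_in_place` is a hypothetical helper name, and the demo file is a throwaway.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Replace every literal $old with $new in $file. The new contents go to a
# temporary name first and are renamed over the original only after a
# successful close. Returns the number of replacements made.
sub replace_in_place {
    my ($file, $old, $new) = @_;
    open my $in, '<:raw', $file or die "open $file: $!";
    my $data = do { local $/; <$in> } // '';
    close $in;

    my $count = ($data =~ s/\Q$old\E/$new/g) || 0;
    return 0 unless $count;                  # nothing to change: leave file alone

    my $tmp = "$file.edited.$$";             # same temporary-name pattern as above
    open my $out, '>:raw', $tmp or die "open $tmp: $!";
    print {$out} $data;
    close $out or die "close $tmp: $!";
    rename $tmp, $file or die "rename $tmp -> $file: $!";
    return $count;
}

# Demo on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/a.tkt";
open my $fh, '>:raw', $file or die "open $file: $!";
print {$fh} "HDR:SERIAL0001 body SERIAL0001 end\n";
close $fh;
printf "%d replacement(s) in %s\n",
    replace_in_place($file, 'SERIAL0001', 'foobar0001'), $file;
```

If the write fails partway, the original file is still intact, which is the point of the detour through a temporary name.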

Re: Masters of Loops and Filehandles
by ww (Archbishop) on May 10, 2012 at 02:18 UTC
    It sounds as though you might employ your time better by reading a standard Perl primer -- Learning Perl from O'Reilly would be a standard example (and a darn good one) -- than by fighting with "a million lines of code attempts at this patched together from examples all over the internet...."

    You may also want to visit our Tutorials for a selection of complimentary information.

      ww - Of course you are correct. Although I have taken Perl courses and have most of the O'Reilly books, the occasion to actually write Perl scripts never presented itself in my job until now. Scripting / programming is like learning any foreign language: after reading several books about learning Spanish, I still have not gotten beyond "Cerveza, por favor", because I am not in an environment where Spanish is actually spoken and therefore I don't retain it. If bash were acceptable, I could have managed this in a few minutes using a for loop in conjunction with awk, sed and grep, but alas, I must learn Perl. - Thanks for your reply. - pbyfire = Perl By Fire
