PerlMonks  

Re^20: search and replace strings in different files in a directory

by PitifulProgrammer (Acolyte)
on Sep 04, 2014 at 08:47 UTC


in reply to Re^19: search and replace strings in different files in a directory
in thread search and replace strings in different files in a directory

Dear Anonymous Monk(s)

Of course, sorry I forgot to post the most recent version of the code.

    use 5.014;
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    use POSIX();
    use autodie qw/ close /;
    use File::BOM;
    use Carp::Always;
    use Data::Dump qw/ dd /;
    use Win32;
    use Win32::Unicode qw/ statW /;

    Main( @ARGV );
    exit( 0 );

    sub Main {
        #my( $infile_paths ) = @_;
        my( $infile_paths ) = 'C:\dev\test_paths.txt';;
        my @paths = GetPaths( $infile_paths );
        #print "The following paths were in the file:\n";
        #say for @paths;

        for my $path ( @paths ){
            RetrieveAndBackupXML( $path );
        }
        #return @paths;
    } ## end sub Main

    sub GetPaths {
        use File::BOM;
        ## my @paths = path( shift )->lines_utf8;
        my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
        s/\s+$// for @paths; # "chomp"
        return @paths;
    } ## end sub GetPaths

    sub RetrieveAndBackupXML {
        #~ my( $directory ) = @_; ## same as shift @_ ## same as shift
        my $directory = shift;
        my $date = POSIX::strftime( '%Y-%m-%d', localtime ); #sets current date and time to be added to backup file
        my $bak = "$date.bak"; #date added to the .bak file
        my @xml_files = path( $directory )->children( qr/\.xml$/ );
        for my $file ( @xml_files ) {
            Replace( $file, "$file-$bak" ); #sub Replace using it's 2 parameters as defined below
        }
    } ## end sub Main

    sub Replace {
        my( $in, $bak ) = shift;
        path( $in )->move( $bak ); #Creates a copy of the original file
        my $infh = path( $bak )->openr_raw;
        my $outfh = path( $in )->openrw_raw;
        while( <$infh> ) {
            #s{&}{&amp;}g; ## will match more than what you want fix it
            s{&amp;amp;}{&amp;}g;
            s{\s>\s}{&gt;}g;
            s{\s<\s}{&lt;}g;
            print $outfh $_;
        }
        close $infh;
        close $outfh;
    } ## end sub Replace

If you spot any errors in the comments, please do comment on the comment(s) in the code. The more feedback the better, since I might be using this code a lot and might also have to explain it to colleagues.

Thanks a mil for your help

Replies are listed 'Best First'.
Re^21: search and replace strings in different files in a directory
by Anonymous Monk on Sep 04, 2014 at 09:19 UTC

    Why did you write this ?   my( $in, $bak ) = shift;

    Have you read shift?
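
A minimal sketch of what that line does, for anyone following along. The sub names args_via_shift and args_via_list and the file names are made up for illustration; only the two assignment forms come from the thread. shift hands back a single element, so the second variable in the list never gets a value.

```perl
use strict;
use warnings;

# Form from the posted code: shift returns ONE element, so only $in is filled.
sub args_via_shift {
    my( $in, $bak ) = shift;
    return ( $in, $bak );
}

# List assignment from @_ fills both variables.
sub args_via_list {
    my( $in, $bak ) = @_;
    return ( $in, $bak );
}

my @got_shift = args_via_shift( 'a.xml', 'a.xml-2014-09-04.bak' );
my @got_list  = args_via_list(  'a.xml', 'a.xml-2014-09-04.bak' );

printf "shift: in=%s bak=%s\n", $got_shift[0], $got_shift[1] // '(undef)';
printf "\@_:    in=%s bak=%s\n", $got_list[0],  $got_list[1]  // '(undef)';
```

With the shift form, $bak stays undefined, so anything that later uses $bak as a file name is working with an empty value.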

Re^21: search and replace strings in different files in a directory
by Anonymous Monk on Sep 04, 2014 at 10:31 UTC

    Regarding comments ... the ## end sub ... was added by perltidy ... I wouldn't spend much time mucking with those types of comments :)


    Now for real comments, this added comment isn't needed :) the information it adds is already provided by the code, and the information actually contradicts the code

    path( $in )->move( $bak ); #Creates a copy of the original file

    No, move is not copy, copy means duplicate, move means move , from here to there, from this name to that name, path $in move to path $bak
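
A small sketch of the difference, assuming the core modules File::Copy and File::Temp as stand-ins so it runs without installing anything. The thread's code uses the move and copy methods of Path::Tiny instead; the helper make_file and the file names here are hypothetical.

```perl
use strict;
use warnings;
use File::Copy qw/ copy move /;
use File::Temp qw/ tempdir /;
use File::Spec;

# Scratch directory that is removed when the program exits.
my $dir = tempdir( CLEANUP => 1 );

# Hypothetical helper: create a tiny file and return its path.
sub make_file {
    my( $name ) = @_;
    my $file = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "<doc/>\n";
    close $fh or die "Cannot close $file: $!";
    return $file;
}

# move: the original name is gone afterwards, only the new name exists.
my $moved_src = make_file( 'moved.xml' );
move( $moved_src, "$moved_src.bak" ) or die "move failed: $!";
my $moved_src_exists = -e $moved_src       ? 1 : 0;
my $moved_bak_exists = -e "$moved_src.bak" ? 1 : 0;

# copy: both the original and the duplicate exist afterwards.
my $copied_src = make_file( 'copied.xml' );
copy( $copied_src, "$copied_src.bak" ) or die "copy failed: $!";
my $copied_src_exists = -e $copied_src       ? 1 : 0;
my $copied_bak_exists = -e "$copied_src.bak" ? 1 : 0;

print "after move: original=$moved_src_exists backup=$moved_bak_exists\n";
print "after copy: original=$copied_src_exists backup=$copied_bak_exists\n";
```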

    You could add  # rename $in to $bak if move isn't part of your vocabulary ... even  # backup $in to $bak ... but sub is invoked as  Replace( $file, "$file-$bak" ); so its not exactly new information :)

    Now you could say the sub Replace creates a copy of the original file before it edits it to ... that is a comment for the subroutine , what the subroutine is supposed to accomplish (strategy) ... subroutine comments before subroutine (at top of subroutine), not on lines of code (this is misleading)


    This part  ## will match more than what you want fix it I was probably wrong on that ... :) this is why testing exists :)


    #say for @paths;

    if you're debugging, dd()umper takes care of non-printableish chars ... so you know exactly what types of bytes you have ... some chars don't show up in the shell ... so use dd() ... perlrebackslash explains escape sequences, as does chromatic's free book Modern Perl
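
A rough illustration of why a dumper beats a plain say here. This sketch swaps in the core module Data::Dumper with $Data::Dumper::Useqq set, rather than dd() from Data::Dump, purely so it runs on a stock Perl; the sample paths are invented.

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq makes Data::Dumper print control characters as escape sequences.
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Terse  = 1;   # no "$VAR1 = " prefix
$Data::Dumper::Indent = 0;   # keep each value on one line

# Invented sample data: the trailing CR/LF and tab are invisible when printed.
my @paths = ( "C:\\dev\\xml\r\n", "C:\\dev\\other\t" );

# Each dumped string shows the hidden characters as \r, \n and \t.
my @dumped = map { Dumper( $_ ) } @paths;
print "$_\n" for @dumped;
```

A stray carriage return at the end of a path is exactly the kind of thing that makes a directory lookup fail while the printed path looks fine.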


    my $date      = POSIX::strftime( '%Y-%m-%d', localtime ); #sets current date and time to be added to backup file

    Have you tried it?

        use POSIX;
        my $date = POSIX::strftime( '%Y-%m-%d', localtime );
        print "$date\n";
        __END__
        2014-09-04

    Is 2014-09-04 a "date and time"? The variable is named $date so adding date in comment is repetition :) and there is no time in the string, even if localtime function is used

    Also to be added to backup file seems like extra stuff since on the very next line you have my $bak       = "$date.bak"; #date added to the .bak file

    Adding comments like this to your code, and saving and keeping the file is a good idea, it helps you learn/remember things you were having trouble remembering... save it maybe as myproggie-2014-09-04-02-55-54-annotated.pl ... so a week a month a year from now you can read it and remember

    But you should strive for correctness in commentary, because computers are dumb, they don't skip steps, so "date added to .bak file" isn't exactly true, it's a string, it's a suffix, for a backup filename, so date isn't added to file ... if the $variable names aren't informative enough, don't add comments, change the name

        my $dateNow      = POSIX::strftime( '%Y-%m-%d', localtime );
        my $backupSuffix = $dateNow . ".bak";
        ...
        my $backupFile = "$file-$backupSuffix";
        Replace( $file, $backupFile );

    Maybe

    $ymdToday ... ... $backupFile = "$file-$ymdToday.bak";

    Or even  $yyyymmdd ... "$file-$ymdToday.bak";

    Remember your program outline in Re^16: search and replace strings in different files in a directory? That is good for a first draft sketch :), but once you start giving good names to subs, you gotta keep going and give good variable names too ... names that are meaningful to you and your program ... good names beat good comments :) Strategy in Comments, Tactics in Code

    Did I mention , every time you make big changes to your program , you should back it up, say myproggie-2014-09-04-02-16-54.pl, myproggie-2014-09-04-03-16-54.pl, ... ? each time you start work on a new subroutine start a new file...


    Replace( $file, "$file-$bak" ); #sub Replace using it's 2 parameters as defined below

    Instead of documenting Replace() where you use it, try documenting it where it's defined, like

        ## Replace( $inputFilename, $backupFilename );
        sub Replace {
            my( $inputFilename, $backupFilename ) = @_;
            ...

        ## FixXmlEntities ( $inputFilename, $backupFilename );
        sub FixXmlEntities {
            my( $inputFilename, $backupFilename ) = @_;
            ...

        ## FixXmlEntities ( $inputFilename, $backupFilename );
        sub FixXmlEntities {
            my( $input, $backup ) = @_;
            ...

        ## FixUnencodedXmlEntities ( $inputFilename, $backupFilename );
        ## FixStrayXmlEntities ( $inputFilename, $backupFilename );
        sub FixStrayXmlEntities {
            my( $infile, $bakfile ) = @_;
            ...
        }

    What do you like? Whats memorable and correct?


    So correct comments are good, good for learning. Improving the quality of your varnames/subnames so you need fewer comments comes with practice and time ... backup your files ... as you incrementally create twenty small programs until you're comfortable with the syntax/grammar/vocabulary of the language perl... programming is a lot like carpentry, except it's ok to throw away your work and start over, bytes are cheap :)

      Dear Monks

      Thanks a mil in advance for your comments on how to comment one's code. I am sorry for my improper use of some of the technical terms that have led to some confusion. I did not know that there was so much to consider.

      I went through some of the suggested links and promise to do better in the next project(s). It was very helpful and will surely be a great help in structuring my code and how I go about coding in general.

      I changed the code a bit, i.e. substituted move with copy, and got the results as specified by colleagues. I will put the script to the test next time. There might be some issues with running the script on the server, and not everybody has Perl installed, so I guess I'll be getting the .txt file and running the script on my machine.

      I would however like to post the recent version here so that it is accessible to others. It would also be grand if I got some feedback on the new comments.

      Yes, before I forget, one of you mentioned that the following line would not quite match as intended.

          sub Replace {
              my( $in, $bak ) = @_;
              path( $in )-> copy( $bak ); #rename $in to $bak
              my $infh = path( $bak )->openr_raw;
              my $outfh = path( $in )->openrw_raw;
              while( <$infh> ) {
                  s{&}{&amp;}g; ## will match more than what you want fix it
                  s{&amp;amp;}{&amp;}g;
                  s{\s>\s}{&gt;}g;
                  s{\s<\s}{&lt;}g;
                  print $outfh $_;
              }
              close $infh;
              close $outfh;
          }

      That contributor was right, but the substitution only goes wrong if the source file has a particular structure in terms of the items to be substituted. I have not yet found out when, since all the recent substitutions proved to be OK.
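
One case where the first substitution appears to go wrong is a line that already contains entities other than &amp;amp;, such as &amp;lt; or a numeric reference. The sketch below uses an invented sample line and a negative lookahead as one possible way to escape only bare ampersands; it is an assumption about what "as intended" means, not the fix agreed on in this thread.

```perl
use strict;
use warnings;

# Invented sample line with a bare ampersand and three existing entities.
my $line = 'Fish & Chips &lt; Tom &amp; Jerry &#169;';

# The two substitutions from Replace(): every & is escaped, then only the
# doubled &amp;amp; is repaired, so &lt; and &#169; stay double-escaped.
( my $naive = $line ) =~ s{&}{&amp;}g;
$naive =~ s{&amp;amp;}{&amp;}g;

# Alternative sketch: escape & only when it does not already start a
# named, decimal or hexadecimal entity.
( my $careful = $line )
    =~ s{&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)}{&amp;}g;

print "naive:   $naive\n";
print "careful: $careful\n";
```

The naive pass turns &amp;lt; into &amp;amp;lt;, which would only show up in files that already contain such entities. That may be why the recent test files looked fine.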

      I'll let you guys know or some of you might have an idea

      Thanks a mil to all contributors for your patience and providing the bits and pieces which have created this wonderful script.

      Thank you and keep it going.

      Kind regards

      C

        Dear all

        This is my final (slightly anonymised) version of the code, which is working for me as intended.

            #!/usr/bin/perl --
            use 5.014;
            use strict;
            use warnings;
            use Path::Tiny qw/ path /;
            use POSIX();
            use autodie qw/ close /;
            use File::BOM;
            use Carp::Always;
            use Data::Dump qw/ dd /;

            Main( @ARGV );
            exit( 0 );

            sub Main {
                #my( $infile_paths ) = @_; #if run via
                my( $infile_paths ) = 'C:\dev\test_paths.txt';
                chomp $infile_paths;
                my @paths = GetPaths( $infile_paths );

                for my $path ( @paths ){
                    RetrieveAndBackupXML( $path );
                }
                return @paths;
            } ## end sub Main

            sub GetPaths {
                use File::BOM;
                ## my @paths = path( shift )->lines_utf8;
                my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
                s/\s+$// for @paths; # "chomp"
                return @paths;
            } ## end sub GetPaths

            sub RetrieveAndBackupXML {
                my( $directory ) = shift; ## same as shift @_ ##
                my $date = POSIX::strftime( '%Y-%m-%d', localtime ); #suffix for the backup-file, e.g. 2014-08-01
                my $bak = "$date.bak";
                my @xml_files = path( $directory )->children( qr/\.xml$/ );
                for my $file ( @xml_files ) {
                    Replace( $file, "$file-$bak" );
                }
            } ## end sub Main

            # Fix xml entities and create a copy of the original file before editing
            sub Replace {
                my( $in, $bak ) = @_;
                path( $in )-> copy( $bak ); #create a copy of $in with the ending(s) specified in $bak
                my $infh = path( $bak )->openr_raw;
                my $outfh = path( $in )->openrw_raw;
                while( <$infh> ) {
                    s{&}{&amp;}g; ## In some case does not match as intended
                    s{&amp;amp;}{&amp;}g;
                    s{\s>\s}{&gt;}g;
                    s{\s<\s}{&lt;}g;
                    print $outfh $_;
                }
                close $infh;
                close $outfh;
            } ## end sub Replace
