PerlMonks  

Re^7: comparing contents of two arrays and output differences

by roboticus (Chancellor)
on Jan 05, 2015 at 16:38 UTC ( [id://1112209]=note )


in reply to Re^6: comparing contents of two arrays and output differences
in thread comparing contents of two arrays and output differences

PitifulProgrammer:

Yeah, I hardcoded the filenames to simplify things. For your case, I'd probably load up the array with something like:

my @files = map { s/\.xml$//; $_ } glob('*.xml');

The map statement simply trims the ".xml" off the end of each name in the list of XML files. Then when checking for the XML and/or BAK files, we glue 'em on as needed.
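
A minimal sketch of that strip-then-glue idea, assuming a hard-coded list of made-up file names in place of the real glob('*.xml') result so it runs anywhere. It uses the non-destructive s///r form (Perl 5.14 or later), which is a swapped-in variant rather than the one-liner above, so the original names stay untouched:

```perl
use strict;
use warnings;

# Hypothetical file names standing in for the result of glob('*.xml').
my @found = ( 'file_01.xml', 'file_02.xml', 'notes.xml' );

# s///r returns the modified copy and leaves $_ (and so @found) unchanged.
my @files = map { s/\.xml$//r } @found;

# Glue the extensions back on as needed.
for my $base (@files) {
    my $xml = "$base.xml";
    my $bak = "$base.bak";
    print "$base: would check $xml and $bak\n";
}
```

The plain s/\.xml$//; $_ form gives the same list of base names; the difference is only whether the input elements are modified along the way.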

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^8: comparing contents of two arrays and output differences
by PitifulProgrammer (Acolyte) on Jan 05, 2015 at 18:11 UTC

    Dear roboticus

    That is pretty amazing. I had read about map, but at the time I could not think of an application. That is a neat line of code; I'll try to memorize it for the future.

    Thanks a lot for your help, I will go back to the code later and post my result(s).

    Thank you very much for taking the trouble and for your explanations. I should have joined the forum much earlier :)

    Kind regards

    C.

      PitifulProgrammer:

      Yeah, map is one of my favorite bits. It's a simple way to build one list given another list. You need only give it a chunk of code to call for each element in the list, and whatever it returns is the content of the new list.

      my @list = (1, 2, 3, 4, 5);

      # makes list: 2, 4, 6, 8, 10
      my @even_numbers = map { $_*2 } @list;

      # makes list: 1, 4, 9, 16, 25
      my @squares = map { $_*$_ } @list;

      # makes list: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
      my @repeated_list = map { $_, $_ } @list;

      @list = qw(foo bar baz);

      # makes list 'foo', 'bar', 'baz'
      my @single_quoted = map { "'$_'" } @list;

      ...roboticus


Re^8: comparing contents of two arrays and output differences
by PitifulProgrammer (Acolyte) on Jan 07, 2015 at 10:41 UTC

    Dear roboticus,

    I tried the code and made some changes, mostly to see which files are stored where. The script worked in the test run. My next step is to create a subroutine from that script, but before I do that I still have a question, since I would really like to understand what the code does.

    I have a question about "trimming" the xml off, as you nicely put it. The checking of the files does not raise an error, at least that is what I figure, since the error messages are not printed out.

    My question is how the script can differentiate between .xml and .bak. Would that be a built-in feature of the Text::Diff used in line 45? How exactly does the interpolation of the file extension work in that particular case?

    Would be grand if you or another monk could shed some light.

    Thanks a mil in advance and kind regards

    C.
    use 5.018;
    use strict;
    use warnings;
    use Data::Dumper;
    use File::Glob;
    use Text::Diff;
    use Text::Diff::Table;

    #Separating xml and backup files
    #my @xml_files = glob( '*xml' );
    #say for @xml_files;
    #my @bak_files = glob( '*bak' );
    #say for @bak_files;

    #Show differences between file_01.xml and file_01.xml.bak, etc...
    open my $FH, '>', "file_difference_report" or die $!;

    my @base_file_names_xml = map { s/\.xml$//; $_ } glob('*.xml');
    print Dumper \@base_file_names_xml;

    my @base_file_names_bak = glob('*.bak');
    print Dumper \@base_file_names_bak;

    #cutting off file extension to use file name only, extension for
    #comparing .xml and .bak added by code below;
    #print Dumper \@base_file_names;
    #print "\n\n\n";

    for my $file_name ( @base_file_names_xml ) {
        if ( ! -e "$file_name.xml" ){
            print "$file_name.xml: Not present ... not interesting file?\n";
            next;
        }
        if ( ! -e "$file_name.xml.bak" ){
            print "$file_name: no backup, so probably not changed\n";
            next;
        }
        # If we get here, we have a .bak and a .xml file, so make another
        # program to compare them for us:
        my $output = diff "$file_name.xml", "$file_name.xml.bak";
        print $FH "\n\n===== $file_name changes =====\n";
        print $FH $output;
        print $FH "\n\n";
    }

      PitifulProgrammer:

      If $file_name contains the string "foo_bar", then -e "$file_name.xml" checks whether "foo_bar.xml" exists, and similarly, -e "$file_name.xml.bak" checks for "foo_bar.xml.bak". I didn't use a different list for the .bak files because they're only relevant if you have the original, and I can construct the name of the backup file easily given the base file name. I stripped the .xml off the end originally, because I was expecting the backups to have a .bak extension without the .xml part. Otherwise, we could simply skip the map statement, and then use -e "$file_name" for the .xml file[1], and -e "$file_name.bak" for the second one.

      [1] You don't need the quotes in the first one if you're not building a string from multiple parts, but I left 'em in for symmetry.
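
      To make the interpolation concrete, here is a small sketch under stated assumptions: the scratch directory and the file names foo_bar and baz are made up for the demo, and %has_backup is a hypothetical name used only to record the result. Text::Diff plays no part in telling .xml from .bak; the two names are simply two different strings built from the same base name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with hypothetical files: foo_bar has a backup, baz does not.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( 'foo_bar.xml', 'foo_bar.xml.bak', 'baz.xml' ) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my %has_backup;
for my $file_name ( 'foo_bar', 'baz' ) {
    # Perl interpolates $file_name, then the literal ".xml" or ".xml.bak" follows.
    my $xml = "$dir/$file_name.xml";
    my $bak = "$dir/$file_name.xml.bak";

    next unless -e $xml;    # skip names with no .xml file at all
    $has_backup{$file_name} = ( -e $bak ) ? 1 : 0;
    print "$file_name: ", ( $has_backup{$file_name} ? "backup found" : "no backup" ), "\n";
}
```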

      ...roboticus


        Dear roboticus,

        Thanks a mil for your explanation, it is much appreciated. Thank you very much.

        Now, back to coding.

        As said before, the script laid out here is just one part of a larger script ( since this is in a different post I am not sure where to put it, I hope the forum admins will forgive me. ). Anyway, I tried creating a subroutine from the code discussed and implementing it into another script.

        As far as I can tell, I was wrong to put the comparison below the other subroutine. My thought was that there should be a comparison summary in each folder. This also leads to the next issue I (usually) have with subroutines: I am not sure which arguments to pass in and which values should be returned. I read up on subs, but somehow I cannot apply the underlying principle from the book(s) to the present code ( I guess that is a matter of practice ).

        So, like I said, the present code looks up the paths specified in the text file and replaces some entities, while also creating a backup file. After that, each .xml and .bak file in the different folders should be compared, with an individual report in each folder summarising what has been replaced or not. The code has been anonymised for, well, you know why...

        In case I failed to mention details about the purpose or the code itself, please do let me know ( apologies in advance )

        Thanks a mil in advance for your comments

        Kind regards

        C.
        use 5.014;
        use strict;
        use warnings;
        use Path::Tiny qw/ path /;
        use POSIX();
        use autodie qw/ close /;
        use File::BOM;
        use Carp::Always;
        use Data::Dump qw/ dd /;
        use Encode qw(encode decode);
        use File::Glob;
        use Text::Diff;

        Main( @ARGV );
        exit( 0 );

        sub Main {
            #my( $infile_paths ) = @_; #if run via command line
            my( $infile_paths ) = 'C:\dev\test_paths.txt';
            chomp $infile_paths;
            my @paths = GetPaths( $infile_paths );
            for my $path ( @paths ){
                RetrieveAndBackupXML( $path );
                CompareAndCheckForReplacements( $path );
            }
            return @paths;
        } ## end sub Main

        sub GetPaths {
            use File::BOM;
            ## my @paths = path( shift )->lines_utf8;
            my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
            s/\s+$// for @paths; # "chomp"
            return @paths;
        } ## end sub GetPaths

        sub RetrieveAndBackupXML {
            my( $directory ) = shift; ## same as shift @_ ##
            my $date = POSIX::strftime( '%Y-%m-%d', localtime ); #suffix for the backup-file, e.g. 2014-08-01
            my $bak = "$date.bak";
            my @xml_files = path( $directory )->children( qr/\.xml$/ );
            for my $file ( @xml_files ) {
                Replace( $file, "$file-$bak" );
            }
        } ## end sub Main

        # Fix xml entities and create a copy of the original file before editing
        sub Replace {
            my( $in, $bak ) = @_;
            path( $in )-> copy( $bak ); #create a copy of $in with the ending specified in $bak
            my $infh = path( $bak )->openr_raw;
            my $outfh = path( $in )->openrw_raw;
            while( <$infh> ) {
                s{&}{&amp;}g; # In some very rare cases does not match as intended, thus file comparison added
                s{\s>\s}{&gt;}g;
                s{\s<\s}{&lt;}g;
                print $outfh $_;
            }
            close $infh;
            close $outfh;
        } ## end sub Replace

        sub CompareAndCheckForReplacements{
            my( $directory ) = shift;
            ## compare files to check where replacements were made
            #open log-file to write results to
            open my $FH, '>', "file_difference_report" or die $!;
            #retrieve xml file name and trim file extension
            my @base_file_names_xml = map { s/\.xml$//; $_ } glob('*.xml');
            my @base_file_names_bak = glob('*.bak');
            #cutting off file extension to use file name only, extension for
            #comparing .xml and .bak added by code below;
            for my $file_name ( @base_file_names_xml ) {
                if ( ! -e "$file_name.xml" ){
                    print "$file_name.xml: Not present ... not interesting file?\n";
                    next;
                }
                if ( ! -e "$file_name.xml.bak" ){
                    print "$file_name: no backup, so probably not changed\n";
                    next;
                }
                # If we get here, we have a .bak and a .xml file, so make another
                # program to compare them for us:
                my $output = diff "$file_name.xml", "$file_name.xml.bak";
                print $FH "\n\n===== $file_name changes =====\n";
                print $FH $output;
                print $FH "\n\n";
            }
        }
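
On the question of what to pass in and what to return, one possible shape (a sketch only, not the code from this thread) is a sub that takes the directory as its single argument and returns the pairs of files to compare. The helper name find_backup_pairs and the demo file names are hypothetical, and the sketch assumes backups are named "<name>.xml-<YYYY-MM-DD>.bak", which is what RetrieveAndBackupXML appears to produce. It uses bsd_glob from File::Glob so the pattern is anchored to the given directory rather than the current one.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: takes one directory, returns a list of
# [ xml_file, newest_backup ] pairs for the files that have a backup.
sub find_backup_pairs {
    my ($directory) = @_;
    my @pairs;
    my @xml_files = bsd_glob("$directory/*.xml");
    for my $xml ( sort @xml_files ) {
        # Assumed backup naming: "<name>.xml-<YYYY-MM-DD>.bak"
        my @backups = bsd_glob("$xml-*.bak");
        next unless @backups;
        # ISO dates sort as strings, so the last one is the newest.
        push @pairs, [ $xml, ( sort @backups )[-1] ];
    }
    return @pairs;
}

# Demo with made-up files in a scratch directory.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( 'a.xml', 'a.xml-2014-08-01.bak', 'a.xml-2015-01-07.bak', 'b.xml' ) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my @pairs = find_backup_pairs($dir);
print "$_->[0] <-> $_->[1]\n" for @pairs;
```

The caller could then hand each pair to diff and write the report into the same directory, which keeps the file-finding step separate from the reporting step.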
