Re^5: reading files in @ARGV doesn't return expected output

Re^6: reading files in @ARGV doesn't return expected output by fasoli (Beadle) on Jun 27, 2017 at 13:50 UTC

"Yes, you can do that, but that would be very inefficient and that's most probably not what you should do, because it would mean opening the second series of files a number of times"

Oh, would it? So the `for` loop would open the second series of files as many times as specified in the first series of files?

"If, on the other hand, you want to combine in some ways files from your first set with files of your second set, then it is more complicated, but you still don't want to read the same files many times over. But the bottom line is that there is nothing in what you said so far that indicates something in this direction."

Yes, I do want to combine them. As I wrote in an earlier reply, I want to calculate the averages and the deviation between all matrices from files $i and files $j. The end result would be i) another matrix but this time with the average values or ii) a single value, for the deviation. So if I have 5 $i files and 5 $j files, I want to get all their RMS deviations. Sorry, it's probably my fault that this got lost and not mentioned earlier.

I was thinking that the most efficient way to ask my question (and at this point I apologise for the inconvenience of my frantic stressed posts so far) is to post the code that I have been using so far which was used only on 2 files at a time. That way I can highlight my efforts so far -and their shortcomings- to deal with multiple files instead of two, and that way you can see the whole code and get an idea of the bigger picture. I was trying to minimise the code so that it would be more efficient to read but I'm not conveying the end goal and ended up confusing people.

Our server is down at the minute and will be live again tomorrow so I can't retrieve my existing script right now but hopefully if I post tomorrow there will still be some interest from you very kind folk and I will still get some feedback. Thank you for the efforts so far, I will continue trying to implement your recommendations.

If anyone missed it because I didn't explain it clearly enough, let me explain again what this was all about.

My data files are plain text files that contain text and numbers stored in matrices. They look like this:

`#title line - (skipped it with $nextUnless)
#title line - (skipped it with $nextUnless)
1 2 3
4 5 6
7 8 9`

They're not necessarily 5 lines long, this is an example. The actual matrices are much bigger; I think my biggest file is an 85x85 matrix.

What I want to do is perform some mathematical calculations on the combination of matrices $i and $j: get the averages of those matrices, their deviation etc. I haven't included this bit of the code yet but (fingers crossed) it works. So I want to calculate the deviation between all matrices from files $i and files $j. The end result would be i) another matrix but this time with the average values or ii) a single value, for the deviation. But like I said, this bit of the code isn't shown here as I tried to keep it to a minimum. If I fail in opening the files and splitting them into columns then I won't be able to move on to the next bit anyway.

At this stage I was trying to print them for testing only, but this is where all the maths are in the real code, written on a `$list[$a][$b]` basis, and this is why I was trying to print `$list[$a][$b]` successfully in the first place. In reality this:

`for ($a=0; $a<=$#columns; $a++) {
    for ($b=0; $b<=$#columns; $b++) {
        print "$list[$a][$b] "; # for testing
    }
    print " \n";
}`

would be:

`for ($a=0; $a<=$#columns; $a++) {
    for ($b=0; $b<=$#columns; $b++) {
        $m_avrg[$a][$b] += $list[$a][$b];
    }
}`

to start calculating the averages. Here I would just be adding the numbers, and further on I'd divide to get the average. That's why it's important that I keep the `$list[$a][$b]` notation, as this is what I've based the whole code on, which took me weeks/months to write, as you can probably guess.

Re^7: reading files in @ARGV doesn't return expected output by Laurent_R (Canon) on Jun 27, 2017 at 14:45 UTC

"Oh, would it? So the for loop would open the second series of files as many times as specified in the first series of files?"

Yes, if you have nested loops, you would open the files of the second dataset once for each iteration over the first dataset. So, assuming you have five items in your first dataset, you would open each file of the second dataset 5 times. If you don't believe me, try to run the following code:

`for my $i (1..5) {
    for my $j (1..3) {
        print "i = $i; j = $j \n";
    }
}`

and observe how many lines you print (should be 15). Each value of $j is printed 5 times. That may be OK if you want to display some combinations (say, print out the multiplication tables), but you don't want to open and read each file of the second dataset 5 times. And that's even more true if you have more files.

How do you want to combine them for your computations? All of them together? Or pairwise between the two datasets, like the first matrix from the $i dataset with the first one from the $j dataset? Or each matrix of the $i dataset with each matrix from the $j dataset (a kind of Cartesian product)? Or something else?

Whichever way, read each file only once and store its content in a separate matrix in memory (say an array of arrays), and make your calculations at a later point. Since you'll have several matrices, you should end up with an AoAoA (array of arrays of arrays) or possibly a HoAoA (hash of arrays of arrays), or some similar data structure, depending on how you want to make the calculations later.
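
A minimal sketch of the read-once approach, assuming the files are named on the command line and follow the layout described earlier in the thread ('#' title lines followed by whitespace-separated rows). The helper name `read_matrix` and the hash `%matrix_for` are made up for illustration, and a hash keyed by file name is only one of the possible structures:

```perl
use strict;
use warnings;

# Hypothetical helper: parse one matrix from an open filehandle,
# skipping '#' title lines and empty lines (assumed file layout).
sub read_matrix {
    my ($fh) = @_;
    my @matrix;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;       # title line or empty line
        push @matrix, [ split ' ', $line ];   # one row = one array ref
    }
    return \@matrix;
}

# Read every file named on the command line exactly once.
my %matrix_for;    # file name => reference to an array of arrays
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    $matrix_for{$file} = read_matrix($fh);
    close $fh;
}

# Any later calculation works on the in-memory copies only.
for my $file (sort keys %matrix_for) {
    my $rows = @{ $matrix_for{$file} };
    print "$file: $rows rows\n";
}
```

With the data held this way, looping over every combination of matrices costs no additional file reads, because the nested loops only touch memory.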
Re^7: reading files in @ARGV doesn't return expected output by Laurent_R (Canon) on Jun 27, 2017 at 16:27 UTC

This:

`for ($a=0; $a<=$#columns; $a++) {
    for ($b=0; $b<=$#columns; $b++) {
        print "$list[$a][$b] "; # for testing
    }
    print " \n";
}`

should probably avoid using the $a and $b variables, which have a special purpose (for sorting).

In general, you should probably try to avoid using the C-style `for` loop and array subscripts when you can, because the code can be made significantly simpler and easier by iterating directly over the values (no off-by-one errors, no out-of-range errors).

`my @AoA = ([1, 2, 3], [4, 5, 6], [7, 8, 9]); # an array of arrays for testing purposes
for my $row (@AoA) {
    for my $col (@$row) {
        print "$col ";
    }
    print "\n";
}`

which prints:

`1 2 3
4 5 6
7 8 9`

And if you need to make some calculations:

`my @AoA = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);
for my $row (@AoA) {
    my ($sum, $count) = (0, 0);
    for my $col (@$row) {
        $sum += $col;
        $count++;
    }
    print "Average: ", $sum / $count, "\n" if $count;
}`

which will print the three computed averages:

`Average: 2
Average: 5
Average: 8`

Update: The comment about $a and $b was made earlier by zentara, further down in this thread; I had not noticed it when I mentioned that.
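
A sketch of how the averaging and deviation steps might look once the matrices are in memory. The subroutine names `average_matrix` and `rms_deviation` are hypothetical, and "deviation" is assumed here to mean the root mean square of the element-wise differences between two equally sized matrices, which may not be the exact formula the original script uses:

```perl
use strict;
use warnings;

# Element-wise mean of several equally sized matrices (arrays of arrays).
sub average_matrix {
    my @matrices = @_;
    my $n = @matrices or return [];
    my @sum;
    for my $m (@matrices) {
        for my $r (0 .. $#$m) {
            for my $c (0 .. $#{ $m->[$r] }) {
                $sum[$r][$c] += $m->[$r][$c];
            }
        }
    }
    # Divide each accumulated cell by the number of matrices.
    return [ map { [ map { $_ / $n } @$_ ] } @sum ];
}

# Assumed definition: RMS of the element-wise differences of two matrices.
sub rms_deviation {
    my ($m1, $m2) = @_;
    my ($sum_sq, $count) = (0, 0);
    for my $r (0 .. $#$m1) {
        for my $c (0 .. $#{ $m1->[$r] }) {
            my $diff = $m1->[$r][$c] - $m2->[$r][$c];
            $sum_sq += $diff ** 2;
            $count++;
        }
    }
    return $count ? sqrt($sum_sq / $count) : 0;
}

# Made-up test data standing in for the two sets of files.
my @set_i = ( [ [1, 2], [3, 4] ], [ [3, 4], [5, 6] ] );
my @set_j = ( [ [1, 2], [3, 4] ], [ [2, 3], [4, 5] ] );

my $avg = average_matrix(@set_i);
print join(' ', @$_), "\n" for @$avg;

# Every $i matrix against every $j matrix, all from memory.
for my $i (0 .. $#set_i) {
    for my $j (0 .. $#set_j) {
        printf "i=%d j=%d rms=%.3f\n", $i, $j,
            rms_deviation($set_i[$i], $set_j[$j]);
    }
}
```

Index ranges are used here because two or more matrices have to be walked in step; lexical names such as `$r` and `$c` keep clear of the special `$a` and `$b`.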

