Re^5: reading files in @ARGV doesn't return expected output

by Laurent_R (Canon)
on Jun 27, 2017 at 12:51 UTC ( [id://1193686]=note )


in reply to Re^4: reading files in @ARGV doesn't return expected output
in thread reading files in @ARGV doesn't return expected output

Basically what I want to know is: is there a way to open those files with two for loops like so
for ($i; $i<=4; $i++) { for ($j; $j<=4; $j++) { #etc
Yes, you can do that, but it would be very inefficient and is most probably not what you should do: it would mean opening each file of the second series once for every file of the first series, and nothing in what you've described makes that necessary.

With the code that I provided in my first post (including the small corrections I made on the @ARGV array that I had forgotten to remove in the second for loop), you should be able to read all the files.

If, on the other hand, you want to combine in some way files from your first set with files of your second set, then it is more complicated, but you still don't want to read the same files many times over. The bottom line, though, is that nothing you have said so far points in this direction.

Replies are listed 'Best First'.
Re^6: reading files in @ARGV doesn't return expected output
by fasoli (Beadle) on Jun 27, 2017 at 13:50 UTC
    "Yes, you can do that, but that would be very inefficient and that's most probably not what you should do, because it would mean opening the second series of files a number of times"

    Oh, would it? So the for loop would open the second series of files as many times as specified in the first series of files?

    "If, on the other hand, you want to combine in some ways files from your first set with files of your second set, then it is more complicated, but you still don't want to read the same files many times over. But the bottom line is that there is nothing in what you said so far that indicates something in this direction."

    Yes, I do want to combine them. As I wrote in an earlier reply, I want to calculate the averages and the deviation between all matrices from files $i and files $j. The end result would be i) another matrix but this time with the average values or ii) a single value, for the deviation. So if I have 5 $i files and 5 $j files, I want to get all their RMS deviations. Sorry, it's probably my fault that this got lost and not mentioned earlier.

    I was thinking that the most efficient way to ask my question (and at this point I apologise for the inconvenience of my frantic stressed posts so far) is to post the code that I *have* been using so far which was used only on 2 files at a time. That way I can highlight my efforts so far -and their shortcomings- to deal with multiple files instead of two, and that way you can see the whole code and get an idea of the bigger picture. I was trying to minimise the code so that it would be more efficient to read but I'm not conveying the end goal and ended up confusing people. Our server is down at the minute and will be live again tomorrow so I can't retrieve my existing script right now but hopefully if I post tomorrow there will still be some interest from you very kind folk and I will still get some feedback.

    Thank you for the efforts so far, I will continue trying to implement your recommendations.

    In case anyone missed it, since I did not explain it clearly enough, let me explain again what this is all about.

    My data files are plain text files that contain text and numbers stored in matrices; they look like this:

    #title line - (skipped it with $nextUnless)
    #title line - (skipped it with $nextUnless)
    1 2 3
    4 5 6
    7 8 9

    They're not necessarily 5 lines long; this is just an example. The actual matrices are much bigger; I think my biggest file is an 85x85 matrix.

    What I want to do is perform some mathematical calculations on the combination of matrices $i and $j. Get the averages of those matrices, their deviation etc. I haven't included this bit of the code yet but (fingers crossed) it works.

    So I want to calculate the deviation between all matrices from file $i and files $j. The end result would be i) another matrix but this time with the average values or ii) a single value, for the deviation. But like I said this bit of the code isn't shown here as I tried to keep it to a minimum - if I fail in opening and splitting them in columns then I won't be able to move on to the next bit anyway.

    At this stage I was trying to print them for testing only, but this is where all the maths are in the real code, written on a $list[$a][$b] basis, and this is why I was trying to print $list[$a][$b] successfully in the first place.

    In reality this:
    for ($a=0; $a<=$#columns; $a++) {
        for ($b=0; $b<=$#columns; $b++) {
            print "$list[$a][$b] ";   # for testing
        }
        print " \n";
    }
    Would be:
    for ($a=0; $a<=$#columns; $a++) {
        for ($b=0; $b<=$#columns; $b++) {
            $m_avrg[$a][$b] += $list[$a][$b];
        }
    }

    to start calculating the averages, here I would just be adding the numbers and further on I'd divide to get the average. That's why it's important that I keep the $list[$a][$b] notation as this is where I've based the whole code - which took me weeks/months to write, as you can probably guess.

      Oh, would it? So the for loop would open the second series of files as many times as specified in the first series of files?
      Yes, if you have nested loops, you would open the files of the second dataset once for each iteration over the first dataset. So, assuming you have five items in your first dataset, you would open 5 times each file of the second dataset.

      If you don't believe me, try to run the following code:

      for my $i (1..5) {
          for my $j (1..3) {
              print "i = $i; j = $j \n";
          }
      }
      and observe how many lines you print (should be 15). Each value of $j is printed 5 times. That may be OK if you want to display some combinations (say, print out the multiplication tables), but you don't want to open and read each file of the second data set 5 times. And that's even more true if you have more files.

      How do you want to combine them for your computations? All of them together? Or pairwise between the two datasets, like the first matrix from the $i data set with the first one from the $j data set? Or each matrix of the $i data set with each matrix from the $j data set (a kind of Cartesian product)? Or something else?

      Whichever way, read each file only once and store its content in a separate matrix in memory (say an array of arrays), and make your calculations at a later point. Since you'll have several matrices, you should end up with an AoAoA (array of arrays of arrays) or possibly a HoAoA (hash of arrays of arrays), or some similar data structure, depending on how you want to make the calculations later.
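      As a rough illustration of the "read once, compute later" idea, here is a minimal sketch. The names read_matrix and load_matrices are hypothetical, and the rule that lines starting with # are title lines is an assumption based on the sample data shown earlier in the thread. The demonstration uses an in-memory filehandle so that it runs without any data files.

```perl
use strict;
use warnings;

# Parse one matrix from an open filehandle into an array of arrays.
# Assumption: lines starting with '#' are title lines and are skipped.
sub read_matrix {
    my ($fh) = @_;
    my @matrix;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/;      # skip title lines
        next unless $line =~ /\S/;     # skip blank lines
        push @matrix, [ split ' ', $line ];
    }
    return \@matrix;
}

# Read every file exactly once; key each matrix by its file name (a HoAoA).
sub load_matrices {
    my @files = @_;
    my %matrix_for;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        $matrix_for{$file} = read_matrix($fh);
        close $fh;
    }
    return \%matrix_for;
}

# Demonstration with an in-memory "file" so the sketch runs on its own.
my $sample = "#title line\n#title line\n1 2 3\n4 5 6\n7 8 9\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $m = read_matrix($fh);
close $fh;

print "@$_\n" for @$m;    # prints the three data rows
```

      With real data you would call something like load_matrices(@ARGV) once, and every later calculation would work on the matrices in memory rather than on the files.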

      This:
      for ($a=0; $a<=$#columns; $a++) {
          for ($b=0; $b<=$#columns; $b++) {
              print "$list[$a][$b] ";   # for testing
          }
          print " \n";
      }
      should probably avoid using the $a and $b variables, which have a special purpose (for sorting).
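      To make the point about $a and $b concrete, here is a small sketch: sort sets these two package variables for its comparison block, so it is safer to keep them for that purpose and to use lexical names for loop indices. The names $row and $col are only illustrative choices.

```perl
use strict;
use warnings;

# sort() places the two values being compared into $a and $b.
my @sorted = sort { $a <=> $b } (10, 9, 100, 1);
print "@sorted\n";    # 1 9 10 100

# Lexical loop indices with other names cannot clash with sort's variables.
my @list  = ([1, 2], [3, 4]);
my $total = 0;
for my $row (0 .. $#list) {
    for my $col (0 .. $#{ $list[$row] }) {
        $total += $list[$row][$col];
    }
}
print "$total\n";     # 10
```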

      In general, you should probably try to avoid the C-style for loop and array subscripts when you can, because the code becomes significantly simpler and easier to get right when you iterate directly over the values (no off-by-one errors, no out-of-range errors).

      my @AoA = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);   # an array of arrays for testing purposes
      for my $row (@AoA) {
          for my $col (@$row) {
              print "$col ";
          }
          print "\n";
      }
      which prints:
      1 2 3
      4 5 6
      7 8 9

      And if you need to make some calculations:

      my @AoA = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);
      for my $row (@AoA) {
          my ($sum, $count) = (0, 0);
          for my $col (@$row) {
              $sum += $col;   # add the number itself, not a string with a trailing space
              $count++;
          }
          print "Average: ", $sum / $count, "\n" if $count;
      }
      which will print the three computed averages:
      Average: 2
      Average: 5
      Average: 8
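      Once all matrices are in memory, combining them is cheap. The following is only a sketch under stated assumptions: all matrices have the same dimensions, and "RMS deviation" is taken to mean the square root of the mean squared element-wise difference, which may not be the exact definition you need. The names mean_matrix, rms_deviation, @set_i and @set_j are hypothetical.

```perl
use strict;
use warnings;

# Element-wise mean of several equally sized matrices (AoA references).
sub mean_matrix {
    my @matrices = @_;
    my $n = @matrices;
    return [] unless $n;
    my @mean;
    for my $m (@matrices) {
        for my $r (0 .. $#$m) {
            for my $c (0 .. $#{ $m->[$r] }) {
                $mean[$r][$c] += $m->[$r][$c];   # accumulate sums first
            }
        }
    }
    for my $row (@mean) {
        $_ /= $n for @$row;                      # then divide by the matrix count
    }
    return \@mean;
}

# Assumed definition: sqrt of the mean squared element-wise difference.
sub rms_deviation {
    my ($m1, $m2) = @_;
    my ($sum_sq, $count) = (0, 0);
    for my $r (0 .. $#$m1) {
        for my $c (0 .. $#{ $m1->[$r] }) {
            my $diff = $m1->[$r][$c] - $m2->[$r][$c];
            $sum_sq += $diff ** 2;
            $count++;
        }
    }
    return $count ? sqrt($sum_sq / $count) : 0;
}

# Two tiny made-up data sets, already in memory.
my @set_i = ([[1, 2], [3, 4]], [[2, 3], [4, 5]]);
my @set_j = ([[1, 2], [3, 4]]);

# Cartesian product: nested loops are fine here, no file is reopened.
for my $i (0 .. $#set_i) {
    for my $j (0 .. $#set_j) {
        printf "RMSD(i=%d, j=%d) = %.3f\n",
            $i, $j, rms_deviation($set_i[$i], $set_j[$j]);
    }
}

my $mean = mean_matrix(@set_i);
print "@$_\n" for @$mean;
```

      The nested loops here iterate over data that is already loaded, so each file would still have been read only once. For a pairwise combination instead of a Cartesian product, a single loop over a shared index would do.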
      Update: The comment about $a and $b was made earlier by zentara, further down in this thread; I had not noticed it when I mentioned that.
