PerlMonks  

reading files in @ARGV doesn't return expected output

by fasoli (Beadle)
on Jun 26, 2017 at 23:04 UTC

fasoli has asked for the wisdom of the Perl Monks concerning the following question:

Hello again Wise Monks,

I'm trying to read multiple files using @ARGV and then split them into columns and print the matrices. Then (if this ever works - I'm desperate at this point) I hope to do maths on the matrices. I have added comments at relevant places in the code, where I'm focusing my attention to solve my three problems - described further down.

I've tried all day to create a short self-contained example with dummy data but my skills are more limited than I thought - facing compilation errors literally all day and now at midnight I've given up. In that case and because I'm desperate for some feedback, I have just posted my code as it is. I hope you can find a few minutes and take a look.

My problems are three:

1. intuitively, I would think that the two for loops should enclose the whole code and close at the very end of the script. However when I did that, printing @ARGV returned wrong results. When I closed the for loops right after pushing into @ARGV, then printing @ARGV returned the correct result. Any explanation of this behaviour would be greatly appreciated, I can't figure it out at all.

2. print "@columns \n"; prints all files BUT ignores $nextUnless. Why?

3. print "$list[$a][$b] "; prints only the first file and nothing else. Does that mean that it's ignoring what's in @ARGV?

I would appreciate any pointers and I apologise for the lack of a proper example. Any feedback / insights would be immensely appreciated, Thank you so much in advance.

#!/bin/perl/
use strict;
use warnings;

my $molec1 = "molec1";
my $molec2 = "molec2";
my $input1;
my $input2;
@ARGV = ();;
my $i;
my $j;
my $path = "/store/group/comparisons";
my $line;
my @columns;
my $nextUnless = 2; # nr of lines to skip
my $CountLines = 0; # total nr of lines in all files

for ($i=1; $i<=3; $i++) {
    open $input1, '<', "$path\/File-${molec1}-cluster${i}.out" or die $!;
    push @ARGV, "$path\/File-${molec1}-cluster${i}.out";
}

for ($j=1; $j<=2; $j++) {
    open $input2, '<', "$path\/File-${molec2}-cluster${j}.out" or die $!;
    push @ARGV, "$path\/File-${molec2}-cluster${j}.out";
}
print "@ARGV \n"; # for testing; indeed correct files are printed

## now split and print
my @list;
my $list;
my $a;
my $b;

while ($line = <>) {
    $CountLines += 1;
    next unless $. > $nextUnless;
    chomp $line;
    $line =~ s/^\s+|\s+$//g;
    push @list, [split/\s+/, $line];
    @columns = split /\s+/, $line;
    print "@columns \n"; ## problem: prints all columns from specified files but ignores the $nextUnless specification - why?
}

close $input1;
close $input2;

for ($a=0; $a<=$#columns; $a++) {
    for ($b=0; $b<=$#columns; $b++) {
        print "$list[$a][$b] "; ## problem: prints only first file ($i=1)
    }
    print " \n";
}

Replies are listed 'Best First'.
Re: reading files in @ARGV doesn't return expected output
by Laurent_R (Canon) on Jun 27, 2017 at 09:04 UTC
    Hi fasoli,

    a few comments on your script (some of which have already been made by other monks).

    In your two nested for loops, you're opening files using $input1 and $input2 file handles, but you're never using these file handles. So this is useless. Even if it were not useless (i.e. if you had some code to read from them), it would still not work properly, because opening a file handle with the same name would close the one you've opened before.

    With your nested for loops, you're using the second set of data 3 times (one for each iteration over the first set of data). That's certainly not what you want. And that's why you don't get your expected result when printing @ARGV. You most probably need separate loops.
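A minimal sketch of the difference between nested and separate loops, assuming the file-name pattern from the question. The helper `cluster_file` is hypothetical and only builds names; nothing is opened.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "File-<molecule>-cluster<n>.out" name.
sub cluster_file {
    my ($molecule, $n) = @_;
    return "File-$molecule-cluster$n.out";
}

# Nested: the inner loop runs again on every pass of the outer loop,
# so the molec2 names are pushed three times.
my @nested;
for my $i (1 .. 3) {
    push @nested, cluster_file('molec1', $i);
    for my $j (1 .. 2) {
        push @nested, cluster_file('molec2', $j);
    }
}

# Separate: each loop runs exactly once.
my @separate;
push @separate, cluster_file('molec1', $_) for 1 .. 3;
push @separate, cluster_file('molec2', $_) for 1 .. 2;

printf "nested:   %d names\n", scalar @nested;      # 3 + 3*2 = 9
printf "separate: %d names\n", scalar @separate;    # 3 + 2   = 5
```

With separate loops every file name appears once, which is what a later read loop over the list expects.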

    You don't need to escape the slashes in your path.

    Even though it is possible to do so, you should probably not use the while ($line = <>) { syntax because it confuses matters. In particular, $. will not be properly reset, so that your $nextUnless conditional will work properly only for the first file.

    The @columns array will be clobbered with new values each time through the while loop. You may not care if you only want to print it out in the next line, but since you're using @columns in the final for loop, I strongly suspect this is wrong. I am not sure, however, what you need there, since you haven't shown what your data looks like.

    The Data::Dumper module may help you check the content of your data structures.

    This is an attempt at correcting the first part of your program (the part managing files), I can't help you with the second part without having seen the data (and the expected result).

    use strict;
    use warnings;
    use Data::Dumper;

    my $molec1 = "molec1";
    my $molec2 = "molec2";
    my $input1;
    my $input2;
    my $path = "/store/group/comparisons";
    my @files;
    my $line;
    my @columns;
    my $nextUnless = 2; # nr of lines to skip
    my $CountLines = 0; # total nr of lines in all files

    for my $i (1..3) {
        push @files, "$path/File-${molec1}-cluster${i}.out";
    }
    for my $j (1..2) {
        push @files, "$path/File-${molec2}-cluster${j}.out";
    }
    print "@files \n"; # for testing; are correct files printed?

    ## now split and print
    my @list;
    for my $file (@files) {
        open my $FH, "<", $file or die "Cannot open $file $!";
        while ($line = <$FH>) {
            $CountLines += 1;
            next unless $. > $nextUnless;
            chomp $line;
            $line =~ s/^\s+|\s+$//g;
            push @list, [split/\s+/, $line];
            @columns = split /\s+/, $line; # this is most probably wrong, maybe you need to push a reference, as you did in the previous line
        }
        close $FH; # closing the handle also resets $. for the next file
    }
    print Dumper \@list; # check that the content of @list is what you expect
    # ...
    In brief, I haven't tested that (not possible without any data), and this is not the end of it (you need to fix the thing about @columns in the while loop and probably to change the last for loop), but this should get you much closer to what you need.

    Please, PLEASE, show a sample of your data (and expected result).

    Update: I had forgotten to change @ARGV to @files in the second ($j) for loop. Fixed now.

      Thank you so much for the effort, I haven't had a chance to look at your reply properly yet as I'm burning up with fever so probably not the best time to try and concentrate on scripting.

      My data files are plain text files that contain text and numbers stored in matrices, they look like this:

      #title line - (skipped it with $nextUnless)
      #title line - (skipped it with $nextUnless)
      1 2 3
      4 5 6
      7 8 9

      they're not necessarily 5 lines long, this is an example. The actual matrices are much bigger, I think my biggest file is an 85x85 matrix.

      What I want to do is perform some mathematical calculations on the combination of matrices $i and $j. Get the averages of those matrices, their deviation etc. I haven't included this bit of the code yet but (fingers crossed) it works.

      So I want to calculate the deviation between all matrices from file $i and files $j. The end result would be i) another matrix but this time with the average values or ii) a single value, for the deviation. But like I said this bit of the code isn't shown here as I tried to keep it to a minimum - if I fail in opening and splitting them in columns then I won't be able to move on to the next bit anyway.

      At this stage I was trying to print them for testing only.

      Hopefully this clears things up a bit and I honestly hope you guys will continue giving me feedback as I'm beyond stuck. Now off to read all answers from the top and hopefully understand what I've been messing up.

      This website is an immensely helpful resource, thank you everyone. I sure wish one day I'll know enough to help others with their scripting problems :D

        If all you wanted to do in your last nested for loop is to print out the data, then you don't need it, since this line:
        print Dumper \@list; # check that the content of @list is what you expect
        which I have put at the very end of my suggested code will output your @list data structure in a nice and clear format.

        So you could try to run my code and see if you get what you want.

        As an aside, fasoli, you said,

        ... perform some mathematical calculations on the combination of matrices ... I haven't included this bit of the code yet but (fingers crossed) it works.

        When Marshall posted his example matrix_transpose, I was reminded that I wanted to point out: instead of crossing your fingers that your (or Marshall's) roll-your-own-code truly works, there are plenty of modules and families of modules that will do the matrix math and have been fully tested across edge cases. Math::MatrixReal, Math::GSL, and PDL::Matrix are three such well-tested Matrix modules. It is probably worth your time to try out one or more of those -- their math has been checked thoroughly over the years, and they are likely to run faster (Benchmark), too.

      Sorry just reading this more carefully now after I ran it. Am I correct in understanding that it only loops through my $i files? The $j files are pushed in @ARGV and are just left there. My problem all along was how to read both $i and $j files and split them using the while $line loop, so that I can move on to the maths later on. Your code seems to only deal with the $i files and loop through them, so how will I be able to build on this to get the $i-$j deviations later on if the $j files are completely ignored? I'm sorry, it's probably there and I'm just not getting it aren't I?
        Sorry, I forgot to change it in the second for loop, which should be:
        for my $j (1..2) { push @files, "$path/File-${molec2}-cluster${j}.out"; }
Re: reading files in @ARGV doesn't return expected output
by stevieb (Canon) on Jun 26, 2017 at 23:21 UTC

    My mind is very much elsewhere at the moment but if nobody else has the time to review and answer in the next bit I'll revisit. That said, why are you using @ARGV like that? This special variable is an array that contains values sent in on the command line; eg:

    if (! @ARGV){
        die "need an argument...";
    }
    my $username = $ARGV[0]; # first command-line argument

    etc. Then, your command line command:

    perl script.pl STEVIEB

    Use a lexical array instead. Using @ARGV the way you are is kind of a head-scratcher when initially glancing at the code.

      Seems to be an attempt to not use explicit open() by use of while ( ... = <> ){ } later in the code.

        Indeed, but it's not typical usage, and if I were to see something like that in a large project (something I deal with frequently) I'd either be confused, second guessing things immediately or wondering wtf else is odd like this that could be leading to issues :)

Re: reading files in @ARGV doesn't return expected output
by Cristoforo (Curate) on Jun 26, 2017 at 23:53 UTC
    ## problem: prints all columns from specified files but ignores the $nextUnless specification - why?
    The reason for this is that the line number $. doesn't reset for all the files (in @ARGV) unless you close ARGV at the end of each file.

    You need to add this line as the last line in your 'while' loop:

    close ARGV if eof;
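A minimal, self-contained sketch of that fix, assuming two throwaway files created with File::Temp in place of the real cluster files. The names `$skip` and `@rows` are illustrative. Putting the close in a `continue` block makes it run even for lines skipped with `next`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary files: two title lines, then two data rows.
my @names;
for my $n (1 .. 2) {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "#title\n#title\n$n 2 3\n4 5 6\n";
    close $fh;
    push @names, $name;
}

my $skip = 2;    # header lines to skip at the top of every file
my @rows;

@ARGV = @names;
while (my $line = <>) {
    next unless $. > $skip;    # $. counts lines within the current file
    chomp $line;
    push @rows, [ split ' ', $line ];
}
continue {
    close ARGV if eof;    # resets $. before the next file (eof, not eof())
}

print "@$_\n" for @rows;    # four data rows, no title lines
```

Without the `close ARGV if eof;` line, `$.` keeps counting across files, so only the first file's title lines are skipped.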

    UPDATE: This only addresses a small part of the problem. Looking at your code above, why are you opening a file (and not use it), then push that file to '@ARGV'?

Re: reading files in @ARGV doesn't return expected output
by thanos1983 (Parson) on Jun 27, 2017 at 11:11 UTC

    Hello fasoli

    Why not simply find all the files and put them in a list (rather than loading them into @ARGV), then read each one into an array (removing the 2nd line) and do whatever you want from there?

    Since we do not have input data I used dummy text for representation purposes.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    use File::Find::Rule;
    # use Benchmark qw(:all) ; # WindowsOS
    # use Benchmark::Forking qw( timethese cmpthese ); # LinuxOS

    sub get_files {
        my @dirs = ('.');
        my $level = shift // 2;
        my @files = File::Find::Rule->file()
                                    ->name('*.out')
                                    ->maxdepth($level)
                                    ->in(@dirs);
        return @files;
    }

    my @files = get_files();
    # print Dumper \@files if @files;

    # open files and load them into an array
    my %HoA;
    foreach my $path_to_file (@files) {
        open my $fh, '<', $path_to_file
            or die "Cannot open $path_to_file: $!";
        chomp(my @lines = <$fh>);
        splice @lines, 1, 1; # remove the second line
        $HoA{$path_to_file} = \@lines;
        close $fh;
    }

    print Dumper \%HoA;

    __END__

    $ perl test.pl
    $VAR1 = {
              'mySubDir/molec1-cluster2.out' => [
                                                  'This is a sample line 1 file 2',
                                                  'This is a sample line 3 file 2'
                                                ],
              'mySubDir/molec1-cluster1.out' => [
                                                  'This is a sample line 1 file 1',
                                                  'This is a sample line 3 file 1'
                                                ]
            };

    # file 1
    $ cat mySubDir/molec1-cluster1.out
    This is a sample line 1 file 1
    This is a sample line 2 (to be skipped) file 1
    This is a sample line 3 file 1

    # file 2
    $ cat mySubDir/molec1-cluster2.out
    This is a sample line 1 file 2
    This is a sample line 2 (to be skipped) file 2
    This is a sample line 3 file 2

    I am using File::Find::Rule to retrieve all the files that I want to process, and Data::Dumper to debug and view the data. I use splice to remove the second line from each array, and finally I store all the data in a hash of arrays.

    I would recommend to read through these links that I provided you, if you continue developing you will use them very often.

    A minor note: do not re-post questions (see "trying to read files in @ARGV but getting GLOB error! :("), as Anonymous Monk noticed. Just update your question or reply to our comments and we will try to help you resolve your problem.

    Hope this helps, let us know, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: reading files in @ARGV doesn't return expected output
by zentara (Archbishop) on Jun 27, 2017 at 13:06 UTC
    print "$list[$a][$b] "; prints only first file and nothing else

    I don't know if you are aware of it, but one obscure rule in Perl is do not use $a and $b as variables, because they interfere with Perl's sort function. I don't think it's a problem here, but it could be that sort is invoked internally, and your $a and $b are getting messed up. Try changing those variable names.


    I'm not really a human, but I play one on earth. ..... an animated JAPH
      Good catch. No it hasn't created a problem here yet, thankfully. I will change them though. Thanks!
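A minimal sketch of the renaming, assuming a small made-up `@list`. The names `$row` and `$col` are illustrative. One concrete reason to avoid `my $a` and `my $b`: with a lexical `$a` in scope, a later `sort { $a <=> $b }` block fails to compile, because sort expects the package variables.

```perl
use strict;
use warnings;

my @list = ( [ 3, 1, 2 ], [ 9, 7, 8 ] );

# Index with descriptive lexical names instead of $a and $b.
for my $row ( 0 .. $#list ) {
    for my $col ( 0 .. $#{ $list[$row] } ) {
        print "$list[$row][$col] ";
    }
    print "\n";
}

# $a and $b remain free for sort's comparison block.
my @sorted_rows = map { [ sort { $a <=> $b } @$_ ] } @list;
print "@$_\n" for @sorted_rows;
```

Looping to `$#{ $list[$row] }` also ties the column bound to the row actually being printed rather than to a separate array.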
Re: reading files in @ARGV doesn't return expected output
by Anonymous Monk on Jun 26, 2017 at 23:54 UTC
    Show some sample input and expected output? Also, don't throw everything at Perl at once in frustration and hope something sticks, break the code down into small bits and test each part and print debug stuff for each, Data::Dumper is your friend. Basic debugging checklist

    If you don't get @ARGV then don't use it, magic <> isn't the only way to read files, when in doubt spell it out:

    my @file_list = ( "some_file", "other_file" );
    push @file_list, "more_files", "and_more";
    for my $file (@file_list) {
        open my $fh, '<', $file or die "$file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            # ...
        }
        close $fh;
    }
    And please don't define your vars before you need them - not my $i; ... for ($i=1;..., do for (my $i=1;... - makes the code hard to read, lots of distractions, and you'll have scoping problems later
Re: reading files in @ARGV doesn't return expected output
by Anonymous Monk on Jun 27, 2017 at 00:17 UTC
      Sorry, it was meant to be an update as I solved the GLOB error and now I have one million other problems. I thought it would be better to update my question but maybe that was wrong since yes essentially they are duplicates.
Re: reading files in @ARGV doesn't return expected output
by Marshall (Canon) on Jun 27, 2017 at 20:34 UTC
    Hi fasoli,

    1) Using @ARGV

    I read through lots of posts in this thread. It appears to me that this "requirement" of using @ARGV for multiple files is just a red-herring and makes your job more difficult than it needs to be. There are many woes that this approach can present.

    I think that if you had a single file which contained all the file names that you needed to process, that that would be sufficient? Read one file name from the command line @ARGV, open that file, then get all the other file names from there.

    2) Data Structure for a 2-D Matrix

    A Perl 2-D matrix is constructed of an array of references to row arrays. This is analogous to how dynamically created arrays in 'C' work. You should study Perl Data Structures carefully. It sounds like you need arrays of matrices? That adds a third dimension. Talking about that is pointless until you understand 2-D structures. The 3-D structure would be an array of references to 2-D matrices.
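A minimal sketch of both structures, using made-up values. The variable names are illustrative and not taken from the thread.

```perl
use strict;
use warnings;

# A 2-D matrix: an array whose elements are references to row arrays.
my @matrix = (
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ],
);

my $row_ref = $matrix[1];       # reference to the whole second row
my $element = $matrix[1][2];    # row 1, column 2 (zero-based)
print "row: @$row_ref, element: $element\n";

# A third dimension: an array of references to 2-D matrices.
my @matrices = ( \@matrix, [ [ 10, 20 ], [ 30, 40 ] ] );
print "matrices[1][0][1] = $matrices[1][0][1]\n";
printf "first matrix has %d rows\n", scalar @{ $matrices[0] };
```

Storing `\@matrix` keeps a reference to the existing matrix rather than a copy, so changes made through `$matrices[0]` are visible in `@matrix` as well.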

    3) Reading and transposing one of your Matrices

    See attached code for some ideas. Use a simple regex to skip non-data lines. I don't see the need to mess with the $. variable. This does have its uses, but I don't see it here. I don't see any big performance issues looming, an 85x85 matrix is not that big. The real problem is getting code that works.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Inline::Files;
    use Data::Dumper;

    my @matrix;
    while (defined (my $line=<DATA>)) {
        next unless $line =~ /^\s*\d/; # skip blanks or comments
        my @row = split ' ',$line;
        push @matrix, [@row];
    }

    print "input matrix:\n";
    dump_matrix(\@matrix);

    my $transposed_ref = transpose_matrix (\@matrix);

    print "\ntransposed matrix:\n";
    dump_matrix($transposed_ref);

    # a "ragged" matrix would have differing number of
    # elements in each row. This won't work for that.
    sub transpose_matrix # only for "non-ragged" matrices
    {
        my $AoAref = shift;
        # copy each row as well as the outer array, so that the
        # shifts below do not empty the caller's matrix
        my @input_matrix = map { [@$_] } @$AoAref; #local copy to destroy
        my @result_matrix;
        while (defined $input_matrix[0][0]) {
            my @new_row;
            foreach my $row_ref (@input_matrix) {
                push @new_row, (shift @$row_ref);
            }
            push @result_matrix, [@new_row];
        }
        # at this point, @input_matrix is an array of references
        # to empty rows, rows still "exist" but have no data in them
        print "At end of transpose, local copy of input_matrix is like this:\n";
        print Dumper \@input_matrix;
        return \@result_matrix;
    }

    sub dump_matrix
    {
        my $AoAref = shift;
        foreach my $row_ref (@$AoAref) {
            print "@$row_ref\n";
        }
    }

    =Prints
    input matrix:
    1 2 3
    4 5 6
    7 8 9
    At end of transpose, local copy of input_matrix is like this:
    $VAR1 = [
              [],
              [],
              []
            ];

    transposed matrix:
    1 4 7
    2 5 8
    3 6 9
    =cut

    __DATA__
    #title line - (skipped it with $nextUnless)
    #title line - (skipped it with $nextUnless)
    1 2 3
    4 5 6
    7 8 9
    Update:
    • A crucial concept is how matrix like structures are stored in Perl.
    • A 2-D array is stored as an array of references to arrays.
    • Each element of the 2-D structure represents a "row" via a reference to an array of the row's elements.
    • It is possible to access a multi-dimensional structure by using the references to the rows to get entire rows. Or you can use the [$row][$col] notation to get individual elements.
    • If you have a bunch of 2-D matrices and you want to do operations between pairs of them, you will need subroutines that take references to those 2-D arrays.
    I wrote some more code that I think you will need to understand in order for your project to succeed. This uses some of the routines in the previous code.
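As a separate illustration of the last bullet point (not code from the thread), here is a minimal sketch of subroutines that take references to two 2-D arrays. The names `mean_matrix` and `rms_deviation` are hypothetical, and treating "deviation" as the root-mean-square difference is an assumption, since the question does not define it. Both routines assume equally sized, non-ragged matrices.

```perl
use strict;
use warnings;

# Element-wise mean of two equally sized matrices (array-of-arrays refs).
sub mean_matrix {
    my ($m1, $m2) = @_;
    my @mean;
    for my $r ( 0 .. $#$m1 ) {
        for my $c ( 0 .. $#{ $m1->[$r] } ) {
            $mean[$r][$c] = ( $m1->[$r][$c] + $m2->[$r][$c] ) / 2;
        }
    }
    return \@mean;
}

# Root-mean-square difference between two equally sized matrices
# (an assumed definition of "deviation").
sub rms_deviation {
    my ($m1, $m2) = @_;
    my ($sum_sq, $count) = (0, 0);
    for my $r ( 0 .. $#$m1 ) {
        for my $c ( 0 .. $#{ $m1->[$r] } ) {
            $sum_sq += ( $m1->[$r][$c] - $m2->[$r][$c] ) ** 2;
            $count++;
        }
    }
    return $count ? sqrt( $sum_sq / $count ) : 0;
}

my @first  = ( [ 1, 2 ], [ 3, 4 ] );
my @second = ( [ 3, 4 ], [ 5, 6 ] );

print "@$_\n" for @{ mean_matrix( \@first, \@second ) };
print rms_deviation( \@first, \@second ), "\n";
```

Passing references avoids flattening the matrices into one argument list, and neither routine modifies its inputs.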
