PerlMonks  

A file parsing and 2D array/matrix problem.

by zing (Beadle)
on Aug 25, 2012 at 12:51 UTC (id://989718)

zing has asked for the wisdom of the Perl Monks concerning the following question:

I am stuck on this complicated problem. I have a list:

**LIST**

substrate[s]: 3649
product[s]: 3419 3648
substrate[s]: 3645
product[s]: 3647
substrate[s]: 3659
product[s]: 3647
substrate[s]: 3675
product[s]: 3674
substrate[s]: 3674
product[s]: 3490 3489
substrate[s]: 3489
product[s]: 3490
substrate[s]: 3490
product[s]: 3485
substrate[s]: 3485
product[s]: 3486
substrate[s]: 3486
product[s]: 3488
substrate[s]: 3488
product[s]: 3487
substrate[s]: 3487
product[s]: 3877
substrate[s]: 3877
product[s]: 3419
substrate[s]: 3182
product[s]: 1875
substrate[s]: 2809
product[s]: 3182
substrate[s]: 3186
product[s]: 2809

I also have a superlist of substrates and a superlist of products:

**SUPERLIST_SUBSTRATE**

substrate[s]: 3649
substrate[s]: 3645
substrate[s]: 3659
substrate[s]: 3675
substrate[s]: 3674
substrate[s]: 3489
substrate[s]: 3490
substrate[s]: 3485
substrate[s]: 3486
substrate[s]: 3488
substrate[s]: 3487
substrate[s]: 3877
substrate[s]: 3182
substrate[s]: 2809
substrate[s]: 3186
substrate[s]: 3675
substrate[s]: 3492
substrate[s]: 3314
substrate[s]: 3006
substrate[s]: 3049

**SUPERLIST_PRODUCT**

product[s]: 3419
product[s]: 3648
product[s]: 3489
product[s]: 3647
product[s]: 3647
product[s]: 3674
product[s]: 3490
product[s]: 3490
product[s]: 3485
product[s]: 3486
product[s]: 3488
product[s]: 3487
product[s]: 3877
product[s]: 3419
product[s]: 1875
product[s]: 3182
product[s]: 2809
product[s]: 3492
product[s]: 3186
product[s]: 3492
product[s]: 1825
product[s]: 2543

SUPERLIST_SUBSTRATE and SUPERLIST_PRODUCT encompass all the possible substrates and products in LIST; i.e., the substrates in LIST are a subset of SUPERLIST_SUBSTRATE, and likewise the products in LIST are a subset of SUPERLIST_PRODUCT. Now I want to create a SUPERARRAY with SUPERLIST_SUBSTRATE as its rows and SUPERLIST_PRODUCT as its columns. Then I want to parse LIST one substrate ID at a time and, for each of that substrate's product IDs, insert a "1" into the SUPERARRAY. For example, consider the first two lines of LIST:

substrate[s]: 3649

product[s]: 3419 3648

So for substrate ID 3649, the row for 3649 will be selected in the SUPERARRAY, and a "1" will be inserted in the columns for product IDs 3419 and 3648. The same is done for the rest of LIST, so the SUPERARRAY is basically a substrate-by-product matrix.
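Because IDs such as 3649 label rows and columns rather than being array positions, a literal 2D array needs a lookup from each ID to an index. A minimal sketch, assuming small hypothetical in-memory superlists (`@sub_ids`, `@prod_ids`) with duplicates already removed, and applying only the first substrate/product pair from the example above:

```perl
use strict;
use warnings;

# Hypothetical, deduplicated stand-ins for SUPERLIST_SUBSTRATE and SUPERLIST_PRODUCT.
my @sub_ids  = (3649, 3645, 3659);
my @prod_ids = (3419, 3648, 3647);

# Map each ID to its row or column index in the 2D array.
my %row_of;
@row_of{@sub_ids} = 0 .. $#sub_ids;
my %col_of;
@col_of{@prod_ids} = 0 .. $#prod_ids;

# Every cell starts out blank (a single space).
my @superarray = map { [ (' ') x @prod_ids ] } @sub_ids;

# First two lines of LIST: "substrate[s]: 3649" / "product[s]: 3419 3648".
my ($substrate, @products) = (3649, 3419, 3648);
$superarray[ $row_of{$substrate} ][ $col_of{$_} ] = 1 for @products;

# One output line per substrate: the row ID, then each cell, tab-separated.
for my $r (0 .. $#sub_ids) {
    print join("\t", $sub_ids[$r], @{ $superarray[$r] }), "\n";
}
```

The ID-to-index hashes are what let "select row 3649" work on an ordinary array of arrays.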

Re: A complicated file parsing and 2D array/matrix problem.
by roboticus (Chancellor) on Aug 25, 2012 at 13:05 UTC

    zing:

    You oversold your problem. It's a very basic question, not a complicated one. So beginners may skip over your question, thinking that it'll need complex methods and/or algorithms, so you'll lose some possible responses. When others see the question, they'll see that it's quite simple and not interesting enough to solve, so you'll lose a lot of other possible responses.

    I could just drop some code on you, but that wouldn't be very educational. What have you tried? What algorithm(s) have you considered? For simple problems, many monks want to see you put some effort into it: i.e. an attempt to solve the problem. I don't really see any effort expended, and we're not a code-writing service. If you had a problem with some code, I'd've offered some specific suggestions. If it were truly a complicated problem, and interesting enough, I might've spent some time playing with it.

    Having said all that, here are a couple of hints for you:

    First, if it's going to be a sparse matrix, you might consider a hash instead of an array.
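A minimal sketch of the hash-based (sparse) layout, using IDs from the question: only cells holding a 1 are stored, and a missing entry means blank. The helper name `has_cell` is hypothetical.

```perl
use strict;
use warnings;

# Sparse layout: only cells that hold a 1 are stored; anything missing is blank.
my %matrix;
$matrix{3649}{$_} = 1 for 3419, 3648;   # substrate[s]: 3649 / product[s]: 3419 3648
$matrix{3674}{$_} = 1 for 3490, 3489;   # substrate[s]: 3674 / product[s]: 3490 3489

# Hypothetical helper. Checking the row first means a lookup for a substrate
# with no products does not autovivify an empty row in %matrix.
sub has_cell {
    my ($m, $sub, $prod) = @_;
    return exists($m->{$sub}) && exists($m->{$sub}{$prod});
}

for my $pair ([3649, 3419], [3645, 3647]) {
    my ($s, $p) = @$pair;
    print "$s/$p: ", (has_cell(\%matrix, $s, $p) ? 1 : 'blank'), "\n";
}
```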

    You can build the structure as you read it, rather than building two arrays and then merging the arrays.
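A sketch of building the structure in a single pass while reading, assuming the LIST format from the question (each `product[s]:` line belongs to the most recent `substrate[s]:` line). An in-memory heredoc, a hypothetical sample, stands in for the real file so the snippet runs as-is.

```perl
use strict;
use warnings;

# Hypothetical sample in the LIST format; for the real data, open LIST instead.
my $list = <<'END';
substrate[s]: 3649
product[s]: 3419 3648
substrate[s]: 3674
product[s]: 3490 3489
END

open my $fh, '<', \$list or die "Cannot open in-memory LIST: $!";

my %matrix;      # $matrix{substrate}{product} = 1, filled while reading
my $substrate;   # most recent substrate ID seen
while (my $line = <$fh>) {
    if ($line =~ /^substrate\[s\]:\s*(\d+)/) {
        $substrate = $1;
    }
    elsif ($line =~ /^product\[s\]:\s*(.*)/) {
        my $ids = $1;
        die "product line before any substrate (line $.)\n" unless defined $substrate;
        $matrix{$substrate}{$_} = 1 for split ' ', $ids;
    }
}
close $fh;
```

Tracking the current substrate, rather than assuming strict line pairs, also tolerates blank lines between records.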

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thanks for the suggestions. This is what I have tried, but I am not able to get past this.
      my $file = "LIST";
      open (FH, "< $file") or die "Can't open $file for read: $!";
      my @hsa_sub = <FH>;
      for($x=0;$x<=$#hsa_sub;$x++)
      {
          $s=$hsa_sub[$x];
          $sline=`cat siper_sub | grep -n $s`;
          `sed '/substrate/!d; /$s/!d;{n;p}' everything_hsa | grep -i product | awk {'print $2,$3,$4'} >> prodlist`
          my $fil = "prodlist";
          open (FH, "< $fil") or die "Can't open $fil for read: $!";
          my @prod.$s.$x = <FH>;
      }
        I'd rather not use sed and awk WITHIN a Perl program. As far as your problem is concerned, Perl provides all the facilities you need.
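As a sketch of doing the `grep`/`sed` step in Perl itself: the hypothetical helper `products_for` scans a filehandle for the line `substrate[s]: ID` and returns the IDs on the product line that follows it, which appears to be what the backtick pipeline in the attempt is aiming for. It assumes the LIST format shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: read from $fh until the line "substrate[s]: $id" and
# return the IDs on the product[s] line that follows it (empty list if none).
# It reads from the handle's current position, so reopen the file (or seek
# back to the start) before each new search.
sub products_for {
    my ($fh, $id) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ /^substrate\[s\]:\s*\Q$id\E\s*$/;
        my $next = <$fh>;
        return unless defined $next && $next =~ /^product\[s\]:\s*(.*)/;
        my $ids = $1;
        return split ' ', $ids;
    }
    return;
}

# Usage with a hypothetical in-memory sample instead of the real file.
my $data = "substrate[s]: 3649\nproduct[s]: 3419 3648\n"
         . "substrate[s]: 3645\nproduct[s]: 3647\n";
open my $fh, '<', \$data or die "Cannot open sample: $!";
my @prods = products_for($fh, 3645);
print "@prods\n";
close $fh;
```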
      Hi guys. I have finally managed to assemble this code with some help.
      my %supermatrix;
      my $list = "LIST.txt";
      open my $DATA, '<', $list or die "unable to open my list $list $!\n";
      print "\nWORKING ON $list\n";
      print `head $list`;
      while (my $line = <$DATA>) {
          chomp $line;
          my $substrate = 1 if $line =~ /substrate.*(\d+)$/;
          print "SUBSTRATE ===$substrate";
          $line = <DATA>; # fetch next line
          print "PRODUCT====$line";
          my @products;
          (undef, @products) = split / /, $line;
          foreach my $prod (@products) {
              $supermatrix{$substrate}{$prod} = 1;
              print "combination $substrate $prod exists ! \n" if exists $supermatrix{$substrate}{$prod};
          }
      }

      But there are two problems :-

      1.] It's not printing any output.

      2.] My list will be at most ~5 to 10% smaller than the superlists, whereas this code stores the data sparsely. Also, I need the output in a format where there is a blank wherever there isn't a "1".

      The quality of the output also depends on the blanks, because this output will then be compared with 20 other such outputs. In that sense, the positions of the blanks and of the "1"s are equally important.
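A few details in the loop above probably explain problem 1. `$line = <DATA>` reads from the bareword filehandle `DATA`, not the lexical `$DATA` opened on LIST.txt, so no product line is ever read and the "combination" message never prints. `my $substrate = 1 if ...` stores the constant 1 rather than the captured ID (perlsyn also documents `my` with a statement modifier as undefined behavior). Finally, the greedy `.*` in `/substrate.*(\d+)$/` leaves only the last digit for `(\d+)`. Below is a minimal corrected sketch. It assumes LIST strictly alternates substrate and product lines, and uses an in-memory sample as a hypothetical stand-in for LIST.txt so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for LIST.txt; to read the real file use
# open my $DATA, '<', 'LIST.txt' or die ...
my $sample = "substrate[s]: 3649\nproduct[s]: 3419 3648\nsubstrate[s]: 3645\nproduct[s]: 3647\n";
open my $DATA, '<', \$sample or die "unable to open sample list: $!\n";

my %supermatrix;
while (my $line = <$DATA>) {
    chomp $line;

    # Capture the whole ID into $substrate; skip lines that are not substrate lines.
    my ($substrate) = $line =~ /^substrate\[s\]:\s*(\d+)/ or next;

    # Read the product line from the same lexical handle ($DATA, not DATA).
    my $prod_line = <$DATA>;
    last unless defined $prod_line;
    chomp $prod_line;

    # Drop the "product[s]:" label and keep the IDs.
    my (undef, @products) = split ' ', $prod_line;
    for my $prod (@products) {
        $supermatrix{$substrate}{$prod} = 1;
        print "combination $substrate $prod exists!\n";
    }
}
close $DATA;
```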

        I think you've done well in your attempt to create a matrix to represent your data set. Perhaps the following will help further your efforts:

        use Modern::Perl;
        use File::Slurp qw/read_file/;
        use Text::Table;
        use Data::Dumper;

        my ( %supermatrix, @titles, %seen, @rows );

        my @list = read_file 'LIST.txt';

        for ( my $i = 0 ; $i < $#list + 1 ; $i += 2 ) {
            my ($substrateID) = $list[$i] =~ /(\d+)/g;
            $supermatrix{$substrateID}{$1} = 1 while $list[ $i + 1 ] =~ /(\d+)/g;
        }

        for my $product ( read_file 'SUPERLIST_PRODUCT.txt' ) {
            my ($productID) = $product =~ /(\d+)/g;
            push @titles, $productID unless $seen{$productID}++;

            for my $substrate ( read_file 'SUPERLIST_SUBSTRATE.txt' ) {
                my ($substrateID) = $substrate =~ /(\d+)/g;
                $supermatrix{$substrateID}{$productID} //= '.';
            }
        }

        my $titles = join ',',
          map "{title => 'p$_', align_title => 'center', align => 'center'}",
          sort { $a <=> $b } @titles;

        for my $y ( sort { $a <=> $b } keys %supermatrix ) {    #rows
            my ( $rowLable, @row );

            for my $x ( sort { $a <=> $b } keys %{ $supermatrix{$y} } ) {    # columns
                $rowLable = $y unless $rowLable;
                push @row, $supermatrix{$y}{$x};
            }

            push @rows, [ "s$rowLable", @row ];
        }

        my $tb = Text::Table->new( ' ', eval $titles );
        $tb->load(@rows);
        say $tb;

        say "\n", Dumper \%supermatrix;

        Partial output:

              p1825 p1875 p2543 p2809 p3182 p3186 p3419 p3485 p3486 p3487 p3488 p3489 p3490 p3492 p3647 p3648 p3674 p3877
        s2809   .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .
        s3006   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        s3049   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        s3182   .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        s3186   .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        s3314   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        s3485   .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .
        s3486   .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .
        s3487   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1
        s3488   .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .
        s3489   .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .
        s3490   .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .
        s3492   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        s3645   .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .
        s3649   .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     1     .     .
        s3659   .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .
        s3674   .     .     .     .     .     .     .     .     .     .     .     1     1     .     .     .     .     .
        s3675   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .
        s3877   .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .

        $VAR1 = {
                  '3182' => {
                              '1825' => '.',
                              '3182' => '.',
                              '3877' => '.',
                              '3647' => '.',
                              '3489' => '.',
                              '3419' => '.',
                              '2809' => '.',
                              '3488' => '.',
                              '1875' => 1,
                              '2543' => '.',
                              '3492' => '.',
                              '3485' => '.',
                              '3186' => '.',
                              '3487' => '.',
                              '3648' => '.',
                              '3674' => '.',
                              '3490' => '.',
                              '3486' => '.'
                            },
                  ...

        Values are stored in the hash as follows: $supermatrix{substrateID}{productID}, where substrateIDs name the rows, and productIDs name the columns.

        The hash is first initialized with the data from LIST.txt, assigning 1 to each substrateID/productID 'location.' The nested for loops that follow complete the matrix, using //= to assign a '.' only to substrateID/productID 'locations' that are still undefined, so the existing 1s are kept (the '.' was used so the matrix could be visualized).

        Finally, the matrix is printed, followed by a dump of the hash.

        Hope this helps!
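If the matrix must be printed with blanks instead of '.', and in the superlists' order rather than sorted, one option is to leave the sparse hash untouched and decide each cell while printing. The sketch below uses hypothetical in-memory inputs. Tab separators keep every blank in a fixed column position, which matters if the output will be compared cell by cell with other outputs.

```perl
use strict;
use warnings;

# Hypothetical inputs: the sparse matrix built from LIST and the two superlists.
my %supermatrix = (3649 => { 3419 => 1, 3648 => 1 }, 3645 => { 3647 => 1 });
my @superlist_substrate = (3649, 3645, 3659, 3645);
my @superlist_product   = (3419, 3648, 3647, 3647);

# Drop duplicates but keep the superlists' original order.
sub uniq_in_order { my %seen; return grep { !$seen{$_}++ } @_ }
my @rows = uniq_in_order(@superlist_substrate);
my @cols = uniq_in_order(@superlist_product);

# Header line, then one line per substrate; a missing cell becomes a blank.
my @lines = (join("\t", '', @cols));
for my $s (@rows) {
    my $row = $supermatrix{$s} || {};   # rvalue lookup, so no autovivification
    push @lines, join("\t", $s, map { $row->{$_} ? 1 : ' ' } @cols);
}
print "$_\n" for @lines;
```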

      Please help me with this. I can't get beyond what I have tried (code posted above).
Re: A complicated file parsing and 2D array/matrix problem.
by moritz (Cardinal) on Aug 25, 2012 at 12:58 UTC

Node Type: perlquestion [id://989718]
Approved by moritz