### Re: A complicated file parsing and 2D array/matrix problem.

by roboticus (Chancellor) on Aug 25, 2012 at 13:05 UTC

in reply to A file parsing and 2D array/matrix problem.

zing:

You oversold your problem. It's a very basic question, not a complicated one. So beginners may skip over your question, thinking that it'll need complex methods and/or algorithms, so you'll lose some possible responses. When others see the question, they'll see that it's quite simple and not interesting enough to solve, so you'll lose a lot of other possible responses.

I could just drop some code on you, but that wouldn't be very educational. What have you tried? What algorithm(s) have you considered? For simple problems, many monks want to see you put some effort into it: i.e. an attempt to solve the problem. I don't really see any effort expended, and we're not a code-writing service. If you had a problem with some code, I'd've offered some specific suggestions. If it were truly a complicated problem, and interesting enough, I might've spent some time playing with it.

Having said all that, here are a couple of hints for you:

First, if it's going to be a sparse matrix, you might consider a hash instead of an array.

You can build the structure as you read it, rather than building two arrays and then merging the arrays.
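Both hints can be sketched together. This is only a minimal illustration: the one-pair-per-line input format and the helper names `read_sparse` and `cell` are assumptions made for the example, not something taken from the original question.

```perl
use strict;
use warnings;

# Read "row column" pairs (an assumed format) and store only the set cells.
sub read_sparse {
    my ($fh) = @_;
    my %matrix;
    while ( my $line = <$fh> ) {
        my ( $row, $col ) = split ' ', $line;
        next unless defined $col;    # skip blank or malformed lines
        $matrix{$row}{$col} = 1;     # build the structure while reading
    }
    return \%matrix;
}

# Return 1 if the cell is set, 0 otherwise, without autovivifying the row.
sub cell {
    my ( $matrix, $row, $col ) = @_;
    my $set = exists $matrix->{$row} && exists $matrix->{$row}{$col};
    return $set ? 1 : 0;
}

# Sample data held in memory so the sketch runs without any input file.
my $input = "2809 3182\n3182 1875\n3186 2809\n";
open my $fh, '<', \$input or die "Can't open in-memory input: $!";
my $matrix = read_sparse($fh);
close $fh;

print cell( $matrix, 2809, 3182 ), "\n";    # a stored cell
print cell( $matrix, 2809, 1875 ), "\n";    # an absent cell
```

A hash of hashes stores only the cells that are set, so an empty cell costs nothing until you need to print it.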

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: A complicated file parsing and 2D array/matrix problem.
by zing (Beadle) on Aug 25, 2012 at 13:10 UTC
Thanks for suggestions. This is what I have tried, but am not able to go past this.
```
my $file = "LIST";
open (FH, "< $file") or die "Can't open $file for read: $!";
my @hsa_sub = <FH>;

for($x=0;$x<=$#hsa_sub;$x++)
{
    $s=$hsa_sub[$x];
    $sline=`cat siper_sub | grep -n $s`;
    `sed '/substrate/!d; /$s/!d;{n;p}' everything_hsa | grep -i product | awk {'print $2,$3,$4'} >> prodlist`

    my $fil = "prodlist";
    open (FH, "< $fil") or die "Can't open $fil for read: $!";
    my @prod.$s.$x = <FH>;

}
```
I'd rather not use sed and awk within a Perl program. Perl itself provides all the facilities your problem requires.
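As a rough sketch of what that could look like, the shell pipeline might be replaced by something like the following. The intent assumed here is my own reading of the sed/grep/awk chain, not something stated in the thread: for one substrate ID, take the line that follows a matching "substrate" line and, if it is a "product" line, keep its fields 2 to 4. The helper name `products_for` is hypothetical, and the ID is matched as a whole word, which is stricter than the original grep.

```perl
use strict;
use warnings;

# Hypothetical helper: for one substrate ID, return fields 2-4 of each
# "product" line that directly follows a matching "substrate" line.
sub products_for {
    my ( $id, @lines ) = @_;
    my @found;
    for my $i ( 0 .. $#lines - 1 ) {
        next unless $lines[$i] =~ /substrate/ && $lines[$i] =~ /\b\Q$id\E\b/;
        my $next = $lines[ $i + 1 ];
        next unless $next =~ /product/i;
        my @fields = split ' ', $next;
        # Like awk's $2,$3,$4, but dropping fields that are missing.
        push @found, join ' ', grep { defined } @fields[ 1 .. 3 ];
    }
    return @found;
}

# Sample lines in the substrate/product layout used elsewhere in this thread.
my @lines = (
    "substrate[s]: 6 9\n",
    "product[s]: 8 10\n",
    "substrate[s]: 3\n",
    "product[s]: 6\n",
);
print "$_\n" for products_for( 6, @lines );
```

Doing the matching in Perl avoids spawning three processes per ID and keeps the results in memory instead of appending them to a scratch file.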
Re^2: A complicated file parsing and 2D array/matrix problem.
by zing (Beadle) on Aug 25, 2012 at 19:01 UTC
Hi guys. I have finally managed to assemble this code with some help.
```
my %supermatrix;
my $list = "LIST.txt";
open my $DATA, '<', $list or die "unable to open my list $list $!\n";
print "\nWORKING ON $list\n";
while (my $line = <$DATA>)  {
    chomp $line;

    my $substrate = 1 if $line =~ /substrate.*(\d+)$/;
    print "SUBSTRATE ===$substrate";
    $line = <DATA>; # fetch next line
    print "PRODUCT====$line";
    my @products;
    (undef, @products) = split / /, $line;
    foreach my $prod (@products) {
        $supermatrix{$substrate}{$prod} = 1;

        print "combination $substrate $prod exists ! \n" if exists $supermatrix{$substrate}{$prod};
    }
}
```

But there are two problems:

1. It's not printing any output.

2. My list would be at most ~5 to 10% smaller than the superarrays (whereas this code handles the output data sparsely). Also, I need the output in a format that has blanks wherever there isn't a "1".

The quality of the output also depends on the number of blanks, because this output will then be compared with 20 other such outputs. In that sense, the positions of the blanks and of the "1"s are equally important.
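Two things in the loop above look like likely causes of the missing output: `<DATA>` reads from the bareword `DATA` handle rather than the lexical `$DATA` that was opened, and `my $substrate = 1 if ...` stores a literal 1 instead of the captured ID. Below is a hedged sketch of a corrected loop. The sub name `parse_list` and the in-memory sample are assumptions for illustration, and it still takes only the first ID on each substrate line.

```perl
use strict;
use warnings;

# Hypothetical helper: read alternating substrate/product lines from $fh
# and return a hash of hashes keyed by substrate ID, then product ID.
sub parse_list {
    my ($fh) = @_;
    my %supermatrix;
    while ( my $line = <$fh> ) {
        # Capture the ID itself; skip anything that is not a substrate line.
        next unless $line =~ /substrate\D*(\d+)/;
        my $substrate = $1;

        my $next = <$fh>;    # same lexical handle, not the bareword DATA
        last unless defined $next;

        # Drop the "product[s]:" label and keep the IDs after it.
        my ( undef, @products ) = split ' ', $next;
        $supermatrix{$substrate}{$_} = 1 for @products;
    }
    return \%supermatrix;
}

# Sample data held in memory so the sketch runs without LIST.txt.
my $text = "substrate[s]: 3\nproduct[s]: 6\nsubstrate[s]: 9\nproduct[s]: 5\n";
open my $in, '<', \$text or die "unable to open sample data: $!";
my $matrix = parse_list($in);
close $in;

for my $s ( sort { $a <=> $b } keys %$matrix ) {
    my @products = sort { $a <=> $b } keys %{ $matrix->{$s} };
    print "substrate $s -> products @products\n";
}
```

Running with `use strict` and `use warnings` enabled would have flagged the unopened `DATA` handle in the original loop.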

I think you've done well in your attempt to create a matrix to represent your data set. Perhaps the following will help further your efforts:

```
use Modern::Perl;
use File::Slurp qw(read_file);    # assumed source of read_file; the import is missing from the post
use Text::Table;
use Data::Dumper;

my ( %supermatrix, @titles, %seen, @rows );

# Assumed: @list holds the lines of LIST.txt (its declaration is missing from the post).
my @list = read_file 'LIST.txt';

# Lines alternate substrate/product; mark each product for the first substrate ID.
for ( my $i = 0 ; $i < $#list + 1 ; $i += 2 ) {
    my ($substrateID) = $list[$i] =~ /(\d+)/g;
    $supermatrix{$substrateID}{$1} = 1 while $list[ $i + 1 ] =~ /(\d+)/g;
}

# Fill every remaining substrate/product cell with '.' so the matrix is complete.
for my $product ( read_file 'SUPERLIST_PRODUCT.txt' ) {
    my ($productID) = $product =~ /(\d+)/g;
    push @titles, $productID unless $seen{$productID}++;

    for my $substrate ( read_file 'SUPERLIST_SUBSTRATE.txt' ) {
        my ($substrateID) = $substrate =~ /(\d+)/g;
        $supermatrix{$substrateID}{$productID} //= '.';
    }
}

# Column specs for Text::Table, built as a string and eval'd into hashrefs below.
my $titles = join ',',
  map "{title => 'p$_', align_title => 'center', align => 'center'}",
  sort { $a <=> $b } @titles;

for my $y ( sort { $a <=> $b } keys %supermatrix ) {    #rows
    my ( $rowLable, @row );

    for my $x ( sort { $a <=> $b } keys %{ $supermatrix{$y} } ) {    #columns
        $rowLable = $y unless $rowLable;
        push @row, $supermatrix{$y}{$x};
    }
    push @rows, [ "s$rowLable", @row ];
}

my $tb = Text::Table->new( ' ', eval $titles );
$tb->load(@rows);    # load the rows; without this only the header prints
say $tb;

say "\n", Dumper \%supermatrix;
```

Partial output:

```
      p1825 p1875 p2543 p2809 p3182 p3186 p3419 p3485 p3486 p3487 p3488 p3489 p3490 p3492 p3647 p3648 p3674 p3877
s2809   .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .
s3006   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
s3049   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
s3182   .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
s3186   .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .
s3314   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
s3485   .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .
s3486   .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .
s3487   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1
s3488   .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .
s3489   .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .
s3490   .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .
s3492   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
s3645   .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .
s3649   .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     1     .     .
s3659   .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .
s3674   .     .     .     .     .     .     .     .     .     .     .     1     1     .     .     .     .     .
s3675   .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .
s3877   .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .

$VAR1 = {
    '3182' => {
        '1825' => '.',
        '3182' => '.',
        '3877' => '.',
        '3647' => '.',
        '3489' => '.',
        '3419' => '.',
        '2809' => '.',
        '3488' => '.',
        '1875' => 1,
        '2543' => '.',
        '3492' => '.',
        '3485' => '.',
        '3186' => '.',
        '3487' => '.',
        '3648' => '.',
        '3674' => '.',
        '3490' => '.',
        '3486' => '.'
    },
    ...
```

Values are stored in the hash as follows: $supermatrix{substrateID}{productID}, where substrateIDs name the rows and productIDs name the columns.

The hash is first initialized using the data from LIST.txt, assigning 1 to each substrateID/productID 'location.' The next, nested for loops complete the matrix, assigning a '.' to undefined substrateID/productID 'locations' (the '.' was used so the matrix could be visualized).

Finally, the matrix is printed, followed by a dump of the hash.

Hope this helps!

Hello Kenosis. Now there's the last piece of the puzzle. I have created 5 such matrices (all with the same rows and columns). The problem is that I have to combine two such matrices with an element-wise logical OR.

INPUT = Two matrices A and B with the same rows and columns, saved in text files A.txt and B.txt

OUTPUT = A single matrix C ( Cij = Aij OR Bij )

```
==============INPUT=======
MAT - A
     1875 2809 3182 3419
2809    -    1    1    -
3182    1    -    -    -
3186    1    1    -    -
3485    -    -    -    -
3486    -    -    -    -

MAT - B
     1875 2809 3182 3419
2809    1    -    -    1
3182    -    -    -    -
3186    -    1    1    -
3485    -    -    -    -
3486    -    1    -    1

========== OUTPUT===========
MAT - C
     1875 2809 3182 3419
2809    1    1    1    1
3182    1    -    -    -
3186    1    1    1    -
3485    -    -    -    -
3486    -    1    -    1
```
I.e., an element of C is 1 if the corresponding element of either A or B is 1.
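One way this could be approached is sketched below. It assumes each matrix file looks like the samples above (a header line of column IDs, then one row ID followed by a "1" or "-" per column) and that both matrices share the same rows and columns. The helper names `parse_matrix` and `or_matrices` are made up for the example, and the matrices are held in strings rather than read from A.txt and B.txt so the sketch runs on its own.

```perl
use strict;
use warnings;

# Parse one matrix: the first non-blank line holds the column IDs, each
# later line holds a row ID followed by one cell ("1" or "-") per column.
sub parse_matrix {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;
    my @cols  = split ' ', shift @lines;
    my ( @rows, %cell );
    for my $line (@lines) {
        my ( $row, @vals ) = split ' ', $line;
        push @rows, $row;
        @{ $cell{$row} }{@cols} = @vals;    # hash slice: column ID => cell
    }
    return ( \@cols, \@rows, \%cell );
}

# C[i][j] is "1" if A[i][j] or B[i][j] is "1", otherwise "-".
sub or_matrices {
    my ( $text_a, $text_b ) = @_;
    my ( $cols, $rows, $cell_a ) = parse_matrix($text_a);
    my ( undef, undef, $cell_b ) = parse_matrix($text_b);
    my %c;
    for my $r (@$rows) {
        for my $col (@$cols) {
            my $set = ( $cell_a->{$r}{$col} // '-' ) eq '1'
                   || ( $cell_b->{$r}{$col} // '-' ) eq '1';
            $c{$r}{$col} = $set ? '1' : '-';
        }
    }
    return ( $cols, $rows, \%c );
}

my $mat_a = <<'END';
     1875 2809 3182 3419
2809    -    1    1    -
3182    1    -    -    -
3186    1    1    -    -
3485    -    -    -    -
3486    -    -    -    -
END

my $mat_b = <<'END';
     1875 2809 3182 3419
2809    1    -    -    1
3182    -    -    -    -
3186    -    1    1    -
3485    -    -    -    -
3486    -    1    -    1
END

my ( $cols, $rows, $c ) = or_matrices( $mat_a, $mat_b );
print join( ' ', '    ', @$cols ), "\n";
print join( '    ', $_, @{ $c->{$_} }{@$cols} ), "\n" for @$rows;
```

To OR more than two matrices, the result could be fed back in as one of the inputs, provided it is first written out in the same text layout.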
Kenosis, there's a small problem. I checked whether the code gives the desired output, but it isn't giving the complete result. Consider my

LIST.txt

```
substrate[s]: 1 2
product[s]: 3
substrate[s]: 6 9
product[s]: 8 10
substrate[s]: 3
product[s]: 6
substrate[s]: 9
product[s]: 5
substrate[s]: 5
product[s]: 2
substrate[s]: 3
product[s]: 9
substrate[s]: 8
product[s]: 9
substrate[s]: 8
product[s]: 1
substrate[s]: 7
product[s]: 11
substrate[s]: 19
product[s]: 17
substrate[s]: 14
product[s]: 13
substrate[s]: 14
product[s]: 11
substrate[s]: 18
product[s]: 19
substrate[s]: 7 14
product[s]: 15
substrate[s]: 7 16
product[s]: 7 17
substrate[s]: 5
product[s]: 6
substrate[s]: 18 15
product[s]: 7
substrate[s]: 7 8
product[s]: 8 18
substrate[s]: 6
product[s]: 9
substrate[s]: 11
product[s]: 12
```

SUPERLIST_SUBSTRATE

```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
```

SUPERLIST_PRODUCT

```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
```

********OUTPUT********

```
    p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19
s1   .  .  1  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s2   .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s3   .  .  .  .  .  1  .  .  1  .   .   .   .   .   .   .   .   .   .
s4   .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s5   .  1  .  .  .  1  .  .  .  .   .   .   .   .   .   .   .   .   .
s6   .  .  .  .  .  .  .  1  1  1   .   .   .   .   .   .   .   .   .
s7   .  .  .  .  .  .  1  1  .  .   1   .   .   .   1   .   1   1   .
s8   1  .  .  .  .  .  .  .  1  .   .   .   .   .   .   .   .   .   .
s9   .  .  .  .  1  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s10  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s11  .  .  .  .  .  .  .  .  .  .   .   1   .   .   .   .   .   .   .
s12  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s13  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s14  .  .  .  .  .  .  .  .  .  .   1   .   1   .   .   .   .   .   .
s15  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s16  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s17  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s18  .  .  .  .  .  .  1  .  .  .   .   .   .   .   .   .   .   .   1
s19  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   1   .   .
```

Consider the first two lines of LIST.txt:

```
substrate[s]: 1 2
product[s]: 3
```

There is a "1" at s1-p3, but there isn't a "1" at s2-p3.

DESIRED OUTPUT (places marked with "X" need to be "1"):

```
    p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19
s1   .  .  1  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s2   .  .  X  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s3   .  .  .  .  .  1  .  .  1  .   .   .   .   .   .   .   .   .   .
s4   .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s5   .  1  .  .  .  1  .  .  .  .   .   .   .   .   .   .   .   .   .
s6   .  .  .  .  .  .  .  1  1  1   .   .   .   .   .   .   .   .   .
s7   .  .  .  .  .  .  1  1  .  .   1   .   .   .   1   .   1   1   .
s8   1  .  .  .  .  .  .  X  1  X   .   .   .   .   .   .   .   X   .
s9   .  .  .  .  1  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s10  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s11  .  .  .  .  .  .  .  .  .  .   .   1   .   .   .   .   .   .   .
s12  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s13  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s14  .  .  .  .  .  .  .  .  .  .   1   .   1   .   X   .   .   .   .
s15  .  .  .  .  .  .  X  .  .  .   .   .   .   .   .   .   .   .   .
s16  .  .  .  .  .  .  X  .  .  .   .   .   .   .   .   .   X   .   .
s17  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   .   .   .
s18  .  .  .  .  .  .  1  .  .  .   .   .   .   .   .   .   .   .   1
s19  .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .   1   .   .
```
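The gap appears to come from `my ($substrateID) = $list[$i] =~ /(\d+)/g;`, which keeps only the first ID on each substrate line. Here is a hedged sketch of one possible fix: collect every ID on both lines and set every substrate/product pair. The sub name `fill_all_pairs` is hypothetical, and I haven't checked the result cell by cell against the hand-marked table above.

```perl
use strict;
use warnings;

# Fill %$matrix from alternating substrate/product lines, pairing every
# substrate ID on a line with every product ID on the line after it.
sub fill_all_pairs {
    my ( $matrix, @list ) = @_;
    for ( my $i = 0 ; $i < $#list ; $i += 2 ) {
        my @substrates = $list[$i]       =~ /(\d+)/g;    # all IDs, not just the first
        my @products   = $list[ $i + 1 ] =~ /(\d+)/g;
        for my $s (@substrates) {
            $matrix->{$s}{$_} = 1 for @products;
        }
    }
    return $matrix;
}

# Two records taken from the LIST.txt sample above.
my @list = (
    "substrate[s]: 1 2\n",  "product[s]: 3\n",
    "substrate[s]: 7 16\n", "product[s]: 7 17\n",
);
my $m = fill_all_pairs( {}, @list );

for my $s ( sort { $a <=> $b } keys %$m ) {
    my @products = sort { $a <=> $b } keys %{ $m->{$s} };
    print "s$s: ", join( ' ', map( "p$_", @products ) ), "\n";
}
```

With this change, s2-p3 is set alongside s1-p3, and a line such as `substrate[s]: 7 16` followed by `product[s]: 7 17` sets four cells rather than two.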
Re^2: A complicated file parsing and 2D array/matrix problem.
by zing (Beadle) on Aug 25, 2012 at 14:19 UTC
