PerlMonks  

sorting columns and getting the largest and smallest values in each column

by $new_guy (Acolyte)
on May 10, 2011 at 14:57 UTC (#903992)
$new_guy has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I have read how to sort columns on this node:

http://www.perlmonks.org/?node_id=674374

I am trying to sort my columns in a similar manner. I have tried implementing the subroutine but it doesn't work. My idea was to first sort the columns and then to shift and pop each sorted column to get the largest and smallest value in each column. For each column I want to print the largest and smallest value, just as I have printed the sum of the entries in each column.

    #!/usr/bin/perl
    #use strict;
    #use warnings;
    use integer;
    use Math::Matrix;

    ## you will print results but first remove any previous files
    my $fcgs = "final_cg_size.txt";
    if (unlink($fcgs) == 1) {
        print "Existing \"final_cg_size.txt\" file was removed\n";
    }

    # now make a file for the final core genome sizes output
    my $output_fcgs = "final_cg_size.txt";
    if (! open(FCGS, ">>$output_fcgs") ) {
        print "Cannot open file \"$output_fcgs\" to write to!!\n\n";
        exit;
    }

    open IN, 'cg_size.txt' or die$!;
    my @colSum;
    while(<IN>){
        chomp($_);
        our @array = ();
        our @colArray = split(/\s/,$_);
        for(my $i = 0; $i <= $#colArray; $i++){
            $colSum[$i] += int($colArray[$i]);
        }
        for(our $i = 0; $i <= $#colArray; $i++){
            $error_bars[$i] = $colArray[$i];
        }
        my $sorted = sortSub(@colArray, 0, 1, 3);
        # print "$error_bars[16] \n"; # prints elements of column 16
        # print "@error_bars \n";     # not what I want
        print "***************************\n";
    }
    print "\n";
    foreach my $j (0 .. $#colSum){
        print "The sum of column",$j+1," is: ",$colSum[$j],"\n";
    }

    sub sortSub {
        my @array = @{ +shift };
        my @sorted = sort {
            for my $ix ( @_ ) {
                my $cmp = $a->[$ix] <=> $b->[$ix];
                return $cmp if $cmp;
            }
            return 0;
        } @array;
        return \@sorted;
    }

    _FILE_
    2124 1827 1702 1710 1658 1637 1676 1630 1602 1607 1579 1558 1575 1550 1548 1540 1537
    2058 1771 1776 1701 1645 1672 1615 1599 1609 1583 1570 1587 1578 1563 1555 1541 1537
    2237 1864 1684 1673 1633 1667 1610 1624 1591 1589 1593 1576 1565 1570 1559 1539 1537
    2056 1845 1748 1693 1679 1670 1663 1638 1590 1564 1564 1573 1571 1545 1542 1543 1537
    2056 1808 1755 1689 1684 1663 1627 1581 1594 1580 1580 1616 1559 1568 1541 1539 1537
    1957 1889 1769 1667 1697 1669 1637 1631 1584 1621 1576 1570 1569 1552 1547 1542 1537
    2181 1991 1706 1715 1654 1686 1605 1622 1594 1597 1576 1564 1567 1560 1567 1540 1537
    2056 1873 1803 1711 1715 1669 1630 1619 1578 1604 1584 1568 1557 1556 1544 1549 1537
    1996 1915 1746 1690 1685 1630 1625 1589 1609 1593 1569 1562 1574 1555 1548 1541 1537
    2181 1875 1810 1728 1638 1625 1628 1621 1614 1587 1576 1558 1552 1557 1543 1541 1537

Re: sorting columns and getting the largest and smallest values in each column
by LanX (Canon) on May 10, 2011 at 15:13 UTC
Re: sorting columns and getting the largest and smallest values in each column
by Marshall (Prior) on May 10, 2011 at 18:54 UTC
    Another approach is to transpose the columns into rows as you read them in. Then use the List::Util functions min, max, and sum on each row, which of course used to be a column.

    Note: pp is a bit funky as it prints to STDERR instead of STDOUT, so I adjusted the print order to handle that.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dump qw(pp);
    use List::Util qw(first max maxstr min minstr reduce shuffle sum);

    my @cols;
    while (<DATA>) {
        my $i=0;
        # transpose cols to rows
        foreach (split) {
            push @{$cols[$i++]}, $_;
        }
    }
    # \@cols is escaped so the literal name is printed, as in the output below
    print STDERR "Now the rows of \@cols are the columns:\n";
    pp \@cols;

    foreach my $colref (@cols) {
        my @column = @$colref;
        print "smallest: ",min(@column),
              " largest: ",max(@column),
              " sum: ",sum(@column),"\n";
    }

    =PRINTS ###################
    Now the rows of @cols are the columns:
    [
      [2124, 2058, 2237, 2056, 2056, 1957, 2181, 2056, 1996, 2181],
      [1827, 1771, 1864, 1845, 1808, 1889, 1991, 1873, 1915, 1875],
      [1702, 1776, 1684, 1748, 1755, 1769, 1706, 1803, 1746, 1810],
      [1710, 1701, 1673, 1693, 1689, 1667, 1715, 1711, 1690, 1728],
      [1658, 1645, 1633, 1679, 1684, 1697, 1654, 1715, 1685, 1638],
      [1637, 1672, 1667, 1670, 1663, 1669, 1686, 1669, 1630, 1625],
      [1676, 1615, 1610, 1663, 1627, 1637, 1605, 1630, 1625, 1628],
      [1630, 1599, 1624, 1638, 1581, 1631, 1622, 1619, 1589, 1621],
      [1602, 1609, 1591, 1590, 1594, 1584, 1594, 1578, 1609, 1614],
      [1607, 1583, 1589, 1564, 1580, 1621, 1597, 1604, 1593, 1587],
      [1579, 1570, 1593, 1564, 1580, 1576, 1576, 1584, 1569, 1576],
      [1558, 1587, 1576, 1573, 1616, 1570, 1564, 1568, 1562, 1558],
      [1575, 1578, 1565, 1571, 1559, 1569, 1567, 1557, 1574, 1552],
      [1550, 1563, 1570, 1545, 1568, 1552, 1560, 1556, 1555, 1557],
      [1548, 1555, 1559, 1542, 1541, 1547, 1567, 1544, 1548, 1543],
      [1540, 1541, 1539, 1543, 1539, 1542, 1540, 1549, 1541, 1541],
      [1537, 1537, 1537, 1537, 1537, 1537, 1537, 1537, 1537, 1537],
    ]
    smallest: 1957 largest: 2237 sum: 20902
    smallest: 1771 largest: 1991 sum: 18658
    smallest: 1684 largest: 1810 sum: 17499
    smallest: 1667 largest: 1728 sum: 16977
    smallest: 1633 largest: 1715 sum: 16688
    smallest: 1625 largest: 1686 sum: 16588
    smallest: 1605 largest: 1676 sum: 16316
    smallest: 1581 largest: 1638 sum: 16154
    smallest: 1578 largest: 1614 sum: 15965
    smallest: 1564 largest: 1621 sum: 15925
    smallest: 1564 largest: 1593 sum: 15767
    smallest: 1558 largest: 1616 sum: 15732
    smallest: 1552 largest: 1578 sum: 15667
    smallest: 1545 largest: 1570 sum: 15576
    smallest: 1541 largest: 1567 sum: 15494
    smallest: 1539 largest: 1549 sum: 15415
    smallest: 1537 largest: 1537 sum: 15370

    =cut

    __DATA__
    2124 1827 1702 1710 1658 1637 1676 1630 1602 1607 1579 1558 1575 1550 1548 1540 1537
    2058 1771 1776 1701 1645 1672 1615 1599 1609 1583 1570 1587 1578 1563 1555 1541 1537
    2237 1864 1684 1673 1633 1667 1610 1624 1591 1589 1593 1576 1565 1570 1559 1539 1537
    2056 1845 1748 1693 1679 1670 1663 1638 1590 1564 1564 1573 1571 1545 1542 1543 1537
    2056 1808 1755 1689 1684 1663 1627 1581 1594 1580 1580 1616 1559 1568 1541 1539 1537
    1957 1889 1769 1667 1697 1669 1637 1631 1584 1621 1576 1570 1569 1552 1547 1542 1537
    2181 1991 1706 1715 1654 1686 1605 1622 1594 1597 1576 1564 1567 1560 1567 1540 1537
    2056 1873 1803 1711 1715 1669 1630 1619 1578 1604 1584 1568 1557 1556 1544 1549 1537
    1996 1915 1746 1690 1685 1630 1625 1589 1609 1593 1569 1562 1574 1555 1548 1541 1537
    2181 1875 1810 1728 1638 1625 1628 1621 1614 1587 1576 1558 1552 1557 1543 1541 1537
      note: pp is a bit funky as it prints to STDERR instead of STDOUT,

      pp prints to STDERR if used in a void context. You can avoid that by using print pp ...;


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: sorting columns and getting the largest and smallest values in each column
by jpl (Monk) on May 10, 2011 at 20:19 UTC
    If all you care about is the minimum and maximum value in the column, sorting is overkill. Sorting is an O(NlogN) operation. You can make a single pass (O(N)) through the values in the column, keeping track of the minimum and maximum values.
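
jpl's single-pass suggestion might be sketched as follows. This is only an illustration under stated assumptions: the helper name column_min_max and the layout of the input (a list of array references, one per row) are made up for the example and do not come from any post in the thread. It keeps a running minimum and maximum per column and never sorts anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): walk every row once and
# keep a running minimum and maximum for each column index.
sub column_min_max {
    my @rows = @_;    # each element is an array ref holding one row
    my (@min, @max);
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $v = $row->[$i];
            $min[$i] = $v if !defined $min[$i] || $v < $min[$i];
            $max[$i] = $v if !defined $max[$i] || $v > $max[$i];
        }
    }
    return (\@min, \@max);
}

# First three rows and first four columns of the data posted in the question.
my @rows = (
    [2124, 1827, 1702, 1710],
    [2058, 1771, 1776, 1701],
    [2237, 1864, 1684, 1673],
);

my ($min, $max) = column_min_max(@rows);
for my $i (0 .. $#$min) {
    printf "column %d: smallest %d, largest %d\n",
        $i + 1, $min->[$i], $max->[$i];
}
```

Each value is compared at most twice, so the work grows linearly with the number of entries, and only two numbers per column are held in memory rather than the whole column.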
