Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re-organising entries

by $new_guy (Acolyte)
on Feb 01, 2011 at 11:55 UTC ( #885470=perlquestion: print w/ replies, xml ) Need Help??
$new_guy has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a question about re-organising entries. Currently ,I have files with entries that look like this:

1 SP_85(IS33, qqq), SP_155(IS33eee) spr_111(ISyyy33, qqq), spr_1 +71(IS33eee) 2 SP_83(S3 , jgjg), SP_32(IS33, jhdjdjd) spr_113(Stty3 , jgjg), +spr_1881(IS33, jhdjdjd) 3 SP_78(3jmdsjkdej), SP_66(IShbdhdhd33) spr_115(3jmhhggggdsjkdej +), spr_1551(IShbdhdjjjhd33), spr_88881(Iyt33ff), spr_145411(Iddd3ff) 4 SP_77(3jmdsjkdej), SP_1485(Idhd33ff) spr_116(3jmdhhhhhsjkdej), + spr_17781(Idhdhhtytyt33ff)

How do I re-organise them so that all entries with a similar name will be in the same column

Here is my desired output:

1 SP_85(IS33, qqq) spr_111(ISyyy33, qqq) 1 SP_155(IS33eee) spr_171(IS33eee) 2 SP_83(S3 , jgjg) spr_113(Stty3 , jgjg) 2 SP_32(IS33, jhdjdjd) spr_1881(IS33, jhdjdjd) 3 SP_66(IShbdhdhd33) spr_115(3jmhhggggdsjkdej) 3 SP_66(IShbdhdhd33) spr_1551(IShbdhdjjjhd33) 3 - spr_88881(Iyt33ff) 3 - spr_145411(Iddd3ff) 4 SP_77(3jmdsjkdej) spr_116(3jmdhhhhhsjkdej) 4 SP_1485(Idhd33ff) spr_17781(Idhdhhtytyt33ff)

Please bear in mind that I have hundreds of columns and thousands of rows

The code i have tried implementing so far:

my $columnFile = q{re-organized.txt}; open my $columnFH, q{<}, $columnFile or die qq{open: < $columnFile: $!\n}; my @results; while ( <$columnFH> ) { my @cols = split; foreach my $idx ( 0 .. $#tests ) { foreach my $subidx ( 0 .. $#{ $tests[ $idx ] } ) { my @posns = split m{,}, $tests[ $idx ]->[ $subidx ]; $results[ $idx ]->[ $subidx ] ++ if scalar @posns == grep { q{z} eq $cols[ $_ ] } @posns; } } } close $columnFH or die qq{close: < $columnFile: $!\n};

Thanks,

$new_guy

Comment on Re-organising entries
Select or Download Code
Re: Re-organising entries
by moritz (Cardinal) on Feb 01, 2011 at 12:09 UTC

    Read the file one line at a time. For each line, split it into separate records, rearrange the records, and print them out.

    Which part exactly are you having problems with? What have you tried so far?

      Hi I have updated my post and I show what i have tried so far. However, it prints each entry into its own column and I end up with only one very long row!
Re: Re-organising entries
by Limbic~Region (Chancellor) on Feb 01, 2011 at 17:02 UTC
    $new_guy,
    This appears to do what you want but your requirements are not very clear so I had to guess.
    #!/usr/bin/perl use strict; use warnings; my $file = $ARGV[0] or die "Usage: $0 <input>"; open(my $fh, '<', $file) or die "Unable to open '$file' for reading: $ +!"; while (<$fh>) { chomp; my ($num, @data) = split m|,?\s+(?=sp)|i; die "Need rules for handling odd elements" if @data % 2; for (my $idx = 0; $idx < @data - 3; $idx += 4) { print join("\t", $num, @data[$idx, $idx + 2]), "\n"; print join("\t", $num, @data[$idx + 1, $idx + 3]), "\n"; } if (@data % 4) { print join("\t", $num, "-", $data[-2]), "\n"; print join("\t", $num, "-", $data[-1]), "\n"; } }

    Cheers - L~R

Re: Re-organising entries
by moritz (Cardinal) on Feb 01, 2011 at 17:04 UTC
    Since some of your records contain spaces, a simple call to split with no regex doesn'T do any good. Also you need to special-case the number in the first column, since you want to repeat it.

    Here's my approach (reading from DATA instead of a file handle for convenience):

    use strict; use warnings; while (<DATA>) { chomp; my ($id, @records) = split /\t|(?<=\)),\s+/, $_; my (@left, @right); for my $r (@records) { if ($r =~ /^SP_/) { push @left, $r; } else { push @right, $r; } } while (@left || @right) { print $id, "\t", (shift(@left) || ' - '), ', ', (shift(@right) || ' - '), "\n"; } } __DATA__ 1 SP_85(IS33, qqq), SP_155(IS33eee) spr_111(ISyyy33, qqq), spr_1 +71(IS33eee) 2 SP_83(S3 , jgjg), SP_32(IS33, jhdjdjd) spr_113(Stty3 , jgjg), +spr_1881(IS33, jhdjdjd) 3 SP_78(3jmdsjkdej), SP_66(IShbdhdhd33) spr_115(3jmhhggggdsjkdej +), spr_1551(IShbdhdjjjhd33), spr_88881(Iyt33ff), spr_145411(Iddd3ff) 4 SP_77(3jmdsjkdej), SP_1485(Idhd33ff) spr_116(3jmdhhhhhsjkdej), + spr_17781(Idhdhhtytyt33ff)

      Hi Moritz and Limbic_Region,

      Am afraid your scripts don't work very well. They only generate 2 coulumns and the contents/entries in the columns don't have the same prefix. If you look at my scratchpad I have put the first seven rows (of which there are about 8000 rows in my .txt file). Note that each row starts with Cluster(\d+) i.e. the word "Cluster" followed by a number eg 1,2,3 etc.

      The code I have so far come up with is:

      #!usr/bin/perl -w use warnings; use strict; use List::Util 'max'; # Read in the file my $FILENAME3 = "clusters3.txt"; open(DATA, $FILENAME3); #create arrays and hashes to store stuff my (%data, %all, @keys); while (<DATA>) { # avoid \n on last field chomp; #split the data into chunks my @chunks = split; #create keys for the chunks my $key = shift @chunks; #store the keys in an array unless they already exist push @keys, $key unless exists $data{$key}; foreach my $chunk (@chunks) { #return references using hashes $data{$key}{$chunk}++; #add all chunks to the hash '%all' $all{$chunk} = 1; } #now make a file for the ouput my $outputfile = "new_cluster.txt"; if (! open(POS, ">>$outputfile") ) { print "Cannot open file \"$outputfile\" to write to!!\n\n" +; exit; } #sort the fields/columns keys and save them as an array #my @fields = sort {$a <=> $b} keys %all; my @fields = sort {lc($a) cmp lc($b)} keys %all; ##<--this sorting did +n't work #find the longest entry in an array #my $longest = max map {length} @fields; my $longest = max map {scalar grep $_=~ /\(\d+\)\_\(\d+\)\_\(\d+\)\_/, + @fields} @fields; #the line I think has a problem! #organise the data foreach my $key (@keys) { while (keys %{$data{$key}}) { print POS $key, " "; foreach my $field (@fields) { if ($data{$key}{$field}){ printf POS "%${longest}s ", $field; delete $data{$key}{$field} unless --$data{$key}{$field +}; } else { printf POS "%${longest}s ", "-"; } } print POS"\n"; }}}

      In the code cluster3.txt is my .txt file But it spits out rubbish

      Is it possible to have for each entry in each row arranged tidyly in columns

      Generally the prefixes are separated by an underscore for this beginning with letter, except 'spr, HMPREF, and pseudoSPN23F(which is also exactly similar or should be in the same column as SPN23F)'

      For this beginning with digits/numbers. The prefix is from the beginning to the last underscore e.g. 3850_1_7_ and 3850_1_8_ .

      Thanks

      $new_guy

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://885470]
Approved by moritz
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (11)
As of 2014-07-31 08:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (245 votes), past polls