How can one join the shortest and longest strings of different text files?

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I have three text files (1.txt, 2.txt and 3.txt) on my desktop. Each file has 5 strings separated by commas. I would like to read these files without using <STDIN> and then join the corresponding strings of the files based on length (the two longest and the two shortest). I have written a script (test.pl) that finds the two longest and two shortest strings of each file, but I am at my wit's end as to how to join the corresponding short and long strings across files. I look forward to your suggestions. The text files, the script, the script's output and the desired strings are given below:

Here are the three text files:

1.txt is given below:

A
,AA
,AAA
,AAAA
,AAAAA
,

2.txt

T
,TT
,TTT
,TTTT
,TTTTT
,

3.txt

G
,GG
,GGG
,GGGG
,GGGGG
,

Here is the script (test.pl):

 
#!/usr/bin/perl  
use warnings;  
 
 @apple=(1 ... 3); 
     $nm=0; 
foreach my $num (@apple) {$nm++; 
  $output_fle="$nm.txt";
 
  if (-e $output_fle) { 
  open FILE,"$output_fle" or die "Couldn't open file: $!"; 
    $fle=join(" ",<FILE>);
    close FILE;  
  $fle=~ s/\s//g;  
  @fle=split(' ',$fle);     
  push  @all_file,@fle; 

  } # End of if LOOP for required files:  

}  # Last curly brace of Foreach LOOP for uploading all files: 

#########################
# Code for each file:  ##
#########################

    $file_no=0;
foreach my $each_fle ( @all_file) { $file_no++;
 
  @a=split(',',$each_fle);  
  $seq_no=0; 

foreach my $seq (@a) { $seq_no++; # For each sequence of the file
    $seq=~ s/,//g;  
    $seq= uc$seq;   
    $seq_len=length($seq); # For testing  

print"\n Element $seq_no of File $file_no: $seq
   Length: $seq_len\n";     

# push length & each seq to an array:     
     push  @names,$seq;  
     push  @values,$seq_len;  
    
    } # End of foreach LOOP for each file: 

#######################################################
# Find two lowest & two highest values of each file with sequences: 
#######################################################
use 5.010; 
use Data::Dumper; 
use constant IWANT => 2; 
my @data; 
my $pos=1; 
  
 for my $val (@values) { 
   my $name=shift @names; 
   my $rec=sprintf"\n Length %0.1f; Seq: %s",$val,$name; 
   push @data,$rec;} 

print"\n\nLength (Small to big) with sequences for File $file_no:\n";
 @data= sort @data; 
 for(0 .. IWANT-1) {say $data[$_];} 

print"\n";  
 print"\nLength (Big to small) with sequences for File $file_no:\n"; 
 for (1 .. IWANT) {say $data[-$_];}        

############################
# End Max & Min codes here: 
#############################

 @values=(); # To empty the array
 @names=();  # To empty the array
print"\n######## File $file_no ends ##############\n\n"; 

} # End of foreach LOOP for all files
 
exit; 
########################################

The output at the command prompt is:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\x>cd desktop

C:\Users\x\Desktop>test.pl

 Element 1 of File 1: A
   Length: 1

 Element 2 of File 1: AA
   Length: 2

 Element 3 of File 1: AAA
   Length: 3

 Element 4 of File 1: AAAA
   Length: 4

 Element 5 of File 1: AAAAA
   Length: 5


Length (Small to big) with sequences for File 1:

 Length 1.0; Seq: A

 Length 2.0; Seq: AA


Length (Big to small) with sequences for File 1:

 Length 5.0; Seq: AAAAA

 Length 4.0; Seq: AAAA

######## File 1 ends ##############


 Element 1 of File 2: T
   Length: 1

 Element 2 of File 2: TT
   Length: 2

 Element 3 of File 2: TTT
   Length: 3

 Element 4 of File 2: TTTT
   Length: 4

 Element 5 of File 2: TTTTT
   Length: 5


Length (Small to big) with sequences for File 2:

 Length 1.0; Seq: T

 Length 2.0; Seq: TT


Length (Big to small) with sequences for File 2:

 Length 5.0; Seq: TTTTT

 Length 4.0; Seq: TTTT

######## File 2 ends ##############


 Element 1 of File 3: G
   Length: 1

 Element 2 of File 3: GG
   Length: 2

 Element 3 of File 3: GGG
   Length: 3

 Element 4 of File 3: GGGG
   Length: 4

 Element 5 of File 3: GGGGG
   Length: 5


Length (Small to big) with sequences for File 3:

 Length 1.0; Seq: G

 Length 2.0; Seq: GG


Length (Big to small) with sequences for File 3:

 Length 5.0; Seq: GGGGG

 Length 4.0; Seq: GGGG

######## File 3 ends ##############

In addition to the above results, I need the following desired strings based on length:

Two shortest strings (small to big): 
 Seq 1: ATG
 seq 2: AATTGG 

Two longest strings (big to small):
 seq 1: AAAAATTTTTGGGGG 
 seq 2: AAAATTTTGGGG


Replies are listed 'Best First'.
Re: How can one join the shortest and longest strings of different text files? by swampyankee (Parson) on Jul 04, 2013 at 19:46 UTC
It looks like you're about 90% of the way to your solution: you've found the longest and shortest strings in each file. Now, what I would do (this is guaranteed to be a less-than-optimal solution) is:

a) Open the files, using the open function (don't forget to close the files when you're through with them), e.g., `open(FILE, "<", $this_file) or die "Could not open $this_file because $!\n";`

b) Scan through the file to find the shortest and longest lines. Store these into hashes: `$long{$file} = $longest_string_in_this_file; $short{$file} = $shortest_string_in_the_same_file;`

Repeat for each file, and do whatever joining you wish. If you want to join the two longest and the two shortest lines across files, use a hash of hashes instead of a hash for the longest and shortest lines.

This does seem like an odd question, however. It's not homework, is it?

-- emc
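The hash-of-hashes idea above could be sketched roughly as follows. This is only an illustration, not swampyankee's actual code: the helper names `strings_by_length` and `extremes_per_file` are made up here, the file contents are held in memory instead of being read with open, and every file is assumed to contain at least two strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one file's comma-separated
# content into uppercase strings, ordered from shortest to longest.
sub strings_by_length {
    my ($content) = @_;
    $content =~ s/\s+//g;                                # drop newlines and blanks
    my @strings = grep { length } split /,/, $content;   # skip empty fields
    return sort { length($a) <=> length($b) } map { uc $_ } @strings;
}

# Hypothetical helper: build a hash of hashes holding, for each file, the two
# shortest and the two longest strings (assumes at least two strings per file).
sub extremes_per_file {
    my (%content_of) = @_;
    my %extremes;
    for my $file (keys %content_of) {
        my @sorted = strings_by_length($content_of{$file});
        $extremes{$file}{short} = [ @sorted[0, 1] ];     # small to big
        $extremes{$file}{long}  = [ @sorted[-1, -2] ];   # big to small
    }
    return %extremes;
}

# In-memory stand-ins for 1.txt, 2.txt and 3.txt from the question
my %content_of = (
    '1.txt' => "A\n,AA\n,AAA\n,AAAA\n,AAAAA\n,",
    '2.txt' => "T\n,TT\n,TTT\n,TTTT\n,TTTTT\n,",
    '3.txt' => "G\n,GG\n,GGG\n,GGGG\n,GGGGG\n,",
);
my %extremes = extremes_per_file(%content_of);

# Join the strings of equal rank across the files, taken in file-name order
for my $rank (0, 1) {
    my $n = $rank + 1;
    print "Shortest $n: ", join('', map { $extremes{$_}{short}[$rank] } sort keys %extremes), "\n";
    print "Longest  $n: ", join('', map { $extremes{$_}{long}[$rank] }  sort keys %extremes), "\n";
}
```

To work from the real files, the `%content_of` values would be filled by slurping each file through a filehandle, as in step a) above.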
Re: How can one join the shortest and longest strings of different text files? by kcott (Archbishop) on Jul 05, 2013 at 06:16 UTC
G'day supriyoch_2008,

Here's a technique that involves using the lengths of the original strings as hash keys.

$ perl -Mstrict -Mwarnings -E '
    use autodie qw{:all};

    my $ext = ".txt";
    my @nums = 1 .. 3;
    my %result;

    for (@nums) {
        open my $fh, "<", $_ . $ext;
        while (<$fh>) {
            chomp;
            y/,//d;
            next unless length;
            $result{+length} .= $_;
        }
        close $fh;
    }

    my @sorted_keys = sort { $a <=> $b } keys %result;

    say "* Input Data Strings *";
    say "Shortest: ", $sorted_keys[0];
    say "Longest: ", $sorted_keys[-1];

    say "* Output Data Strings *";
    say $result{$_} for @sorted_keys;

    say "* Two Shortest *";
    say $result{$_} for @sorted_keys[0, 1];

    say "* Two Longest *";
    say $result{$_} for @sorted_keys[-2, -1];
'
* Input Data Strings *
Shortest: 1
Longest: 5
* Output Data Strings *
ATG
AATTGG
AAATTTGGG
AAAATTTTGGGG
AAAAATTTTTGGGGG
* Two Shortest *
ATG
AATTGG
* Two Longest *
AAAATTTTGGGG
AAAAATTTTTGGGGG

I'm not sure how much of your other output was actually required or just for debugging; regardless, you should be able to add code for that quite easily.

-- Ken
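Keying the hash on length works for the sample data because every file holds strings of lengths 1 to 5. If the files held strings of differing lengths, one possible variant is to pair strings by their rank within each file instead. The sketch below assumes that reading of the problem; `join_by_rank` is a hypothetical helper, not part of Ken's code, and the sample strings are invented for illustration. It also compares lengths numerically, which keeps a length of 10 after a length of 9 where a plain string sort would not.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): join the strings that hold the
# same length rank in every file. Each argument is an array reference
# containing one file's strings.
sub join_by_rank {
    my @files = @_;

    # Order each file's strings numerically by length, shortest first
    my @sorted = map { [ sort { length($a) <=> length($b) } @$_ ] } @files;

    # The file with the fewest strings limits how many ranks can be joined
    my ($ranks) = sort { $a <=> $b } map { scalar @$_ } @sorted;
    $ranks //= 0;    # no files at all means nothing to join

    my @joined;
    for my $rank (0 .. $ranks - 1) {
        push @joined, join '', map { $_->[$rank] } @sorted;
    }
    return @joined;
}

# Invented sample data in which the files do not share the same lengths
my @joined = join_by_rank(
    [qw(A AAA AAAAAAAAAA)],
    [qw(TT TTTT TTTTT)],
    [qw(G GGGGGG GGG)],
);

print "Two shortest (small to big): @joined[0, 1]\n";
print "Two longest (big to small): @joined[-1, -2]\n";
```

With the three files from the question this should give the same joined strings as the length-keyed hash, since rank and length coincide there.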

