PerlMonks  

How can one join the shortest and longest strings of different text files?

by supriyoch_2008 (Scribe)
on Jul 04, 2013 at 19:04 UTC ( #1042508=perlquestion )
supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I have three text files (1.txt, 2.txt and 3.txt) on my desktop. Each file has five strings separated by commas. I would like to read these files without using <STDIN> and then join the corresponding strings across the files based on length (the two longest and the two shortest). I have written a script (test.pl) that finds the two longest and two shortest strings of each file, but I am at my wit's end as to how to join the corresponding short and long strings across files based on length. I look forward to your suggestions for solving this problem. The text files, the script, its output at the command prompt, and the desired strings are given below:

Here are the three text files:

1.txt is given below:

A ,AA ,AAA ,AAAA ,AAAAA ,

2.txt

T ,TT ,TTT ,TTTT ,TTTTT ,

3.txt

G ,GG ,GGG ,GGGG ,GGGGG ,

Here is the script, test.pl:

#!/usr/bin/perl
use warnings;

@apple=(1 ... 3);
$nm=0;
foreach my $num (@apple) {
    $nm++;
    $output_fle="$nm.txt";
    if (-e $output_fle) {
        open FILE,"$output_fle" or die "Couldn't open file: $!";
        $fle=join(" ",<FILE>);
        close FILE;
        $fle=~ s/\s//g;
        @fle=split(' ',$fle);
        push @all_file,@fle;
    } # End of if LOOP for required files:
} # Last curly brace of Foreach LOOP for uploading all files:

#########################
# Code for each file:  ##
#########################
$file_no=0;
foreach my $each_fle ( @all_file) {
    $file_no++;
    @a=split(',',$each_fle);
    $seq_no=0;
    foreach my $seq (@a) {
        $seq_no++;
        # For each sequence of the file
        $seq=~ s/,//g;
        $seq= uc$seq;
        $seq_len=length($seq);
        # For testing
        print"\n Element $seq_no of File $file_no: $seq Length: $seq_len\n";
        # push length & each seq to an array:
        push @names,$seq;
        push @values,$seq_len;
    } # End of foreach LOOP for each file:

    #######################################################
    # Find two lowest & two highest values of each file with sequences:
    #######################################################
    use 5.010;
    use Data::Dumper;
    use constant IWANT => 2;
    my @data;
    my $pos=1;
    for my $val (@values) {
        my $name=shift @names;
        my $rec=sprintf"\n Length %0.1f; Seq: %s",$val,$name;
        push @data,$rec;
    }
    print"\n\nLength (Small to big) with sequences for File $file_no:\n";
    @data= sort @data;
    for(0 .. IWANT-1) {say $data[$_];}
    print"\n";
    print"\nLength (Big to small) with sequences for File $file_no:\n";
    for (1 .. IWANT) {say $data[-$_];}
    ############################
    # End Max & Min codes here:
    #############################
    @values=(); # To empty the array
    @names=(); # To empty the array
    print"\n######## File $file_no ends ##############\n\n";
} # End of foreach LOOP for all files
exit;
########################################

The output at the command prompt looks like this:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\x>cd desktop

C:\Users\x\Desktop>test.pl

Element 1 of File 1: A Length: 1
Element 2 of File 1: AA Length: 2
Element 3 of File 1: AAA Length: 3
Element 4 of File 1: AAAA Length: 4
Element 5 of File 1: AAAAA Length: 5

Length (Small to big) with sequences for File 1:
Length 1.0; Seq: A
Length 2.0; Seq: AA

Length (Big to small) with sequences for File 1:
Length 5.0; Seq: AAAAA
Length 4.0; Seq: AAAA

######## File 1 ends ##############

Element 1 of File 2: T Length: 1
Element 2 of File 2: TT Length: 2
Element 3 of File 2: TTT Length: 3
Element 4 of File 2: TTTT Length: 4
Element 5 of File 2: TTTTT Length: 5

Length (Small to big) with sequences for File 2:
Length 1.0; Seq: T
Length 2.0; Seq: TT

Length (Big to small) with sequences for File 2:
Length 5.0; Seq: TTTTT
Length 4.0; Seq: TTTT

######## File 2 ends ##############

Element 1 of File 3: G Length: 1
Element 2 of File 3: GG Length: 2
Element 3 of File 3: GGG Length: 3
Element 4 of File 3: GGGG Length: 4
Element 5 of File 3: GGGGG Length: 5

Length (Small to big) with sequences for File 3:
Length 1.0; Seq: G
Length 2.0; Seq: GG

Length (Big to small) with sequences for File 3:
Length 5.0; Seq: GGGGG
Length 4.0; Seq: GGGG

######## File 3 ends ##############

In addition to the above results, I need the following desired strings based on length:

Two shortest strings (small to big):
Seq 1: ATG
seq 2: AATTGG

Two longest strings (big to small):
seq 1: AAAAATTTTTGGGGG
seq 2: AAAATTTTGGGG

Re: How can one join the shortest and longest strings of different text files?
by swampyankee (Parson) on Jul 04, 2013 at 19:46 UTC

    It looks like you're about 90% of the way to your solution: you've found the longest and shortest strings in each file. Now, what I would do (this is guaranteed to be a less-than-optimal solution) is to a) open the files, using the open function (don't forget to close the files when you're through with them), e.g.,

    open(FILE, "<", $this_file) or die "Could not open $this_file because $!\n";

    and, b) scan through the file to find the shortest and longest lines. Store these into hashes:

    $long{$file} = $longest_string_in_this_file;
    $short{$file} = $shortest_string_in_the_same_file;

    Repeat for each file, and do whatever joining you wish. If you want to join the two longest and the two shortest lines across files, use a hash of hashes instead of a hash for the longest and shortest lines.
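
The per-file bookkeeping described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's own code: the helper names `strings_by_length` and `join_extremes` are hypothetical, the file contents are passed in as in-memory strings mirroring the sample data (in a real script they would come from `open` and a read of each file), and it uses a hash of arrays indexed by rank where the reply suggests a hash of hashes. It also assumes every file holds at least as many strings as are requested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one file's contents on commas, strip all
# whitespace, and return the non-empty strings sorted shortest to longest.
sub strings_by_length {
    my ($text) = @_;
    my @strings;
    for my $field (split /,/, $text) {
        (my $clean = $field) =~ s/\s+//g;
        push @strings, $clean if length $clean;
    }
    return sort { length($a) <=> length($b) } @strings;
}

# Hypothetical helper: record the $want shortest and $want longest strings
# of each file, then join the strings of equal rank across files.
# Assumes every file contains at least $want strings.
sub join_extremes {
    my ($want, @contents) = @_;
    my (%short, %long);
    my $file = 0;
    for my $text (@contents) {
        $file++;
        my @sorted = strings_by_length($text);
        $short{$file} = [ @sorted[0 .. $want - 1] ];        # shortest first
        $long{$file}  = [ reverse @sorted[-$want .. -1] ];  # longest first
    }
    my @files = sort { $a <=> $b } keys %short;
    my (@shortest, @longest);
    for my $rank (0 .. $want - 1) {
        push @shortest, join '', map { $short{$_}[$rank] } @files;
        push @longest,  join '', map { $long{$_}[$rank] }  @files;
    }
    return (\@shortest, \@longest);
}

# In-memory stand-ins for the contents of 1.txt, 2.txt and 3.txt.
my @contents = (
    "A ,AA ,AAA ,AAAA ,AAAAA ,",
    "T ,TT ,TTT ,TTTT ,TTTTT ,",
    "G ,GG ,GGG ,GGGG ,GGGGG ,",
);

my ($shortest, $longest) = join_extremes(2, @contents);

print "Two shortest strings (small to big):\n";
printf "Seq %d: %s\n", $_ + 1, $shortest->[$_] for 0 .. $#{$shortest};
print "Two longest strings (big to small):\n";
printf "Seq %d: %s\n", $_ + 1, $longest->[$_] for 0 .. $#{$longest};
```

Sorting with `length($a) <=> length($b)` compares the lengths numerically, so the order stays correct even when strings are ten or more characters long, which a plain string sort of formatted records would not guarantee.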

    This does seem like an odd question, however. It's not homework, is it?



Re: How can one join the shortest and longest strings of different text files?
by kcott (Abbot) on Jul 05, 2013 at 06:16 UTC

    G'day supriyoch_2008,

    Here's a technique that involves using the lengths of the original strings as hash keys.

    $ perl -Mstrict -Mwarnings -E '
        use autodie qw{:all};

        my $ext = ".txt";
        my @nums = 1 .. 3;
        my %result;

        for (@nums) {
            open my $fh, "<", $_ . $ext;
            while (<$fh>) {
                chomp;
                y/,//d;
                next unless length;
                $result{+length} .= $_;
            }
            close $fh;
        }

        my @sorted_keys = sort { $a <=> $b } keys %result;

        say "*** Input Data Strings ***";
        say "Shortest: ", $sorted_keys[0];
        say "Longest: ", $sorted_keys[-1];

        say "*** Output Data Strings ***";
        say $result{$_} for @sorted_keys;

        say "*** Two Shortest ***";
        say $result{$_} for @sorted_keys[0, 1];

        say "*** Two Longest ***";
        say $result{$_} for @sorted_keys[-2, -1];
    '
    *** Input Data Strings ***
    Shortest: 1
    Longest: 5
    *** Output Data Strings ***
    ATG
    AATTGG
    AAATTTGGG
    AAAATTTTGGGG
    AAAAATTTTTGGGGG
    *** Two Shortest ***
    ATG
    AATTGG
    *** Two Longest ***
    AAAATTTTGGGG
    AAAAATTTTTGGGGG

    I'm not sure how much of your other output was actually required or just for debugging; regardless, you should be able to add code for that quite easily.
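
For reference, the same length-as-hash-key idea could be written as a standalone script along these lines. It is a sketch under assumptions rather than a drop-in replacement: the helper name `join_by_length` is hypothetical, each file is read whole and split on commas (so it copes with all five strings sitting on one line, as the sample files are shown in the question), and the demo writes copies of the sample files into a temporary directory so that it runs anywhere. It relies on each length occurring exactly once per file; if a file held two strings of the same length, both would be appended under the same key.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read each named file, split its contents on commas,
# and concatenate strings of equal length under that length as the hash key.
sub join_by_length {
    my (@paths) = @_;
    my %result;
    for my $path (@paths) {
        open my $fh, '<', $path or die "Could not open $path: $!";
        my $text = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        $text = '' unless defined $text;      # an empty file yields undef
        for my $field (split /,/, $text) {
            $field =~ s/\s+//g;               # drop spaces and newlines
            next unless length $field;
            $result{ length $field } .= $field;
        }
    }
    return %result;
}

# Demo data mirroring the sample files from the question, written to a
# temporary directory that is removed when the script exits.
my $dir    = tempdir(CLEANUP => 1);
my %sample = (1 => 'A', 2 => 'T', 3 => 'G');
my @paths;
for my $n (sort keys %sample) {
    my $path = "$dir/$n.txt";
    open my $out, '>', $path or die "Could not write $path: $!";
    print {$out} join(' ,', map { $sample{$n} x $_ } 1 .. 5), " ,\n";
    close $out;
    push @paths, $path;
}

my %result  = join_by_length(@paths);
my @lengths = sort { $a <=> $b } keys %result;

print "Two shortest strings (small to big):\n";
print "  $result{$_}\n" for @lengths[0, 1];
print "Two longest strings (big to small):\n";
print "  $result{$_}\n" for reverse @lengths[-2, -1];
```

To run it against the real files, the demo section would be replaced by something like `my @paths = map { "$_.txt" } 1 .. 3;`, assuming the script is started from the directory that holds them.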

    -- Ken
