PerlMonks  

Re: Count the sequence length of each entry in the file

by kcott (Archbishop)
on Oct 01, 2020 at 23:25 UTC


in reply to Count the sequence length of each entry in the file

G'day davi54,

"I want to count the number of alphabets in each entry, length of each entry, etc."

That's confusing as you combine the counts for all entries in @count. Other variables, outside the while loop, appear misplaced as they look like they're associated with individual entries: I'd expect them to be inside the while loop.

The following code collects all the data that I believe you want. You can combine values for all entries if necessary.

#!/usr/bin/env perl

use strict;
use warnings;

my %results;

{
    local $/ = '';

    while (my $record = <DATA>) {
        $record =~ s/\A>(.+?)$//m;
        my $entry = $1;
        $record =~ s/\s//gm;
        $results{$entry}{len} = length $record;

        for (0 .. $results{$entry}{len} - 1) {
            ++$results{$entry}{count}{substr $record, $_, 1};
        }
    }
}

use Data::Dump;
dd \%results;

__DATA__
>sp_0005
VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA
DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS

>sp_0017
HVQLVESGGGSVQAGGSLRLTCAASGFTFSNYYMSWVRQAPGKGLEWVSSIYSVGSNGYY
ADSVKGRSTISRDNAKNTLYLQMNSLKPEDTAVYYCAAEPGGSWWDAYSYWGQGTQVTVS S

Extract of output:

{
  sp_0005 => {
    count => {
      A => 9,
      C => 2,
      ...,
      W => 1,
      Y => 5,
    },
    len => 110,
  },
  sp_0017 => {
    count => {
      A => 10,
      C => 2,
      ...,
      W => 5,
      Y => 10,
    },
    len => 121,
  },
}
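
Combining the values for all entries could look something like the sketch below. It assumes a %results hash shaped like the output above; only the counts visible in the extract are filled in, so the totals are partial, and the names %total_count and $total_len are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Partial data: only the values visible in the output extract.
# The elided counts are deliberately left out.
my %results = (
    sp_0005 => { len => 110, count => { A => 9,  C => 2, W => 1, Y => 5  } },
    sp_0017 => { len => 121, count => { A => 10, C => 2, W => 5, Y => 10 } },
);

# Hypothetical accumulators for the combined figures.
my %total_count;
my $total_len = 0;

for my $entry (keys %results) {
    $total_len += $results{$entry}{len};

    # Add this entry's per-character counts to the running totals.
    my $count = $results{$entry}{count};
    $total_count{$_} += $count->{$_} for keys %$count;
}

print "Total length: $total_len\n";
print "$_: $total_count{$_}\n" for sort keys %total_count;
```

Keeping the per-entry structure and deriving totals from it afterwards means neither view of the data is lost.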

Notes:

  • You posted sample data but did so using paragraph text. Please use <code> tags for this sort of thing. I've made a best guess at what the original looked like: I was unsure about the embedded space at the end of the second entry (cf. 'QVTVSS' in 1st entry and 'QVTVS S' in 2nd entry) so left it as written.
  • I've truncated the names to avoid excessive text wrapping. They're still unique but now just look like sp_NNNN.
  • When you modify special variables, such as $/ here, you should localise the change in the smallest scope possible (see my code for an example of this).
  • I've used substr instead of a regex to get the counts. Perl's string-handling functions are typically faster than regexes that achieve the same outcome.
  • Do note that this code is only intended to show a technique. It does not contain any I/O or formatted reporting.
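
For the I/O and reporting that the code leaves out, one possible shape is sketched here. The sub names parse_records and format_report, and the report layout, are assumptions for illustration only. The sample is read through an in-memory filehandle so the sketch is self-contained; a real script would open the input file instead. The trailing S is placed on its own line, which has no effect on the result because all whitespace is removed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: parse FASTA-style text
# from any filehandle into { name => { len => N, count => { char => N } } }.
sub parse_records {
    my ($fh) = @_;
    my %results;

    # Paragraph mode applies only inside this sub; $/ is restored on return.
    local $/ = '';

    while (my $record = <$fh>) {
        # Remove the ">name" header line and remember the name.
        next unless $record =~ s/\A>(.+?)$//m;
        my $entry = $1;

        $record =~ s/\s//g;    # drop newlines and any stray spaces
        $results{$entry}{len} = length $record;
        ++$results{$entry}{count}{ substr $record, $_, 1 }
            for 0 .. length($record) - 1;
    }

    return \%results;
}

# Hypothetical report format: one line per entry, counts sorted by character.
sub format_report {
    my ($results) = @_;
    my @lines;
    for my $entry (sort keys %$results) {
        my $count  = $results->{$entry}{count} || {};
        my $counts = join ' ', map { $_ . '=' . $count->{$_} } sort keys %$count;
        push @lines, sprintf '%-8s len=%-4d %s',
            $entry, $results->{$entry}{len}, $counts;
    }
    return @lines;
}

my $sample = <<'END_SAMPLE';
>sp_0005
VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA
DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS

>sp_0017
HVQLVESGGGSVQAGGSLRLTCAASGFTFSNYYMSWVRQAPGKGLEWVSSIYSVGSNGYY
ADSVKGRSTISRDNAKNTLYLQMNSLKPEDTAVYYCAAEPGGSWWDAYSYWGQGTQVTVS
S
END_SAMPLE

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $results = parse_records($fh);
close $fh;

print "$_\n" for format_report($results);
```

Putting local $/ inside the sub keeps the change to the smallest scope that needs it, so code elsewhere still reads line by line.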

— Ken

Re^2: Count the sequence length of each entry in the file
by davi54 (Sexton) on Oct 02, 2020 at 16:54 UTC
    Thank you for your help. Actually, the input file is formatted with 60 characters per line, after which the sequence wraps to a new line. That is why the second entry ends with a single S on its own line.

    On a different note, for the first entry, the sequence length value you get in your output (110) is one less than the actual sequence length which is 111. However, my output gives me a sequence length of 115, which is even worse. Do you know where the error might be?

      "for the first entry, the sequence length value you get in your output (110) is one less than the actual sequence length which is 111."
      $ perl -E 'say length "VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA"'
      59
      $ perl -E 'say length "DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS"'
      51
      $ perl -E 'say 59+51'
      110

      If you add the newline between those two strings you'll get 111 for \n or 112 for \r\n. There's also whitespace after those strings which will further increase the length of the line. As I already stated, you posted your data as paragraph text: I can't tell what the original data was.

      I removed all whitespace in my code:

      $record =~ s/\s//gm;

      The correct length, after removing whitespace, is 110.
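
      The effect of line endings on the length can be checked with a short sketch like the following. The variable names are made up for illustration, and the trailing space in the last case is only an assumed example of stray whitespace, since the original file contents are unknown.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The two sequence lines of the first entry, as posted.
my $line1 = 'VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA';
my $line2 = 'DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS';

my $joined = length($line1 . $line2);        # no separator: 110
my $lf     = length("$line1\n$line2");       # Unix newline: 111
my $crlf   = length("$line1\r\n$line2");     # Windows line ending: 112

# Assumed example of messy input: CRLF plus trailing whitespace.
(my $stripped = "$line1\r\n$line2 \n") =~ s/\s//g;

printf "%-14s %d\n", 'no separator:', $joined;
printf "%-14s %d\n", 'with LF:',      $lf;
printf "%-14s %d\n", 'with CRLF:',    $crlf;
printf "%-14s %d\n", 'stripped:',     length $stripped;
```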

      — Ken
