
Pattern finding and printing with regex

by kernelpanic@thedisco (Initiate)
on Oct 21, 2015 at 18:16 UTC (#1145573)

kernelpanic@thedisco has asked for the wisdom of the Perl Monks concerning the following question:

I'm pretty new to Perl and I'm taking a metagenomics course. For our homework, we have to use regular expressions to read a FASTA file, print out the sequence IDs, and then state how many sequences were found. I've got the regex to find the right pattern and I can print all the right information, but my prof checks all of our homework with an automated tester, so if our output isn't exactly what he expects, it's wrong. His expected output looks like this:

1: foo
2: bar
3: baz
Found 3 sequences.

I can do that, but I can't quite get my program to not print out the '>' in front of the sequence ID. My output right now looks like this:

1: >foo
2: >bar
3: >baz
Found 3 sequences.

How can I get it to not print out that '>'? Thanks for any help you can give!

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature 'say';

my $i = 0;
my $file = shift @ARGV;
my $pattern;

open my $fh, '<', $file;
while (my $line = <$fh>) {
    chomp $line;
    $pattern = '>';
    if ($line =~ /^$pattern/) {
        $i++;
        print "$i: $line \n";
    }
}
printf "Found %s sequence%s.\n", $i, $i == 1 ? '' : 's';
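Separately from the '>', note that `print "$i: $line \n"` writes a space before each newline, so every line actually ends in `foo ` (with a trailing space) rather than `foo`. If the tester compares output exactly, that alone could make it fail. Below is a minimal sketch of formatting both kinds of output line without the trailing space; the helper names `format_id_line` and `format_summary` are hypothetical and not part of the assignment:

```perl
use strict;
use warnings;

# Hypothetical helper: one numbered ID line, with no space before the newline.
sub format_id_line {
    my ($n, $id) = @_;
    return "$n: $id\n";
}

# Hypothetical helper: the summary line, using the same pluralization
# rule as the original printf ("1 sequence", "3 sequences").
sub format_summary {
    my ($count) = @_;
    return sprintf "Found %d sequence%s.\n", $count, $count == 1 ? '' : 's';
}

print format_id_line(1, 'foo');   # prints "1: foo"
print format_summary(3);          # prints "Found 3 sequences."
```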

Re: Pattern finding and printing with regex
by toolic (Bishop) on Oct 21, 2015 at 18:29 UTC
    One way:
    use warnings;
    use strict;

    my $i = 0;
    my $pattern;
    while (my $line = <DATA>) {
        chomp $line;
        $pattern = '>';
        if ($line =~ s/^$pattern//) {
            $i++;
            print "$i: $line \n";
        }
    }

    __DATA__
    >foo
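toolic's version folds the test and the removal into one step: in boolean context, `s///` returns the number of substitutions it made, or false when nothing matched, so the `if` only fires on header lines and `$line` has already lost its '>' by the time it is printed. A minimal sketch of the same idea applied to a list of lines; the helper name `strip_headers` and the sample sequence lines are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: collect header lines with the leading '>' removed.
# In boolean context s/^>// is true only when it actually removed a '>',
# so sequence lines (which don't start with '>') are skipped.
sub strip_headers {
    my @ids;
    for my $line (@_) {
        my $copy = $line;    # work on a copy so the caller's data is untouched
        chomp $copy;
        push @ids, $copy if $copy =~ s/^>//;
    }
    return @ids;
}

# Hypothetical FASTA-like input: headers followed by sequence lines.
my @lines = (">foo\n", "ACGT\n", ">bar\n", "GGCC\n", ">baz\n", "TTAA\n");
my @ids   = strip_headers(@lines);

my $i = 0;
for my $id (@ids) {
    $i++;
    print "$i: $id\n";
}
```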

    Now it's up to you to figure out how you explain to your professor that someone on the internet did your homework for you.

      Now it's up to you to figure out how you explain to your professor that someone on the internet did your homework for you.

      Your heart is in the right place, but brother, I think you're being too harsh! He didn't ask the monks to do his homework for him; he did what he could, asked for our help only when he got stuck, and shared the code he had already written. Surely that is fair in the Monastery.

      Thanks for your help.
Re: Pattern finding and printing with regex
by jeffa (Bishop) on Oct 21, 2015 at 18:28 UTC

    It helps us more if you provide the data too, but as it stands you should only need to add

    $line =~ s/^$pattern//;
    before you print the matched line.
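jeffa's fix keeps the original match as the test and strips the '>' only inside the `if` block, just before printing. Below is a sketch of the question's loop with that one line added. As assumptions for illustration, it reads from an in-memory string of invented sample data instead of the file named on the command line, it collects the output in `@out` before printing, and it drops the space before `\n` so the lines match the expected output exactly:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real script reads the file named in @ARGV.
my $fasta = ">foo\nACGT\n>bar\nTTGA\n>baz\nCCAT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";

my $i       = 0;
my $pattern = '>';
my @out;
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^$pattern/) {
        $line =~ s/^$pattern//;    # jeffa's one-line fix: drop the '>'
        $i++;
        push @out, "$i: $line\n";
    }
}
close $fh;

push @out, sprintf "Found %s sequence%s.\n", $i, $i == 1 ? '' : 's';
print @out;
```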


      Thanks! That did it.
Re: Pattern finding and printing with regex
by BillKSmith (Monsignor) on Oct 21, 2015 at 21:39 UTC

    A more general solution is to write your regex to find the sequence ID rather than removing the '>' character.

    $pattern = qr/^(\w*)/;
    if ($line =~ /$pattern/) {
        my $id = $1;
        $i++;
        print "$i: $id \n";
    }

    This approach would exclude the optional sequence description, if one were included in the FASTA header line.

    UPDATE: AnomalousMonk is correct. I accidentally omitted the '>' from my regex.
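BillKSmith's capture-the-ID idea, with the '>' included in the pattern, can be sketched as below. Which character class to use is an assumption, since the thread doesn't say what the course's IDs look like: `\w` only matches word characters (letters, digits and underscore in plain ASCII text), so an ID containing '.' or '|' would be cut short, while `\S+` takes everything up to the first whitespace. Using `+` instead of `*` also avoids an empty match. The header line here is invented for illustration:

```perl
use strict;
use warnings;

# Capture the sequence ID directly instead of deleting the '>'.
# \w+ stops at the first non-word character;
# \S+ takes everything up to the first whitespace.
my $word_id  = qr/^>(\w+)/;
my $token_id = qr/^>(\S+)/;

# Hypothetical header: an ID followed by an optional description.
my $header = '>foo.1 some description';

my ($w) = $header =~ $word_id;     # 'foo'   (stops at the '.')
my ($t) = $header =~ $token_id;    # 'foo.1' (stops at the space)

# A sequence line has no leading '>', so the pattern doesn't match it.
my ($none) = 'ACGT' =~ $token_id;  # undef

print "word: $w\ntoken: $t\n";
print defined $none ? "sequence line matched\n" : "sequence line skipped\n";
```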


      But qr/^(\w*)/ won't capture anything; or, more precisely, it will match and capture the empty string. Maybe try something like qr/^>(\w*)/ instead:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $line = '>CATCATCATCAT';
      ;;
      my $pattern = qr/^>(\w*)/;
      if ($line =~ /$pattern/) {
          my $id = $1;
          print qq{got: '$id'};
      }
      "
      got: 'CATCATCATCAT'
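To see why the '>' matters: `^` anchors the match at the start of the line and `\w*` is allowed to match zero characters, so `/^(\w*)/` succeeds on a header line with an empty capture. It also succeeds on every sequence line, capturing the bases, so every line would be counted. A minimal sketch with made-up lines:

```perl
use strict;
use warnings;

my $header   = '>foo';
my $sequence = 'ACGT';

# Without the '>': ^ anchors at position 0 and \w* may match zero
# characters, so the match succeeds on the header with an empty capture...
my ($id_without)  = $header   =~ /^(\w*)/;   # ''
# ...and also succeeds on a sequence line, capturing the bases.
my ($seq_without) = $sequence =~ /^(\w*)/;   # 'ACGT'

# With the '>': the header matches and captures the ID,
# while the sequence line no longer matches at all.
my ($id_with)  = $header   =~ /^>(\w*)/;     # 'foo'
my ($seq_with) = $sequence =~ /^>(\w*)/;     # undef

print "without '>': '$id_without' and '$seq_without'\n";
print "with '>':    '$id_with'\n";
```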


Node Type: perlquestion [id://1145573]
Approved by toolic