PerlMonks  

Bioinformatics, Error: explicit package name

by mlsmit10 (Initiate)
on Jul 17, 2014 at 14:08 UTC
mlsmit10 has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a script that takes a genome and cuts it at two restriction sites. The original script reported how many fragments of each size, flanked by these two cut sites, would be produced. Now I am trying to add a third restriction enzyme and exclude from the final counts any fragment containing its cut site. I've tried using grep to exclude such fragments, but I'm running into one problem I can't figure out.
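The approach described above (cut at two sites, keep fragments that have one cut from each enzyme at their ends, drop any fragment containing a third site) can be sketched without split. This is a minimal illustration, not the poster's script, and it rests on stated assumptions: each cut is placed at the start of a recognition site (real enzymes cut at fixed offsets within the site), a fragment runs from one cut to the next, and the function name double_digest_lengths and the test sequence are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lengths of fragments with a $site1 cut at one end
# and a $site2 cut at the other, skipping any fragment that contains $site3.
# Assumption: a cut sits at the first base of each site occurrence.
sub double_digest_lengths {
    my ($seq, $site1, $site2, $site3) = @_;

    # Collect every cut position, tagged with the enzyme (1 or 2) that made it.
    my @cuts;
    for my $pair ([$site1, 1], [$site2, 2]) {
        my ($site, $enzyme) = @$pair;
        my $pos = index($seq, $site);
        while ($pos != -1) {
            push @cuts, [$pos, $enzyme];
            $pos = index($seq, $site, $pos + 1);
        }
    }
    @cuts = sort { $a->[0] <=> $b->[0] } @cuts;

    # Walk neighbouring cuts; each pair bounds one fragment.
    my @lengths;
    for my $k (0 .. $#cuts - 1) {
        my ($start, $e1) = @{ $cuts[$k] };
        my ($end,   $e2) = @{ $cuts[$k + 1] };
        next if $e1 == $e2 or $end == $start;    # need one cut from each enzyme
        my $fragment = substr($seq, $start, $end - $start);
        next if index($fragment, $site3) != -1;  # exclude third-site fragments
        push @lengths, length $fragment;
    }
    return @lengths;
}

# Made-up sequence using the two sites from the usage comment in the
# question, plus GAATTC as a hypothetical third site.
my $seq = 'GGATCC' . 'AAAA' . 'CCTGCAGG' . 'TTTT' . 'GGATCC' . 'GAATTC' . 'CCTGCAGG';
my @lengths = double_digest_lengths($seq, 'CCTGCAGG', 'GGATCC', 'GAATTC');
print "@lengths\n";   # 10 12
```

The third candidate fragment (GGATCCGAATTC) is dropped because it contains the third site. Collecting positions with index rather than splitting keeps track of which enzyme produced each end, which is the information the flanking test needs.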

This is the portion of the code I have altered (the full script is at the end of this post):
my @third_fragments = grep -v ($rsite3), $second_fragments[$i]
my @final_fragment1 = $seqname."_".$i."_1";
my @final_fragment2 = $seqname."_".$i."_2";
$final_fragments{$final_fragment1} = $third_fragments[0];
$final_fragments{$final_fragment2} = $third_fragments[scalar @third_fragments - 1];

I receive the following error message when I try to run the code:

syntax error at /Users/smithcabinets26/Desktop/RAD/Digester/Improving/MyTripleDigester-3.pl line 68, near "my "
Global symbol "@final_fragment1" requires explicit package name at /Users/smithcabinets26/Desktop/RAD/Digester/Improving/MyTripleDigester-3.pl line 68.
Global symbol "$final_fragment1" requires explicit package name at /Users/smithcabinets26/Desktop/RAD/Digester/Improving/MyTripleDigester-3.pl line 70.
Global symbol "$final_fragment2" requires explicit package name at /Users/smithcabinets26/Desktop/RAD/Digester/Improving/MyTripleDigester-3.pl line 71.
Execution of /Users/smithcabinets26/Desktop/RAD/Digester/Improving/MyTripleDigester-3.pl aborted due to compilation errors.
Here is the complete code:
#! /usr/bin/perl -w

# a script to get fragments of a genome based on restriction enzyme
# retrieves the fragments greater than XX bp and less than XX bp
# whole genome in one gzip file (fasta format)!
# calculates the number of fragments you will get per genome for RADseq
# perl genomeFragmentor_doubleDigest.pl genome.gz restriction_site1 restriction_site2 organism
# perl genomeFragmentor_doubleDigest.pl Anolisgenome.gz CCTGCAGG GGATCC Anolis

use strict;

my %genome = fasta_read_gzip_alt($ARGV[0]);
#my %genome = fasta_read_alt($ARGV[0]);
my $rsite1 = $ARGV[1];
my $rsite2 = $ARGV[2];
my $rsite3 = $ARGV[3];
my $totalCount = 0;   # count of all fragments in the genome
my $totalBP = 0;      # the total number of base pairs in the selected fragments
my $organism = $ARGV[4];
my $radtagfile = $organism."_Radseq_fragments_doubleDigest_".$rsite1."_".$rsite2."_".$rsite3.".fasta";
my $sizesfile = $organism."_Radseq_fragments_doubleDigest_".$rsite1."_".$rsite2."_".$rsite3."_sizes.txt";
open (OUT, ">$radtagfile");
open (SIZE, ">$sizesfile");
print SIZE "length organism\n";

# prepare a summary file
my ($Second, $Minute, $Hour, $Day, $Month, $Year, $WeekDay, $DayOfYear, $IsDST) = localtime(time);
$Month++;
$Year += 1900;
my $date = $Month."_".$Day."_".$Year;
my $summaryfile = $organism."_summary_doubleDigest_".$rsite1."_".$rsite2."_".$rsite3."_".$date.".txt";
open (RESULTS, ">$summaryfile");

my %final_fragments;
my @all_size_fragments;

# start creating the fragments
foreach my $seqname (keys %genome) {
    print "Working on sequence $seqname.\n";
    my @first_fragments = split("$rsite1", $genome{$seqname});   # split the fragments up based on the enzyme motif
    # add the restriction site motif back onto the fragments
    $first_fragments[0] .= $rsite1;
    $first_fragments[scalar @first_fragments - 1] = $rsite1.$first_fragments[scalar @first_fragments - 1];
    for (my $i = 0; $i < scalar @first_fragments; ++$i) {
        if ($i != 0 or $i != (scalar @first_fragments - 1)) {
            $first_fragments[$i] = $rsite1.$first_fragments[$i].$rsite1;
        }
        # now split the fragment using $rsite2
        # repair the first and last fragments to include $rsite2
        # these are the only fragments to contain both restriction sites, so keep them in @final_fragments
        my @second_fragments = split($rsite2, $first_fragments[$i]);
        $second_fragments[0] .= $rsite2;
        $second_fragments[scalar @second_fragments - 1] = $rsite2.$second_fragments[scalar @second_fragments - 1];
        foreach my $fragment (@second_fragments) {
            push(@all_size_fragments, length($fragment));
        }
        my @third_fragments = grep -v ($rsite3), $second_fragments[$i]
        my @final_fragment1 = $seqname."_".$i."_1";
        my @final_fragment2 = $seqname."_".$i."_2";
        $final_fragments{$final_fragment1} = $third_fragments[0];
        $final_fragments{$final_fragment2} = $third_fragments[scalar @third_fragments - 1];
    }
}

# keep a score of how many fragments fall within a particular size range
my $size_651_750 = 0;
my $size_551_650 = 0;
my $size_501_550 = 0;
my $size_401_500 = 0;
my $size_301_400 = 0;
my $size_small = 0;
my $size_large = 0;

foreach my $fragment (keys %final_fragments) {
    # add on $rsite1 to both sides of the fragment
    my $fragmentLength = length($final_fragments{$fragment});
    print OUT ">$fragment", "_", "1\n";
    print OUT substr($final_fragments{$fragment}, 0, 96), "\n";
    print OUT ">$fragment", "_", "2rc\n";
    print OUT revcom(substr($final_fragments{$fragment}, $fragmentLength - 96, 96)), "\n";
    $totalBP += $fragmentLength;
    if ($fragmentLength >= 300 and $fragmentLength <= 400) {
        ++$size_301_400;
    }
    elsif ($fragmentLength > 400 and $fragmentLength <= 500) {
        ++$size_401_500;
    }
    elsif ($fragmentLength > 500 and $fragmentLength <= 550) {
        ++$size_501_550;
    }
    elsif ($fragmentLength > 550 and $fragmentLength <= 650) {
        ++$size_551_650;
    }
    elsif ($fragmentLength > 650 and $fragmentLength <= 750) {
        ++$size_651_750;
    }
    elsif ($fragmentLength < 300) {
        ++$size_small;
    }
    elsif ($fragmentLength > 750) {
        ++$size_large;
    }
}

$totalCount = scalar keys %final_fragments;   # count of all fragments in the genome

print RESULTS "The restriction sites used were:\n";
print RESULTS $ARGV[1], "\n";
print RESULTS $ARGV[2], "\n\n";
print RESULTS $ARGV[3], "\n\n\n";
print RESULTS "There were ", $totalCount, " fragments from the whole genome.\n\n";
print RESULTS "For MiSeq v3 there are ", $size_651_750, " fragments\n";
print RESULTS "For MiSeq v2 there are ", $size_551_650, " fragments\n";
print RESULTS "For HiSeq 2X150 there are ", $size_401_500, " fragments\n";
print RESULTS "For HiSeq 2X100 there are ", $size_301_400, " fragments\n\n";
print RESULTS "There were ", $size_small, " fragments smaller than 100 bp.\n";
print RESULTS "There were ", $size_large, " fragments larger than 800 bp.\n\n";
print RESULTS "There are ", $totalBP, " base pairs in the fragments.\n";
print RESULTS "\nJust some notes:\n";
print RESULTS "3RAD adapters are ~140bp, select below +/- 50bp\n";
print RESULTS "MiSeq v3 2X300 you will want 700bp fragments.\n";
print RESULTS "MiSeq v2 2X250 you will want 600bp fragments.\n";
print RESULTS "HiSeq Fast 2X150 you will want 450bp fragments.\n";
print RESULTS "HiSeq 2X100 you will want 350bp fragments.\n";
close OUT;
close RESULTS;

foreach my $length (@all_size_fragments) {
    print SIZE $length, "\t", $organism, "\n";
}

exit;

sub fasta_read_gzip_alt {
    # reads in a gzip fasta file and parses it into a hash
    # version 1.0
    (my $filename) = @_;   # be sure to include the path
    my %fasta;
    open(FASTA, "gunzip -c $filename |") || die "can't open pipe to $filename";
    my $fastaData;
    my $sequence = '';
    my $name = '';
    while(<FASTA>) {
        $fastaData = $_;
        $fastaData =~ s/\n//gms;
        if ($fastaData =~ />/) {
            if ($sequence) {   # if there is a sequence, then the sequence belongs to the last name
                $fasta{$name} = $sequence;
            }
            # reinitialize everything
            $sequence = '';   # start over!
            $name = $fastaData;
            $name =~ s/>//gms;
        }
        elsif (eof FASTA) {
            $fasta{$name} = $sequence;
        }
        else {
            $sequence .= $fastaData;
        }
    }
    close FASTA;
    return %fasta;
}

sub revcom {
    (my $sequence) = @_;
    $sequence = reverse($sequence);
    $sequence =~ tr/AGCTRYMKSWHBVDNagctrymkswhbvdn/TCGAYRKMSWDVBHNtcgayrkmswdvbhn/;
    return $sequence;
}

sub fasta_read_alt {
    # reads in a fasta file and parses it into a hash
    # version 1.0
    (my $filename) = @_;   # be sure to include the path
    my %fasta;
    open(FASTA, $filename);
    my $fastaData;
    my $sequence = '';
    my $name = '';
    while(<FASTA>) {
        $fastaData = $_;
        $fastaData =~ s/\n//gms;
        if ($fastaData =~ />/) {
            if ($sequence) {   # if there is a sequence, then the sequence belongs to the last name
                $fasta{$name} = $sequence;
            }
            # reinitialize everything
            $sequence = '';   # start over!
            $name = $fastaData;
            $name =~ s/>//gms;
        }
        elsif (eof FASTA) {
            $fasta{$name} = $sequence;
        }
        else {
            $sequence .= $fastaData;
        }
    }
    close FASTA;
    return %fasta;
}

My apologies if the answer is obvious or the question is posed poorly. I'm new to coding, and I haven't been able to solve this by looking at previous posts. Thanks for the help.

Problem solved! Thanks for all the help, everyone. After I fixed the missing semicolon, I experimented with the grep function a bit (which I now see some of you suggested) and got the script up and running. In case anyone else wants to use grep this way, my modified grep line is below; it simply replaces the grep line in the code above. The script now does exactly what I wanted it to do. Again, thanks for the help!

my @third_fragments = grep !/$rsite3/, $second_fragments[$i];
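The grep line above filters a one-element list, $second_fragments[$i], and interpolates $rsite3 into a regular expression. If the intent were instead to drop every entry of @second_fragments that contains the third site (an assumption; the post does not say so), a minimal sketch with made-up data could test for a literal substring with index, or escape the pattern with \Q...\E:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rsite3 = 'GAATTC';   # hypothetical third restriction site
my @second_fragments = ('GGATCCAAAA', 'GGATCCGAATTC', 'CCTGCAGGTTTT');   # made-up fragments

# Keep every fragment that does not contain $rsite3 as a literal substring.
# index() returns -1 when the substring is absent, so no regex is involved.
my @kept = grep { index($_, $rsite3) == -1 } @second_fragments;

# Equivalent regex form; \Q...\E (quotemeta) stops any regex
# metacharacters in $rsite3 from being interpreted as a pattern.
my @kept_re = grep { !/\Q$rsite3\E/ } @second_fragments;

print "@kept\n";   # GGATCCAAAA CCTGCAGGTTTT
```

Plain A/C/G/T sites contain no regex metacharacters, so the unescaped form happens to work; index or \Q simply makes the literal intent explicit.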

Re: Bioinformatics, Error: explicit package name
by toolic (Chancellor) on Jul 17, 2014 at 14:14 UTC
    This line is causing the syntax error:
    my @third_fragments = grep -v ($rsite3), $second_fragments[$i]

    Generally, Perl statements end with a semicolon. That being said, I think you are confusing Perl's built-in grep function with the Unix grep utility. This is a common problem for newcomers. Simply adding the semicolon will not completely fix that line. You need to describe more fully what you are trying to achieve.
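For readers coming from the shell, here is a rough sketch of how two common Unix grep options map onto Perl's built-in grep, using a made-up list of fragments. Perl's grep takes a block (or expression) plus a list and returns the elements for which the block is true; in scalar context it returns how many elements matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up fragments, standing in for a real list of sequences.
my @fragments = ('AAGAATTCAA', 'CCGGTTAA', 'TTGAATTCGG', 'ACGTACGT');

# Unix:  grep -v GAATTC   (keep lines that do NOT match)
# Perl:  negate the match inside grep's block
my @without_site = grep { !/GAATTC/ } @fragments;

# Unix:  grep -c GAATTC   (count matching lines)
# Perl:  grep in scalar context returns the number of matches
my $count_with_site = grep { /GAATTC/ } @fragments;

print "@without_site\n";      # CCGGTTAA ACGTACGT
print "$count_with_site\n";   # 2
```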

Re: Bioinformatics, Error: explicit package name
by kcott (Abbot) on Jul 17, 2014 at 14:15 UTC

    G'day mlsmit10,

    Welcome to the monastery.

    The line:

    my @third_fragments = grep -v ($rsite3), $second_fragments[$i]

    is missing a terminal semi-colon.

    -- Ken

Re: Bioinformatics, Error: explicit package name
by SuicideJunkie (Priest) on Jul 17, 2014 at 14:15 UTC

    Line 65 is missing a semicolon, which makes the code that follows it invalid. Because that invalid code never declares your variables, you get the rest of the errors.

    If an error doesn't seem to make sense for the reported line, always check the lines just before and after it.
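A small sketch of why the reported line can be off by one: Perl typically notices a missing semicolon only when it reaches the next token, which here is the my on the following line. The snippet compiles two made-up statements with a string eval, so the compile failure lands in $@ instead of stopping the script; the exact wording of the message may vary between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two statements; the first is missing its terminating semicolon.
my $code = 'my $x = 1' . "\n" . 'my $y = 2;' . "\n";

# String eval compiles the code at run time; a compile failure
# leaves the error message in $@ rather than killing this script.
eval $code;
print "Perl reported: $@";
```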

Re: Bioinformatics, Error: explicit package name
by Athanasius (Monsignor) on Jul 17, 2014 at 14:23 UTC

    Hello mlsmit10, and welcome to the Monastery!

    The missing semicolon identified by toolic and kcott accounts for the first two error messages. The third arises because in the line:

    $final_fragments{$final_fragment1} = $third_fragments[0];

    the variable $final_fragment1 has not been declared. Likewise, the fourth error is due to the undeclared $final_fragment2 in the following line. In Perl, variables are designated by sigils (in this case, @ for an array and $ for a scalar), and variables with different sigils are different variables.[1] So, e.g., $final_fragment1 is an entirely different variable from @final_fragment1, and unrelated to it.

    I’m not sure what you intended in these two lines. If you wanted the number of elements in each array, you would write:

    $final_fragments{scalar @final_fragment1} = $third_fragments[0];
    $final_fragments{scalar @final_fragment2} = $third_fragments[scalar @third_fragments - 1];

    Update: But on second thought, I think you meant to declare these variables as scalars:

    my $final_fragment1 = $seqname."_".$i."_1";
    my $final_fragment2 = $seqname."_".$i."_2";
    $final_fragments{$final_fragment1} = $third_fragments[0];
    $final_fragments{$final_fragment2} = $third_fragments[scalar @third_fragments - 1];

    Hope that helps,

    [1] See perldata#Variable-names:

    Every variable type has its own namespace, as do several non-variable identifiers. This means that you can, without fear of conflict, use the same name for a scalar variable, an array, or a hash--or, for that matter, for a filehandle, a directory handle, a subroutine name, a format name, or a label. This means that $foo and @foo are two different variables. It also means that $foo[1] is a part of @foo, not a part of $foo. This may seem a bit weird, but that's okay, because it is weird.
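A quick sketch (variable names made up) of the separate namespaces described in that quote: a scalar and an array can share a name without conflict, and $name[1] reads an element of @name, not anything from $name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'scalar';          # $name: a scalar variable
my @name = ('a', 'b', 'c');   # @name: an unrelated array with the same name

print "$name\n";              # scalar
print scalar(@name), "\n";    # 3 (number of elements in @name)
print "$name[1]\n";           # b ($name[1] is an element of @name)
```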

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Bioinformatics, Error: explicit package name
by roboticus (Canon) on Jul 17, 2014 at 15:56 UTC

    mlsmit10:

    You've already gotten good answers for your problems. I saw a bit in your code that I thought I would mention, though. One line says:

    $final_fragments{$final_fragment2} = $third_fragments[scalar @third_fragments - 1];

    You want the last value in the array, but that's a funky way to get it. Since you're subtracting one from @third_fragments, the array is already evaluated in scalar context (where it yields its number of elements), so you don't need scalar at all, and the line could be simplified to:

    $final_fragments{$final_fragment2} = $third_fragments[@third_fragments - 1];

    But perl already has a bit of magic you can use to get the index of the last value in an array: replacing the @ with $# in front of the array name gives you the index of the last value:

    $final_fragments{$final_fragment2} = $third_fragments[$#third_fragments];

    Even more interesting, perl has another bit of magic: Using negative index values you can access values from the end of the array, so -1 is the last element, -2 is the element before the last element. So you could use:

    $final_fragments{$final_fragment2} = $third_fragments[-1];

    Here's a quick example:

    roboticus@sparky:~$ cat t.pl
    #!/usr/bin/perl

    use strict;
    use warnings;

    my @foo = (5, 7, 9, 11, 13);

    print $foo[scalar @foo - 1], "\n";
    print $foo[@foo - 1], "\n";
    print $foo[$#foo], "\n";
    print $foo[-1], "\n";
    print $foo[-2], "\n";
    print $foo[-3], "\n";

    roboticus@sparky:~$ perl t.pl
    13
    13
    13
    13
    11
    9
    roboticus@sparky:~$

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Bioinformatics, Error: explicit package name
by Laurent_R (Parson) on Jul 17, 2014 at 19:12 UTC
    Yes, the missing semi-colon is the cause of the errors displayed on your screen, but the most important issue is that this line:
    my @third_fragments = grep -v ($rsite3), $second_fragments[$i];
    is just plain wrong Perl syntax (even with the added semi-colon). Perl's grep has little to do with the Unix shell grep (at least as far as syntax is concerned). The "-v" flag does not exist in Perl for the grep function. Please try:
    perldoc -f grep
    Just a quick example on how to use grep in Perl:
    $ perl -e 'my @number_larger_than_5 = grep {$_ > 5} 1..10; print " @number_larger_than_5";'
     6 7 8 9 10

Node Type: perlquestion [id://1094053]
Front-paged by Corion