PerlMonks  

matching problem

by Balaton (Initiate)
on Feb 24, 2013 at 18:08 UTC (id://1020413)

Balaton has asked for the wisdom of the Perl Monks concerning the following question:

Could anybody please help me understand why I get this error message?

    Use of uninitialized value $_[0] in pattern match (m//) at ModuleMatching.pm line 11, <> line

This is the line I am matching:

    LOCUS       AB007147                 460 bp    DNA     linear   PRI 08-FEB-2002

It worked without the subroutine. Thanks a lot!

Here is the code:
use lib "/d/user2/aszend01/BCII/240213";
use ModuleMatching;

my $locus_acc_no;

while (my $line = <>) {
    # locus_acc_no
    if ($line =~ ModuleMatching::MatchLAC($locus_acc_no)) {
        unless (ModuleMatching::MatchLAC($locus_acc_no)) {
            print "something is wrong with the matching", "\n";
            next;
        }
    }
    print $locus_acc_no, "\n";
}

# This module will be called from the myparserM.pl. It will pull out the locus_acc_no.
package ModuleMatching;
use strict;

# Here is the matching code for the locus_acc_no (LAC)
sub MatchLAC($) {
    return $_[0] =~ /^LOCUS\s{7}(\w+)\s+\w+/;
}
1;
And here is the file I want to match against:

LOCUS       AB007147                 460 bp    DNA     linear   PRI 08-FEB-2002
DEFINITION  Homo sapiens gene for ribosomal protein S2, partial cds.
ACCESSION   AB007147
VERSION     AB007147.1  GI:3077742
KEYWORDS    ribosomal protein S2.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Kenmochi,N., Kawaguchi,T., Rozen,S., Davis,E., Goodman,N.,
            Hudson,T.J., Tanaka,T. and Page,D.C.
  TITLE     A map of 75 human ribosomal protein genes
  JOURNAL   Genome Res. 8 (5), 509-523 (1998)
   PUBMED   9582194
REFERENCE   2  (bases 1 to 460)
  AUTHORS   Kenmochi,N.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-SEP-1997) Naoya Kenmochi, Miyazaki Medical College,
            Central Research Laboratories; 5200 Kihara, Kiyotake, Miyazaki
            889-1692, Japan (E-mail:kenmochi@post.miyazaki-med.ac.jp,
            Tel:81-985-85-9665, Fax:81-985-85-1514)
FEATURES             Location/Qualifiers
     source          1..460
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="16"
                     /map="16p13.3"
     CDS             join(<1..73,300..>460)
                     /codon_start=1
                     /product="ribosomal protein S2"
                     /protein_id="BAA25813.1"
                     /db_xref="GI:3088335"
                     /translation="LLMMAGIDDCYTSARGCTATLGNFAKATFDAISKTYSYLTPDLW
                     KETVFTKSPYQEFTDHLVKTHTRVSVQRTQAPAV"
     exon            <1..73
                     /product="ribosomal protein S2"
     intron          74..299
     exon            300..>460
                     /product="ribosomal protein S2"
ORIGIN
        1 ctgctcatga tggctggtat cgatgactgc tacacctcag cccggggctg cactgccacc
       61 ctgggcaact tcggtaggtg gtccacacat ggggcatagc catggtctct cagctccgct
      121 taaccacacg ggtccagtgt gtgcttggcg tgttttcagg gaggcagaga aaggctctcc
      181 taatgnacga cagacccgcc cagaatggcc tctctgttcc taggagtgcg acaatttttg
      241 ggttggggga cttgcctcaa gcacaccact gaccctcctg gggttctttt gttttgcagc
      301 caaggccacc tttgatgcca tttctaagac ctacagctac ctgacccccg acctctggaa
      361 ggagactgta ttcaccaagt ctccctatca ggagttcact gaccacctcg tcaagaccca
      421 caccagagtc tccgtgcagc ggactcaggc tccagctgtg
//

Re: matching problem
by linuxer (Curate) on Feb 24, 2013 at 20:10 UTC
    Apart from the bad format of the data section:

    You declare $locus_acc_no, but you don't initialize it; therefore it stays undefined.

    In your while loop you pass this undefined value as the argument to ModuleMatching::MatchLAC(), which tries a pattern match against that value.

    Since $locus_acc_no is still undefined at that point, the warning is correct: you are using an undefined value in a pattern match.

    So, what exactly are you wondering about?
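A minimal sketch of the fix this points toward, assuming the goal is to pull the accession number off the LOCUS line: pass `$line` (not the still-undefined `$locus_acc_no`) to `MatchLAC`, and call it in list context so the match hands back its capture group. The package is inlined into one file only so the sketch is self-contained, and `extract_accessions` is a hypothetical helper; the regex is the OP's, unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inlined so the sketch runs as one file; in the original this lives
# in ModuleMatching.pm.
package ModuleMatching;

# Takes the input line. In list context this returns the captured
# accession, or the empty list when the line is not a LOCUS line.
sub MatchLAC {
    my ($line) = @_;
    return $line =~ /^LOCUS\s{7}(\w+)\s+\w+/;
}

package main;

# Hypothetical helper: collects accessions from a list of lines.
sub extract_accessions {
    my @found;
    for my $line (@_) {
        # A list assignment in boolean context is true only if the
        # match returned something, i.e. the capture succeeded.
        if (my ($locus_acc_no) = ModuleMatching::MatchLAC($line)) {
            push @found, $locus_acc_no;
        }
    }
    return @found;
}

my @accessions = extract_accessions(
    "LOCUS       AB007147                 460 bp    DNA     linear   PRI 08-FEB-2002\n",
    "DEFINITION  Homo sapiens gene for ribosomal protein S2, partial cds.\n",
);
print "$_\n" for @accessions;    # prints AB007147
```

In the real script, the body of that `for` loop would sit inside `while (my $line = <>) { ... }`, and the accession is only printed when a LOCUS line actually matched.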

      Thanks, linuxer. Here are my changes: now the program finishes and I get no error message, but I only have the locus_acc_no = printed out several times and it has no value. Thanks, B
      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use lib "/d/user2/aszend01/BCII/240213";
      use ModuleMatching;

      my $locus_acc_no = "";
      my $key = 1;
      print '$key = ', $key, "\n";

      # test for not allowing more than one file to work on
      unless (1 == scalar(@ARGV)) {
          die "too many files";
      }

      # test if the infile cannot be opened
      open(IN, $ARGV[0]) or die "unable to open input file $ARGV[0]\n";

      while (my $line = <IN>) {
          # give the number of records a new key number started with 1, incremented
          # by 1 in each new record divided with "//" signs
          if ($line =~ /^\/\//) {
              $key++;
              print '$key = ', $key, "\n";
              $locus_acc_no = "";
          }
          # locus_acc_no
          if ($line =~ ModuleMatching::MatchLAC($locus_acc_no)) {
              print "locus_acc_no ", $locus_acc_no;
              # print $2, "\n";
              # $hLocus_Acc{$key} = $1;
              # print "%hLocus_Acc{$key} = $1\n";
              # $hLength{$key} = $2;
              # print "%hLength{$key} = $2\n";
          }
      }
        I only have the locus_acc_no = printed out several times and it has no value.

        Use a text editor, and do a search on $locus_acc_no. You will see it occurs 4 times in the code:

        1. my $locus_acc_no = "";
          Here it is declared and initialised to the empty string.

        2. $locus_acc_no = "";
          Here (within an if block) it is conditionally re-set to the empty string.

        3. if ($line =~ ModuleMatching::MatchLAC($locus_acc_no)) {
          Here it is passed into[1] the function ModuleMatching::MatchLAC.

        4. print "locus_acc_no ", $locus_acc_no;
          Here its value is printed.

        So it is never set to anything but the empty string, which is what gets printed! This is essentially the same point already made by linuxer, above. You fixed the warning but failed to address the underlying logic problem.

        I don’t know anything about the ModuleMatching module, which I can’t find on , but I think it’s highly unlikely that this line:

        if ($line =~ ModuleMatching::MatchLAC($locus_acc_no)) {

        can be correct. But without knowing what sub ModuleMatching::MatchLAC is supposed to do, it’s hard to give advice.

        Hope that helps,

        [1] Since Perl passes arguments by reference (the elements of @_ are aliases for the caller's variables), it is possible for ModuleMatching::MatchLAC to set the value of $locus_acc_no; except that MatchLAC receives no information from which to determine the new value.
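To illustrate that footnote: because `@_` aliases the caller's arguments, assigning to `$_[1]` inside a sub changes the caller's variable. The sketch below uses a hypothetical `MatchLACInto` (not part of the OP's module) that receives the line as well as the output variable, which gives it the information the original `MatchLAC` lacks.

```perl
use strict;
use warnings;

# Hypothetical variant of MatchLAC: $_[0] is the input line and $_[1]
# is an alias for the caller's output variable.
sub MatchLACInto {
    if ($_[0] =~ /^LOCUS\s{7}(\w+)\s+\w+/) {
        $_[1] = $1;    # writes through the alias into the caller's variable
        return 1;
    }
    return 0;          # output variable left untouched on failure
}

my $locus_acc_no = "";
my $line = "LOCUS       AB007147                 460 bp    DNA     linear   PRI 08-FEB-2002\n";

if (MatchLACInto($line, $locus_acc_no)) {
    print "locus_acc_no $locus_acc_no\n";    # prints: locus_acc_no AB007147
}
```

Returning the captured value is usually clearer than writing through `@_`; this form is shown only because the footnote raises the possibility.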

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: matching problem
by toolic (Bishop) on Feb 24, 2013 at 18:32 UTC
Re: matching problem
by AnomalousMonk (Archbishop) on Feb 24, 2013 at 18:31 UTC
Re: matching problem
by manorhce (Beadle) on Feb 24, 2013 at 18:15 UTC

    Could you please post your code inside a code block so that it is easier to read? Otherwise, please read how to write a post.

Re: matching problem
by AnomalousMonk (Archbishop) on Feb 25, 2013 at 15:31 UTC

    Balaton: Athanasius has replied:

    ... I think it’s highly unlikely that this line:
        if ($line =~ ModuleMatching::MatchLAC($locus_acc_no)) {
    can be correct. But without knowing what sub ModuleMatching::MatchLAC is supposed to do, it’s hard to give advice.

    I agree that the definition of ModuleMatching::MatchLAC() given in the OP is most likely some sort of dummy placeholder, but in any event, the given function can be explained as follows (this is for Balaton; I believe Athanasius understands all this quite clearly):

    • The function returns the result of a match of the passed argument $_[0] against a literal regex (i.e., a regex having no interpolations (Update: literal regex: my terminology may be a bit off here));
    • The function is called in scalar context imposed by the =~ operator, so the result of the match within the function is either 1 (successful match) or '' (the empty string; match failed);
    • The '' or 1 returned by the function is then converted to a regex (with 1 stringized to '1') and a match is made against $line. If the match is against /1/ the result is obvious. If the match is against // (the empty regex, created from the empty string), the result will come from a match against the regex most recently matched successfully or, if no match has previously succeeded, against the null regex, which matches anything. Straight from the docs: "If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. [...] If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match)."

    The problematic results of these matches can be illustrated as follows:

    >perl -wMstrict -le "for my $line ('', 'X', 'Y') { for my $locus_acc_no ('', 'X', 'Y') { if ($line =~ MatchLAC($locus_acc_no)) { print qq{ match: '$line' =~ MatchLAC('$locus_acc_no')}; } else { print qq{NO match: '$line' =~ MatchLAC('$locus_acc_no')}; } } } ;; sub MatchLAC { return $_[0] =~ /^X$/; } "
     match: '' =~ MatchLAC('')
    NO match: '' =~ MatchLAC('X')
     match: '' =~ MatchLAC('Y')
     match: 'X' =~ MatchLAC('')
    NO match: 'X' =~ MatchLAC('X')
     match: 'X' =~ MatchLAC('Y')
     match: 'Y' =~ MatchLAC('')
    NO match: 'Y' =~ MatchLAC('X')
     match: 'Y' =~ MatchLAC('Y')
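The second bullet's point, that the same `return $_[0] =~ /.../` yields different values depending on the caller's context, can be checked directly. This sketch reuses the one-liner's `MatchLAC` and adds a capturing variant, `MatchCap` (a hypothetical name), to show what list context returns instead.

```perl
use strict;
use warnings;

sub MatchLAC { return $_[0] =~ /^X$/ }          # no capture group
sub MatchCap { return $_[0] =~ /^(X)(\d*)$/ }   # two capture groups

# Scalar context: 1 on success, '' (defined but empty) on failure.
my $s_hit  = MatchLAC('X');
my $s_miss = MatchLAC('Y');

# List context, no captures: (1) on success, () on failure.
my @l_hit  = MatchLAC('X');
my @l_miss = MatchLAC('Y');

# List context with captures: the captured substrings.
my @caps = MatchCap('X42');

printf "scalar: hit=%s miss='%s'\n", $s_hit, $s_miss;
printf "list: hit=(%s) miss has %d elements\n", "@l_hit", scalar @l_miss;
printf "captures=(%s)\n", join ',', @caps;
```

Because `=~` evaluates a non-match right-hand side in scalar context, `$line =~ MatchLAC(...)` only ever sees `1` or `''`, which is then used as the pattern, exactly as the bullets describe.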

      Actually, I missed the definition of sub MatchLAC in the OP; my reply was directed solely at Re^2: matching problem. Mea culpa.

      ++AnomalousMonk for the excellent exposition! I didn’t know that the empty regex acts as a stand-in for the regex most recently matched (whether the match was successful or not). Is this documented anywhere? I’ve been looking through perlre, etc., but the only mention of the empty regex I’ve found so far relates to its use with split, where it means “split the string into individual characters.”

      I guess overloading it makes sense, as a match that always succeeds isn’t much use. Are there any typical use cases for employing the empty regex to mean “repeat the regex used in the previous match”?

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        ... excellent exposition!

        Thank you very much!

        ... the empty regex acts as a stand-in for the regex most recently matched (whether the match was successful or not).

        In looking for empty pattern documentation (see below), I discovered this is not the case: "If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead." (Strange what you can find when you actually read the docs!) Fixed my reply: thanks!
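A small sketch of the documented rule (perlop, "The empty pattern //"): a pattern that evaluates to the empty string reuses the last *successfully* matched regex, and a failed match in between does not change which regex that is. The interpolated `$empty` mirrors the thread's situation, where `''` returned by `MatchLAC` became the pattern. All matches run at the same block level to keep the example simple.

```perl
use strict;
use warnings;

my @log;

'foo' =~ /o+/;    # succeeds: /o+/ becomes the last successful pattern
'xyz' =~ /q/;     # fails: the last successful pattern is still /o+/

my $empty = '';
# The pattern below evaluates to the empty string, so perl substitutes
# the last successfully matched regex, /o+/.
push @log, ('xyz'  =~ /$empty/) ? 'xyz: match'  : 'xyz: no match';
push @log, ('moon' =~ /$empty/) ? 'moon: match' : 'moon: no match';

print "$_\n" for @log;
```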

        Is this documented anywhere?

        The only place I've seen it is in perlop in the Regexp Quote-Like Operators section: the discussion of the m// operator has a sub-section titled "The empty pattern //" (there's also a brief back-reference to it in the discussion of the s/// operator).

        Are there any typical use cases for employing the empty regex to mean “repeat the regex used in the previous match”?

        My vague impression is this is something that evolved early-on as an emulation of shell usage or maybe from a desire for some kind of command line one-liner short-cut facility: saves typing, y'know. Offhand, I can't come up with a compelling example.

Approved by Corion