http://www.perlmonks.org?node_id=1005077

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,

I think it might be a very silly question to ask, but I am at my wit's end to find the solution. I want to produce the first tetra-word (ABCD) from a long string (ABCDEFGH), then delete the first letter from the string and get the next tetra-word (BCDE) along with its starting position, and so on. I have written a script, try.pl, using a foreach loop. It shows the tetra-words, but the starting positions for CDEF and DEFG are wrong. Moreover, the loop does not show the last tetra-word, i.e. EFGH, so after the loop I added the code in line 15 to get it. Can I expect better code that gets all the tetra-words, i.e. ABCD, BCDE, CDEF, DEFG, EFGH, with correct starting positions?

Here goes the script try.pl:

#!/usr/bin/perl
use warnings;
$pro="ABCDEFGH";
@pro=split('',$pro);
print"\n\n Tetra words are:\n";
$one=1;
foreach my $item (@pro) { @tetra=@pro [0..3];
$pos=$+[0]+$one; # Line 8
$tetra=join('',@tetra);
print"\n $tetra ->Starting at pos $pos\n";
$pro =~ s/.//;
@pro=split('',$pro);
}
# To get the last tetra:
$last=join('',@pro); print"\n $last\n"; # Line 15
exit;

I have got the following wrong results in cmd:

Microsoft Windows [Version 6.1.7600]
C:\Users\x>cd desktop
C:\Users\x\Desktop>try.pl
 Tetra words are:
Use of uninitialized value in addition (+) at C:\Users\x\Desktop\try.pl line 8.
 ABCD ->Starting at pos 1
 BCDE ->Starting at pos 2
 CDEF ->Starting at pos 2
 DEFG ->Starting at pos 2
 EFGH

The correct results should look like:

 Tetra words are:
 ABCD ->Starting at pos 1
 BCDE ->Starting at pos 2
 CDEF ->Starting at pos 3
 DEFG ->Starting at pos 4
 EFGH ->Starting at pos 5

Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by choroba (Cardinal) on Nov 22, 2012 at 08:51 UTC
    Use substr:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature 'say';

    my $string = 'ABCDEFGH';
    my $length = 4;
    for my $start (0 .. length($string) - $length) {
        say substr($string, $start, $length), " -> Starting at position $start.";
    }
    Note that Perl uses 0 for the starting position, not 1. If you really need 1, just output $start + 1.
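    A minimal sketch of the same substr loop wrapped in a sub that returns one-based positions, in case the pairs are needed later rather than printed straight away. The name tetra_words and the default length of 4 are assumptions for illustration, not part of the original reply.

```perl
use strict;
use warnings;

# tetra_words is a hypothetical helper name, not from the thread.
# Returns a list of [word, one_based_position] pairs.
sub tetra_words {
    my ($string, $length) = @_;
    $length //= 4;    # assumed default window size

    my @words;
    # The range is empty when the string is shorter than $length.
    for my $start (0 .. length($string) - $length) {
        push @words, [ substr($string, $start, $length), $start + 1 ];
    }
    return @words;
}

print "$_->[0] -> Starting at pos $_->[1]\n" for tetra_words('ABCDEFGH');
```

    For 'ABCDEFGH' this should list ABCD at 1 through EFGH at 5, and a string shorter than four characters yields an empty list.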
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      choroba

      Thank you very much for your prompt reply. I am sorry for late response as I did not have access to internet for quite a few days due to some technical problem.

      Regards,

Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by GrandFather (Saint) on Nov 22, 2012 at 09:28 UTC

    There are many issues with your code. First off though, always use strictures (use strict; use warnings; - see The strictures, according to Seuss). You use warnings, but strict is at least as important for catching errors. As another general coding tip: don't use the same name for multiple variables. In your sample code you use both $pro and @pro as well as @tetra and $tetra.

    Although it is often a good idea to give a manifest constant a name so the intent of the constant is clear, using a variable for 1 called $one adds no information and is likely to cause confusion just because there seems no reason to use the variable.

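    Your "uninitialized value" warning arises because you read @+ before any regular expression match has succeeded. From the second iteration on, $+[0] holds the end offset of the s/.// match, which is always 1, so the reported position stays at 2.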
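    A small sketch of that behaviour, assuming it is run as a standalone script with no earlier successful match in scope. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'ABCDEFGH';

# With no successful match so far, @+ is empty and $+[0] is undef.
my $before = defined $+[0] ? 'defined' : 'undef';

# After a successful match, $+[0] is the offset just past the end of the match.
$text =~ /CD/;
my $after = $+[0];

print "before: $before, after: $after\n";
```

    Here 'CD' occupies offsets 2 and 3, so $after is 4. Using $+[0] in arithmetic before any match is what triggers the warning in the original script.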

    You aren't getting the number of iterations in the loop you expect because you update @pro within the loop. That is almost always a bad idea.

    There are many ways to skin this cat. One trick is to use a lookahead match and take advantage of the fact that the regular expression engine doesn't allow successive zero-length matches at the same position. Consider:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $pro = "ABCDEFGH";
    my @tetras;

    push @tetras, [$1, $+[0] + 1] while $pro =~ /(?=(.{4}))/g;
    print "$_->[0] -> Starting at pos $_->[1]\n" for @tetras;

    Prints:

    ABCD -> Starting at pos 1
    BCDE -> Starting at pos 2
    CDEF -> Starting at pos 3
    DEFG -> Starting at pos 4
    EFGH -> Starting at pos 5
    True laziness is hard work

      Grand Father

      Thanks for the code. I am sorry for late reply as I had no access to internet for a few days due to some technical problem. Your code has worked nicely and it has solved my problem.

      With Regards,

Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by BrowserUk (Patriarch) on Nov 22, 2012 at 09:24 UTC
    [0] Perl> print $1 while 'abcdefgh' =~ m[(?=(.{4}))]g;;
    abcd
    bcde
    cdef
    defg
    efgh

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      Hi BrowserUK

      Thanks for the code. I am sorry for late reply as I had no access to internet due to some technical problem.

      With kind regards,

Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by space_monk (Chaplain) on Nov 22, 2012 at 08:48 UTC

    You should look up substr. (Update: choroba put a more detailed explanation while I was writing this, so see below)

    Alternatively, in the spirit of TMTOWTDI, you could also adopt the following algorithm:

    1. split the word into an array of characters
    2. terminate if fewer than 4 characters remain in the array
    3. print the first 4 characters of the array
    4. remove (shift) the first character off the array
    5. go back to step 2

    my $word = 'ABCDEFGH';
    my @word = split //, $word;
    my $pos = 1;
    while (scalar(@word) >= 4) {
        # join the slice; concatenating it directly would yield only its last element
        print join('', @word[0..3]), " ==> starting at $pos\n";
        shift @word;
        $pos++;
    }
    A Monk aims to give answers to those who have none, and to learn from those who know more.
      One quibble: you forgot to increment your $pos variable.


      When's the last time you used duct tape on a duct? --Larry Wall
        Thanks - I haven't had my first coffee yet!
        A Monk aims to give answers to those who have none, and to learn from those who know more.

      Hi space_monk

      Thanks for your suggestions. I am sorry for late reply as I had no access to internet for a few days.

      With DEEP REGARDS,

Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by ColonelPanic (Friar) on Nov 22, 2012 at 09:20 UTC

    choroba's substr solution is the best for the problem as you have presented it. However, a regex solution could be useful if you will need to introduce other requirements (such as only matching certain characters).

    Here is a simple regex solution:

    use strict;
    use warnings;

    my $string = 'ABCDEFGHIJKL';
    print "$1$2 at ".pos($string)."\n" while ($string =~ /(.)(?=(...))/g);

    Note that pos($string) returns the position where the next match on $string will start. In this case, that happens to be exactly what you want: it is one greater than the (zero-based) position of the current match, meaning it is the position of the current match with one-based indexing.

    Update: as I think about it more, using pos() is probably not the best. It is misleading to use it to refer to the match start position, because that is not what it really means. It works in this case, but the code would break if you changed your regex to match something different. Here is the correct way to get the position of the beginning of your match:

    print "$1$2 at ". ($-[0] + 1) ."\n" while ($string =~ /(.)(?=(...))/g);

    @- is a special variable holding the start offsets of the last successful match: $-[0] is the start of the whole match, and $-[n] is the start of the nth capture group. $-[0] will therefore always refer to the beginning of the match (I have added one to give you the one-based position).
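    To illustrate why pos() only happens to work here, a sketch with a hypothetical variation of the regex that consumes two characters per match instead of one. The array names are made up for the example.

```perl
use strict;
use warnings;

my $string = 'ABCDEFGH';
my (@by_pos, @by_start);

# Hypothetical variation: consume two characters, then look ahead two more.
while ($string =~ /(..)(?=(..))/g) {
    push @by_pos,   pos($string);    # offset where the next match will start
    push @by_start, $-[0] + 1;       # one-based start of the current match
}

print "pos():    @by_pos\n";      # 2 4 6
print "\$-[0]+1: @by_start\n";    # 1 3 5
```

    The matches start at one-based positions 1, 3 and 5, which $-[0] + 1 reports correctly, while pos() gives 2, 4 and 6 because it tracks the end of the consumed text.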



    When's the last time you used duct tape on a duct? --Larry Wall

      ColonelPanic

      Thanks for your suggestions and code. I am sorry for late reply as I did not have access to internet for a few days.

      With deep regards,

Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by AnomalousMonk (Archbishop) on Nov 22, 2012 at 18:54 UTC

    I, too, thought of BrowserUk's  (?= (overlapping capture)) hack (Update: ColonelPanic previously used a version of this hack.) when I first read the OP, but supriyoch_2008 also wants starting positions. No problem, thought I, just throw in a little  (?{ code }) and the necessary info can be captured. (The offsets produced in the code examples below are 0-based rather than 1-based as supriyoch_2008 wants, but that's a mere detail. Also, I don't maintain that this approach is necessarily to be preferred as being faster/better/etc.)

    However, a little fly in the soup. The code examples 'work', but I don't quite understand what's going on: the positions in the  @tetras_pos array are doubled for some reason, hence the  $_ * 2 indexing hack in printing position info. In the second example, I can understand the presence of the (5, 6, 7, 8) positions at the end of the (0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8) list of positions as resulting from failed attempts by  (?= (....)) to match in positions in which a match is impossible because there are fewer than four characters remaining in the string, but I still don't understand the doubling in the previous part of the list.

    I have the feeling this behavior has been touched on before somewhere, but I can't lay my hands on a reference. Can anyone offer any insight?

    >perl -wMstrict -le
    "my $pro = 'ABCDEFGH';
     ;;
     my @tetras_pos;
     my @tetras = $pro =~ m{ (?= (....) (?{ push @tetras_pos, $-[1] })) }xmsg;
     ;;
     print qq{'$tetras[$_]' @ $tetras_pos[$_ * 2]} for 0 .. $#tetras;
     print qq{@tetras_pos};
     ;;
     @tetras_pos = ();
     @tetras = $pro =~ m{ (?= ((?{ push @tetras_pos, pos $pro }) ....)) }xmsg;
     ;;
     print qq{'$tetras[$_]' @ $tetras_pos[$_ * 2]} for 0 .. $#tetras;
     print qq{@tetras_pos};
    "
    'ABCD' @ 0
    'BCDE' @ 1
    'CDEF' @ 2
    'DEFG' @ 3
    'EFGH' @ 4
    0 0 1 1 2 2 3 3 4 4
    'ABCD' @ 0
    'BCDE' @ 1
    'CDEF' @ 2
    'DEFG' @ 3
    'EFGH' @ 4
    0 0 1 1 2 2 3 3 4 4 5 6 7 8

      AnomalousMonk

      Thank you very much for the code. It has solved my problem. I am sorry for late reply as I had no access to internet for a few days due to some technical problem.

      With Regards,