Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by choroba (Cardinal) on Nov 22, 2012 at 08:51 UTC
#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';
my $string = 'ABCDEFGH';
my $length = 4;
for my $start (0 .. length($string) - $length) {
    say substr($string, $start, $length), " -> Starting at position $start.";
}
Note that Perl uses 0 for the starting position, not 1. If you really need 1, just output $start + 1.
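If one-based positions are needed in more than one place, the same loop can be wrapped in a small helper. This is only a sketch: the sub name windows and the pair layout [substring, position] are assumptions made for illustration, not part of the original post.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns [substring, one-based start] pairs
# for every window of $length characters in $string.
sub windows {
    my ($string, $length) = @_;
    my @found;
    # The range is empty when the string is shorter than $length.
    for my $start (0 .. length($string) - $length) {
        push @found, [ substr($string, $start, $length), $start + 1 ];
    }
    return @found;
}

printf "%s -> Starting at position %d.\n", @$_ for windows('ABCDEFGH', 4);
```

Keeping the + 1 in one place means the rest of the program never has to remember which indexing convention is in use.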
Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by GrandFather (Saint) on Nov 22, 2012 at 09:28 UTC
There are many issues with your code. First off though, always use strictures (use strict; use warnings; - see The strictures, according to Seuss). You use warnings, but strict is at least as important for catching errors. As another general coding tip: don't use the same name for multiple variables. In your sample code you use both $pro and @pro as well as @tetra and $tetra.
Although it is often a good idea to give a manifest constant a name so the intent of the constant is clear, using a variable for 1 called $one adds no information and is likely to cause confusion just because there seems no reason to use the variable.
Your "uninitialized value" warning occurs because you use @+ before the first successful regular expression match.
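A minimal sketch of that effect, assuming a fresh script in which no match has succeeded yet (the variable names are illustrative only):

```perl
use warnings;
use strict;

my $pro = 'ABCDEFGH';

# In a fresh script no match has succeeded yet, so $+[0] is still undefined.
my $before = defined($+[0]) ? 'defined' : 'undefined';
print "before any match: \$+[0] is $before\n";

my ($start, $end);
if ($pro =~ /CD/) {
    # After a successful match, $-[0] and $+[0] hold its start and end offsets.
    ($start, $end) = ($-[0], $+[0]);
    printf "after matching 'CD': start %d, end %d\n", $start, $end;
}
```

Using @+ in arithmetic or a string before that first successful match is what triggers the warning.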
You aren't getting the number of iterations in the loop you expect because you update @pro within the loop. That is almost always a bad idea.
There are many ways to skin this cat. One trick is to use a look-ahead match and take advantage of the fact that the regular expression engine doesn't allow successive zero-length matches at the same position, so each match moves one character further along. Consider:
#!/usr/bin/perl
use warnings;
use strict;
my $pro = "ABCDEFGH";
my @tetras;
push @tetras, [$1, $+[0] + 1] while $pro =~ /(?=(.{4}))/g;
print "$_->[0] -> Starting at pos $_->[1]\n" for @tetras;
Prints:
ABCD -> Starting at pos 1
BCDE -> Starting at pos 2
CDEF -> Starting at pos 3
DEFG -> Starting at pos 4
EFGH -> Starting at pos 5
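To see why the look-ahead matters, here is a small comparison sketch (the array names are made up for illustration). A consuming pattern uses up four characters per match, so the windows cannot overlap; the zero-width look-ahead lets the engine advance one character at a time.

```perl
use warnings;
use strict;

my $pro = 'ABCDEFGH';

# Consuming match: each match eats four characters, so windows cannot overlap.
my @consumed = $pro =~ /(.{4})/g;

# Look-ahead match: zero-width, so the engine advances one character at a time.
my @overlapping = $pro =~ /(?=(.{4}))/g;

print "consuming:  @consumed\n";      # ABCD EFGH
print "look-ahead: @overlapping\n";   # ABCD BCDE CDEF DEFG EFGH
```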
True laziness is hard work
Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by BrowserUk (Patriarch) on Nov 22, 2012 at 09:24 UTC
[0] Perl> print $1 while 'abcdefgh' =~ m[(?=(.{4}))]g;;
abcd
bcde
cdef
defg
efgh
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP Neil Armstrong
Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by space_monk (Chaplain) on Nov 22, 2012 at 08:48 UTC
You should look up substr. (Update: choroba put a more detailed explanation while I was writing this, so see below)
Alternatively, in the spirit of TMTOWTDI, you could also adopt the following algorithm:
1. Split the word into an array of characters.
2. Terminate if fewer than 4 characters remain in the array.
3. Print the first 4 characters of the array.
4. Remove (shift) the first character off the array.
5. Go back to step 2.
my $word = 'ABCDEFGH';
my @word = split //, $word;
my $pos = 1;
while (scalar(@word) >= 4) {
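    # join the slice first: concatenating a slice directly would yield only its last element
    print join('', @word[0..3]), " ==> starting at $pos\n";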
shift @word;
$pos++;
}
A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by ColonelPanic (Friar) on Nov 22, 2012 at 09:20 UTC
choroba's substr solution is the best for the problem as you have presented it. However, a regex solution could be useful if you will need to introduce other requirements (such as only matching certain characters).
Here is a simple regex solution:
use strict;
use warnings;
my $string = 'ABCDEFGHIJKL';
print "$1$2 at ".pos($string)."\n" while ($string =~ /(.)(?=(...))/g);
Note that pos($string) returns the position where the next match on $string will start. In this case, that happens to be exactly what you want: it is one greater than the (zero-based) position of the current match, meaning it is the position of the current match with one-based indexing.
Update: as I think about it more, using pos() is probably not the best. It is misleading to use it to refer to the match start position, because that is not what it really means. It works in this case, but the code would break if you changed your regex to match something different. Here is the correct way to get the position of the beginning of your match:
print "$1$2 at ". ($-[0] + 1) ."\n" while ($string =~ /(.)(?=(...))/g);
@- is a special variable holding the start offsets of the last successful match: $-[0] is the start of the whole match, and $-[1], $-[2], ... are the starts of the capture groups. $-[0] will always refer to the beginning of the match (I have added one to give you the one-based position).
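As a sketch of both points (restricting the allowed characters, and pos() no longer lining up once the regex changes), consider a pure look-ahead pattern over a made-up input. The string 'ABxCDEFG' and the character class [A-H] are assumptions chosen only for illustration. Because the match is zero-width, pos() equals the zero-based start, while $-[0] + 1 still gives the one-based position.

```perl
use warnings;
use strict;

my $string = 'ABxCDEFG';   # illustrative input; 'x' is not an allowed character
my @hits;
while ($string =~ /(?=([A-H]{4}))/g) {
    # $-[0] is the zero-based start of the whole match; pos() is where it ended.
    push @hits, [ $1, $-[0] + 1, pos($string) ];
}
printf "%s at %d (pos() reports %d)\n", @$_ for @hits;
```

Only the windows CDEF and DEFG consist entirely of allowed characters, and for both of them pos() is one less than the one-based position.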
When's the last time you used duct tape on a duct? --Larry Wall
Re: Is it possible to get all tetra words with correct starting position using a better code within a loop?
by AnomalousMonk (Archbishop) on Nov 22, 2012 at 18:54 UTC
I, too, thought of BrowserUk's (?= (overlapping capture)) hack (Update: ColonelPanic previously used a version of this hack.) when I first read the OP, but supriyoch_2008 also wants starting positions. No problem, thought I, just throw in a little (?{ code }) and the necessary info can be captured. (The offsets produced in the code examples below are 0-based rather than 1-based as supriyoch_2008 wants, but that's a mere detail. Also, I don't maintain that this approach is necessarily to be preferred as being faster/better/etc.)
However, a little fly in the soup. The code examples 'work', but I don't quite understand what's going on: the positions in the @tetras_pos array are doubled for some reason, hence the $_ * 2 indexing hack in printing position info. In the second example, I can understand the presence of the (5, 6, 7, 8) positions at the end of the (0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8) list of positions as resulting from failed attempts by (?= (....)) to match in positions in which a match is impossible because there are fewer than four characters remaining in the string, but I still don't understand the doubling in the previous part of the list.
I have the feeling this behavior has been touched on before somewhere, but I can't lay my hands on a reference. Can anyone offer any insight?
>perl -wMstrict -le
"my $pro = 'ABCDEFGH';
;;
my @tetras_pos;
my @tetras =
$pro =~ m{ (?= (....) (?{ push @tetras_pos, $-[1] })) }xmsg;
;;
print qq{'$tetras[$_]' @ $tetras_pos[$_ * 2]} for 0 .. $#tetras;
print qq{@tetras_pos};
;;
@tetras_pos = ();
@tetras =
$pro =~ m{ (?= ((?{ push @tetras_pos, pos $pro }) ....)) }xmsg;
;;
print qq{'$tetras[$_]' @ $tetras_pos[$_ * 2]} for 0 .. $#tetras;
print qq{@tetras_pos};
"
'ABCD' @ 0
'BCDE' @ 1
'CDEF' @ 2
'DEFG' @ 3
'EFGH' @ 4
0 0 1 1 2 2 3 3 4 4
'ABCD' @ 0
'BCDE' @ 1
'CDEF' @ 2
'DEFG' @ 3
'EFGH' @ 4
0 0 1 1 2 2 3 3 4 4 5 6 7 8