Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Two-dimensional match/regex? (2)

by rsFalse (Chaplain)
on Nov 28, 2014 at 22:23 UTC ( [id://1108671]=perlquestion: print w/replies, xml ) Need Help??

rsFalse has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I tried to make a primitive subroutine, which could match some two-dimensional set of characters.

And my questions are: 1) have you used two-dimensional search and where? 2) what are good approaches of making match more affective and more fast?

I've find similar, but old node Two-dimensional match/regex?, almost all links there are non-working.

Below is my try.
My example search is:
# # #.
And subroutine searches all occurrences of such combination of these 4 chars.
In usual one-dimension search matches can not overlap if we use /g modifier and do not change pos(), but here I'm changing pos() and matches can overlap.
use warnings; use strict; sub two_d_search{ my $amount_of_data = shift; my @data = splice @_, 0, $amount_of_data; my @pattern = (); my ($arg, $indentation, $string); my $i = 0; while (@_){ $arg = shift; if ($arg eq "\n"){ $i++; $arg = shift; } $indentation = $arg; $string = shift; $pattern[ $i ] .= ".{$indentation}" if $indentation; $pattern[ $i ] .= $string; print "($i: $pattern[ $i ])", "\n"; } my $pos; my $match = 0; my @matches = (); for my $i (0 .. @data - 1){ undef pos $data[ $i ]; OUT_2: while ($data[ $i ] =~ m/$pattern[0]/g){ ($pos) = @-; # matches can overlap, so 'pos' increases only by +1: (pos $data[ $i ]) = $pos + 1; for my $j (1 .. @pattern - 1){ pos ($data[ $i + $j ]) = $pos; if ($data[ $i + $j ] =~ m/\G$pattern[$j]/){ # do nothing } else { next OUT_2 } } $match ++; push @matches, "[row: $i|column: $pos]"; } } $" = "\n"; # set list output separator to "\n" return "Number of matches: $match;", "Upper-left corners match at:\n@matches" } my @data = <DATA>; chomp @data; my @info = &two_d_search( scalar @data, @data, # indentation; string; argument of line separation 1, '#', "\n", # two_d_regex first line 0, '#', "\n", # two_d_regex second line 2, '#\.' # two_d_regex third line ); print "@info",$/; __DATA__ #..#.....#. ..#...##... .#....#..## #..#....#.. ..#...#..#. .......#.#. ...........
http://ideone.com/O1j7vp
OUTPUT: (0: .{1}#) (1: #) (2: .{2}#\.) Number of matches: 3; Upper-left corners match at: [row: 1|column: 1] [row: 1|column: 6] [row: 2|column: 0]
I think that speed complexity of this search is slow, maybe O(size_of_data * size_of_search_pattern).

Replies are listed 'Best First'.
Re: Two-dimensional match/regex? (2)
by Loops (Curate) on Nov 28, 2014 at 23:56 UTC

    Howdy,

    It's easier if you just search by column:

    my $data = do { local $/; <DATA> }; my $p1 = qr(.{1}#); my $p2 = qr(#); my $p3 = qr(.{2}#\.); for my $column (0 .. 80) { my $start = qr(.{$column}); my $next = qr(.*\n$start); my $target = qr($start(?=$p1$next$p2$next$p3)); while ( $data =~ m/^$target/mg) { my $row = () = substr($data,0,pos $data) =~ /\n/g; print "[row: $row|column: $column]\n"; } } __DATA__ #..#.....#. ..#...##... .#....#..## #..#....#.. ..#...#..#. .......#.#. ...........
    Prints:
    [row: 2|column: 0] [row: 1|column: 1] [row: 1|column: 6]
Re: Two-dimensional match/regex? (2)
by CountZero (Bishop) on Nov 29, 2014 at 12:59 UTC
    Inspired by Loops' suggestion, I made it a bit more flexible and easier to use. As you will note, the position information is directly extracted from the regex variables without the need for a second regex. It should be a little bit faster as well.
    use Modern::Perl '2014'; my $delim = '&'; my @findme = ( '.#', '#', '..#\.' ); my @data = <DATA>; chomp @data; my $data; $data .= "$delim$_" for @data; for my $column ( 0 ... length( $data[0] ) - 1 ) { my $regex = build_regex( $column, @findme ); while ( my $result = $data =~ m/$regex/g ) { my $row = ( $-[0] / length( $data[0] . $delim ) ); say "Hit at row $row, column $column"; } } sub build_regex { my $column = shift; my @pattern = @_; my $pre = '[^' . $delim . ']{' . $column . '}'; my @accumulator; push @accumulator, quotemeta($delim) . $pre . $_ . '[^' . $delim . + ']*' for @pattern; my $regex = join '', @accumulator; return qr/$regex/; } __DATA__ #..#.....#. ..#...##... .#....#..## #..#....#.. ..#...#..#. .......#.#. ...........

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Two-dimensional match/regex? (2)
by Eily (Monsignor) on Nov 29, 2014 at 01:42 UTC

    In usual one-dimension search matches can not overlap if we use /g modifier and do not change pos(), but here I'm changing pos() and matches can overlap.
    Since you said "In usual one-dimension search", I suppose you may already know how to make it possible for matches to overlap without changing pos. It is done by using look-around assertions.

    perl -E 'say scalar (() = "banana" =~ /(?=ana)/g)' will print 2 (because the (?=) portion makes sures there's a match at that position, but perl moves at least one char forward after a zero width match). And you can still get the content of the matches by adding captures inside the look-ahead assertion. perl -E 'say join ", ", "banana" =~ /(?=(ana))/g'.

    In the similar node you gave a link to, the main difficulty came from the fact that the lines where not of equal size, but since yours are, your problem translates pretty well to a one-dimension regex:

    use v5.14; my $pattern = qr/#.{10}#.{13}#\./ms; $_ = <<STRING; #..#.....#. ..#...##... .#....#..## #..#....#.. ..#...#..#. .......#.#. ........... STRING say pos while /(?=$pattern)/g; __DATA__ 14 19 25
    Now you can find the row and column of the matches by using modulos and division.

    This, of course, only works when the lines are of constant length, but actually, you can just reformat the input string:

    use v5.14; my $length = 20; # Must be higher than the length of the longest line # Add padding so that each line is $length chars long # Without checking what char there are (ie, the \n are kept) # So we know that if a char A is in position $a, the char B below it w +ill be in position # $a+$length, or there will be A.{$length-1}B $_ = pack "(A$length)*", <DATA>; # $length - 2 is one line down, one char to the left my $l1 = $length-2; # $length + 1 is one line down, one char to the right my $l2 = $length+1; my $pattern = qr/#.{$l1}#.{$l2}#\./s; # The first row and column have index 0 with this code say join ", ", 1+int(pos() / $length), 1+(pos() % $length) while /(?= +$pattern)/g; __DATA__ #..#.....# ..#...## .#....#..## #..#....#.....# ..#...#..#.## .......#.# ...........
    1, 2 1, 7 2, 1

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1108671]
Approved by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2024-04-20 04:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found