
Finding common substrings

by ktsirig (Sexton)
on Sep 20, 2006 at 22:04 UTC ( #574016=perlquestion )

ktsirig has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks,
I was wondering if there is a way to do the following.
Say you have 2 strings, like:
$a = 'PF01389 6 218 1 255 430.09'; $b = 'PF00691 PF01389';
You can see that these 2 strings have PF01389 in common. So, I wanted to know if there is a way of checking for values inside one string that are equal to values in the other.
I tried using:
if ($a=~/$b/ or $b=~/$a/) {print "correct";}
but it doesn't work, because it searches for the whole $a or $b and not parts of them.
Any suggestions?

2006-09-21 Retitled by GrandFather, as per Monastery guidelines
Original title: 'Is there a way?'

Replies are listed 'Best First'.
Re: Finding common substrings
by jdporter (Chancellor) on Sep 20, 2006 at 22:19 UTC

    Of course there's an algorithm for doing that, but it's undoubtedly overkill for your application. I think you should bring some "semantics" to the problem to make it easier. For example, both of those strings look like they contain space-separated values. So you could convert both to arrays of non-space strings, and compare the two arrays for any elements in common. One way:

    $a = 'PF01389 6 218 1 255 430.09';
    $b = 'PF00691 PF01389';
    my @a = split ' ', $a;
    my @b = split ' ', $b;
    my %a;
    @a{@a} = ();
    my @common = grep { exists $a{$_} } @b;

    Also see the Categorized Question How can I find the union/difference/intersection of two arrays?

    Update: Just for fun, here's another way, using regexes:

    my $pattern = $b; # so as not to bash $b
    $pattern =~ s/\s+/|/g;
    @common = " $a " =~ /\s($pattern)\s/g;

    Update2: That breaks down if the string being used as the source of the regex ($b here) contains regex-special characters. Better:

    my $pattern = join '|', map quotemeta($_), split ' ', $b;
    @common = " $a " =~ /\s($pattern)\s/g;
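
    One caveat with the \s($pattern)\s form above: each match consumes its trailing space, so when two common values sit next to each other in $a, the second one can be missed. A minimal sketch of one possible workaround, assuming the values are always whitespace-delimited, swaps the \s delimiters for lookarounds. The names $str1, $str2 and common_by_regex are hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the whitespace-separated values of $str1
# that also appear in $str2, in the order they occur in $str1.
sub common_by_regex {
    my ($str1, $str2) = @_;
    my @words = split ' ', $str2;
    return () unless @words;    # an empty alternation would match everywhere
    my $pattern = join '|', map { quotemeta($_) } @words;
    # (?<!\S) and (?!\S) assert "no non-space character here" without
    # consuming the delimiter, so adjacent values can both match.
    return $str1 =~ /(?<!\S)($pattern)(?!\S)/g;
}

my @common = common_by_regex('PF01389 6 218 1 255 430.09', '218 6 PF00691');
print "@common\n";
```

    Because the lookarounds also succeed at the start and end of the string, the padding spaces around $a are no longer needed in this variant.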

    We're building the house of the future together.
Re: Finding common substrings
by johngg (Canon) on Sep 20, 2006 at 22:29 UTC
    Firstly, $a and $b are best avoided as variable names as they are pre-declared for use with the sort function. You could do this task by splitting one string on spaces (assuming your data is always space delimited) to populate a hash and then splitting the second string and checking if any part of that exists already in the hash.

    use strict;
    use warnings;

    my $str1 = q{PF01389 6 218 1 255 430.09};
    my $str2 = q{PF00691 PF01389};

    my %str1Hash = map {$_ => 1} split m{\s+}, $str1;

    foreach my $possible (split m{\s+}, $str2) {
        print qq{$possible common\n} if exists $str1Hash{$possible};
    }

    I hope this is of use.
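
    To illustrate the point about $a and $b: sort hands the two values being compared to its block through the package variables $a and $b, so a lexical "my $a" in the same scope would shadow the variable the sort block needs. A minimal sketch with hypothetical names ($str1, $str2, %seen, and an extra value 218 added to the second string for illustration) that keeps $a and $b free for sort:

```perl
use strict;
use warnings;

my $str1 = 'PF01389 6 218 1 255 430.09';
my $str2 = 'PF00691 PF01389 218';    # 218 added here for illustration only

# Index the first string's values, then keep the second string's values
# that are present in the index.
my %seen   = map { $_ => 1 } split ' ', $str1;
my @common = grep { $seen{$_} } split ' ', $str2;

# $a and $b here are the package variables that sort sets for each comparison;
# they need no declaration, even under "use strict".
my @sorted = sort { $a cmp $b } @common;
print "@sorted\n";
```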



Re: Finding common substrings
by ayrnieu (Beadle) on Sep 20, 2006 at 22:13 UTC
    use List::Compare;
    # caveat: "I haven't tested this."
    my @in_both = List::Compare->new([split /\s+/, $a], [split /\s+/, $b])->get_intersection;
Re: Finding common substrings
by mreece (Friar) on Sep 20, 2006 at 22:34 UTC
    split each string and look for dupes ..
    $a = 'PF01389 6 218 1 255 430.09';
    $b = 'PF00691 PF01389';
    my %counts;
    foreach ( split /\s+/, $a ) {
        $counts{$_} = 1;
    }
    foreach ( split /\s+/, $b ) {
        $counts{$_}++ if exists $counts{$_};
    }
    my @common = grep $counts{$_} > 1, keys %counts;
    if ( @common ) {
        print "correct\n";
    }
    or, less verbose,
    $a = 'PF01389 6 218 1 255 430.09';
    $b = 'PF00691 PF01389';
    my %in_a = map { $_ => 1 } split /\s+/, $a;
    my @in_both = grep { exists $in_a{$_} } split /\s+/, $b;
    if ( @in_both ) {
        print "correct\n";
    }
      Thank you all! You really helped me understand a lot of things just by this question I had!
      I might be wrong but I think your first method will give a false positive if one string contains a duplicated word but that word doesn't appear in the other string. The $counts{$_} will be more than one but only because the word appeared twice in the same string, not because it was duplicated in the other string.



        Actually, it won't, because the first foreach only sets the count to 1 rather than using ++, and the second foreach only does ++ if the key already exists, which means it was already found in $a.
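
        That reasoning can be checked with a small sketch. The sample strings are made up for illustration ("dup" appears twice in the first string and never in the second), and the sub name common_by_count is hypothetical; the body follows the counting method described above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the counting method.
sub common_by_count {
    my ($str1, $str2) = @_;
    my %counts;
    # First pass: every value in $str1 is set to exactly 1, duplicates included.
    $counts{$_} = 1 for split ' ', $str1;
    # Second pass: only values already seen in $str1 are incremented.
    for ( split ' ', $str2 ) {
        $counts{$_}++ if exists $counts{$_};
    }
    my @common = grep { $counts{$_} > 1 } keys %counts;
    return sort @common;
}

# "dup" is repeated within the first string only, so nothing is common.
my @none = common_by_count('dup dup other', 'unrelated');
# "other" is repeated in the second string and present in the first.
my @both = common_by_count('dup dup other', 'other other');
print scalar(@none), " common in the first case; second case: @both\n";
```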
Re: Finding common substrings
by Anonymous Monk on Sep 21, 2006 at 04:12 UTC
    With a bunch of assumptions:
    $a = 'PF01389 6 218 1 255 430.09';
    $b = 'PF00691 PF01389';
    $_ = " $a \n $b "; # combine strings for one regex
    print "$_\n" for / (\S+) (?=.*\n.* \1 )/g;
Re: Finding common substrings
by Persib (Acolyte) on Sep 21, 2006 at 10:10 UTC
    if($a =~ /[$b]/) { print "true \n" };


    Just ignore my code, it's totally wrong. I'm sorry (maybe I'm too tired).
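
    For readers wondering why the snippet above misfires: inside [...], the interpolated string becomes a character class, so the match succeeds as soon as $a contains any single character that also occurs in $b (a space or a digit is enough). A small sketch with made-up strings that share no whole value; the names $str1, $str2, $char_class_hit and @shared are hypothetical.

```perl
use strict;
use warnings;

my $str1 = 'abc 123';
my $str2 = 'xyz 999';

# /[$str2]/ is the character class [xyz 999]; the space in $str1 matches it.
my $char_class_hit = $str1 =~ /[$str2]/ ? 1 : 0;

# Comparing whole whitespace-separated values finds nothing in common.
my %in_first = map { $_ => 1 } split ' ', $str1;
my @shared   = grep { $in_first{$_} } split ' ', $str2;

print "character class matched: $char_class_hit, shared values: ", scalar(@shared), "\n";
```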
Re: Finding common substrings
by bsb (Priest) on Sep 21, 2006 at 15:29 UTC

Node Type: perlquestion [id://574016]
Approved by Corion
Front-paged by Marza