PerlMonks  

matching substrings on each line

by anasuya (Novice)
on Oct 08, 2012 at 04:44 UTC ( #997740=perlquestion )
anasuya has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I have a file which looks like this:
a_12_3_5- k_3_4_6-a_12_3_5-
q_1_5_7_9- q_1_5_7_9-
a_9_4_5-c_3_4_6- c_3_4_6-r_4_5_7-
b_1_1_3- v_1_5_7-
d_12_4_5-e_4_5_6- g_5_6_7-d_6_8_6-
b_1_1_7-f_3_8_7_8-d_4_1_4- d_4_1_5-b_1_1_7-f_3_8_3
b_1_1_7-f_3_8_7_8-d_4_1_4- e_3_3_1-f_3_8_7-f_21_3_1-b_1_1_7-a_1_1_1-
This is a space-separated file. The strings before and after the space are each made up of smaller substrings, namely b_1_1_3-, k_3_4_6-, b_12_4_5- etc. I have to check how many of the smaller substrings on the left side of the space are contained in the string on the right side of the space.

For example: the 1st line has the substring a_12_3_5-, which is present in the string on the right-hand side. The 2nd line shows an exact match with the right side. The 3rd line, however, shows that only one substring on the left side (c_3_4_6-) is contained in the string on the right side. The 4th and 5th lines represent cases where no substring on the left side is contained in the string on the right side. The next line shows that only one substring (b_1_1_7-) is contained in the string on the right side. The last line shows that there are 2 substrings b_1_1_7- and f_3_8_7- which are contained in the string on the right hand side.

Thus, effectively, I want to see how many of the substrings on the left-hand side are contained in the string on the right-hand side. I am trying to get the output in the following form, such that each line in my file is appended with three numbers: the number of substrings on the left side, the number of substrings on the right side, and the number of substrings on the left side that are contained in the right side.
a_12_3_5- k_3_4_6-a_12_3_5- 1 2 1
q_1_5_7_9- q_1_5_7_9- 1 1 1
a_9_4_5-c_3_4_6- c_3_4_6-r_4_5_7- 2 2 1
b_1_1_3- v_1_5_7- 1 1 0
d_12_4_5-e_4_5_6- g_5_6_7-d_6_8_6- 2 2 0
b_1_1_7-f_3_8_7_8-d_4_1_4- d_4_1_5-b_1_1_7-f_3_8_3- 3 3 1
b_1_1_7-f_3_8_7_8-d_4_1_4- e_3_3_1-f_3_8_7-f_21_3_1-b_1_1_7-a_1_1_1- 3 5 2
I have so far been able to get the above output, but cant seem to get upto the last portion of the line. I have tried using os and also index functions, but cant seem to get ahead. Please help!

Re: matching substrings on each line
by kcott (Abbot) on Oct 08, 2012 at 06:22 UTC

    G'day anasuya,

    "I have so far been able to get the above output ..."

    Please show the code you used to achieve this.

    "... but cant seem to get upto the last portion of the line."

    The results you've produced seem to indicate you have read to the end of each line. Please clarify.

    Line 6 has no terminal hyphen in the input but it does have one in the output. Is this a typo? If not, perhaps this is related to whatever problem you're experiencing.

    Your output seems to be in line with what you describe you're trying to do. It's not the easiest data to check visually; perhaps I'm missing something. Please provide expected output and show how this differs from what you're currently getting.

    Update:

    "The last line shows that there are 2 substrings b_1_1_7- and f_3_8_7- which are contained in the string on the right hand side."

    Thanks to ++BrowserUk's output below, I note that this statement is incorrect. On the left, you have f_3_8_7_8- which doesn't match f_3_8_7- on the right.

    That may be enough for you to solve the problem yourself. If not, you'll still need to post your code so that we can help you to fix it.

    -- Ken

Re: matching substrings on each line
by BrowserUk (Pope) on Oct 08, 2012 at 06:24 UTC

    C:\test>perl -anlE"my@l=split'(?<=-)',$F[0];my%r;$r{$_}=1 for split'(?<=-)',$F[1];say $_.' '.@l.' '.keys(%r).' '.grep defined, @r{@l}" junk.dat
    a_12_3_5- k_3_4_6-a_12_3_5- 1 2 1
    q_1_5_7_9- q_1_5_7_9- 1 1 1
    a_9_4_5-c_3_4_6- c_3_4_6-r_4_5_7- 2 2 1
    b_1_1_3- v_1_5_7- 1 1 0
    d_12_4_5-e_4_5_6- g_5_6_7-d_6_8_6- 2 2 0
    b_1_1_7-f_3_8_7_8-d_4_1_4- d_4_1_5-b_1_1_7-f_3_8_3 3 3 1
    b_1_1_7-f_3_8_7_8-d_4_1_4- e_3_3_1-f_3_8_7-f_21_3_1-b_1_1_7-a_1_1_1- 3 5 1
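
For readers who find the one-liner dense, here is a minimal sketch of the same idea written out as a script. The sub name `count_tokens` and the two sample lines are illustrative assumptions, not BrowserUk's code. The approach follows the one-liner: split after each hyphen, store the right-side tokens in a hash, and count the left-side tokens present in it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration): returns the three
# counts for one "left right" line.
sub count_tokens {
    my ($line) = @_;
    my ($left, $right) = split ' ', $line;

    # Split after every hyphen so each token keeps its trailing '-'.
    my @left_tokens = split /(?<=-)/, $left;
    my %in_right    = map { $_ => 1 } split /(?<=-)/, $right;

    # Count left tokens that appear as whole tokens on the right.
    my $shared = grep { exists $in_right{$_} } @left_tokens;

    return (scalar @left_tokens, scalar keys %in_right, $shared);
}

# Two sample lines taken from the question's data.
for my $line (
    'a_12_3_5- k_3_4_6-a_12_3_5-',
    'b_1_1_7-f_3_8_7_8-d_4_1_4- e_3_3_1-f_3_8_7-f_21_3_1-b_1_1_7-a_1_1_1-',
) {
    print join(' ', $line, count_tokens($line)), "\n";
}
```

Because the right-hand tokens are hash keys, the second number counts distinct tokens, so a token repeated on the right is counted once. Whole-token lookup is also why `f_3_8_7_8-` on the left does not match `f_3_8_7-` on the right.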

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

Re: matching substrings on each line
by choroba (Abbot) on Oct 08, 2012 at 06:24 UTC
    Crossposted at StackOverflow. It is considered polite to inform about crossposting so people not attending both sites do not waste their time on problems already solved at the other end of the internets.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: matching substrings on each line
by grizzley (Chaplain) on Oct 08, 2012 at 06:31 UTC
    Did I understand correctly, that you want to
    • find string until first space /^(.*?)\s(.*)/; ($left, $right) = ($1, $2);
    • split it by '-' characters @tokens=split /(?<=-)/, $left
    • and find each of the tokens in right side string for $token(@tokens) { @result = $right =~ /$token/g }
    ?
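
The three steps above can be sketched roughly as follows. This is one possible reading, not grizzley's exact code: the sub name `count_by_regex` is hypothetical, the line is split with `\S+` rather than `.*?`, and each token is wrapped in `\Q...\E` and anchored with `(?:^|-)` so that it can only match at the start of the right-hand string or directly after a hyphen. It assumes every token ends in a hyphen, as in most of the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: counts left-side tokens found in the right-side
# string using a regex match per token.
sub count_by_regex {
    my ($line) = @_;

    # Step 1: everything up to the first whitespace is the left side.
    my ($left, $right) = $line =~ /^(\S+)\s+(\S+)/
        or return;

    # Step 2: split after each hyphen, keeping the hyphen on the token.
    my @tokens = split /(?<=-)/, $left;

    # Step 3: look for each token in the right-hand string.
    # \Q...\E quotes the token; (?:^|-) stops it matching mid-token.
    my $found = grep { $right =~ /(?:^|-)\Q$_\E/ } @tokens;

    return $found;
}

print count_by_regex('a_9_4_5-c_3_4_6- c_3_4_6-r_4_5_7-'), "\n";
```

A token without a trailing hyphen (such as `f_3_8_3` in the sixth input line) has no end boundary, so it could also match a longer right-hand token that merely starts with it.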
Re: matching substrings on each line
by Generoso (Vicar) on Oct 08, 2012 at 13:39 UTC

    Is this what you are looking for?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use v5.10;

    my @data = <DATA>;
    foreach (@data) {
        my ($left, $right) = split ' ', $_;
        my @l = split '(?<=-)', $left;
        foreach (@l) {
            say $_ . " is in " . $right if index($right, $_) > 0;
        }
    }
    __DATA__
    a_12_3_5- k_3_4_6-a_12_3_5-
    q_1_5_7_9- q_1_5_7_9-
    a_9_4_5-c_3_4_6- c_3_4_6-r_4_5_7-
    b_1_1_3- v_1_5_7-
    d_12_4_5-e_4_5_6- g_5_6_7-d_6_8_6-
    b_1_1_7-f_3_8_7_8-d_4_1_4- d_4_1_5-b_1_1_7-f_3_8_3
    b_1_1_7-f_3_8_7_8-d_4_1_4- e_3_3_1-f_3_8_7-f_21_3_1-b_1_1_7-a_1_1_1-

    Result

    perl "F:\perl_TK\perldb\perl1data3.pl"
    Process started >>>
    a_12_3_5- is in k_3_4_6-a_12_3_5-
    b_1_1_7- is in d_4_1_5-b_1_1_7-f_3_8_3
    b_1_1_7- is in e_3_3_1-f_3_8_7-f_21_3_1-b_1_1_7-a_1_1_1-
    <<< Process finished.
    ================ READY ================
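
The result above reports matches for lines 1, 6 and 7 only, although lines 2 and 3 of the data also have a left-hand token on the right. A likely cause is the `index($right, $_) > 0` test: Perl's `index` returns 0 when the substring is found at the very start of the string and -1 when it is not found, so `> 0` skips matches at position 0. A small sketch of that behaviour, with a hypothetical `contains` helper:

```perl
use strict;
use warnings;

my $right = 'c_3_4_6-r_4_5_7-';

print index($right, 'c_3_4_6-'), "\n";   # 0: found at the very start
print index($right, 'r_4_5_7-'), "\n";   # 8: found further along
print index($right, 'a_9_4_5-'), "\n";   # -1: not found

# Hypothetical helper: true when $token occurs anywhere in $string,
# including at position 0.
sub contains {
    my ($string, $token) = @_;
    return index($string, $token) >= 0;
}

print "match at position 0 is counted\n" if contains($right, 'c_3_4_6-');
```

Testing for `>= 0` (or `!= -1`) counts matches at any position. Note that `index` is a plain substring search, so it does not check that the match lines up with token boundaries.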
