PerlMonks  

question : regex

by greatshots (Pilgrim)
on Oct 11, 2007 at 07:01 UTC ( #644148=perlquestion )

greatshots has asked for the wisdom of the Perl Monks concerning the following question:

my $string   = "KHI0339B__P_H_Vita_Korangi_Ind_A";
my $string_1 = "HGW6120A__S_Popalzai_Cross_A";
$string_1 =~ s/__[a-zA-Z]{1,2}_/__/g;
print ":$string_1:\n";

Output :-
:HGW6120A__gPopalzai_Cross_A:

I get these inputs from a huge input file, and I have only these 2 pattern types. What modification is required to handle both types?

Replies are listed 'Best First'.
Re: question : regex
by moritz (Cardinal) on Oct 11, 2007 at 07:19 UTC
    If I understood your question correctly: s/__[a-zA-Z]{1,2}_(?:[a-zA-Z]_)?/__/
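    A minimal sketch of how that substitution behaves on the two sample strings from the question. The helper name strip_prefix_codes is made up for illustration and is not part of the reply.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread) wrapping the suggested substitution.
    sub strip_prefix_codes {
        my ($s) = @_;
        # After a double underscore, drop one code of 1-2 letters plus "_",
        # and optionally a second single-letter code plus "_".
        $s =~ s/__[a-zA-Z]{1,2}_(?:[a-zA-Z]_)?/__/;
        return $s;
    }

    for my $in (qw( KHI0339B__P_H_Vita_Korangi_Ind_A HGW6120A__S_Popalzai_Cross_A )) {
        print "$in => ", strip_prefix_codes($in), "\n";
    }
    # KHI0339B__P_H_Vita_Korangi_Ind_A => KHI0339B__Vita_Korangi_Ind_A
    # HGW6120A__S_Popalzai_Cross_A => HGW6120A__Popalzai_Cross_A
    ```

    Because the optional group requires a letter followed by an underscore, the "P" of "Popalzai" is left alone in the second string.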
Re: question : regex
by johngg (Canon) on Oct 11, 2007 at 10:20 UTC
    I'm not sure if I've understood your requirement, as I can't tell where the 'g' comes from in your output. However, this should get rid of any 'X_' occurrences after a '__'.

    use strict;
    use warnings;

    my @strings = qw{
        KHI0339B__P_H_Vita_Korangi_Ind_A
        HGW6120A__S_Popalzai_Cross_A
    };

    # An upper-case letter plus underscore, only directly after '__'.
    my $rxPatt = qr{(?<=__)[A-Z]_};

    foreach my $string ( @strings )
    {
        print qq{Original: $string\n};
        # Remove one 'X_' at a time until none is left after the '__'.
        $string =~ s{$rxPatt}{} while $string =~ $rxPatt;
        print qq{Modified: $string\n\n};
    }

    The output is

    Original: KHI0339B__P_H_Vita_Korangi_Ind_A
    Modified: KHI0339B__Vita_Korangi_Ind_A

    Original: HGW6120A__S_Popalzai_Cross_A
    Modified: HGW6120A__Popalzai_Cross_A

    I hope this is of use.

    Cheers,

    JohnGG
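
    A short sketch of why the loop in the code above is repeated rather than replaced by a single /g pass. With /g the engine keeps scanning the original string, so when it reaches "H_" the two characters behind it are still "P_" and the look-behind fails. The helper names below are hypothetical, and the "1 while" form is just a common compact way to write the same repeat-until-no-match loop.

    ```perl
    use strict;
    use warnings;

    my $rxPatt = qr{(?<=__)[A-Z]_};

    # Hypothetical helper: a single pass with /g.
    sub strip_once_global {
        my ($s) = @_;
        $s =~ s{$rxPatt}{}g;
        return $s;
    }

    # Hypothetical helper: repeat a single substitution until it stops matching.
    sub strip_repeatedly {
        my ($s) = @_;
        1 while $s =~ s{$rxPatt}{};
        return $s;
    }

    my $in = 'KHI0339B__P_H_Vita_Korangi_Ind_A';
    print "single /g pass: ", strip_once_global($in), "\n";  # KHI0339B__H_Vita_Korangi_Ind_A
    print "repeated:       ", strip_repeatedly($in),  "\n";  # KHI0339B__Vita_Korangi_Ind_A
    ```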

Re: question : regex
by jesuashok (Curate) on Oct 11, 2007 at 07:11 UTC
    s/__[a-zA-Z]{1,2}_[a-zA-Z]{0,1}/__/g


    i m possible
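
    A quick check of this pattern against the two sample strings, as a sketch only. The helper name apply_optional_letter is made up here. The trailing [a-zA-Z]{0,1} is not tied to an underscore, so it appears to consume the letter that follows the first code.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread) wrapping the pattern from this reply.
    sub apply_optional_letter {
        my ($s) = @_;
        $s =~ s/__[a-zA-Z]{1,2}_[a-zA-Z]{0,1}/__/g;
        return $s;
    }

    for my $in (qw( KHI0339B__P_H_Vita_Korangi_Ind_A HGW6120A__S_Popalzai_Cross_A )) {
        print "$in => ", apply_optional_letter($in), "\n";
    }
    # KHI0339B__P_H_Vita_Korangi_Ind_A => KHI0339B___Vita_Korangi_Ind_A
    # HGW6120A__S_Popalzai_Cross_A => HGW6120A__opalzai_Cross_A
    ```

    The first result keeps a third underscore and the second loses the "P" of "Popalzai", so the optional part probably needs to include its underscore, as in (?:[a-zA-Z]_)?.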
Re: question : regex
by Anonymous Monk on Oct 11, 2007 at 08:10 UTC
    from the examples given, my guess is that the need is for any sub-string matching "__A_" or "__A_A_" (where A is any alpha character) to be replaced by "__" (a double-underscore). (i assume the "g" in "__gPopalzai" is a typo.)

    in that case, this may be appropriate (UNTESTED):

    $string =~ s/ __ (?: [a-zA-Z]_ ){1,2} /__/xms;

    i also assume there is only one instance of the sub-string in a string, and so the /g regex modifier is not needed. also, that variations on the sub-strings "__AA_" and "__AA_AA_" will not occur.

    note: more examples would help the humble monks to better understand the requirements.

      actually, "any sub-string matching" in the first sentence should be "the first sub-string matching".
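
    Since the substitution above is marked UNTESTED, here is a small sketch that runs it against the two sample strings from the question. The helper name collapse_codes and the third test string are assumptions added for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread) wrapping the /x pattern above.
    sub collapse_codes {
        my ($s) = @_;
        # With /x the spaces in the pattern are ignored: '__' followed by
        # one or two "letter plus underscore" groups, first match only.
        $s =~ s/ __ (?: [a-zA-Z]_ ){1,2} /__/xms;
        return $s;
    }

    for my $in (qw( KHI0339B__P_H_Vita_Korangi_Ind_A
                    HGW6120A__S_Popalzai_Cross_A
                    ABC1234D__Plain_Name_A )) {
        print "$in => ", collapse_codes($in), "\n";
    }
    # KHI0339B__P_H_Vita_Korangi_Ind_A => KHI0339B__Vita_Korangi_Ind_A
    # HGW6120A__S_Popalzai_Cross_A => HGW6120A__Popalzai_Cross_A
    # ABC1234D__Plain_Name_A => ABC1234D__Plain_Name_A
    ```

    The third, invented string has no single-letter code after the double underscore and comes back unchanged.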
