PerlMonks  

Regex Question

by Anonymous Monk
on Jan 22, 2013 at 11:43 UTC ( [id://1014636]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a question about a regex.

 $data = "Subject: ABCD Details - MICHAEL NICHOLSON - **Senior Sales* (344536)";

Here I want to take out "Senior Sales" (always delimited by ** and *, as in **Name*) to another string for later use. The regex also needs to mention "Subject:", because some lines look like this:

 $data = "NoSub: ABCD Details - MICHAEL NICHOLSON - **Senior Sales* (344536)";

I don't want to take it if the line starts with "NoSub" or anything else, only if it starts with "Subject:". That is why the regex must also mention "Subject:".

Thanks Monks.

Replies are listed 'Best First'.
Re: Regex Question
by Ratazong (Monsignor) on Jan 22, 2013 at 11:47 UTC

    The following could be a start:

    $data =~ /^Subject.*\*\*(.*?)\*/;
    print "$1\n";
    See here how it works.

    HTH, Rata
      Ratazong,
      I think it is good practice to always check whether the match (m//) succeeded. That way, if the match fails, one knows about it.
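A minimal sketch of that advice, using the two sample lines from the question. The helper name extract_title is hypothetical (it does not appear in the thread), and the pattern is a slightly stricter variant of the one above that also requires the colon after "Subject". The point is that $1 is only read inside the branch where the match succeeded; after a failed match it would still hold whatever an earlier successful match left behind.

```perl
use strict;
use warnings;

# Sample lines taken from the original question.
my @lines = (
    'Subject: ABCD Details - MICHAEL NICHOLSON - **Senior Sales* (344536)',
    'NoSub: ABCD Details - MICHAEL NICHOLSON - **Senior Sales* (344536)',
);

# extract_title is a hypothetical helper name, not from the thread.
# It returns the text between "**" and the next "*", or undef when the
# line does not start with "Subject:" or has no such marker.
sub extract_title {
    my ($line) = @_;
    if ( $line =~ /^Subject:.*?\*\*(.*?)\*/ ) {
        return $1;    # only trusted because the match succeeded
    }
    return;           # undef in scalar context signals "no match"
}

for my $line (@lines) {
    my $title = extract_title($line);
    if ( defined $title ) {
        print "matched: $title\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

Returning undef rather than an empty string lets the caller tell "no match" apart from "matched, but the field was empty".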
        The link posted in Re^2 by AnonyMonk points to an elderly post which links to the (sorely outdated) docs here on-site.

        A better way to do this is [docs://perlre], which produces a link, perlre, to the perlre page at http://perldoc.perl.org/ and so to the current docs.

        Though, I suppose, if directing one's suggestions to someone who hasn't updated/upgraded his or her installed Perl since 1999, the Monastery docs might be more appropriate.

Re: Regex Question
by Matt™ (Acolyte) on Jan 22, 2013 at 11:48 UTC
    if ($data =~ /Subject:(.*)\*\*(.*)\*/i) {
        $out = trim($2);
    }
    print $out;
    exit;
    hope this helps...

    Note: Sorry, remove 'trim' or use an appropriate trim subroutine.
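Since trim is not a Perl built-in, here is one possible version of such a subroutine, offered only as a sketch. The ^ anchor and the single capture group are assumptions added for this example; they are not part of the reply above.

```perl
use strict;
use warnings;

# trim is not built into Perl; this is one common way to write it.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace
    return $s;
}

my $data = 'Subject: ABCD Details - MICHAEL NICHOLSON - **Senior Sales* (344536)';

my $out;
if ( $data =~ /^Subject:.*\*\*(.*)\*/ ) {
    $out = trim($1);
}

# Print only when something was actually captured.
print "$out\n" if defined $out;
```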

Re: Regex Question
by tmharish (Friar) on Jan 22, 2013 at 14:44 UTC

    I agree with BillKSmith and ww.

    but ...

    use 5.010 ;
    use strict ;
    use warnings ;

    use Data::Dump qw( dump ) ;

    my @data = @{ _get_data() } ;

    my @clean_data ;
    my @removed_data ;
    foreach my $data_elem ( @data ) {
        # "Subject: 4 Details - MICHAEL4 NICHOLSON4 - **Senior S4* (4)",
        if( $data_elem =~ /^(.*?):(.*?)-(.*?)-.*?\*\*(.*?)\*.*?\((.*?)\)$/ ) {
            my $subject  = $1 ;
            my $item     = $2 ;
            my $name     = $3 ;
            my $position = $4 ;
            my $id       = $5 ;

            unless( $subject eq 'Subject' ) {
                warn( "Subject not 'Subject', Ignoring ... \n" ) ;
                next ;
            }

            push @clean_data, [ $subject, $item, $name, $id ] ;
            push @removed_data, $position ;
        }
        else {
            warn( "Unknown data format - Ignoring\n" ) ;
        }
    }

    dump( \@data ) ;
    dump( \@clean_data ) ;
    dump( \@removed_data ) ;

    exit() ;

    sub _get_data {
        my @data ;
        foreach my $data_item ( 0 .. 10 ) {
            if( rand() < .5 ) {
                push @data, "Subject: $data_item Details - MICHAEL$data_item NICHOLSON$data_item - **Senior S$data_item* ($data_item)" ;
            }
            else {
                push @data, "NoSub: $data_item Details - MICHAEL$data_item NICHOLSON$data_item - **Senior S$data_item* ($data_item)" ;
            }
        }
        return \@data ;
    }
    OUTPUT ( Note there is a 'rand' in there so you might not see this )
    Subject not 'Subject', Ignoring ...
    Subject not 'Subject', Ignoring ...
    Subject not 'Subject', Ignoring ...
    Subject not 'Subject', Ignoring ...
    [
      "NoSub: 0 Details - MICHAEL0 NICHOLSON0 - **Senior S0* (0)",
      "Subject: 1 Details - MICHAEL1 NICHOLSON1 - **Senior S1* (1)",
      "Subject: 2 Details - MICHAEL2 NICHOLSON2 - **Senior S2* (2)",
      "Subject: 3 Details - MICHAEL3 NICHOLSON3 - **Senior S3* (3)",
      "Subject: 4 Details - MICHAEL4 NICHOLSON4 - **Senior S4* (4)",
      "NoSub: 5 Details - MICHAEL5 NICHOLSON5 - **Senior S5* (5)",
      "NoSub: 6 Details - MICHAEL6 NICHOLSON6 - **Senior S6* (6)",
      "NoSub: 7 Details - MICHAEL7 NICHOLSON7 - **Senior S7* (7)",
      "Subject: 8 Details - MICHAEL8 NICHOLSON8 - **Senior S8* (8)",
      "Subject: 9 Details - MICHAEL9 NICHOLSON9 - **Senior S9* (9)",
      "Subject: 10 Details - MICHAEL10 NICHOLSON10 - **Senior S10* (10)",
    ]
    [
      ["Subject", " 1 Details ", " MICHAEL1 NICHOLSON1 ", 1],
      ["Subject", " 2 Details ", " MICHAEL2 NICHOLSON2 ", 2],
      ["Subject", " 3 Details ", " MICHAEL3 NICHOLSON3 ", 3],
      ["Subject", " 4 Details ", " MICHAEL4 NICHOLSON4 ", 4],
      ["Subject", " 8 Details ", " MICHAEL8 NICHOLSON8 ", 8],
      ["Subject", " 9 Details ", " MICHAEL9 NICHOLSON9 ", 9],
      ["Subject", " 10 Details ", " MICHAEL10 NICHOLSON10 ", 10],
    ]
    [
      "Senior S1",
      "Senior S2",
      "Senior S3",
      "Senior S4",
      "Senior S8",
      "Senior S9",
      "Senior S10",
    ]
Re: Regex Question
by ww (Archbishop) on Jan 22, 2013 at 14:18 UTC
    I'm with Bill -- I'm not able to be certain about what you want as output (where "output" is more or less synonymous with "take out" which is -- maybe -- roughly equivalent to "capture" where the word "capture" is a hint, if you choose to read perlretut and company, to try to answer your own (lazy) question).

    Correct that guess, please, or clarify your requirements -- with a more coherent description and an example.

    And please read The Perl Monks Guide to the Monastery and On asking for help & How do I post a question effectively? -- for guidance on when and how best to obtain help here.

Re: Regex Question
by BillKSmith (Monsignor) on Jan 22, 2013 at 14:01 UTC

    What do you mean by "take out"? Do you wish to remove the matching characters from the original string or do you want to assign them to another variable?

    Bill
Re: Regex Question
by sundialsvc4 (Abbot) on Jan 22, 2013 at 16:14 UTC

    One thing that you need to keep in mind when building regexes like these is the subject of greediness. By default, a quantifier such as .* will (“greedily ...”) match the longest string it can while still allowing the overall pattern to succeed. You can specify that it should, instead, opt for the shortest one (by appending ?, as in .*?), and sometimes (depending on the nature of the string that is to be processed) you must do that.
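To make the difference concrete, here is a small sketch. The input line is made up for illustration (it carries two starred fields, unlike the lines in the question) so that the greedy and non-greedy forms give visibly different captures.

```perl
use strict;
use warnings;

# A made-up line with two starred fields, so the two forms differ.
my $line = 'Subject: X - **Senior Sales* and **Junior Sales* (1)';

my ($greedy) = $line =~ /\*\*(.*)\*/;     # .*  runs on to the LAST "*"
my ($lazy)   = $line =~ /\*\*(.*?)\*/;    # .*? stops at the FIRST "*"

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

With the single starred field in the original question both forms capture the same text, which is why the greedy patterns earlier in the thread also work there.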

    I also strongly advocate that your programming should be suspicious of its input files. If you expect 5 strings, check each time that you have them. In short, even where you could be certain of something in the correctly-running program given correct data, “be from Missouri ... show me.” Quite frankly, most of the time, I’ve encountered broken data. The supplier of the data didn’t know it was broken. “Inexplicable bugs” turned out to be from that cause. Only the computer itself is in the necessary position to recognize the existence of these issues ... take the slight extra time to make it do so.
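One way this could look in practice, as a sketch only: the helper name parse_subject_line is hypothetical, and the field layout (item - name - **position* (id)) is an assumption inferred from the sample line in the question. The routine hands back fields only when all four were found, and the caller reports anything else instead of silently carrying on.

```perl
use strict;
use warnings;

# parse_subject_line is a hypothetical helper; the expected layout
# "Subject: item - name - **position* (id)" is inferred from the
# sample line in the question, not from any specification.
sub parse_subject_line {
    my ($line) = @_;
    my @fields = $line =~
        /^Subject:\s*(.*?)\s+-\s+(.*?)\s+-\s+\*\*(.*?)\*\s*\((\d+)\)\s*$/;
    return unless @fields == 4;    # refuse anything short of four fields
    return @fields;
}

my @input = (
    'Subject: ABCD Details - MICHAEL NICHOLSON - **Senior Sales* (344536)',
    'Subject: ABCD Details - MICHAEL NICHOLSON - *Senior Sales* (344536)',    # broken: one "*"
    'NoSub: ABCD Details - MICHAEL NICHOLSON - **Senior Sales* (344536)',
);

for my $line (@input) {
    my @fields = parse_subject_line($line);
    if ( @fields != 4 ) {
        warn "Skipping unexpected line: $line\n";
        next;
    }
    my ( $item, $name, $position, $id ) = @fields;
    print "$id: $name is $position\n";
}
```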
