Regex Question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex Question by Ratazong (Monsignor) on Jan 22, 2013 at 11:47 UTC
The following could be a start: `$data =~ /^Subject.\\(.?)\*/; print "$1\n";` See here how it works. HTH, Rata
Re^2: Regex Question by Anonymous Monk on Jan 22, 2013 at 15:10 UTC
Ratazong, I think it is good practice to always check the result of the "match" (m//). In case the match fails, one can then know.
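For what it's worth, a minimal sketch of that check. The sample line is borrowed from the data shown elsewhere in this thread, and the assumption that the text after the asterisk is what is wanted is only a guess, since the question does not say. The name `after_star` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text after the '*' on a Subject line,
# or nothing (undef in scalar context) when the line does not match.
sub after_star {
    my ($line) = @_;

    # Only read $1 when the match succeeded; after a failed match it
    # still holds whatever an earlier successful match left behind.
    if ( $line =~ /^Subject:.*\*(.*)$/ ) {
        return $1;
    }

    warn "no match for: $line\n";
    return;
}

my $data = 'Subject: 4 Details - MICHAEL4 NICHOLSON4 - *Senior S4 (4)';
my $out  = after_star($data);
print "captured: $out\n" if defined $out;    # captured: Senior S4 (4)
```

Without the test, a failed match goes unnoticed and the following `print` shows stale or undefined data.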
Re^3: Regex Question by ww (Archbishop) on Jan 22, 2013 at 19:23 UTC
The link posted in Re^2 by AnonyMonk points to an elderly post which links to the (sorely outdated) docs here on-site. A better way to do this is `[docs://perlre]`, which produces a link, perlre, to the perlre page at `http://perldoc.perl.org/` and the current docs. Though, I suppose, if directing one's suggestions to someone who hasn't updated/upgraded his or her installed Perl since 1999, the Monastery docs might be more appropriate.
Re: Regex Question by Matt™ (Acolyte) on Jan 22, 2013 at 11:48 UTC
`if (($data =~ /Subject:(.)\\(.)\*/i)) { $out = trim($2); } print $out; exit;` hope this helps... Note: Sorry, Remove '&trim' or use an appropriate trim subroutine
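If you keep the `trim` call, one possible trim subroutine looks like the sketch below. It is only an illustration of "an appropriate trim subroutine", not necessarily the one the poster had in mind.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace from a copy
# of the argument and return the copy.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

print '[', trim("  Senior S4 (4) \n"), "]\n";    # [Senior S4 (4)]
```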
Re: Regex Question by tmharish (Friar) on Jan 22, 2013 at 14:44 UTC
I agree with BillKSmith and ww. but ...

    use 5.010 ;
    use strict ;
    use warnings ;
    use Data::Dump qw( dump ) ;

    my @data = @{ _get_data() } ;
    my @clean_data ;
    my @removed_data ;

    foreach my $data_elem ( @data ) {
        # "Subject: 4 Details - MICHAEL4 NICHOLSON4 - *Senior S4 (4)",
        # Captures: label, item, name, position (after the '*'), id (in parentheses)
        if( $data_elem =~ /^(.*?):(.*?)-(.*?)-.*?\*(.*?)\s*\((.*?)\)$/ ) {
            my $subject  = $1 ;
            my $item     = $2 ;
            my $name     = $3 ;
            my $position = $4 ;
            my $id       = $5 ;

            unless( $subject eq 'Subject' ) {
                warn( "Subject not 'Subject', Ignoring ... \n" ) ;
                next ;
            }

            push @clean_data, [ $subject, $item, $name, $id ] ;
            push @removed_data, $position ;
        }
        else {
            warn( "Unknown data format - Ignoring\n" ) ;
        }
    }

    dump( \@data ) ;
    dump( \@clean_data ) ;
    dump( \@removed_data ) ;

    exit() ;

    # Build test data: roughly half the lines start with 'Subject', the rest with 'NoSub'
    sub _get_data {
        my @data ;
        foreach my $data_item ( 0 .. 10 ) {
            if( rand() < .5 ) {
                push @data, "Subject: $data_item Details - MICHAEL$data_item NICHOLSON$data_item - *Senior S$data_item ($data_item)" ;
            }
            else {
                push @data, "NoSub: $data_item Details - MICHAEL$data_item NICHOLSON$data_item - *Senior S$data_item ($data_item)" ;
            }
        }
        return \@data ;
    }

OUTPUT ( Note there is a 'rand' in there so you might not see this )

    Subject not 'Subject', Ignoring ...
    Subject not 'Subject', Ignoring ...
    Subject not 'Subject', Ignoring ...
    Subject not 'Subject', Ignoring ...
    [
      "NoSub: 0 Details - MICHAEL0 NICHOLSON0 - *Senior S0 (0)",
      "Subject: 1 Details - MICHAEL1 NICHOLSON1 - *Senior S1 (1)",
      "Subject: 2 Details - MICHAEL2 NICHOLSON2 - *Senior S2 (2)",
      "Subject: 3 Details - MICHAEL3 NICHOLSON3 - *Senior S3 (3)",
      "Subject: 4 Details - MICHAEL4 NICHOLSON4 - *Senior S4 (4)",
      "NoSub: 5 Details - MICHAEL5 NICHOLSON5 - *Senior S5 (5)",
      "NoSub: 6 Details - MICHAEL6 NICHOLSON6 - *Senior S6 (6)",
      "NoSub: 7 Details - MICHAEL7 NICHOLSON7 - *Senior S7 (7)",
      "Subject: 8 Details - MICHAEL8 NICHOLSON8 - *Senior S8 (8)",
      "Subject: 9 Details - MICHAEL9 NICHOLSON9 - *Senior S9 (9)",
      "Subject: 10 Details - MICHAEL10 NICHOLSON10 - *Senior S10 (10)",
    ]
    [
      ["Subject", " 1 Details ", " MICHAEL1 NICHOLSON1 ", 1],
      ["Subject", " 2 Details ", " MICHAEL2 NICHOLSON2 ", 2],
      ["Subject", " 3 Details ", " MICHAEL3 NICHOLSON3 ", 3],
      ["Subject", " 4 Details ", " MICHAEL4 NICHOLSON4 ", 4],
      ["Subject", " 8 Details ", " MICHAEL8 NICHOLSON8 ", 8],
      ["Subject", " 9 Details ", " MICHAEL9 NICHOLSON9 ", 9],
      ["Subject", " 10 Details ", " MICHAEL10 NICHOLSON10 ", 10],
    ]
    [
      "Senior S1",
      "Senior S2",
      "Senior S3",
      "Senior S4",
      "Senior S8",
      "Senior S9",
      "Senior S10",
    ]
Re: Regex Question by ww (Archbishop) on Jan 22, 2013 at 14:18 UTC
I'm with Bill -- I'm not able to be certain about what you want as output (where "output" is more or less synonymous with "take out" which is -- maybe -- roughly equivalent to "capture" where the word "capture" is a hint, if you choose to read perlretut and company, to try to answer your own (lazy) question). Correct that guess, please, or clarify your requirements -- with a more coherent description and an example. And please read The Perl Monks Guide to the Monastery and On asking for help & How do I post a question effectively? -- for guidance on when and how best to obtain help here.
Re: Regex Question by BillKSmith (Monsignor) on Jan 22, 2013 at 14:01 UTC
What do you mean by "take out"? Do you wish to remove the matching characters from the original string or do you want to assign them to another variable? Bill
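To make the two readings concrete, here is a rough sketch of each. The sample line is taken from the data posted elsewhere in this thread; which part of it should be "taken out" is an assumption (the text between the asterisk and the opening parenthesis), and the names `capture_position` and `remove_position` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the text between '*' and '(' into a new value.
# The original string is left untouched.
sub capture_position {
    my ($line) = @_;
    my ($position) = $line =~ /\*(.*?)\s*\(/;
    return $position;    # undef when the pattern does not match
}

# Hypothetical helper: return a copy of the line with that text deleted.
sub remove_position {
    my ($line) = @_;
    ( my $copy = $line ) =~ s/\*.*?\s*(?=\()//;
    return $copy;
}

my $data = 'Subject: 4 Details - MICHAEL4 NICHOLSON4 - *Senior S4 (4)';
print capture_position($data), "\n";    # Senior S4
print remove_position($data),  "\n";    # Subject: 4 Details - MICHAEL4 NICHOLSON4 - (4)
```

The first uses a capture group in a plain match; the second uses `s///` on a copy, so the caller decides whether to overwrite the original.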
Re: Regex Question by sundialsvc4 (Abbot) on Jan 22, 2013 at 16:14 UTC
One thing that you need to keep in mind when building regexes like these is greediness. By default, a quantifier such as * or + will (“greedily ...”) match the longest available string that still lets the overall pattern match. You can specify that it should, instead, opt for the shortest one (by appending ? to the quantifier), and sometimes (depending on the nature of the string that is to be processed) you must do that.

I also strongly advocate that your programming should be suspicious of its input files. If you expect 5 strings, check each time that you have them. In short, for anything you believe to be certain in the correctly-running program given correct data, “be from Missouri ... show me.” Quite frankly, most of the time, I’ve encountered broken data. The supplier of the data didn’t know it was broken. “Inexplicable bugs” turned out to be from that cause. Only the computer itself is in the necessary position to recognize the existence of these issues ... take the slight extra time to make it do so.
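A small sketch of both points, using a sample line shaped like the data posted elsewhere in this thread. The helper names (`greedy_item`, `lazy_item`, `checked_fields`) and the choice of three " - "-separated fields are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Greedy: (.*) runs as far as it can, so it stops at the LAST " - ".
sub greedy_item {
    my ($line) = @_;
    my ($item) = $line =~ /^Subject:\s*(.*) - /;
    return $item;
}

# Non-greedy: (.*?) stops at the FIRST " - ".
sub lazy_item {
    my ($line) = @_;
    my ($item) = $line =~ /^Subject:\s*(.*?) - /;
    return $item;
}

# Be suspicious of the input: insist on the expected number of fields.
sub checked_fields {
    my ($line) = @_;
    my @fields = split / - /, $line;
    die 'expected 3 fields, got ' . scalar(@fields) . ": $line\n"
        unless @fields == 3;
    return @fields;
}

my $line = 'Subject: 4 Details - MICHAEL4 NICHOLSON4 - *Senior S4 (4)';
print 'greedy: ', greedy_item($line), "\n";    # greedy: 4 Details - MICHAEL4 NICHOLSON4
print 'lazy:   ', lazy_item($line),   "\n";    # lazy:   4 Details
print scalar( my @f = checked_fields($line) ), " fields\n";    # 3 fields
```

Dying on a wrong field count is one choice; a `warn` followed by `next` would suit a loop that should keep going past bad records.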

