multiple OR match fails

zzgulu has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: multiple OR match fails by jwkrahn (Abbot) on Jan 31, 2012 at 03:33 UTC
`while(<IN>) { undef ($/); $string=$_;` Because you undef $/ inside the loop, the first time through the loop $_ contains only the first line of the file, and the second time through it contains all the rest of the file. Did you really want to process the file in two chunks like that?
Re^2: multiple OR match fails by bimleshsharma (Beadle) on Jan 31, 2012 at 10:19 UTC
Yes. $/ is the input record separator, newline by default. $/ may be set to a value longer than one character in order to match a multi-character delimiter. If $/ is undefined, no record separator is matched, and <FILEHANDLE> will read everything to the end of the current file as a single record.
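A minimal sketch of the slurp behaviour described in the two replies above, assuming the goal is to read each file as one string. The sample text and the in-memory filehandle are stand-ins for illustration, not part of the original code. Undefining $/ with `local` before the read, rather than inside the read loop, gives a single chunk and restores the default afterwards.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for one input file.
my $text = "FINDINGS: none\nCOMPLICATIONS: none\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open my $in, '<', \$text or die "can't open in-memory file: $!";

# local limits the change to $/ to this block; with $/ undefined,
# a single <$in> returns everything up to end of file as one string.
my $content = do { local $/; <$in> };
close $in;

# Outside the do-block $/ is back to its default, a newline.
my $restored = defined $/ && $/ eq "\n";

print length($content), " characters read in one chunk\n";
```

With this arrangement there is no outer `while(<IN>)` loop at all, so the "first line, then the rest" split cannot occur.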
Re: multiple OR match fails by JavaFan (Canon) on Jan 31, 2012 at 02:52 UTC
It only finds one match, because you only ask it to match once. Use a while loop if you want to find all the matches.
Re: multiple OR match fails by InfiniteSilence (Curate) on Jan 31, 2012 at 03:16 UTC
It would have been nice to include at least some piece of your input file. Your regex looks like it is missing some parentheses or something. Also, if your code is this short, you might consider one-lining it: `~linux> perl -ne 'if(m/((FINDINGS|COMPLICATIONS):(.?)([A-Z]+))/sgm){print qq|$2$3\t$4\n|};' foo7.txt` Celebrate Intellectual Diversity*
Re^2: multiple OR match fails by ikegami (Patriarch) on Jan 31, 2012 at 06:16 UTC
`if (//g)` should be `while (//g)`.
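A small sketch of the difference, using a made-up two-section string (the sample data and variable names are assumptions for illustration). In scalar context a `//g` match finds one occurrence per evaluation and remembers where it stopped, so `if` sees a single match while `while` keeps going until the pattern fails.

```perl
use strict;
use warnings;

# Hypothetical two-section sample.
my $string = "FINDINGS: a\nCOMPLICATIONS: b\n";

# if (//g) evaluates the match once, so at most one title is collected.
my @once;
if ($string =~ /(FINDINGS|COMPLICATIONS):/g) {
    push @once, $1;
}

# Reset the match position that the scalar //g above left behind.
pos($string) = 0;

# while (//g) resumes after the previous match until the pattern fails.
my @all;
while ($string =~ /(FINDINGS|COMPLICATIONS):/g) {
    push @all, $1;
}

print "if: @once\n";
print "while: @all\n";
```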
Re: multiple OR match fails by lune (Pilgrim) on Jan 31, 2012 at 13:33 UTC
The obvious part of your question is how to return all matches from a regex match. That can easily be done like this (I simplified your regex, as the missing parenthesis makes it unclear what you really want):

    while (<STDIN>) {
        # see previous answer
        # undef ($/);
        my $string = $_;
        my @matches = ($string =~ m/(FINDINGS|COMPLICATIONS|:.*)/g);
        print STDOUT "@matches \n";
    }

    echo "FINDINGS COMPLICATIONS :something" | t.pl

However, from your question it seems that what you really want is not just a list of matches but some sort of parsing, e.g. extracting the text of the section "FINDINGS". To answer this, it would be necessary to know where a section ends. If this is not what you wanted, please clarify.
Re^2: multiple OR match fails by zzgulu (Novice) on Jan 31, 2012 at 15:36 UTC
Thank you very much for your input, and sorry for the typo; one parenthesis was missing from the code.

My text files are operative notes, and each note consists of sections that start with a title at the beginning of a line, all in upper case and ending in a colon. Sections are usually separated by an empty line, although this may not always be the case. The input directory contains 1000 files, and my intention is to write the files back to an output directory with only the designated matched sections (title + content).

As recommended, adding a while loop to my matching regex seems to have fixed the issue, but please do advise me if you find other issues in the code. I seldom write code, but since I am working with text files, regexes are very powerful for occasional data extraction. I am sure there are much easier ways to code what I coded below.

This is a sample input file:

    PREOPERATIVE DIAGNOSIS: Left invasive cancer, positive margins.
    TITLE OF OPERATION: 1. Left needle-localized segmental mastectomy. 2. intraoperative axillary lymphatic mapping. 3. lymphadenectomy.
    ANESTHESIA: General.
    INDICATIONS FOR SURGERY: Invasive carcinoma with positive margins and residual calcifications.
    COMPLICATIONS : None.

And this is the code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $indir;
    my $file;
    my $new;
    my $string;
    my $outdir;
    $indir = 'C:/input';
    $outdir ='C:/output';
    if(-d $indir) {
    opendir(DIR, $indir) or die "can't open $!";
    }
    while ($file=readdir(DIR)) {
    my $fullpath=$indir.'/'.$file;
    open IN, "$indir/$file";
    $new= "$outdir/$file";
    open OUT, ">$new";
    while(<IN>) {
    undef ($/);
    $string=$_;
    while ($string =~m/(FINDINGS|COMPLICATIONS)(:)(.*?)(^[A-Z])/sgm) {
    print "processing $file\n";
    print OUT "$1$2\t$3";
    }
    }
    close IN;
    close OUT;
    }
    closedir(DIR);
    exit;
Re^3: multiple OR match fails by Marshall (Canon) on Jan 31, 2012 at 22:41 UTC
Since you asked for comments, I'll make a few:

- The main improvement is better indenting.
- if(-d $indir) was unnecessary.
- A readdir returns only the names (not full paths), and this will include any directories (including the . and .. ones!). It is common to use a grep to filter out the stuff that you don't want.
- Always check whether any kind of file operation succeeded or not.
- Declare variables when you actually use them the first time.

I didn't actually run this, so excuse me if I made a mistake.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $indir  = 'C:/input';
    my $outdir = 'C:/output';

    opendir(DIR, $indir) or die "can't open directory $indir $!";

    foreach my $file (grep { -f "$indir/$_" } readdir DIR) {
        open IN, '<', "$indir/$file" or die "can't open $indir/$file $!";
        my $new = "$outdir/$file";
        open OUT, '>', $new or die "can't open $new for output $!";

        while (my $string = <IN>) {
            undef ($/);
            while ($string =~ m/(FINDINGS|COMPLICATIONS)(:)(.*?)(^[A-Z])/sgm) {
                print "processing $file\n";
                print OUT "$1$2\t$3";
            }
        }
        close IN;
        close OUT;
    }
    closedir(DIR);

Update: these "close" statements aren't strictly necessary; all file handles will get closed when your program exits. When you open IN for the next file, this automatically closes the current IN file (if there is one). exit() wasn't necessary, so I took it out.
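Two details of the pattern `(FINDINGS|COMPLICATIONS)(:)(.*?)(^[A-Z])` may be worth a second look. The final group consumes the first letter of the next title, so a wanted section that directly follows another wanted section is skipped. It also requires a following title, so the last section of a file never matches. In addition, the sample note writes "COMPLICATIONS :" with a space before the colon, which `(:)` does not allow. The following is only a sketch of one possible alternative, not a tested replacement: it uses a lookahead so the next title is not consumed, accepts end of string as a section end, and tolerates whitespace before the colon. The sample note, the `%section` hash and the assumption that titles contain only upper-case letters and spaces are all illustrative.

```perl
use strict;
use warnings;

# Hypothetical note in the format described in the thread.
my $note = <<'END';
PREOPERATIVE DIAGNOSIS: Left invasive cancer, positive margins.

FINDINGS: Residual calcifications.

ANESTHESIA: General.

COMPLICATIONS : None.
END

# Titles to keep; an assumption for this sketch.
my $wanted = qr/FINDINGS|COMPLICATIONS/;

my %section;
# A section runs from a wanted title at the start of a line up to, but not
# including, the next all-caps "TITLE:" line or the end of the string.
while ($note =~ m/^($wanted)\s*:\s*(.*?)(?=^[A-Z][A-Z ]*:|\z)/smg) {
    my ($title, $body) = ($1, $2);
    $body =~ s/\s+\z//;    # drop trailing newlines and blank lines
    $section{$title} = $body;
}

print "$_:\t$section{$_}\n" for sort keys %section;
```

Because the lookahead does not consume anything, the next iteration starts exactly at the following title, so adjacent wanted sections are both found.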