Your script would be significantly more efficient if you detected the start and end of each extraction region while reading the file, something like this (assuming MEDICAL HISTORY: begins a line, and that the next line starting with a capital letter ends the region):
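#!/usr/bin/perl
use strict;
use warnings;

# Three-argument open with lexical handles; report the reason on failure
open my $in,  '<', 'input.txt'  or die "Cannot open input.txt: $!";
open my $out, '>', 'output.out' or die "Cannot open output.out: $!";

my $sHistory   = '';
my $bInHistory = 0;

while (my $line = <$in>) {
    if ($line =~ /^MEDICAL HISTORY:(.*)$/) {
        # A new region starts; flush the previous one if it is still open
        print {$out} $sHistory if $bInHistory;
        $bInHistory = 1;
        $sHistory   = "$1\n";    # $1 excludes the newline, so put it back
    } elsif ($bInHistory && $line =~ /^[A-Z]/) {
        # Any other line starting with a capital letter ends the region
        print {$out} $sHistory;
        $bInHistory = 0;
        $sHistory   = '';        # reset so the text is not printed again
    } elsif ($bInHistory) {
        $sHistory .= $line;
    }
}
# Flush a region that runs to the end of the file
print {$out} $sHistory if $bInHistory;
close $out or die "Cannot close output.out: $!";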
Also, it is a very good idea to start every script with these two lines:
use strict;
use warnings;
as I did above. You will save yourself a world of debugging pain by doing so.
Another point: the variables $a and $b have special meaning in Perl (sort sets them to the two elements being compared inside its comparison block), so it is best to stay away from those names and call your own variables something else.
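As a minimal sketch of why those names are special: sort assigns the two elements being compared to the package variables $a and $b, so a comparison block can use them without declaring them. The sub name sorted_numerically and the sample list @ages are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of numbers in ascending order.
# sort() itself sets $a and $b for each comparison; they are never
# declared with "my" here, and strict does not complain about them.
sub sorted_numerically {
    my @numbers = @_;
    return sort { $a <=> $b } @numbers;
}

my @ages = (42, 7, 19);    # made-up sample data
print join(', ', sorted_numerically(@ages)), "\n";    # prints "7, 19, 42"
```

If your own code declared a lexical $a or $b in the same scope, the comparison block would no longer see the values sort is setting, which is the kind of confusion that different variable names avoid.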
And another point: always check for errors when you open file handles. Sometimes they don't open as you expect, and if you don't check, you'll get strange results without any proper warning.
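A minimal sketch of such a check, assuming a hypothetical helper named open_for_reading and a made-up path that is assumed not to exist. The special variable $! holds the operating system's reason for the failure, so it is worth putting in the message.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with the
# file name and the system error message from $!.
sub open_for_reading {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path' for reading: $!\n";
    return $fh;
}

# eval traps the die so the failure can be reported here; the path is
# made up and is assumed not to exist.
my $fh = eval { open_for_reading('/no/such/dir/input.txt') };
print "open failed: $@" unless $fh;
```

In a short script you would normally just let the die end the program; the eval is only there to show what the message looks like.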
Best, beth
Update: fixed some bugs (including one pointed out in private msg by almut).