Re: Help with negative look ahed

in reply to Help with negative look ahed

In your particular case, a regex seems to be overkill. Since the text has "ITEM " only in headings and nowhere else, and that seems to be where you want to split, it's easier to use split:

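use strict;
use warnings;
use File::Slurp qw<slurp>;

# read the whole file into one string
my $text = slurp("a5927574.txt");
# split on the heading marker; the marker itself is dropped from each piece
my @items = split /ITEM /, $text;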

# print first item, as it is
print shift @items;

# prepend "ITEM " to the rest and print
print "ITEM $_" for @items;


Re^2: Help with negative look ahed by davido (Cardinal) on Oct 19, 2012 at 06:57 UTC
It seems likely, looking at his regex, that he's unsure whether "ITEM" is capitalized uniformly across his full data set. The sample he provided us is uniform, but why else would he go to all the trouble of writing alternations like `(?:Item|ITEM)` several times? If he's unable to depend on an all-caps "ITEM" as a delimiter, your solution won't be any more robust than his current one. Dave
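If the capitalization does vary, split can still do the job with a case-insensitive zero-width lookahead, which also keeps each heading exactly as written instead of dropping it. This is only a sketch: the sample text below is made up, and it assumes every heading is some capitalization of "Item" followed by a space.

```perl
use strict;
use warnings;

# Made-up sample; the real filings may look different.
my $text = "Preamble\nITEM 1. Business\nfoo\nItem 2. Properties\nbar\n";

# The lookahead matches the empty string just before "Item "/"ITEM ",
# so nothing is consumed and each heading stays attached to its chunk.
my @items = split /(?=\bitem )/i, $text;

print "[$_]" for @items;
```

Because the delimiter is never consumed, there is no need to prepend "ITEM " again when printing.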
Re^2: Help with negative look ahed by eversuhoshin (Sexton) on Oct 18, 2012 at 19:55 UTC
Hey PrakashK, thanks for the quick reply. The issue is that I still don't know whether I matched the real end of Item 1. Also, I have a bunch of other SEC files that mention Item 3 inside Item 1. This is the hardest part because I am matching almost pure text. The ones with HTML I was able to match more easily. Anyway, thank you for your reply!
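One possible way to keep an in-text mention such as "Item 3" from being treated as a section boundary is to split only where the heading starts a line. The sketch below rests on an assumption that may not hold for every filing: real headings begin a line (optionally after spaces or tabs) and are followed by a number, while cross-references appear mid-line. The sample text is invented, and a wrapped line that happens to begin with "Item 3" would still be misread as a heading.

```perl
use strict;
use warnings;

# Invented sample: "Item 3" is mentioned inside the body of Item 1.
my $text = "ITEM 1. BUSINESS\nSee Item 3 for legal matters.\n"
         . "ITEM 2. PROPERTIES\nNone.\n";

# With /m, ^ matches at the start of every line, so only line-initial
# headings split the text; the mid-line "Item 3" is left alone.
my @sections = split /^(?=[ \t]*item\s+\d)/mi, $text;

# Pick out the section whose heading is Item 1.
my ($item1) = grep { /\A[ \t]*item\s+1\b/i } @sections;

print $item1;
```

With this approach the end of Item 1 is simply wherever the next line-initial heading begins.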
