Re: Extract Tags between Two strings

by Athanasius (Chancellor)
on Jun 11, 2012 at 14:58 UTC

in reply to Extract Tags between Two strings

The following makes a single pass over the input file. I have used the in-file DATA section for convenience; you will, of course, have to change this to read from your data file:

use strict;
use warnings;
use feature 'say';    # needed for say()

my (%tags_to_match, @extracted_tags);
my $in_matching = 0;
my $tag_prefix  = 'bbc_';
my $tag_regex   = qr{ ( $tag_prefix \w+ _ \d+ ) }x;

while (my $line = <DATA>) {
    if ($in_matching) {
        # Inside a [start]/[end] block: record the tags to look for
        if ($line =~ / ^ \s* \[ end \] \s* $ /x) {
            $in_matching = 0;
        }
        elsif ($line =~ $tag_regex) {
            $tags_to_match{ $1 }++;
        }
    }
    elsif ($line =~ / ^ \s* \[ start \] \s* $ /x) {
        $in_matching = 1;
    }
    elsif ($line =~ $tag_regex) {
        # Outside a block: keep the tag only if the block listed it
        my $tag = $1;
        push @extracted_tags, $tag if exists $tags_to_match{ $tag };
    }
}

say "\@extracted_tags = ", join(', ', @extracted_tags);

__DATA__
[start]
bbc_arc_001
bbc_arc_002
abc_arc_001
[end]
bbc_arc_001
bbc_arc_002
bbc_arc_003
bbc_arc_004

This should work provided the tags to be extracted always appear after the start/end block in which they are specified. If this is not the case for your input file, you will need to make two passes over the file: the first to read the contents of the start/end block(s), the second to extract the specified tags.
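For the two-pass case, here is a minimal sketch, not tested against your real data. It assumes the whole file fits in memory as an array of lines; the helper name extract_tags and the sample input are my own inventions for illustration. The first pass collects the tags named inside every start/end block, and the second pass extracts those tags from the lines outside the blocks, wherever they occur:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the tags named inside [start]/[end] blocks
# that also occur outside those blocks, regardless of ordering.
sub extract_tags {
    my ($lines, $tag_regex) = @_;

    # Pass 1: collect the tags listed inside every [start]/[end] block.
    my (%wanted, $in_block);
    for my $line (@$lines) {
        if    ($line =~ /^\s*\[start\]\s*$/)         { $in_block = 1 }
        elsif ($line =~ /^\s*\[end\]\s*$/)           { $in_block = 0 }
        elsif ($in_block && $line =~ $tag_regex)     { $wanted{$1} = 1 }
    }

    # Pass 2: extract the wanted tags from lines outside the blocks.
    my @extracted;
    $in_block = 0;
    for my $line (@$lines) {
        if    ($line =~ /^\s*\[start\]\s*$/) { $in_block = 1 }
        elsif ($line =~ /^\s*\[end\]\s*$/)   { $in_block = 0 }
        elsif (!$in_block && $line =~ $tag_regex && $wanted{$1}) {
            push @extracted, $1;
        }
    }
    return @extracted;
}

# Made-up sample input in which some tags precede the block that names them.
my @lines = map { "$_\n" }
    qw( bbc_arc_001 bbc_arc_003 [start] bbc_arc_001 bbc_arc_002 [end] bbc_arc_002 );

my @tags = extract_tags(\@lines, qr{ ( bbc_ \w+ _ \d+ ) }x);
print join(', ', @tags), "\n";    # prints: bbc_arc_001, bbc_arc_002
```

For a file too large to hold in memory, the same two loops could instead read the filehandle twice, rewinding it with seek between the passes.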

Also note that your regex may not be doing what you wanted. [a-zA-Z]*[0-9]*_* means: zero or more letters, followed by zero or more digits, followed by zero or more underscores. In my code I use a regex which is a guess at what was intended.
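To see the difference, here is a small sketch comparing the two patterns. The helper first_match and the test strings are mine, chosen only for illustration. Because every part of the original pattern is optional, it succeeds on any input, even when it matches nothing but the empty string:

```perl
use strict;
use warnings;

my $loose  = qr/[a-zA-Z]*[0-9]*_*/;        # the pattern from the question
my $strict = qr/ bbc_ \w+ _ \d+ /x;        # my guess at what was intended

# Hypothetical helper: returns the first substring matched, or undef.
sub first_match {
    my ($regex, $text) = @_;
    return $text =~ /($regex)/ ? $1 : undef;
}

for my $text ('bbc_arc_001', '!!!') {
    for my $pair ([ loose => $loose ], [ strict => $strict ]) {
        my ($name, $regex) = @$pair;
        my $match = first_match($regex, $text);
        printf "%-11s %-6s %s\n", $text, $name,
            defined $match ? "matched '$match'" : 'no match';
    }
}
# On 'bbc_arc_001' the loose pattern stops after 'bbc_';
# on '!!!' it still "matches", but only the empty string.
```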


Athanasius <°(((><contra mundum
