
Re: Extract Tags between Two strings

by Athanasius (Chancellor)
on Jun 11, 2012 at 14:58 UTC (#975586)

in reply to Extract Tags between Two strings

The following makes a single pass over the input file. (For convenience I have read the input from the in-file DATA section; you will, of course, have to change this to read from your data file.)

    use strict;
    use warnings;
    use feature 'say';    # say() must be enabled explicitly (or via "use 5.010;")

    my (%tags_to_match, @extracted_tags);
    my $in_matching = 0;
    my $tag_prefix  = 'bbc_';
    my $tag_regex   = qr{ ( $tag_prefix \w+ _ \d+ ) }x;

    while (my $line = <DATA>) {
        if ($in_matching) {
            # Inside a [start]/[end] block: record each tag it names
            if ($line =~ / ^ \s* \[ end \] \s* $ /x) {
                $in_matching = 0;
            }
            elsif ($line =~ $tag_regex) {
                $tags_to_match{ $1 }++;
            }
        }
        elsif ($line =~ / ^ \s* \[ start \] \s* $ /x) {
            $in_matching = 1;
        }
        elsif ($line =~ $tag_regex) {
            # Outside a block: keep the tag only if a block named it
            my $tag = $1;
            foreach (keys %tags_to_match) {
                if ($tag eq $_) {
                    push @extracted_tags, $tag;
                    last;
                }
            }
        }
    }

    say "\@extracted_tags = ", join(', ', @extracted_tags);

    __DATA__
    [start]
    bbc_arc_001
    bbc_arc_002
    abc_arc_001
    [end]
    bbc_arc_001
    bbc_arc_002
    bbc_arc_003
    bbc_arc_004
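To make it easier to switch from DATA to your own file, here is a minimal sketch of the same single-pass logic wrapped in a sub that reads from any filehandle. The sub name `extract_tags` and its arguments are hypothetical, chosen for illustration. It also replaces the inner `foreach` with an `exists` lookup on the hash, which gives the same result without scanning every key. The usage below opens an in-memory string as a filehandle so the example is self-contained; for a real file you would open the path instead.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns tags (with the given prefix) that appear
# after a [start]/[end] block which names them. Works on any filehandle.
sub extract_tags {
    my ($fh, $tag_prefix) = @_;
    my $tag_regex = qr{ ( \Q$tag_prefix\E \w+ _ \d+ ) }x;
    my (%wanted, @extracted);
    my $in_block = 0;

    while (my $line = <$fh>) {
        if ($in_block) {
            if    ($line =~ / ^ \s* \[ end \] \s* $ /x) { $in_block = 0 }
            elsif ($line =~ $tag_regex)                 { $wanted{$1}++ }
        }
        elsif ($line =~ / ^ \s* \[ start \] \s* $ /x) {
            $in_block = 1;
        }
        elsif ($line =~ $tag_regex and exists $wanted{$1}) {
            push @extracted, $1;    # hash lookup replaces the inner foreach
        }
    }
    return @extracted;
}

# Usage with an in-memory "file". For a real file, use instead:
#   open my $fh, '<', $path or die "Cannot open $path: $!";
my $text = join "\n", '[start]', 'bbc_arc_001', 'bbc_arc_002', 'abc_arc_001',
                      '[end]', 'bbc_arc_001', 'bbc_arc_002', 'bbc_arc_003',
                      'bbc_arc_004', '';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
say join ', ', extract_tags($fh, 'bbc_');    # prints: bbc_arc_001, bbc_arc_002
```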

This should work provided the tags to be extracted always appear after the start/end block in which they are specified. If this is not the case for your input file, you will need to make two passes over the file: the first to read the contents of the start/end block(s), the second to extract the specified tags.
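A two-pass version might look like the sketch below. It is an illustration, not tested against your real data: the sub name `extract_tags_two_pass` is hypothetical, and it assumes the input is a seekable filehandle (a regular file or an in-memory string), since it rewinds with `seek` between passes. Pass 1 collects every tag named inside any [start]/[end] block; pass 2 extracts matching tags found outside the blocks, wherever they occur.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical two-pass variant; $fh must be seekable.
sub extract_tags_two_pass {
    my ($fh, $tag_prefix) = @_;
    my $tag_regex   = qr{ ( \Q$tag_prefix\E \w+ _ \d+ ) }x;
    my $start_regex = qr{ ^ \s* \[ start \] \s* $ }x;
    my $end_regex   = qr{ ^ \s* \[ end \] \s* $ }x;

    # Pass 1: read the contents of the start/end block(s).
    my %wanted;
    my $in_block = 0;
    while (my $line = <$fh>) {
        if    ($line =~ $start_regex) { $in_block = 1 }
        elsif ($line =~ $end_regex)   { $in_block = 0 }
        elsif ($in_block and $line =~ $tag_regex) { $wanted{$1} = 1 }
    }

    # Pass 2: rewind, then extract the specified tags outside the blocks.
    seek $fh, 0, 0 or die "Cannot rewind: $!";
    my @extracted;
    $in_block = 0;
    while (my $line = <$fh>) {
        if    ($line =~ $start_regex) { $in_block = 1 }
        elsif ($line =~ $end_regex)   { $in_block = 0 }
        elsif (!$in_block and $line =~ $tag_regex and exists $wanted{$1}) {
            push @extracted, $1;
        }
    }
    return @extracted;
}

# The first bbc_arc_001 appears *before* the block that names it,
# so the single-pass version would miss it.
my $text = join "\n", 'bbc_arc_001', 'bbc_arc_003', '[start]', 'bbc_arc_001',
                      '[end]', 'bbc_arc_001', '';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
say join ', ', extract_tags_two_pass($fh, 'bbc_');    # prints: bbc_arc_001, bbc_arc_001
```

The cost is reading the file twice; the benefit is that the order of blocks and tags in the file no longer matters.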

Also note that your regex may not be doing what you wanted. [a-zA-Z]*[0-9]*_* means: zero or more letters, followed by zero or more digits, followed by zero or more underscores. Because every part is optional, it can also match the empty string, so it succeeds against any line at all. The regex in my code is a guess at what was intended.
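A quick way to see the difference is to run both patterns over a few sample strings. In this sketch, `$original` is the regex from the question and `$guessed` is my guess from the code above with the 'bbc_' prefix written in; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Regex from the question: every part is optional, so it can match ''.
my $original = qr/[a-zA-Z]*[0-9]*_*/;

# Guess at the intent: prefix, a word, an underscore, then digits.
my $guessed = qr/(bbc_\w+_\d+)/;

for my $s ('bbc_arc_001', 'abc_arc_001', '[start]', '') {
    my ($m) = $s =~ /($original)/;    # always succeeds, often capturing ''
    my $g   = $s =~ $guessed ? $1 : '(no match)';
    printf "%-14s original: '%s'  guessed: %s\n", "'$s'", $m, $g;
}
```

Note that on 'bbc_arc_001' the original pattern stops at 'bbc_', because after the underscores there is nothing left in the pattern to consume 'arc_001'.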


Athanasius <°(((><contra mundum
