
Re: Automatic Subsequent Indexing.

by muba (Priest)
on Jun 22, 2012 at 15:28 UTC (#977862)

in reply to Automatic Subsequent Indexing.

Something like this, you mean?

use strict;
use warnings;

my $offset = 0;    # Where are we in the string?
my $string = do {  # Grab the string.
    local $/;
    <DATA>;
};
my $numResults = 0;

while (1) {
    my $idxSummary = index($string, "SUMMARY", $offset);
    my $result = "";

    if ($idxSummary > -1) {
        $offset = $idxSummary + length("SUMMARY");
        my $idxDescription = index($string, "DESCRIPTION", $offset);

        if ($idxDescription == -1) {
            print "(Data malformed: missing DESCRIPTION line.)\n";
            last;
        }

        my $length = $idxDescription - $offset;
        $result = substr($string, $offset, $length);
        $offset = $idxDescription + length("DESCRIPTION");

        $result =~ s/^\s+|\s+$//g;  # Strip leading and trailing whitespace,
                                    # including newlines.
        $numResults++;
    } else {
        print "(All done. $numResults result(s) found.)\n";
        last;
    }

    print " <$result>\n";
}

__DATA__
This is bogus data
SUMMARY
Event 1
DESCRIPTION
Lorem ipsum etc etc
This is bogus data
SUMMARY
Event 2
DESCRIPTION
Lorem ipsum
This is bogus data
SUMMARY
The Third Event
DESCRIPTION
Lorem ipsum
This is bogus data
SUMMARY
Event Number Four
DESCRIPTION
Lorem ipsum

It gives this output:

 <Event 1>
 <Event 2>
 <The Third Event>
 <Event Number Four>
(All done. 4 result(s) found.)
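If you do want to keep the indexes around for later, the same index()/substr() walk can be wrapped in a sub that returns each offset together with the text found there. This is only a rough sketch: the name find_between and the hash keys offset and text are made up for illustration, and it silently stops at a start marker with no end marker instead of printing a warning.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collect every piece of
# text found between $start and $end, plus the offset right after $start.
sub find_between {
    my ($string, $start, $end) = @_;
    my @found;
    my $offset = 0;

    while (1) {
        my $from = index($string, $start, $offset);
        last if $from == -1;               # no more start markers
        $from += length $start;            # skip past the marker itself

        my $to = index($string, $end, $from);
        last if $to == -1;                 # start marker without an end marker

        my $text = substr($string, $from, $to - $from);
        $text =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
        push @found, { offset => $from, text => $text };

        $offset = $to + length $end;       # resume after the end marker
    }
    return @found;
}

my $sample = "junk SUMMARY Event 1 DESCRIPTION foo junk SUMMARY Event 2 DESCRIPTION bar";
printf "%3d: <%s>\n", $_->{offset}, $_->{text}
    for find_between($sample, "SUMMARY", "DESCRIPTION");
```

Each subsequent search starts from where the previous end marker finished, so no match is ever found twice.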

Update: Alternatively, if it's not really the indexes you care about, but only capturing those titles, how about this?

my @results = $string =~ m/SUMMARY\s*(.+?)\s*DESCRIPTION/g;
print map { " <$_>\n" } @results;
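One caveat with the regex above: without the /s modifier, . does not match a newline, so it only captures titles that fit on one line. If the titles can wrap, and you still want the positions, something along these lines might do. It's a sketch only: summaries_with_offsets is a made-up name, and the sample string is invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, title] pairs using a global match.
sub summaries_with_offsets {
    my ($string) = @_;
    my @pairs;

    # /s lets . match newlines, so a title may span several lines.
    while ($string =~ m/SUMMARY\s*(.+?)\s*DESCRIPTION/gs) {
        # $-[1] is the offset at which capture group 1 starts.
        push @pairs, [ $-[1], $1 ];
    }
    return @pairs;
}

my $text = "SUMMARY\nEvent 1\nDESCRIPTION\nLorem\n"
         . "SUMMARY\nA two-line\ntitle\nDESCRIPTION\nIpsum\n";

for my $pair (summaries_with_offsets($text)) {
    my ($offset, $title) = @$pair;
    $title =~ s/\s+/ /g;    # collapse internal newlines for display
    print "$offset: <$title>\n";
}
```

Mind that with /s a SUMMARY that has lost its DESCRIPTION will swallow everything up to the next DESCRIPTION it can find, so this is less forgiving of malformed data than the index() loop.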

