
I'm trying to create a string tokenizer for a config file parser, and the best I've managed to come up with is this:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my $line = q[keyword1 value keyword2 "value with spaces" keyword3 value];

print Dumper tokenize_line($line);

sub tokenize_line {
    my $line = shift;

    my @tokens;
    while ($line =~ /(\S+)/g) {
        # every non-space match is a token
        push @tokens, $1;

        # anything in double-quotes is a single token
        if ($line =~ /\G\s*"(.+?)"/) {
            push @tokens, $1;
            # keep only the text after the closing quote; assigning to $line
            # also resets pos(), so the //g loop restarts on the remainder
            $line = $';
        }
    }

    return \@tokens;
}

which outputs this:

$VAR1 = [
          'keyword1',
          'value',
          'keyword2',
          'value with spaces',
          'keyword3',
          'value'
        ];

I know it's an ugly hack to replace the original string with whatever follows the match ($line = $';), but my previous attempts used split and substr to get the same result... and those were even uglier :)
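For reference, a minimal change that keeps the same loop but drops $' is to slice the remainder out with substr and the @+ array. perlvar documents substr($var, $+[0]) as equivalent to $', where $+[0] is the offset just past the end of the last successful match. This matters mainly on perls older than 5.20, where mentioning $' anywhere in a program slows down every successful regex match. The sketch below is the same algorithm as above, not a different tokenizer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Same algorithm as the original; only the $' line is changed.
sub tokenize_line {
    my $line = shift;

    my @tokens;
    while ($line =~ /(\S+)/g) {
        # every non-space match is a token
        push @tokens, $1;

        # anything in double-quotes is a single token
        if ($line =~ /\G\s*"(.+?)"/) {
            push @tokens, $1;
            # $+[0] is the offset just past the last successful match,
            # so this is equivalent to $line = $' (and also resets pos)
            $line = substr($line, $+[0]);
        }
    }

    return \@tokens;
}

my $config_line = q[keyword1 value keyword2 "value with spaces" keyword3 value];
print Dumper tokenize_line($config_line);
```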
What would be a better way to write this? I'll be parsing a few hundred lines from a config file, so I'd like to avoid any performance penalty. Thank you all for your time and advice!
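One common alternative, offered as a sketch rather than something tested against a real config format, is to let a single regex alternation do the work. Each //g match is either a double-quoted string (its inner text captured in $1) or a run of non-space characters (captured in $2), so there is no need to rewrite $line or touch $' at all. The quoting rules here are assumptions: there are no escaped quotes inside values, an unterminated quote falls back to an ordinary \S+ token, and "" yields an empty token.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

sub tokenize_line {
    my ($line) = @_;

    my @tokens;
    # Each //g match is EITHER a double-quoted string (inner text in $1)
    # OR a run of non-space characters ($2); whitespace between tokens
    # matches neither alternative, so the regex engine just skips it.
    # Assumes no escaped quotes inside quoted values.
    while ($line =~ /"([^"]*)"|(\S+)/g) {
        push @tokens, defined $1 ? $1 : $2;
    }

    return \@tokens;
}

my $config_line = q[keyword1 value keyword2 "value with spaces" keyword3 value];
print Dumper tokenize_line($config_line);
```

Unlike the original loop, this also handles a quoted value at the very start of a line or directly after another quoted value. The same tokens can be collected in one statement with my @tokens = grep { defined } $line =~ /"([^"]*)"|(\S+)/g; because in list context each match returns both captures, one of which is undef. For a few hundred lines either version should be quick; the core Benchmark module can confirm that on real data.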

:)))))

In reply to Re^2: use English; and performance by asz
in thread use English; and performance by asz

	PerlMonks