PerlMonks  

Reinventing wheels: query string parsing.

by BUU (Prior)
on Jul 27, 2003 at 19:48 UTC

BUU has asked for the wisdom of the Perl Monks concerning the following question:

When designing my latest web application thingy, I decided on rather unique (as far as I know) query string parsing requirements. Basically, the value of each 'o' argument becomes a key, and that key is associated with the values of all the 'd' arguments that immediately follow it. That is, instead of this query string: o=foo&d=baz&d=bar&o=one&d=uno&o=none&totallyrandom=foo being parsed like so:
{ 'totallyrandom' => 'foo', 'd' => [ 'baz', 'bar', 'uno' ], 'o' => [ 'foo', 'one', 'none' ] };
which is the default for CGI.pm and other parsers, I want it to be parsed as follows:
{ 'none' => undef, 'one' => 'uno', 'foo' => [ 'baz', 'bar' ] };
I realize these requirements are a little odd, but I feel they suit my application very well and will be easy to use. However, they require me to write my own query string parser. I have done so, and I present it here in the hope that some monk can spot an error or something that's not optimal, so I can fix it before it goes 'in production'. It's used like this:

my $u = new QueryParse;
my %args = $u->handle( $ENV{QUERY_STRING} );

package QueryParse;
use strict;

sub new { return bless {}; }

sub handle {
    my $self = shift;
    my $query_string = shift;
    if( $query_string eq '') {
        local $/ = undef;
        $query_string = <STDIN>;
    }
    #o=foo & d=baz & d=qux & o=n & d=o & d=f
    my @query_string = split/[;&]/,$query_string;
    my %arg;
    for( my $i = 0; $i < @query_string; $i++ ) {
        $_ = $query_string[ $i ];
        next unless /^[oO]=/;
        my $o = ( split/=/ )[ 1 ];
        my $j = $i + 1;
        my @opts;
        while( $j < @query_string ) {
            $_ = $query_string[ $j ];
            last unless /^[dD]=/;
            my $dat = ( split/=/ )[ 1 ];
            push @opts,$dat;
            $j++;
        }
        for( @opts ) {
            s /
                % ( [0-9A-Fa-f]{2}   # match hex escapes ( %2b )
                  )
            /
                chr( hex( $1 ) )     # convert in to ascii chars
            /xeg;                    # ignore whitespace, execute code, repeat
        }
        $o =~ s/%([0-9A-Fa-f]{2})/chr( hex( $1 ) )/eg; # same as above
        $arg{ $o } = @opts > 1 ? [ @opts ] : $opts[ 0 ];
    }
    return %arg;
}
1;

Two notes. First, I'm deliberately ignoring options passed via POST if options are already being passed via GET. This lets me use options in the query string to specify that, say, a file is being uploaded, and then use some other module to handle the upload and interface with it.
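On the POST fallback: handle() slurps STDIN until EOF when the query string is empty. The CGI specification (RFC 3875) says a script must not read more than CONTENT_LENGTH bytes of the request body, so a more conventional fallback reads exactly that many bytes. A minimal sketch; read_query is a hypothetical name, and passing the environment hash and body filehandle as arguments (rather than using %ENV and STDIN directly) is an assumption made so it can be tested.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw parameter string for a CGI request.
# Prefer QUERY_STRING (GET); fall back to the request body (POST) only when
# the query string is empty, matching the behaviour described above.
sub read_query {
    my ( $env, $fh ) = @_;    # $env: hashref like \%ENV, $fh: body filehandle
    my $qs = defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    return $qs if length $qs;

    # Read at most CONTENT_LENGTH bytes instead of slurping to EOF.
    my $len  = $env->{CONTENT_LENGTH} || 0;
    my $body = '';
    while ( length($body) < $len ) {
        my $got = read( $fh, $body, $len - length($body), length($body) );
        last unless $got;     # 0 means EOF, undef means a read error
    }
    return $body;
}

# In a real CGI script:
#   my $raw = read_query( \%ENV, \*STDIN );
```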

The other note is that I deliberately 'inlined' the regex for unescaping the hex codes in the options rather than using the URI::Escape module, because the URI::Escape documentation says that uri_unescape does the same thing as that regex, and calling the function adds anywhere from a 40% to 70% slowdown.
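The inlined regex can still live in a small sub. One thing neither the regex nor uri_unescape does is handle '+': in application/x-www-form-urlencoded data (what HTML forms submit), a space is encoded as '+'. A minimal sketch that converts '+' to a space before decoding %XX escapes; form_unescape is a hypothetical name, and whether you want the '+' handling depends on whether your query strings come from forms.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one form-urlencoded name or value.
sub form_unescape {
    my ($s) = @_;
    return undef unless defined $s;
    $s =~ tr/+/ /;                                  # '+' means space in form data
    $s =~ s/%([0-9A-Fa-f]{2})/chr( hex( $1 ) )/eg;  # same regex as in the post
    return $s;
}

print form_unescape('hello+world%21%2B'), "\n";     # prints "hello world!+"
```

The '+' translation runs first so that an escaped plus (%2B) survives as a literal '+'.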

Re: Reinventing wheels: query string parsing.
by Abigail-II (Bishop) on Jul 27, 2003 at 20:36 UTC
    It seems like you will inspect each 'd=' part twice. After finding an 'o=' thing, you start with a new index, $j = $i + 1, scanning for 'd=' thingies. If you find one or more, you increment $j, but not $i. Personally, I'd make two nested loops, both of the form while (@query_string), and both shifting off elements.
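A minimal sketch of that nested-loop shape, assuming the same o/d semantics as the posted code (a 'd' group ends at the first non-'d' parameter) and, as Abigail also suggests, storing every group as an array ref. parse_od is a hypothetical name, and percent-decoding is left out so the loop structure stays visible.

```perl
use strict;
use warnings;

# Hypothetical parse_od(): nested loops that shift pairs off the list,
# so each 'd=' pair is examined exactly once.
sub parse_od {
    my ($query_string) = @_;
    my @pairs = split /[;&]/, $query_string;
    my %arg;
    while (@pairs) {
        my ( $name, $o ) = split /=/, shift(@pairs), 2;
        # Skip empty pairs and anything that is not 'o=...'.
        next unless defined $o and lc($name) eq 'o';
        my @opts;
        # Inner loop: consume the 'd=' pairs that directly follow this 'o='.
        while ( @pairs and $pairs[0] =~ /^[dD]=/ ) {
            push @opts, ( split /=/, shift(@pairs), 2 )[1];
        }
        $arg{$o} = \@opts;    # always an array ref, even for a single value
    }
    return %arg;
}

my %parsed = parse_od('o=foo&d=baz&d=bar&o=one&d=uno&o=none&totallyrandom=foo');
# foo => ['baz', 'bar'], one => ['uno'], none => []
```

Because @opts is declared with my inside the loop body, each iteration gets a fresh array, so taking \@opts is safe.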

    Also, it seems that if you have a string of the form:

    o=foo&d=one&d=two&o=foo&d=three

    you just end up with foo => 'three'.

    Personally, I would write the final assignment as:

    $arg {$o} = \@opts;

    so that my values are always array refs, and I don't have to make cases in the rest of the code.

    Abigail

      >> It seems like you will inspect each 'd=' part twice...
      Yeah, I think you're right. To be honest, I hadn't really looked at optimizing the code very much. I played around a bit with just setting $i to $j after the inner while loop finishes, but that didn't appear to change the speed one whit (it probably needs more benchmarks), and I was a little uncertain of what exactly I might break by doing that, so I took it back out.

      >>you just end up with foo => 'three'
      Hrm. That's a case I hadn't thought of, and now I can't decide exactly what should happen. I'm favoring the idea of turning it into an array of arrays, so your case o=foo&d=one&d=two&o=foo&d=three would turn into $args{foo} = [[one,two],[three]]. The only problem is that it seems kind of icky, especially the dereferencing, but I like the idea of being able to call a certain option multiple times.
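The array-of-arrays idea can be sketched by pushing each 'o' group onto a list instead of assigning it. A minimal sketch; parse_od_multi is a hypothetical name, and it keeps the posted code's rule that any parameter other than 'd' ends the current group.

```perl
use strict;
use warnings;

# Hypothetical parse_od_multi(): every 'o=NAME' starts a new group (an array
# ref), and groups for a repeated NAME accumulate instead of overwriting.
sub parse_od_multi {
    my ($query_string) = @_;
    my ( %arg, $current );
    for my $pair ( split /[;&]/, $query_string ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        if ( defined $value and lc($name) eq 'o' ) {
            $current = [];                       # start a new group
            push @{ $arg{$value} }, $current;    # outer array autovivifies
        }
        elsif ( defined $value and lc($name) eq 'd' ) {
            push @$current, $value if $current;  # ignore 'd' before any 'o'
        }
        else {
            undef $current;    # any other parameter ends the group
        }
    }
    return %arg;
}

my %args = parse_od_multi('o=foo&d=one&d=two&o=foo&d=three');
# $args{foo} is [ [ 'one', 'two' ], [ 'three' ] ]
```

The dereferencing cost is visible here: the first value of the second 'foo' group is $args{foo}[1][0].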

      >>Personally, I would write the final assignment as:
      Well, I considered that, but my mind rebelled against having to write $arg{foo}->[0] every single time I wanted to access a variable. I think I'll leave it as it is for now, on the assumption that most options will accept only one scalar, and any option that needs or can accept multiple values will check for them itself.
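If values stay as "a scalar for one value, an array ref for several", the checking can be kept in one small accessor instead of being repeated wherever an option is read. A minimal sketch, assuming a hash shaped like the one the posted handle() returns; opt_list is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical accessor: returns a plain list whether the stored value is
# undef, a single scalar, or an array ref of several values.
sub opt_list {
    my ( $args, $key ) = @_;
    my $v = $args->{$key};
    return ()   unless defined $v;
    return @$v  if ref $v eq 'ARRAY';
    return ($v);
}

my %arg = ( foo => [ 'baz', 'bar' ], one => 'uno', none => undef );
my ($first) = opt_list( \%arg, 'one' );    # 'uno'
my @all     = opt_list( \%arg, 'foo' );    # ('baz', 'bar')
```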
Re: Reinventing wheels: query string parsing.
by diotalevi (Canon) on Jul 27, 2003 at 20:00 UTC

    Disregarding your actual code, why don't you copy from CGI.pm if you want to do something differently? Instead of making your own guesses at what works, take some known-good code and alter it to fit.

    Oh yeah, and you forgot to localize $_. You stomp on it right now - that's unfriendly.
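To make the "stomping" concrete: when a caller loops with for (@list) and calls the sub, $_ is an alias for each array element, so a plain assignment to $_ inside the sub overwrites the caller's data. The sketch below contrasts that with localizing $_ and with using a lexical; noisy, polite and lexical are made-up names for illustration.

```perl
use strict;
use warnings;

# noisy() assigns to the global $_, as the posted handle() does.
sub noisy   { $_ = 'o=clobbered'; return /^o=/ ? 1 : 0 }

# polite() gives itself a private $_ for the duration of the call.
sub polite  { local $_ = 'o=safe'; return /^o=/ ? 1 : 0 }

# lexical() avoids $_ altogether.
sub lexical { my ($pair) = @_; return $pair =~ /^o=/ ? 1 : 0 }

my @names = ( 'a', 'b' );
for (@names) { polite(); lexical('o=x') }
my $after_polite = "@names";    # still "a b"
for (@names) { noisy() }        # $_ aliases each element of @names...
my $after_noisy = "@names";     # ...so both elements are now 'o=clobbered'
print "$after_polite | $after_noisy\n";
```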

      >>Disregarding your actual code, why don't you copy from CGI.pm if you want to do something differently?
      Copy *what* from CGI.pm?

      >>Oh yeah, and you forgot to localize $_. You stomp on it right now - that's unfriendly.
      Good point. I've fixed it in my code, but I can't update my root node, so...

        I once read the query parsing code in CGI.pm. If I found it, so can you.

Re: Reinventing wheels: query string parsing.
by fergal (Chaplain) on Jul 27, 2003 at 20:10 UTC

    Can you guarantee that the fields will occur in the query string in the same order as they appear in the page? I think most browsers do this but you may get a nasty surprise in the future if one of them changes.

      Hrm. That's a good, if nasty, point. I was under the impression that most browsers do it that way, but you're quite right that if they arbitrarily reordered the form elements, bad things could happen. Fortunately that will only affect <form>s, so I can still use manual links to set the query string or something.

        If you're only using manual links, you can construct the link so that the parameters submitted use the "standard scheme." This may mean doing more processing before generating the URL, but the code will probably end up more understandable that way: the translation code will make clear exactly how your variables will be used. Just a thought...
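Building such a link from Perl data needs only percent-encoding and joining pairs with '&'. A minimal sketch using no external modules; make_query and uri_escape_simple are hypothetical names, the escaping leaves only the RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) as-is, and it assumes byte strings rather than wide characters.

```perl
use strict;
use warnings;

# Percent-encode everything outside the RFC 3986 "unreserved" characters.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf( '%%%02X', ord $1 )/eg;
    return $s;
}

# Build "name=value&..." from an ordered list of pairs; an array ref value
# produces one pair per element (foo=baz&foo=bar).
sub make_query {
    my @pairs = @_;
    my @out;
    while ( my ( $name, $value ) = splice @pairs, 0, 2 ) {
        for my $v ( ref $value eq 'ARRAY' ? @$value : $value ) {
            push @out, uri_escape_simple($name) . '=' . uri_escape_simple($v);
        }
    }
    return join '&', @out;
}

print make_query( one => 'uno', foo => [ 'baz', 'bar' ] ), "\n";
# prints "one=uno&foo=baz&foo=bar"
```

Spaces come out as %20 rather than '+', which the %XX regex in the original post already decodes.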


Re: Reinventing wheels: query string parsing.
by saintbrie (Scribe) on Jul 28, 2003 at 02:53 UTC

    I don't get it. What does this make easier? A query string like

    ?one=uno&foo=baz&foo=bar

    would take care of the problem for you, wouldn't it?
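Under the conventional scheme, grouping values by name gives the same shape without depending on where the 'o' and 'd' parameters sit relative to each other. A minimal sketch; parse_standard is a hypothetical name, the leading '?' is not part of QUERY_STRING, and percent-decoding is omitted for brevity.

```perl
use strict;
use warnings;

# Hypothetical parse_standard(): collect every value under its own name.
sub parse_standard {
    my ($query_string) = @_;
    my %arg;
    for my $pair ( split /[;&]/, $query_string ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $name and length $name;
        push @{ $arg{$name} }, $value;    # values keep their query-string order
    }
    return %arg;
}

my %q = parse_standard('one=uno&foo=baz&foo=bar');
# ( one => ['uno'], foo => ['baz', 'bar'] )
```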

Re: Reinventing wheels: query string parsing.
by mr_stru (Sexton) on Jul 28, 2003 at 04:53 UTC
    Mmmm, I look at that code and I'm really not sure what it's doing without really having to think about it. Maybe it's my aversion to for (;;) style loops but something more like this seems much more understandable to me:
    my (%values, $key);
    foreach my $value ( split /[&;]/, $query_string ) {
        my ($name, $data) = split /=/, $value;
        if ( lc($name) eq 'o' ) {
            $key = $data;
            $values{$key} = undef unless exists $values{$key};
        }
        elsif ( lc($name) eq 'd' and defined $key ) {
            # anything you need to do to $data happens here
            if ( ref $values{$key} eq 'ARRAY' ) {
                push @{$values{$key}}, $data;
            }
            elsif ( defined $values{$key} ) {
                $values{$key} = [ $values{$key}, $data ];
            }
            else {
                $values{$key} = $data;
            }
        }
    }

    It does throw away anything that isn't either an o or d param and any d that occurs before the first o is also thrown away but as your code also does that it seemed the thing to do.

    And if you always use array refs then the second half of the d/o if statement becomes:

    push @{$values{$key}}, $data;

    which seems to me all the clearer.
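Put together, mr_stru's loop with array refs throughout might look like the sketch below. It is wrapped in a hypothetical parse_od_refs() sub for testing and adds a `next unless defined $data` guard (not in the original) so a parameter without '=' does not trigger warnings. Unlike BUU's code, a repeated 'o=foo' merges its values into one list, and a 'd' that follows an unrelated parameter still attaches to the last 'o'.

```perl
use strict;
use warnings;

# mr_stru's loop, rewritten so every key maps to an array ref.
sub parse_od_refs {
    my ($query_string) = @_;
    my ( %values, $key );
    foreach my $value ( split /[&;]/, $query_string ) {
        my ( $name, $data ) = split /=/, $value, 2;
        next unless defined $data;
        if ( lc($name) eq 'o' ) {
            $key = $data;
            $values{$key} ||= [];    # an 'o' with no 'd' gets an empty list
        }
        elsif ( lc($name) eq 'd' and defined $key ) {
            # anything you need to do to $data happens here
            push @{ $values{$key} }, $data;
        }
    }
    return %values;
}
```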

    Struan

Re: Reinventing wheels: query string parsing.
by TomDLux (Vicar) on Jul 28, 2003 at 17:17 UTC

    If you want your arguments to read foo=baz, why do you have your arguments reading o=foo&d=baz? Change your inputs to match your requirements.

    Arguments are by nature unordered, so there are no guarantees about their order, unless you hard-code the URL.

    But if you hard-code the URL, why not just use foo=baz?

    Help! How do I get out of this infinite loop?


Node Type: perlquestion [id://278262]
Approved by fglock
Front-paged by broquaint