Regex to Array lookup question

by johnfl68 (Beadle)
on Apr 05, 2020 at 21:23 UTC ( #11115102=perlquestion )

johnfl68 has asked for the wisdom of the Perl Monks concerning the following question:

Hello:

I'm looking for a better way to do this than running about 30 regexes in a row.

I have NWS API data whose icon URLs reference a long list of icons and carry extra data that I do not need.

    https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium
    https://api.weather.gov/icons/land/day/rain_showers,30/tsra_hi,30?size=medium
    https://api.weather.gov/icons/land/night/rain_showers,30/rain_showers?size=medium
    https://api.weather.gov/icons/land/day/bkn?size=medium

All I really need is the modifier (tsra, rain, sleet, bkn, skc, few, ovc, etc.). I don't need anything else. Because the format changes somewhat with each response, it's a bit hard to regex down to just the modifier: sometimes there are two, and there is no established list of all the possibilities. At this point I figure I could use just the first modifier listed. I am going to try to regex it down to just the modifier and see how that works, but I am afraid they will throw a wrench in the works at some point that will trip up that regex.

Instead of doing a separate regex for each modifier, is there a way to use an array with a single regex to do a look up table to get the new icon reference? Or another way to see if any modifier is anywhere in the string, then return referenced new icon name?

    {
        'skc'  => "clear-day",
        'few'  => "partly-cloudy-day",
        'sct'  => "partly-cloudy-day",
        'bkn'  => "cloudy",
        'wind' => "wind"
    }

Any suggestions would be appreciated. Once pointed in the right direction, I can usually figure the rest out. I just can't think of a better way to do this. Too many things on my mind as well, like many others right now. Thank you!

Replies are listed 'Best First'.
Re: Regex to Array lookup question
by dave_the_m (Monsignor) on Apr 05, 2020 at 21:51 UTC
    Something like the following should do it efficiently:
        my %keys = (
            'skc'  => "clear-day",
            'few'  => "partly-cloudy-day",
            'sct'  => "partly-cloudy-day",
            'bkn'  => "cloudy",
            'wind' => "wind",
        );

        my $keys = join '|', map quotemeta, sort keys %keys;
        my $pattern = qr/(....)($keys)(...)/;

        while (<DATA>) {
            s/$pattern/$1$keys{$2}$3/;
            print;
        }

    Dave.
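
A hedged sketch of how the same alternation-plus-hash idea might be applied directly to the URLs from the question. The sub name `new_icon_for` is hypothetical, and the assumption that the first condition code always follows `/day/` or `/night/` rests only on the four sample URLs. Keys are sorted longest first so that a short key cannot pre-empt a longer key that starts with it.

```perl
use strict;
use warnings;

# Lookup table copied from the question; extend as needed.
my %new_icon = (
    skc  => 'clear-day',
    few  => 'partly-cloudy-day',
    sct  => 'partly-cloudy-day',
    bkn  => 'cloudy',
    wind => 'wind',
);

# One alternation built from the hash keys, longest key first.
my $codes = join '|',
    map  { quotemeta($_) }
    sort { length $b <=> length $a or $a cmp $b }
    keys %new_icon;

# Assumption: the first condition code follows /day/ or /night/.
my $rx = qr{ / (?: day | night ) / ($codes) (?= [_,/?] | \z ) }x;

# Hypothetical helper: returns the new icon name, or undef if no key matches.
sub new_icon_for {
    my ($url) = @_;
    return $url =~ $rx ? $new_icon{$1} : undef;
}

for my $url (
    'https://api.weather.gov/icons/land/day/bkn?size=medium',
    'https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium',
) {
    printf "%s -> %s\n", $url, new_icon_for($url) // 'no match';
}
```

With the five-entry table above, the first URL maps to `cloudy` and the second reports `no match`, because `tsra` is not yet a key. Returning undef rather than guessing makes gaps in the table easy to spot.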

Re: Regex to Array lookup question
by AnomalousMonk (Bishop) on Apr 05, 2020 at 22:59 UTC
    ... the format somewhat changes with each response ... no established list of all the possibilities. ... they will throw a wrench in the works ...

    Kinda hard to do pattern matching when there's no pattern. :) Maybe something like the following. Also, please see How to ask better questions using Test::More and sample data and Short, Self-Contained, Correct Example for useful ways of asking questions — both of the monks and of yourself! (Update: Also also, see haukex's Building Regex Alternations Dynamically article for a discussion of building the  $rx_modifier regex.)

        c:\@Work\Perl\monks>perl
        use strict;
        use warnings;

        use Test::More 'no_plan';
        use Test::NoWarnings;

        my @Tests = (
          [ 'https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium',
            'tsra', 'tsra',
          ],
          [ 'https://api.weather.gov/icons/land/day/rain_showers,30/tsra_hi,30?size=medium',
            'rain', 'tsra',
          ],
          [ 'https://api.weather.gov/icons/land/night/rain_showers,30/rain_showers?size=medium',
            'rain', 'rain',
          ],
          [ 'https://api.weather.gov/icons/land/day/bkn?size=medium', 'bkn', ],
          [ 'https://api.weather.gov/icons/land/day/size=medium', ],
          );

        my $rx_pre  = qr{ (?<= /) }xms;
        my $rx_post = qr{ (?= [_?]) }xms;

        my $rx_generic_modifier = qr{ [[:lower:]]+ }xms;

        my @modifiers = qw(tsra rain sleet bkn skc few ovc);

        my ($rx_modifier) =
            map qr{ $rx_pre (?: $_ | $rx_generic_modifier) $rx_post }xms,
            join '|',
            map quotemeta,
            reverse sort
            @modifiers
            ;
        # print "modifiers regex: $rx_modifier \n";  # for debug

        VECTOR:
        for my $ar_vector (@Tests) {
            if (not ref $ar_vector) {
                note $ar_vector;
                next VECTOR;
                }

            my ($string, @expected_modifiers) = @$ar_vector;

            my @got_modifiers = $string =~ m{ $rx_modifier }xmsg;

            is_deeply \@got_modifiers, \@expected_modifiers,
                "'$string' -> (@got_modifiers)"
                ;
            }  # end for VECTOR

        done_testing;
        __END__
        ok 1 - 'https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium' -> (tsra tsra)
        ok 2 - 'https://api.weather.gov/icons/land/day/rain_showers,30/tsra_hi,30?size=medium' -> (rain tsra)
        ok 3 - 'https://api.weather.gov/icons/land/night/rain_showers,30/rain_showers?size=medium' -> (rain rain)
        ok 4 - 'https://api.weather.gov/icons/land/day/bkn?size=medium' -> (bkn)
        ok 5 - 'https://api.weather.gov/icons/land/day/size=medium' -> ()
        1..5
        ok 6 - no warnings
        1..6

    Update: Moved link to haukex's article into lexical contiguity with other documentation links.


    Give a man a fish:  <%-{-{-{-<
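
The ordering step in the code above (`reverse sort`) matters once the keyword list contains entries that are prefixes of other entries. A small sketch of why, under the assumption that full condition codes such as `rain_showers_hi` are used as alternatives; the helper name `alternation` is hypothetical. Perl tries alternatives left to right and keeps the first that succeeds, so the longer literal has to come first.

```perl
use strict;
use warnings;

# Condition codes that share a prefix (taken from the NWS icon list).
my @codes = qw(rain rain_showers rain_showers_hi tsra tsra_hi);

# Hypothetical helper: build an anchored alternation from a list of literals,
# in exactly the order given.
sub alternation {
    my @literals = @_;
    my $alt = join '|', map { quotemeta($_) } @literals;
    return qr{ \A (?: $alt ) }x;
}

my $as_given      = alternation(@codes);
my $longest_first = alternation(
    sort { length $b <=> length $a or $a cmp $b } @codes
);

my $input = 'rain_showers_hi,30';
my ($short) = $input =~ m{ ($as_given) }x;
my ($long)  = $input =~ m{ ($longest_first) }x;

print "as given:      $short\n";
print "longest first: $long\n";
```

The first pattern stops at `rain`; the second captures `rain_showers_hi`. A plain `reverse sort` has the same effect for keys that share a prefix, since a prefix always sorts before the longer string.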

Re: Regex to Array lookup question
by LanX (Archbishop) on Apr 05, 2020 at 21:52 UTC
    Sorry, I have trouble understanding your question. Could you provide us with a table of desired input and output?

    At first sight it looks like you want to match an or'ed list of keywords m/skc|few|sct|.../ and translate them with your hash. That's it?

    If I were you, I'd start by stripping the unneeded parts out of the URLs; that should simplify things considerably and help identify gaps.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re: Regex to Array lookup question
by Marshall (Abbot) on Apr 07, 2020 at 03:20 UTC
    I am having a hard time understanding the problem. You write, "no established list of all the possibilities". I looked at the JSON output of https://api.weather.gov/icons. That looks like the possibilities to me. I decoded the JSON and simplified that into a more straightforward translation table. I do not show the LWP code, but I guess you know how to do that.

    In general for things like this, I have found that using the textual description on the website is better than "rolling your own". If say, you need a different text for tsra_sct than the website provides, perhaps you want "tsra_sct" and "tsra_hi" to translate into the same thing? Then we are into a different discussion about how to maintain such a thing. That is a significant ongoing hassle that I don't recommend.

    This gets the textual definition of each abbreviation from the website and translates the first "word" argument of the last path segment in the URL to that definition. I translated each of the URLs you provided. Please explain what else you need...

        #!/usr/bin/perl
        use strict;
        use warnings;
        use JSON::Parse 'parse_json';

        my $json = do{local $/ = undef;<DATA>};
        my $out = parse_json $json;

        my %xlated_abbrev;  #simple abbreviation table => description

        foreach my $key (keys %{$out->{icons}})  #gen simple xlate table
        {
            $xlated_abbrev{$key} = $out->{icons}{$key}{description};
        }

        my @urls = (
        'https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium',
        'https://api.weather.gov/icons/land/day/rain_showers,30/tsra_hi,30?size=medium',
        'https://api.weather.gov/icons/land/night/rain_showers,30/rain_showers?size=medium',
        'https://api.weather.gov/icons/land/day/bkn?size=medium' );

        foreach my $url (@urls)
        {
            my $last_path = (split('/',$url))[-1];
            my ($abbrev_to_xlate) = $last_path =~ /^(\w+)/;
            print "URL = $url\n";
            print " $abbrev_to_xlate => \'$xlated_abbrev{$abbrev_to_xlate}\'\n\n";
        }

        =PRINTS:
        URL = https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium
         tsra_sct => 'Thunderstorm (medium cloud cover)'

        URL = https://api.weather.gov/icons/land/day/rain_showers,30/tsra_hi,30?size=medium
         tsra_hi => 'Thunderstorm (low cloud cover)'

        URL = https://api.weather.gov/icons/land/night/rain_showers,30/rain_showers?size=medium
         rain_showers => 'Rain showers (high cloud cover)'

        URL = https://api.weather.gov/icons/land/day/bkn?size=medium
         bkn => 'Mostly cloudy'

        =cut

        #Data returned from: https://api.weather.gov/icons
        __DATA__
        {
          "@context": [],
          "icons": {
            "skc": { "description": "Fair/clear" },
            "few": { "description": "A few clouds" },
            "sct": { "description": "Partly cloudy" },
            "bkn": { "description": "Mostly cloudy" },
            "ovc": { "description": "Overcast" },
            "wind_skc": { "description": "Fair/clear and windy" },
            "wind_few": { "description": "A few clouds and windy" },
            "wind_sct": { "description": "Partly cloudy and windy" },
            "wind_bkn": { "description": "Mostly cloudy and windy" },
            "wind_ovc": { "description": "Overcast and windy" },
            "snow": { "description": "Snow" },
            "rain_snow": { "description": "Rain/snow" },
            "rain_sleet": { "description": "Rain/sleet" },
            "snow_sleet": { "description": "Rain/sleet" },
            "fzra": { "description": "Freezing rain" },
            "rain_fzra": { "description": "Rain/freezing rain" },
            "snow_fzra": { "description": "Freezing rain/snow" },
            "sleet": { "description": "Sleet" },
            "rain": { "description": "Rain" },
            "rain_showers": { "description": "Rain showers (high cloud cover)" },
            "rain_showers_hi": { "description": "Rain showers (low cloud cover)" },
            "tsra": { "description": "Thunderstorm (high cloud cover)" },
            "tsra_sct": { "description": "Thunderstorm (medium cloud cover)" },
            "tsra_hi": { "description": "Thunderstorm (low cloud cover)" },
            "tornado": { "description": "Tornado" },
            "hurricane": { "description": "Hurricane conditions" },
            "tropical_storm": { "description": "Tropical storm conditions" },
            "dust": { "description": "Dust" },
            "smoke": { "description": "Smoke" },
            "haze": { "description": "Haze" },
            "hot": { "description": "Hot" },
            "cold": { "description": "Cold" },
            "blizzard": { "description": "Blizzard" },
            "fog": { "description": "Fog/mist" }
          }
        }
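
The post above leaves out the download step. A minimal sketch of one way to fill it in, with the swap stated plainly: it uses the core modules HTTP::Tiny and JSON::PP instead of LWP and JSON::Parse. The sub names `fetch_icons_json` and `description_table` and the User-Agent string are placeholders of mine. As far as I know, api.weather.gov asks clients to send an identifying User-Agent, and HTTP::Tiny needs IO::Socket::SSL installed for https URLs.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Hypothetical helper: download the icon list and return the raw JSON text.
# The agent string is a placeholder; substitute your own contact details.
sub fetch_icons_json {
    my $http = HTTP::Tiny->new(agent => 'icon-lookup-sketch (you@example.com)');
    my $res  = $http->get('https://api.weather.gov/icons');
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Flatten { icons => { code => { description => ... } } } into code => description.
sub description_table {
    my ($json_text) = @_;
    my $data  = decode_json($json_text);
    my %table = map { ($_ => $data->{icons}{$_}{description}) }
                keys %{ $data->{icons} };
    return \%table;
}

# Offline demonstration with a two-entry excerpt of the data.
my $sample = '{ "icons": { "bkn":     { "description": "Mostly cloudy" },
                           "tsra_hi": { "description": "Thunderstorm (low cloud cover)" } } }';
my $table = description_table($sample);
print "$_ => $table->{$_}\n" for sort keys %$table;

# Live use (needs network access):
# my $live = description_table(fetch_icons_json());
```

Keeping the fetch and the flattening in separate subs means the table-building part can be exercised without touching the network.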
      I am having a hard time understanding the problem. You write, "no established list of all the possibilities".

      We didn't hear back from OP, but we can investigate the problem it suggests in our minds.

      I looked at the JSON output of https://api.weather.gov/icons. That looks like the possibilities to me. I decoded the JSON and simplified that into a more straightforward translation table. I do not show the LWP code, but I guess you know how to do that.

      Marshall's post broke this open for me in a way that I wanted to pursue in first the LWP direction and then with some means to see these things that we're talking about. I'm taking WWW::Mechanize::Chrome through its paces. I ended up actually being able to see these things: 5 WMC screenshots. This gets a bit verbose, so I'll use readmore tags:

      Anyways, I find using perl to access these APIs very interesting.

      Update: Cropped screenshots and typo fixed here.

      This gets the textual defs of each abbreviation from the website and translates the first "word" argument of the last path in the URL to that textual definition. I translated each of the URL's you provided. Please explain what else you need...

      Shoot, Marshall, I want to replicate this interesting script, but I can't see any braces or underscores out of place. I did snip off the documentation to try to shake this error, but it remains unchanged:

          $ ./1.marshall.pl
          JSON error at line 110, byte 2639/2647: Unexpected character '_' parsing initial state: expecting whitespace: 'n', '\r', '\t', ' ' at ./1.marshall.pl line 8, <DATA> line 1.
          $ cat 1.marshall.pl
          #!/usr/bin/perl -w
          use 5.016;
          use JSON::Parse 'parse_json';

          my $json = do{local $/ = undef;<DATA>};
          my $out = parse_json $json;

          my %xlated_abbrev;  #simple abbreviation table => description

          foreach my $key (keys %{$out->{icons}})  #gen simple xlate table
          {
              $xlated_abbrev{$key} = $out->{icons}{$key}{description};
          }

          my @urls = (
          'https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium',
          'https://api.weather.gov/icons/land/day/rain_showers,30/tsra_hi,30?size=medium',
          'https://api.weather.gov/icons/land/night/rain_showers,30/rain_showers?size=medium',
          'https://api.weather.gov/icons/land/day/bkn?size=medium' );

          foreach my $url (@urls)
          {
              my $last_path = (split('/',$url))[-1];
              my ($abbrev_to_xlate) = $last_path =~ /^(\w+)/;
              print "URL = $url\n";
              print " $abbrev_to_xlate => \'$xlated_abbrev{$abbrev_to_xlate}\'\n\n";
          }

          __DATA__
          {
            "@context": [],
            "icons": {
              "skc": { "description": "Fair/clear" },
              "few": { "description": "A few clouds" },
              "sct": { "description": "Partly cloudy" },
              "bkn": { "description": "Mostly cloudy" },
              "ovc": { "description": "Overcast" },
              "wind_skc": { "description": "Fair/clear and windy" },
              "wind_few": { "description": "A few clouds and windy" },
              "wind_sct": { "description": "Partly cloudy and windy" },
              "wind_bkn": { "description": "Mostly cloudy and windy" },
              "wind_ovc": { "description": "Overcast and windy" },
              "snow": { "description": "Snow" },
              "rain_snow": { "description": "Rain/snow" },
              "rain_sleet": { "description": "Rain/sleet" },
              "snow_sleet": { "description": "Rain/sleet" },
              "fzra": { "description": "Freezing rain" },
              "rain_fzra": { "description": "Rain/freezing rain" },
              "snow_fzra": { "description": "Freezing rain/snow" },
              "sleet": { "description": "Sleet" },
              "rain": { "description": "Rain" },
              "rain_showers": { "description": "Rain showers (high cloud cover)" },
              "rain_showers_hi": { "description": "Rain showers (low cloud cover)" },
              "tsra": { "description": "Thunderstorm (high cloud cover)" },
              "tsra_sct": { "description": "Thunderstorm (medium cloud cover)" },
              "tsra_hi": { "description": "Thunderstorm (low cloud cover)" },
              "tornado": { "description": "Tornado" },
              "hurricane": { "description": "Hurricane conditions" },
              "tropical_storm": { "description": "Tropical storm conditions" },
              "dust": { "description": "Dust" },
              "smoke": { "description": "Smoke" },
              "haze": { "description": "Haze" },
              "hot": { "description": "Hot" },
              "cold": { "description": "Cold" },
              "blizzard": { "description": "Blizzard" },
              "fog": { "description": "Fog/mist" }
            }
          }
          __END__
          $
        When using a __DATA__ segment, you can't also use an __END__ segment: everything after __DATA__, including the __END__ line itself, is read as data. Delete that __END__ line that you added. This is the reason that I embedded the output as a Perldoc instead of attaching the output after an __END__ segment.

        Also, add "use strict;" to the code like I did. This will help you as you experiment with the code.

        Update: this error, "JSON error at line 110, byte 2639/2647: Unexpected character '_' parsing initial state: expecting whitespace: 'n', '\r', '\t', ' ' at ./1.marshall.pl line 8, <DATA> line 1.", is complaining about the first underscore in the __END__ line that you added. That line becomes part of the data read from DATA, so the result is invalid JSON. Error messages are often hard to figure out.
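
To see the behaviour described above in isolation, here is a small sketch that writes a throwaway script to a temporary file and runs it with the current perl. The helper name `run_data_demo` and the one-line JSON payload are made up for illustration; the only point is that the `__END__` line comes back out of the `DATA` filehandle like any other line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a tiny script whose DATA section is followed
# by an __END__ line, run it, and return whatever the script printed.
sub run_data_demo {
    my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} <<'SCRIPT';
while (my $line = <DATA>) {
    chomp $line;
    print "[$line]";
}
__DATA__
{ "ok": 1 }
__END__
SCRIPT
    close $fh or die "Cannot close $path: $!";
    return scalar qx{"$^X" "$path"};   # run with the same perl binary
}

print run_data_demo(), "\n";
```

The inner script prints `[{ "ok": 1 }][__END__]`: the marker is delivered as a second line of data, which is exactly what the JSON parser tripped over.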

Re: Regex to Array lookup question
by johnfl68 (Beadle) on Apr 13, 2020 at 03:59 UTC

    Thank you everyone for your responses. Sorry for my delayed response.

    I've looked at your examples and I have tried a few different things.

    I think I am going to go with a somewhat simpler approach, based on some other similar code, which so far in testing seems to work for what I need.

        # Remove leading url
        $icon =~ s/https:\/\/api.weather.gov\/icons\/land\///;

        # Split at /
        my @iconSplit = split(/\//, $icon);
        my $dayNight = $iconSplit[0];

        # Split at ? to remove size info
        $icon = $iconSplit[1];
        my @iconSplit2 = split(/\?/, $icon);
        $icon = $iconSplit2[0];

        # Split at , for % of precip if present
        my @iconSplit3 = split(/\,/, $icon);
        $icon = $iconSplit3[0];
        my $poP = $iconSplit3[1];
        if (not defined $poP) {
            $poP = 0; # null
        }

        my %newIcons= (
            'skc'             => "clear-day.png",
            'few'             => "partly-cloudy-day.png",
            'sct'             => "partly-cloudy-day.png",
            'bkn'             => "cloudy.png",
            'ovc'             => "cloudy.png",
            'wind_skc'        => "wind.png",
            'wind_few'        => "wind.png",
            'wind_sct'        => "wind.png",
            'wind_bkn'        => "wind.png",
            'wind_ovc'        => "wind.png",
            'snow'            => "snow.png",
            'rain_snow'       => "sleet.png",
            'rain_sleet'      => "sleet.png",
            'snow_sleet'      => "sleet.png",
            'fzra'            => "sleet.png",
            'rain_fzra'       => "sleet.png",
            'snow_fzra'       => "sleet.png",
            'sleet'           => "sleet.png",
            'rain'            => "rain.png",
            'rain_showers'    => "rain.png",
            'rain_showers_hi' => "rain.png",
            'tsra'            => "thunderstorm.png",
            'tsra_sct'        => "thunderstorm.png",
            'tsra_hi'         => "thunderstorm.png",
            'tornado'         => "tornado.png",
            'hurricane'       => "thunderstorm.png",
            'tropical_storm'  => "thunderstorm.png",
            'dust'            => "fog.png",
            'smoke'           => "fog.png",
            'haze'            => "fog.png",
            'hot'             => "clear-day.png",
            'cold'            => "clear-day.png",
            'blizzard'        => "snow.png",
            'fog'             => "fog.png"
        );

        $icon = $newIcons{$icon};

    With DarkSky shutting down their API later this year, I need to move over to NWS API, and match current build parameters and icon sets as close as possible.

    Then later I can go back and add more icons to the existing sets, and clean up some of the other smaller details that are changing because of this move.

    Thanks again!
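
For comparison, the split-based steps in the post above could be folded into a single match. This is only a sketch under assumptions: the sub name `parse_icon_url` and the fallback file name `unknown.png` are hypothetical, the table is abbreviated to three entries, and the URL shape is inferred from the sample URLs in this thread. Unlike a bare hash lookup, it returns a caller-supplied default when the code is missing from the table.

```perl
use strict;
use warnings;

# Abbreviated copy of the lookup table; the full table would be used in practice.
my %new_icons = (
    bkn          => 'cloudy.png',
    rain_showers => 'rain.png',
    tsra_sct     => 'thunderstorm.png',
);

# Hypothetical helper: returns (day-or-night, new icon, PoP) for one icon URL,
# or an empty list when the URL does not have the expected shape.
sub parse_icon_url {
    my ($url, $default_icon) = @_;
    my ($day_night, $code, $pop) = $url =~ m{
        \A https://api\.weather\.gov/icons/land/
        (day|night) /          # time of day
        ([a-z_]+)              # first condition code only
        (?: , (\d+) )?         # optional probability of precipitation
    }x or return;
    my $icon = exists $new_icons{$code} ? $new_icons{$code} : $default_icon;
    return ($day_night, $icon, $pop // 0);
}

for my $url (
    'https://api.weather.gov/icons/land/day/tsra_sct,20/tsra_sct,40?size=medium',
    'https://api.weather.gov/icons/land/night/rain_showers,30/rain_showers?size=medium',
    'https://api.weather.gov/icons/land/day/bkn?size=medium',
) {
    my ($day_night, $icon, $pop) = parse_icon_url($url, 'unknown.png');
    print "$day_night $icon $pop\n";
}
```

The loop prints `day thunderstorm.png 20`, `night rain.png 30`, and `day cloudy.png 0`. Whether a default icon or a hard failure is the better response to an unknown code depends on how the result is displayed.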

      $icon =~ s/https:\/\/api.weather.gov\/icons\/land\///;

      This substitution needs more backslashes (escapes)! (Never thought I'd say that.) The regex  . (dot) operator matches, by default, any character except a newline. E.g.:

          c:\@Work\Perl\monks>perl -wMstrict -le
          "my $icon = 'https://apiXweatherXgov/icons/land/fooble';
           $icon =~ s/https:\/\/api.weather.gov\/icons\/land\///;
           print qq{'$icon'};
          "
          'fooble'

      To match only a period, escape the dot metacharacters:
          $icon =~ s/https:\/\/api\.weather\.gov\/icons\/land\///;
      Of course, a dot also matches a period and that's all that seems to be in those particular positions in your strings so you might never have known the difference, but just for future reference...

      BTW: One way to cut down on escapes in a regex is by the wise choice of a regex delimiter. E.g.:
          $icon =~ s{https://api\.weather\.gov/icons/land/}{};

      Please see perlre, perlretut, and perlrequick.


      Give a man a fish:  <%-{-{-{-<
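
Another way to get the escaping right, sketched here as a supplement to the advice above: keep the literal prefix in a variable and let `\Q...\E` (the in-pattern form of `quotemeta`) escape it. The sub name `strip_prefix` is hypothetical, and the `\A` anchor is my addition; the substitution in the thread is unanchored.

```perl
use strict;
use warnings;

# The literal prefix from the substitution discussed above.
my $prefix = 'https://api.weather.gov/icons/land/';

# Hypothetical helper: remove the prefix only when it matches literally
# at the start of the string.
sub strip_prefix {
    my ($icon) = @_;
    $icon =~ s{\A\Q$prefix\E}{};
    return $icon;
}

print strip_prefix('https://api.weather.gov/icons/land/day/bkn?size=medium'), "\n";
print strip_prefix('https://apiXweatherXgov/icons/land/fooble'), "\n";
```

The first call prints `day/bkn?size=medium`; the second string comes back unchanged, because the quoted dots no longer match `X`.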
