Regex problem

by dazz (Beadle)
on May 25, 2017 at 09:35 UTC ( [id://1191188]=perlquestion )

dazz has asked for the wisdom of the Perl Monks concerning the following question:

Hello
I download a web page that shows the data used and the data remaining for a mobile connection.
I want to find the two values and convert them from strings to numbers.

A snippet of the html page is:
<span class="remaining-data">54MB used</span> <span class="expires-data-right-align">1.44GB remaining</span>
I use this regex to get the first value out of the matching substring:
(my $DataUsed) = $stgUsed =~ /"remaining-data">([+-]?(\d*\.)?\d+)(MB|GB) used/; # trying to get just the digits.
(my $unit) = $stgUsed =~ /(MB|GB)/; # match either MB or GB
if ( $unit eq "GB"){
    $DataUsed *= 1000;
}
The content of $stgUsed is what I expect:
DB<5> x $stgUsed
0  'class="remaining-data">315MB'
I want to capture just the number (315) and the units (MB), but $DataUsed is undef.
I have tried using $1, $2 ... but they are undef as well.

How do I get the digits substring and the MB/GB substring?

Dazz

Replies are listed 'Best First'.
Re: Regex problem
by haukex (Archbishop) on May 25, 2017 at 09:49 UTC

    You will be doing yourself a huge favor if you use a module to parse HTML instead of regexes. I discussed some of the options for parsing HTML and gave some example code here: "Two classic modules are HTML::Parser and HTML::TreeBuilder, but there are several others, such as Mojo::DOM. If the input is always XHTML, there's XML::Twig and many more XML-based modules."

    use warnings;
    use strict;
    use Data::Dump;
    use Mojo::DOM;

    my $html = <<'END_HTML';
    <span class="remaining-data">54MB used</span> <span class="expires-data-right-align">1.44GB remaining</span>
    END_HTML

    my $dom = Mojo::DOM->new($html);
    for my $e ($dom->find('span[class="remaining-data"]')->each) {
        dd $e->text;
        my ($val,$unit) = $e->text =~ /([+-]?(?:\d*\.)?\d+)(MB|GB) used/
            or die "Couldn't parse '".$e->text."'";
        dd $val, $unit;
    }
    __END__
    "54MB used"
    (54, "MB")
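Once a value and unit have been captured from an element's text, they still need to be turned into one comparable number. The following is only a minimal sketch of that step in core Perl. The helper name `to_mb` is hypothetical, the factor 1 GB = 1000 MB is taken from the question, and the two strings stand in for what a parser would return as element text (the second one is an assumption based on the HTML snippet).

```perl
use strict;
use warnings;

# Hypothetical helper: convert a captured value and unit to megabytes.
# Assumes 1 GB = 1000 MB, the factor used in the question.
sub to_mb {
    my ($value, $unit) = @_;
    my %factor = (MB => 1, GB => 1000);
    die "Unknown unit '$unit'" unless exists $factor{$unit};
    return $value * $factor{$unit};
}

# Stand-ins for the text of the two spans in the question's snippet.
my @texts = ('54MB used', '1.44GB remaining');

my %mb;
for my $text (@texts) {
    # Capture number, unit and label in one list-context match.
    my ($value, $unit, $kind) =
        $text =~ /^([+-]?(?:\d*\.)?\d+)(MB|GB) (used|remaining)$/
        or die "Couldn't parse '$text'";
    $mb{$kind} = to_mb($value, $unit);
}

printf "%s: %s MB\n", $_, $mb{$_} for sort keys %mb;
```

Keeping the conversion in a lookup table means an unexpected unit fails loudly instead of being silently treated as MB.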
Re: Regex problem
by Corion (Patriarch) on May 25, 2017 at 09:42 UTC

    I would restructure the code to be less clever but easier to debug:

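    # Step 1: isolate the text of the 'remaining-data' span, without the trailing "used".
    if( $html !~ m!<span class="remaining-data">\s*(.*?)\s*used\s*</span>! ) {
        die "Couldn't find 'remaining-data' span in html!";
    };
    my $remaining = $1;
    # Step 2: split it into number and unit; the fractional part is optional.
    if( $remaining !~ m!^(\d+(?:\.\d+)?)(MB|GB)$! ) {
        die "Weirdo string instead of remaining data: [$remaining]";
    };
    my ($DataUsed, $unit) = ($1,$2);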
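Checking one step at a time against the string shown in the question suggests why $DataUsed came back undef: the debugger output for $stgUsed ends at "MB", while the pattern also requires a literal " used" after the unit, so the match fails and the list assignment yields nothing. The sketch below assumes that the debugger output is the complete content of $stgUsed; it uses the question's pattern with the inner group made non-capturing.

```perl
use strict;
use warnings;

# The string the debugger showed for $stgUsed in the question
# (assumed here to be the complete value).
my $stgUsed = 'class="remaining-data">315MB';

# Pattern from the question: requires a literal " used" after the unit.
my ($with_used) =
    $stgUsed =~ /"remaining-data">([+-]?(?:\d*\.)?\d+)(?:MB|GB) used/;
print defined $with_used ? "matched: $with_used\n" : "no match, so undef\n";

# Same pattern without " used", capturing number and unit together.
my ($DataUsed, $unit) =
    $stgUsed =~ /"remaining-data">([+-]?(?:\d*\.)?\d+)(MB|GB)/
    or die "Couldn't parse '$stgUsed'";

$DataUsed *= 1000 if $unit eq 'GB';   # normalize to MB, as in the question
print "$DataUsed MB\n";
```

If the earlier step that fills $stgUsed is changed to keep the " used" suffix, the original pattern should match as well.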

Node Type: perlquestion [id://1191188]
Approved by Athanasius