PerlMonks  

Re^4: Parse html file

by TonyNY (Beadle)
on Aug 01, 2018 at 14:15 UTC


in reply to Re^3: Parse html file
in thread Parse html file

I was able to use the HTML::TreeBuilder module to get some structure to the output.

use HTML::TreeBuilder;

# $file holds the path to the HTML file; it is set elsewhere in the script.

# Parse all of the contents of $file.
my $parser = HTML::TreeBuilder->new ();
$parser->parse_file ($file);

# Now display the contents of $parser.
recurse ($parser, 0);

exit;

# This displays the contents of $node and any children it may
# have. The variable $depth is the indentation used.
sub recurse
{
    my ($node, $depth) = @_;

    # Print indentation according to the level of recursion.
    print " " x $depth;

    # If $node is a reference, then it is an HTML::Element.
    if (ref $node) {
        # Print the tag associated with $node, for example "html" or "li".
        print $node->tag (), "\n";

        # $node->content_list () returns a list of child nodes of
        # $node, which we store in @children.
        my @children = $node->content_list ();
        for my $child_node (@children) {
            recurse ($child_node, $depth + 1);
        }
    }
    else {
        # If $node is not a reference, then it is just a piece of text
        # from the HTML file.
        print $node, "\n";
    }
}

How can I extract the data from the following tags?

div
div
FillDB File Size Limit:
div
0.0% ( 0 / 3145728 Bytes )
div
div
FillDB File Count Limit:
div
0.0% ( 0 / 10000 Files )
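
One possible approach that stays within HTML::TreeBuilder, sketched under assumptions: the markup below is only a guess reconstructed from the tag dump above (an outer div holding one div for the label and one for the value), and the helper name extract_limits is hypothetical. It collects the text-only divs with look_down and pairs each "... Limit:" label with the numbers in the div that follows it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# ASSUMPTION: this markup is reconstructed from the tag dump in the post;
# the real page may nest or attribute its divs differently.
my $html = <<'HTML';
<div><div>FillDB File Size Limit:</div><div>0.0% ( 0 / 3145728 Bytes )</div></div>
<div><div>FillDB File Count Limit:</div><div>0.0% ( 0 / 10000 Files )</div></div>
HTML

# Hypothetical helper: returns a hashref keyed by label (without the colon).
sub extract_limits {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    # Keep only "leaf" divs, i.e. divs whose children are all plain text.
    my @leaves = $tree->look_down(
        _tag => 'div',
        sub { !grep { ref } $_[0]->content_list },
    );

    # Grab the trimmed text of each leaf before freeing the tree.
    my @text = map { my $t = $_->as_text; $t =~ s/^\s+|\s+$//g; $t } @leaves;
    $tree->delete;

    my %limits;
    for (my $i = 0; $i < $#text; $i++) {
        # A label looks like "FillDB File Size Limit:".
        next unless $text[$i] =~ /^(.*Limit):$/;
        my $label = $1;

        # The following div is expected to look like "0.0% ( 0 / 3145728 Bytes )".
        if ($text[$i + 1] =~ m{\(\s*(\d+)\s*/\s*(\d+)\s*(\w+)\s*\)}) {
            $limits{$label} = { used => $1, max => $2, unit => $3 };
        }
    }
    return \%limits;
}

my $limits = extract_limits($html);
for my $label (sort keys %$limits) {
    my $l = $limits->{$label};
    print "$label: $l->{used} of $l->{max} $l->{unit}\n";
}
```

To run this against the real file, the parse step could be swapped for HTML::TreeBuilder->new followed by parse_file ($file), as in the script above. If the real divs carry class or id attributes, matching on those in look_down would likely be more robust than relying on the label text.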

Replies are listed 'Best First'.
Re^5: Parse html file
by marto (Cardinal) on Aug 01, 2018 at 14:28 UTC

    This worked, even on the fragment. If you really wanted to capture 'FillDB File Size Limit:', it would be trivial to add the required code.

      Thanks marto...but I don't have the Mojo::DOM module installed and I cannot install it in my working environment so I am limited to what I can use...

        Sorry this reply has taken so long; it fell off my radar. Technically, Mojolicious should install without issue. If that is not the case, please report back and I'll try to help out. However, I'm guessing you can't install whatever you want because you're working in a restrictive environment, perhaps because of some existing business policy. For that, there is "Yes, even you can use CPAN". My day job is in a restrictive environment that is totally disconnected from the internet. minicpan is great for creating and maintaining a custom mirror that you can transfer into your environment.

        In addition to this, I'm lucky enough to be attending TPC Glasgow 2018. ovid gave a fantastic talk, 'Rescuing a Legacy Codebase' (slides), in which he discusses building a business case and how to gain management buy-in, in terms that managers understand. There's way more in the talk, and I'd highly recommend watching all of it. haukex links to the streams here; I overheard that they will be edited down into individual YouTube videos per talk later.

        Thanks marto...but I don't have the Mojo::DOM module installed and I cannot install it in my working environment so I am limited to what I can use...

        If you can't copy code from PerlMonks, can't copy code from CPAN, and can't write the code ... what then?
