Re^4: Parse html file

I was able to use the HTML::TreeBuilder module to give the output some structure:

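use strict;
use warnings;
use HTML::TreeBuilder;

# Parse all of the contents of $file (assumed here to be passed as the
# first command-line argument).

my $file = shift @ARGV or die "Usage: $0 FILE.html\n";
my $parser = HTML::TreeBuilder->new ();
$parser->parse_file ($file) or die "Cannot parse $file: $!\n";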

# Now display the contents of $parser.

recurse ($parser, 0);

exit;

# This displays the contents of $node and any children it may
# have. The variable $depth is the indentation used.

sub recurse
{
    my ($node, $depth) = @_;

    # Print indentation according to the level of recursion.

    print "  " x $depth;

    # If $node is a reference, then it is an HTML::Element.

    if (ref $node) {

        # Print the tag associated with $node, for example "html" or
        # "li".

        print $node->tag (), "\n";

        # $node->content_list () returns a list of child nodes of
        # $node, which we store in @children.

        my @children = $node->content_list ();
        for my $child_node (@children) {
            recurse ($child_node, $depth + 1);
        }
    }
    else {

        # If $node is not a reference, then it is just a piece of text
        # from the HTML file.

        print $node, "\n"; 
    }
}

How can I extract the data from the following tags?

    div
      div
        FillDB File Size Limit:
      div
        0.0% ( 0 / 3145728 Bytes )
    div
      div
        FillDB File Count Limit:
      div
        0.0% ( 0 / 10000 Files )

Re^5: Parse html file by marto (Cardinal) on Aug 01, 2018 at 14:28 UTC
This worked, even on the fragment. If you really wanted to capture 'FillDB File Size Limit:', it would be trivial to add the required code.
Re^6: Parse html file by TonyNY (Beadle) on Aug 01, 2018 at 14:41 UTC
Thanks marto... but I don't have the Mojo::DOM module installed, and I cannot install it in my working environment, so I am limited to what I can use...
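
Since HTML::TreeBuilder is already available in this environment, one possible way to pull out the values without Mojo::DOM is sketched below. This is not marto's code, only a minimal sketch: it assumes each label sits in its own div immediately followed by a sibling div holding the value, as the tree dump in the parent post suggests. The $html fragment and the extract_limits name are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical fragment reconstructed from the tree dump in the parent
# post; the real page may have attributes or extra wrapper elements.
my $html = <<'HTML';
<div>
  <div>
    <div>FillDB File Size Limit:</div>
    <div>0.0% ( 0 / 3145728 Bytes )</div>
  </div>
  <div>
    <div>FillDB File Count Limit:</div>
    <div>0.0% ( 0 / 10000 Files )</div>
  </div>
</div>
HTML

# Returns a hash ref mapping each label to the text of the div after it.
sub extract_limits {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content ($content);
    my %limits;

    # Only the innermost label divs match: wrapper divs also contain the
    # value text, so their combined text does not end in "Limit:".
    my @labels = $tree->look_down (
        _tag => 'div',
        sub { $_[0]->as_trimmed_text =~ /^FillDB .+ Limit:$/ },
    );

    for my $label_div (@labels) {
        # right () in list context returns every following sibling;
        # keep the first one that is an element rather than bare text.
        my ($value_div) = grep { ref } $label_div->right;
        next unless $value_div;
        $limits{ $label_div->as_trimmed_text } = $value_div->as_trimmed_text;
    }

    $tree->delete;
    return \%limits;
}

my $limits = extract_limits ($html);
for my $label (sort keys %$limits) {
    my $value = $limits->{$label};
    print "$label $value\n";

    # Split "0.0% ( 0 / 3145728 Bytes )" into its numeric parts.
    if ($value =~ m{^([\d.]+)%\s*\(\s*(\d+)\s*/\s*(\d+)\s+(\w+)\s*\)$}) {
        print "  percent=$1 used=$2 max=$3 unit=$4\n";
    }
}
```

To run this against the real page, build the tree with parse_file ($file) as in the parent post instead of new_from_content. If the real markup wraps the labels differently, the pattern passed to look_down is the part to adjust.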
Re^7: Parse html file by marto (Cardinal) on Aug 17, 2018 at 06:22 UTC
Sorry this reply has taken so long; it fell off my radar. Technically, Mojolicious should install without issue. If that is not the case, please report back and I'll try to help out. However, I'm guessing you can't install whatever you want because you're working in a restrictive environment, perhaps because of some existing business policy. For that there is "Yes, even you can use CPAN". My day job is in a restrictive environment totally disconnected from the internet, and minicpan is great for creating and maintaining a custom mirror you can transfer into your environment.

In addition, I'm lucky enough to be attending TPC Glasgow 2018, where Ovid gave a fantastic talk, 'Rescuing a Legacy Codebase' (slides), in which he discusses building a business case and how to gain management buy-in in terms that managers understand. There's way more in the talk, and I'd highly recommend watching all of it. haukex links to the streams here; I overheard that they will later be edited down into individual YouTube videos per talk.
Re^7: Parse html file by Anonymous Monk on Aug 17, 2018 at 00:02 UTC
"Thanks marto...but I don't have the Mojo::DOM module installed and I cannot install it in my working environment so I am limited to what I can use..."

If you can't copy code from PerlMonks, can't copy code from CPAN, can't write the code ... what then?
Re^8: Parse html file by marto (Cardinal) on Aug 17, 2018 at 06:23 UTC

