PerlMonks  

Updated: Looking for something like DBD::HTML::Table

by talexb (Canon)
on Feb 26, 2021 at 21:07 UTC ( #11128862=perlquestion )

talexb has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking into a couple of different solutions for a problem, and one solution involves using an imaginary module called DBD::HTML::Table to load up a web page containing a big table. It would be smart enough to look at the top row for the column names, and the first column for each row's index value.

I've just had a stroll through http://metacpan.org, and I didn't see anything like that. Is it cunningly hidden, or does it not exist at all?

Update: Thanks for all of your thoughtful replies. I had a look at the HTML that was being generated by our internal CGI, and found that it was really, really easy to just write a very simple parser. Each opening and closing <tr> tag was on a line by itself, and each <td> or <th> cell was either entirely on one line (opening tag, content, closing tag) or laid out in an easily grabbable format (opening tag, content, and closing tag each on a separate line).

I understand that my initial question was vague; I was still working out what my solution might look like. I now have a much better idea of what the process is going to be. Ideally, it will be as automated as possible. Sorry if this all sounds vague: it's work-related, so I need to be a little circumspect about how I describe the problem. :)
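The line-oriented parser described in the update might look roughly like the sketch below. It assumes exactly the regular layout described there and is not a general HTML parser; the sub name parse_simple_table is hypothetical. Following the original wish for DBD::HTML::Table, it takes column names from the top row and keys each remaining row by its first cell. Entities and inline tags inside cells are left untouched.

```perl
use strict;
use warnings;

# Hypothetical sketch, assuming the very regular layout described above:
# <tr> and </tr> each on their own line, and every <td>/<th> cell either
# wholly on one line or split into open tag / content / close tag lines.
# Nested tables or several cells on one line will NOT be handled.
sub parse_simple_table {
    my ($html) = @_;
    my (@rows, $row, $cell);
    for my $line (split /\n/, $html) {
        $line =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
        next unless length $line;
        if ($line =~ m{^<tr\b[^>]*>$}i) {
            $row = [];                           # start a new row
        }
        elsif ($line =~ m{^</tr>$}i) {
            push @rows, $row if $row;            # finish the current row
            undef $row;
        }
        elsif ($line =~ m{^<t[dh]\b[^>]*>(.*)</t[dh]>$}i) {
            push @$row, $1;                      # whole cell on one line
        }
        elsif ($line =~ m{^<t[dh]\b[^>]*>$}i) {
            $cell = '';                          # split cell: opening tag
        }
        elsif ($line =~ m{^</t[dh]>$}i) {
            push @$row, $cell;                   # split cell: closing tag
            undef $cell;
        }
        elsif (defined $cell) {
            $cell .= length $cell ? " $line" : $line;   # split cell: content
        }
    }

    # Top row supplies the column names; first column supplies the row key.
    my ($header, @body) = @rows;
    my %table;
    for my $r (@body) {
        my %rec;
        @rec{@$header} = @$r;
        $table{ $r->[0] } = \%rec;
    }
    return \%table;
}
```

With a header row of id and name, a row whose first cell is 42 ends up as $table->{42}{name}, which is close to the "column names from the top row, index from the first column" lookup the original question asked for.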

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Looking for something like DBD::HTML::Table
by Tux (Canon) on Feb 27, 2021 at 09:12 UTC

    AnyData::Format::HTMLtable claims to support DBI directly:

    use DBI;
    my $filename = "page.html";    # hypothetical: the HTML file holding the table
    my $dbh = DBI->connect ("dbi:AnyData:");
    $dbh->func ("table1", "HTMLtable", $filename, "ad_catalog");
    my $hits = $dbh->selectall_arrayref ("select name from table1 where bar = 42");
    # ... other DBI/SQL operations

    I have never used it, but it sounds like more or less what you are looking for.


    Enjoy, Have FUN! H.Merijn
Re: Looking for something like DBD::HTML::Table
by marto (Cardinal) on Feb 26, 2021 at 21:24 UTC

    What I've done is use a Mojolicious backend that renders a template including the JavaScript DataTables library, with the data source being JSON delivered by Mojolicious. This both renders and scales well, and has the benefit of letting users search within the results, sort by column, etc. Perhaps this is along the lines of what you had in mind?
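As a rough sketch of the JSON side of that setup: by default, DataTables reads rows from a top-level "data" key in its Ajax response, so the backend only needs to emit something like {"data": [[...], ...]}. The helper name datatables_payload is hypothetical, and the Mojolicious route appears only as a comment; here the payload is encoded with the core JSON::PP module so the sketch runs without Mojolicious installed.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: wrap table rows (array refs) in the shape DataTables
# reads by default from an Ajax source: { "data": [ [ ... ], ... ] }.
sub datatables_payload {
    my (@rows) = @_;
    return { data => [ map { [@$_] } @rows ] };
}

# In a Mojolicious::Lite app the payload would typically be returned with
# something like:
#   get '/rows' => sub { my $c = shift; $c->render(json => datatables_payload(@rows)) };
# Here we simply encode it with JSON::PP to show the resulting document.
my $json = JSON::PP->new->canonical;
print $json->encode(datatables_payload(['42', 'Widget'], ['43', 'Gadget'])), "\n";
```

Searching and sorting then happen client-side in DataTables, which is what gives users the "search within the results, sort by column" behaviour without extra server code.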

Re: Looking for something like DBD::HTML::Table
by no longer just digit (Beadle) on Feb 26, 2021 at 23:02 UTC

      Oooh... this looks like the very thing. Thanks!

      Alex / talexb / Toronto

Re: Looking for something like DBD::HTML::Table
by Fletch (Chancellor) on Feb 26, 2021 at 21:41 UTC

    Question fuzzy, but random thought: maybe use Mojo::DOM (or whatever HTML parser you know) to scrape the table in question into a CSV format then pull things out with DBD::CSV behind DBI?
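A minimal sketch of the "scrape to CSV" half of that idea, assuming the cells have already been pulled out with Mojo::DOM or any other parser. The sub names csv_field and rows_to_csv are hypothetical; Text::CSV would normally handle the quoting. Fields containing a comma, double quote or line break are wrapped in double quotes, with embedded quotes doubled, and records end in CRLF as in RFC 4180. The DBD::CSV query appears only as a comment because that module is not core.

```perl
use strict;
use warnings;

# Quote one field the usual CSV way: wrap it in double quotes if it contains
# a comma, double quote, CR or LF, doubling any embedded double quotes.
sub csv_field {
    my ($v) = @_;
    $v = '' unless defined $v;
    if ($v =~ /[",\r\n]/) {
        $v =~ s/"/""/g;
        $v = qq{"$v"};
    }
    return $v;
}

# Turn rows (array refs of cell strings) into CSV text, CRLF-terminated.
sub rows_to_csv {
    my (@rows) = @_;
    return join '', map { join(',', map { csv_field($_) } @$_) . "\r\n" } @rows;
}

# Written to a file such as scraped.csv (hypothetical name), the data could
# then be queried through DBI, roughly:
#   my $dbh = DBI->connect("dbi:CSV:", undef, undef, { f_dir => ".", f_ext => ".csv/r" });
#   my $names = $dbh->selectall_arrayref("SELECT name FROM scraped WHERE id = 42");
print rows_to_csv(['id', 'name'], ['42', 'Widget, large']);
```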

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Looking for something like DBD::HTML::Table
by erix (Prior) on Feb 27, 2021 at 08:37 UTC

    I have no real solution. In my experience, HTML is a bit too variable. I can't find anything DBI-ey (I also looked in https://pgxn.org/, with no luck). Of course, to get database access you could slurp either the HTML (via curl) or cleaned-up text (via links -dump) into a table, but those would just be 'raw' lines from which you'd still have to select the correct table rows. Still, for easily recognizable/greppable rows it might work. And anyway, it is a reminder that PostgreSQL's COPY can read input from another program's STDOUT (COPY ... FROM PROGRAM).

    create table temp_slurps (line text);
    copy temp_slurps (line) from program 'links -dump -width 512 ${url}';
    select * from temp_slurps; -- where ...

    As they say, YMMV. I'm sure that if you wrote a Postgres extension (for pgxn.org) to read HTML tables from a source, it would be popular ;)

Re: Looking for something like DBD::HTML::Table
by LanX (Sage) on Feb 26, 2021 at 21:31 UTC
    It's not clear to me whether you just want to extract data from an HTML page or use an HTML file as a database (i.e. writing updates back to it).

    If the latter is the case, you might be looking for XML based solutions.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Node Type: perlquestion [id://11128862]
Approved by marto
Front-paged by marto