PerlMonks  

last hour of cb

by Tanktalus (Canon)
on Jan 26, 2007 at 20:59 UTC (#596792)

Updates more-or-less every 5 minutes (except when there is no activity). Extracted from Tanktalus' CB Stats' database. Feedback
Shows the last hour or so, but never more than will fit in 64k, nor over two hours. Other sources of cb history
Last update: Jun 24, 2016 at 23:45 UTC

Previous message (not shown): Jun 24, 2016 at 22:06 UTC
[Lady_Aleena]What is the best/easiest to understand web scraping module?
[Lady_Aleena]s/web/html/;
[choroba]you can imagine WWW::Mechanize as a simple "browser"
[Lady_Aleena]I'm trying to get data out of tables on html pages.
[choroba]HTML::TableExtract then?
[Lady_Aleena]This is a big learning curve.
[Lady_Aleena]Lady_Aleena growls at the lack of info in the doc.
[runrig]LWP::Simple if you just have a web page to 'get', and pass the contents to HTML::TableExtract
[Lady_Aleena]I have not looked at a new to me module in about a year.
[Lady_Aleena]I forgot just how sparse the documentation usually is, and how much I would have to trace out to find out what things do.
[Lady_Aleena]Is it just me or is the code in the synopsis not strict?
[runrig]Synopsis code is often not strict
[choroba]yes, it's not. It's from 2000, anyway
[runrig]it's up to you to put in the 'use strict' and the my's
[Lady_Aleena]I hope HTML::TableExtract installs all its dependencies.
[$h4X4_|=73}{]Hmmmm....
[runrig]its only prereq is HTML::Parser, so it's then up to HTML::Parser to list all its dependencies
[runrig]Assuming you're using cpan or cpanm to install
[Marshall]On ActiveState, HTML::Parser comes with the distribution. Your dist may already have it installed?
[runrig]cpanm++
[Lady_Aleena]I use cpan.
[Marshall]just installed this thing and it needs HTML-Element-Extended-1.18 also.
[Lady_Aleena]Lady_Aleena head desks.

↑Previous Hour↑
↓Current Hour↓

[Lady_Aleena]Do I have to open the file?
[Lady_Aleena]It doesn't say.
[runrig]That seems to be an optional dependency
[Lady_Aleena]I can't get it to work.
[runrig]If the html is in a file, you pass the file to the 'parse_file' method
[runrig]the file name, that is
[Lady_Aleena]It doesn't die when the path to the file is wrong.
[runrig]I never noticed that...
[Lady_Aleena]no file should mean death.
[runrig]you can $p->parse_file($file) or die "Error parsing file: $!";
[Lady_Aleena]Oh, I had to use the exact path to the file, so now I'm trying to figure out how to get the data.
[runrig]and then, if you call $p->eof to abort parsing, then parse_file will return false anyway, in which case you probably don't want to die.
[Lady_Aleena]This module collects more data than I need.
[runrig]but then, you're using HTML::TableExtract and so probably wouldn't be calling eof() anyway, so nevermind
[runrig]You can configure it to only return selected columns from selected tables.
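The advice above about checking the result of parse_file and restricting the extraction to selected columns could be sketched roughly as follows. The helper name parser_for_file and the header names in the example call are made up for illustration. The sketch assumes HTML::TableExtract inherits parse_file from HTML::Parser, which returns an undefined value and sets $! when the file cannot be opened.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: build an extractor limited to tables that contain
# the given column headers, parse $file, and die if it cannot be opened.
sub parser_for_file {
    my ( $file, @headers ) = @_;

    # With 'headers' set, only matching tables and columns are kept.
    my $te = HTML::TableExtract->new( headers => \@headers );

    # parse_file comes from HTML::Parser; a missing or unreadable file
    # yields undef with the reason in $!, so the check has to be explicit.
    $te->parse_file($file)
        or die "Error parsing '$file': $!";

    return $te;
}

# Example call with made-up file and header names:
# my $te = parser_for_file( 'tables.html', 'Name', 'Year' );
```

As runrig notes, the "or die" check would misfire if parsing were aborted on purpose with eof(), which is not something a plain HTML::TableExtract run normally does.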
[runrig]So, anybody up late celebrating/mourning the Brexit?
[Lady_Aleena]I know the stock markets around the world are suffering because of Brexit. (goes back to data diving for the rows)
[Marshall]The pound is at a 30 year low. Maybe time for a holiday in England?
[Lady_Aleena]I can't figure what is being escaped in the returned data on my scratchpad.
[Marshall]These folks may not be so happy once the economic reality sets in. UK was far better off in the EU.
[runrig]should've sold all your pounds for gold...
[RonW]Has Article 50 been invoked, yet?
[Lady_Aleena]Looks like I have a lot of chomping to do with the returned data too.
[Marshall]Article 50 is a next step - this all takes time. The vote was just advisory. Now the implementation must start.
[RonW]I often find s/[\r\n]+$// more useful than chomp
[RonW]I recall Cameron stating he was going to invoke Article 50 before resigning, but have not heard if he actually did that
[Marshall]Cameron says he's staying until ~Oct.
[runrig]I usually just s/\s+$//
[Marshall]whether he lasts that long, remains to be seen.
[RonW]All I heard was he resigned. Didn't hear the details
[Lady_Aleena]The data format returned is confusing
[RonW]Sometimes I want the spaces/tabs but not the line endings
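The three ways of cleaning up line endings mentioned in the chat behave differently on the same input. A small core-Perl comparison, using a made-up sample string with a Windows-style line ending and the default value of $/:

```perl
use strict;
use warnings;

# Made-up sample: trailing space, tab, then a CRLF line ending.
my $raw = "Cyrus \t\r\n";

# chomp removes only a trailing $/ ("\n" by default), so "\r" survives.
my $chomped = $raw;
chomp $chomped;                 # "Cyrus \t\r"

# RonW's version: strip any run of CR/LF, keep the spaces and tabs.
my $no_eol = $raw;
$no_eol =~ s/[\r\n]+$//;        # "Cyrus \t"

# runrig's version: strip all trailing whitespace, line endings included.
my $trimmed = $raw;
$trimmed =~ s/\s+$//;           # "Cyrus"
```

Which one fits depends on whether trailing spaces and tabs are meaningful in the scraped cells.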
[Marshall]Calling for this referendum was a HUGE mistake on his part. Should have never even allowed the vote.
[Marshall]Yep, Mr Cameron is now gone. this Oct idea didn't last long!
[Lady_Aleena]WTH?!?!?! Why is the content of the rows being escaped like \'Cyrus&#65533;',?
[Marshall]Lady_Aleena good luck! Been years since I messed with LWP stuff. It can get hairy.
[runrig]LA: Did you read the docs? Are you looping through the tables and/or rows? Or are you just dumping the result of parse_file()?
[Lady_Aleena]$row->[0] returns something which looks like SCALAR(0x9212464)
[runrig]There are no references if you loop through tables() and rows() as it says in the docs.
[Lady_Aleena]runrig, I'm looping through the grids.
[Lady_Aleena]runrig, it looks like I have tables in tables instead of one big table.
[Lady_Aleena]The documentation is very murky.
[Lady_Aleena]The top of my scratchpad is as far as I've gotten.
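For reference, runrig's suggestion to loop through tables() and rows() might look something like the sketch below. The HTML is invented for the example and is not the page being scraped. It assumes the documented HTML::TableExtract interface, in which rows() hands back array references of plain cell strings and coords() reports a table's depth and count, so nested tables show up with a depth greater than zero.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up HTML standing in for the page being scraped.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Cyrus</td><td>1979</td></tr>
  <tr><td>Swan</td><td>1979</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new;   # no constraints: keep every table
$te->parse($html);
$te->eof;

my @cells;
for my $table ( $te->tables ) {
    # depth > 0 would mean this table is nested inside another one.
    my ( $depth, $count ) = $table->coords;
    for my $row ( $table->rows ) {
        # $row is an array ref of strings; a cell can be undef, so guard.
        push @cells, [ map { defined $_ ? $_ : '' } @$row ];
    }
}
```

Going through rows() rather than the internal grid is what avoids values that print as SCALAR(0x...).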
[RonW]Aleena, looks like you need to run the strings you get through HTML entity decoding (don't remember how to do that, though)
[Lady_Aleena]I don't think this is going to work. The tables I'm trying to scrape are a mess.
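On RonW's point about entity decoding: one way to do it, assuming the HTML::Entities module from the HTML-Parser distribution is installed, is decode_entities. The input string is made up. Note that &#65533; is the Unicode replacement character U+FFFD, which usually suggests the bytes were mis-decoded earlier, so decoding the entity would not by itself recover the original character.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Made-up input mixing named and numeric character references.
my $encoded = 'Tom &amp; Jerry &lt;3 &#233;';

# decode_entities returns the string with references replaced by the
# characters they stand for; &#233; becomes U+00E9.
my $decoded = decode_entities($encoded);
```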
