PerlMonks  

Targeted Web Searching on the Client Side: A Little Programming Knowledge Can Save a Lot of Time

by jonadab (Parson)
on Oct 23, 2010 at 00:34 UTC (#866907, CUFP)

Okay, here's the background: there's a website that I use, which in general is quite good and very useful. It's called Lang-8. The basic idea is, you write journal entries in the language you're studying, and native speakers post comments and corrections. In turn, you post comments and corrections to entries they've written in your native language. The idea is good, and the site has a lot of really useful features.

One feature it doesn't have, unfortunately, is a really good search capability...

In particular, I wanted to be able to search through the comments and corrections I've made in the past. When you're working with people coming to English from the same linguistic background, they tend to make some of the same mistakes (e.g., Japanese speakers seem to have trouble learning the correct use of the English phrase "after all", which, admittedly, is somewhat idiomatic), so several times I've run into situations where I remembered having explained a particular thing in some detail before, with examples. Being the lazy person that I am, I wanted to look at that previous explanation and possibly copy and paste some or all of it in response to someone else who asked about the same thing or made the same mistake.

So I wanted to search my past corrections and comments, but the site doesn't seem to have a way to do that. I can search my own journal entries, but that doesn't solve my problem. I thought about Google's site-specific search, but privacy features prevent most of the journal entries, and the comments on them, from being visible to the world; Google, from the site's perspective, is the world.

So I used my virtue of laziness to create a way to quickly search through my past comments and corrections...

    #!/usr/bin/perl
    # -*- cperl -*-
    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTML::TreeBuilder;

    binmode STDOUT, ':encoding(UTF-8)';  # entries contain non-ASCII text

    my $email = 'username' . '@' . 'example.net';
    my $pass  = 'censored';

    my @substring = @ARGV;
    if (@substring) {
      print "Looking for " . @substring . " strings.\n";
    } else {
      die "You must specify one or more strings to look for.\n";
    }

    # Log in; WWW::Mechanize keeps the session cookie for later requests.
    my $mech = WWW::Mechanize->new();
    $mech->get('http://lang-8.com/login');
    $mech->submit_form(form_number => 2,
                       fields      => { username => $email, password => $pass });

    # Walk the paginated journals/joined listing, collecting entry URLs,
    # until a page comes back with no entries on it.
    my ($page, $done, @pagetosearch) = (1, 0);
    while (not $done) {
      print "Fetching page $page...\n";
      $mech->get("http://lang-8.com/journals/joined?page=$page");
      # Parse the HTML string directly (no temp file or Data::Dumper needed).
      my $tree  = HTML::TreeBuilder->new_from_content($mech->content());
      my @entry = $tree->look_down('_tag' => 'h3', 'class' => 'journal_title');
      my @url   = grep { defined }
                  map  { my $link = $_->look_down('_tag' => 'a');
                         $link ? $link->attr('href') : () } @entry;
      $tree->delete;
      if (@url) {
        print " * Found " . @url . " journal entries.\n";
        push @pagetosearch, @url;
        sleep 1;  # be polite to the server
        ++$page;
      } else {
        ++$done;
      }
    }

    # Fetch each entry and print every run of text between tags that
    # contains one of the search strings.
    for my $url (@pagetosearch) {
      print "Checking $url\n";
      $mech->get($url);
      my $content = $mech->content();
      for my $str (@substring) {
        # \Q...\E makes the search string match literally, not as a regex.
        my @match = $content =~ /([^<>]*\Q$str\E[^<>]*)/sg;
        print " * Found $str: $_\n" for @match;
      }
      select undef, undef, undef, 0.1;  # brief pause between requests
    }
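The matching step at the end of the script is easy to try on its own, without logging in to anything. Here is a minimal sketch of it, run against a made-up HTML fragment. The helper name text_runs_containing and the sample markup are my own inventions for illustration, and the \Q...\E quoting is an assumption that the search strings are meant as literal text rather than as regular expressions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every run of characters free of < and > that contains $str.
# \Q...\E quotes regex metacharacters so $str is matched literally.
# (text_runs_containing is a hypothetical helper, not part of any module.)
sub text_runs_containing {
    my ($html, $str) = @_;
    return $html =~ /([^<>]*\Q$str\E[^<>]*)/sg;
}

# Hypothetical sample markup, invented for this illustration.
my $html = '<div class="c"><p>Use "after all" sparingly.</p>'
         . '<p>Nothing to see here.</p>'
         . '<p>It worked after all (mostly).</p></div>';

my @hits = text_runs_containing($html, 'after all');
print " * Found: $_\n" for @hits;
```

Note that the pattern only excludes angle brackets, so it will also match inside a tag (searching for "class" would report the div's attributes). For digging up an old explanation that's usually good enough; if it gets noisy, the text could be pulled out of a parsed tree instead.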

One screenful of easy code, and my computer is pointing me right to my previous explanation. The first time I used it, it saved me more time than it took to write, and I know I'll be using this one again and again.
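For anyone adapting the script to a different site, the listing-page step is the part most tied to one site's markup. Here is a rough sketch of that step by itself, working on a made-up HTML fragment so it can be run offline. The helper name entry_urls and the sample markup (including the URLs in it) are hypothetical; the h3/journal_title selector is the one the script uses, and parsing the string with new_from_content is just this sketch's choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return the href of the first link inside each <h3 class="journal_title">.
# (entry_urls is a hypothetical helper name.)
sub entry_urls {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @url;
    for my $heading ($tree->look_down(_tag => 'h3', class => 'journal_title')) {
        # Scalar assignment: look_down returns the first match or undef.
        my $link = $heading->look_down(_tag => 'a') or next;
        my $href = $link->attr('href');
        push @url, $href if defined $href;
    }
    $tree->delete;    # free the parse tree
    return @url;
}

# Hypothetical listing markup, invented for this illustration.
my $html = <<'HTML';
<html><body>
<h3 class="journal_title"><a href="/1/journals/100">First entry</a></h3>
<h3 class="other"><a href="/ignored">Not a journal title</a></h3>
<h3 class="journal_title"><a href="/2/journals/200">Second entry</a></h3>
</body></html>
HTML

print "$_\n" for entry_urls($html);
```

Swapping in another site's tag and class names in the look_down call is usually all it takes, assuming the site marks up its listings consistently.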

-- 
We're working on a multi-year set of freely redistributable Vacation Bible School materials.

Node Type: CUFP [id://866907]
Approved by McDarren
Front-paged by planetscape