alternative you should be able to use the mojo based solution from earlier as a starting point to get just what you need. If you have any problems with that just post and I'll take a look.
OP seems to have found what he wanted, so I thought I might use the opportunity to ask marto (or anyone else who can bake from scratch with mojo) to further explore the script he posted in Re^5: polishing up a json fetching script for weather data. It might be an improvement to a script that marto characterized as suboptimal. I certainly hope that we keep the comments and break the logic into steps, rather than writing the single train of arrows that online sources often show, with words whose provenance is unknown, like top in this example:
# JSON POST (application/json) with TLS certificate authentication
my $tx = $ua->cert('tls.crt')->key('tls.key')->post('https://example.com' => json => {top => 'secret'});
or json. Nothing makes keywords stand out, so where does one go to determine their provenance? How exactly are you going to disambiguate 'json'? The above came from the Mojo::UserAgent documentation. I understand that examples are selected for brevity, but I would love to see a cache of them from many authors.
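On the provenance question, a minimal sketch that may help: as far as I can tell, json in that call names a content generator registered in Mojo::UserAgent::Transactor, while top and secret are only sample data. The transactor can build the same request without sending it, so the effect of the keyword can be inspected offline. The URL is the placeholder from the documentation example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.016;
use Mojo::UserAgent::Transactor;

my $t = Mojo::UserAgent::Transactor->new;

# The registered generator names; 'json' should be among them.
say join ', ', sort keys %{ $t->generators };

# Build (but do not send) the request from the documentation example.
# 'json' selects the generator; the hash is arbitrary sample data.
my $tx = $t->tx( POST => 'https://example.com' => json => { top => 'secret' } );

say $tx->req->headers->content_type;    # application/json
say $tx->req->body;                     # {"top":"secret"}
```

The documentation for Mojo::UserAgent::Transactor describes each generator, which is where I would look first for keywords of this kind.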
It seemed to me that having to hardcode the movie title like this is an area that could be improved.
my $imdburl = 'http://www.imdb.com/search/title?title=Caddyshack';
I couldn't get titles with multiple words to work at all. The search replaces spaces with pluses in the URL, but interpolating a lexical variable into the URL seems beneath mojo, even if it worked, which it doesn't. What I want is a script that shows me what's at this site from a mojo point of view, and this one does so naively:
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::URL;
use Mojo::Util qw(dumper);
use Mojo::UserAgent;
use Data::Dump;
use Log::Log4perl;
use 5.016;
use Mojo::DOM;
my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";
#Log::Log4perl::init($log_conf3); #debug
Log::Log4perl::init($log_conf4); #info
my $logger = Log::Log4perl->get_logger();
$logger->info("$0");
# pretend to be a browser
my $uaname =
  'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
my $ua = Mojo::UserAgent->new;
$ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
$ua->transactor->name($uaname);
my $first_title = 'Virgin+River';
my $imdburl = "http://www.imdb.com/search/title?title=$first_title";
say "imdburl is $imdburl";
# find search results
my $dom = $ua->get($imdburl)->res->dom;
my @nodes = @$dom;
# c-style for is good for array output with index
for ( my $i = 0 ; $i < @nodes ; $i++ ) {
    $logger->info("i is $i ==============");
    $logger->info("$nodes[$i]");
}
sleep 2; #good hygiene
__END__
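On the multi-word title problem mentioned before the script: one possible approach, sketched here under assumptions, is to let Mojo::URL build the query string instead of interpolating the title by hand. The https scheme and the title value are my choices for illustration, and whether IMDb still answers this search path usefully is not something this sketch checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.016;
use Mojo::URL;

# Sample multi-word title; the space is deliberate.
my $first_title = 'Virgin River';

# query() takes key/value pairs, and the escaping happens when the
# URL object is stringified, so no manual s/ /+/g is needed.
my $imdburl = Mojo::URL->new('https://www.imdb.com/search/title')
  ->query( title => $first_title );

say "imdburl is $imdburl";
# imdburl is https://www.imdb.com/search/title?title=Virgin+River
```

The resulting object can be passed straight to $ua->get, which accepts a Mojo::URL as well as a string.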
What does it show?
2020/12/31 13:53:39 INFO i is 1 ==============
2020/12/31 13:53:39 INFO <!DOCTYPE html>
2020/12/31 13:53:39 INFO i is 2 ==============
2020/12/31 13:53:39 INFO
The first looks right; the second is empty.
The 3rd contains 61 k of JavaScript hell. The 4th and last was empty. JavaScript isn't meant for human eyes, or, to be specific, I find it illegible, so I used the browser tools to look closer. I realize that I simply don't understand the JavaScript, and that's not mojo's fault. Inspecting the search box with the browser tools (right click inside it) gives me this:
<input type="text" value="" autocomplete="off" aria-autocomplete="list" aria-controls="react-autowhatever-1" class="imdb-header-search__input GVtrp0cCs2HZCo7E2L5UU react-autosuggest__input" id="suggestion-search" name="q" placeholder="Search IMDb" autocapitalize="none" autocorrect="off"
Then I remembered that you can use mojo to do this instead:
$ mojo get https://www.imdb.com/ '*' attr id >1.txt
$ grep search 1.txt
navSearch-searchState
suggestion-search-container
nav-search-form
navbar-search-category-select
navbar-search-category-select-contents
suggestion-search
suggestion-search-button
imdbHeader-searchClose
imdbHeader-searchOpen
$
Now I thought I was really in hot pursuit. I thought, "Aha, I can find this id and post to it." So I went to look up find in Mojo::DOM, and I don't really understand the examples until I can work them myself and see the output:
$ ./1.dom.pl
./1.dom.pl
123
Test
123
a
b
b
a
a:Test
b:123
<p id="a">Test</p><p id="b">123</p><p id="d">789</p><p id="c">456</p>
$ cat 1.dom.pl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::URL;
use Mojo::Util qw(dumper);
use Mojo::UserAgent;
use Data::Dump;
use Log::Log4perl;
use 5.016;
use Mojo::DOM;
my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";
#Log::Log4perl::init($log_conf3); #debug
Log::Log4perl::init($log_conf4); #info
my $logger = Log::Log4perl->get_logger();
$logger->info("$0");
# pretend to be a browser
my $uaname =
  'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
my $ua = Mojo::UserAgent->new;
$ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
$ua->transactor->name($uaname);
## example from https://docs.mojolicious.org/Mojo/DOM
#use Mojo::DOM;
# Parse
my $dom = Mojo::DOM->new('<div><p id="a">Test</p><p id="b">123</p></div>');
# Find
say $dom->at('#b')->text;
say $dom->find('p')->map('text')->join("\n");
say $dom->find('[id]')->map( attr => 'id' )->join("\n");
# Iterate
$dom->find('p[id]')->reverse->each( sub { say $_->{id} } );
# Loop
for my $e ( $dom->find('p[id]')->each ) {
    say $e->{id}, ':', $e->text;
}
# Modify
$dom->find('div p')->last->append('<p id="c">456</p>');
$dom->at('#c')->prepend( $dom->new_tag( 'p', id => 'd', '789' ) );
$dom->find(':not(p)')->map('strip');
# Render
say "$dom";
__END__
$ ./4.dom.pl
./4.dom.pl
<h1>Test</h1>
bar
bar
foo
baz
=====
comment
doctype
pi
text
root
tag
text
$ cat 4.dom.pl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::URL;
use Mojo::Util qw(dumper);
use Mojo::UserAgent;
use Data::Dump;
use Log::Log4perl;
use 5.016;
use Mojo::DOM;
my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";
#Log::Log4perl::init($log_conf3); #debug
Log::Log4perl::init($log_conf4); #info
my $logger = Log::Log4perl->get_logger();
$logger->info("$0");
## examples from https://docs.mojolicious.org/Mojo/DOM
my $dom7 = Mojo::DOM->new();
my $str7 =
  $dom7->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;
$logger->info($str7);
# "bar"
my $dom8 = Mojo::DOM->new();
my $str8 = $dom8->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;
say "$str8";
$logger->info($str8);
# "foo\nbaz\n"
my $dom9 = Mojo::DOM->new();
my $str9 = $dom9->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;
$logger->info($str9);
$logger->info('=====');
my $dom1 = Mojo::DOM->new();
my $str1 = $dom1->parse('<!-- Test -->')->child_nodes->first->type;
$logger->info($str1);
# "doctype"
$str1 = $dom1->parse('<!DOCTYPE html>')->child_nodes->first->type;
$logger->info($str1);
# "pi"
$str1 = $dom1->parse('<?xml version="1.0"?>')->child_nodes->first->type;
$logger->info($str1);
$str1 =
  $dom1->parse('<title>Test</title>')->at('title')->child_nodes->first->type;
$logger->info($str1);
$str1 = $dom1->parse('<p>Test</p>')->type;
$logger->info($str1);
$str1 = $dom1->parse('<p>Test</p>')->at('p')->type;
$logger->info($str1);
$str1 = $dom1->parse('<p>Test</p>')->at('p')->child_nodes->first->type;
$logger->info($str1);
__END__
$
Finally, I got a usage for find that worked:
$ ./2.dom.pl
./2.dom.pl
ads_tarnhelm ads_doWithAds ads_monitoring_setup ads_safeframe_setup ads_general_setup IMDbHomepageSiteReactViews imdbHeader nblogin imdbHeader-navDrawerOpen imdbHeader-navDrawerOpen--desktop imdbHeader-navDrawer nav-link-categories-mov nav-link-categories-tvshows nav-link-categories-video nav-link-categories-awards nav-link-categories-celebs nav-link-categories-comm home_img_holder home_img navSearch-searchState suggestion-search-container nav-search-form navbar-search-category-select navbar-search-category-select-contents suggestion-search suggestion-search-button imdbHeader-searchClose imdbHeader-searchOpen ipc-svg-gradient-tv-logo-t ipc-svg-gradient-tv-logo-v ipc-wrap-background-id inline20_wrapper placeholderPattern b a b a b a b a b a b a b a inline40_wrapper placeholderPattern from-your-watchlist fan-picks teconsent ftr__a ftr__c ftr__e ftr__g ftr__i ftr__k ftr__m ftr__o ftr__q ftr__s ftr__u ftr__w ftr__y ftr__A ftr__C ftr__E ftr__G ftr__b ftr__d ftr__f ftr__h ftr__j ftr__l ftr__n ftr__p ftr__r ftr__t ftr__v ftr__x ftr__z ftr__B ftr__D ftr__F ftr__H ipc-svg-gradient-tv-logo-t ipc-svg-gradient-tv-logo-v ipc-svg-gradient-tv-logo-t ipc-svg-gradient-tv-logo-v be
$ cat 2.dom.pl
#!/usr/bin/perl
use strict;
use warnings;
use Log::Log4perl;
use 5.016;
use Mojo::DOM;
use Mojo::UserAgent;
my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";
#Log::Log4perl::init($log_conf3); #debug
Log::Log4perl::init($log_conf4); #info
my $logger = Log::Log4perl->get_logger();
$logger->info("$0");
# represent $0 as browser to server
my $uaname =
  'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
my $ua = Mojo::UserAgent->new;
$ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
$ua->transactor->name($uaname);
## main page of imdb contains search box
my $imdburl = "http://www.imdb.com/";
## example from https://docs.mojolicious.org/Mojo/DOM
my $dom = $ua->get($imdburl)->res->dom;
# say "$dom"; works
#
my @ids= $dom->find('[id]')->map(attr => 'id')->each;
$logger->info("@ids");
__END__
$
Anyway, this was my final push, and I seem to have come up short:
$ ./2.1.dom.pl
./2.1.dom.pl
navSearch-searchState suggestion-search-container nav-search-form navbar-search-category-select navbar-search-category-select-contents suggestion-search suggestion-search-button imdbHeader-searchClose imdbHeader-searchOpen
Can't locate object method "find" via package "Mojo::UserAgent" at ./2.1.dom.pl line 48.
$ cat 2.1.dom.pl
#!/usr/bin/perl
use strict;
use warnings;
use Log::Log4perl;
use 5.016;
use Mojo::DOM;
use Mojo::UserAgent;
use Mojo::URL;
use Mojo::Util qw(trim);
my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";
#Log::Log4perl::init($log_conf3); #debug
Log::Log4perl::init($log_conf4); #info
my $logger = Log::Log4perl->get_logger();
$logger->info("$0");
# represent $0 as browser to server
my $uaname =
  'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
my $ua = Mojo::UserAgent->new;
$ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
$ua->transactor->name($uaname);
## main page of imdb contains search box
my $imdburl = "http://www.imdb.com/";
## example from https://docs.mojolicious.org/Mojo/DOM
my $dom = $ua->get($imdburl)->res->dom;
# say "$dom"; works
#
my @ids = $dom->find('[id]')->map( attr => 'id' )->each;
#$logger->info("@ids");
my @matches = grep { /search/ } @ids;
$logger->info("@matches");
my $vid = 'Virgin River';
$ua->post( $imdburl => form => { 'suggestion-search' => $vid } );
# assume first match
my $filmurl = $ua->find('a[href^=/title]')->first->attr('href');
# extract film id
my $filmid = Mojo::URL->new($filmurl)->path->parts->[-1];
# get details of film
$dom = $ua->get("https://www.imdb.com/title/$filmid/")->res->dom;
# print film details
say trim( $dom->at('div.title_wrapper > h1')->text ) . ' ('
  . trim( $dom->at('#titleYear > a')->text ) . ')';
# print actor/character names
foreach my $cast ( $dom->find('table.cast_list > tr:not(:first-child)')->each )
{
    say trim( $cast->at('td:nth-of-type(2) > a')->text ) . ' as '
      . trim( $cast->at('td.character')->all_text );
}
__END__
$
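A note on the error above, offered as a sketch and not a tested fix: find and at are Mojo::DOM methods, so they need a DOM object such as the one returned by $ua->post(...)->res->dom, whereas the script discards the return value of post and then calls find on $ua itself. Separately, form data is keyed by an input's name attribute, not its id, and the inspected input above carries name="q". The fragment below is a cut-down stand-in for the page header; the action and method values are assumptions for illustration, and only the id/name pair is taken from the inspected tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.016;
use Mojo::DOM;

# Stand-in for the IMDb header. The action and method values are
# hypothetical; the id and name come from the inspected input tag.
my $html = <<'HTML';
<form id="nav-search-form" action="/find" method="get">
  <input type="text" id="suggestion-search" name="q" placeholder="Search IMDb">
</form>
HTML

my $dom = Mojo::DOM->new($html);

# Look the input up by id, then walk up to the enclosing form.
my $input = $dom->at('#suggestion-search');
my $form  = $input->ancestors('form')->first;

say 'field name:  ', $input->attr('name');      # q
say 'form action: ', $form->attr('action');     # /find
say 'form method: ', $form->attr('method');     # get
```

Run against the live page's DOM, the same three lookups would show which parameter name, path, and HTTP method a search request should use, which may well turn out to be a GET with a query string and not a POST.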
These are resources I drew from:
Thanks for comments,