Re^2: getting content of an https website

Thanks, tangent, that's got it. With a little help from HTML::Tree, this suffices:

  use strict;
  use warnings;
  use feature 'say';
  use LWP::UserAgent;
  use HTML::Tree;

  my $url = 'https://berniesanders.com/issues/racial-justice/';
  my $ua  = LWP::UserAgent->new();
  $ua->agent(
    'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'
  );
  my $response = $ua->get($url);
  my $content  = $response->content;
  if ( $content =~ m/enemy/i ) {
    say "enemy found";
  }
  else {
    my $tree = HTML::Tree->new();
    $tree->parse($content);
    $tree->eof;    # signal end of input so the parser flushes pending text
    print $tree->as_text;
  }

I've seen code like this before, and I thought I actually needed to have the browser in question installed, but apparently not. Am I correct to think that the string needs to have nothing to do with the actual machine it runs on? Is the string you used a good overall choice for such queries?

I'd like to consider a related question, given that we're barely warmed up here. I've always wanted the functionality of having mechanized events happen and then having an actual browser opened. I don't know if one browser works better than another for this, but I use Chrome for most of my day-to-day browsing. Clearly, I would have to define a path to the executable, which I believe is here:

 Directory of C:\Program Files (x86)\Google\Chrome\Application

08/22/2015  03:42 AM    <DIR>          .
08/22/2015  03:42 AM    <DIR>          ..
08/14/2015  12:43 PM    <DIR>          44.0.2403.155
08/22/2015  03:42 AM    <DIR>          44.0.2403.157
08/17/2015  10:23 PM           813,896 chrome.exe
06/03/2013  04:26 PM            18,546 master_preferences
06/19/2014  02:37 AM    <DIR>          Plugins
08/22/2015  03:42 AM               399 VisualElementsManifest.xml

How might I open the url from the original post in this browser?

Re^3: getting content of an https website by Anonymous Monk on Sep 01, 2015 at 03:40 UTC
HTML::Display
https://metacpan.org/pod/WWW::Mechanize#mech-agent_alias-alias
WWW::UserAgent::Random - Perl extension to generate random User Agent
List of User-Agents (Spiders, Robots, Browser)
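
For what the agent_alias link refers to, here is a minimal sketch, assuming WWW::Mechanize is installed. It makes no network request; it only shows that an alias is a shortcut for a canned User-Agent string, which is unrelated to the machine the script runs on. The exact strings printed depend on the installed WWW::Mechanize version.

```perl
use strict;
use warnings;
use feature 'say';
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# The default agent string identifies WWW::Mechanize itself.
say 'default: ', $mech->agent;

# agent_alias() swaps in a canned browser-like string for the given alias.
$mech->agent_alias('Windows Mozilla');
say 'aliased: ', $mech->agent;

# List the alias names this version of WWW::Mechanize knows about.
say "alias:   $_" for $mech->known_agent_aliases;
```

Note that an alias name such as 'Windows Mozilla' only has this meaning to agent_alias; passed to LWP::UserAgent's agent method, it is sent verbatim as the header value.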
Re^4: getting content of an https website by Aldebaran (Curate) on Sep 01, 2015 at 07:54 UTC
Thanks AM, I got pretty far with this:

  use strict;
  use warnings;
  use feature 'say';
  use HTML::Display;
  use LWP::UserAgent;

  my $url = 'https://berniesanders.com/issues/racial-justice/';
  my $ua  = LWP::UserAgent->new();
  $ua->agent('Windows Mozilla');
  my $response = $ua->get($url);
  my $content  = $response->content;
  $ENV{'PERL_HTML_DISPLAY_COMMAND'} =
    'run "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s';
  my $browser = HTML::Display->new();
  if ( defined($browser) ) {
    $browser->display( html => $content );
  }
  else {
    print("Unable to open browser: $@\n");
  }

Almost everything gets displayed except the big banner on top and some stylized words at the bottom. The links with absolute urls work, but there seems to be some clunkiness in the forward and back arrows on the browser, when it comes back to the original. And what is the original? In the url it looks like this:

`file:///C:/cygwin64/tmp/9EQdRdu_5w.html`

I have trouble deciding how "real" this is at all. Tomorrow, I'll try a different site and see what happens. Thank you.
Re^3: getting content of an https website by Anonymous Monk on Sep 07, 2015 at 17:46 UTC
system($url); will usually do it. Depending on how paranoid you are, you might want to ensure that only properly encoded strings are executed.
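
One cautious variant is the list form of system, which passes the URL to the program as a single argument instead of handing a string to the shell. The following is only a sketch under assumptions: the Chrome path is the one from the directory listing earlier in the thread, and the helper name chrome_command and its URL check are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Path taken from the directory listing earlier in the thread; adjust as needed.
my $chrome = 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe';

# Hypothetical helper: return the argument list for system(), or an empty
# list if the string does not look like a plain http(s) URL.
sub chrome_command {
    my ($url) = @_;
    return unless defined $url && $url =~ m{\Ahttps?://[^\s"'<>]+\z};
    return ( $chrome, $url );
}

my $url = 'https://berniesanders.com/issues/racial-justice/';
my @cmd = chrome_command($url);

# Only launch when the executable is actually present on this machine.
if ( @cmd && -e $chrome ) {
    system(@cmd) == 0 or warn "could not start browser: $?\n";
}
```

The check is deliberately strict: anything containing whitespace or quote characters is rejected rather than escaped, so a string has to be properly encoded before it is handed to the browser.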

