http://www.perlmonks.org?node_id=1021285


in reply to RegEx: Detecting the certain cyrillic words

#!/usr/bin/perl
use utf8;
use strict;
use warnings FATAL => 'all';
use WWW::Mechanize qw();

my $mech = WWW::Mechanize->new;
$mech->credentials('user' => 'password');
$mech->get('http://www.rambler.ru/');

my ($Кремль) = $mech->content =~ /(Кремль)/i;
You very likely want to use Web::Query to dissect your HTML instead of a regex, or at least match against the HTML-stripped text version of the document.

Re^2: RegEx: Detecting the certain cyrillic words
by programmer.perl (Beadle) on Mar 01, 2013 at 17:10 UTC

    Here is my whole code, but it gives no result:

    #!/usr/bin/perl -w
    use utf8;
    use strict;
    use warnings FATAL => 'all';
    use WWW::Mechanize qw();

    my $mech = WWW::Mechanize->new;
    $mech->credentials('user' => 'pass');
    $mech->get('http://example.ru/');

    my $content = $mech->text();
    $content =~ s/\n\r//g;

    print $1,"\n" if $content =~ /(\bЖелаю.*\!\b)(.*)/i;

    Enough codes make shapes.
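
One possible reason for the empty result, offered as a guess since the real page text is not shown: `s/\n\r//g` removes only a line feed followed by a carriage return, `.` does not match a newline without `/s`, and `\!\b` matches only when a word character comes directly after the "!". The sketch below uses a made-up `$content` in place of `$mech->text()` and a relaxed pattern of my own choosing, so treat both as assumptions.

```perl
#!/usr/bin/perl
use utf8;                       # the Cyrillic literals below are characters
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';   # encode Cyrillic output as UTF-8

# Hypothetical stand-in for what $mech->text() might return.
my $content = "Привет!\r\nЖелаю вам\r\nудачи! Пока.";

# Pattern from the script above: "." stops at the line feed, and "\b"
# after "!" needs a word character right behind the "!".
my $orig = $content =~ /(\bЖелаю.*\!\b)(.*)/i ? $1 : undef;

# Assumed alternative: turn every run of CR/LF into one space,
# then capture from the word up to the first "!".
(my $flat = $content) =~ s/[\r\n]+/ /g;
my ($wish) = $flat =~ /(\bЖелаю[^!]*!)/i;

print defined $orig ? "original: $orig\n" : "original: no match\n";
print defined $wish ? "relaxed: $wish\n"  : "relaxed: no match\n";
```

With this sample text the script prints "original: no match" and then "relaxed: Желаю вам удачи!". Whether the real page behaves the same way depends on what follows the "!" there.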
      Have you saved the script as utf-8?
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Yes, I saved the script as UTF-8. I'm using gedit 3.4.1, and the character encoding is set to "Current Locale UTF-8".
        Enough codes make shapes. (Hamidjon)
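
Saving the file as UTF-8 together with `use utf8` only covers the literals in the script; the text being matched also has to be decoded characters rather than raw bytes. A minimal sketch of the difference, using only the core Encode module and a made-up sample string (nothing here comes from the real page):

```perl
#!/usr/bin/perl
use utf8;                      # literals below are read as characters
use strict;
use warnings;
use Encode qw(decode encode);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample: the octets an undecoded UTF-8 download would contain.
my $bytes = encode('UTF-8', 'Желаю удачи!');
# The same text decoded into Perl characters.
my $chars = decode('UTF-8', $bytes);

# The pattern is made of characters, so it can only match decoded text.
my $bytes_hit = $bytes =~ /желаю/i ? 1 : 0;
my $chars_hit = $chars =~ /желаю/i ? 1 : 0;

print "undecoded bytes match: $bytes_hit\n";
print "decoded characters match: $chars_hit\n";
```

This prints 0 for the undecoded bytes and 1 for the decoded characters. If the text handed to the regex is already decoded, no extra decode step is needed and the problem lies elsewhere.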
Re^2: RegEx: Detecting the certain cyrillic words
by programmer.perl (Beadle) on Mar 01, 2013 at 16:55 UTC
    I wrote the code as you showed, but the command line didn't give any result. Instead of my ($Кремль) = $mech->content =~ /(Кремль)/i; I wrote print $1,"\n" if $mech->content =~ /(Желаю.*)(\<.*)/i; The Cyrillic characters were not displayed as Cyrillic when I posted this.
    Enough codes make shapes.