PerlMonks  

RegEx: Detecting the certain cyrillic words

by programmer.perl (Beadle)
on Mar 01, 2013 at 15:19 UTC (#1021282=perlquestion)
programmer.perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm having trouble matching Cyrillic words. I want to find certain words (written in Cyrillic script) on a website. I don't know how to put the word 'желаю' between the regex // delimiters (see my code example). I want to match the whole text between the HTML tags (<>). The source looks like this: <br />CYRILLIC TEXT<br /> My code is:
#!/usr/bin/perl -w
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
$req = HTTP::Request->new(GET => 'http://anyrussiansite.ru/');
$req->authorization_basic('user', 'password');
$content_of_cpasar = $ua->request($req)->as_string;
$content_of_cpasar =~ s/[\n\r]//g;   # drop line breaks
# The pattern is left empty here: this is where the Cyrillic word should go.
print "Found ",$&,"\n" if $content_of_cpasar =~ //i;
Enough codes make shapes.

Re: RegEx: Detecting the certain cyrillic words
by daxim (Chaplain) on Mar 01, 2013 at 15:38 UTC
    #!/usr/bin/perl
    use utf8;
    use strict;
    use warnings FATAL => 'all';
    use WWW::Mechanize qw();
    
    my $mech = WWW::Mechanize->new;
    $mech->credentials('user' => 'password');
    $mech->get('http://www.rambler.ru/');
    
    my ($Кремль) = $mech->content =~ /(Кремль)/i;
    
    You very likely want to use Web::Query to dissect your HTML instead of regex, or at least match against the HTML-stripped text version of the document.
      I wrote the code as you showed, but the command line gave no result. Instead of my ($Кремль) = $mech->content =~ /(Кремль)/i; I wrote print $1,"\n" if $mech->content =~ /(Желаю.*)(\<.*)/i; (The Cyrillic characters in this post were displayed as numeric HTML entities rather than as Cyrillic.)
      Enough codes make shapes.

      Here is my whole code, but it gives no result:

      #!/usr/bin/perl -w
      use utf8;
      use strict;
      use warnings FATAL => 'all';
      use WWW::Mechanize qw();

      my $mech = WWW::Mechanize->new;
      $mech->credentials('user' => 'pass');
      $mech->get('http://example.ru/');

      my $content = $mech->text();
      $content =~ s/[\n\r]//g;

      print $1,"\n" if $content =~ /(\bЖелаю.*\!\b)(.*)/i;

      Enough codes make shapes.
        Have you saved the script as utf-8?
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
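
For readers following the encoding question above, here is a minimal sketch of the places where encoding tends to matter in a script like this. It is not taken from any of the posts: the helper name find_word and the sample text 'Желаю удачи!' are made up for illustration, and the raw page bytes are simulated with encode() so that nothing is fetched from the network. It assumes the script file itself is saved as UTF-8, which is what "use utf8" expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                           # string and regex literals in this file are UTF-8
use Encode qw(decode encode);

# Encode output as UTF-8 so Cyrillic prints without "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper (not from the thread): decode raw UTF-8 bytes into
# characters, then look for $word case-insensitively and capture everything
# from the word up to the next '<'.
sub find_word {
    my ($bytes, $word) = @_;
    my $text = decode('UTF-8', $bytes);     # bytes -> Perl characters
    return $text =~ /(\b\Q$word\E\b[^<]*)/i ? $1 : undef;
}

# Simulated raw response body; a real script would get bytes from the server.
my $bytes = encode('UTF-8', '<br />Желаю удачи!<br />');

my $hit = find_word($bytes, 'желаю');
print "Found: $hit\n" if defined $hit;
```

The decode() step is only needed when the script holds raw bytes, as with as_string in the original question. If the content has already been decoded into characters, decoding it a second time would be a mistake.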
