PerlMonks  

replace all numerics to words in a txt file

by imhacked (Initiate)
on Sep 22, 2017 at 06:22 UTC

imhacked has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to replace all numbers with words in a very large txt file. For example:

Sentence: I am going to 44 avenue.
Result: I am going to forty four avenue.

I have written the following:

use Lingua::EN::Numbers qw(num2en num2en_ordinal);
if(isdigit (\w)
s/(\w)/num2en(\w)/g;

But I get the following error:

Unquoted string "w" may clash with future reserved word at tocorpus.pl line 21.
syntax error at tocorpus.pl line 22, near ") s/(\w)/num2en(\w)/g"
Execution of tocorpus.pl aborted due to compilation errors

Replies are listed 'Best First'.
Re: replace all numerics to words in a txt file
by choroba (Cardinal) on Sep 22, 2017 at 06:33 UTC
    1. When using capture groups, refer to them via $1, $2 etc.
    2. To evaluate code in the replacement part of substitution, use /e .
    3. You probably need + to match numbers > 9:
      s/(\w+)/num2en($1)/ge
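
A minimal sketch of points 1 and 2, using only core Perl. `to_words` and its tiny lookup table are hypothetical stand-ins for `num2en`, included only so the example runs without installing anything; the point is the difference the /e modifier makes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for num2en(), so the sketch needs no CPAN module.
my %words = (5 => 'five', 44 => 'forty-four');
sub to_words { my ($n) = @_; return $words{$n} // $n }

my $line = 'I am going to 44 avenue.';

# Without /e the replacement is just an interpolated string:
# $1 is filled in, but to_words() is never called.
(my $literal = $line) =~ s/(\d+)/to_words($1)/g;

# With /e the replacement is evaluated as Perl code and its
# return value is what gets inserted.
(my $evaluated = $line) =~ s/(\d+)/to_words($1)/ge;

print "$literal\n";      # I am going to to_words(44) avenue.
print "$evaluated\n";    # I am going to forty-four avenue.
```

With the real module, the same substitution would call `num2en($1)` instead of the stand-in.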
Re: replace all numerics to words in a txt file
by Laurent_R (Canon) on Sep 22, 2017 at 09:26 UTC
    In addition to choroba's comments, I would say that this code line:
    if(isdigit (\w)
    has several syntax mistakes. First, there is no closing parenthesis for the if conditional. Then, there is no opening curly brace for the code to execute if the conditional returns a true value. And finally, the (\w) argument doesn't make much sense outside of a regex.

    But you probably don't need this conditional anyway, since your substitution on the next line can be modified to match only numbers, with something like this (untested):

    s/(\d+)/num2en($1)/ge;
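
To see why \d+ is the better fit here, this small sketch (core Perl only, with the sample sentence from the question) lists what each pattern would hand to the conversion function:

```perl
use strict;
use warnings;

my $line = 'I am going to 44 avenue.';

# \w+ matches every run of letters, digits or underscores,
# so ordinary words would be passed to the converter too.
my @word_runs = $line =~ /(\w+)/g;

# \d+ matches runs of digits only.
my @digit_runs = $line =~ /(\d+)/g;

print "\\w+ matches: @word_runs\n";     # I am going to 44 avenue
print "\\d+ matches: @digit_runs\n";    # 44
```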
      Indeed! At first sight I was surprised not to see the usual unmatched-bracket message, though most likely perl gives up on further checking, before it reaches the bracket-matching check, once it finds \w being treated as a bareword.

      One world, one people

Re: replace all numerics to words in a txt file -- oneliner
by Discipulus (Canon) on Sep 22, 2017 at 08:21 UTC
    hello imhacked and welcome to the monastery and to the wonderful world of Perl!

    choroba gave you the solution, but why use isdigit when you can already match digits directly with the \d class in regexes?

    I tried this module, which I had never used before, and ended up with the one-liner below. It is certainly not the best thing ever seen (PS: indeed! see the tremendously wise AnomalousMonk below), but you can play with it (be aware of the win32 double quotes!):

    echo 1st rule: 2nd, 3rd and 4th floor must be free in 5 min. Not in 6. Call 911 for emergencies | perl -MLingua::EN::Numbers="num2en,num2en_ordinal" -pne "s/(\d+)([st|nd|rd|th])+/num2en_ordinal($1)/ge;s/(\d+)/num2en($1)/ge"
    first rule: second, third and fourth floor must be free in five min. Not in six. Call nine hundred and eleven for emergencies

    You can use -MO=Deparse to see the above a bit expanded:

    perl -MO=Deparse -pne "s/(\d+)([st|nd|rd|th])+/num2en_ordinal($1)/ge;s/(\d+)/num2en($1)/ge"
    LINE: while (defined($_ = <ARGV>)) {
        s/(\d+)([st|nd|rd|th])+/num2en_ordinal($1);/eg;
        s/(\d+)/num2en($1);/eg;
    }
    continue {
        die "-p destination: $!\n" unless print $_;
    }
    -e syntax OK

    PPS: maybe this is a better regex (I'm so rusty..). The char class [..] was totally misplaced, and since look-around assertions are all zero-width (they contribute nothing to $&), a simple capturing group seems to work:

    s/(\d+)(st|nd|rd|th)/num2en_ordinal($1)/ge

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      s/(\d+)([st|nd|rd|th])+/num2en_ordinal($1)/ge;

      The | ordered alternation metacharacter is not meta in a character class. The character class [st|nd|rd|th] is equivalent to [stndrh|] (update: because repeated characters have no special significance in a class). The quantified capture group ([st|nd|rd|th])+ matches any of the characters in the [stndrh|] class one or more times and captures the last such character matched. Try matching against '2ds', '33hn', '123|s' and a few other such strings. You may want something like the true alternation (?:st|nd|rd|th), possibly supported by some look-around assertions.

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $s = '1ds rule: 22hn, 3rd and 123|sdnfloor';
       ;;
       printf qq{'$&' } while $s =~ m{(\d+)([st|nd|rd|th])+}xmsg;
      "
      '1ds' '22hn' '3rd' '123|sdn'

      Update: Maybe something like this:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $rx_ordinal_indicator = qr{
           (?<! [[:alpha:]]) (?: st | nd | rd | th) (?! [[:alpha:]])
           }xms;
       ;;
       my $s = '1ds rule: 22hn, 2 then 3rd and 44 th, 2 nd of the 123|sdnfloor';
       ;;
       printf qq{'$&' } while $s =~ m{ (\d+) \s* $rx_ordinal_indicator }xmsg;
      "
      '3rd' '44 th' '2 nd'
      (Ordinarily, I wouldn't use $& and friends, but it's convenient for this example.)
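
Putting the pieces of this thread together, one possible sketch of a per-line conversion follows. `convert_line` is a hypothetical helper, and the two lookup tables are toy stand-ins so the code runs with core Perl only. With Lingua::EN::Numbers installed, one would presumably pass `\&Lingua::EN::Numbers::num2en` and `\&Lingua::EN::Numbers::num2en_ordinal` instead.

```perl
use strict;
use warnings;

# Ordinal suffix that is not glued to other letters
# (the look-around guarded pattern from the post above).
my $rx_ordinal_indicator = qr{
    (?<! [[:alpha:]]) (?: st | nd | rd | th ) (?! [[:alpha:]])
}xms;

# Hypothetical helper: takes a line plus two code refs that turn
# a number into its cardinal and ordinal word form.
sub convert_line {
    my ($line, $cardinal, $ordinal) = @_;
    # Ordinals first; otherwise the cardinal pass would consume
    # the digits of "3rd" and leave the suffix behind.
    $line =~ s{ (\d+) \s* $rx_ordinal_indicator }{ $ordinal->($1) }xmsge;
    $line =~ s{ (\d+) }{ $cardinal->($1) }xmsge;
    return $line;
}

# Toy stand-ins for num2en / num2en_ordinal (assumption: demo only).
my %card = (2 => 'two', 3 => 'three', 44 => 'forty-four');
my %ord  = (3 => 'third', 44 => 'forty-fourth');
my $cardinal = sub { $card{ $_[0] } // $_[0] };
my $ordinal  = sub { $ord{ $_[0] }  // $_[0] };

my $out = convert_line('2 then 3rd and 44 th floor', $cardinal, $ordinal);
print "$out\n";    # two then third and forty-fourth floor
```

For a very large file, calling such a helper inside a `while (my $line = <$fh>)` loop keeps only one line in memory at a time, which is what the -p switch in the one-liner above does implicitly.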


      Give a man a fish:  <%-{-{-{-<
