Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

A little golfing challenge: Replacing letters with numbers

by haukex (Chancellor)
on Feb 21, 2019 at 08:48 UTC ( #1230284=perlquestion: print w/replies, xml ) Need Help??

haukex has asked for the wisdom of the Perl Monks concerning the following question:

In this StackOverflow question, user "Learner" asked how to replace a set of certain letters in the first column of a file with numbers. In other words, given this set of replacements:

A=>1, B=>5, C=>6, D=>4, E=>7, F=>16, G=>10, H=>11, I=>12, K=>14, L=>13, M=>15, N=>3, P=>17, Q=>8, R=>2, S=>18, T=>19, V=>22, W=>20, Y=>21, Z=>9

And this input:

NDDDDTSVCLGTRQCSWFAGCTNRTWNSSA 0 VCLGTRQCSWFAGCTNRTWNSSAVPLIGLP 0 LTWSGNDTCLYSCQNQTKGLLYQLFRNLFC 0 CQNQTKGLLYQLFRNLFCSYGLTEAHGKWR 0 ITNDKGHDGHRTPTWWLTGSNLTLSVNNSG 0 GHRTPTWWLTGSNLTLSVNNSGLFFLCGNG 0 FLCGNGVYKGFPPKWSGRCGLGYLVPSLTR 0 KGFPPKWSGRCGLGYLVPSLTRYLTLNASQ 0 QSVCMECQGHGERISPKDRCKSCNGRKIVR 1

The expected output is:

3 4 4 4 4 19 18 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 +18 1 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1 22 17 13 12 + 10 13 17 13 19 20 18 10 3 4 19 6 13 21 18 6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 + 13 16 6 6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6 18 21 10 13 19 7 1 11 10 + 14 20 2 12 19 3 4 14 10 11 4 10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 +22 3 3 18 10 10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10 13 16 16 + 13 6 10 3 10 16 13 6 10 3 10 22 21 14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 + 17 18 13 19 2 14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2 21 13 1 +9 13 3 1 18 8 8 18 22 6 15 7 6 8 10 11 10 7 2 12 18 17 14 4 2 6 14 18 6 3 10 2 14 12 + 22 2

Here are my two solutions:

$ perl '-M5;%h=map{$_,++$i}split//,"ARNDBCEQZGHILKMFPSTWYV"' -alpe ' ($_=$F[0])=~s/[A-Z]/$h{$&} /g' $ perl -alpe ' ($_=$F[0])=~s/[A-Z]/(index("ARNDBCEQZGHILKMFPSTWYV",$&)+1)." "/ge'

WebPerl link

I personally consider the extra space character at the end of the line produced by my solutions acceptable (I think diff -b is probably ok too). Unfortunately the OP didn't specify what would happen in case the input strings contained letters that aren't in the set, so I guess "bonus points" for solutions that only affect [A-IK-NP-TV-WY-Z] instead of [A-Z] like my solution does. Bonus question: Can anyone come up with a short, preferably pure Perl, solution to produce such a regex character set for any given list of letters?

$ echo "ARNDBCEQZGHILKMFPSTWYV" | perl -MSet::IntSpan -ple ' $_=Set::IntSpan->new([map{ord}split//,$_])->run_list; s/\d+/chr$&/eg;s/,//g;$_="[$_]"'

Have at it ;-)

Update: Thank you for the inspired and inspiring responses so far, Discipulus, Eily, rsFalse, Veltro, and vr! I really enjoy the creativity in the solutions :-)

Replies are listed 'Best First'.
Re: A little golfing challenge: Replacing letters with numbers (edited)
by Eily (Monsignor) on Feb 21, 2019 at 09:46 UTC

    The trick with golfing is to wait for Discipulus to have a good idea and steal it from him :P.

    perl -nF -E '@h{split"",ARNDBCEQZGHILKMFPSTWYV}=1..22;say"@h{@F}"'
    Only works on Linux because it needs double quote context (so you would need two extra chars on windows).

    Edit: with the corresponding link

    Edit 2: removed the -la options. I don't use -l with say, and -a is implied by -F

    Edit 3: shorter than split:

    perl -nF -E '@h{ARNDBCEQZGHILKMFPSTWYV=~/./g}=1..22;say"@h{@F}"'

    Edit 4: -a is not implied by -F in perl v5.10 though

      One longer variant, also with extra spaces, but with less than two of them on each line:
      perl -pe 'ARNDBCEQZGHILKMFPSTWYV=~/$_(?{print pos.$"})/ for/./g,$_=$/'
      Corresponding link. (upd. corrected a link)

      And with similar idea but a bit shorter:
      perl -pe 's/./ARNDBCEQZGHILKMFPSTWYV=~m!$&!&&"@+ "/ge'
        perl -pe 's/./ARNDBCEQZGHILKMFPSTWYV=~m!$&!&&"@+ "/ge'

        Really nice! Well done! Going just a little further:

        perl -pe 's/./ARNDBCEQZGHILKMFPSTWYV=~$& &&"@+ "/ge'

Re: A little golfing challenge: Replacing letters with numbers -- oneliner
by Discipulus (Abbot) on Feb 21, 2019 at 09:40 UTC
    Hello haukex

    similar?

    perl -M"0;@h{split'','ARNDBCEQZGHILKMFPSTWYV'}=(1..22)"-pne "s/([A-Z]) +/$h{$1} /g"

    L*

    update (also edited the title for personal indexing purpose)

    perl -pe 's/[A-Z]/(index" ARNDBCEQZGHILKMFPSTWYV",$&).$"/ge' input.txt

    shorter..

    perl -pe 's/\S/(index" ARNDBCEQZGHILKMFPSTWYV",$&).$"/ge' input.txt

    the above is wrong.. but..

    perl -pe 's/\w/(index"0ARNDBCEQZGHILKMFPSTWYV",$&).$"/ge' input.txt




    > The trick with golfing is to wait for Discipulus to have a good idea and steal it from him :P.

    best perl compliment I received!

    PS

    meh.. I had a -1 for not coresponding string in index also it seems that second column is not wanted..

    ..so a longer version is needed

    perl -pe 's/\s.//,s/\w/(index"0ARNDBCEQZGHILKMFPSTWYV",$&).$"/eg' input.txt

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: A little golfing challenge: Replacing letters with numbers (edit)
by Veltro (Friar) on Feb 21, 2019 at 19:37 UTC
    use strict ; use warnings ; my $input = "NDDDDTSVCLGTRQCSWFAGCTNRTWNSSA 0" ; my @F = split'',$input ; my @ar ; @ar[map { ord } split '','ARNDBCEQZGHILKMFPSTWYV'] = 1..127 ; foreach ( @F ) { print( ($ar[ord $_] // $_ ) . " " ) ; }

    But I can't figure out how to put this in a one-liner, does anyone want to try?

    edit Feb 22, 2019 at 10:20 UTC:

    Up till now I have:

    perl '-M5;@ar[map { ord } split "","\0ARNDBCEQZGHILKMFPSTWYV"] = 0..12 +7' -alpe '($_=$F[0])=~s/(.)/($ar[ord $1])." "/ge' input.txt # or perl '-M5;@ar[map { ord } split "","\0ARNDBCEQZGHILKMFPSTWYV"] = 0..12 +7' -alpe '($_=$F[0])=~s/(.)/(defined $ar[ord $1] ? $ar[ord $1] : $1 ) +." "/ge' input.txt

    As rsFalse indicated in his reply below there is a trailing space. The other thing is that everything after a space is gone missing and I can't figure out why.

    edit Feb 22, 2019 at 10:49 UTC:

    Thanks Eily and Discipulus for pointing out -MO=Deparse to me. The mistake that I made was: $_=$F[0] So this should work now:

    # 'defined' is optional: perl '-M5;@ar[map { ord } split "","\0ARNDBCEQZGHILKMFPSTWYV"] = 0..12 +7' -alpe '$_=~s/(.)/(defined $ar[ord $1] ? $ar[ord $1]." " : $1 )/ge' + input.txt

    One more edit: Also note that I specifically crafted this in this manner so that this would also work:

    perl '-M5;@ar[map { ord } split "","\0V\0B\0L\0\0\0Q\0\0\0\0\0\0\0E\0\ +0\0\0\0\0\0\0\0\0P\0\0I\0\0\0\0\0\0Z\0H\0\0\0\0\0\0\0\0Y\0J\0\0S\0\0\ +0\0\0D\0\0\0X\0\0\0\0\0K\0\0\0R\0\0\0\0W\0\0\0\0\0U\0\0\0\0A\0\0M\0\0 +\0F\0\0\0C\0\0T\0\0\0\0O\0\0\0\0\0\0\0\0\0\0N\0G\0\0\0\0"] = 0..127' +-alpe '$_=~s/(.)/($ar[ord $1] ? $ar[ord $1]." " : $1 )/ge' input.txt
      Hello, Veltro,

      Nice idea with 'ord'!
      I've tried your code, and it outputs:
      3 4 4 4 4 19 18 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 +18 1 0
      Here the newline is absent, and one extra trailing space, these are not problems I think. But some extra characters at the ending does solution wrong. (upd.:) Here I would suggest to modify printing line to one which converts undefined array elements to zero-length string, e.g.:
      print( $ar[ord $_] =~ s/.\K$/ /r );
      Which outputs (with extra space and without newline):
      3 4 4 4 4 19 18 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 +18 1
      upd. ...or back to '//' op (to avoid uninitialized value warnings):
      print( ( $ar[ord $_] // '' ) =~ s/.\K$/ /r );
Re: A little golfing challenge: Replacing letters with numbers
by rsFalse (Hermit) on Feb 21, 2019 at 17:27 UTC
    haukex: "... solution to produce such a regex character set for any given list of letters."

    E.g.
    perl -0777 -ne 'print "" . ( join "", "A" .. "Z" ) =~ s/[^$_]/ /gr =~ +s/\B\w+\B/-/gr =~ s/ +//gr'
    OUTPUT:
    A-IK-NP-TVWYZ
    Corresponding link.

    For having dashes between VW and YZ I tried to use this:
    perl -0777 -ne 'print "" . ( join "", "A" .. "Z" ) =~ s/[^$_]/ /gr =~ +s/(?<=\w)\w*(?=\w)/-/gr =~ s/ +/ /gr'
    ..but somehow got: A--IK--NP--TV-WY-Z :(

    And these worked fine:  s/(?<=\b\w)\w*(?=\w\b)/-/gr ,  s/(\w)\w*(\w)/$1-$2/gr --> A-IK-NP-TV-WY-Z :)
Re: A little golfing challenge: Replacing letters with numbers
by vr (Deacon) on Feb 22, 2019 at 11:13 UTC

    Here's my answer, FWIW, but it's not to the question in topic :). I was playing with idea of packing encoding information as tightly as possible, bit vectors, as "5 bits of code per letter", this avenue to explore. Of course, in the end it turned out to be longest among all solutions here, so no need to post it.

    But, what if letters are coded as random numbers e.g. up to 1000? With 26 random 3-digit numbers (code to generate this encoding is obvious, not shown), and based on Eily's answer, solution could be:

    $_ = 'NDDDDTSVCLGTRQCSWFAGCTNRTWNSSA 0'; my @F = split ''; @h{A..Z}=('27941492412912434962583369414978036693514585011981639390315 +1901525290851349853' =~/.../g);say"@h{@F}";

    Now, my idea (here with 10 bits per code) is becoming competitive, length-wise:

    @h{A..Z}=map 1023& Math::BigInt->from_base(D3PghSS0AZyLfhkL67kHFiuXWoB +6p9GUkgznoeEO1eNj, 62)>>10*$_,0..25;say"@h{@F}";

    It's still slightly longer, and I need to use the module. I think, if number of symbols grows (e.g., 52 letters and 10 digits i.e. 62), then this solution will scale better and better :)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1230284]
Approved by Discipulus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2020-02-23 15:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What numbers are you going to focus on primarily in 2020?










    Results (102 votes). Check out past polls.

    Notices?