Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Counting characters within regexp

by Bod (Deacon)
on May 08, 2021 at 22:15 UTC ( #11132285=perlquestion: print w/replies, xml ) Need Help??

Bod has asked for the wisdom of the Perl Monks concerning the following question:

Good evening fellow Monks

I've just been writing a bit of code to partially hide an email address such that the owner of the address would recognise it but anyone else would find it difficult, at least to work out. I tried a simple substitution but hit a problem. How can I determine how many characters there are and replace them with that same number of full stops or asterisks or whatever?

I have come up with a three line solution but it seems like the sort of problem that should be able to be done with a regexp substitution

use strict; use warnings; my @email = ( 'someone@example.co.uk', 'andrew.test@some.company.co.uk', 'jo@abc.com', ); foreach my $em(@email) { print "$em - "; my ($name, $comp) = split /@/, $em; $em =~ /^(\w[\w|\.]).*@(\w\w).*\.(\w+)$/; $em = $1 . '.' x (length($name) - 2) . "\@$2" . '.' x (length($com +p) - length($3) - 2) . $3; print "$em\n"; }
This produces this result:
C:\Users\joolz\Perl>perl regtest.pl someone@example.co.uk - so.....@ex.........uk andrew.test@some.company.co.uk - an.........@so..............uk jo@abc.com - jo@ab..com
This is one of those problems that seems relatively trivial until you actually come to do it! It also suffers as a difficult one to search for as Google wants to tell me how to match a range of lengths {x,y} or count the number times one string occurs in another.

Is there a nice, simple, succinct way to do this?

I guess the nicest solution would change the number of characters properly displayed as a function of the overall length!

Replies are listed 'Best First'.
Re: Counting characters within regexp
by tybalt89 (Prior) on May 08, 2021 at 23:12 UTC

    Close enough ?

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11132285 use warnings; my @email = ( 'someone@example.co.uk', 'andrew.test@some.company.co.uk', 'jo@abc.com', ); foreach my $em(@email) { print "$em - "; $em =~ s/\w{2}\K(\w*)/ $1 =~ tr!!.!cr /ge; print "$em\n"; }

    Outputs:

    someone@example.co.uk - so.....@ex......co.uk andrew.test@some.company.co.uk - an.....te..@so...co......co.uk jo@abc.com - jo@ab..co.
Re: Counting characters within regexp
by LanX (Cardinal) on May 08, 2021 at 23:06 UTC
    The obscured mails I know are normally shortened resp. normalized, may I suggest that you always use ... three dots?

    something like

    DB<98> say "$_:\t", $_ =~ s/(\w\w).*\@.*(\.\w*)/$1...\@...$2/r for q +w/someone@example.co.uk andrew.test@some.company.co.uk jo@abc.com/ someone@example.co.uk: so...@....uk andrew.test@some.company.co.uk: an...@....uk jo@abc.com: jo...@....com

    ?

    YMMV

    edit

    FWIW your code is (mostly) OK, but you should really always do it in an if-block to test for a successful match first

    if( $em =~ /^(\w[\w|\.]).*@(\w\w).*\.(\w+)$/ ) { $em = $1 . '.' x (length($name) - 2) . "\@$2" . '.' x (length($com +p) - length($3) - 2) . $3; }

    and you might get bitten by variable interpolation if you don't escape the "@" in a regex: \@

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      The obscured mails I know are normally shortened resp. normalized, may I suggest that you always use ... three dots?

      Interesting...
      That's pretty much what I put into production last evening. This is the live regexp:

      $user->{'email'} =~ s/([\w|\.][\w|\.])[\w|\.]+\@(\w\w[\w|\.])[\w|\.]+\ +.([\w|\.]+)/\1...@\2.....\3/;
      Although, looking at it again, it could be simplified. Email address don't start with a full stop!

      you should really always do it in an if-block to test for a successful match first

      I always do except for simple tests like the one I posted. I have been bitten by that before and spent a couple of long nights trying to work out what was wrong, only to find my regexp wasn't matching an edge case and my loop was using values from the previous iteration. It was years ago but frustrating enough that it is unforgettable!

      you might get bitten by variable interpolation if you don't escape the "@" in a regex: \@

      Yes - I've been bitten by that before as well.
      You are quite right, even in a quick test I should have escaped that one.

      Thanks for the helpful feedback

Re: Counting characters within regexp
by AnomalousMonk (Bishop) on May 08, 2021 at 23:43 UTC
    $em =~ /^(\w[\w|\.]).*@(\w\w).*\.(\w+)$/;

    Note that in the [\w|\.] character class, the | (pipe) character matches a literal pipe character and is not an alternation. (I'm not sure if you intended a character match or not.) Also, escaping the . (period) character (\.) in the class is not needed, but does no harm.


    Give a man a fish:  <%-{-{-{-<

      I used [\w|\.] to match either a an alphanumeric character or a real full stop - did I get that wrong?

      The beginning of \w[\w|\.] is to allow for cases where an email address consists of the person's initial followed by a full stop followed by their surname - e.g. j.bloggs@...

      EDIT - I guess I wanted (\w|\.) instead of [\w|\.]

        Here is how to use a character class to achieve that.

        use strict; use warnings; use Test::More; my @good = ( 'a', '9', '_', ); my @bad = ( '-', '$', '?', ); my $re = qr/[\w.]/; plan tests => @good + @bad; like $_, $re, "match for '$_'" for @good; unlike $_, $re, "no match for '$_'" for @bad;

        🦛

Re: Counting characters within regexp
by jwkrahn (Monsignor) on May 09, 2021 at 02:59 UTC

    A bit shorter:

    use strict; use warnings; my @email = ( 'someone@example.co.uk', 'andrew.test@some.company.co.uk', 'jo@abc.com', ); foreach my $em(@email) { print "$em - "; $em =~ /^(\w[\w.])(.*)(\@\w\w)(.*)(\.\w+)$/; $em = $1 . '.' x length($2) . $3 . '.' x length($4) . $5; print "$em\n"; }

    Or, you may like this:

    use strict; use warnings; my @email = ( 'someone@example.co.uk', 'andrew.test@some.company.co.uk', 'jo@abc.com', ); for my $em ( @email ) { print "$em - "; ( my $at_pos = rindex $em, '@' ) > 0 or die "$em: invalid email ad +dress.\n"; ( my $dot_pos = rindex $em, '.' ) > $at_pos or die "$em: invalid e +mail address.\n"; tr//./c for substr( $em, 2, $at_pos - 2 ), substr( $em, $at_pos + +3, -( length( $em ) - $dot_pos ) ); print "$em\n"; }
Re: Counting characters within regexp
by kcott (Bishop) on May 10, 2021 at 06:57 UTC

    G'day Bod,

    This looked a short, fun exercise. I threw this together before looking at any reponses.

    $ perl -Mstrict -Mwarnings -E ' my @email = ( q{someone@example.co.uk}, q{andrew.test@some.company.co.uk}, q{jo@abc.com}, ); for my $email (@email) { my ($nm0, $nm1, $at, $ad0, $ad1, $tld) = $email =~ /^(..)([^@]*)(@)(..)(.*?)(\.[^.]+)$/; say $email; say $nm0, "."x length($nm1), $at, $ad0, "."x length($ad1), $tl +d; } ' someone@example.co.uk so.....@ex.........uk andrew.test@some.company.co.uk an.........@so..............uk jo@abc.com jo@ab..com

    There are some similarities to what others have posted.

    — Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11132285]
Approved by hippo
Front-paged by haukex
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (7)
As of 2021-06-25 10:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (135 votes). Check out past polls.

    Notices?