Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Find pieces of text in a file enclosed by `@` and replace the inside

by kluther (Novice)
on Mar 08, 2013 at 11:13 UTC ( [id://1022394]=perlquestion: print w/replies, xml ) Need Help??

kluther has asked for the wisdom of the Perl Monks concerning the following question:

The problem:
Find pieces of text in a file enclosed by `@` and replace the inside
Input:
@abc@ abc @ABC@
cba @cba CBA@

Deisred output:
абц abc АБЦ
cba цба ЦБА

I have the following:
#!/usr/bin/perl use strict; use warnings; use Encode; my $output; open FILE,"<", 'test.txt'; while (<FILE>) { chomp(my @chars = split(//, $_)); for (@chars) { my @char; $_ =~ s/a/chr(0x430)/eg; $_ =~ s/b/chr(0x431)/eg; $_ =~ s/c/chr(0x446)/eg; $_ =~ s/d/chr(0x434)/eg; $_ =~ s/e/chr(0x435)/eg; $_ =~ s/A/chr(0x410)/eg; $_ =~ s/B/chr(0x411)/eg; $_ =~ s/C/chr(0x426)/eg; push @char, $_; $output = join "", @char; print encode("utf-8",$output);} print "\n"; } close FILE;
But I'm stuck on how to process further
Thanks for help in advance!
Kluther

Replies are listed 'Best First'.
Re: Find pieces of text in a file enclosed by `@` and replace the inside
by bart (Canon) on Mar 08, 2013 at 11:44 UTC
    Step one is that you replace text between '@' delimiters. You can do that using
    s/\@(.*?)\@/ ... /g;
    Step two is to replace individual characters on the selected part.

    Using the /e modifier you can use perl code in the substitution part, where you can use $1 as a normal variable. With a pair of "{}" delimiters on the right hand side, it can even look like normal code, as it looks like a block; you have to use similar paired delimiters on the left to make it work, for example using angle brackets "<>":

    s<\@(.*?)\@>{ ... }ge;
    So you might try to do the replacement using code directly in the substitution part. But, to be safe, you'd better call a sub to do the actual replacement, on the selected text. I'd change your code like this:
    while(<>) { s/\@(.*?)\@/ subst($1) /ge; print; } sub subst { my $s = shift; my %r = ( 'a' => chr(0x430), 'b' => chr(0x431), 'c' => chr(0x446), 'd' => chr(0x434), 'e' => chr(0x435), 'A' => chr(0x410), 'B' => chr(0x411), 'C' => chr(0x426) ); $s =~ s/([a-eA-C])/$r{$1}/g; return $s; }

    Caveat: untested.

    update: Tested, and bug fixed, this line was wrong:

    s/\@(.*)\@/ subst($1) /ge;

      Hi,

      one question to the (very nice) solution: I'm not sure whether I'm right. Is it possible that there are perl bugs in older versions with exactly this kind of substitution pattern:

      s/pattern/ func(bla) /ge;

      I'm just curious. Probably it was in another context. Where are the perl core hackers?

      Best regards
      McA

        I'm not sure what you mean. But I know that $1 and friends are block scoped, meaning that it's safe, and has always been safe, to nest substitutions provided the inner substitution was in a block.

        Calling a sub is one safe way to achieve that. It also has the added advantage that you don't have to take care of having to escape any unpaired nested braces: the substitution part may look like a code block to us, but it actually isn't.

      Thanks for your quick response. The code is not the solution. The second word of the first line is replaced also eventhough it is not surrounded by '@''s.
        Yeah, I just tested it. I forgot the question mark. Now it works.
Re: Find pieces of text in a file enclosed by `@` and replace the inside
by daxim (Curate) on Mar 08, 2013 at 12:15 UTC
    The substitution operator is powerful. I'll let the regex engine do the parsing/splitting work.
    use utf8;
    use strict;
    use warnings FATAL => 'all';
    use Data::Munge qw(list2re);
    use File::Slurp qw(read_file write_file);
    
    my %tr = (
        a => 'а',
        b => 'б',
        c => 'ц',
        A => 'А',
        B => 'Б',
        C => 'Ц',
    );
    my $key = list2re keys %tr;
    
    my $text = read_file('test.txt', { binmode => ':encoding(UTF-8)' });
    
    $text =~ s{
        @       # fragment start
        ([^@]+) # capture characters inside (all except @)
        @       # fragment end
    }{
        my $fragment = $1;
        $fragment =~ s{
            ($key)
        }{
            $tr{$1}
        }egmsx;
        $fragment;
    }egmsx;
    
    write_file('output.txt', { binmode => ':encoding(UTF-8)' }, $text);
    

      I give a ++ for this very nice piece of code. This one could be in a book "Learning Perl: The clean and nice way".

      IMHO why: use good modules, format code concise, build nice regexes with comments.

      Aaaahh, this code makes me happy... ;-)

      Best regards
      McA

Re: Find pieces of text in a file enclosed by `@` and replace the inside
by kcott (Archbishop) on Mar 08, 2013 at 13:32 UTC

    G'day kluther,

    Rather than using all those substitutions, you can do a single transliteration. Here's a commandline example:

    $ perl -Mstrict -Mwarnings -Mutf8 -e '
        binmode STDOUT => ":utf8";
        while (<>) {
            s/@([^@]+)@/$_ = $1; y{abcABC}{абцАБЦ}; $_/eg;
            print;
        }
    '
    @abc@ abc @ABC@
    абц abc АБЦ
    cba @cba CBA@
    cba цба ЦБА
    

    -- Ken

      Hi Ken, I on a sco-6 machine and that machine doens't have coding for cyrillic.

        That would have been useful information to have provided upfront. Perhaps you'd care to update your initial post with this limitation. You can still do the transliteration - this code produces the same output as my previous solution:

        $ perl -Mstrict -Mwarnings -e ' binmode STDOUT => ":utf8"; my $subs = join q{} => map chr, 0x430, 0x431, 0x446, 0x410, 0x411, + 0x426; while (<>) { s/@([^@]+)@/"\$_ = \$1; y{abcABC}{$subs}; \$_"/eeg; print; } '

        -- Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1022394]
Approved by vinoth.ree
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (5)
As of 2024-03-19 10:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found