Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: Faster replacement of sed commands..

by johngg (Canon)
on Aug 19, 2014 at 09:36 UTC ( [id://1097949]=note: print w/replies, xml ) Need Help??


in reply to Faster replacement of sed commands..

I think it would be much clearer to use a lookup table (stored in a hash), construct a single capturing regex from the keys of the hash and do a single global substitution operation. It would also be clearer for you and those who come after if you employ a consistent code indentation scheme.

use strict; use warnings; use feature qw{ say }; my %replacementLU = ( QOS_PROFILE_ID => q{x1}, CHARGING_PROFILE_ID => q{x2}, CONTENT_FILTERING_PROFILE_ID => q{x3}, SUBSCRIBERID => q{x4}, RECORD_LENGTH => q{x5}, RECORD_TYPE => q{x6}, EVENT_ID => q{x7}, EVENT_RESULT => q{x8}, CAUSE_PROTOCOL => q{x9}, DEFAULT_BEARER_ID => q{x0}, ARP_PRIORITY_LEVEL => q{y1}, ARP_CAPABILITY => q{y2}, ARP_VULNERABILITY => q{y3}, BEARER_CONTROL_MODE => q{y4}, TRACKING_AREA_CODE => q{y5}, ROUTING_AREA_CODE => q{y7}, SERVICE_AREA_CODE => q{y8}, SYSTEM_IDENTIFIER => q{y9}, NETWORK_IDENTIFIER => q{y0}, GX_RAR_RAA_TRANSACTION => q{TRAR}, GX_CCR_CCA_TRANSACTION => q{TCCA}, QUOTA_GRANTED => q{TQG}, QOS_ASSIGNED_TO_DEFAULT_BEARER => q{TQA}, RULE_INSTALLED => q{TRI}, RULE_REMOVED => q{TRR}, ); my $subsRE = do { local $" = q{|}; qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b}; }; my $text = q{blurfl,ARP_VULNERABILITY,17,blargle,EVENT_ID,4,glogl}; say $text; $text =~ s{$subsRE}{$replacementLU{ $1 }}g; say $text;

The output.

blurfl,ARP_VULNERABILITY,17,blargle,EVENT_ID,4,glogl blurfl,y3,17,blargle,x7,4,glogl

I hope this is helpful.

Cheers,

JohnGG

Replies are listed 'Best First'.
Re^2: Faster replacement of sed commands..
by Anonymous Monk on Aug 20, 2014 at 04:30 UTC

    Hi...it worked for me..thanks a lot..but could you do me a little more favor? not able to understand below part of your code..

    my $subsRE = do { local $" = q{|}; qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b}; };

    please help me understand this...appreciate your efforts...

      The solution by johngg has the elegance that makes Perl such a great language. To understand it, let’s work backwards:

      $text =~ s{$subsRE}{$replacementLU{ $1 }}g;

      This is where all the actual substitutions take place. The first part, {$subsRE}, looks for keyword matches (see below), and the /g modifier keeps looking until no more matches can be found. For each match found, the keyword, referenced by $1, is used as a hash key in the lookup table %replacementLU, and the value corresponding to that key is used for the substitution. So, for example, ARP_VULNERABILITY is replaced by y3.

      OK, you knew all that, but where does $subsRE fit in? Let’s print it out to see what it looks like:

      (?^:(?x) \b ( CONTENT_FILTERING_PROFILE_ID|QUOTA_GRANTED|ARP_VULNERABI +LITY|NETWORK_IDENTIFIER|ARP_PRIORITY_LEVEL|DEFAULT_BEARER_ID|EVENT_RE +SULT|EVENT_ID|QOS_PROFILE_ID|ARP_CAPABILITY|SYSTEM_IDENTIFIER|TRACKIN +G_AREA_CODE|GX_RAR_RAA_TRANSACTION|SERVICE_AREA_CODE|RECORD_TYPE|RECO +RD_LENGTH|CHARGING_PROFILE_ID|QOS_ASSIGNED_TO_DEFAULT_BEARER|GX_CCR_C +CA_TRANSACTION|RULE_REMOVED|BEARER_CONTROL_MODE|ROUTING_AREA_CODE|RUL +E_INSTALLED|CAUSE_PROTOCOL|SUBSCRIBERID ) \b)

      As you can see, this says: match any one of the keywords provided it is preceded and followed by a word boundary (\b). The character | separating the keywords is the metacharacter for alternation; for example, A|B|C means: match either A, or B, or C. (See “Metacharacters” in perlre#Regular-Expressions.) Note the capturing parentheses: if any of the keywords is matched, it is captured into the next available capture variable (which in this case is $1).

      OK, so where did this monster $subsRE come from? It would be no fun constructing this by hand, so johngg harnessed Perl to do the work. Note that qr// is a the Perl regex quote operator: it converts a string into a regular expression (see perlop#Regexp-Quote-Like-Operators). (?x) is the /x modifier in a different form. The string argument to qr// is constructed by interpolating the keys of the hash %replacementLU into the string. But just saying this:

      qr{(?x) \b ( keys %replacementLU ) \b};

      wouldn’t work because Perl would think you want to match the literal characters keys %replacementLU. Perl will interpolate when it sees a $ (for a scalar) or an @ (for an array), so we need to give Perl a construct like this: @{ ... }. But that says, dereference (something) to get an array. So we need to convert keys %replacementLU (which returns a list) into an array reference, which we do by creating an anonymous array with square brackets. So

      @{ [ keys %replacementLU ] }

      is the Perlish idiom for interpolating the contents of the list returned by the keys function into the string.

      Now all we need is to separate the elements of the list with | (alternation) characters. Normally, when a list is interpolated, the elements are separated by spaces. But actually they’re separated by whatever is the contents of the special variable $", for which a space is the default. By changing it to |, we get the elements of the list separated by alternation characters, which gives us the regex we want.

      johngg could have just said:

      $" = q{|}; my $subsRE = qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b};

      but that would leave $" set to |, which might interfere with other parts of the script. It’s better practice to localise any temporary changes made to global variables. The syntax:

      my $subsRE = do { local $" = q{|}; qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b}; };

      uses local to limit the scope of the assignment, and takes advantage of the fact that the do { ... }; syntax (1) provides an enclosing scope for the local $" = q{|}; assignment; and (2) returns the value of its final statement, in this case the regex returned by qr//.

      As Hannibal Smith liked to say: “I love it when a plan comes together.” :-)

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Thank you very much for the explanation...appreciate your efforts. Script is working great for me I have updated the script a little as per my requirements.

        #!/usr/bin/perl use Data::Dumper; use File::Copy; use feature qw{ say }; my %replacementLU = ( QOS_PROFILE_ID => q{x1}, CHARGING_PROFILE_ID => q{x2}, CONTENT_FILTERING_PROFILE_ID => q{x3}, SUBSCRIBERID => q{x4}, RECORD_LENGTH => q{x5}, RECORD_TYPE => q{x6}, EVENT_ID => q{x7}, EVENT_RESULT => q{x8}, CAUSE_PROTOCOL => q{x9}, DEFAULT_BEARER_ID => q{x0}, ARP_PRIORITY_LEVEL => q{y1}, ARP_CAPABILITY => q{y2}, ARP_VULNERABILITY => q{y3}, BEARER_CONTROL_MODE => q{y4}, TRACKING_AREA_CODE => q{y5}, ROUTING_AREA_CODE => q{y7}, SERVICE_AREA_CODE => q{y8}, SYSTEM_IDENTIFIER => q{y9}, NETWORK_IDENTIFIER => q{y0}, GX_RAR_RAA_TRANSACTION => q{TRAR}, GX_CCR_CCA_TRANSACTION => q{TCCA}, QUOTA_GRANTED => q{TQG}, QOS_ASSIGNED_TO_DEFAULT_BEARER => q{TQA}, RULE_INSTALLED => q{TRI}, RULE_REMOVED => q{TRR}, ); my $subsRE = do { local $" = q{|}; qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b}; }; my $counter = 0; open (WFH1, ">", "counter.txt"); for( ; ; ) { my $fileexists= -e "text.out"; if ($fileexists ne "1") { `touch text.out`; foreach my $file (</data/admin/scripts/SapcmedadpebM/test/*csv>) { chomp; $abc1=`find $file -mmin +10`; chomp($abc1); if ($abc1 eq "") { next; } print " file $abc1 \n"; $dd=`date`; print "$dd\n"; `perl -i -pe 's/[^[:ascii:]]//g; tr/\015//d' $abc1`; print "junk character removed\n"; open (FH, "$abc1"); open (WFH, ">", "abc1.op"); while (<FH>) { $_ =~ s{$subsRE}{$replacementLU{ $1 }}g; print WFH $_; } #`sed -i 's/QOS_PROFILE_ID/x1/g;s/CHARGING_PROFILE_ID/x2/g;s/CONTENT_F +ILTERING_PROFILE_ID/x3/g;s/SUBSCRIBERID/x4/g;s/RECORD_LENGTH/x5/g;s/R +ECORD_TYPE/x6/g;s/EVENT_ID/x7/g;s/EVENT_RESULT/x8/g;s/CAUSE_PROTOCOL/ +x9/g;s/DEFAULT_BEARER_ID/x0/g;s/ARP_PRIORITY_LEVEL/y1/g;s/ARP_CAPABIL +ITY/y2/g;s/ARP_VULNERABILITY/y3/g;s/BEARER_CONTROL_MODE/y4/g;s/TRACKI +NG_AREA_CODE/y5/g;s/ROUTING_AREA_CODE/y7/g;s/SERVICE_AREA_CODE/y8/g;s +/SYSTEM_IDENTIFIER/y9/g;s/NETWORK_IDENTIFIER/y0/g' $abc1`; #`sed -i 's/GX_RAR_RAA_TRANSACTION/TRAR/g;s/GX_CCR_CCA_TRANSACTION/TCC +A/g;s/QUOTA_GRANTED/TQG/g;s/QOS_ASSIGNED_TO_DEFAULT_BEARER/TQA/g;s/RU +LE_INSTALLED/TRI/g;s/RULE_REMOVED/TRR/g' $abc1`; move("abc1.op","./$abc1"); $counter++; print WFH1 $counter; } unlink "text.out"; } else { print "Exiting \n"; exit; } sleep(100); }

        Now when I execute this script to process a 400Mb i/p file (which it the required size) after generating 300Mb of output data it sticks.. doesn't throw any error, or doesn't fail..it just stops generating output data. Then I tried with a 200Mb file and again it stuck at 160Mb.. without any error message I am not able to find the root cause.. could it be a memory issue? could you please suggest any thing which can help us get this issue resolved? Please let me know if you need more info...thnaks

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1097949]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (7)
As of 2024-04-18 14:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found