
Grouped Regular Expression not set assign default value

by gbwien (Sexton)
on Feb 20, 2018 at 14:26 UTC ( [id://1209586]=perlquestion )

gbwien has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am learning to write code to parse text files using grouped regular expressions. In the section of code below I am trying to determine whether the entire group $1 is found in the input file. If it is, I write group $2 to $line. How can I modify the code so that MSISDN=notSet is written to $line when group $1 does not match? Thank you for your help.

Pseudo code: check the line for tab MSISDN=digits; if found, extract the digits and assign them to $line; otherwise assign MSISDN=notSet to $line for printing later.
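One way to read the pseudo code above, as a minimal sketch rather than a finished solution: decide between the captured digits and the default with a single conditional expression. The sub name `msisdn_field` is hypothetical, and the sketch assumes the whole record (everything between <SUBBEGIN and <SUBEND) is passed in as one string, because a default can only be chosen once all lines of a record have been seen.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "<digits>" when the record has an MSISDN
# line, and the literal default "MSISDN=notSet" otherwise.
sub msisdn_field {
    my ($record) = @_;
    # /m lets ^ match at the start of every line inside the record
    return $record =~ /^\s*MSISDN=(\d+);/m ? "<$1>" : 'MSISDN=notSet';
}

print msisdn_field("<SUBBEGIN\n\tMSISDN=123476789678;\n\tODBIC=BIC;\n<SUBEND\n"), "\n";
print msisdn_field("<SUBBEGIN\n\tODBIC=BIC;\n<SUBEND\n"), "\n";
```

The conditional expression keeps the "found" and "not found" cases next to each other, so no separate flag variable is needed.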

Section of code

if (/^(\t*MSISDN=(\d+));/) {
    print OUTFILE "Update Command $line\n" if defined $line;
    $line = "<$2>";   #group 2
    #otherwise assign MSISDN=notSet to the line for printing later
}

full program below

#!/usr/bin/perl
use strict;
use warnings;

my $HSSIN='D:\testproject\sample-input.txt';
my $ofile = 'D:\testproject\sample-output.txt';
my $add;

open (INFILE, $HSSIN) or die "Cant open input file";
open (OUTFILE,"> $ofile" ) or die "Cant open file";

my $line;
while (<INFILE>) {
    if (/^(\t*MSISDN=(\d+));/) {
        print OUTFILE "Update Command $line\n" if defined $line;
        $line = "<$2>";   #group 2
    }
    if (/(\t*ODBIC=([\w]+?\w.*));/) {
        #print OUTFILE "$line\n" if defined $line;
        #$line = $2;
        $add = $2;
        $line .= ",$add";
    }
    if (/(\t*ODBOC=([\w]+?\w.*));/) {
        $add = $2;
        $line .= ",$add";
    }
}
print OUTFILE "Update Command $line\n";
close INFILE;
close OUTFILE;

input file may contain multiple entries

<SUBBEGIN
    MSISDN=123476789678;
    ODBIC=BIC;
    ODBOC=BAOC;
<SUBEND
<SUBBEGIN
    ODBIC=BIC;
    ODBOC=BAOC;
<SUBEND

desired output

Update Command <123476789678>,BIC,BAOC
Update Command MSISDN=notSet,BIC,BAOC

Replies are listed 'Best First'.
Re: Grouped Regular Expression not set assign default value
by tybalt89 (Monsignor) on Feb 20, 2018 at 20:12 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1209586

    use strict;
    use warnings;

    local ($/, $\) = ("<SUBEND", "\n");
    /SUBEND/ and print "Update Command ", join ',',
      /MSISDN=(\d+)/ ? "<$1>" :'MSISDN=notSet', /C=(\w+)/g while <DATA>;

    __DATA__
    <SUBBEGIN
        MSISDN=123476789678;
        ODBIC=BIC;
        ODBOC=BAOC;
    <SUBEND
    <SUBBEGIN
        ODBIC=BIC;
        ODBOC=BAOC;
    <SUBEND
Re: Grouped Regular Expression not set assign default value
by Laurent_R (Canon) on Feb 20, 2018 at 18:24 UTC
    Hi gbwien,

    your code is based on a suggestion I made a few days ago in another thread that you opened (http://www.perlmonks.org/?node_id=1209051).

    But, in that post, based on the data that you had originally shown, I was using an MSISDN line to detect the beginning of the new record block. You're now showing data where there is not always a line with an MSISDN. So you need to use something else to detect the beginning of a record. See my solution posted today (http://www.perlmonks.org/?node_id=1209603), where <SUBEND is used to detect the end of a block.

    As I already pointed out in that previous thread, you're using too many parentheses in your regex, making your life more difficult than it needs to be.

    For example, replace:

    if (/^(\t*MSISDN=(\d+));/) {
    with:
    if (/^\t*MSISDN=(\d+);/) {
    and use $1 rather than $2.
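As a small illustration of the point about parentheses (a sketch only; the sub names `msisdn_nested` and `msisdn_simple` are made up for this example), both patterns extract the same digits. Dropping the outer group simply moves them from $2 to $1.

```perl
use strict;
use warnings;

# Original pattern: the outer group becomes $1, the digits become $2.
sub msisdn_nested {
    my ($text) = @_;
    return $text =~ /^(\t*MSISDN=(\d+));/ ? $2 : undef;
}

# Simplified pattern: only the digits are captured, so they are in $1.
sub msisdn_simple {
    my ($text) = @_;
    return $text =~ /^\t*MSISDN=(\d+);/ ? $1 : undef;
}

for my $sample ("\tMSISDN=123476789678;", "\tODBIC=BIC;") {
    my $nested = msisdn_nested($sample) // 'no match';
    my $simple = msisdn_simple($sample) // 'no match';
    print "nested=$nested simple=$simple\n";
}
```

If a group is needed only for grouping and not for capturing, `(?:...)` keeps the numbering of the remaining groups unchanged.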

      Hi Laurent_R

      Thanks again for your help. The code I wrote in http://www.perlmonks.org/?node_id=1209586 is based on the same problem, but I made the mistake of not using a consistent input file across posts. I am still building up the contents of the input file with additional populated data, and I thought it was best to use something like <SUBBEGIN and <SUBEND. Additional populated data, still to be decided, is represented by ..... below and may appear anywhere in the file, but I don't need to worry about it. For the output file I am only interested in extracting and checking the fields MSISDN, CB, CF and ODB, based on what is to the right of the equals sign. One more important point: the data may appear in any order.

      My question in the last post was an attempt to understand how to check whether the MSISDN line is not found. I thought I could apply this knowledge to the CB, CF and ODB lines too and write something similar, e.g. if CB=BAOC-ALL-PROV; is not found, write CBBocallProvNotSet; if CF=CFU-ALL-PROV-NONE is not found, write CFUallProvNotSet. Again, very sorry about the confusion; I will try to be more concise on the site going forward.
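The "not found, so write a placeholder" idea described above could be sketched with a small lookup table. This is only an illustration under assumptions: the table `@expected` and the sub `check_fields` are hypothetical, the placeholder names are the two examples given in the post, and the record is assumed to be available as a single string.

```perl
use strict;
use warnings;

# Hypothetical table: field name, the value to look for, and the
# placeholder to emit when that "field=value" text is absent.
my @expected = (
    [ 'CB', 'BAOC-ALL-PROV',     'CBBocallProvNotSet' ],
    [ 'CF', 'CFU-ALL-PROV-NONE', 'CFUallProvNotSet'   ],
);

sub check_fields {
    my ($record) = @_;
    my @out;
    for my $entry (@expected) {
        my ($field, $value, $placeholder) = @$entry;
        # index() is a literal substring search, so nothing needs escaping
        my $found = index($record, "$field=$value") >= 0;
        push @out, $found ? $value : $placeholder;
    }
    return join ',', @out;
}

print check_fields("CB=BAOC-ALL-PROV;\nCF=CFU-ALL-PROV-NONE-YES-NO;\n"), "\n";
print check_fields("CB=BOIC-ALL-PROV;\n"), "\n";
```

Because the check runs against the whole record, the order in which the fields appear in the input does not matter.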

      Example record in the input file with all considered values we spoke about

      <SUBBEGIN
          IMSI=21111111111111;
          MSISDN=413333333331;
          .....
          ...
          CB=BAOC-ALL-PROV;
          CB=BOIC-ALL-PROV;
          CB=BOICEXHC-ALL-PROV;
          CB=BICROAM-ALL-PROV;
          CW=CW-ALL-PROV;
          CF=CFU-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFB-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFNRY-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFNRC-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFD-TS10-ACT-91436903000-YES-YES-25-YES-65535-YES-YES-NO-NO-NO-YES-YES-YES-YES-NO;
          ODBIC=BAIC;
          ODBOC=BAOC;
          ODBROAM=ODBOHC;
          ODBPRC=ENTER;
          ODBPRC=INFO;
          ODBPLMN=NONE;
          ODBPOS=NOBPOS-BOTH;
          ODBECT=OdbAllECT;
          ODBDECT=YES;
          ODBMECT=YES;
          ODBPREMSMS=YES;
          ODBADULTSMS=YES;
      <SUBEND

      I wrote the following based on your code

      #!/usr/bin/perl
      use strict;
      use warnings;
      use feature 'say';

      my $HSSIN='D:\testproject\HSS-export.txt';
      my $ofile = 'D:\testproject\HSS-output.txt';
      my $add;
      my $MSISDN;

      open (INFILE, $HSSIN) or die "Can't open input file";
      open (OUTFILE,"> $ofile" ) or die "Cant open file";

      my $line;
      while (<INFILE>) {
          if (/<SUBEND/) {
              print OUTFILE "$line\n";
              $line = "MSISDN=0";
          }
          $line = $1 if /^\s*MSISDN=(\d+);/;

          #if (/^\t*MSISDN=(\d+);/) {   #find MSISDN in file global search
          #    print OUTFILE "processSingle Update Command MKEY <parameter name> $line, \n" if defined $line;
          #    $line = "<$1>";   #group 1

          #blockings
          if (/\t*CB=([\w-]+?);/) {
              $add = $1;
              $line .= ",$add";
          }

          #call forwardings
          if (/\t*CF=([\w-]+?-(?:NONE|\d+))/ and (!/(\t*CF=CFD-[\w-]+?-\d+)/)) {
              $add = $1;   #the entire of group 1 above, next search the line
              $add =~ s/\t//g;
              $add =~ s/(91)(\d+)?/$2/;   #remove 91 from $1 above CFD-TS10-ACT-91436903000
              #$add =~ s/\b436903000/43660303060/;
              $add =~ s/(\d+)$/1\/1\/1\/$1/;
              $add =~ s/NONE/1\/1\/1\/0/;
              $line .= ",$add";
          }

          #change CFD to 43660303060 for voicemail
          if (/(\t*CF=(CFD-[\w-]+?-\d+))/ ) {
              $add = $2;
              $add =~ s/\t//g;
              #$add =~ s/(91)(\d+)?/1\/1\/1\/$2/;   #remove 91 from $2 above
              $add =~ s/\b91436903000/1\/1\/1\/43660303060/;
              $add =~ m/(?:CFD-[\w].*-)(\d)\/(\d)\/(\d)\/(\d*)/;
              $line .= ",$add";
          }

          #odb stuff
          #if ($_ =~ m/\t*ODBIC=([\w-]+?\w.*);/)
          if (/\t*ODBIC=([\w]+?\w.*);/) {
              #$add = "mappedBICVALUE";
              $add = $1, $line .= ",$add";
          }
          if (/\t*ODBOC=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBROAM=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBPRC=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBPLMN=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBPOS=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBECT=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBDECT=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBMECT=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBPREMSMS=([\w]+?\w.*);/) {
              $add = $1;
              $line .= ",$add";
          }
          if (/\t*ODBADULTSMS=([\w]+?\w.*);/) {
              $add = "$1";
              $line .= ",$add";
          }

          #if (/\t*ODB(\w+)?=([\w-]+?\w.*);/)
          #{
          #    $add = $2;
          #    #$add =~ s/\t//g;
          #    $line .= ",$add";
          #}
      }
      #print OUTFILE "processSingle Update Command MKEY <parameter name> $line, \n";
      close INFILE;
      close OUTFILE;
Re: Grouped Regular Expression not set assign default value
by thanos1983 (Parson) on Feb 20, 2018 at 15:37 UTC

    Hello gbwien,

    Maybe this is not the best solution to your problem, but it could be a starting point:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature 'say';

    open (my $fhIn, '<', "in.txt") or die "Can not open 'in.txt': $!";
    open (my $fhOut, '>', "out.txt") or die "Can not open 'out.txt': $!";

    my %hash;
    my $count = 0;
    while (<$fhIn>) {
        chomp;
        if (/<SUBBEGIN/) {
            $count = 1;
        }
        elsif (/<SUBEND/) {
            $count = 0;
        }
        elsif ($count) {
            my @tmp = split /=/;
            chop $tmp[1];
            if (/MSISDN/) {
                $hash{MSISDN} = $tmp[1];
            }
            elsif (/ODBIC/) {
                $hash{ODBIC} = $tmp[1];
            }
            elsif (/ODBOC/) {
                $hash{ODBOC} = $tmp[1];
            }
        }
        if ($count == 0) {
            if (exists $hash{MSISDN}) {
                say $fhOut "Update Command <".$hash{MSISDN}.">,".$hash{ODBIC}.",".$hash{ODBOC}."";
            }
            else {
                say $fhOut "Update Command MSISDN=notSet,".$hash{ODBIC}.",".$hash{ODBOC}."";
            }
            delete $hash{MSISDN};
            delete $hash{ODBIC};
            delete $hash{ODBOC};
        }
    }

    close ($fhIn) or warn "Could not close 'in.txt': $!";
    close ($fhOut) or warn "Could not close 'out.txt': $!";

    __END__

    $ cat out.txt
    Update Command <123476789678>,BIC,BAOC
    Update Command MSISDN=notSet,BIC,BAOC

    If there is something that you can not understand let me know. Maybe you will find other ways to approach your problem at Getting lines in a file between two patterns.

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Hi thanos1983. Thank you for your help. I extended your solution to check the CF and ODB fields, which I write to a comma-separated file both when they are set and when they are not. It works fine except for the CF fields; I think this is because the CF lines are not unique within a record. For example, I would like to set CfuNotSet if the line CFU-ALL-PROV-NONE is not found, CFbNoSet if the line CFB-ALL-PROV-NONE is not found, and so on. Eventually I would also like to check the CB fields. I have done this without a hash with help from Laurent_R, but I would like to learn your solution too.

      This piece of code seems OK

      #build up the hash CF
      if (/CF/){
          $hash{CF} = $tmp[1];
          #say $hash{CF};
      }

      I think the problem is here

      #PROBLEM CODE check CF fields need to accommodate for multiple CF lines
      if (exists $hash{CF}) {
          $add = $hash{CF};
          $line .= ",$add";
      }
      <BEGINFILE>
      <SUBBEGIN
          IMSI=11111111111111;
          MSISDN=431234567893;
          CB=BAOC-ALL-PROV;
          CB=BOIC-ALL-PROV;
          CB=BOICEXHC-ALL-PROV;
          CB=BICROAM-ALL-PROV;
          IMEISV=4565676567576576;
          CW=CW-ALL-PROV;
          CF=CFU-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFB-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFNRY-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFNRC-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFD-TS10-ACT-91436903000-YES-YES-25-YES-65535-YES-YES-NO-NO-NO-YES-YES-YES-YES-NO;
          ODBIC=BAIC;
          ODBOC=BAOC;
          ODBROAM=ODBOHC;
          ODBPRC=ENTER;
          ODBPRC=INFO;
          ODBPLMN=NONE;
          ODBPOS=NOBPOS-BOTH;
          ODBECT=OdbAllECT;
          ODBDECT=YES;
          ODBMECT=YES;
          ODBPREMSMS=YES;
          ODBADULTSMS=YES;
      <SUBEND
      <SUBBEGIN
          IMSI=11111111111133;
          MSISDN=431234567899;
          CB=BAOC-ALL-PROV;
          CB=BOIC-ALL-PROV;
          CB=BOICEXHC-ALL-PROV;
          CB=BICROAM-ALL-PROV;
          CW=CW-ALL-PROV;
          CF=CFU-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFB-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFNRY-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFNRC-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFU-TS10-ACT-914369050045021-YES-NO-NONE-YES-65535-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;
          CF=CFD-TS10-REG-91436903000-YES-YES-25-YES-65535-YES-YES-NO-NO-NO-YES-YES-YES-YES-NO;
          ODBIC=BICCROSSDOMESTIC;
          ODBOC=BAOC;
          ODBROAM=ODBOH;
          ODBPRC=INFO;
          ODBPLMN=PLMN1
          ODBPLMN=PLMN3;
          ODBPOS=NOBPOS-BOTH;
          ODBECT=OdbAllECT;
          ODBDECT=YES;
          ODBMECT=YES;
          ODBPREMSMS=NO;
          ODBADULTSMS=YES;
      <SUBEND

      If I understand it correctly, you split each line of the record on the equals sign. This seems to work fine until the CF entries are encountered, because each new CF line overwrites the previous value stored under the same hash key.
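One possible way to keep every CF line, offered as a sketch and not as the approach used anywhere in this thread: store an array of values per key (a hash of arrays) so that repeated keys accumulate. The sub names `parse_record` and `field_or_default` are hypothetical, and the CF values in the sample call are shortened for readability.

```perl
use strict;
use warnings;

# Collect key=value pairs from the lines of one record.
# Repeated keys (CF, CB, ...) are appended, never overwritten.
sub parse_record {
    my (@lines) = @_;
    my %fields;
    for my $text (@lines) {
        # key, '=', then the value up to the ';' (or end of line)
        next unless $text =~ /^\s*(\w+)=([^;\s]+)/;
        push @{ $fields{$1} }, $2;
    }
    return \%fields;
}

# Join all values stored for a key, or fall back to "<key>notSet".
sub field_or_default {
    my ($fields, $key) = @_;
    return exists $fields->{$key}
        ? join(',', @{ $fields->{$key} })
        : "${key}notSet";
}

my $record = parse_record(
    "\tMSISDN=431234567893;",
    "\tCF=CFU-ALL-PROV-NONE-YES;",
    "\tCF=CFB-ALL-PROV-NONE-YES;",
);
print join(',', map { field_or_default($record, $_) } qw(MSISDN CF ODBIC)), "\n";
```

Lines without an equals sign, such as <SUBBEGIN and <SUBEND, are skipped by the pattern, so the caller only has to decide where one record ends and the next begins.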

      #!/usr/bin/perl
      use strict;
      use warnings;
      use feature 'say';
      use Data::Dumper;

      my $HSSIN='D:\testproject\HSS-export.txt';
      my $ofile = 'D:\testproject\HSS-output-withhash.txt';

      open (INFILE, $HSSIN) or die "Can't open input file";
      open (OUTFILE,"> $ofile" ) or die "Cant open file";

      my %hash;
      my $count = 0;
      my $add;
      my $line;
      while (<INFILE>) {
          chomp;
          next if /^<ENDFILE>/;
          next if /^<BEGINFILE>/;
          if (/<SUBBEGIN/) {
              $count = 1;
          }
          elsif (/<SUBEND/) {
              $count = 0;
          }
          elsif ($count) {
              my @tmp = split /=/;
              chop $tmp[1];
              #build up the hash CF
              if (/CF/){
                  $hash{CF} = $tmp[1];
                  #say $hash{CF};
              }
              if (/MSISDN/) {
                  $hash{MSISDN} = $tmp[1];
              }
              if (/ODBIC/) {
                  $hash{ODBIC} = $tmp[1];
              }
              elsif (/ODBOC/) {
                  $hash{ODBOC} = $tmp[1];
              }
              elsif (/ODBROAM/) {
                  $hash{ODBROAM} = $tmp[1];
              }
          }
          if ($count == 0) {
              #check MSISDN field
              if (exists $hash{MSISDN}) {
                  $line = $hash{MSISDN};
                  #'say OUTFILE $line;
                  #say OUTFILE "Update Command <".$hash{MSISDN}.">,".$hash{ODBIC}.",".$hash{ODBOC}.",".$hash{ODBROAM}."";
              }
              else {
                  #say OUTFILE "Update Command MSISDN=notSet,".$hash{ODBIC}.",".$hash{ODBOC}."";
                  $line = 'MSISDN=notSet';
              }
              #PROBLEM CODE check CF fields need to accommodate for multiple CF lines
              if (exists $hash{CF}) {
                  $add = $hash{CF};
                  $line .= ",$add";
              }
              #check ODBIC field
              if (exists $hash{ODBIC}) {
                  $add = $hash{ODBIC};
                  $line .= ",$add";
              }
              else {
                  $add = 'ODBICnotSet';
                  $line .= ",$add";
              }
              #check ODBOC field
              if (exists $hash{ODBOC}) {
                  $add = $hash{ODBOC};
                  $line .= ",$add";
              }
              else {
                  $add = 'ODBOC notSet';
                  $line .= ",$add";
              }
              #check ODBROAM field
              if (exists $hash{ODBROAM}) {
                  $add = $hash{ODBROAM};
                  $line .= ",$add";
              }
              else {
                  $add = 'ODBROAM notSet';
                  $line .= ",$add";
              }
              #check hash for ODBOC->BAOC is not set set it to ODBOC->BAOCnotSet
              delete $hash{MSISDN};
              delete $hash{ODBIC};
              delete $hash{ODBOC};
              delete $hash{ODBROAM};
              delete $hash{CF};
              #build up line
              say OUTFILE $line;
          }
      }
      close INFILE;
      close OUTFILE;

        Hello again gbwien,

        I am not 100% sure what the expected output is from your description. I took a guess and put together an example based on the new input and what I think you mean.

        Having said that, I think the only modification that you need to add is:

        I have not updated the concatenation of the output string, since I do not know the actual output that you are looking for.

        In case this is not how the string should behave, please provide a sample of the desired output so I can understand, based on the input, which fields you want to keep and which you want to skip.

        Hope this helps, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Grouped Regular Expression not set assign default value
by Laurent_R (Canon) on Feb 20, 2018 at 18:36 UTC
    This is an adaptation of my program in the other thread to your new data:
    use strict;
    use warnings;

    my ($msisdn, $odbic, $odboc);
    while (<DATA>) {
        if (/<SUBEND/) {
            print "$msisdn,$odbic,$odboc\n";
            $msisdn = "Update Command MSISDN=notSet";
        }
        $msisdn = "Update Command <$1>" if /^\s*MSISDN=(\d+);/;
        $odbic = $1 if /^\s+ODBIC=(\w+);/;
        $odboc = $1 if /^\s+ODBOC=(\w+);/;
    }

    __DATA__
    <SUBBEGIN
        MSISDN=123476789678;
        ODBIC=BIC;
        ODBOC=BAOC;
    <SUBEND
    <SUBBEGIN
        ODBIC=BIC;
        ODBOC=BAOC;
    <SUBEND
    Output:
    Update Command <123476789678>,BIC,BAOC
    Update Command MSISDN=notSet,BIC,BAOC
