Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Need your help in a pattern based text manipulation algorithm

by kaushik9918 (Sexton)
on May 16, 2020 at 15:25 UTC ( #11116842=perlquestion: print w/replies, xml ) Need Help??

kaushik9918 has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I am not an expert in Perl, hence requesting your help in this one. I have a user defined input which I need to parse and based on occurence of a pattern , need to replace certain lines on some text files based on the input file. So, for better understanding, I have got some text files , like a_copy.txt , a_paste.txt, a_cut.txt,b_copy.txt , b_paste.txt, b_cut.txt,c_copy.txt , c_paste.txt, c_cut.txt,d_copy.txt , d_paste.txt, d_cut.txt . Basically, my user input file looks like this:

NAME|VALUE = a TASK|VALUE = copy CAPS|VALUE = 0 PKG_TYPE|VALUE = premium NAME|VALUE = z TASK|VALUE = cut CAPS|VALUE = 0 PKG_TYPE|VALUE = premium NAME|VALUE = c TASK|VALUE = paste STACK|VALUE = 2 SHIP|VALUE = lowtier

Now , based on the input file, I first have to parse and if "NAME|VALUE = a" and the corresponding "TASK|VALUE = copy" , it should open a_copy.txt (if a_copy.txt doesn't exist, throw an error and exit, which is easy). Inside a_copy.txt, it has to check if the following lines with starting keywords like the input , i.e,

CAPS|VALUE = PKG_TYPE|VALUE =

exist and replace them in the a_copy.txt file with the input file values of "CAPS|VALUE = 0" and "PKG_TYPE|VALUE = premium". Any empty line in the input file should be ignored.

It should do this for all the text files corresponding to all the "NAME|VALUE = ??" and the corresponding "TASK|VALUE = ??" and replace all lines in the correct text file based on the input file values

.

I have managed to write a code which parses the input file for any format differences etc and report out, but I am stuck in writing the subroutine to do the replace the input file lines in the text files. My code is (although incomplete) is :

use Tk; use IO::Handle; #################### Check for correct usage of script################ +############ $mw = new MainWindow; my $num_args = $#ARGV +1 ; if ($num_args != 0) { $mw -> messageBox(-message=>"\nUsage: perl replay_file_changes.pl\n +"); exit; } ###################################################################### +############ ###################### User input #################################### +############################# print "\nPLEASE SPECIFY INPUT FILE\n"; $ip_file = <>; chomp $ip_file ; #check validity of user input if (-e $ip_file){ } else{ $mw -> messageBox(-message=> "\n$ip_file IS NOT FOUND\n"); exit; } ###################################################################### +############################ ###################### Real deal ##################################### +############################ open(INPUT_FILE, "<$ip_file") || die "\n!!!ERROR OPENING INPUT FILE. +EXITING SCRIPT!!!\n"; while(<INPUT_FILE>){ if ($_=~ /(.*) =\n/ ){ $mw -> messageBox(-message=> "\nFormat not + correct on line $. of input file. Exiting script\n"); exit; } elsif ($_=~ /(.*) =\s+\n/ ){ $mw -> messageBox(-message=> "\nFormat not + correct on line $. of input file. Exiting script\n"); exit; } elsif ($_=~ /(.*) = \s+(.*)/ ){ $mw -> messageBox(-message=> "\nFormat not + correct on line $. of input file. Exiting script\n"); exit; } elsif ($_=~ /^NAME\|VALUE = (.*)/ ){ &check(); } } sub check { #print "\n$cell_name\n"; my $name= $1; chomp $name; while(<INPUT_FILE>){ if($_=~ /^TASK\|VALUE = (.*)/){ $task_value= $1; } elsif($_=~ /^(.*) = (.*)/){ $line=$_; } elsif($_=~ /^NAME\|VALUE = (.*)/){ exit; } print "\n$name $task_value $line\n"; } }

And my output is:

a copy a copy CAPS|VALUE = 0 a copy PKG_TYPE|VALUE = premium a copy PKG_TYPE|VALUE = premium a copy NAME|VALUE = z a cut NAME|VALUE = z a cut CAPS|VALUE = 0 a cut PKG_TYPE|VALUE = premium a cut PKG_TYPE|VALUE = premium a cut NAME|VALUE = c a paste NAME|VALUE = c a paste STACK|VALUE = 2 a paste SHIP|VALUE = lowtier

Can someone please help me with subroutine to do the job here? PLease dont hesitate to ask if you got any further questions on this, Regards

Replies are listed 'Best First'.
Re: Need your help in a pattern based text manipulation algorithm
by AnomalousMonk (Bishop) on May 16, 2020 at 17:42 UTC

    Without bothering with the GUI framework, here's a possible general approach to handling the VALUEs file:

    c:\@Work\Perl\monks>perl use strict; use warnings; use autodie; use Data::Dump qw(dd); my @tags = qw(NAME TASK CAPS PKG_TYPE STACK SHIP); my $rx_value_intro = qr{ \s* [|] \s* VALUE \s+ = \s+ }xms; my $rx_value = qr{ [[:alnum:]]+ }xms; my ($rx_tag) = map qr{ \b (?: $_) \b }xms, join ' | ', map quotemeta, reverse sort @tags ; print "rx_tag $rx_tag \n"; # for debug my $data_file = <<'EOF'; NAME|VALUE = a TASK|VALUE = copy CAPS|VALUE = 0 PKG_TYPE|VALUE = premium NAME|VALUE = z TASK|VALUE = cut CAPS|VALUE = 0 PKG_TYPE|VALUE = premium NAME|VALUE = c TASK|VALUE = paste STACK|VALUE = 2 SHIP|VALUE = lowtier EOF open my $fh_input, '<', \$data_file; local $/ = ''; while (my $record = <$fh_input>) { my $got_params = my %params = $record =~ m{ \G \s* ($rx_tag) $rx_value_intro ($rx_value) \s+ }xmsg; die "bad params record '$record'" unless $got_params; # dd \%params; # for debug process_params(%params); } sub process_params { my (%params, ) = @_; print "processing params: @{[ %params ]} \n"; } __END__ rx_tag (?msx-i: \b (?: TASK | STACK | SHIP | PKG_TYPE | NAME | CAPS) \ +b ) processing params: NAME a PKG_TYPE premium TASK copy CAPS 0 processing params: NAME z PKG_TYPE premium TASK cut CAPS 0 processing params: STACK 2 NAME c SHIP lowtier TASK paste
    See haukex's article Building Regex Alternations Dynamically. Of course, much testing of the code is needed.


    Give a man a fish:  <%-{-{-{-<

Re: Need your help in a pattern based text manipulation algorithm
by AnomalousMonk (Bishop) on May 17, 2020 at 02:04 UTC

    Here's a more developed version of the code here. The subroutine  process_params() is no longer a stub. This code is only very minimally tested. With a params.dat input file that is

    NAME|VALUE = a TASK|VALUE = copy CAPS|VALUE = 0 PKG_TYPE|VALUE = premium NAME|VALUE = c TASK|VALUE = paste STACK|VALUE = 2 SHIP|VALUE = lowtier NAME|VALUE = z TASK|VALUE = cut CAPS|VALUE = 0 PKG_TYPE|VALUE = premium
    (note blank line at end as a test) and a a_copy.txt file that is initially
    the rain CAPS|VALUE = in spain falls PKG_TYPE|VALUE = mainly
    the converted a_copy.txt file
    the rain CAPS|VALUE = 0 in spain falls PKG_TYPE|VALUE = premium mainly
    is produced. A file c_paste.txt also exists and seems to be handled correctly, but no z_cut.txt file exists to test throwing an exception in this case (which happens).

    Latest full script code is

    Note that Perl version 5.10+ is required because a regex extended pattern is used. I'm sorry that this post is a bit rushed; please feel free to ask any questions.

    One thing that would be helpful if you have any further questions is the provision of an example input/parameter file and at least one corresponding short before/after pair of files to be processed.


    Give a man a fish:  <%-{-{-{-<

      Hello ,

      Thanks a lot for your time. To add to my post, my a_copy.txt is something like this:

      sjdsajdsd kjfksls'fsaf sandlksadnksalndop djsndsan;sa sdsakdslkdsa sakdsakdsa ..................... ................. NAME|VALUE = a ksadksad dewidewjrw djsdnsalkd dskdsa;dkjsa .................... ............................. TASK|VALUE = copy dsakdmsald;sadsa dsalkdsldk'dls ................................ dskmdsldsdsda CAPS|VALUE = 34 dksdmlsajdsajdsa .............................. dsdksld;sads PKG_TYPE|VALUE = minimal dsmds.dsa.d lsdksadjsldjsdjpos ........................... ............................ .................................. jndlkjsandlksandsndsa jdnsakjdnkjs;dsa kdjslkdjsa;lkds STACK|VALUE = 4 dfdskjff;sdf ............................. ........................ SHIP|VALUE = junk kdskdsakjdl'sa jdsajd;lksahdsakda .......................... dsjads .........................

      And input file is user based, so its not confined to only the content you see in my example. It can have multiple values under "NAME|VALUE =" and "TASK|VALUE =" . I have to read these two keywords from the input file, and open the correct (NAME|VALUE)_(TASK|VALUE).txt. In my example, the first one happens to be "a_copy.txt". Second one happens to be "z_cut.test" and third one is "c_paste.txt". There can be many more like this based on the user input. So, I have to grep for the lines that follow "NAME|VALUE = a" and "TASK|VALUE = copy" in the input file, and replace those lines in the .txt file For example, in this case, a_copy.txt, I have to replace the lines "CAPS|VALUE = 34" with "CAPS|VALUE = 0" and "PKG_TYPE|VALUE = minimal" with "PKG_TYPE|VALUE = premium". In the c_paste.txt I have to replace "STACK|VALUE = 65" with "STACK|VALUE = 2" and "SHIP|VALUE = medium" with "SHIP|VALUE = lowtier" and so on. And you are right, since z_cut.txt file doesnt exist, the code has to throw an error in the end giving a list of all the non existent files and exit. The exit should happen only after the code has successfully run on the existing files. For example, post processing , a_copy.txt should look like:

      sjdsajdsd kjfksls'fsaf sandlksadnksalndop djsndsan;sa sdsakdslkdsa sakdsakdsa ..................... ................. NAME|VALUE = a ksadksad dewidewjrw djsdnsalkd dskdsa;dkjsa .................... ............................. TASK|VALUE = copy dsakdmsald;sadsa dsalkdsldk'dls ................................ dskmdsldsdsda CAPS|VALUE = 0 dksdmlsajdsajdsa .............................. dsdksld;sads PKG_TYPE|VALUE = premium dsmds.dsa.d lsdksadjsldjsdjpos ........................... ............................ .................................. jndlkjsandlksandsndsa jdnsakjdnkjs;dsa kdjslkdjsa;lkds STACK|VALUE = 4 dfdskjff;sdf ............................. ........................ SHIP|VALUE = junk kdskdsakjdl'sa jdsajd;lksahdsakda .......................... dsjads .........................

        It's a few days since I posted and I don't know if you're still looking for support, but I thought I'd follow up and close the loop. I'd like to mention that the example input/output file pair posted here seems to contain much irrelevant data and thus to be overlong. The goal in composing such example files is to include all that is necessary (including, in this case, a bit of general text that is not subject to alteration) and little else. Please see Short, Self-Contained, Correct Example.

        In any event, here's a version of the full script previously posted, updated to reflect my current understanding of your requirements based upon this. As before, it is only minimally tested and I have made no attempt to provide a GUI wrapper. And if you're using a version of Perl prior to 5.10, a fix to the regex that uses the  \K operator can easily be made.

        PerlMonks exists to provide support and assistance to Perl users at all levels of expertise and is not, in general, a free, on-line code writing service. You say that you are a Perl novice, so I have provided a fair amount of code that, in other circumstances, I would have expected you to have contributed to substantially. I have no hesitation about providing help to you, but in future if you have any qustions, please provide the code with which you are working (or at least a reference to it), and please try to provide short, pertinent example files for development and testing.


        Give a man a fish:  <%-{-{-{-<

Re: Need your help in a pattern based text manipulation algorithm
by clueless newbie (Deacon) on May 17, 2020 at 20:14 UTC

    I hate it when requirements are eked out on "a need to know" basis! So no commentary will be provided for the following:

    #!/usr/bin/env perl use Data::Dumper; use English('-no_match_vars'); use Getopt::Long::Descriptive; use Params::Validate(':all'); use Readonly; use Text::Diff; use Try::Tiny; use 5.01800; use warnings; Readonly my $VALUE_OF=>qr{\| *VALUE *= *}; Readonly my $HASHREF=>{ type=>HASHREF }; use Getopt::Long::Descriptive; my ($opts,$usage)=describe_options( "$0 %o <some-arg>", ['input|i=s','path to input file'], [], ['verbose|v','print extra stuff'], ['help',"print usage message and exit", { shortcircuit => 1 } ], ); print($usage->text), exit if ($opts->help || !$opts->input); warn Data::Dumper->Dump([\$opts],[qw(opts)]),' ' if ($opts->verbose); my @failed; open my $INPUT,'<',$opts->input or die qq{Could NOT open '$opts->input' for reading! $OS_ERROR}; local $INPUT_RECORD_SEPARATOR=qq{\n\n}; while (<$INPUT>) { chomp; warn Data::Dumper->Dump([\$ARG],[qw(*_)]),' ' if ($opts->verbose); my $_href; =strike: Example of input NAME|VALUE = a TASK|VALUE = copy CAPS|VALUE = 0 PKG_TYPE|VALUE = premium =cut ()=m{^(?<name>.+?)$VALUE_OF(?<value>\S*) *$(?{$_href->{uc $+{name} +}=$+{value}})}gsm; push @failed,treat($_href); warn "\n\n"; }; close $INPUT or die qq{Could NOT close '$opts->input' after reading! $OS_ERROR} +; # Dump warn Data::Dumper->Dump([\@failed],[qw(*failed)]),' '; exit; sub treat { my ($_HREF)=Params::Validate::validate_pos(@_,$HASHREF ); warn Data::Dumper->Dump([\$_HREF],[qw(*_HREF)]),' ' if ($opts->verbose); my @return; # If NAME and TASK are NOT to be replaced use #my $file="@{[delete $_HREF->{NAME}]}_@{[delete $_HREF->{TASK} +]}.txt"; # otherwise use my $file="$_HREF->{NAME}_$_HREF->{TASK}.txt"; # my $keys=join q{|},keys %{$_HREF}; my $_re=qr{(?<name>$keys)$VALUE_OF(?<value>\S*)}i; warn Data::Dumper->Dump([\$_re],[qw(*_re)]),' ' if ($opts->verbose); try { my $before; open my $INPUT,'<',"input/$file" or die qq{Could NOT open 'input/$file' for reading! $O +S_ERROR}; local $INPUT_RECORD_SEPARATOR; $before=<$INPUT>; close $INPUT or die qq{Could NOT close 'input/$file' after reading! + $OS_ERROR}; my $after=$before=~ s{$_re(?{ warn Data::Dumper->Dump([\%+ +],[qw(*+)]),' 'if $opts->verbose; })}{$+{name}|VALUE = $_HREF->{uc $+ +{name}}}gimrs; #my $after=$before=~ s{$_re}{$+{name}|VALUE = $_HREF->{uc +$+{name}}}gimrs; warn Data::Dumper->Dump([\diff(\$before,\$after)],[qw(*dif +ference)]),' ' if ($opts->verbose); open my $OUTPUT,'>',"output/$file" or die qq{Could NOT open 'output/$file' for writing! $ +OS_ERROR}; print {$OUTPUT} $after; close $OUTPUT or die qq{Could NOT close 'output/$file' after writing +! $OS_ERROR}; # success @return=(); } catch { # Record failure @return=("input/$file\t$OS_ERROR"); Carp::cluck $ARG if ($opts->verbose); }; return @return; }; # treat
    ...>perl 11116842_02.t --input 11116842.dat --verbose $opts = \bless( { 'verbose' => 1, 'input' => '11116842.dat' }, 'Getopt::Long::Descriptive::Opts::__OPT__::1' ); at 11116842_02.t line 27. $_ = \'NAME|VALUE = a TASK|VALUE = copy CAPS|VALUE = 0 PKG_TYPE|VALUE = premium'; at 11116842_02.t line 36, <$INPUT> chunk 1. $_HREF = \{ 'TASK' => 'copy', 'CAPS' => '0', 'PKG_TYPE' => 'premium', 'NAME' => 'a' }; at 11116842_02.t line 58, <$INPUT> chunk 1. $_re = \qr/(?<name>TASK|CAPS|PKG_TYPE|NAME)(?^u:\| *VALUE *= *)(?<valu +e>\S*)/ui; at 11116842_02.t line 69, <$INPUT> chunk 1. %+ = ( 'name' => 'NAME', 'value' => 'a' ); at 11116842_02.t line 80. %+ = ( 'name' => 'TASK', 'value' => 'copy' ); at 11116842_02.t line 80. %+ = ( 'name' => 'CAPS', 'value' => '34' ); at 11116842_02.t line 80. %+ = ( 'name' => 'PKG_TYPE', 'value' => 'minimal' ); at 11116842_02.t line 80. $difference = \'@@ -31,12 +31,12 @@ dskmdsldsdsda -CAPS|VALUE = 34 +CAPS|VALUE = 0 dksdmlsajdsajdsa .............................. dsdksld;sads -PKG_TYPE|VALUE = minimal +PKG_TYPE|VALUE = premium dsmds.dsa.d lsdksadjsldjsdjpos ........................... '; at 11116842_02.t line 83. $_ = \'NAME|VALUE = z TASK|VALUE = cut CAPS|VALUE = 0 PKG_TYPE|VALUE = premium'; at 11116842_02.t line 36, <$INPUT> chunk 2. $_HREF = \{ 'PKG_TYPE' => 'premium', 'CAPS' => '0', 'NAME' => 'z', 'TASK' => 'cut' }; at 11116842_02.t line 58, <$INPUT> chunk 2. $_re = \qr/(?<name>PKG_TYPE|CAPS|NAME|TASK)(?^u:\| *VALUE *= *)(?<valu +e>\S*)/ui; at 11116842_02.t line 69, <$INPUT> chunk 2. Could NOT open 'input/z_cut.txt' for reading! No such file or director +y at 11116842_02.t line 73, <$INPUT> chunk 2. at 11116842_02.t line 97. main::catch {...} ("Could NOT open 'input/z_cut.txt' for readi +ng! No such file or"...) called at C:/berrybrew/5.18.4_32/perl/site/l +ib/Try/Tiny.pm line 123 Try::Tiny::try(CODE(0x2e8f57c), Try::Tiny::Catch=REF(0x2d57034 +)) called at 11116842_02.t line 99 main::treat(HASH(0x2e8c8fc)) called at 11116842_02.t line 47 $_ = \'NAME|VALUE = c TASK|VALUE = paste STACK|VALUE = 2 SHIP|VALUE = lowtier'; at 11116842_02.t line 36, <$INPUT> chunk 3. $_HREF = \{ 'NAME' => 'c', 'STACK' => '2', 'TASK' => 'paste', 'SHIP' => 'lowtier' }; at 11116842_02.t line 58, <$INPUT> chunk 3. $_re = \qr/(?<name>NAME|STACK|TASK|SHIP)(?^u:\| *VALUE *= *)(?<value>\ +S*)/ui; at 11116842_02.t line 69, <$INPUT> chunk 3. Could NOT open 'input/c_paste.txt' for reading! No such file or direct +ory at 11116842_02.t line 73, <$INPUT> chunk 3. at 11116842_02.t line 97. main::catch {...} ("Could NOT open 'input/c_paste.txt' for rea +ding! No such file "...) called at C:/berrybrew/5.18.4_32/perl/site/l +ib/Try/Tiny.pm line 123 Try::Tiny::try(CODE(0x2e8cbec), Try::Tiny::Catch=REF(0x2e8cbdc +)) called at 11116842_02.t line 99 main::treat(HASH(0x2e8f57c)) called at 11116842_02.t line 47 @failed = ( 'input/z_cut.txt No such file or directory', 'input/c_paste.txt No such file or directory' ); at 11116842_02.t line 53.

      Hello

      Thanks for your valuable time. Maybe I came across wrong. My bad. All I wanted to be is as clear as possible. Also, my only problem was my subroutine was not exiting at the correct line where I wanted it to. Perhaps, I messed up the whole thing by describing too much. Anyway, I will figure it out using some of your wisdom you have shared with me.

      Thanks a lot.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11116842]
Approved by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (4)
As of 2020-10-23 06:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My favourite web site is:












    Results (235 votes). Check out past polls.

    Notices?