PerlMonks  

Re^4: merging a file with a value present in another file

by lakssreedhar (Acolyte)
on Jul 16, 2012 at 07:00 UTC


in reply to Re^3: merging a file with a value present in another file
in thread merging a file with a value present in another file

file1 is

{RP}makaravilYakkin Sabarimala ayyappanu cArZwwAnulYlYa wiruviwAMkUrZ rAjAvAyirunna SrI ciwwirawirunnAlYZ bAlarAmavarZmma natakk vacca  420 kilogrAM wUkkamulYlYa wafkayafki{/RP}{MCL} sUkRikkunnaw I kRewrawwilAN.{/MCL}

file2 is

<Sentence id="1">
1 (( NP
1.1 makaravilYakkin NN <fs af='makaravilYakk,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank">
))
2 (( NP
2.1 Sabarimala NNP <fs af='Sabarimala,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
2.2 ayyappanu NN <fs af='ayyappanu,unkn,,,,,,' poslcat="NM">
))
3 (( VGF
3.1 cArZwwAnulYlYa VM <fs af='cArZww,v,any,any,any,,AnulYlYa,AnulYlYa'>
))
4 (( NP
4.1 wiruviwAMkUrZ QF <fs af='wiruviwAMkUrZ,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
4.2 rAjAvAyirunna NN <fs af='rAjAv,n,m,sg,,o,,yAyirunna' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
))
5 (( NP
5.1 SrI UNK <fs af='SrI,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
5.2 ciwwirawirunnAlYZ NN <fs af='ciwwirawirunnAlYZ,unkn,,,,,,' poslcat="NM">
5.3 bAlarAmavarZmma NNP <fs af='bAlarAmavarZmma,unkn,,,,,,' poslcat="NM">
5.4 natakk NN <fs af='nata,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank">
))
6 (( VGF
6.1 vacca VM <fs af='vaykk,v,any,any,any,,ta,ta' CASE_NAME="nom">
))
7 (( NP
7.1 420 QC <fs af='420,num,,,,,,'>
7.2 kilogrAM NN <fs af='kilogrAM,unkn,,,,,,' poslcat="NM">
))
8 (( NP
8.1 wUkkamulYlYa NN <fs af='wUkkaM,n,any,sg,,d,,yulYlYa' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
8.2 wafkayafki NNP <fs af='wafkayafki,unkn,,,,,,' poslcat="NM">
))
9 (( VGNF
9.1 sUkRikkunnaw VM <fs af='sUkRikk,v,any,any,any,,unnaw,unnaw'>
))
10 (( NP
10.1 I DEM <fs af='I,pn,any,sg,,,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
10.2 kRewrawwilAN NN <fs af='kRewraM,n,any,sg,,d,,yilAN' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
10.3 . SYM <fs af='.,punc,,,,,,' poslcat="NM">
))
</Sentence>

my output file should be

<Sentence id="1">
1 (( NP
1.1 makaravilYakkin NN <fs af='makaravilYakk,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank" clause_start="rp">
))
2 (( NP
2.1 Sabarimala NNP <fs af='Sabarimala,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
2.2 ayyappanu NN <fs af='ayyappanu,unkn,,,,,,' poslcat="NM">
))
3 (( VGF
3.1 cArZwwAnulYlYa VM <fs af='cArZww,v,any,any,any,,AnulYlYa,AnulYlYa'>
))
4 (( NP
4.1 wiruviwAMkUrZ QF <fs af='wiruviwAMkUrZ,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
4.2 rAjAvAyirunna NN <fs af='rAjAv,n,m,sg,,o,,yAyirunna' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
))
5 (( NP
5.1 SrI UNK <fs af='SrI,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
5.2 ciwwirawirunnAlYZ NN <fs af='ciwwirawirunnAlYZ,unkn,,,,,,' poslcat="NM">
5.3 bAlarAmavarZmma NNP <fs af='bAlarAmavarZmma,unkn,,,,,,' poslcat="NM">
5.4 natakk NN <fs af='nata,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank">
))
6 (( VGF
6.1 vacca VM <fs af='vaykk,v,any,any,any,,ta,ta' CASE_NAME="nom">
))
7 (( NP
7.1 420 QC <fs af='420,num,,,,,,'>
7.2 kilogrAM NN <fs af='kilogrAM,unkn,,,,,,' poslcat="NM">
))
8 (( NP
8.1 wUkkamulYlYa NN <fs af='wUkkaM,n,any,sg,,d,,yulYlYa' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
8.2 wafkayafki NNP <fs af='wafkayafki,unkn,,,,,,' poslcat="NM" clause_end="rp">
))
9 (( VGNF
9.1 sUkRikkunnaw VM <fs af='sUkRikk,v,any,any,any,,unnaw,unnaw'>
))
10 (( NP
10.1 I DEM <fs af='I,pn,any,sg,,,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
10.2 kRewrawwilAN NN <fs af='kRewraM,n,any,sg,,d,,yilAN' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
10.3 . SYM <fs af='.,punc,,,,,,' poslcat="NM">
))
</Sentence>

Re^5: merging a file with a value present in another file
by aaron_baugher (Curate) on Jul 16, 2012 at 20:25 UTC

    I'm not sure why the MCL clause doesn't show up in your output sample. But I'd say you have a two-step process, possibly involving two hashes:

    1. Go through file1, parsing out the beginning and end word in each clause, putting them in a %start hash and an %end hash respectively, with the tag (RP, MCL) as the keys' values.
    2. Go through file2, checking the first word of each line to see if it exists in one of these hashes, and if so, add the appropriate tag to the end of the line.

    The rest is just implementation.
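As a rough illustration of the two steps above, here is a minimal sketch. The sub names `clause_boundaries` and `tag_line` are made up for this example. It assumes each token line of file2 starts with an address such as `1.1` followed by the word, and that the tag should be lower-cased (as in the `clause_start="rp"` of the desired output). Because the hashes are keyed on words, a boundary word that occurs more than once in a sentence would be tagged at every occurrence.

```perl
use strict;
use warnings;

# Step 1: map the first and last word of each {TAG}...{/TAG} clause to its tag.
sub clause_boundaries {
    my ($text) = @_;
    my (%start, %end);
    while ($text =~ m/\{(\w+)\}(.*?)\{\/\1\}/gs) {
        my ($tag, $body) = ($1, $2);
        my @words = $body =~ /(\w+)/g;      # punctuation is dropped here
        next unless @words;
        $start{ $words[0] } = lc $tag;      # assumption: tags are written in lower case
        $end{ $words[-1] }  = lc $tag;
    }
    return (\%start, \%end);
}

# Step 2: add clause_start / clause_end to one line of file2.
sub tag_line {
    my ($line, $start, $end) = @_;
    # Assumed token-line layout: "1.1 word POS <fs ...>"; other lines pass through.
    return $line unless $line =~ /^\s*\d+\.\d+\s+(\S+)\s/;
    my $word  = $1;
    my $attrs = '';
    $attrs .= qq{ clause_start="$start->{$word}"} if exists $start->{$word};
    $attrs .= qq{ clause_end="$end->{$word}"}     if exists $end->{$word};
    $line =~ s/>(\s*)$/$attrs>$1/ if $attrs;    # insert before the closing ">"
    return $line;
}

my ($start, $end) = clause_boundaries(
    '{RP}makaravilYakkin Sabarimala wafkayafki{/RP}{MCL} sUkRikkunnaw I kRewrawwilAN.{/MCL}'
);
print tag_line(qq{1.1 makaravilYakkin NN <fs af='makaravilYakk,n,any,sg,,d,,kk' dubi="blank">\n},
               $start, $end);
```

In a real script, file1 would be read into a string for `clause_boundaries`, and every line of file2 would be passed through `tag_line` before being printed to the output file.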

    Aaron B.
    Available for small or large Perl jobs; see my home node.

      I want the MCL tag as well. I am new to Perl; can I use a regular expression to parse the start and end of each clause? I also want the program to run on many such files.

        Yes, you can use a regex to get the words from file1 that you're going to search for and insert into file2. There are many different ways it could be written, but here's one example:

        use strict;
        use warnings;

        my $str = q|{RP}makaravilYakkin Sabarimala ayyappanu cArZwwAnulYlYa wiruviwAMkUrZ rAjAvAyirunna SrI ciwwirawirunnAlYZ bAlarAmavarZmma natakk vacca 420 kilogrAM wUkkamulYlYa wafkayafki{/RP}{MCL} sUkRikkunnaw I kRewrawwilAN.{/MCL}|;

        # Literal braces are escaped so that newer Perl versions accept the pattern.
        while ($str =~ m[
            \{(\w+)\}   # a word within {}, capture it
            \W*         # maybe non-word chars
            (\w+)       # first word after tag, capture it
            [^{]+       # anything but a {, up to...
            \W          # a non-word character
            (\w+)       # last word before tag, capture it
            \W*         # maybe non-word chars
            \{/\1\}     # ending tag matching captured one above
        ]xg) {
            print "$1 $2 $3\n";   # print captured values
        }
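To run this over many files, one option is to wrap the pattern in a sub and loop over the file names given on the command line. The following is only a sketch: the sub names `clauses_in` and `slurp` are invented for this example, and it uses the pattern from the example above with the literal braces escaped.

```perl
use strict;
use warnings;

# Return one [tag, first_word, last_word] triple per clause found in $str.
sub clauses_in {
    my ($str) = @_;
    my @found;
    while ($str =~ m/\{(\w+)\}\W*(\w+)[^{]+\W(\w+)\W*\{\/\1\}/g) {
        push @found, [$1, $2, $3];
    }
    return @found;
}

# Read a whole file into a single string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                       # undefined record separator: read everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Report the clauses of every file named on the command line.
for my $path (@ARGV) {
    for my $clause (clauses_in(slurp($path))) {
        print "$path: @$clause\n";
    }
}
```

It could be called as `perl clauses.pl first.txt second.txt`, where the script and file names are placeholders. Note that the pattern needs at least three words in a clause to match.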

        Aaron B.

        Use XML::Parser to build the hashes mentioned above for the RP and MCL tags from the f1 file. Then go through the f2 file and update its lines by checking them against the hashes. I believe you need to add end_clause when you come across a hash value a second time; to do that, keep the keys you have already read in a separate data structure, so that you have a record of the names seen and know where to add end_clause.
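One possible reading of the "record of read names" idea is a `%seen` hash, so that each boundary word is tagged only the first time it turns up in file2. The sketch below makes several assumptions: the sub name `attrs_once` is invented, the attribute names `clause_start` and `clause_end` are taken from the desired output sample earlier in the thread, and `%start` and `%end` are filled in by hand instead of being parsed from file1.

```perl
use strict;
use warnings;

# Return the attribute text to append for $word. Each start word and each
# end word is tagged only on its first occurrence; %$seen keeps the record.
sub attrs_once {
    my ($word, $start, $end, $seen) = @_;
    my $attrs = '';
    if (exists $start->{$word} && !$seen->{start}{$word}++) {
        $attrs .= qq{ clause_start="$start->{$word}"};
    }
    if (exists $end->{$word} && !$seen->{end}{$word}++) {
        $attrs .= qq{ clause_end="$end->{$word}"};
    }
    return $attrs;
}

# Hand-filled hashes for the demonstration only.
my %start = (makaravilYakkin => 'rp');
my %end   = (wafkayafki      => 'rp');
my %seen;

for my $word (qw(makaravilYakkin wafkayafki makaravilYakkin)) {
    print $word, attrs_once($word, \%start, \%end, \%seen), "\n";
}
```

Here the second `makaravilYakkin` is printed without an attribute. This only guards against tagging a word twice; if an end word also occurs earlier in the sentence, the first occurrence may not be the right one, so a position-based approach might be needed for such data.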
