PerlMonks  

How to tokenize a RTF file and print it to another file

by mdavies23 (Acolyte)
on Jul 06, 2017 at 20:16 UTC ( [id://1194401]=perlquestion )

mdavies23 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to tokenize an RTF file while keeping the font formatting. I cannot figure out how to use RTF::Writer to write to another file. Any help would be appreciated. Here's my code so far:

    #!/usr/local/bin/perl -w
    use strict;
    use Data::Dumper 'Dumper';
    use RTF::Writer;
    use RTF::Tokenizer;
    use RTF::Parser;

    die "usage: $0 input output\n" unless @ARGV == 2;
    my $infile  = shift;
    my $outfile = shift;

    open my $fh, "<", $infile;
    open my $output, ">", $outfile;

    my $tokenizer = RTF::Tokenizer->new();
    $tokenizer->read_file($fh);
    my $writer = RTF::Writer->new_to_file($output);

    my ( $token_type, $argument, $parameter );
    {
        # reduce bogus warnings
        no warnings 'uninitialized';
        # get past the header
        ( $token_type, $argument, $parameter ) = $tokenizer->get_token()
            until ( $token_type eq 'control' and $argument eq 'pard' );
    }

    while ( $token_type ne 'eof' ) {
        ( $token_type, $argument, $parameter ) = $tokenizer->get_token();
        print "$argument " if $token_type eq 'text';
    }

Replies are listed 'Best First'.
Re: How to tokenize a RTF file and print it to another file (Update with solution)
by thanos1983 (Parson) on Jul 06, 2017 at 20:37 UTC

    Hello mdavies23,

    Welcome to the monastery. I have not yet spent time trying to resolve your question, but from a quick look I can recommend a few things.

    First of all, please provide a sample of the input file/data and the output format you expect. With that we can experiment and match your expectations.

    Regarding the code itself: every time you open or close a file handle, check the result with die or warn.

    I do not see your code closing the file handles.

    Provide us all the requested information and we will be more than happy to assist you.

    Update: It seems that your code comes from the question rtf to txt conversion. I combined it with the module RTF::Writer, and the full code is provided below. I used as the input file the <DATA> that I provide below.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use RTF::Writer;
        use Data::Dumper;
        use RTF::Tokenizer;

        die "usage: $0 input output\n" unless @ARGV == 2;
        my $infile  = shift;
        my $outfile = shift;

        my $tokenizer = RTF::Tokenizer->new();
        $tokenizer->read_file($infile);

        my ( $token_type, $argument, $parameter );
        {
            # reduce bogus warnings
            no warnings 'uninitialized';
            # get past the header
            ( $token_type, $argument, $parameter ) = $tokenizer->get_token()
                until ( $token_type eq 'control' and $argument eq 'pard' );
        }

        # collect only the text tokens; control words (the formatting) are dropped
        my @final;
        while ( $token_type ne 'eof' ) {
            ( $token_type, $argument, $parameter ) = $tokenizer->get_token();
            push @final, $argument if $token_type eq 'text';
        }

        # an array reference is written out as one RTF group
        my $rtf = RTF::Writer->new_to_file($outfile);
        $rtf->print(\@final);
        $rtf->close;

        __END__
        {La dame p\'2eLa dameToc toc Il a ferm la porteLes lys du jardin sont fltrisQuel est donc ce mort qu'on emporteTu viens de toquer sa porteEt trotte trotteTrotte la petite souris Guillaume Apollinaire, AlcoolsVocabularytoc (n\'2em\'2e) tap; knocklys (n\'2em\'2e) lilyfltrir (v\'2eitr\'2e) to wilt; for a flower or beauty to fade;for a plant to withermort (adj\'2e, here used as a masc\'2e noun) deademporter (v\'2etr\'2e) to take a person or thing [somewhere];to take[out/away/etc\'2e] or carry[away] a thingtoquer (v\'2eitr\'2e) to tap; to knocktrotter (v\'2eitr\'2e) to trot; to scurrysouris (n\'2ef\'2e) mouse\'46ree TranslationClick click He closed the door / Garden lilies faded / Which body is today// You just tapped on the door / And tip toe / Taps the little mouse Translation Sean M\'2e Burke, 2001}

    Of course, if you want formatted output, you will have to spend some time with the functions described in RTF::Writer/FUNCTIONS.

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Hi thanos1983,

      while I usually close my files when I am done with them (especially files in write or append mode), if only so that I can look at their content while the program is still running and doing other things (it is also useful when they need to be copied or have their mode/permissions changed), Perl closes a file implicitly when you reopen the same handle, when the handle goes out of lexical scope, or at the latest when the program completes.

      So, IMHO, closing files is indeed good practice, but not doing it explicitly is unlikely to be the OP's problem.
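      The scope-based implicit close described above can be sketched as follows. This is only an illustration: the helper name write_in_scope and the temporary file are made up for the example and are not part of the OP's program.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write some text inside a bare block. The lexical handle $out is closed
# implicitly when it goes out of scope at the closing brace.
sub write_in_scope {
    my ($path, $text) = @_;
    {
        open my $out, '>', $path or die "Cannot open $path: $!";
        print {$out} $text;
    }   # $out is destroyed here, which flushes and closes the file
    return (-s $path) || 0;   # size on disk after the implicit close
}

my (undef, $path) = tempfile(UNLINK => 1);
my $size = write_in_scope($path, "40\n");
print "bytes on disk after leaving the block: $size\n";
```

      One caveat: an implicit close cannot report a failure (for example a full disk), which is one reason an explicit, checked close is still considered good practice for files opened for writing.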

        Hello Laurent_R

        I do agree with you that you might want to access the data again and again, but if that is the case, why not load the data into an array and access the array whenever you want instead of keeping the file handle open?

        Sample of loading data into an array from file:

            open my $handle, '<', $path_to_file
                or die "Not able to open file: $!";
            chomp(my @lines = <$handle>);
            close $handle
                or warn "Not able to close file: $!";

        Unless you mean writing to a file. But even in that case I would still prefer to keep the data in an array, alter it as much as I want, and then, when I am ready, write it out with a foreach loop or a join.

        Sample, pseudo code:

            open my $fh, '>', "output.txt"
                or die "Cannot open output.txt: $!";
            foreach (@lines) {
                print $fh "$_\n";
            }
            close $fh
                or warn "Not able to close output.txt: $!";

        An alternative way with join:

        print $fh join ("\n", @lines);

        I am not implying that I am correct; I am just trying to learn more. :D

        Let me know your thoughts, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!

      Right now I only have the number "40" in the RTF file. The program takes the input file and then the output file as parameters. I will be sure to add code to close my file handles. Thanks.

        Hello again mdavies23,

        I recommend closing file handles because the output of your text might still be in the buffer. Read here why it is important to use close.
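        A minimal sketch of the buffering effect, assuming default (non-autoflushed) output buffering. The helper name sizes_around_close and the temporary file are invented for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Report the size of a file just before and just after its write handle
# is closed.
sub sizes_around_close {
    my ($path, $text) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!";
    print {$out} $text;
    my $before = (-s $path) || 0;   # often 0: the text may still sit in the buffer
    close $out or die "Cannot close $path: $!";
    my $after = (-s $path) || 0;    # close has flushed the buffer to disk
    return ($before, $after);
}

my (undef, $tmp) = tempfile(UNLINK => 1);
my ($before, $after) = sizes_around_close($tmp, "40\n");
print "before close: $before bytes, after close: $after bytes\n";
```

        With a short string like this, the size before close is typically reported as 0, because the data has not yet left Perl's output buffer.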

        Maybe your code works straight out of the box once you close the file handle; I do not know yet. I will create an RTF document and start playing with it.

        We are here to assist and guide. I hope you are not taking my suggestions as an insult. We were all beginners once, and we still are in different areas. Just keep reading as much of the documentation provided here or on the web as you can, and you will learn really fast.

        Update: A sample RTF file is provided below, in case someone wants to play with it. The source is The_RTF_Cookbook/SAMPLE COMPLETE RTF DOCUMENT.

        {\rtf1\ansi\deff0
        {\fonttbl {\f0 Times New Roman;} }
        \deflang1033\widowctrl
        {\header\pard\qr\plain\f0{\noproof\i La dame} p.\chpgn\par}
        \lang1036\fs36
        {\pard\qc\f1\fs60\b\i La dame\par}
        {\pard\sb300\li900 Toc toc Il a ferm\'e9 la porte\line
        Les lys du jardin sont fl\'e9tris\line
        Quel est donc ce mort qu'on emporte \par}
        {\pard\sb300\li900 Tu viens de toquer \'e0 sa porte\line
        Et trotte trotte\line
        Trotte la petite souris \par}
        {\pard\sb900\li900\scaps\fs44 \endash Guillaume Apollinaire, {\i Alcools}\par}
        \page\lang1033\fs32
        {\pard\b\ul\fs40 Vocabulary\par}
        {\pard\li300\fi-150{\noproof\b toc }{\i(n.m.)} \endash tap; knock\par}
        {\pard\li300\fi-150{\noproof\b lys }{\i(n.m.)} \endash lily\par}
        {\pard\li300\fi-150{\noproof\b fl\'e9trir } {\i(v.itr.)} \endash to wilt; for a flower or beauty to fade; for a plant to wither\par}
        {\pard\li300\fi-150{\noproof\b mort } {\i(adj., here used as a masc. noun)} \endash dead\par}
        {\pard\li300\fi-150{\noproof\b emporter } {\i(v.tr.)} \endash to take a person or thing [somewhere]; to take\~[out/away/etc.] or carry\~[away] a thing\par}
        {\pard\li300\fi-150{\noproof\b toquer } {\i(v.itr.)} \endash to tap; to knock\par}
        {\pard\li300\fi-150{\noproof\b trotter } {\i(v.itr.)} \endash to trot; to scurry\par}
        {\pard\li300\fi-150{\noproof\b souris }{\i(n.f.)} \endash mouse\par}
        {\pard\sb200\b\ul\fs40 Free Translation\par}
        {\pard Click click He closed the door / Garden lilies faded / Which body is today // You just tapped on the door / And tip toe / Taps the little mouse \line
        \_Translation Sean M. Burke, 2001 \par}
        }

        Hope this helps, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!

      What functions would I use to get the formatting correct? I tried rtfesc, but it didn't really do anything.

        Hello mdavies23,

        Take a look at the RTF::Writer/METHODS.

        For example, from the documentation:

            $h->paragraph(...);
                This makes the items in the list (...) into a paragraph. Basically just a wrapper for
                $h->print([ \'{\par', ..., \'\pard}', ])
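
        As a rough sketch of how paragraph can carry formatting, the example below builds a tiny document in memory with new_to_string. It assumes RTF::Writer is installed. The helper name build_rtf, the title, and the sample words are invented for illustration. The idea is that a scalar reference is passed through as raw RTF code, while plain strings are escaped as text.

```perl
use strict;
use warnings;
use RTF::Writer;

# Build a small RTF document in a string. new_to_file($filename) is used
# the same way when the output should go to disk.
sub build_rtf {
    my @words = @_;
    my $buffer = '';
    my $rtf = RTF::Writer->new_to_string(\$buffer);
    $rtf->prolog( 'title' => 'Formatted tokens' );
    # \'...' is a reference to a string: emitted as raw RTF control words
    # (here 20pt, bold, italic). The joined words are emitted as escaped text.
    $rtf->paragraph( \'\fs40\b\i', join(' ', @words) );
    $rtf->paragraph( 'A plain paragraph after the heading' );
    $rtf->close;
    return $buffer;
}

print build_rtf('La', 'dame');
```

        To keep the source document's formatting, the control tokens returned by the tokenizer would have to be passed along in a similar way rather than being skipped; printing only the 'text' tokens discards them.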

        Hope this helps, BR

        Seeking for Perl wisdom...on the process of learning...not there...yet!
