PerlMonks  

Thank you for showing your code. To begin, several recommendations:

  • You have several unused variables that don't seem relevant to this example, such as $filename or $workingpath. For an SSCCE (Short, Self-Contained, Correct Example), it's better to remove them.
  • The variable name $InputXmlFile is kind of confusing, since it's a .tex file.
  • It's best to use proper indentation and formatting. perltidy can help with that.
  • Regarding your open, see "open" Best Practices: open my $fh, '<', $filename or die "$filename: $!";
  • Reading an entire file at once is better done with the following "slurp" idiom instead of the join: my $inText = do { local $/; <INTXT> };
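As a minimal sketch of the last two recommendations combined: the helper name slurp_file is hypothetical and not from the original script, and it uses a lexical filehandle instead of the bareword INTXT.

```perl
use warnings;
use strict;

# Hypothetical helper (name is an assumption): returns the whole file as one string.
sub slurp_file {
    my ($filename) = @_;
    # Three-argument open with a lexical filehandle and an error check.
    open my $fh, '<', $filename or die "$filename: $!";
    # Localizing $/ leaves it undefined inside the do block,
    # so a single read returns the entire file.
    my $text = do { local $/; <$fh> };
    close $fh;
    return $text;
}
```

Because $/ is localized, the change to the input record separator is undone as soon as the do block ends, so other reads in the script are unaffected.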

Now moving on to the question of how to search and replace text. First of all, reading from input files and writing to output files is described in "Files and I/O" in perlintro - I would recommend reading the whole document, since it's not very long and it gives a good overview of Perl.
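For orientation, reading one file and writing another could look roughly like the sketch below. The helper name filter_file and the callback convention are assumptions made for this illustration, not something taken from perlintro or from your script.

```perl
use warnings;
use strict;

# Hypothetical helper: copies $infile to $outfile, passing each line
# through the $edit callback, which modifies its argument in place.
sub filter_file {
    my ($infile, $outfile, $edit) = @_;
    open my $in,  '<', $infile  or die "$infile: $!";
    open my $out, '>', $outfile or die "$outfile: $!";
    while ( my $line = <$in> ) {
        $edit->($line);          # $_[0] inside the callback is an alias for $line
        print {$out} $line;
    }
    close $in;
    close $out or die "$outfile: $!";
    return;
}

# Example call (file names are placeholders):
# filter_file( 'input.tex', 'output.tex', sub { $_[0] =~ s/foo/bar/g } );
```

Checking the return value of close on the output handle is worthwhile, since write errors such as a full disk may only show up at that point.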

As for the algorithm, there are several general approaches I'd consider:

  1. You should look for modules that are able to parse and write TeX/LaTeX.
  2. You could read the entire file into memory and then use regular expressions to process it, the disadvantage being that it won't work well for large files.
  3. You could read the file line-by-line while keeping track of the current state (whether \section{...\qr{...}} has been seen, and the text that still needs to be added back in), i.e. a "state machine" type approach. While very powerful and flexible, it can sometimes be a bit more verbose.
  4. Another possibility would be to read the file in "chunks" - in this case, your example input appears to have a blank line between each line you want to process. If that really is the case, you could use $/ to read the file in "paragraph mode" (sections are separated by one or more blank lines).
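To illustrate the third approach on its own, here is a rough line-by-line sketch. The sub name move_qr and the use of passed-in filehandles are assumptions for this example, and it guesses that the \qr{...} should go after the second word of the next line that has at least two words.

```perl
use warnings;
use strict;

# Hypothetical helper: reads lines from $in and prints processed lines to $out.
sub move_qr {
    my ($in, $out) = @_;
    my $pending;    # state: \qr{...} removed from a heading, not yet re-inserted
    while ( my $line = <$in> ) {
        if ( $line =~ s/\\section\{.*?\K(\\qr\{.*?\})(?=\})// ) {
            # Heading seen: remember the \qr{...} that was just cut out.
            $pending .= $1;
        }
        elsif ( defined $pending
            && $line =~ s/^\S+\s+\S+\K/$pending/ )
        {
            # Re-inserted after the second word, so clear the state.
            undef $pending;
        }
        print {$out} $line;
    }
    # Don't silently drop text if no suitable line followed the last heading.
    print {$out} $pending if defined $pending;
    return;
}
```

Unlike paragraph mode, this version does not depend on blank lines between the heading and the paragraph, at the cost of carrying the $pending state across loop iterations.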

In the following, I'm using a combination of the third and fourth points, but note that this code makes quite a few assumptions about your input file format, of which you haven't shown much. In your real script, you'd have to replace DATA with the filehandle you opened.

use warnings;
use strict;

local $/ = "";    # paragraph mode: records are separated by blank lines
my $buffer;
while (<DATA>) {
    if ( s/\\section\{.*?\K(\\qr\{.*?\})(?=\})//i ) {
        $buffer .= $1;
    }
    elsif (defined $buffer) {
        s/^\S+\s+\S+\K/$buffer/;
        undef $buffer;
    }
    print;
}
print $buffer if defined $buffer;
__DATA__
\section{Results\qr{text ... text}}

Normal paragraph text here...

\section{Funding\qr{text ... text}}

Funding text here...

Output:

\section{Results}

Normal paragraph\qr{text ... text} text here...

\section{Funding}

Funding text\qr{text ... text} here...

I'm really not sure how you choose the insertion point for the \qr{} - in the original question, it seems you wanted it inserted after the fourth word in the paragraph, while in this example, it looks like you wanted it inserted after the second word.
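Since the insertion point is unclear, one option is to make the word count a parameter. The following is only a sketch: the helper name insert_after_word is hypothetical, and it assumes "word" simply means a run of non-whitespace characters.

```perl
use warnings;
use strict;

# Hypothetical helper: returns $text with $insert placed directly after
# its $n-th word ($n >= 1). If $text has fewer than $n words, it is
# returned unchanged.
sub insert_after_word {
    my ($text, $insert, $n) = @_;
    my $gaps = $n - 1;    # number of whitespace gaps between the first $n words
    # Match $n words from the start, then \K so only $insert is added.
    $text =~ s/^\s*\S+(?:\s+\S+){$gaps}\K/$insert/;
    return $text;
}
```

With this, insert_after_word($paragraph, $buffer, 2) would behave like the substitution in the code above, and passing 4 instead would match the original question.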


In reply to Re^5: String manipulation in a document by haukex
in thread Grep a particular text by ponni
