http://www.perlmonks.org?node_id=11102692



Thank you for showing your code. To begin, several recommendations:

Now moving on to the question of how to search and replace text. First of all, reading from input files and writing to output files is described in "Files and I/O" in perlintro - I would recommend reading the whole document, since it's not very long and it gives a good overview of Perl.
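As a minimal sketch of that read/write pattern (the sub name copy_lines and the filenames in the comment are made up for illustration, they are not from your script), opening one file for reading and another for writing could look something like this:

```perl
use warnings;
use strict;

# Copy $infile to $outfile line by line. The loop body is the place
# where any per-line processing would go.
sub copy_lines {
    my ($infile, $outfile) = @_;
    open my $in,  '<', $infile  or die "Can't open $infile: $!";
    open my $out, '>', $outfile or die "Can't open $outfile: $!";
    while ( my $line = <$in> ) {
        # ... modify $line here ...
        print {$out} $line;
    }
    close $in;
    close $out or die "Can't close $outfile: $!";
    return;
}

# Example call with hypothetical filenames:
# copy_lines('input.tex', 'output.tex');
```

Using lexical filehandles and the three-argument form of open, plus checking open for errors, is the usual recommendation.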

As for the algorithm, there are several general approaches I'd consider:

  1. You should look for modules that are able to parse and write TeX/LaTeX.
  2. You could read the entire file into memory and then use regular expressions to process it, the disadvantage being that it won't work well for large files.
  3. You could read the file line-by-line, keeping track of the current state (e.g. whether \section{...\qr{...}} has been seen, and storing the text to be added back in later), which is a "state machine" type approach. While very powerful and flexible, it can sometimes be a bit more verbose.
  4. Another possibility would be to read the file in "chunks" - in this case, your example input appears to have a blank line between each line you want to process. If that really is the case, you could set $/ to the empty string to read the file in "paragraph mode" (records are separated by one or more blank lines).
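To illustrate the second approach (whole file in memory plus a regular expression), here is a rough sketch. The sub name move_qr_slurp is made up, and the code assumes that neither \section{} nor \qr{} contains nested braces and that the \qr{} should go after the second word of the following text, which may not hold for your real input:

```perl
use warnings;
use strict;

# Assumption: no nested braces inside \qr{...}.
my $qr_group = qr/\\qr\{[^{}]*\}/i;

# Move each \qr{...} out of its \section{...} heading and re-insert it
# after the second word of the text that follows. Headings followed by
# fewer than two words are left unchanged.
sub move_qr_slurp {
    my ($text) = @_;
    # $1: heading up to the \qr, $2: the \qr{...} itself,
    # $3: whitespace plus the next two words
    $text =~ s/(\\section\{[^{}]*)($qr_group)\}(\s+\S+\s+\S+)/$1}$3$2/gi;
    return $text;
}

# Usage with a hypothetical filename: slurp the file, then transform it.
# my $text = do { local (@ARGV, $/) = ('input.tex'); <> };
# print move_qr_slurp($text);
```

Since the whole file is held in one string, this has the memory disadvantage mentioned in point 2.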

In the following, I'm using a combination of the third and fourth points, but note that this code makes quite a few assumptions about your input file format, which you haven't shown a lot of. In your real script, you'd have to replace DATA with the filehandle you opened.

    use warnings;
    use strict;

    local $/ = "";
    my $buffer;
    while (<DATA>) {
        if ( s/\\section\{.*?\K(\\qr\{.*?\})(?=\})//i ) {
            $buffer .= $1;
        }
        elsif (defined $buffer) {
            s/^\S+\s+\S+\K/$buffer/;
            undef $buffer;
        }
        print;
    }
    print $buffer if defined $buffer;

    __DATA__
    \section{Results\qr{text ... text}}

    Normal paragraph text here...

    \section{Funding\qr{text ... text}}

    Funding text here...

Output:

    \section{Results}

    Normal paragraph\qr{text ... text} text here...

    \section{Funding}

    Funding text\qr{text ... text} here...

I'm really not sure how you choose the insertion point for the \qr{} - in the original question, it seems you wanted it inserted after the fourth word in the paragraph, while in this example, it looks like you wanted it inserted after the second word.
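Until that is clarified, one option would be to make the word count a parameter. The following is only a sketch: the sub insert_after_words and its arguments are hypothetical, and it assumes "words" simply means runs of non-whitespace characters:

```perl
use warnings;
use strict;

# Insert $insert after the first $n whitespace-separated words of
# $para. Paragraphs with fewer than $n words are returned unchanged.
sub insert_after_words {
    my ($para, $insert, $n) = @_;
    die "word count must be a positive integer"
        unless defined $n && $n =~ /^[1-9][0-9]*$/;
    my $more = $n - 1;    # words to match after the first one
    $para =~ s/^\S+(?:\s+\S+){$more}\K/$insert/;
    return $para;
}

# With $n == 2 this matches the same position as s/^\S+\s+\S+\K/.../
print insert_after_words("Normal paragraph text here...\n", '\qr{text}', 2);
```

Inside the while loop, the second substitution could then become something like $_ = insert_after_words($_, $buffer, 4); if the fourth word is what you want.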