PerlMonks  

Re^5: String manipulation in a document

by haukex (Chancellor)
on Jul 11, 2019 at 15:50 UTC (#11102692)


in reply to Re^4: String manipulation in a document
in thread Grep a particular text

Thank you for showing your code. To begin, several recommendations:

  • You have several unused variables that don't seem relevant to this example, such as $filename or $workingpath. For an SSCCE, it's better to remove them.
  • The variable name $InputXmlFile is kind of confusing, since it's a .tex file.
  • It's best to use proper indentation and formatting. perltidy can help with that.
  • Regarding your open, see "open" Best Practices: open my $fh, '<', $filename or die "$filename: $!";
  • Reading an entire file at once is better done with the following "slurp" idiom instead of the join: my $inText = do { local $/; <INTXT> };
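Putting the last two points together, here is a minimal sketch of opening a file with a lexical filehandle and slurping it. The sub name slurp_file and the temporary file are only there for illustration and are not from your script:

```perl
use warnings;
use strict;
use File::Temp qw/tempfile/;

# Read a whole file into one string ("slurp") via a lexical filehandle.
# slurp_file is a hypothetical helper name, not part of the original script.
sub slurp_file {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "$filename: $!";
    my $text = do { local $/; <$fh> };    # $/ undefined: read up to EOF
    close $fh;
    return $text;
}

# Demo: write a small temporary file, then slurp it back in.
my ($tfh, $tmpname) = tempfile( UNLINK => 1 );
print $tfh "line one\nline two\n";
close $tfh or die "$tmpname: $!";

my $inText = slurp_file($tmpname);
print length($inText), " characters read\n";
```

The local $/ inside the do block keeps the change to the input record separator confined to that one read, so the rest of the script still reads line by line.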

Now moving on to the question of how to search and replace text. First of all, reading from input files and writing to output files is described in "Files and I/O" in perlintro - I would recommend reading the whole document, since it's not very long and it gives a good overview of Perl.

As for the algorithm, there are several general approaches I'd consider:

  1. You should look for modules that are able to parse and write TeX/LaTeX.
  2. You could read the entire file into memory and then use regular expressions to process it, the disadvantage being that it won't work well for large files.
  3. You could read the file line-by-line, keeping the current state (i.e. whether \section{...\qr{...}} has been seen, and storing the text to be added back in). This is a "state machine" type approach: very powerful and flexible, but it can sometimes be a bit more verbose.
  4. Another possibility would be to read the file in "chunks" - in this case, your example input appears to have a blank line between each line you want to process. If that really is the case, you could use $/ to read the file in "paragraph mode" (sections are separated by one or more blank lines).
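To give an idea of what the third approach might look like on its own, here is a rough line-by-line sketch. It assumes the \qr{...} sits at the end of a \section{...} on a single line and that it should go after the second word of the next non-blank line. The sub name move_qr_linewise and the in-memory input are made up for the example:

```perl
use warnings;
use strict;

# Line-by-line "state machine": $pending holds the \qr{...} text removed
# from the most recent \section line until it can be re-inserted.
# move_qr_linewise is a hypothetical name used only for this sketch.
sub move_qr_linewise {
    my ($in) = @_;
    my $out = '';
    my $pending;
    while ( my $line = <$in> ) {
        if ( $line =~ s/\\section\{.*?\K(\\qr\{.*?\})(?=\})//i ) {
            $pending .= $1;    # remember what was cut out
        }
        elsif ( defined $pending && $line =~ /\S/ ) {
            # assumption: insert after the second word of the next text line;
            # the state is only cleared if the insertion actually happened
            undef $pending if $line =~ s/^\S+\s+\S+\K/$pending/;
        }
        $out .= $line;
    }
    $out .= $pending if defined $pending;    # nothing left to attach it to
    return $out;
}

# Demo with an in-memory filehandle instead of a real file.
my $input = join "\n",
    '\section{Results\qr{text ... text}}', '',
    'Normal paragraph text here...', '';
open my $in, '<', \$input or die "in-memory open: $!";
print move_qr_linewise($in);
```

Since only one line plus the pending text is held in memory at a time, this variant does not depend on the blank lines between paragraphs and should also cope with large files.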

In the following, I'm using a combination of the third and fourth points, but note that this code makes quite a few assumptions about your input file format, which you haven't shown a lot of. In your real script, you'd have to replace DATA with the filehandle you opened.

    use warnings;
    use strict;

    local $/ = "";    # paragraph mode: records are separated by blank lines
    my $buffer;
    while (<DATA>) {
        # cut the \qr{...} out of a \section{...} and remember it
        if ( s/\\section\{.*?\K(\\qr\{.*?\})(?=\})//i ) {
            $buffer .= $1;
        }
        # otherwise, re-insert it after the second word of this paragraph
        elsif (defined $buffer) {
            s/^\S+\s+\S+\K/$buffer/;
            undef $buffer;
        }
        print;
    }
    print $buffer if defined $buffer;
    __DATA__
    \section{Results\qr{text ... text}}

    Normal paragraph text here...

    \section{Funding\qr{text ... text}}

    Funding text here...

Output:

    \section{Results}

    Normal paragraph\qr{text ... text} text here...

    \section{Funding}

    Funding text\qr{text ... text} here...

I'm really not sure how you choose the insertion point for the \qr{} - in the original question, it seems you wanted it inserted after the fourth word in the paragraph, while in this example, it looks like you wanted it inserted after the second word.
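If the rule really is "after the Nth word", the second substitution could be parameterized along these lines. This is only a sketch under that assumption, and the sub name insert_after_words is made up for the example:

```perl
use warnings;
use strict;

# Insert $insert directly after the first $n whitespace-separated words
# of $text. The text is returned unchanged if it has fewer than $n words.
# insert_after_words is a hypothetical helper, not from the original script.
sub insert_after_words {
    my ($text, $insert, $n) = @_;
    die "word count must be a positive integer\n"
        unless $n =~ /\A[1-9][0-9]*\z/;
    my $rest = $n - 1;    # words to match after the first one
    $text =~ s/^\S+(?:\s+\S+){$rest}\K/$insert/;
    return $text;
}

my $para = 'Normal paragraph text here...';
my $qr   = '\qr{text ... text}';
print insert_after_words($para, $qr, 2), "\n";    # after the second word
print insert_after_words($para, $qr, 4), "\n";    # after the fourth word
```

In the loop above, the line s/^\S+\s+\S+\K/$buffer/ would then become something like $_ = insert_after_words($_, $buffer, 2), with the word count chosen to match whatever your real rule turns out to be.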
