Greetings to all,

I asked in the chat window several days ago about how to accomplish this, and tye provided me a good answer using map and sort. Unfortunately, my laptop crashed shortly thereafter, and I lost his answer. (That'll teach me, ha!) However, there are a couple of complicating factors that tye may not have addressed even then, and I'm looking for wisdom on a succinct and safe way of accomplishing this.

Here's what I have:

  1. A file containing a tab-delimited list of words to exchange for modern spellings/equivalents, followed by a third column for any stopwords which should not have substitutions done in them.
  2. A file containing a list of files in which substitutions must be made.
  3. Over a hundred such files needing to be updated.
  4. The target language is Asian, where 1) there are no spaces between words; and 2) the encoding will be UTF-8. (This is significant, because any regexp must be sensitive to this, or it will fail.)
Here's an "English-ised" example of the words list file:

hasn't	has not	

So, what I need to do is replace each word in the first column with the word(s) in the second column, except where the text matches the word in the stopwords column. While this seems like a simple scenario, I'm struggling to wrap my brain around it. I'm just beginning to grasp the concepts of map and join, and their syntax, but would much appreciate some ideas for how to accomplish this.
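For reference, here is a minimal sketch of one way this might be approached with map, sort and join. It is not tye's lost answer, only an illustration under some assumptions: the names parse_rules and build_substituter are hypothetical, the second sample rule (cat/feline/catalog) is invented for the demonstration, and stopwords are treated as protected everywhere rather than only for the row they appear in. The idea is to build one alternation of all words and stopwords, longest first, so that a stopword is consumed unchanged before a shorter word inside it can match. Since the text has no spaces between words, no \b word boundaries are used.

```perl
use strict;
use warnings;
use utf8;    # only matters if the source itself contains UTF-8 literals

# Turn tab-delimited lines (old, new, stopword) into a list of rules.
# With real files, open them with '<:encoding(UTF-8)' so that the
# regex works on characters rather than bytes.
sub parse_rules {
    my @lines = @_;
    my @rules;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        # Limit of 3 keeps an empty trailing stopword column.
        my ($old, $new, $stop) = split /\t/, $line, 3;
        push @rules, [ $old, $new, $stop // '' ];
    }
    return @rules;
}

# Build a closure that applies every rule in a single pass.
sub build_substituter {
    my @rules = @_;
    my %replacement = map { $_->[0] => $_->[1] } @rules;
    my %is_stop     = map { $_->[2] => 1 } grep { length $_->[2] } @rules;

    # Stopwords plus words, longest first, each escaped with quotemeta.
    my @alternatives =
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b }
        (keys %is_stop, grep { !$is_stop{$_} } keys %replacement);

    # Nothing to do if the word list was empty.
    return sub { return $_[0] } unless @alternatives;

    my $re = do { my $alt = join '|', @alternatives; qr/($alt)/ };

    return sub {
        my ($text) = @_;
        # A matched stopword is put back as-is; anything else is replaced.
        $text =~ s/$re/ $is_stop{$1} ? $1 : $replacement{$1} /ge;
        return $text;
    };
}

my $fix = build_substituter(
    parse_rules("hasn't\thas not\t", "cat\tfeline\tcatalog")
);
print $fix->("thecathasn'tseenthecatalog"), "\n";
# prints: thefelinehas notseenthecatalog
```

Doing everything in one s///g pass also means replacement text is never re-scanned, so one rule's output cannot trigger another rule. If stopwords must apply only to their own row, the lookup inside the replacement would need to be keyed per word instead.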



In reply to Efficient selective substitution on list of words by Polyglot
