PerlMonks  


If I were to generate a sequence of all possible alphanumeric ASCII characters, it would be trivially simple. I would create an array as follows:

 my @chars = qw( 1 2 3 4 5 6 7 8 9 0 a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z );

I would then use the Mersenne Twister random number generator to select a character from the array and append it to one of the random sequences in the sample I am attempting to construct. In fact, my favourite password generator uses a modification of this to produce very strong, memorable passwords.

The question is, though, what would I add to @chars in order to generate a set of random sequences that, together, are certain to contain all possible valid UTF-8 characters? (Some of these, I understand, need several bytes to represent them: up to 4 under the current standard, RFC 3629, and up to 6 in the original UTF-8 design.) Or is there a better way to generate samples of random sequences in which the sample is certain to completely cover the sample space?

This is with the caveat that I need only alphanumeric characters, as the purpose is to test the ability of my code to handle text and numbers entered by a user on a UTF-8 encoded web page. Thus non-printable characters, control characters, etc., while they may be well defined, are not of interest. I will need to untaint this data and store it in my DB.
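To make the coverage requirement concrete, here is a minimal sketch of one possible approach, not a definitive answer. It assumes "alphanumeric" means anything Perl's Unicode properties `\p{L}` (letter) or `\p{N}` (number) match, and it guarantees coverage by shuffling the whole pool and cutting it into strings rather than by sampling with replacement. The names `alnum_chars` and `covering_sample` are hypothetical, and `shuffle` from List::Util relies on Perl's built-in random number generator rather than the Mersenne Twister.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return every character in the code point range $first..$last that
# Perl classifies as a letter (\p{L}) or a number (\p{N}).
# Assumption: this is what "alphanumeric" should mean here.
sub alnum_chars {
    my ($first, $last) = @_;
    return grep { /\A[\p{L}\p{N}]\z/ } map { chr($_) } $first .. $last;
}

# Shuffle the pool and cut it into strings of at most $length characters.
# Every character in the pool appears exactly once across the sample,
# so the sample is certain to cover the whole pool.
sub covering_sample {
    my ($length, @pool) = @_;
    my @shuffled = shuffle @pool;
    my @sample;
    push @sample, join('', splice(@shuffled, 0, $length)) while @shuffled;
    return @sample;
}

# Demo range only: Basic Latin through Cyrillic (U+0000..U+04FF).
# Widening the range is an exercise left to the caller.
my @pool   = alnum_chars(0x0000, 0x04FF);
my @sample = covering_sample(12, @pool);
```

The strings in `@sample` are Perl character strings; they would still need to be encoded (for example with `Encode::encode('UTF-8', $string)`) before being submitted as UTF-8 bytes.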

The real problem I need to address is how best to manage the transition of our system (which is in production) from a state in which everything is encoded as latin1 to one in which everything is encoded in UTF-8. Until I have sufficient time to test everything, from the conversion of our tables from latin1 through binary to UTF-8 to how well our code behaves when it deals only with UTF-8, my plan is to convert one form to UTF-8 first. On the server side, I would convert the text received from that form from UTF-8 to latin1 and then, when a user wants to see it, convert it back from latin1 to UTF-8. But I want a good sample to test in order to determine whether or not such conversions are reversible.
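The reversibility check described above could be sketched roughly as follows, using the core Encode module. The helper name `survives_latin1_round_trip` is hypothetical, and the sketch assumes "latin1" means strict ISO-8859-1 as Encode defines it; a database's notion of latin1 may differ.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Return 1 if the given UTF-8 octets survive the trip
# UTF-8 -> latin1 -> UTF-8 unchanged, and 0 otherwise.
sub survives_latin1_round_trip {
    my ($utf8_octets) = @_;
    my $ok = eval {
        # Decode the incoming bytes; croak on malformed UTF-8.
        my $text = decode('UTF-8', $utf8_octets, FB_CROAK | LEAVE_SRC);
        # Croak if any character has no ISO-8859-1 representation.
        my $latin1 = encode('ISO-8859-1', $text, FB_CROAK | LEAVE_SRC);
        # Convert back, as would happen when the user views the data.
        my $back = encode('UTF-8', decode('ISO-8859-1', $latin1));
        $back eq $utf8_octets;
    };
    return $ok ? 1 : 0;
}

# "café" fits in Latin-1; the Cyrillic letter U+0444 does not.
for my $octets ("caf\xC3\xA9", "\xD1\x84") {
    my $verdict = survives_latin1_round_trip($octets) ? 'reversible' : 'lossy';
    print join(' ', map { sprintf '%02X', ord } split //, $octets), ": $verdict\n";
}
```

Since ISO-8859-1 covers only U+0000 through U+00FF, any character above that range cannot make the round trip, which is worth keeping in mind when choosing the test sample.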

I'd welcome suggestions for handling either the random sequences of UTF-8 characters or the transition of a data-driven web application from latin1 to UTF-8, or both.

Thanks

Ted


In reply to How to generate random sequence of UTF-8 characters by ted.byers
