
Re: VarStructor II -- Abbreviation tool

by jeffa (Bishop)
on Jun 01, 2004 at 10:18 UTC


in reply to VarStructor II -- Abbreviation tool

Since you give no usage whatsoever, i can only speculate about exactly what you want the output to look like and, more importantly, what the "rules" are. But i can say that you have a lot of needless code there. Try this instead:

use Data::Dumper;

my (%symbol,@old,@new);
my $max          = 10;    # maximum number of characters to keep
my $ws_replace   = '_';   # replacement for runs of whitespace
my $letters_only = 1;     # strip non-word characters

while (<DATA>) {
   chomp;
   push @old, $_;
   $_ = substr $_,0,$max if $max;
   s/\s+/$ws_replace/g if $ws_replace;
   s/\W+//g if $letters_only;
   $_ .= $. if $symbol{$_}++;    # on a collision, append the input line number
   push @new, $_;
}

print Dumper \@old, \@new;

my %compare;
@compare{@old} = @new;
print Dumper \%compare;

__DATA__
Line one
Line two
xxxxxxxxxxx
Another line
xxxxxxxxxxx
Lines end here not
xxxxxxxxxxx
Fourth line (used to be)
Line five
Lines end here
Lines end here too

And read The Dynamic Duo --or-- Holy Getopt::Long, Pod::UsageMan!

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Replies are listed 'Best First'.
Re: Re: VarStructor II -- Abbreviation tool
by Wassercrats (Initiate) on Jun 01, 2004 at 11:04 UTC
    No need to speculate, since the script was set up as a demo, as I said. If you ran it, you would see:
    Line_o Line_t x1 A x2 L1 x3 F Line_f L2 L3

    Your script produces:

    'Line_one', 'Line_two', 'xxxxxxxxxx', 'Another_li', 'xxxxxxxxxx5', 'Lines_end_', 'xxxxxxxxxx7', 'Fourth_lin', 'Line_five', 'Lines_end_10', 'Lines_end_11',

    This follows a different numbering scheme, using higher numbers. Higher numbers increase the chance of a suffix having more digits, which would require eliminating more of the semantically meaningful letters if the string length is to stay the same. This probably wouldn't matter much to me, but I already have my arguably better way, which doesn't require a module.
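
To illustrate the point about suffix length, here is a minimal sketch. It assumes a per-name counter instead of the input line number (`$.`), and it assumes the stem is trimmed so that the result never exceeds the maximum length. The helper name `short_names` is hypothetical and is not taken from either script in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not from either script in this thread.
# Truncates each name to at most $max characters. On a collision it appends
# a per-name counter (1, 2, 3, ...) and trims the stem so that the result
# still fits within $max characters.
# Limitation: it does not guard against a generated name colliding with a
# later literal name.
sub short_names {
    my ($max, @names) = @_;
    my (%seen, @out);
    for my $name (@names) {
        (my $stem = $name) =~ s/\s+/_/g;    # whitespace runs become underscores
        $stem = substr $stem, 0, $max;      # cap the length
        my $n = $seen{$stem}++;             # 0 for the first occurrence
        if ($n) {
            my $room = $max - length $n;    # characters left for the stem
            $room = 0 if $room < 0;
            push @out, substr($stem, 0, $room) . $n;
        }
        else {
            push @out, $stem;
        }
    }
    return @out;
}

print join(' ', short_names(10, 'Lines end here not', 'Lines end here', 'Lines end here too')), "\n";
```

With a limit of 10 this prints `Lines_end_ Lines_end1 Lines_end2`. The counter restarts for each distinct stem, so suffixes stay as short as the number of collisions allows.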

      No. There is every need to speculate with your code, Wassercrats. My code was not meant to replace yours, or even reproduce 100% of the functionality. It was to show you that you can do what you are trying to do with much less code! How about this instead: it truncates the string to the first two letters (lowercased) and adds incremented numbers for each collision. And this time i won't require you to use a module!

      my (%symbol,@old,@new);

      while (<DATA>) {
         chomp($_ = substr lc($_),0,2);
         push @old, $_;
         $symbol{$_}++;
         push @new, $_ . $symbol{$_};
      }

      print "$_\n" for @new;

      __DATA__
      Line one
      Line two
      xxxxxxxxxxx
      Another line
      xxxxxxxxxxx
      Lines end here not
      xxxxxxxxxxx
      Fourth line (used to be)
      Line five
      Lines end here
      Lines end here too

      Of course, while i find this to be quite silly, it is more intuitive than what you have. Besides, you already have your arguably better way.

      jeffa

      
        I didn't try it, but it sounds like it would work fine for me. In fact, I haven't even implemented my version into the script I want to use it in, but what I'm currently using works too. It's just that what I'm currently using puts no limit on the length of the string (used as a variable name), which could look sloppy, and I tried creating something a little "smarter" than your version. But since my version needs only a slight tweak to function as Text::Abbrev (with added features), I think I'll eventually add that functionality and make it a module.

        By the way, if I made my script a module and didn't include its code in the line count, my code could be even shorter than your module-using code.
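
For readers who have not used Text::Abbrev, the module mentioned above: a short sketch of what its `abbrev` function returns, using three of the thread's sample lines as input. The variable name `%table` is only illustrative. Note that Text::Abbrev maps every unambiguous prefix to the full string; it does not append numbers or handle duplicate inputs the way the scripts in this thread do.

```perl
use strict;
use warnings;
use Text::Abbrev;

# abbrev() returns a hash mapping each unambiguous prefix (and each full
# string) to the full string it stands for.
my %table = abbrev('Line one', 'Line two', 'Another line');

for my $short (sort keys %table) {
    print "$short => $table{$short}\n";
}
```

Here `A` maps to `Another line` because no other input starts with that letter, while `Line` is absent from the table because it is a prefix of two inputs. The shortest keys for those two are `Line o` and `Line t`.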

      It's not clear to me why you have the lines

      Line one
      Line two
      Line five

      reduce to

      L1
      L2
      L3

      but the lines

      Lines end here not
      Lines end here
      Lines end here too

      reduce to

      Line_o
      Line_t
      Line_f

      I mean, I understand that the latter are treated differently because they're identical within the Max_Length span, whereas the shorter ones aren't. But still... why should that matter? Wouldn't it have been acceptable -- aye, preferable -- to have the latter reduce to (e.g.)

      L4
      L5
      L6

      ???

      Thanks,
      jdporter
        You got the output example backwards.

        Line one
        Line two
        Line five

        reduces to:

        Line_o
        Line_t
        Line_f

        and:

        Lines end here not
        Lines end here
        Lines end here too

        reduces to:

        L1
        L2
        L3

        The script is intended to create the shortest possible unique string, taking at most a given maximum number of characters from the beginning of the string. When two strings are identical within that span, a number must be appended. Since the text is then of no use in distinguishing those strings, and the shortest possible unique string needs just one character of text (actually, zero), that's how many characters are kept.

        Whether sticking to that produces the most useful function is another issue, but it does produce the shortest string and puts value on the inclusion of text when the text isn't identical.

        I will be adding an option to have identical text that's appended with a number contain the maximum number of characters instead of just one.
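
The rule described in this reply can be sketched as follows. This is not Wassercrats's script, which is not shown in this thread; it is a minimal reconstruction under two assumptions: the maximum length is 6 (inferred from the demo output quoted earlier, not stated anywhere), and the running number is counted per first letter. The function name `shortest_unique` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the described rule, not the original script.
# 1. Replace whitespace with underscores and truncate to $max characters.
# 2. Names that are identical after truncation become: first letter + number.
# 3. Every other name is cut to the shortest prefix that no other truncated
#    name shares.
sub shortest_unique {
    my ($max, @names) = @_;

    my @cut = map { (my $s = $_) =~ s/\s+/_/g; substr $s, 0, $max } @names;

    my %count;
    $count{$_}++ for @cut;

    my (%serial, @out);
    for my $c (@cut) {
        if ($count{$c} > 1) {
            # Text cannot distinguish these, so keep one letter and number them.
            my $first = substr $c, 0, 1;
            push @out, $first . ++$serial{$first};
            next;
        }
        # Grow the prefix until no other truncated name starts with it.
        my $len = 1;
        $len++ while $len < length($c)
            && grep { $_ ne $c && index($_, substr($c, 0, $len)) == 0 } @cut;
        push @out, substr $c, 0, $len;
    }
    return @out;
}

my @lines = (
    'Line one', 'Line two', 'xxxxxxxxxxx', 'Another line', 'xxxxxxxxxxx',
    'Lines end here not', 'xxxxxxxxxxx', 'Fourth line (used to be)',
    'Line five', 'Lines end here', 'Lines end here too',
);
print join(' ', shortest_unique(6, @lines)), "\n";
```

For the thread's sample data and a maximum of 6, this prints `Line_o Line_t x1 A x2 L1 x3 F Line_f L2 L3`, the same sequence quoted in the first reply. The sketch does not check whether a numbered result such as `x1` collides with a literal input of the same spelling.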
