http://www.perlmonks.org?node_id=11118380


in reply to Divide a list of string into substrings

Another option is to split and join the substrings randomly, trying to improve the score at least somehow. If we deviate too much from the best score found so far, we return to the corresponding solution and continue from there.

It doesn't always found the best solution, but it usually gets pretty close to it. Also, it found another solution with the same score:

'set', 'abcde-', 'efghi', ' ', '123', '45', 'ijkl', 'clr', '+'
Update: this is not a valid solution, as " ", "+", and "45" are too short.

And here's the code:

#! /usr/bin/perl use strict; use warnings; use feature qw{ say }; use Data::Dumper; use List::Util qw{ sum }; use Storable qw{ dclone }; my @list=('set abcde-efghi 12345', 'set abcde-ijkl 12345', 'clr abcde-efghi+123', 'clr abcde-ijkl 12345'); my @divided = map [split //], @list; sub score { my %freq; @freq{@$_} = () for @divided; my $score = (2 * keys %freq) + sum(map length, keys %freq); my $valid = ! grep 3 > length, keys %freq; return $score, $valid } sub concat { my @keys = @_; my $concat = join "", @keys; for my $d (@divided) { for my $i (1 .. $#$d) { splice @$d, $i - 1, 2, $concat if $d->[ $i - 1 ] eq $keys[0] && $d->[$i] eq $keys[1]; } } } sub divide { my @keys = @_; my $concat = join "", @keys; for my $d (@divided) { for my $i (0 .. $#$d) { splice @$d, $i, 1, @keys if $d->[$i] eq $concat; } } } my $best = 'INF'; my $remember = dclone(\@divided); my $count = 0; while (1) { my ($score, $valid) = score(); ++$count if $valid; last if $count > 2000; if ($score < $best && $valid) { $best = $score; $remember = dclone(\@divided); } elsif (! int rand 10) { @divided = @{ dclone($remember) }; } if (int rand 4) { my $i = int rand @divided; my $j = int rand $#{ $divided[$i] }; next unless @{ $divided[$i] } > 1; my @keys = @{ $divided[$i] }[$j, $j + 1]; concat(@keys); my ($after) = score(); divide(@keys) if $after > $score + int rand 3; } else { my $i = int rand @divided; my $j = int rand @{ $divided[$i] }; my $l = 1 + int rand(length($divided[$i][$j]) - 1); next unless $l > 1; my @keys = (substr($divided[$i][$j], 0, $l), substr($divided[$i][$j], $l)); divide(@keys); my ($after) = score(); concat(@keys) if $after > $score + int rand 3; } } say "Score $best"; print Dumper($remember);

Update: Fixed to only report solutions with substrings of length at least 3.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Replies are listed 'Best First'.
Re^2: Divide a list of string into substrings
by tybalt89 (Monsignor) on Jun 23, 2020 at 15:51 UTC

    Violates the second rule, "* The length of each substring should be at least 3 characters."

      True, I forgot about that! Fixed, thanks.

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]