qazwart has asked for the wisdom of the Perl Monks concerning the following question:
I have two strings:
//depot/Efp/Celox/CELOX-3.5.0/CeloxStart.sln
//depot/Efp/Celox/MAIN/CeloxStart.sln
What I want to do is remove the part from both the beginning and end of this these two strings that match, and only leave the part that doesn't match.
In other words, I want to remove <font//depot/Efp/Celox from the front of the two strings and /CeloxStart.sln from the rear of these two strings. Leaving string #1 as CELOX-3.5.0 and the second string as MAIN
Now, I know I can parse the strings one character at a time, but I was hoping for a method of mashing the two strings together in one or two steps -- maybe using regular expressions or maybe ANDing or ORing the two strings together, but how?
Any ideas of an efficient way of doing this? I have thousands of these records to process, and I want to do it as efficiently as possible.
Re: Selecting the difference between two strings
by GrandFather (Saint) on Sep 26, 2006 at 22:26 UTC
|
Not And or Or, but Xor (^):
use strict;
use warnings;
my $str1 = '//depot/Efp/Celox/CELOX-3.5.0/CeloxStart.sln';
my $str2 = '//depot/Efp/Celox/MAIN/CeloxStart.sln';
my $str1r = $str1;
my $str2r = $str2;
# Find and strip common prefix
my $match = ($str1 ^ $str2) =~ /^(\x00*)/;
substr $str1r, 0, $+[1], '' if $match;
substr $str2r, 0, $+[1], '' if $match;
# Find and strip common suffix
$match = (reverse ($str1) ^ reverse ($str2)) =~ /^(\x00*)/;
substr $str1r, -$+[1], length ($str1r), '' if $match;
substr $str2r, -$+[1], length ($str2r), '' if $match;
print "$str1r\n$str2r\n";
Prints:
CELOX-3.5.0
MAIN
Update: process both strings
DWIM is Perl's answer to Gödel
| [reply] [d/l] [select] |
|
There we go!
I figured out that it was XOR on my way home, but I wasn't quite sure how to apply it. Plus, I forgot about the reverse function which is what I need in order to strip the end of the string. Instead, I was going to pad left the strings to make them the same length.
Thank you for your answer. I'll try it out when I get to work tomorrow.
| [reply] |
Re: Selecting the difference between two strings
by hgolden (Pilgrim) on Sep 26, 2006 at 21:06 UTC
|
HeyI'm sure that someone will come up with a better way of doing this, but my first thought is to split those strings by "/" into arrays, and then use List::Compare to find the intersections, which you can then easily remove. Hays Update: Or much better, split then use Algorithim::Diff. Update question inspired by the response below: Are the comparisons just pairwise, or are you doing many at a time? The array solution above works for an arbitrary number, but if the comparisons are just two at a time, the below code is much slicker. | [reply] |
Re: Selecting the difference between two strings
by duff (Parson) on Sep 26, 2006 at 21:17 UTC
|
#!/usr/bin/perl
my $s1 = "//depot/Efp/Celox/CELOX-3.5.0/CeloxStart.sln";
my $s2 = "//depot/Efp/Celox/MAIN/CeloxStart.sln";
@s1 = split "/", $s1;
@s2 = split "/", $s2;
my $i = 0; $i++ while $s1[$i] eq $s2[$i]; # find end of common begin
+ning
my $j = -1; $j-- while $s1[$j] eq $s2[$j]; # find beginning of common
+ end
print join("/", @s1[$i..(@s1+$j)]), "\n";
print join("/", @s2[$i..(@s2+$j)]), "\n";
There are probably better ways :-)
update: Though I have to wonder why you're worrying about efficiency. You only have to do this for "thousands of records". Is that really "thousands of records per second" or something where efficienct execution really matters? Or are you just prematurely optimizing?
| [reply] [d/l] |
|
This problem has to do with the Perforce version control system, and finding the branching structure in the source archive. I am parsing the set of file merges and branches and finding the branching structure of the source archive. There could easily be tens of thousands or hundreds of thousands of these records that have to be parsed.
The truth is I can take a few shortcuts to make this task easier. For example, it is common practice in Perforce to put branch names in all caps, so I could really search for the first instance of m</[A-Z_]+/> and find what I'm looking for. Or, I can simply assume that the fourth set of characters between the slashes is the name of the branch because that's where it is in my particular source archive.
However, I look at this and say These are strings, I want to manipulate them, and that's what you use Perl for. There's gotta be a way to simply say "Take these two strings and snip the matching parts". Maybe not quite that simple, but you get the idea.
I'm really just looking for a Perl solution to this problem. Something that will make Python programmers weep and shell scripters green with envy.
| [reply] |
|
I don't know much about Perforce, but from what you've said, there are probably easier ways to do this than what you're asking for.If all of the information is of the format that you show in your original post, you could pretty easily form an indented list showing the branches. For simplicity, let's call the text between slashes units. Now, sort the list ASCIbetically, and do a very simple loop that preserves the entry above. By comparing the current entry to the above entry, you could determine the number of tabs to print before the current entry. I.E if they have no units in common, we would print no spaces. If they have three units in common, but are both longer than three units, then we insert three tabs. If they have three units in common, but the above entry is only three units long, then we need four tabs. Anyway, you can fit the particulars to what you want. To me, a Perl solution is one that's elegant in Perl, and not one that takes advantage of Perl's string abilities. Hays
| [reply] |
|
|
- branch1 -
/- branch2 -\
/-- branch3 --\
*** ---- branch4 ----[same sub-structure for all branches]
\-- branch5 --/
\- branch6 -/
\ ... /
Or is it more complicated? Do some branches only contain one or another subset of the overall structure (branch FOO contains everything, but branch BAR only contains component X)? Or are there additional versioning branches at lower levels along some paths (branch FOO contains components X, Y and Z, but there are two versions of Y within FOO)?
If it's really as simple as my crude diagram, you only need to look for common strings from one direction (left-to-right, or "top-down"). If it's not that simple, you need to be a little more explicit about what you want to derive from the overall structure. What sort of representation for the (more complicated) branching structure would be desirable?
(Not in the sense of "tell us what you want so we can do it for you", but rather "make sure you really know what you want so you work on solving the right problems".) | [reply] [d/l] |
|
Re: Selecting the difference between two strings
by jwkrahn (Abbot) on Sep 27, 2006 at 03:26 UTC
|
$ perl -le'
my $string1 = q[//depot/Efp/Celox/CELOX-3.5.0/CeloxStart.sln];
my $string2 = q[//depot/Efp/Celox/MAIN/CeloxStart.sln];
my $pre = length( ( ( $string1 ^ $string2 ) =~ /^(\0*)/ )[ 0 ] );
my $post = length( ( ( reverse( $string1 ) ^ reverse( $string2 ) ) =~
+/^(\0*)/ )[ 0 ] );
+
print for substr( $string1, $pre, -$post ), substr( $string2, $pre, -$
+post )
'
CELOX-3.5.0
MAIN
| [reply] [d/l] |
|
| [reply] |
Re: Selecting the difference between two strings
by Fletch (Bishop) on Sep 27, 2006 at 01:59 UTC
|
| [reply] |
|
|