PerlMonks  

Re: Extract string after removing the substring

by kcott (Abbot)
on Oct 25, 2012 at 10:45 UTC (#1000816)


in reply to Extract string after removing the substring

G'day viktor,

I initially came up with an alternative solution using regex captures:

    $ perl -Mstrict -Mwarnings -E '
        my ($string, $offset, $length) = qw{ATATTTATATTAT 0 3};
        $string =~ /^(.{$offset})(.{$length})(.*)$/;
        say "Extracted: ", $2 // "";
        say "Remainder: ", ($1 // "") . ($3 // "");
    '
    Extracted: ATA
    Remainder: TTTATATTAT

I then considered that this could be made into a function that was called like substr (i.e. substr EXPR, OFFSET, LENGTH). This originally looked something like this:

    sub split_string ($$$) {
        my ($string, $offset, $length) = @_;

        $string =~ /^(.{$offset})(.{$length})(.*)$/;

        return (($1 // ''), ($2 // ''), ($3 // ''));
    }

This can be called as split_string EXPR, OFFSET, LENGTH and returns a three-element list consisting of: whatever was to the left of the extracted string; the extracted string itself; and whatever was to the right of the extracted string. Simple usage would be something like:

    my ($left, $extract, $right) = split_string $string, $offset, $length;
    my $remainder = $left . $right;
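For comparison, the extracted string and the remainder can also be obtained without a regex, using the built-in four-argument form of substr. This is only a minimal sketch: the helper name extract_and_remainder is hypothetical and not part of the code in this post, and it returns two values rather than the three that split_string returns.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not from the post):
# removes LENGTH characters at OFFSET from a copy of the string and
# returns the list (extracted, remainder).
sub extract_and_remainder {
    my ($string, $offset, $length) = @_;

    # Four-argument substr replaces the selected span with '' in place
    # and returns the characters that were removed.
    my $extracted = substr $string, $offset, $length, '';

    return ($extracted, $string);
}

my ($extracted, $remainder) = extract_and_remainder('ATATTTATATTAT', 0, 3);
print "Extracted: $extracted\n";    # Extracted: ATA
print "Remainder: $remainder\n";    # Remainder: TTTATATTAT
```

Because $string is a copy of the argument, the caller's variable is left untouched.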

Of course, substr also allows a negative OFFSET, a negative LENGTH and the omission of LENGTH altogether. While I realise this is possibly approaching overkill for your requirements, I tinkered with the code to add this functionality. The code, test data and output are quite lengthy and are shown in full below. [Note: unlike substr, split_string does not take a REPLACEMENT argument, nor can it be used as an lvalue.]
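As a quick reminder of the three substr behaviours mentioned above, here is a small sketch using the same ten-digit string as the test data. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $digits = '1234567890';

# Negative OFFSET counts back from the end of the string.
my $from_end = substr $digits, -3, 2;    # '89'

# Negative LENGTH leaves that many characters off the end.
my $trimmed = substr $digits, 3, -6;     # '4'

# Omitted LENGTH takes everything from OFFSET to the end.
my $rest = substr $digits, 1;            # '234567890'

print "$from_end $trimmed $rest\n";
```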

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;

    sub split_string ($$;$) {
        my ($string, $offset, $length) = @_;

        die 'Input string is undefined!' unless defined $string;
        die 'Input string is a reference!' if ref $string;
        my $str_len = length $string;
        die 'Input string has zero length!' unless $str_len;
        die 'Offset is undefined!' unless defined $offset;
        die 'Offset is a reference!' if ref $offset;
        die 'Offset not an integer!' unless ''.$offset =~ /^[+-]?\d+$/;
        $offset = $str_len + $offset if $offset < 0;
        die 'Offset out of bounds!' if $offset < 0 || $offset >= $str_len;  # also rejects e.g. -11 on 10 chars
        $length //= $str_len - $offset;
        die 'Length is a reference!' if ref $length;
        die 'Length not an integer!' unless $length =~ /^[+-]?\d+$/;

        if ($length < 0) {
            die 'Negative length out of bounds!' if abs $length >= $str_len;
            $string = substr $string, 0, $length;
            $str_len = length $string;
            die 'Offset out of bounds for negative length!' if $offset >= $str_len;
            $length = $str_len - $offset;
        }
        else {
            die 'Length out of bounds!' if $offset + $length > $str_len;
        }

        $string =~ /^(.{$offset})(.{$length})(.*)$/;

        return (($1 // ''), ($2 // ''), ($3 // ''));
    }

    my @test_data = (
        [ qw{ATATTTATATTAT 0 3} ],
        [ qw{1234567890 0 4} ],
        [ qw{1234567890 3 4} ],
        [ qw{1234567890 6 4} ],
        [ qw{1234567890 9 1} ],
        [ undef, 0, 1 ],
        [ {}, 0, 1 ],
        [ '', 0, 1 ],
        [ '1234567890', undef, 1 ],
        [ '1234567890', [], 1 ],
        [ '1234567890', 'not a number', 1 ],
        [ '1234567890', 1.1, 1 ],
        [ '1234567890', 10, 1 ],
        [ '1234567890', 1 ],
        [ '1234567890', 1, sub {1} ],
        [ '1234567890', 1, 'not a number' ],
        [ '1234567890', 1, 1.1 ],
        [ '1234567890', -3, 2 ],
        [ '1234567890', 0, 10 ],
        [ '1234567890', 1, 10 ],
        [ qw{1234567890 0 -10} ],
        [ qw{1234567890 3 -6} ],
        [ qw{1234567890 3 -7} ],
        [ qw{1234567890 5 0} ],
    );

    for (@test_data) {
        my ($string, $offset, $length) = @$_;

        my $i_string = $string // '<undef>';
        my $i_offset = $offset // '<undef>';
        my $i_length = $length // '<undef>';

        say "string[$i_string] offset[$i_offset] length[$i_length]";

        my ($left, $extract, $right) = eval {
            split_string $string, $offset, $length;
        };

        if ($@) {
            warn '! ', $@;
            say '-' x 72;
            next;
        }

        say "left[$left] extract[$extract] right[$right]",
            " joined[@{[$left . $right]}]";
        say '-' x 72;
    }

Output:

    $ pm_substr_and_remainder.pl
    string[ATATTTATATTAT] offset[0] length[3]
    left[] extract[ATA] right[TTTATATTAT] joined[TTTATATTAT]
    ------------------------------------------------------------------------
    string[1234567890] offset[0] length[4]
    left[] extract[1234] right[567890] joined[567890]
    ------------------------------------------------------------------------
    string[1234567890] offset[3] length[4]
    left[123] extract[4567] right[890] joined[123890]
    ------------------------------------------------------------------------
    string[1234567890] offset[6] length[4]
    left[123456] extract[7890] right[] joined[123456]
    ------------------------------------------------------------------------
    string[1234567890] offset[9] length[1]
    left[123456789] extract[0] right[] joined[123456789]
    ------------------------------------------------------------------------
    string[<undef>] offset[0] length[1]
    ! Input string is undefined! at ./pm_substr_and_remainder.pl line 10.
    ------------------------------------------------------------------------
    string[HASH(0x7fbeaa0320a8)] offset[0] length[1]
    ! Input string is a reference! at ./pm_substr_and_remainder.pl line 11.
    ------------------------------------------------------------------------
    string[] offset[0] length[1]
    ! Input string has zero length! at ./pm_substr_and_remainder.pl line 13.
    ------------------------------------------------------------------------
    string[1234567890] offset[<undef>] length[1]
    ! Offset is undefined! at ./pm_substr_and_remainder.pl line 14.
    ------------------------------------------------------------------------
    string[1234567890] offset[ARRAY(0x7fbeaa0369d0)] length[1]
    ! Offset is a reference! at ./pm_substr_and_remainder.pl line 15.
    ------------------------------------------------------------------------
    string[1234567890] offset[not a number] length[1]
    ! Offset not an integer! at ./pm_substr_and_remainder.pl line 16.
    ------------------------------------------------------------------------
    string[1234567890] offset[1.1] length[1]
    ! Offset not an integer! at ./pm_substr_and_remainder.pl line 16.
    ------------------------------------------------------------------------
    string[1234567890] offset[10] length[1]
    ! Offset out of bounds! at ./pm_substr_and_remainder.pl line 18.
    ------------------------------------------------------------------------
    string[1234567890] offset[1] length[<undef>]
    left[1] extract[234567890] right[] joined[1]
    ------------------------------------------------------------------------
    string[1234567890] offset[1] length[CODE(0x7fbeaa036df0)]
    ! Length is a reference! at ./pm_substr_and_remainder.pl line 20.
    ------------------------------------------------------------------------
    string[1234567890] offset[1] length[not a number]
    ! Length not an integer! at ./pm_substr_and_remainder.pl line 21.
    ------------------------------------------------------------------------
    string[1234567890] offset[1] length[1.1]
    ! Length not an integer! at ./pm_substr_and_remainder.pl line 21.
    ------------------------------------------------------------------------
    string[1234567890] offset[-3] length[2]
    left[1234567] extract[89] right[0] joined[12345670]
    ------------------------------------------------------------------------
    string[1234567890] offset[0] length[10]
    left[] extract[1234567890] right[] joined[]
    ------------------------------------------------------------------------
    string[1234567890] offset[1] length[10]
    ! Length out of bounds! at ./pm_substr_and_remainder.pl line 31.
    ------------------------------------------------------------------------
    string[1234567890] offset[0] length[-10]
    ! Negative length out of bounds! at ./pm_substr_and_remainder.pl line 24.
    ------------------------------------------------------------------------
    string[1234567890] offset[3] length[-6]
    left[123] extract[4] right[] joined[123]
    ------------------------------------------------------------------------
    string[1234567890] offset[3] length[-7]
    ! Offset out of bounds for negative length! at ./pm_substr_and_remainder.pl line 27.
    ------------------------------------------------------------------------
    string[1234567890] offset[5] length[0]
    left[12345] extract[] right[67890] joined[1234567890]
    ------------------------------------------------------------------------

-- Ken

