http://www.perlmonks.org?node_id=1000816


in reply to Extract string after removing the substring

G'day viktor,

I initially came up with an alternative solution using regex captures:

$ perl -Mstrict -Mwarnings -E '
    my ($string, $offset, $length) = qw{ATATTTATATTAT 0 3};
    $string =~ /^(.{$offset})(.{$length})(.*)$/;
    say "Extracted: ", $2 // "";
    say "Remainder: ", ($1 // "") . ($3 // "");
'
Extracted: ATA
Remainder: TTTATATTAT
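For comparison, the same two values can also be obtained without a regex. The following is only a minimal sketch, assuming the built-in four-argument form of substr with an empty replacement; the variable names $remainder and $extracted are mine and purely illustrative.

```perl
use strict;
use warnings;
use feature 'say';

my ($string, $offset, $length) = ('ATATTTATATTAT', 0, 3);

# Work on a copy: four-argument substr modifies its first argument in place.
my $remainder = $string;

# Replace the chosen span with the empty string; substr returns what it removed.
my $extracted = substr($remainder, $offset, $length, '');

say "Extracted: $extracted";
say "Remainder: $remainder";
```

The copy keeps the original $string intact, which matches what the regex version does.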

I then considered that this could be made into a function that was called like substr (i.e. substr EXPR, OFFSET, LENGTH). This originally looked something like this:

sub split_string ($$$) {
    my ($string, $offset, $length) = @_;
    $string =~ /^(.{$offset})(.{$length})(.*)$/;
    return (($1 // ''), ($2 // ''), ($3 // ''));
}

This can be called as split_string EXPR, OFFSET, LENGTH and returns a three-element list consisting of: whatever was to the left of the extracted string; the extracted string itself; and whatever was to the right of the extracted string. Simple usage would be something like:

my ($left, $extract, $right) = split_string $string, $offset, $length;
my $remainder = $left . $right;
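One caveat may be worth illustrating. In a Perl regex, "." does not match a newline unless the /s modifier is given, so the pattern above fails on multi-line input. The sketch below assumes input containing a newline (the sample data in this thread does not) and uses a made-up test string.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input containing a newline inside the span to extract.
my $string = "AB\nCDEF";
my ($offset, $length) = (1, 3);

# Without /s, "." will not match the newline, so the whole match fails.
my $matched_plain = $string =~ /^(.{$offset})(.{$length})(.*)$/ ? 1 : 0;

# With /s, "." matches any character, newline included.
my ($left, $extract, $right) = $string =~ /^(.{$offset})(.{$length})(.*)$/s;

say "match without /s succeeded: $matched_plain";
say "with /s: left[$left] right[$right]";
```

If multi-line strings are possible, adding /s to the pattern inside split_string would be one way to handle them.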

Of course, substr also allows a negative OFFSET, a negative LENGTH and the omission of LENGTH altogether. While I realise this is possibly approaching overkill for your requirements, I tinkered with the code to add this functionality. The code, test data and output are quite lengthy and follow below. [Note: unlike substr, split_string does not take a REPLACEMENT argument, nor can it be used as an lvalue.]
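For reference, here is a small sketch of how the built-in substr treats those three cases. The variable names are illustrative only; the values in the comments are what substr returns for this particular string.

```perl
use strict;
use warnings;
use feature 'say';

my $s = '1234567890';

my $no_length  = substr($s, 3);      # LENGTH omitted: offset 3 to the end   (4567890)
my $neg_offset = substr($s, -3);     # negative OFFSET: counted from the end (890)
my $neg_both   = substr($s, -3, 2);  # two characters, starting 3 from end   (89)
my $neg_length = substr($s, 3, -2);  # negative LENGTH: leave 2 off the end  (45678)

say for $no_length, $neg_offset, $neg_both, $neg_length;
```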

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

sub split_string ($$;$) {
    my ($string, $offset, $length) = @_;

    die 'Input string is undefined!' unless defined $string;
    die 'Input string is a reference!' if ref $string;
    my $str_len = length $string;
    die 'Input string has zero length!' unless $str_len;
    die 'Offset is undefined!' unless defined $offset;
    die 'Offset is a reference!' if ref $offset;
    die 'Offset not an integer!' unless ''.$offset =~ /^[+-]?\d+$/;
    $offset = $str_len + $offset if $offset < 0;
    die 'Offset out of bounds!' if $offset >= $str_len;
    $length //= $str_len - $offset;
    die 'Length is a reference!' if ref $length;
    die 'Length not an integer!' unless $length =~ /^[+-]?\d+$/;

    if ($length < 0) {
        die 'Negative length out of bounds!' if abs $length >= $str_len;
        $string = substr $string, 0, $length;
        $str_len = length $string;
        die 'Offset out of bounds for negative length!' if $offset >= $str_len;
        $length = $str_len - $offset;
    }
    else {
        die 'Length out of bounds!' if $offset + $length > $str_len;
    }

    $string =~ /^(.{$offset})(.{$length})(.*)$/;
    return (($1 // ''), ($2 // ''), ($3 // ''));
}

my @test_data = (
    [ qw{ATATTTATATTAT 0 3} ],
    [ qw{1234567890 0 4} ],
    [ qw{1234567890 3 4} ],
    [ qw{1234567890 6 4} ],
    [ qw{1234567890 9 1} ],
    [ undef, 0, 1 ],
    [ {}, 0, 1 ],
    [ '', 0, 1 ],
    [ '1234567890', undef, 1 ],
    [ '1234567890', [], 1 ],
    [ '1234567890', 'not a number', 1 ],
    [ '1234567890', 1.1, 1 ],
    [ '1234567890', 10, 1 ],
    [ '1234567890', 1 ],
    [ '1234567890', 1, sub {1} ],
    [ '1234567890', 1, 'not a number' ],
    [ '1234567890', 1, 1.1 ],
    [ '1234567890', -3, 2 ],
    [ '1234567890', 0, 10 ],
    [ '1234567890', 1, 10 ],
    [ qw{1234567890 0 -10} ],
    [ qw{1234567890 3 -6} ],
    [ qw{1234567890 3 -7} ],
    [ qw{1234567890 5 0} ],
);

for (@test_data) {
    my ($string, $offset, $length) = @$_;
    my $i_string = $string // '<undef>';
    my $i_offset = $offset // '<undef>';
    my $i_length = $length // '<undef>';
    say "string[$i_string] offset[$i_offset] length[$i_length]";

    my ($left, $extract, $right) = eval {
        split_string $string, $offset, $length;
    };

    if ($@) {
        warn '! ', $@;
        say '-' x 72;
        next;
    }

    say "left[$left] extract[$extract] right[$right]",
        " joined[@{[$left . $right]}]";
    say '-' x 72;
}

Output:

$ pm_substr_and_remainder.pl
string[ATATTTATATTAT] offset[0] length[3]
left[] extract[ATA] right[TTTATATTAT] joined[TTTATATTAT]
------------------------------------------------------------------------
string[1234567890] offset[0] length[4]
left[] extract[1234] right[567890] joined[567890]
------------------------------------------------------------------------
string[1234567890] offset[3] length[4]
left[123] extract[4567] right[890] joined[123890]
------------------------------------------------------------------------
string[1234567890] offset[6] length[4]
left[123456] extract[7890] right[] joined[123456]
------------------------------------------------------------------------
string[1234567890] offset[9] length[1]
left[123456789] extract[0] right[] joined[123456789]
------------------------------------------------------------------------
string[<undef>] offset[0] length[1]
! Input string is undefined! at ./pm_substr_and_remainder.pl line 10.
------------------------------------------------------------------------
string[HASH(0x7fbeaa0320a8)] offset[0] length[1]
! Input string is a reference! at ./pm_substr_and_remainder.pl line 11.
------------------------------------------------------------------------
string[] offset[0] length[1]
! Input string has zero length! at ./pm_substr_and_remainder.pl line 13.
------------------------------------------------------------------------
string[1234567890] offset[<undef>] length[1]
! Offset is undefined! at ./pm_substr_and_remainder.pl line 14.
------------------------------------------------------------------------
string[1234567890] offset[ARRAY(0x7fbeaa0369d0)] length[1]
! Offset is a reference! at ./pm_substr_and_remainder.pl line 15.
------------------------------------------------------------------------
string[1234567890] offset[not a number] length[1]
! Offset not an integer! at ./pm_substr_and_remainder.pl line 16.
------------------------------------------------------------------------
string[1234567890] offset[1.1] length[1]
! Offset not an integer! at ./pm_substr_and_remainder.pl line 16.
------------------------------------------------------------------------
string[1234567890] offset[10] length[1]
! Offset out of bounds! at ./pm_substr_and_remainder.pl line 18.
------------------------------------------------------------------------
string[1234567890] offset[1] length[<undef>]
left[1] extract[234567890] right[] joined[1]
------------------------------------------------------------------------
string[1234567890] offset[1] length[CODE(0x7fbeaa036df0)]
! Length is a reference! at ./pm_substr_and_remainder.pl line 20.
------------------------------------------------------------------------
string[1234567890] offset[1] length[not a number]
! Length not an integer! at ./pm_substr_and_remainder.pl line 21.
------------------------------------------------------------------------
string[1234567890] offset[1] length[1.1]
! Length not an integer! at ./pm_substr_and_remainder.pl line 21.
------------------------------------------------------------------------
string[1234567890] offset[-3] length[2]
left[1234567] extract[89] right[0] joined[12345670]
------------------------------------------------------------------------
string[1234567890] offset[0] length[10]
left[] extract[1234567890] right[] joined[]
------------------------------------------------------------------------
string[1234567890] offset[1] length[10]
! Length out of bounds! at ./pm_substr_and_remainder.pl line 31.
------------------------------------------------------------------------
string[1234567890] offset[0] length[-10]
! Negative length out of bounds! at ./pm_substr_and_remainder.pl line 24.
------------------------------------------------------------------------
string[1234567890] offset[3] length[-6]
left[123] extract[4] right[] joined[123]
------------------------------------------------------------------------
string[1234567890] offset[3] length[-7]
! Offset out of bounds for negative length! at ./pm_substr_and_remainder.pl line 27.
------------------------------------------------------------------------
string[1234567890] offset[5] length[0]
left[12345] extract[] right[67890] joined[1234567890]
------------------------------------------------------------------------

-- Ken