Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Help to override Spreadsheet/ParseXLSX.pm module

by boleary (Scribe)
on Apr 02, 2021 at 20:49 UTC ( #11130732=perlquestion: print w/replies, xml ) Need Help??

boleary has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for a clever way to override the SpreadSheet/ParseXLSX.pm module so I can add some tweaks to it without having to touch the installed library (on many of my different hosts) I'm pretty weak on object oriented tricks, so I'm hoping someone can tell me a clever way to accomplish this. I presently have the modified Module working with the changes I discuss below... I just really didn't want to touch the actual library module like I did because its locally installed and I'm not tracking changes on them.

The current Spreadsheet::ParseXLSX module is great,
but it parses EVERY sheet in a workbook when ReadData is called and that can be quite time consuming for large spreadsheets with many tabs!

I have 2 workarounds I can add to that module by passing in some extra options when I call the Spreadsheet::ReadData Method
The ReadData function cleverly passes any unrecognized options to the ParseXLSX new function, so I can easily extend the option set like shown...

sub new { my $class = shift; my (%args) = @_; my $self = bless {}, $class; $self->{Password} = $args{Password} if defined $args{Password}; # #my new options # $self->{sheet_filter} = $args{sheet_filter} if defined $args{sheet +_filter}; $self->{just_find_sheet_names} = $args{just_find_sheet_names} if d +efined $args{just_find_sheet_names}; return $self; }

then down in _parse_workbook, I can change:

my @sheets = map { my $idx = $_->att('rels:id'); if ($files->{sheets}{$idx}) { my $sheet = Spreadsheet::ParseExcel::Worksheet->new( Name => $_->att('name'), _Book => $workbook, _SheetNo => $idx, ); $sheet->{SheetHidden} = 1 if defined $_->att('state') and $_ +->att('state') eq 'hidden'; $self->_parse_sheet($sheet, $files->{sheets}{$idx}); ($sheet) } else { () } } $files->{workbook}->find_nodes('//s:sheets/s:sheet');

to this where It will only parse the sheets whose names match the sheet names passed in in sheet_filter option:

#my hack to only process worksheets matching the sheet filter my %sheet_filter=(); if ($self->{sheet_filter}) { map {$sheet_filter{$_}} split ",", $self->{sheet_filter}; } my @sheets = map { my $idx = $_->att('rels:id'); if ($files->{sheets}{$idx}) { my $sheet = Spreadsheet::ParseExcel::Worksheet->new( Name => $_->att('name'), _Book => $workbook, _SheetNo => $idx, ); $sheet->{SheetHidden} = 1 if defined $_->att('state') and $_ +->att('state') eq 'hidden'; $self->_parse_sheet($sheet, $files->{sheets}{$idx}); ($sheet) } else { () } } grep {%sheet_filter ? defined $sheet_filter{$_->att('name')} : 1 +} $files->{workbook}->find_nodes('//s:sheets/s:sheet');

and also add a function to just return sheetnames without sufferring the performance time to fully process each sheet:

#my hack to only return the list of sheet names if($self->{just_find_sheet_names}) { my @sheets = map { $_->att('name'); } $files->{workbook}->find_nodes('//s:sheets/s:sheet'); $workbook->{Worksheet} = \@sheets; $workbook->{SheetCount} = scalar(@sheets); return $workbook; }

Any helpful ideas would be appreciated

Replies are listed 'Best First'.
Re: Help to override Spreadsheet/ParseXLSX.pm module
by LanX (Cardinal) on Apr 02, 2021 at 21:06 UTC
    In general there are two approaches which come to mind.

    • subclassing
    • monkey patching

    The first is achieved by creating a new "patch" module inheriting (or importing) from the original class and overriding the differing methods.

    The second is done by changing the differing subs right in place with local *Original::Class::method = sub {...} just before calling the module.

    Note that because of the local all changes only happen temporarily till the end of scope of the calling sub. Like this other applications won't be affected.

    HTH! :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      As the OP said they're using Spreadsheet::Read, at this moment, only monkeypatching is a possible way out.

      I will see if adding your own subclassed class to overrulle supported classed would be possible as an option. Currently it is not, as my wrapper checks on the class name, which would have to be changed to check if the parser used is a subclass of the supported parser. And of course it needs a new method to tell Read overrulle the existing parser.

      boleary please add your request as feature wish to ParseXLSX issues and the wish to subclass parsers to Read issues.


      Enjoy, Have FUN! H.Merijn
        Thanks, personally I generally prefer monkeypatching over subclassing.

        I mentioned it tho, because people kept criticizing me for that.

        In my experience is subclassing only a good way, if the original author supported it from the beginning.

        Monkeypatching is closer to the metal, and it's clear why it fails if it fails.

        Exception here: I doubt monkeypatching is threadsafe, but I never tried.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        As I noted in my reply to Rolf... I think I subclassed it... I'm very weak with OO terminology
        It may be more of a monkeypatch since it really just redefines the 2 named subs

        I will add the feature request, but it looks like the ParseXLXS module hasn't been touched since 2016 so do you think a feature request would even help?
        I did add a pullrequest #98 on gitHub with my changes

      Thanks Rolf!
      So I think I subclassed it with a patch module :)

      For my reference:
      I copied the ParseXLSX.pm module to CE_ParseXLSX_patch.pm then I removed all the subs which I did not modify, so I was left with new and _parseworkbook

      I located CE_ParseXLSX_patch.pm to a new directory in my repo: COMMON/Spreadsheet/ COMMON is in @INC
      Then I added a use Spreadsheet::CE_ParseXLSX_patch in the original module that had use Spreadsheet::Read
      When I run the code, I get the warning that 2 subs are overridden,
      And my modifications work!

      Subroutine new redefined at ../Spreadsheet/CE_ParseXLSX_patch.pm line +19. at ../Spreadsheet/CE_ParseXLSX_patch.pm line 19. require Spreadsheet/CE_ParseXLSX_patch.pm called at ../portable_sp +readsheet_subs.pm line 4 main::BEGIN() called at ../Spreadsheet/CE_ParseXLSX_patch.pm line +19 eval {...} called at ../Spreadsheet/CE_ParseXLSX_patch.pm line 19 require portable_spreadsheet_subs.pm called at ../write_csv_ss.pm +line 3 require write_csv_ss.pm called at ../renameRules.pm line 5 require renameRules.pm called at ../simpleBga.pm line 3 require simpleBga.pm called at ../pin_report_modules.pm line 2 require pin_report_modules.pm called at ../read_pin_report.pm line + 3 require read_pin_report.pm called at d:/GitHub/CE_TOOLS/COMMON/TES +TS/test_acust_daisy_chain.pl line 35 Subroutine _parse_workbook redefined at ../Spreadsheet/CE_ParseXLSX_pa +tch.pm line 37. at ../Spreadsheet/CE_ParseXLSX_patch.pm line 37. require Spreadsheet/CE_ParseXLSX_patch.pm called at ../portable_sp +readsheet_subs.pm line 4 main::BEGIN() called at ../Spreadsheet/CE_ParseXLSX_patch.pm line +37 eval {...} called at ../Spreadsheet/CE_ParseXLSX_patch.pm line 37 require portable_spreadsheet_subs.pm called at ../write_csv_ss.pm +line 3 require write_csv_ss.pm called at ../renameRules.pm line 5 require renameRules.pm called at ../simpleBga.pm line 3 require simpleBga.pm called at ../pin_report_modules.pm line 2 require pin_report_modules.pm called at ../read_pin_report.pm line + 3 require read_pin_report.pm called at d:/GitHub/CE_TOOLS/COMMON/TES +TS/test_acust_daisy_chain.pl line 35
        It's not clear to me how you did the "sub-classing", by inheriting methods or by importing or a do or ...?

        Anyway, as I already said, I prefer monkey-patching over subclassing and you won't need to create another module/namespace for it.

        Sub-classing requires that the author played by "the" book, and the OOP book is indeed a TIMTOWTDI library in Perl.

        Monkey-patching OTOH is well defined and has no dependency to any OOP model.

        So in our own interest consider

        sub call_patched { local *Spreadsheet::ParseXLSX::new = \&patched_new; local *Spreadsheet::ParseXLSX::_parse_workbook = \&patched_parse_w +orkbook; # do whatever with Spreadsheet::ParseXLSX }

        And all patches will disappear outside this function.

        For completeness: Both ways may fail if there are any lexical variables closed over the patched subs, but even this can be solved with PadWalker, iff this really happens.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        I guess this is the hybrid "monkeyclassing"! Was package ParseXLSX; at the 1st line of both original and patched files? If yes then you define a class (package) and then you redefine that class in the 2nd CE_ file. I have no idea what's meant to happen in this case. Obviously it did work for you and in your circumstances but I do not know how robust this is. For example, reversing the order of use statements would have different effect.

        Here is a tiny example of subclassing in Perl. You subclass class P1 by creating a new class in a new file P2.pm . Consider this:

        # file P1.pm package P1; sub new { my $class = shift; return bless {} => $class } sub A1 { print "i am A1 in P1\n"; } sub A2 { print "i am A2 in P1\n"; } 1;
        # file P2.pm package P2; use parent 'P1'; sub new { my $class = shift; return bless {} => $class } sub A1 { print "i am A1 in >>>P2<<< (i have overwritten previous A1)\n +"; } # A2 and everything else are inherited from class P1 1;
        # file test.pl # run: perl -I. test.pl use P2; # and not P1 my $P2obj = P2->new(); $P2obj->A1(); # this is overwritten $P2obj->A2(); # this is inherited verbatim from parent P1

        The above demonstrates how P2 inherits all methods from P1 and then overwrites A1() but all other still hold as they were defined in P1.

        How does that translate to your case? P1 is the original class/package. And P2.pm is the new file and package you created essentially overwriting 2 methods. The caveat here is that ALL your scripts which mention use P1; now should be modified to use P2;. Well OK, unless you don't have this power.

        bw, bliako

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11130732]
Approved by toolic
Front-paged by toolic
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (4)
As of 2021-04-18 18:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?