http://www.perlmonks.org?node_id=11131943

derion has asked for the wisdom of the Perl Monks concerning the following question:

Hi there,

I am looking for a good (not necessarily "the") way to keep my website/webshop fast as the complexity of its data structure grows.
I have a self-made Perl-based webshop with about 80,000 products and a couple of tables, the main ones being a category table, a products table and a price table.
There are multiple prices for one product depending on the market, buying amount and customer id.
There are some more tables that I already have, or want to add, to give customers more information about the products.
E.g. properties, relations (groups of color variants, size variants, brand variants).
All the data has a structure that is perhaps not perfect, but I am happy enough with it.
Nevertheless, building a detail page on demand issues a lot of recurring queries that, in my opinion, cost too much time.
There are a couple of queries and a significant amount of Perl code to run before a visitor gets the information about a product. I have read a couple of ideas and tried this and that, and I am not sure which way makes more sense than another.

A lot of statements say it makes sense to keep the data in the database and build the pages on demand.
As far as I can see, it makes limited sense to run the same query over and over again.

I tried storing the information in files, e.g. as JSON, but neither updating it nor the performance really made me happy. I tried both keeping everything in one file and creating one file per information group (e.g. price of the product, variants, properties); neither way really satisfied me.

Storing the data in the database as JSON seems to be an option, but as my DB is still Latin-1 I would prefer not to mix encodings, since as far as I can see JSON should be stored as UTF-8.

I stumbled over freezing/thawing hashrefs (Storable) and storing them in a DB. Nothing I read about that approach was really enthusiastic, but it still seems kind of elegant in my eyes.

My idea was to make one table with TEXT or BLOB columns, where one column holds the price of an item, one the properties, and so on; there could be several more.
On changes I could update each field individually, and I could get all the information for one item with only one query. The structured information would still be available in the tables from which I build the hashrefs and maintain the data, so if I needed something special I could still access it there. It seems cleaner and faster to me than using files to cache the information. Stocks are updated every ten minutes, prices are updated on a daily basis, and all the other information is updated anywhere from daily to yearly, depending on the source I get my data from.

I am uncertain because what I have read seems to say I am going the wrong way, while what I have experienced says the ways I have tried so far are not very good.

I would be very happy to get some opinions that could direct me to a good way to go on.

Thank you very much

derion


Re: performance with mysql / file-caching / hash reference on demand
by tobyink (Canon) on May 02, 2021 at 11:55 UTC

    In addition to what afoken said, try adding indices to your table.

    CREATE INDEX CT_Media_Relations_ix_SupplierID  ON CT_Media_Relations (SupplierID);
    CREATE INDEX CT_Media_Relations_ix_SupplierPID ON CT_Media_Relations (SupplierPID);
    -- etc

    Indices will somewhat slow down write queries (INSERT, etc.) but can hugely improve read queries (SELECT, etc.).

    Any primary keys will already be indexed, but any other columns containing numbers, dates, and short strings which you frequently filter, sort, or group by should probably be indexed.

      This would be a very good point, but all columns that appear in any query are either primary keys or indexed.

        Too many indexes, or the wrong kinds of indexes, can cause performance problems too.

Re: performance with mysql / file-caching / hash reference on demand
by choroba (Cardinal) on May 02, 2021 at 08:26 UTC
    At work, we do hundreds of database calls every second. Some of the resources have performance problems, but they usually involve scanning tables that have millions of rows. Your description gives me a vague picture, but I can't guess what the problem might be. Are you using CGI with a fresh Perl start on every request, or do you preload the libraries (mod_perl, plack, mojo)? Are you using an ORM, or do you write your SQL queries by hand?

    BTW, sometimes, the cheapest solution is to upgrade the hardware.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Thank you very much for your reply.

      The webshop is built on the basis of a Links management system, which would work with mod_perl.
      Some of my code for customer-related operations might cause hiccups under mod_perl, so I am a little afraid of using it. For displaying detail pages I have been using SpeedyCGI for a couple of years, which did improve performance significantly.
      As far as I can see, one problem is what your question implies:
      I lose performance because connections to MySQL are built more often than is probably necessary.
      This seems to be more my problem with Perl than a general problem, although PHP, which I try to avoid, seems to be quicker there out of the box.

      My web server is a couple of years old but still performs well and uses SSDs:

      Architecture:          x86_64
      CPU op-mode(s):        32-bit, 64-bit
      Byte Order:            Little Endian
      CPU(s):                24
      On-line CPU(s) list:   0-23
      Vendor ID:             GenuineIntel
      CPU family:            6
      Model:                 45
      Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
      Stepping:              7
      CPU MHz:               2499.711
      BogoMIPS:              5000.02
      Virtualization:        VT-x

                   total       used       free     shared    buffers     cached
      Mem:         96455      65478      30976          0       4752      53994
      -/+ buffers/cache:       6730      89724
      Swap:          949          0        949

      At the moment a detail page needs about 1.1 seconds to display, but I want to add significantly more information and I would like to get faster - at least not slower.

      One code example I am using to get media (images, datasheets) from the DB:

      sub get_udx_media_relations {
          my $SupplierID  = shift;
          my $SupplierPID = shift;
          my $return;
          my $query = qq|SELECT ProviderID, SourceID, Folder, Filename, Extension, Purpose, Rank, Priority, Width, Height
                         FROM CT_Media_Relations
                         WHERE SupplierID = '$SupplierID' AND SupplierPID = '$SupplierPID' AND isDeleted != 'Yes'
                         GROUP BY Filename, Extension
                         ORDER BY Priority DESC, Rank ASC|;
          my $sth = $DB->table('CT_Media_Relations')->prepare($query) || die $GX::SQL::error;
          $sth->execute();
          my ($prev_priority);
          while ($_ = $sth->fetchrow_hashref) {
              if ($prev_priority->{$_->{Purpose}} > $_->{Priority}) {
                  next unless $_->{Purpose} eq 'PDB';
              }
              # get rid of mds when manufacturer is present (4 vs 5)
              if ($_->{Purpose} eq 'AAB' or $_->{Purpose} eq 'EBD' or $_->{Purpose} eq 'PBS' or $_->{Purpose} eq 'PRI') {
                  if ($prev_priority->{IMG} > $_->{Priority}) {
                      next;
                  }
                  else {
                      $prev_priority->{IMG} = $_->{Priority};
                  }
              }
              $prev_priority->{$_->{Purpose}} = $_->{Priority};
              if ($_->{Purpose} eq 'PRI' && !$return->{Primary}->{path}) {
                  $return->{Primary} = $_;
                  $return->{Primary}->{path}     = '/dx/' . $_->{ProviderID} . "/" . $_->{SourceID};
                  $return->{Primary}->{filename} = $_->{Filename} . '.' . $_->{Extension};
                  $return->{Primary}->{filename} = $_->{Folder} . '/' . $return->{Primary}->{filename} if $_->{Folder};
              }
              my $media = $_;
              $media->{path}     = '/dx/' . $_->{ProviderID} . "/" . $_->{SourceID};
              $media->{filename} = $_->{Filename} . '.' . $_->{Extension};
              $media->{filename} = $_->{Folder} . '/' . $media->{filename} if $_->{Folder};
              push(@{$return->{$_->{Purpose}}}, $media);
          }
          push(@{$return->{IMG}}, @{$return->{EBD}}) if defined(@{$return->{EBD}});
          if ( not defined(@{$return->{IMG}}) and defined(@{$return->{PBS}}) ) {
              push(@{$return->{IMG}}, @{$return->{PBS}});
          }
          push(@{$return->{IMG}}, @{$return->{AAB}}) if defined(@{$return->{AAB}});
          if (defined @{$return->{IMG}} && (!$return->{Primary} || $return->{Primary}->{Priority} < $return->{IMG}[0]->{Priority})) {
              $return->{Primary} = shift @{$return->{IMG}};
          }
          push(@{$return->{IMG}}, @{$return->{PAB}}) if defined(@{$return->{PAB}});
          push(@{$return->{IMG}}, @{$return->{DET}}) if defined(@{$return->{DET}});
          return $return;
      }

      My thought was that it could make sense to put these results in some static form (database, file) on a daily basis, to avoid running the same routine over and over again.

      Cheers derion
        sub get_udx_media_relations {
            my $SupplierID  = shift;
            my $SupplierPID = shift;
            my $return;
            my $query = qq|SELECT ProviderID, SourceID, Folder, Filename, Extension, Purpose, Rank, Priority, Width, Height
                           FROM CT_Media_Relations
                           WHERE SupplierID = '$SupplierID' AND SupplierPID = '$SupplierPID' AND isDeleted != 'Yes'
                           GROUP BY Filename, Extension
                           ORDER BY Priority DESC, Rank ASC|;

        No placeholders here. That prevents DBI, DBD and the database itself from caching the query, so it has to be parsed again from zero each and every time. Even if you only change the IDs. Using placeholders and prepare_cached() instead of prepare() could improve speed, at least when used in a persistent environment (i.e. mod_perl, FastCGI, basically everything but plain old CGI). And, even worse, you might become a victim of SQL injection. See https://bobby-tables.com/.
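
        A minimal sketch of the placeholder version, assuming a plain DBI handle in $DBH (the Links $DB->table(...) wrapper used above may pass things through differently):

        my $sth = $DBH->prepare_cached(q|
            SELECT ProviderID, SourceID, Folder, Filename, Extension,
                   Purpose, Rank, Priority, Width, Height
            FROM   CT_Media_Relations
            WHERE  SupplierID = ? AND SupplierPID = ? AND isDeleted != 'Yes'
            GROUP BY Filename, Extension
            ORDER BY Priority DESC, Rank ASC
        |);

        # The IDs arrive as bind values, so the statement text never changes
        # and the prepared handle can be reused on later calls.
        $sth->execute($SupplierID, $SupplierPID);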

        while ($_ = $sth->fetchrow_hashref) {
            if ($prev_priority->{$_->{Purpose}} > $_->{Priority}) {
                next unless $_->{Purpose} eq 'PDB';
            }
            # get rid of mds when manufacturer is present (4 vs 5)
            if ($_->{Purpose} eq 'AAB' or $_->{Purpose} eq 'EBD' or $_->{Purpose} eq 'PBS' or $_->{Purpose} eq 'PRI') {
                if ($prev_priority->{IMG} > $_->{Priority}) {
                    next;
                }
                else {
                    $prev_priority->{IMG} = $_->{Priority};
                }
            }
            $prev_priority->{$_->{Purpose}} = $_->{Priority};
            if ($_->{Purpose} eq 'PRI' && !$return->{Primary}->{path}) {
                $return->{Primary} = $_;
                $return->{Primary}->{path}     = '/dx/' . $_->{ProviderID} . "/" . $_->{SourceID};
                $return->{Primary}->{filename} = $_->{Filename} . '.' . $_->{Extension};
                $return->{Primary}->{filename} = $_->{Folder} . '/' . $return->{Primary}->{filename} if $_->{Folder};
            }
            my $media = $_;
            $media->{path}     = '/dx/' . $_->{ProviderID} . "/" . $_->{SourceID};
            $media->{filename} = $_->{Filename} . '.' . $_->{Extension};
            $media->{filename} = $_->{Folder} . '/' . $media->{filename} if $_->{Folder};
            push(@{$return->{$_->{Purpose}}}, $media);
        }

        I did not even try to fully understand what happens here. But you are fetching data from the database just to discard it, based on values of the "Purpose" and "Priority" columns. That won't speed up things. Try not to fetch data that you don't need. Make the database filter out everything you don't need, i.e. put your criteria into the WHERE condition. That way, the database won't have to find data you don't need, fetch it, serialize it, and push it through the connection to your application where the DBD has to deserialize it, and all of that just to discard that unwanted data.

        If the remaining code looks similar, that's where you can really improve performance. Use placeholders everywhere to allow at least the database to cache queries, and filter in the database, not in perl.
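
        For example, if (hypothetically) only a fixed set of Purpose codes is ever wanted, that condition could move into the query itself, again with placeholders:

        # Sketch only: the Purpose codes listed here are made up for
        # illustration; the real set to keep has to come from your data.
        my @purposes = qw(PRI PDB AAB EBD PBS PAB DET);
        my $in = join ', ', ('?') x @purposes;

        my $sth = $DBH->prepare_cached(qq|
            SELECT ProviderID, SourceID, Folder, Filename, Extension,
                   Purpose, Rank, Priority, Width, Height
            FROM   CT_Media_Relations
            WHERE  SupplierID = ? AND SupplierPID = ? AND isDeleted != 'Yes'
              AND  Purpose IN ($in)
            GROUP BY Filename, Extension
            ORDER BY Priority DESC, Rank ASC
        |);
        $sth->execute($SupplierID, $SupplierPID, @purposes);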

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: performance with mysql / file-caching / hash reference on demand
by karlgoethebier (Abbot) on May 03, 2021 at 06:43 UTC

    Now that you have implemented all the cool ideas already given, you might consider Stored Procedures. I'm pretty sure that this will give another performance boost. It might also be worth taking a look at all the funny params in your my.cnf. Regards, Karl
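
    Just as a sketch of the idea (procedure name, table and columns are made up for illustration): you define the procedure once on the MySQL side and call it from DBI.

    # Sketch only: get_product_prices and CT_Prices are hypothetical names.
    # Created once, e.g. during deployment:
    $DBH->do(q|
        CREATE PROCEDURE get_product_prices (IN p_SupplierID  VARCHAR(32),
                                             IN p_SupplierPID VARCHAR(64))
        BEGIN
            SELECT Market, MinQty, CustomerID, Price
            FROM   CT_Prices
            WHERE  SupplierID = p_SupplierID AND SupplierPID = p_SupplierPID;
        END
    |);

    # Called on each page build (procedures that return result sets need a
    # reasonably recent DBD::mysql):
    my $sth = $DBH->prepare('CALL get_product_prices(?, ?)');
    $sth->execute($SupplierID, $SupplierPID);
    while (my $row = $sth->fetchrow_hashref) {
        # use $row->{Price} etc.
    }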

    «The Crux of the Biscuit is the Apostrophe»

      This is new to me, and after some reading it looks like a simple and good way to make things faster.
      Thanks for that!

        You're welcome. You may consider partitioning as well, but it is a pain in the ass to find the right strategy.

        «The Crux of the Biscuit is the Apostrophe»

Re: performance with mysql / file-caching / hash reference on demand
by derion (Sexton) on May 02, 2021 at 18:50 UTC

    Thanks to all for the comments on my post. I got some new ideas and tasks.

    One part of my initial question is still not clear for me.

    If caching information to get better performance is a good way, why is doing this with Storable and a DB a bad, or at least not so good, way?

    I could do my routines on a daily basis, freeze the hashref, store it in a table and retrieve the hashrefs of one item by ID with only one request.

    ---------------------------
    | my_cache_table           |
    ---------------------------
    | ID (primary)             |
    ---------------------------
    | hashref with prices      |
    ---------------------------
    | mod_date of prices       |
    ---------------------------
    | hashref with images      |
    ---------------------------
    | mod_date of images       |
    ---------------------------
    | hashref with other stuff |
    ---------------------------
    | mod_date of other stuff  |
    ---------------------------
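
    Roughly what I have in mind, just as a sketch (table and column names are only examples, and the frozen-hashref columns would be BLOBs):

    use Storable qw(nfreeze thaw);

    # Daily run: freeze the assembled hashref and store it per item.
    my $frozen = nfreeze($price_hashref);
    $DBH->do(
        'REPLACE INTO my_cache_table (ID, prices, prices_mod) VALUES (?, ?, NOW())',
        undef, $item_id, $frozen,
    );

    # Page build: one query per item, then thaw.
    my ($blob) = $DBH->selectrow_array(
        'SELECT prices FROM my_cache_table WHERE ID = ?',
        undef, $item_id,
    );
    my $prices = defined $blob ? thaw($blob) : undef;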

    I don't want to appear stubborn, but I like to comprehend things. Caching one way or another seems to be common practice; the above-mentioned way, not at all.

    Thanks for some more enlightenment

      There doesn't seem to be anything wrong with that design to me.


      The way forward always starts with a minimal test.
Re: performance with mysql / file-caching / hash reference on demand
by Anonymous Monk on May 02, 2021 at 14:30 UTC
    "Product category pages," which are frequently called-for but rarely change, can also be managed using things like the classic Cache::Memcached. Once you've assembled the necessary (JSON?) information once, you put it the results the cache. Then, before assembling and storing it again, you query the cache first to see if it is already there. You can, if you like, even store complete HTML fragments, ready for immediate display. The "memcached" daemon automatically handles recycling of infrequently-referenced data. Just about every web-site that I am aware of uses this facility routinely for this purpose, and it can have a dramatic positive impact on performance.
      That sounds interesting. It seems a similar approach, but storing in memory. My database being about 4 GB should lead to a manageable amount of data for this method. Apart from the difference that the information is stored in an even faster way than as JSON or a hashref in a DB, the approach seems kind of similar from my point of view. I still do not get the real disadvantage of storing a cached element in a table. Nevertheless Cache::Memcached looks very nice, thank you for that.
      Just about every web-site that I am aware of

      meaningless and useless, as usual.