PerlMonks  

Shared Memory Cache or Database?

by zapoi (Novice)
on Dec 07, 2020 at 15:35 UTC ( #11124778=perlquestion )

zapoi has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm designing an algorithmic trading app and would appreciate any wisdom that you would be so kind as to bestow upon me.

I started with the following design:

The main program uses IPC::Shareable to create and tie %cache.

A sub, dataCollector, runs in a new fork. It loops every 0.1s, incrementing $i, and adds the data to a hash as $cache{$marketId}{$i}{$broker}{price}. Each time a new set of data is published, it sends a SIGUSR1 to the other processes to alert them that they have new work to do. Once a market is out of its window of interest, approximately 30 minutes later, it deletes $cache{$marketId}. Each 0.1s loop yields about 6k of data, and there may be up to 8 markets of interest simultaneously, so the total memory requirement is about 420MB.

The main program will call a sub for each strategy, which will fork and return the forked pid to main. The strategy workers will receive their interrupts and handle them by reading the shared memory and doing various computations to decide whether conditions are good to trade.

The dataCollector is the only process that would need to write to the cache, and each worker would be triggered simultaneously and be reading the same piece of memory. Another worker task will be writing the data out to file for back testing strategies, however if data is lost it's not an issue.
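A minimal sketch of the notification step described above, using only core Perl. It assumes a single worker and leaves out the shared cache entirely; the sub name spawn_worker and the pipe used for reporting back are hypothetical, added only so the example can show that the signal arrived.

```perl
use strict;
use warnings;
use IO::Handle;

# Fork one strategy worker that reports the SIGUSR1 it handles over a pipe.
# Returns the worker pid and the parent's read handle.
sub spawn_worker {
    pipe(my $from_worker, my $to_parent) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                        # child: the strategy worker
        close $from_worker;
        $to_parent->autoflush(1);
        my $pending = 0;
        $SIG{USR1} = sub { $pending = 1 };  # keep the handler tiny
        print {$to_parent} "ready\n";       # handler is installed
        until ($pending) { sleep 1 }        # a signal interrupts sleep early
        # A real worker would read the shared cache here.
        print {$to_parent} "new data seen\n";
        exit 0;
    }

    close $to_parent;                       # parent keeps only the read end
    return ($pid, $from_worker);
}

my ($pid, $reader) = spawn_worker();
my $ready = <$reader>;        # wait until the worker can accept signals
kill 'USR1', $pid;            # collector side: announce a new snapshot
my $reply = <$reader>;
waitpid($pid, 0);
print "worker $pid replied: $reply";
```

Standard signals are not queued, so several SIGUSR1s sent in quick succession can collapse into one delivery. A worker built this way should look up the latest $i in the cache on each wake-up rather than count signals.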

I hope that makes my objective clear. What I would appreciate your wisdom on is:

1. IPC::Shareable doesn't seem to handle deeply nested hash refs very well and probably therefore isn't the right choice. Is there an alternative that I should consider?

2. Most of my reading has pointed me towards using an in-memory database. In your experience, which free database would give me the fastest throughput in this scenario?

Please turn your flame throwers up to max and burn this down, and help me get to the right foundations.

Thanks

Zapoi

Replies are listed 'Best First'.
Re: Shared Memory Cache or Database?
by 1nickt (Canon) on Dec 07, 2020 at 15:54 UTC

    Hi, for shared data structures I use MCE::Shared. Maybe MCE::Shared::MiniDB would be the best tool for you?

    Hope this helps!


    The way forward always starts with a minimal test.
      Thanks 1nickt. It certainly looks like it could be what I was looking for.

      I'll do some reading and experimenting.

Re: Shared Memory Cache or Database?
by Corion (Patriarch) on Dec 07, 2020 at 16:15 UTC

    Assuming that you plan to run all of this on a single machine, another interesting approach could be to just use shared memory among all processes and to do away with using Perl memory structures and instead use a fixed memory layout.

    This makes some assumptions about the number of assets, markets and brokers, but basically imagine a (large) string indexed by fixed offsets where you write the (packed) prices to.
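A minimal sketch of that fixed layout, assuming every market has the same fixed number of broker slots and each price is stored as one native double. The dimensions and sub names are hypothetical, and a plain scalar stands in for the shared segment; the same offsets would apply when reading and writing a real shared memory segment.

```perl
use strict;
use warnings;

# Hypothetical fixed dimensions; a real deployment would size these up front.
use constant {
    N_MARKETS  => 8,
    N_BROKERS  => 16,
    SLOT_BYTES => 8,      # one native double, pack format 'd'
};

# Byte offset of the price slot for a (market, broker) index pair.
sub slot_offset {
    my ($market_idx, $broker_idx) = @_;
    die "index out of range\n"
        if $market_idx < 0 || $market_idx >= N_MARKETS
        || $broker_idx < 0 || $broker_idx >= N_BROKERS;
    return ($market_idx * N_BROKERS + $broker_idx) * SLOT_BYTES;
}

# Overwrite one slot in place with the packed price.
sub write_price {
    my ($buf_ref, $market_idx, $broker_idx, $price) = @_;
    substr($$buf_ref, slot_offset($market_idx, $broker_idx), SLOT_BYTES)
        = pack('d', $price);
    return;
}

# Unpack the price stored in one slot.
sub read_price {
    my ($buf_ref, $market_idx, $broker_idx) = @_;
    return unpack('d',
        substr($$buf_ref, slot_offset($market_idx, $broker_idx), SLOT_BYTES));
}

# A plain scalar stands in for the shared segment here.
my $buffer = "\0" x (N_MARKETS * N_BROKERS * SLOT_BYTES);
write_price(\$buffer, 2, 5, 101.25);
printf "market 2, broker 5: %.2f\n", read_price(\$buffer, 2, 5);
```

Because every slot has a fixed size and position, a write never changes the length of the buffer, and a reader needs only the two indices to find a price.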

    The downside is obviously the inflexibility between the server and clients, because you need to update them at once whenever a new market, asset or broker comes along.

    If you use shared memory, you will get no backlog as it goes away when the last process is stopped. This may be an up- or downside.

      Hi Corion,

      Indeed the processes will be running on one machine.

      I considered that approach, but I'm not sure that I could code it efficiently this way. There will be a varying number of brokers depending on the market, and some might not be publishing prices all the time. It will also make reusing the memory challenging once a market falls out of interest. I can see myself getting into all kinds of hashes of index values to try to keep track of where the current and previous data snaps are stored.

      Please let me know if I've completely misunderstood and my concerns are ill-founded.

      Thanks

        Yes, if brokers are somewhat dynamic, and you can't use the same set of brokers for all markets (etc.), calculating the index where to store a price becomes quite cumbersome/error-prone.

        Reusing the memory allocated to a market could be done on a time slice, but all of that depends on how dynamic the data really is.

Re: Shared Memory Cache or Database?
by Fletch (Bishop) on Dec 07, 2020 at 16:57 UTC

    If you're not looking for an SQL-y database but want something that maps more directly to an in-memory hash, Redis is probably worth looking at. Slightly fancier / more featureful might be Riak (caveat: I haven't looked at it in ages, so I'm less sure what its performance characteristics are these days).

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Yes - Redis is the first thing that came to my mind as well.

      You can use something like Tie::Redis to put this in more familiar perl ground.

                      "Imaginary friends are a sign of a mental disorder if they cause distress, including antisocial behavior. Religion frequently meets that description"

      I second the suggestion of Redis. If you need to federate the shared memory over multiple logically separated banks of memory, you can use memcached*. If you're on a large shared-memory NUMA machine and want to do true shared memory via lightweight threads, the scripting language Qore may present some interesting options for you. It's very Perlish.

      * updated

Re: Shared Memory Cache or Database?
by Marshall (Canon) on Dec 07, 2020 at 18:23 UTC
    I don't know what the performance characteristics of something like this would be, but SQLite does have the ability to run with a totally in-memory DB. One limitation of SQLite is that writing requires the acquisition of an exclusive lock. Multiple read operations can proceed in parallel. It sounds like you have a single writer and maybe 8 or so readers? So on the surface, these "slow writes" don't sound like a "show stopper".

    I haven't personally used a completely in memory SQLite DB, but one SQLite feature that I have experimented with is varying the memory footprint of SQLite dynamically. I was very surprised to learn that this was possible and I played with it and found out that this actually works. In my test DB, for a complicated operation like creating multiple indices on a 1M+ record DB table, I ran the memory up to say 500MB. This made a huge improvement in execution time. After which, I dropped the memory usage back down. Perl itself cannot reduce its memory footprint (to my knowledge), but SQLite as used by Perl can do so.

    I suppose that if you have an SSD and tell SQLite to use, say, a 1GB memory cache, the performance will be impressive and approach that of a complete in-memory DB (except for writes). Whether the performance is good enough for your application, I do not know.

    Before rushing off to implement some complicated shared DB structure on your own, I would consider at least implementing a prototype with SQLite. For the write operations, you will have to be cognizant of performance implications of a "write" and use explicit start and end transaction statements. The number of transactions per second will be a limiting performance factor. If you need to write to 2 tables, make sure that is a single transaction. The number of rows affected by an operation is not that important, but the number of transactions is important.
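A minimal sketch of the one-transaction-per-snapshot idea, assuming DBI and DBD::SQLite are installed. The table, column names and the sub store_snapshot are hypothetical. It uses a file-backed database because an in-memory SQLite database is private to the connection that opened it, so separate reader processes could not see it; each process would open its own connection to the file.

```perl
use strict;
use warnings;
use DBI;                       # needs DBD::SQLite installed
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $dbh = DBI->connect("dbi:SQLite:dbname=$dir/ticks.db", '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Trade durability for speed; acceptable only because lost data is tolerable.
$dbh->do('PRAGMA synchronous = OFF');

$dbh->do(q{
    CREATE TABLE price (
        market_id TEXT    NOT NULL,
        tick      INTEGER NOT NULL,
        broker    TEXT    NOT NULL,
        price     REAL    NOT NULL,
        PRIMARY KEY (market_id, tick, broker)
    )
});

# Store one whole snapshot (all brokers for one tick) as one transaction.
sub store_snapshot {
    my ($dbh, $market_id, $tick, $prices) = @_;
    my $sth = $dbh->prepare_cached(
        'INSERT INTO price (market_id, tick, broker, price) VALUES (?, ?, ?, ?)');
    $dbh->begin_work;
    eval {
        $sth->execute($market_id, $tick, $_, $prices->{$_}) for sort keys %$prices;
        $dbh->commit;
        1;
    } or do {
        my $err = $@ || 'unknown error';
        $dbh->rollback;        # leave no half-written snapshot behind
        die "snapshot not stored: $err";
    };
    return scalar keys %$prices;
}

my $rows = store_snapshot($dbh, 'mkt-1', 1, { brokerA => 2.5, brokerB => 2.75 });
my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM price');
print "stored $rows rows, table holds $count\n";
$dbh->disconnect;
```

With one snapshot every 0.1s, this comes to roughly ten transactions per second per market, however many broker rows each snapshot carries.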

    In summary, I would recommend that you consider making a prototype to play with. Even if you plan to "throw the prototype away". There are DB mechanisms that allow you to be notified when a table changes. This may be completely adequate for your IPC needs?

    Update:
    I did a bit more investigation into SQLite performance and found this informative link with C benchmarks on stackoverflow: improve-insert-per-second-performance-of-sqlite.

    This thread talks mostly about increasing row insert speeds, but much of the discussion is also applicable to increasing transactions per second. A transaction is a very expensive thing because the DB is trying to maintain what is called ACID (Atomicity, Consistency, Isolation, Durability). A transaction requires multiple writes to the disk and also journaling so that an incomplete transaction can be rolled back. While writing this post, my cat jumped on top of my tower and held the power button down, causing a panic shutdown of Windows. Something like that would cause an incomplete transaction! SQLite can recover from something like that. SQLite will run faster if you "tone down" some of its ability to recover from catastrophic errors. The OP wrote: however if data is lost it's not an issue. The article talks about a number of those parameters. Also, as I mentioned before, an SSD will have a noticeable impact on writes because no disk rotational delays are involved.

    I presume that you are not trying to run this on some wimpy cheap laptop and that buying an SSD dedicated to this project is no big deal (let the O/S use its own drive). You will have to do some benchmarking and experimentation on your system to see exactly what is or is not possible. The indexing of the database also matters quite a bit. There are lots of variables. However, it appears to me that a SQLite prototype is completely feasible (app is in the range of a few dozen transactions per second). You would learn a lot by that exercise. Also consider that having an actual DB file vs totally in memory may have some advantages for debugging and monitoring/tweaking the application.

Re: Shared Memory Cache or Database?
by bliako (Monsignor) on Nov 10, 2021 at 12:50 UTC

Node Type: perlquestion [id://11124778]
Approved by marto
Front-paged by Discipulus