PerlMonks  

Scalable application with perl

by jasz (Initiate)
on Apr 26, 2016 at 12:49 UTC (#1161540)
jasz has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a new web application that will have around 6k to 12k users connected for around 1.5 to 2 hours. It is a kind of online examination. Can anyone tell me:

What Perl code changes are needed for that level of scalability?

What server-side changes are needed?

What database-side changes are needed (the basic system has a non-partitioned table, a single user, etc.)?

What alternative backend languages would suit the requirement?

Replies are listed 'Best First'.
Re: Scalable application with perl
by BrowserUk (Pope) on Apr 26, 2016 at 16:39 UTC
    6k to 12k users connected for around 1.5 to 2 hrs. It is kind of an online examination.

    You're asking the wrong questions; or at least, providing the wrong information; and of the wrong people.

    It's simple to demonstrate that a single commodity box with an average amount of memory (say 4GB to 8GB) with suitable server software can sustain 12000 concurrent connections with ease. If they are doing nothing or very little.

    Whether those 12000 users would be serviced in a timely manner if they all attempted to connect at the exact same moment is a different matter; but that's normally not a concern as it is rare for 12000 people to coordinate their actions in that way.

    Of course, for an exam, you might stipulate that they must start at exactly the same time. Even then, you would normally tell them that they must be logged on at least 5 minutes before the start time, and give them a 15 or 20 minute window before that during which to get logged in etc.

    Assuming that you do require them all to see that first page of the questions at "exactly the same time", then the concern is whether your server can deliver that first page to 12000 users within some specified period of time. Assuming sufficient bandwidth and ignoring network latency, that comes down to how quickly your server and software can push that one page out 12000 times.

    Given the first page will be the same (or substantially the same) for all of them, that means a 'static page' delivery. Even if the first page is customised -- with the user name/ID etc.; as you have this information for 5 minutes before the off; those customisations can be done before the off, thus avoiding active content generation at that peak period.
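As a rough illustration of doing those customisations "before the off", here is a minimal sketch that renders one static first page per logged-in user ahead of time, so the webserver only has to hand out files at the start. The template, the `{{name}}`/`{{id}}` placeholders, and the sub names are all hypothetical, invented for this sketch; it uses only core modules.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical first-page template; {{name}} and {{id}} are placeholders
# made up for this sketch, not part of any real templating engine.
my $TEMPLATE = <<'HTML';
<html><body>
<h1>Exam, page 1</h1>
<p>Candidate: {{name}} (ID {{id}})</p>
</body></html>
HTML

# Escape the characters that are special inside HTML text.
sub escape_html {
    my ($text) = @_;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$map{$1}/g;
    return $text;
}

# Fill the template for one user and return the finished page.
sub render_first_page {
    my ($user) = @_;
    my $page = $TEMPLATE;
    $page =~ s!\{\{(\w+)\}\}!escape_html($user->{$1} // '')!ge;
    return $page;
}

# Write one static file per user into $dir; returns the number written.
sub write_first_pages {
    my ($dir, @users) = @_;
    make_path($dir);
    for my $user (@users) {
        # Assumption: ids are simple word characters, safe as file names.
        die "unsafe id\n" unless $user->{id} =~ /\A\w+\z/;
        my $file = File::Spec->catfile($dir, "$user->{id}.html");
        open(my $fh, '>', $file) or die "Cannot write $file: $!";
        print {$fh} render_first_page($user);
        close($fh) or die "Cannot close $file: $!";
    }
    return scalar @users;
}
```

Called as something like `write_first_pages('/var/www/exam/first', @logged_in_users)` during the login window (the path is only an example), this leaves nothing to generate at the peak moment.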

    After that, each user will move through the pages of the exam at different speeds, which will have a natural tendency to spread the load out.

    So the headline number you need to look at is how much content (kbytes) needs to be delivered to each user over that 1.5 or 2 hour period; and how much resource -- CPU; DB; etc. -- does your software need to generate that content.

    Once you have those figures -- easily measured from your existing or test system setup -- then you have what you need to perform your capacity planning for the numbers involved.
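To make that concrete, here is a back-of-envelope sketch of the arithmetic. Every input below is a placeholder, not a measurement: the per-user kilobytes, request count, and CPU milliseconds per request have to come from your own test system, and the sub name and argument names are invented for this sketch.

```perl
use strict;
use warnings;

# Turn measured per-user figures into rough capacity numbers.
sub capacity_estimate {
    my (%arg) = @_;
    my %est;
    # Total content delivered over the whole exam, in kB.
    $est{total_kb} = $arg{users} * $arg{kb_per_user};
    # Average delivery rate over the exam, in kB per second.
    $est{avg_kb_s} = $est{total_kb} / $arg{duration_s};
    # Rate needed to get the first page to everyone inside the start window.
    $est{burst_kb_s} = $arg{users} * $arg{first_page_kb} / $arg{start_window_s};
    # CPU-seconds needed per wall-clock second, i.e. cores kept busy on average.
    $est{busy_cores} = $arg{users} * $arg{requests_per_user}
                     * $arg{cpu_ms_per_request} / 1000 / $arg{duration_s};
    return \%est;
}

# Placeholder inputs only; substitute your own measurements.
my $est = capacity_estimate(
    users              => 12_000,
    kb_per_user        => 500,
    duration_s         => 90 * 60,
    first_page_kb      => 20,
    start_window_s     => 10,
    requests_per_user  => 100,
    cpu_ms_per_request => 5,
);
printf "%-12s %.1f\n", $_, $est->{$_} for sort keys %$est;
```

The averages hide peaks, so treat the results as a lower bound and compare the burst figure against what the box and the network link can actually sustain.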

    Bottom line: asking random strangers on the net to guess how to do your job; when only you have the information required to do it; is likely to lead you in completely the wrong direction.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Scalable application with perl
by Your Mother (Chancellor) on Apr 26, 2016 at 13:25 UTC

    Users per year? Per day? Concurrently? 12k concurrent users is quite a lot for a dynamic system to handle and will require serious backend power regardless of "scalability" changes. Some ideas–

    • Perl: Persistent code, as small and as few libraries as possible, run with uwsgi. Write tests and benchmarks from the beginning. Cache templates. Off-load everything possible to webserver. Put as much processing into the front-end (client-side JS) as possible.
    • Backend: nginx, huge amount of RAM, turn off everything that isn't used for the web app.
    • DB: Depends on a lot. Can users update data? Can you cache data? How complicated is the data? There is a reason DB Admins are sought after and command good salaries despite the fact that every dev can do DB Admin work; expert v dilettante.
    • Backend language alternatives: all of them.

    Footnote: This is under-specified and even with what you do specify it is not a beginner task. "Examinations" implies security and getting security right is harder than writing a high traffic app. A beginner will flail with these issues and have little if any success.

Re: Scalable application with perl
by GotToBTru (Prior) on Apr 26, 2016 at 13:08 UTC

    You'll be more likely to get help if you ask specific questions. For instance, we have no idea what to change since we don't know what the code says now.

    See "How do I post a question effectively?" for help. Also "I know what I mean. Why don't you?".

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: Scalable application with perl
by QuillMeantTen (Friar) on Apr 26, 2016 at 13:24 UTC

    Question 1: you probably use Perl on the server side, so questions 1 and 2 are kinda redundant. I think you will want to use a framework if you are not using one, because they do make building applications easier.
    You will probably also want to look at the kind of code the server side will be running. If it's only some CRUD then it should not take many resources, but if each client represents a heavy calculation load then you might want to tune your code for performance when it comes to those calculations. If profiling and optimising are not enough then you might think about rewriting those parts in C/C++ (the choice depends on the size of the program needed for said calculations, but choosing between C and C++ is an entirely different can of worms that I won't open here), ideally as a bunch of scriptable primitives.

    On the database side, it's something else entirely. What can be told depends greatly on what you know. What do you know about database management? ACID properties? What solution are you currently using?

    You should start by finding out how well your current solution respects them. Then again, do some profiling and build indexes where they are needed; you probably won't have to rewrite your queries, because most DBMSs nowadays have good optimizers. Basically, you want the smallest possible transactions, so you can do a bunch of them as fast as possible and easily recover from a database crash.

    The previous paragraph assumes that your database is well designed, of course; if you have doubts on that topic then you might want to normalize it.

Re: Scalable application with perl
by jasz (Initiate) on Apr 26, 2016 at 14:29 UTC
    Here are the precise details.

    To be precise, around 12k users connect to the website concurrently for 1.5 hrs, once in a day, to take the test. Consider it an event that happens about 20 times a year.

    The application framework is built with Bootstrap; nothing much aesthetically.

    Each user connects, and a session is generated and used for the duration mentioned above.

    The data for the (multiple choice) questions is retrieved from the DB, and each choice the user submits is stored in the users table (around 12k rows and 120 columns, which I assume is not much).

    With this requirement, in something like uwsgi, should the concurrency use threads or processes, and how many are needed?

    Should the DB handles in the script for each user be implemented with threads or processes?

    Approximately what will the system requirements be on the back end to deal with this situation?

    If much of the code is implemented on the client side, what security checks need to be implemented in the code?

    Security rights are not a problem, as the data out of this app is under trial.

      Sounds like a fairly simple problem after all maybe. 12,000 users, but what are they doing during that 90 minutes? One GET and one POST each? Dozens? A hundred? Can users undo answers or backpage? Issues like this can change a simple problem into a terribly complicated one quickly.

      Vanilla nginx can serve pre-built or cached GETs of form pages to the tune of 12K per second just fine if you have the bandwidth. The incoming POSTs have to go through the application though and you won't know how fast that is until you build or at least prototype it. uwsgi has deep and powerful controls for how many worker processes to run and can even make changes dynamically. nginx has an experimental perl embed module now too which could be amazing (I haven't tried it yet). Anyway if you have the RAM and a halfway decent CPU I imagine you'll be able to handle the use you're talking about unless a lot of the users are banging on submit the whole time.

      Tips–

      • Prebuild and cache (probably just as files) all forms. Serve them statically or from webserver memory; both are terribly fast in nginx. There are many Perl templating engines to accomplish this. Text::Xslate is a very fast one (the right choice for "live" service) and Template::Toolkit is a very expressive and well known, documented, and extended one.
      • If the UI is complicated (users can back up and change answers, for example) save state in the browser with JS localStorage. It's a mild hassle to duplicate the form validation in the Perl and the JS but it will keep the webserver/app load at zero till submit. If you split an app like this, good testing becomes crucial to avoiding nasty surprises.
      • Investigate embedding the code in nginx but uwsgi probably has the dynamic load options you need.
      • Extra clock savings: don't touch the DB at all during the exams. Dump the validated submissions as single field delimited strings to a flat file or NoSQL or similar. Whatever is fastest and puts lightest load on the machine. This is not my forté, I have no direct recommendation today. When the exam is over, transform and load the flat data to the DB. Then you can grade and run reports from there.
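The "flat file" tip above could look something like the following minimal sketch: each validated submission becomes one tab-delimited line, appended under an exclusive lock so concurrent worker processes don't interleave their writes. The field layout, the validation rule, and the sub names are assumptions made for illustration; only core modules are used.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Build one tab-delimited record from already-parsed fields.
# Assumption: ids and choices are short alphanumeric tokens.
sub submission_line {
    my ($timestamp, $user_id, $question, $choice) = @_;
    for my $field ($timestamp, $user_id, $question, $choice) {
        die "invalid field\n"
            unless defined($field) && $field =~ /\A[A-Za-z0-9_-]+\z/;
    }
    return join("\t", $timestamp, $user_id, $question, $choice) . "\n";
}

# Append one record to the flat file while holding an exclusive lock.
sub append_submission {
    my ($file, $line) = @_;
    open(my $fh, '>>', $file) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $file: $!";
    print {$fh} $line         or die "Cannot write $file: $!";
    # close flushes the buffer and releases the lock.
    close($fh)                or die "Cannot close $file: $!";
    return 1;
}
```

A worker would call `append_submission($file, submission_line(time, $uid, $q, $choice))` per POST. After the exam, the file can be split on tabs and bulk-loaded into the DB; if a user answers the same question twice, the later line wins.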
        Extra clock savings: don't touch the DB at all during the exams. Dump the validated submissions as single field delimited strings to a flat file or NoSQL or similar.
        Excellent tip. You could use a key/value store like DB_File to save each submission. If you can construct the form "action" attribute to give it a unique path (perhaps using Javascript on the client side) then you would not even have to parse the query arguments:
        <form action="myapp/user1/question1">
        <form action="myapp/user1/question2">
        <form action="myapp/user2/question1">
        Then use $ENV{'PATH_INFO'} as the key ('/user1/question1'), and read from STDIN directly to get the raw value to store.
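A minimal sketch of that idea, assuming a plain CGI-style environment: the sub below takes any hash reference as the store, so it can be tried with an ordinary hash, and the commented lines show how it might be wired to a hash tied to DB_File. The sub name, the path pattern, the 4096-byte cap, and the file path are assumptions for illustration only.

```perl
use strict;
use warnings;

# Store the raw request body under the PATH_INFO key.
# $store is any hash reference; in production it would be a tied DB_File hash.
sub store_submission {
    my ($store, $path_info, $in, $length) = @_;
    # Assumption: keys look like /user1/question1 and nothing else.
    die "bad path\n"
        unless defined($path_info) && $path_info =~ m{\A/\w+/\w+\z};
    # Assumption: answers are small, so cap the body size.
    die "bad length\n"
        unless defined($length) && $length =~ /\A\d+\z/ && $length <= 4096;

    # read() may return fewer bytes than asked for, so loop until done.
    my ($body, $got) = ('', 0);
    while ($got < $length) {
        my $n = read($in, $body, $length - $got, $got);
        die "read error: $!\n" unless defined $n;
        last if $n == 0;    # end of input
        $got += $n;
    }
    $store->{$path_info} = $body;
    return length $body;
}

# Possible wiring in a CGI script (requires the DB_File module):
#   use DB_File;
#   use Fcntl;
#   tie my %db, 'DB_File', '/var/exam/answers.db', O_RDWR|O_CREAT, 0640, $DB_HASH
#       or die "Cannot tie: $!";
#   store_submission(\%db, $ENV{PATH_INFO}, \*STDIN, $ENV{CONTENT_LENGTH});
#   untie %db;
```

Two caveats: DB_File does no locking of its own, so concurrent writers need an external lock (or one store per worker), and because the client builds the path, the user part of the key should be checked against the session before it is trusted.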

Node Type: perlquestion [id://1161540]
Approved by NetWallah