Beefy Boxes and Bandwidth Generously Provided by pair Networks
PerlMonks  

Site back up again

by Co-Rion (Monk)
on Jul 11, 2010 at 14:16 UTC (#848841=monkdiscuss)

After a database crash on Friday, the site was unreachable. Pair worked over the weekend to recover the data in the database, so we didn't have to restore from backup and lose about 24 hours of activity. The recovery took about 48 hours (and a felt eternity), as MySQLd took its time rebuilding various ISAM indices. At various times I wondered whether it would have been OK to ditch 24 hours of site activity in favour of bringing the site back up quicker.

We thank Pair for proactively noticing that the database server had problems and initiating a recovery even before we opened a ticket for that.

There are some things that are broken, like Best Nodes and Worst Nodes, because the jobs that (re)create those listings did not run, but that should fix itself within the next 24 hours. If you note any other unusual weirdness or brokenness, please reply here.

It's great to have you all back again!

Re: Site back up again
by Burak (Chaplain) on Jul 11, 2010 at 15:19 UTC
    So, this wasn't some hax0ring thing. Good then :)
      Yeah, I also thought zer0c00l, Cr@sh, and Burn were back to their old ways.
Re: Site back up again
by PeterPeiGuo (Hermit) on Jul 11, 2010 at 15:35 UTC

    Thanks!

    Peter (Guo) Pei

Re: Site back up again
by apl (Monsignor) on Jul 11, 2010 at 15:46 UTC
    So that's what going cold turkey feels like!
Re: Site back up again
by BrowserUk (Pope) on Jul 11, 2010 at 15:50 UTC

    Congratulations and thanks to all those involved in the recovery process.

    One question: what on earth does this mean?

    Hag-seed, hence!

    And prior to your update, the dubious choice of place-holder message and image did little to convey either a sense of well-being or even that recovery efforts were actually in hand.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I'm not sure what these mean, but they were available as some old "downtime" placeholder scripts, so I enabled them.

      I'm sorry if the texts were neither humorous nor soothing but I felt that anything was better than giving back the internal server error.

      I'll look into creating (and, preferably, never displaying) alternative downtime placeholders, just in case.

        I'm sorry if the texts were neither humorous nor soothing but I felt that anything was better than giving back the internal server error.

        Indeed, something specific was far better than a non-specific error message. Once you added the update and it was clearly not just some hacker's graffiti, it became reassuring. Thanks.

        I'll look into creating (and, preferably, never displaying) alternative downtime placeholders, just in case.

        Make me happy: tell me you've made progress on this. :-)

      For the future: the image in the status message was an earlier portrait of Him.

        For a while I was frightened, thinking either that he had a Frankenstein cousin who had wrecked our refuge and conquered our lovely NodeReaper's heart, bones and intestines in a grudging attempt to let trolls and flame-baiters in, or that someone had played a prank. Then I read the reassuring message from our old Corion and learned that everything was coming under control. I just wished for us, the children of Perl, not to be left Monastery-less...

        Hail PerlMonks, Hail PerlMonks, Hail PerlMonks....



        Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
      Since the image bore a striking resemblance to one of the many depictions of Eddie, of Iron Maiden fame, I felt comforted :)

      ++ to all who helped in the restoration. Withdrawal symptoms are nasty.

      That's cited from "The Tempest" by William Shakespeare, Act I, Scene II:
      ...
      Caliban You taught me Language, and my Profit on't
      Is, I know how to curse: The Red-plague rid you
      For learning me your Language.
      Prospero Hag-seed, hence!
      Fetch us in Fewel, and be quick, thou wer't best
      To answer other Business: Shrug'st thou, Malice?
      If thou neglect'st, or dost unwillingly
      What I command, I'll rack thee with old Cramps,
      Fill all thy Bones with Aches, make thee roar,
      That Beasts shall tremble at thy din.
      Cal. No, 'pray thee.
      I must obey, his Art is of such Pow'r,
      It would control my Dam's God Setebos,
      And make a Vassal of him.
      Pro. So Slave, hence. [ Exit Caliban
      ...

      No Fear Shakespeare translates "Hag-seed, hence!" as "Get out of here, you son of a bitch!" ;-)

        "Get out of here, you son of a bitch!"

        What a great way to greet bewildered visitors :)


        Prospero Hag-seed, hence!

        I guess now we know who was originally responsible for that message!


        Update: Petruchio tells me it wasn't him. So much for my theory....

        What is the sound of Windows? Is it not the sound of a wall upon which people have smashed their heads... all the way through?
Re: Site back up again
by Anonymous Monk on Jul 11, 2010 at 17:16 UTC

    Glad to have you back.

    Thanks.

Re: Site back up again
by Old_Gray_Bear (Bishop) on Jul 11, 2010 at 17:23 UTC
    (THANK YOU)^5

    Thank ALL of you who were involved.

    Can I trouble you for a couple more pieces of information? Can you list the Monks involved in the recovery, that I may say appropriate novenas for them? And can you give more detail on what went wrong with the database? There are not a few of us in the Monastery who use MySQL in the Real World, and I, at least, would like to know what happened in detail so I can look at setting up counter-measures.

    ----
    I Go Back to Sleep, Now.

    OGB

      The database server suffered some corruption of its MyISAM tables. This was discovered and acted on by Pair. The only active involvement by tye, yitzhack and me was to decide that one particularly large table should not be restored. That table was for request statistics, so we've lost the access statistics per page, but that's a small price to pay.

      I'm not sure what caused the table corruption. It became very evident after I restarted the MySQL server, but maybe that restart was also what caused the corruption in the first place. Maybe it can be discovered earlier by frequently running show table status like '$table' and/or check table $table.
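      A minimal sketch of such a periodic check, written in Python rather than Perl and assuming a DB-API 2.0 cursor from a MySQL driver (for example PyMySQL or mysqlclient). It relies on CHECK TABLE returning rows of (Table, Op, Msg_type, Msg_text), with the overall status in the last row. The helper names are hypothetical and not part of any PerlMonks tooling.

```python
from typing import Iterable, List, Sequence, Tuple

# One result row of CHECK TABLE: (Table, Op, Msg_type, Msg_text)
CheckRow = Tuple[str, str, str, str]

# Final status texts treated as "healthy"; anything else is flagged.
HEALTHY_STATUS = {"OK", "Table is already up to date"}


def table_is_healthy(rows: Sequence[CheckRow]) -> bool:
    """Interpret the rows MySQL returns for CHECK TABLE on a single table."""
    if not rows:
        return False  # no result at all is suspicious in itself
    # Any row of type 'error' means the check found a problem.
    if any(msg_type.lower() == "error" for _, _, msg_type, _ in rows):
        return False
    # The last row carries the overall status for the table.
    _, _, msg_type, msg_text = rows[-1]
    return msg_type.lower() == "status" and msg_text in HEALTHY_STATUS


def find_suspect_tables(cursor, tables: Iterable[str]) -> List[str]:
    """Run CHECK TABLE for each name through a DB-API cursor.

    Returns the names whose result does not look healthy. Names are
    interpolated into the SQL, so they must come from a trusted source
    such as SHOW TABLES, never from user input.
    """
    suspects = []
    for name in tables:
        cursor.execute(f"CHECK TABLE `{name}`")
        if not table_is_healthy(cursor.fetchall()):
            suspects.append(name)
    return suspects
```

      Run from cron against the output of SHOW TABLES, this would flag a damaged table before a restart makes the damage obvious. Note that CHECK TABLE itself locks and reads each table, so on a busy site it is better scheduled for quiet hours.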

      As the recovery process employed by MySQL (myisamchk, or equivalently repair table $table) seemed to consist mostly of rebuilding the MyISAM index from the raw table file, future decisions will likely weigh the recency of the last backup: filling the database from a backup is much faster than rechecking and repopulating the table by scanning a MyISAM file.
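      One way to make that weighing explicit is a toy cost model: hours offline plus a weight times the hours of activity lost. This is a hypothetical illustration only. The 48 hours of repair and roughly 24 hours of lost activity come from this thread, but the 6-hour restore time is an invented placeholder.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RecoveryOption:
    name: str
    hours_offline: float           # estimated time until the site is back
    hours_of_activity_lost: float  # posts/votes since the backup that would vanish


def choose_recovery(options: Sequence[RecoveryOption], loss_weight: float) -> RecoveryOption:
    """Pick the option with the lowest cost, where
    cost = hours_offline + loss_weight * hours_of_activity_lost.

    loss_weight says how many hours of downtime one hour of lost activity
    is "worth"; it is a judgment call, not something measurable.
    """
    return min(options, key=lambda o: o.hours_offline + loss_weight * o.hours_of_activity_lost)


# 48 h repair and ~24 h of lost activity are from the thread;
# the 6 h restore time is an assumed placeholder.
repair = RecoveryOption("repair in place", hours_offline=48, hours_of_activity_lost=0)
restore = RecoveryOption("restore from backup", hours_offline=6, hours_of_activity_lost=24)
print(choose_recovery([repair, restore], loss_weight=1.0).name)  # restore from backup
print(choose_recovery([repair, restore], loss_weight=2.0).name)  # repair in place
```

      With a weight of 1 the restore wins (6 + 24 = 30 versus 48); with a weight of 2 the repair wins (48 versus 54). The break-even weight under these numbers is (48 - 6) / 24 = 1.75, so the decision hinges on how much an hour of lost posts is judged to be worth relative to an hour of downtime, and on how recent the last backup is.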

      Update: Added a crucial "maybe" in the first sentence of the second paragraph.

      I suffered a similar problem with MySQL very recently. During a routine run of adding new records to our database and updating old ones, my computer (Windows XP) crashed and seemed to have left some of the index tables in disarray.

      After that, reading some records failed for no obvious reason, with a somewhat helpful error message ("[ERROR] C:\data\Program Files\MySQL\MySQL Server 5.1\bin\mysqld: Incorrect key file for table '.\pandirecords\claims.MYI'; try to repair it"). Running the full check and repair suite fixed it in less than an hour (about 600,000 records in total in that database).

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
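      For the "Incorrect key file" case, a small helper can turn the logged error into the matching REPAIR TABLE statement. This is a hypothetical Python sketch; it assumes the message names the index file as db\table.MYI or db/table.MYI, as in the error quoted in the reply above, and it returns None for anything else.

```python
import re
from typing import Optional

# Matches e.g. "Incorrect key file for table '.\pandirecords\claims.MYI'"
# or the Unix form "Incorrect key file for table './pm/hits.MYI'".
KEY_FILE_ERROR = re.compile(
    r"Incorrect key file for table '"
    r"(?:\.[\\/])?"            # optional leading ".\" or "./"
    r"(?P<db>[^\\/']+)[\\/]"   # database directory name
    r"(?P<table>[^\\/']+?)"    # table name
    r"(?:\.MYI)?'"             # optional MyISAM index-file extension
)


def repair_statement(log_line: str) -> Optional[str]:
    """Build a REPAIR TABLE statement from a MySQL 'Incorrect key file' error line."""
    m = KEY_FILE_ERROR.search(log_line)
    if m is None:
        return None  # not the error we know how to handle
    return f"REPAIR TABLE `{m.group('db')}`.`{m.group('table')}`"
```

      The generated statement should still be reviewed before running it, and taking a copy of the .MYD/.MYI files first is prudent, since a repair that goes wrong can lose rows.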

Re: Site back up again
by ahmad (Hermit) on Jul 11, 2010 at 18:08 UTC

    Thanks to all people involved in the recovery process.

    I think it would be great if we could have more information on what exactly happened, and on the steps taken to fix the problem, so that we can benefit from this experience.

      See Re^2: Site back up again. The steps taken to fix the problem are that I will check our backups more often, so that I have (even) more confidence in their integrity. The last time our backups were put to the test was when we migrated to a larger database server, two years ago. Since then, the process has not changed, so the backups are likely still good. But this has not been tested often enough.
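      A cheap first-line check, short of actually restoring into a scratch server (which remains the real test), is to confirm that each dump file finished writing. This sketch assumes the backups are mysqldump output, plain or gzip-compressed; the thread does not say how PerlMonks backups are actually taken. mysqldump writes a closing "-- Dump completed" comment by default, so a file lacking it was probably truncated, although a dump made with --skip-comments will lack it too.

```python
import gzip
from pathlib import Path

# mysqldump appends this comment as its final line unless --skip-comments is used.
TRAILER = "-- Dump completed"


def dump_looks_complete(path: str) -> bool:
    """Cheap sanity check on a mysqldump file (plain or .gz).

    True only if the file exists, is non-empty, and its last non-blank
    line is mysqldump's completion comment. This catches truncated dumps;
    it does NOT prove the dump restores cleanly.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    opener = gzip.open if p.suffix == ".gz" else open
    last = ""
    try:
        with opener(p, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:  # streams the whole file: fine for a sketch, slow for huge dumps
                if line.strip():
                    last = line.strip()
    except (OSError, EOFError):
        return False  # unreadable or truncated gzip stream
    return last.startswith(TRAILER)
```

      A nightly job could run this on the newest dump and raise an alarm on False, leaving the slower full restore test for a less frequent schedule.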

Re: Site back up again
by exussum0 (Vicar) on Jul 11, 2010 at 21:30 UTC
    I guess we know the backup AND restore system works still. :)
Re: Site back up again
by jdporter (Canon) on Jul 11, 2010 at 22:09 UTC

    Personally, I thought the placeholder page was considerably worse than a generic server error page. It did nothing to ease my mind with regard to the possibility that the site had been hacked. Quite the contrary, in fact.

    Since there was no other communication forthcoming from anyone at Pair or any of the site admins, all I could do was idly hope that things were no worse than in fact they turned out to be.

    I suppose I could have emailed tye or someone whose email address I happen to have, on the hopes that they might know something...

    I did manage to hail Petruchio on another channel, but he avowed ignorance of the situation.

    I posted a message on the PerlMonks facebook group wall, thinking maybe someone who knew something would reply. Alas, that was fruitless as well.

    I think there should be a defined "emergency plan" for keeping the monks informed in the event of future similar occurrences. As a first rough cut, I'd recommend that one or more of the following be done, in this order of preference:

    1. Put up a real, meaningful placeholder page which is both informative and reassuring.
    2. Send a broadcast message to all of the email addresses of record.
    3. Post messages on other sites of the Perl community, particularly those which point to PerlMonks as a community site. One obvious choice is use.perl.
    At the very least, situation status updates should be communicated post-haste to all gods so that they may respond usefully when queried by monks who happen to contact them. The expectation that any given god would know what's going on is, I think, not unreasonable.

    Thank you.

      Send a broadcast message to all of the email addresses of record.

      How about no? I'd much prefer directions to the agreed-upon alternate meeting place, i.e. a perlmonks IRC channel, a use.perl account, or something else the gods manage that is not on PerlMonks servers.

        This is to the PerlMonks gods in case they do want to give such a status update, and everyone else who wants to listen to these. There is an unofficial low-traffic perlmonks irc channel on slashnet named #perlmonks. Also, you may give a status update in the topic of the #cbstream channel on the freenode network (just please leave the FAQ url in there; ask me for an access flag so you can update the topic, or, if your religion forbids irc, mail me the information about the update and I may put it up within some time).

      I would gladly follow a low-volume @perlmonks twitter feed, if one existed for the purpose of emergency updates. That should be sufficient. Listing "Follow @perlmonks on twitter for emergency broadcasts" at the bottom of the page ought to be more than adequate advertisement of the 'feature'.


      Dave

Re: Site back up again
by Anonymous Monk on Jul 11, 2010 at 22:45 UTC
    I have a feeling I broke it :) A couple of times I'd show up around midnight PST, try to post, and end up with 10 empty duplicates; then the other night it finally broke.

    Sorry :)

      I feel lost and homeless without PerlMonks running and active, with its own consciousness like a living being. So congratulations, and good luck.
Re: Site back up again
by JavaFan (Canon) on Jul 12, 2010 at 09:13 UTC
    The recovery took about 48 hours (and a felt eternity), as MySQLd took its time rebuilding various ISAM indices.
    48 hours? What conclusion should be drawn from this? That MySQL is a toy database? That ISAM tables should be avoided like the plague? Or did Pair do something that made a restore take that long?
      It wasn't a restore, it was a recovery. It probably took a long time because of how PM was designed. We're mostly on a 10+ year old architecture -- I'm sure there are plenty of new features in MySQL that we are missing out on. I mean, it's impressive the heavily modified Everything2 engine runs as well as it does now, but I can't say I'm surprised recovering from a critical database issue takes a long time for it. Also, they probably didn't know it would take 48 hours when the recovery process was started.

      Elda Taluta; Sarks Sark; Ark Arks

Re: Site back up again
by pid (Monk) on Jul 17, 2010 at 02:27 UTC

    Thank you for your efforts. Thanks for bringing PerlMonks back.
    Next time (ooooooooh no!), I hope the placeholder image won't be that scary... :)

Node Type: monkdiscuss [id://848841]
Approved by Limbic~Region
Front-paged by Limbic~Region