• @nick@midwest.social
    4 months ago

    Just had to restart our main MySQL instance today. Had to do it at 6am since that’s the lowest traffic point, and boy howdy this resonates.

    2 solid minutes of the stack throwing 500 errors until the db was back up.

    • @xmunk@sh.itjust.works
      4 months ago

      If you have the bandwidth, it's absolutely worth investing in a maintenance mode for your system: before loading up a router or anything else, check a flat file on disk for a flag, and if it's engaged, just send back a static HTML file with ye olde “under construction” picture.
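      Here is a minimal sketch of that idea as Python WSGI middleware, assuming a WSGI app. The `MaintenanceMode` class, the flag path `/var/run/myapp/maintenance.flag`, and the 503 status with `Retry-After` are hypothetical choices, not part of any particular framework. The flag check runs before the wrapped app is touched, so it keeps answering even while the database is down.

```python
import os

# Hypothetical locations/content: adjust for your own deployment.
MAINTENANCE_FLAG = "/var/run/myapp/maintenance.flag"
MAINTENANCE_PAGE = b"<html><body><h1>Under construction</h1></body></html>"


class MaintenanceMode:
    """WSGI middleware that serves a static page while a flag file exists."""

    def __init__(self, app, flag_path=MAINTENANCE_FLAG, page=MAINTENANCE_PAGE):
        self.app = app              # the real application (router, DB pool, ...)
        self.flag_path = flag_path  # presence of this file enables maintenance mode
        self.page = page            # static HTML returned during maintenance

    def __call__(self, environ, start_response):
        # Checked on every request, before the wrapped app does any work.
        if os.path.exists(self.flag_path):
            # 503 + Retry-After signals a temporary outage to clients and crawlers.
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(self.page))),
                ("Retry-After", "120"),
            ])
            return [self.page]
        # Flag absent: pass the request through untouched.
        return self.app(environ, start_response)
```

      Usage would be to wrap the real app once (`application = MaintenanceMode(real_app)`), then `touch` the flag file before the restart and `rm` it afterwards. The page is kept in memory here; reading a static HTML file from disk at startup would work the same way.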

      • @nick@midwest.social
        4 months ago

        That’s not really… possible at this point. We have thousands of customers (some very large ones, like A——n and G—-e and Wal___t) with tens or hundreds of millions of users, and even at our lowest-traffic periods we do 60k+ queries per second.

        This is the same MySQL instance I wrote about a while ago that hit the 16 TiB table size limit (due to ext4 file system limitations) and caused a massive outage, the worst I’ve been involved in during my 26-year career.
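        For context on the 16 TiB figure: with the common 4 KiB block size, ext4 addresses file contents with 32-bit logical block numbers, so a single file tops out at 2^32 × 4 KiB = 2^44 bytes = 16 TiB. With InnoDB's `innodb_file_per_table`, each table lives in its own `.ibd` file, so that file cap effectively becomes a per-table cap. Below is a rough sketch of a check that flags tables approaching it. It assumes 4 KiB blocks and file-per-table tablespaces; `tables_near_limit` is a hypothetical helper, and the data directory is whatever path the caller passes in.

```python
from pathlib import Path

BLOCK_SIZE = 4096              # assumed ext4 block size (the common default)
MAX_LOGICAL_BLOCKS = 2 ** 32   # ext4 extents use 32-bit logical block numbers
EXT4_MAX_FILE_BYTES = BLOCK_SIZE * MAX_LOGICAL_BLOCKS  # 2**44 bytes = 16 TiB


def tables_near_limit(datadir, threshold=0.8):
    """Return (path, fraction_of_limit) for .ibd files at or above `threshold`.

    Hypothetical helper: assumes one .ibd file per table under `datadir`.
    Results are sorted with the largest (closest to the limit) first.
    """
    hits = []
    for path in Path(datadir).rglob("*.ibd"):
        fraction = path.stat().st_size / EXT4_MAX_FILE_BYTES
        if fraction >= threshold:
            hits.append((str(path), fraction))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

        Something like `for path, frac in tables_near_limit("/var/lib/mysql"): print(f"{path}: {frac:.0%} of ext4 max file size")` could run from cron and alert long before a table reaches the wall.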

        Every day I am shocked at our scale, considering my company is only like 90 engineers.