• @grabyourmotherskeys@lemmy.world
    330 points · 1 year ago

    I haven’t read the article because documentation is overhead, but I’m guessing the real reason is that the guy who kept saying they needed to add more storage was repeatedly told to calm down and stop overreacting.

    • krellor
      169 points · 1 year ago

      I used to do some freelance work years ago and I had a number of customers who operated assembly lines. I specialized in emergency database restoration, and the assembly line folks were my favorite customers. They know how much it costs them for every hour of downtime, and never balked at my rates and minimums.

      The majority of the time the outages were due to failure to follow basic maintenance, and log files eating up storage space was a common culprit.

      So yes, I wouldn’t be surprised at all if the problem was something the local IT folks had called out, only to be overruled for one reason or another.

      • Oliver Lowe
        59 points · 1 year ago

        > and log files eating up storage space was a common culprit.

        Another classic symptom of poorly maintained software: constant announcements of trivial nonsense, like “[INFO]: Sum(1, 1) - got result 2!”, filling up disks.

        I don’t know if the systems you’re talking about are like this, but it wouldn’t surprise me!
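
        To make the point concrete: in Python’s standard `logging` module, chatter like that is normally emitted at INFO level. Raising the threshold to WARNING drops it before it ever reaches the disk. This is a minimal sketch with a hypothetical logger name, and the systems in the parent comment may not be written in Python at all.

```python
import io
import logging

def make_logger(stream, level=logging.WARNING):
    """Return a logger that discards records below `level` (WARNING by default)."""
    logger = logging.getLogger("line_controller")  # hypothetical logger name
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # don't also send records to the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s]: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

buf = io.StringIO()  # stands in for the log file
log = make_logger(buf)
log.info("Sum(1, 1) - got result 2!")  # dropped: INFO is below WARNING
log.warning("Disk usage above 90%")    # written
print(buf.getvalue(), end="")          # [WARNING]: Disk usage above 90%
```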

        • @afraid_of_zombies@lemmy.world
          0 points · 1 year ago

          Yeah, there are a few levels.

          Level 1: complex standalone devices, mostly firmware.

          Level 1a: stuff slightly more complicated than a list of settings, usually for something like a VFD or stepper motor controllers. Not as common.

          Level 2: PLCs, HMIs, and the black-magic robotic stuff. Standalone equipment: imagine a machine that can take something, heat it up, and give it to the next machine.

          Level 3: DCS (distributed control systems) and SCADA (supervisory control and data acquisition). This is typically for integrating, or at least collecting data from, multiple pieces of standalone Level 2 equipment.

          Level 4: the integration layer between Level 3 and whatever system the company uses for entering sales.

          Like everything in software, this is all general. Some places will mix layers, subtract layers, add them. I would complain about the inconsistent nature of it all, but without it I would be unemployed.

          • @Pat12@lemmy.world
            1 point · 1 year ago

            > Level 1a: stuff slightly more complicated than a list of settings, usually for something like a VFD or stepper motor controllers. Not as common.
            >
            > Level 2: PLCs, HMIs, and the black-magic robotic stuff. Standalone equipment: imagine a machine that can take something, heat it up, and give it to the next machine.
            >
            > Level 3: DCS (distributed control systems) and SCADA (supervisory control and data acquisition). This is typically for integrating, or at least collecting data from, multiple pieces of standalone Level 2 equipment.
            >
            > Level 4: the integration layer between Level 3 and whatever system the company uses for entering sales.
            >
            > Like everything in software, this is all general. Some places will mix layers, subtract layers, add them. I would complain about the inconsistent nature of it all, but without it I would be unemployed.

            Is this about specific software engineering languages, or is it electrical engineering? What kind of work is this?

            • @afraid_of_zombies@lemmy.world
              -1 points · 1 year ago

              I’m having trouble understanding your questions. I generally work at Level 2, where we mostly use graphical languages and sometimes use scripting languages to generate graphical-language code. The two most common graphical languages are FBDs (function block diagrams) and ladder logic. Both have a general form plus vendor-specific quirks.
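
              To give a feel for what a ladder rung actually computes, here is a rough Python emulation of the classic start/stop “seal-in” rung, re-evaluated once per PLC scan. It is only an illustration with made-up tag names. Real ladder logic is drawn in the vendor’s editor and runs on the PLC, not in Python.

```python
from dataclasses import dataclass
from typing import List

# The rung being emulated (STOP is a normally closed contact; tag names are made up):
#
#   |--+--[ START ]--+--[/ STOP ]--( MOTOR )--|
#   |  |             |
#   |  +--[ MOTOR ]--+
#
# The MOTOR contact in parallel with START "seals in" the output once it is on.

@dataclass
class Inputs:
    start: bool  # True while the start pushbutton is pressed
    stop: bool   # True while the stop pushbutton is pressed

def scan(inputs: Inputs, motor: bool) -> bool:
    """One PLC scan of the rung: returns the new MOTOR output."""
    return (inputs.start or motor) and not inputs.stop

motor = False
history: List[bool] = []
for step in [Inputs(start=True, stop=False),    # press start
             Inputs(start=False, stop=False),   # release start: motor stays on
             Inputs(start=False, stop=True),    # press stop
             Inputs(start=False, stop=False)]:  # release stop: motor stays off
    motor = scan(step, motor)
    history.append(motor)
print(history)  # [True, True, False, False]
```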

              For scripting I tend towards Perl or Python, but I have seen other guys use different methods.

              Level 3 uses pretty much the same tools. For Level 4, I have in the past used a Modbus/TCP approach, but I can’t really say that’s typical. One guy I know used the Python API to do it.
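
              For anyone curious what Modbus/TCP looks like on the wire, this sketch builds a standard “Read Holding Registers” (function code 0x03) request by hand: a 7-byte MBAP header followed by the PDU, normally sent to TCP port 502. The transaction ID, unit ID and register range are arbitrary example values, and this is not necessarily how the commenter’s integration worked.

```python
import struct

def read_holding_registers_request(transaction_id: int, unit_id: int,
                                   start_address: int, count: int) -> bytes:
    """Build a Modbus/TCP request frame for function 0x03 (Read Holding Registers)."""
    if not 1 <= count <= 125:
        raise ValueError("function 0x03 reads 1 to 125 registers per request")
    # PDU: function code, starting address, quantity of registers (big-endian)
    pdu = struct.pack(">BHH", 0x03, start_address, count)
    # MBAP header: transaction id, protocol id (always 0 for Modbus),
    # length = number of bytes after this field (unit id + PDU), unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

# Example values only: read 10 registers starting at address 0 from unit 1.
frame = read_holding_registers_request(transaction_id=1, unit_id=1, start_address=0, count=10)
print(frame.hex())  # 00010000000601030000000a
```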

              • @Pat12@lemmy.world
                1 point · 1 year ago

                oh, thank you

                my background is not in engineering which explains my confusing questions

    • DontMakeMoreBabies
      76 points · 1 year ago

      I’m this person in my organization. I sent an email up the chain warning folks we were going to eventually run out of space about 2 years ago.

      Guess what just recently happened?

      ShockedPikachuFace.gif

      • @vagrantprodigy@lemmy.whynotdrs.org
        19 points · 1 year ago

        Literally sent that email this morning. It’s not that we don’t have the space, it’s that I can’t get a maintenance window to migrate the data to the new storage platform.

      • @IMongoose@lemmy.world
        9 points · 1 year ago

        Sometimes that person is very silly though. We had a vendor call us saying we needed to clear our logs ASAP!!! due to their size. The log file was no joke, 20 years old. At the current rate, our disk would be full in another 20 years. We cleared it but like, calm down dude.

      • Mike D.
        9 points · 1 year ago

        Can’t you just add a few external USB drives? (heard this more than once at an NGO think tank.)

        • @grabyourmotherskeys@lemmy.world
          12 points · 1 year ago

          I mean, I’ve worked at a hosting company that had a bunch of static sites running off an SSD connected to the server by USB, so this did happen back in the day. I try not to think about those days.

          “What’s that? Your accounting front end, built in obsolete FrontPage code on an Access database, isn’t working again? It’s probably a file lock; I’ll restart IIS.”

    • Dojan
      27 points · 1 year ago

      Ballast!

      Just plonk a large file in the storage, sized to roughly however much is normally used in a work week or so. Then when shit hits the fan, delete the ballast and you’ll suddenly have bought a week to “find” and implement a solution. You’ll be hailed as a hero, rather than being the annoying doomer who bothers people about technical stuff that’s irrelevant to the here and now.
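
      Sizing the ballast as described is simple arithmetic: average daily growth times the number of days of runway you want. This is a minimal sketch that assumes you already collect daily growth figures from monitoring. The function name and the 5-day default (standing in for “a work week or so”) are my own assumptions.

```python
from typing import List

GiB = 1024 ** 3

def ballast_size_bytes(daily_growth_bytes: List[int], days_of_runway: int = 5) -> int:
    """Hypothetical helper: mean recent daily growth times the days of breathing room wanted."""
    if not daily_growth_bytes:
        raise ValueError("need at least one day of growth data")
    mean_daily = sum(daily_growth_bytes) / len(daily_growth_bytes)
    return int(mean_daily * days_of_runway)

# Hypothetical growth figures: 2, 3 and 1 GiB written on three recent days.
print(ballast_size_bytes([2 * GiB, 3 * GiB, 1 * GiB]) / GiB)  # 10.0
```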

      • lemmyvore
        16 points · 1 year ago

        Or you could be fired because technically you’re the one that caused the outage.

        • Dojan
          10 points · 1 year ago

          Damned if you do, damned if you don’t!

          • Awkwardparticle
            7 points · 1 year ago

            The ultimate goal is having no downtime, and ballast gives you that result. The cost of downtime is far larger than the cost of the space wasted on ballast.

      • @Malfeasant@lemm.ee
        2 points · 1 year ago

        Except then they’ll decide you fixed it, so nothing more needs to be done. I’ve seen this happen more than once.

    • IWantToFuckSpez
      11 points · 1 year ago

      And was fired for not doing his job which management prevented him from doing in the first place

  • Semi-Hemi-Demigod
    175 points · 1 year ago

    Sysadmin pro tip: Keep a 1-10GB file of random data named DELETEME on your data drives. Then if this happens you can get some quick breathing room to fix things.
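
    Here is a minimal Python sketch of creating and releasing such a file. The helper names are hypothetical. It writes random bytes rather than zeros or a sparse file because zeros can be compressed or deduplicated away, and a sparse file may not reserve any blocks at all. Either way, deleting the “ballast” would then free nothing.

```python
import os
import tempfile

# Hypothetical helper names, not from any particular tool.
def create_ballast(path: str, size_bytes: int, chunk_bytes: int = 64 * 1024 * 1024) -> None:
    """Write size_bytes of random data to path, in chunks to keep memory use low."""
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(os.urandom(n))  # random bytes resist compression/dedup
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # flush to disk so the space is really taken

def release_ballast(path: str) -> int:
    """Delete the ballast file; return bytes freed (0 if it was already gone)."""
    try:
        size = os.path.getsize(path)
        os.remove(path)
        return size
    except FileNotFoundError:
        return 0

# Real use would be something like create_ballast("/data/DELETEME", 10 * 1024**3).
# Demo with a 1 MiB file in a throwaway directory:
demo_path = os.path.join(tempfile.mkdtemp(), "DELETEME")
create_ballast(demo_path, 1024 * 1024)
freed = release_ballast(demo_path)
print(freed)  # 1048576
```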

    Also, set up alerts for disk space.
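
    For the alerting half, a bare-bones check using Python’s `shutil.disk_usage` might look like this. The 80%/90% thresholds are assumptions, and actually delivering the alert (email, chat, pager) is left out. In practice you would more likely plug a check like this into an existing monitoring system.

```python
import shutil

def classify(used: int, total: int, warn_pct: float = 80.0, crit_pct: float = 90.0) -> str:
    """Map used/total bytes to an alert level (thresholds are assumed defaults)."""
    used_pct = 100.0 * used / total
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"

def check_disk(path: str) -> str:
    """Alert level for the filesystem that holds path."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    return classify(usage.used, usage.total)

print(check_disk("."))  # "OK", "WARNING" or "CRITICAL" depending on your disk
```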

      • nfh
        30 points · 1 year ago

        Why not both? Alerting to find issues quickly, a bit of extra storage so you have more options available in case of an outage, and maybe some redundancy for good measure.

        • @RupeThereItIs@lemmy.world
          14 points · 1 year ago

          A system this critical is on a SAN; if you’re alerting properly, adding a bit more storage space is a five-minute task.

          It should also have a DR solution, yes.

          • Nightwatch Admin
            1 point · 1 year ago

            A system this critical is on a hypervisor with tight storage “because deduplication” (I’m not making this up).

            • @RupeThereItIs@lemmy.world
              5 points · 1 year ago

              This is literally what I do for a living. Yes: deduplication and thin provisioning.

              This is still a failure of monitoring or slow response to it.

              You keep your extra capacity handy on the storage array, not with some junk files on the filesystem.

              You also need to know how overprovisioned you are and when you’re likely to run out of capacity, and you know that from monitoring.

              Then, when management fails to react promptly to your warnings, shit like this happens.
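
              The “when are we likely to run out” part can be estimated from monitoring data with a simple least-squares line through (date, used capacity) samples. This sketch rests on a big assumption: that growth is roughly linear. As other comments here point out, a runaway process can blow through any forecast in minutes. The function name, dates and units are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

def days_until_full(samples: List[Tuple[date, float]], capacity: float) -> Optional[float]:
    """Least-squares line through (date, used) samples (assumes linear growth).
    Returns projected days after the last sample until used reaches capacity,
    or None if usage is flat or shrinking."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(d - t0).days for d, _ in samples]  # days since first sample
    ys = [used for _, used in samples]
    n = len(samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx  # units per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return (capacity - intercept) / slope - xs[-1]

# Hypothetical pool: 50, 60, 70 TB used at 10-day intervals, 100 TB physical capacity.
samples = [(date(2023, 8, 1), 50.0), (date(2023, 8, 11), 60.0), (date(2023, 8, 21), 70.0)]
print(days_until_full(samples, capacity=100.0))  # 30.0
```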

              • Semi-Hemi-Demigod
                3 points · 1 year ago

                > Then, when management fails to react promptly to your warnings, shit like this happens.

                The important part is that you have your warnings in writing, and BCC them to a personal email address so you can cover your ass.

      • @Agent641@lemmy.world
        20 points · 1 year ago

        Yes, alert me when disk space is about to run out so I can ask for a massive raise and quit my job when they don’t give it to me.

        Then when TSHTF they pay me to come back.

      • @ipkpjersi@lemmy.ml
        15 points · 1 year ago

        A lot of companies have minimal alerting or no alerting at all. It’s kind of wild. I literally have better alerting in my home setup than many companies do lol

            • @IonAddis@lemmy.world
              4 points · 1 year ago

              I imagine it’s a case where, if you’re knowledgeable, yeah, it’s free. But if you have to hire knowledgeable people to implement the free solution, you still have to pay those people. And companies love to balk at that!

              • @ipkpjersi@lemmy.ml
                2 points · 1 year ago

                I think it’s that, plus any IT employees they do have wouldn’t be allowed to work on it because they’d be kept busy with other stuff. Companies don’t prioritize it, since they don’t know how important it is until it’s too late.

      • @looz@sopuli.xyz
        9 points · 1 year ago

        There are cases where a disk fills up faster than anyone can reasonably react, even with alerts in place. And sometimes the culprit is something you can’t just go and kill.

        • @afraid_of_zombies@lemmy.world
          -1 points · 1 year ago

          Had an issue like that a few years back: a standalone device that was filling up quickly. The poorly designed device could only be flushed via USB stick. I told them they had to do it weekly. Guess what they didn’t do. Looking back, I should have made it alarm and flash once a week on a timer.

    • @dx1@lemmy.world
      54 points · 1 year ago

      The real pro tip is to put the core system, and anything on your system that eats up disk space, on separate partitions, along with alerting, log rotation, etc. And also to not have a single point of failure in general. It’s hard to say exactly what went wrong at Toyota, but they probably could have planned better for it in a general way.
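
      For the log-rotation piece, Python’s standard library has `logging.handlers.RotatingFileHandler`, which caps each file at `maxBytes` and keeps at most `backupCount` old files. This is a minimal sketch with a hypothetical logger name and deliberately small limits so the effect is visible. At the OS level, `logrotate` is the usual tool instead.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

def make_rotating_logger(log_path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Logger whose files stay under max_bytes each, keeping at most `backups` old copies."""
    logger = logging.getLogger("parts_ordering_demo")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")
log = make_rotating_logger(log_path, max_bytes=1000, backups=2)
for i in range(200):  # roughly 6 KB of output, far more than 3 files x 1000 bytes
    log.info("processed parts order %d", i)
for h in log.handlers:
    h.close()
print(sorted(os.listdir(log_dir)))  # ['app.log', 'app.log.1', 'app.log.2']
```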

    • @Lem453@lemmy.ca
      31 points · 1 year ago

      Even better: a cron job every 5 minutes that, if total remaining space falls to 5%, automatically deletes the file and sends a message to the sysadmin.
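
      Here is a sketch of that watchdog in Python. It assumes the ballast file from the parent comment and a cron entry such as `*/5 * * * * python3 /path/to/disk_watchdog.py` (the path is a placeholder). The `notify` callback is hypothetical; wire it to whatever actually reaches the sysadmin.

```python
import os
import shutil
import tempfile
from typing import Callable

def watchdog(data_path: str, ballast_path: str, min_free_pct: float = 5.0,
             notify: Callable[[str], None] = print) -> bool:
    """If free space on data_path's filesystem is at or below min_free_pct,
    delete the ballast file and call notify (a hypothetical hook).
    Returns True if the ballast was released on this run."""
    usage = shutil.disk_usage(data_path)
    free_pct = 100.0 * usage.free / usage.total
    if free_pct > min_free_pct or not os.path.exists(ballast_path):
        return False  # plenty of space, or the ballast is already spent
    freed = os.path.getsize(ballast_path)
    os.remove(ballast_path)
    notify(f"{data_path}: {free_pct:.1f}% free, released {freed} bytes of ballast")
    return True

# Demo: a 100% threshold always trips, so the release path runs on a throwaway file.
demo_dir = tempfile.mkdtemp()
demo_ballast = os.path.join(demo_dir, "DELETEME")
with open(demo_ballast, "wb") as f:
    f.write(os.urandom(4096))
messages = []
released = watchdog(demo_dir, demo_ballast, min_free_pct=100.0, notify=messages.append)
print(released, messages)
```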

      • Semi-Hemi-Demigod
        21 points · 1 year ago

        Sends a message and gets the services ready for potential shutdown. Or implements a rate limit to keep the service available but degraded.
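
        One common way to do the “available but degraded” part is a token bucket in front of the write path. This is a minimal sketch; the class is hypothetical, and the clock is injectable so the behaviour can be checked without waiting. How it would hook into a real service depends entirely on that service.

```python
import time
from typing import Callable

class TokenBucket:
    """Hypothetical limiter: on average `rate` operations per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or delay the request

# Demo with a fake clock: 2 requests per second, burst of 2.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2.0, clock=lambda: fake_now[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 0.5  # half a second later, one token has been refilled
results.append(bucket.allow())
print(results)  # [True, True, False, True]
```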

      • @bug@lemmy.one
        3 points · 1 year ago

        At that point just set the limit a few gig higher and don’t have the decoy file at all

    • Maximilious
      29 points · 1 year ago

      10GB is nothing in an enterprise datastore housing PBs of data. 10GB is nothing for my 80TB homelab!

  • @Swiggles@lemmy.blahaj.zone
    97 points · 1 year ago

    This happens. Recently we had a problem in production where our database grew by a factor of 10 in just a few minutes due to a replication glitch. Of course it took down the whole application as we ran out of space.

    Some things just happen, and no amount of headroom and monitoring can save you if things go seriously wrong. You cannot prepare for everything, in life or in IT, I guess. It is part of the job.

    • @RidcullyTheBrown@lemmy.world
      22 points · 1 year ago

      Bad things can happen, but that’s why you build disaster recovery into the infrastructure. Especially at a company as big as Toyota, you can’t have a single point of failure like this. They produce over 13,000 cars per day. This failure cost them close to $300,000,000 in cars alone.

        • @GloveNinja@lemmy.world
          3 points · 1 year ago

          In my experience, the C-Suite dicks will put the hammer down on someone and maybe fire a couple of folks. They’ll demand a summary of what happened and what will be done to stop it from happening again. IT will provide legit options to resolve this long term, but because that comes with a price tag they’ll be told to fix it with “process changes” and the cycle continues.

          If they give IT money, that’s less for themselves in EOY bonuses, so it’s a big concern /s

      • @Swiggles@lemmy.blahaj.zone
        6 points · 1 year ago

        Yea, fair point regarding the single point of failure. I guess it was one of those scenarios that should just never happen.

        I am sure it won’t happen again though.

        As I said it can just happen even though you have redundant systems and everything. Sometimes you don’t think about that one unlikely scenario and boom.

  • MoogleMaestro
    64 points · 1 year ago

    There’s some irony to every tech company modeling their pipeline off Toyota’s Kanban system…

    Only for Toyota to completely fuck up their tech by running out of disk space for their system to exist on. Looks like someone should have put “Buy more hard drives” on the board.

    • @palitu@aussie.zone
      23 points · 1 year ago

      Not to mention that the lean process effed them during Fukushima and COVID: a breakdown in logistics and a shortage of chips meant their entire mode of operating shut down, as they had no capacity to deal with outages anywhere in their systems. Maybe that has happened again, just in server land.

      • @GamingChairModel@lemmy.world
        29 points · 1 year ago

        Toyota was the carmaker best positioned for the COVID chip shortage because they recognized it as a bottleneck. They were pumping out cars a few months longer than the others (even if they eventually hit the same wall everyone else did).

        • @palitu@aussie.zone
          1 point · 1 year ago

          They have changed their processes now to ensure a bit more of a buffer in their supply; not so lean any more.

      • @burningmatches@feddit.uk
        8 points · 1 year ago

        It wasn’t just Fukushima. There was a massive flood in Thailand at the same time that shut down a load of suppliers. It was a really bad bit of luck but they did learn from that.

  • @MechanicalJester@lemm.ee
    63 points · 1 year ago

    I blame lean philosophy. Keeping spare parts and redundancy is expensive so definitely don’t do it…which is just rolling the dice until it comes up snake eyes and your plant shuts down.

    It’s the “save 5% yearly and stop trying to avoid a daily 5% chance of disaster” approach.

    Overprepared is silly, but so is underprepared.

    They were underprepared.
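
    Taking the comment’s numbers literally gives the chance of at least one disaster in a year. The figures are rhetorical, so this is illustrative only, and it assumes every day is independent:

```latex
P(\text{at least one disaster in 365 days}) = 1 - (1 - 0.05)^{365} \approx 1 - 7.4 \times 10^{-9}
```

    In other words, a daily 5% risk becomes a near-certainty over a year.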

    • @Ryumast3r@lemmy.world
      50 points · 1 year ago

      Lean philosophy is supposed to account for those dice-rolling moments. It’s not just “keep nothing in inventory”, there is supposed to be risk assessment involved.

      The problem is that leadership doesn’t interpret it that way and just sees “minimizing inventory increases profit!”

      • @IonAddis@lemmy.world
        9 points · 1 year ago

        > The problem is that leadership doesn’t interpret it that way and just sees “minimizing inventory increases profit!”

        Yep. Managers prioritize short-term gains (often personal gains, too) over the overall health of a business.

        There are also industries where the “lean” strategy is inappropriate, because the industry booms in times of crisis, exactly when the logistics for “just in time” supplies go kaput due to the same catastrophe that’s causing the boom. Hospitals and clinics can end up in trouble like this.

        But there are other industries too. I haven’t looked for it, but I’m sure there’s already a plethora of analysis on what Covid did to companies and their supply chains.

        • Alien Nathan Edward
          1 point · 1 year ago

          Why would they prioritize long-term gains? Their next review is at most 6 months away, and they’re either low-to-mid level and fighting for their lives every day, or upper management who can always dump stock and YOLO out, potentially with a golden parachute.

      • @Aceticon@lemmy.world
        7 points · 1 year ago

        My impression, from the software engineering side (i.e. the whole discipline rather than just “coding”), is that this kind of thing is pretty common:

        • You start with ad-hoc software development: lots of confusion, redundancy, inefficient “we’ll figure it out when we get there” thinking, and so on.
        • To improve on this, somebody really thinks things through and eventually a software development process emerges, something like Agile.
        • There are good reasons for every part of such a process, but naturally the conditions are sometimes not met and certain parts are not suitable: the whole process is not, and can never be, a one-size-fits-all silver bullet, because software development is far too complex and vast a discipline for that (if it weren’t, you wouldn’t need a process to do it with even minimal efficiency).
        • However, most people using it aren’t the “grand thinkers” of software engineering (software-architect-level types with tons of experience, who have seen a lot and know why certain elements of a process are as they are, and hence when to use them and when not to). Instead they’re run-of-the-mill, far more junior software designers and developers, plus people from the management side trying to organise a tech-heavy process.

        So you end up with a process that is excellent in the hands of people who know what each part tries to achieve, what the point of it is and when it actually applies, being used by people with no such experience or understanding of software development processes, who follow it blindly as one big recipe and hence often use it incorrectly.

        For example, you see tons of situations where Agile’s short development cycles (sprints) and use cases are used without the crucial element: actually involving the end-users or stakeholders in defining the use cases, evaluating results, and even prioritizing what to do in the next sprint. So one of the crucial objectives of use cases is not achieved at all. That objective is discovering the detailed requirements through interactive cycles in which end-users quickly see some results and their feedback is used to fine-tune the work to match what they actually need, rather than the vague, very high-level idea they themselves have at the start of the project. Instead, the sprints are little more than small project milestones that in the old days would just have been entries in Microsoft Project or a similar tool.

        This is IMHO the “problem” with any advanced systematic process in a complex domain: it’s excellent in the hands of those with enough experience and understanding of concerns at all levels, but it’s generally used either by people without that experience (often because managers don’t even recognize the value of that experience until things unexpectedly blow up) or by actual managers, whose experience might be vast but runs on a parallel track that isn’t really about the kinds of technical concerns the process is designed to account for.

        • @RidcullyTheBrown@lemmy.world
          3 points · 1 year ago

          > it’s excellent in the hands of those with enough experience and understanding of concerns at all levels, but it’s generally used either by people without that experience

          You’re just saying that skilled people can do stuff better than unskilled people. This is hardly a software engineering issue; it’s common in all aspects of life.

          The difference with software engineering is that the field is still young enough not to have figured out a working model, or a set of working models (unlike farming or finance, for example). The field is realistically 30 years old, and it continuously evolves and redefines itself, so it’s not yet going to produce good working models.

          Agile, since you picked on it, is very difficult to implement because it specifically relies on engineering figuring out how to work and how to deliver. It’s really not a model at all. It’s just a set of guidelines meant to create the environment in which the figuring out happens. It’s no wonder that it only works when people have the ability to figure it out.

          • @Aceticon@lemmy.world
            2 points · 1 year ago

            I’m not saying whatever you’re trying to put in my mouth.

            In very, very, VERY simple terms: a software engineer with half the experience of somebody at the technical-architecture level isn’t half as capable a technical architect; such a person is pretty much totally incapable in that domain.

            Experience isn’t linear. It’s a sequence of unlocking and filling up experience in domains that are linked but have separate concerns, with broader and broader scopes that go way beyond mere coding. This non-linearity happens because it takes a while before people even become aware of how things outside their scope of work affect the level at which they work.

            So if you’re not yet aware that the end users of the software being developed have very vague and extremely incomplete ideas of what software they need to help them in their own business process, then you can’t even begin to see the point of certain practices around things like use cases. Still less can you judge the need for and suitability of Agile versus other development processes in a specific project and environment. You’re not at all qualified to decide which parts of it to do and which not to do in your specific project, or even whether Agile is the right choice.

            People who don’t even know about the forms of requirements gathering in different environments can’t begin to evaluate whether a process such as Agile suits their environment. Agile was designed mostly to address “fast-changing requirements” situations. Those come from various weaknesses in requirements gathering and/or fast-changing business needs, which on the development side snowball into massive problems when long-development-cycle processes such as waterfall are used. For example, supposedly “done” projects do not produce something that matches stakeholder needs and end up having to be “fixed” so late in the process that it massively disrupts the software at the design and even architectural level. That introduces major weaknesses and spaghettification into the code base, and hence bugs and maintainability nightmares.

    • @I_Has_A_Hat@lemmy.ml
      28 points · 1 year ago

      I work in a manufacturing company that was owned by the founder for 50 years until about 4 years ago when he retired. He disagreed with a lot of the ideas behind lean manufacturing so we had like 5 years worth of inventory sitting in our warehouse.

      When the new management came in, there was a lot of squawking about inefficiency, how wasteful it was to keep so much raw material on the shelf, and how we absolutely needed to sell it off or get rid of it.

      Then a funny little thing happened in 2020.

      Suddenly, we were the only company in our industry still churning out product. Other companies were calling us, desperate to buy our products or even just our raw material. We saw MASSIVE growth the next two years and came out of the pandemic better than ever. And it was mostly thanks to the old owners view that “Just In Time” manufacturing was BS.

      • @daq
        3 points · 1 year ago

        Cool story, but a once-every-150-years pandemic is hardly a good reason to keep wasting money on storing stuff. A fire or a flood was much more likely to wipe it all out in 50 years.

        Even in your anecdote, the owner never actually benefited from the extra costs.

        Depending on what you’re producing, the cost of maintaining extra inventory of raw materials can be massive, and for a company the size of Toyota, multiply that by a million.

        • @some_guy
          12 points · 1 year ago

          You’re not wrong, but the real answer is somewhere in between.

        • @Kurroth@lemmy.world
          2 points · 1 year ago

          > Even in your anecdote the owner never actually benefited from the extra costs.

          Imagine doing something, or having a life/business philosophy, that doesn’t exist for your own sole benefit and maybe exists for the benefit of others.

      • @afraid_of_zombies@lemmy.world
        0 points · 1 year ago

        Oh man. I work for an OEM and have almost the same story except the CEO didn’t retire so our shelves were never bare. A large part of our sales last year was because our competition couldn’t source parts and we could. Also since they have a skeleton engineering crew they couldn’t figure out how to improvise (at least this is what our salespeople are claiming).

        2022 I spent 3 hours a day, every single working day, just making ECNs for parts out of stock.

        Lean JIT is such fucking crap. When a customer line goes down they are bleeding who knows how many millions a day. They want a solution right freaken now. So they call the OEM and we tell them that part that died we have right here on the shelf and we can overnight it.

        But you know how you can really tell that JIT evangelical types are full of it? Talk to any of them now and they won’t say something like “look, normally it works, but no one could have predicted this”. Instead they will double down and say it works but was implemented incorrectly. It isn’t “true” JIT.

        Many years ago I read a sentence that I use every single week of my life: “all ideology can do is reassert itself endlessly”. When I hear crap like that, that X works if and only if it is true X, I know I am dealing with ideology. When I hear that X works under very specific conditions and only gives specific results, I know I am dealing with facts. Yes, if the customer is never in a rush, there isn’t a world-shaking event, and you are running out of storage capacity, JIT might be an answer.

  • AutoTL;DR (bot)
    37 points · 1 year ago

    This is the best summary I could come up with:


    TOKYO, Sept 6 (Reuters) - A malfunction that shut down all of Toyota Motor’s (7203.T) assembly plants in Japan for about a day last week occurred because some servers used to process parts orders became unavailable after maintenance procedures, the company said.

    The system halt followed an error due to insufficient disk space on some of the servers and was not caused by a cyberattack, the world’s largest automaker by sales said in a statement on Wednesday.

    “The system was restored after the data was transferred to a server with a larger capacity,” Toyota said.

    The issue occurred following regular maintenance work on the servers, the company said, adding that it would review its maintenance procedures.

    Two people with knowledge of the matter had told Reuters the malfunction occurred during an update of the automaker’s parts ordering system.

    Toyota restarted operations at its assembly plants in its home market on Wednesday last week, a day after the malfunction occurred.


    The original article contains 159 words, the summary contains 159 words. Saved 0%. I’m a bot and I’m open source!

  • blazera
    32 points · 1 year ago

    This is a fun read in the wake of learning about all the personal data car manufacturers have been collecting

  • R0cket_M00se
    22 points · 1 year ago

    Was this that full shutdown everyone thought was going to be malware?

    The worst malware of all, unsupervised junior sysadmins.

  • Carlos Solís
    6 points · 1 year ago

    And that’s why I have a weekly cronjob on my server to call BleachBit, remove cycled logs, and compress the images in my storage. Having to make do with a rather limited VPS for years taught me to be resourceful with what I had.
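
    Here is a rough Python sketch of the “remove cycled logs” step. It assumes rotated logs are named like `app.log.1` or `app.log.2.gz`, and it only deletes those older than a cutoff, never the live log. The function name, the name pattern and the 7-day default are assumptions. A weekly cron schedule such as `0 3 * * 0` could run it.

```python
import os
import re
import tempfile
import time
from pathlib import Path
from typing import List, Optional

# Assumed naming scheme: "app.log.1", "app.log.2.gz"; the live "app.log" never matches.
ROTATED = re.compile(r"\.log\.\d+(\.gz)?$")

def remove_rotated_logs(log_dir: str, max_age_days: float = 7.0,
                        now: Optional[float] = None) -> List[str]:
    """Delete rotated logs in log_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in Path(log_dir).iterdir():
        if path.is_file() and ROTATED.search(path.name) and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)

# Demo: an old rotated log, a fresh rotated log, and an old live log.
demo = Path(tempfile.mkdtemp())
month_ago = time.time() - 30 * 86400
for name in ["app.log", "app.log.1", "app.log.2.gz"]:
    (demo / name).write_text("x")
for name in ["app.log", "app.log.2.gz"]:
    os.utime(demo / name, (month_ago, month_ago))
print(remove_rotated_logs(str(demo)))  # ['app.log.2.gz']
```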

  • @RFBurns@lemmy.world
    link
    fedilink
    English
    51 year ago

    Storage has never been cheaper.

    There’s going to be a seppuku session in somebody’s IT department.