Intel finally provides a clearer overview of Raptor Lake instability issues
No recall, no fix for affected CPUs, investigation still ongoing.
Intel has responded to press inquiries about the instability issues affecting 13th and 14th Gen Core processors. Questions from The Verge to Intel were prompted by an earlier, rather underwhelming statement that did not […]
Moore’s Law is Dead shared an interesting video yesterday about these chips. Supposedly, leaks from his sources at Intel say that high voltages being pushed through the ring bus cause degradation. The leaks claim the ring bus shares the same power rail as the P and E cores, meaning its voltage is influenced by the voltage requested by the cores.
For context, the ring bus is responsible for communication between cores, peripherals, and the platform. This includes memory accesses, which means that if the ring bus fails and does something incorrectly, it could appear normal but result in errors far down the line.
Going beyond the video specifically, and considering what others have suggested as workarounds, it seems like ring bus degradation might be a decent candidate for the actual root cause of these issues.
Some observations around chips degrading were:
High memory pressure exacerbates the issue.
Chips with more cores deteriorate faster.
Some of the suggestions to work around the issue were:
Lower the memory speed.
Lower the voltage and clock speeds.
Disable E cores.
All of those can be related to stress being put on the ring bus:
Higher voltage being put through the bus -> higher likelihood of physical damage
More memory pressure -> more usage of the bus, more opportunity for damage to accumulate
More cores -> more memory pressure
Slower memory speeds -> less maximum throughput -> less stress
I’m not claiming anything definitive, but I think my money is on this one.
The scariest part of this whole problem is that there is no way for owners of 13th/14th Gen CPUs to figure out to what extent their CPU is damaged. It’s like holding a ticking bomb without knowing when it will go off!
100%. Whatever Intel does at this point, I don’t trust it to be a fix so much as a mitigation or attempt to delay the inevitable until a few years after the warranty period.
If it’s possible for people to return their 13th/14th gen processor and trade up for a 12th gen, that would be the safest solution.
I’ve heard speculation that this is exacerbated by a feature where the CPU increases the voltage to boost clocks when running single-core workloads at low temperatures. If that’s true, having less load or better cooling may be detrimental to the life of the processor.
Thanks for the additional details.