Sunday, June 9, 2013

100 shades of product quality

Sometimes we ask ourselves questions concerning the differences and compatibility between different models of the same product.

Underlying those questions is the reason such differences exist at all. By explaining the common-sense approaches any manufacturer would take, I hope that programmers who write device drivers for operating systems will explore the possibilities that could be exploited to optimize operating system performance.

Product Binning

In the 1980s/90s, programmers were rather familiar with floppy disks. There were various grades of floppy disks. As an approximate illustration:

Bin 1 = double-sided 1.44 MB x 2 = 2.88 MB
Bin 2 = double-sided 720 KB x 2 = 1.44 MB
Bin 3 = single-sided 1.44 MB
Bin 4 = double-sided 360 KB x 2 = 720 KB
Bin 5 = single-sided 720 KB
Bin 6 = single-sided 360 KB
Bin 9 = Scrap, no or negative commercial value.

Did you know that all those commercial grades would come out of the same production line? At an end-of-line operation, each batch of disks would be sample tested and binned accordingly.

Binning is the grading of units from the same production line by performance, so that each grade can be branded and priced to optimize the profit margins of the product line. Why would we waste financially accountable resources by setting up 16 different manufacturing lines for the same architecture, just to satisfy 16 different branded products?

For example, various Yamaha keyboards would actually have the same core, where an end-of-line operation would lock certain features of the core before the units proceeded to external brand packaging. I remember a piano teacher/Yamaha salesperson teaching me how to buy a cheaper model and then unlock certain features.

CPU Binning

Semiconductor devices, memories and CPUs go through similar unified manufacturing process flows and end-of-line differentiation. Various models of CPUs of the same architecture would be manufactured using one single manufacturing process flow. Auxiliary differentiation of processes would depend on the contracts to be fulfilled, e.g., the type of packaging specified in the contracts. Those contracts could be agreements that manufacturing has with internal marketing divisions of the company or with external OEMs.

As the wafers roll out from the same fabrication process stream, the individual ICs would be tested and graded/binned according to performance. The features on which performance is graded could include:
  • CPU speed
  • bus speed
  • cache memory defects
  • CPU defects
  • ALU defects
  • GPU defects
  • various instruction functionality
A unit that fails the GPU test would be graded into a bin whose CPUs are branded and sold as having no GPU.

A quad core CPU failing one core would be binned as a three-core processor.

A CPU failing a set of instructions could be binned into a product contract that does not specify that set of instructions.
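The grading rules above can be sketched as a small function. This is only an illustration: the `DieTestResult` fields, the speed thresholds and the product labels are all made up for the example and do not describe any real manufacturer's test program.

```python
from dataclasses import dataclass


@dataclass
class DieTestResult:
    """End-of-line test outcome for one die (all fields are hypothetical)."""
    max_stable_ghz: float          # highest clock that passed the speed test
    working_cores: int             # cores that passed, out of 4
    gpu_ok: bool                   # integrated GPU passed its test
    failed_instruction_sets: frozenset = frozenset()


def grade(result: DieTestResult) -> str:
    """Map a test result to a made-up product label."""
    # Too slow or too few good cores: no commercial value.
    if result.working_cores < 2 or result.max_stable_ghz < 2.0:
        return "scrap"
    if result.working_cores == 4:
        cores = "quad"
    elif result.working_cores == 3:
        cores = "tri"       # quad core with one failed core
    else:
        cores = "dual"
    speed = "3.2GHz" if result.max_stable_ghz >= 3.2 else "2.0GHz"
    gpu = "gpu" if result.gpu_ok else "no-gpu"
    label = f"{cores}-{speed}-{gpu}"
    # Units with a failed instruction set go to contracts that do not need it.
    if result.failed_instruction_sets:
        label += "-restricted"
    return label


print(grade(DieTestResult(3.4, 3, False)))
```

With these invented criteria there are 3 x 2 x 2 x 2 = 24 sellable labels plus scrap, which is in the same range as the bin counts guessed at in this post.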

The combination of all these features should give a particular product line at least 16 bins if not 32 bins. Honestly, I don't really know.

Regrading Binnings

I recall that I once had to sign off on down-grading a million dollars' worth of CPUs by executing an administrative -3 bin regrading. Those CPUs had tested flawless as bin 1. However, the contract was flawed. The contract called for bin 4 CPUs. We had waited a few months for bin 4s to accumulate, but the process stream was so mature that only bin 1 CPUs were being produced. And the contract deadline was the following week.

That was a costly contract, and I recall that six months later the product line had stopped offering any grade but bin 1 and bin 2 CPUs.
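The administrative down-grade in that story can be sketched as an allocation rule: fill the contract from the bin it names, then from progressively better bins. The function name `fill_contract` and the inventory layout (a dict of bin number to unit count) are assumptions for this illustration, not how any real planning system works.

```python
def fill_contract(inventory, contract_bin, quantity):
    """Return {bin: units_taken} for a contract that asks for contract_bin.

    Units are taken from contract_bin first, then from better
    (lower-numbered) bins, which are administratively down-graded.
    The inventory dict is not modified. Raises ValueError if short.
    """
    taken = {}
    remaining = quantity
    for b in range(contract_bin, 0, -1):   # contract_bin, ..., 2, 1
        use = min(inventory.get(b, 0), remaining)
        if use:
            taken[b] = use
            remaining -= use
        if remaining == 0:
            return taken
    raise ValueError(f"short by {remaining} units")


# A mature process that only produces bin 1, and a contract for bin 4:
print(fill_contract({1: 1000}, contract_bin=4, quantity=600))
```

Every unit taken from a bin better than the one contracted is sold below the grade it tested at, which is where the cost of such a contract comes from.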

I recall my team having to retest and regrade a great number of units from bin 2 to bin 1 because the company had revised the specifications of the product line. At that time, I recall thinking, "Is that even possible? What manner of sorcery did the upper echelons of the company do to pull that one off?"

Overclocking

Why is it even possible to over-clock the CPU that you bought? I mean, if it says 3.2 GHz, it should sit on its butt at 3.2 GHz and should not be able to run at higher speeds, right?

That is, setting aside the phenomenon where we can run a device at higher speeds if we spend enough money to drastically cool it during operation.

Even as low-level industrialists, we are obliged to continuously improve the quality and performance of products by continuously adapting our processes. Our main goal is to reduce the number of units that land in the poorer grades. The lower the bin number, the better the product. The higher the bin number, the lousier the bin.

However, natural statistical phenomena have it that process improvements affect the whole spectrum of a product line. To eliminate bin 13 onwards, we might have to improve quality enough to move the process median from bin 10 to bin 5. By median bin I mean the bin into which the largest number of units is graded (strictly speaking, the mode).

Let us say that the following is an illustration of a simplistic one-dimensional binning distribution of a product line, where bin 99 = scrap value and we have 98% yield.



We could improve profit performance by improving the processes so that the median bin moves from bin 10 to bin 6.




Most likely the process improvement could result in the following bin distribution.



Here, bins a1 to a4 lie beyond bin 1 and are treated as part of bin 1.

Therefore, if bin 1 = 3.2 GHz performance, bin a1 could be 3.5 GHz and bin a4 could be 4.8 GHz. But we would not know, because we will not spend resources to test beyond brand specifications. We might perform an extremity test at 5 GHz, but that is actually to qualify the CPU for 3.2 GHz, and it is performed at unrealistically low ambient temperatures.

Therefore, people who buy an extreme version certified at 3.2 GHz could actually clock it at 4.8 GHz, if they are lucky enough to land on one of those rare outliers.
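The effect described in this section can be sketched numerically. The model below is an assumption made purely for illustration: unit quality is treated as a normally distributed score on the bin scale, with bin k covering scores from k - 0.5 to k + 0.5, and the spread of 3 bins is invented. The function name `tail_fractions` is hypothetical as well.

```python
from statistics import NormalDist


def tail_fractions(center_bin, spread=3.0, worst_wanted_bin=12):
    """Return (fraction better than the bin 1 spec,
               fraction worse than worst_wanted_bin)
    for a normal model centred on center_bin."""
    dist = NormalDist(mu=center_bin, sigma=spread)
    # Scores below 0.5 outperform the bin 1 specification (bins a1, a2, ...).
    better_than_bin1 = dist.cdf(0.5)
    # Scores above worst_wanted_bin + 0.5 fall into bin 13 onwards.
    worse_than_wanted = 1.0 - dist.cdf(worst_wanted_bin + 0.5)
    return better_than_bin1, worse_than_wanted


before = tail_fractions(10)   # process centred on bin 10
after = tail_fractions(5)     # improved process centred on bin 5
print(f"centre 10: {before[0]:.2%} beyond bin 1, {before[1]:.2%} in bin 13+")
print(f"centre  5: {after[0]:.2%} beyond bin 1, {after[1]:.2%} in bin 13+")
```

Under these assumptions, moving the centre from bin 10 to bin 5 cuts the share of units in bin 13 onwards from roughly 20% to under 1%, while the share that outperforms the bin 1 specification grows from under 0.1% to roughly 7%. Those are the units that get sold as bin 1 with untested headroom.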


The OEM

An OEM may present us with a contract to test a set of characteristics. The OEM would specify that the units be left unmarked and unbranded.

Perhaps the OEM is counting on the possibility of catching outliers. Perhaps the OEM suspects that we are at times forced to administratively down-grade bin 3 CPUs to bin 6 to satisfy bin 6 contracts, because our process improvement is so good that bin 6 units no longer exist in large quantities.

Perhaps the OEM is writing a BIOS that can circumvent and tolerate a minimal instruction failure in one of the cores of a 4-core CPU. The BIOS might be intelligent enough that no core has to be shut down, so the CPU still operates as a quad core.

Perhaps, if one of the quadrants of the 256 MB cache fails, they would not burn off access to the whole quadrant, which would down-grade it to a 128 MB cache. Instead, their BIOS would run it at 240 MB by ignoring the bad 16 MB cluster.

I don't know, as these are merely my musings about what OEMs do with the CPUs they contract for.
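The arithmetic behind the cache musing above can be sketched as follows. This is a toy model, not a description of how any real BIOS or firmware handles cache defects; the function name `usable_cache_mb` and the idea of addressing the cache as equal-sized clusters are assumptions for the example.

```python
def usable_cache_mb(total_mb, cluster_mb, bad_clusters):
    """Return (usable_mb, good_cluster_indices) when only the bad
    clusters are skipped instead of disabling a larger region."""
    n_clusters = total_mb // cluster_mb
    bad = set(bad_clusters)
    if any(c < 0 or c >= n_clusters for c in bad):
        raise ValueError("bad cluster index out of range")
    good = [c for c in range(n_clusters) if c not in bad]
    return len(good) * cluster_mb, good


# 256 MB cache made of sixteen 16 MB clusters, with cluster 5 defective:
usable, good = usable_cache_mb(256, 16, [5])
print(usable)   # 240
```

Skipping only the one defective cluster keeps 15 of the 16 clusters in service, which is the 240 MB figure mentioned above.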

