# [rc5] Re: OS/2 ramdisk and spindown

Skip Huffman SHuffman at Atl.Carreker.Com
Mon Oct 13 10:43:40 EDT 1997

On Sun, 12 Oct 1997 01:17:25 -0400 (EDT), James Mastros wrote:

>> MTBF means that if all the drives of that model were running at the same
>> time, and the total time they were running were added to each other they
>> would run for X hours combined!!!  before one of them failed.
>If you want to be techinical, MTBF meens Mean Time Between Failures or Mean
>Time Before Falure (one or more of those "Me[ae]n"s are misspelled, I
>know...).  That is to say, if you ran a infinite number of hard drives for
>that amount of time, exactly half of them should have failed.

Actually, you are both confused.  The first is a Cumulative Time Before Failure, which seems like a pretty useless number to me.  The second is a Median Time Before Failure: the point where half of the drives have failed.

What is specified is a Mean Time Before Failure.  This number is derived by totaling the running time to failure of all the drives in the sample and dividing by the number of drives in the sample.  How this can be measured with drive lifetimes of years and product cycles of months is beyond me, but that is what MTBF means.

This is extremely basic statistics.  There are three primary types of averages: Mean, Median, and Mode.  A Mean average is derived by adding the measurements of all items in a sample and then dividing the total by the number of items in the sample.  A Median average is the point where half of the items in a sample measure below that point and half measure above.  A Mode average is the measurement that occurs most often in the sample.
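As a quick illustration of the three definitions, here is a minimal Python sketch using the standard `statistics` module.  The nine lifetimes (in thousands of hours) are made up purely for illustration and are not taken from any real drive.

```python
import statistics

# Hypothetical drive lifetimes, in thousands of hours (invented for this sketch).
lifetimes = [5, 5, 5, 15, 25, 115, 125, 125, 135]

mean_hours = statistics.mean(lifetimes)      # total of all lifetimes / number of drives
median_hours = statistics.median(lifetimes)  # middle value once the lifetimes are sorted
mode_hours = statistics.mode(lifetimes)      # the lifetime that occurs most often

print(f"mean   = {mean_hours:.1f}K hours")
print(f"median = {median_hours}K hours")
print(f"mode   = {mode_hours}K hours")
```

Even in this tiny sample the three "averages" land far apart, which is the whole point.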

An example:

Let's say you have 100 hard drives that you have driven to destruction to find the average time to failure.  Let's say these are your results.

Time in hours     Number of drives failed
-------------     -----------------------
<10K              30
10K-20K           15
20K-30K           8
40K-50K           4
50K-60K           0
70K-80K           0
80K-90K           1
90K-100K          2
100K-110K         4
110K-120K         8
120K-130K         16
130K-140K         12

The numbers are completely artificial, but the distribution should be about right.

The averages are as follows:

Mode: Under 10,000 hours.  More of these drives failed during burn-in testing, or just after delivery, than in any other interval.

Median: Just under 30,000 hours.  Half of these drives failed before they reached 30,000 hours of use.

Mean:  62,000 hours.  All the hours of life of all the drives added together and divided by the number of drives.  (Also the best number to use when marketing the drives.)
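For anyone who wants to check the arithmetic, the three averages can be sketched from the table in a few lines of Python.  Two assumptions here are mine, not part of the table: the "<10K" row is treated as 0 to 10K, and every drive is assumed to fail at the midpoint of its interval.  Under those assumptions the mean comes out near 60,000 hours, in the same neighbourhood as the rough figure quoted above; the exact value depends on where within each interval the failures are assumed to fall.

```python
# (lower bound, upper bound, drives failed), bounds in thousands of hours.
# Rows copied from the table; treating "<10K" as 0-10K is an assumption.
BINS = [
    (0, 10, 30), (10, 20, 15), (20, 30, 8), (40, 50, 4), (50, 60, 0),
    (70, 80, 0), (80, 90, 1), (90, 100, 2), (100, 110, 4), (110, 120, 8),
    (120, 130, 16), (130, 140, 12),
]

def summarize(bins):
    """Return (sample size, modal interval, median interval, midpoint mean)."""
    total = sum(count for _, _, count in bins)

    # Mode: the interval with the largest number of failures.
    modal = max(bins, key=lambda b: b[2])

    # Median: the first interval where the running count reaches half the sample.
    median_bin = None
    running = 0
    for lo, hi, count in bins:
        running += count
        if running * 2 >= total:
            median_bin = (lo, hi)
            break

    # Mean: assumes every drive failed at the midpoint of its interval.
    mean = sum((lo + hi) / 2 * count for lo, hi, count in bins) / total
    return total, (modal[0], modal[1]), median_bin, mean

total, modal_bin, median_bin, mean_k = summarize(BINS)
print(total, modal_bin, median_bin, round(mean_k, 1))
```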

Now, by fudging the sample a little, significantly different results can apply.  Let's assume that you run a 10,000 hour burn-in on all your drives and reject the failed ones.  Now that first 10,000 hours need not apply; they are part of the manufacturing process.  Your sample is now 70 drives and here are the averages.

Your Mode is now an impressive 130,000 hours.  Most of your drives are failing at the high end of their possible lifetimes.

Your Median is now a bit under 120,000 hours.  Amazing what eliminating the truly defective drives does for you.

The Mean is only up to 88,000 hours though.  Those drives that you eliminated really were not contributing much to the total running hours of the sample.
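The effect of the burn-in can be sketched the same way.  The snippet below is only an illustration under assumptions of my own: the "<10K" row is read as 0 to 10K, and `binned_mean` is a hypothetical helper that places every failure at a chosen position inside its interval (0.5 for the midpoint, 1.0 for the upper edge).  The mean of the 70 shipped drives then falls somewhere between roughly 83,000 and 88,000 hours, and the rejected drives turn out to account for only a few percent of the total running hours.

```python
# (lower bound, upper bound, drives failed), bounds in thousands of hours.
# Rows copied from the table; treating "<10K" as 0-10K is an assumption.
BINS = [
    (0, 10, 30), (10, 20, 15), (20, 30, 8), (40, 50, 4), (50, 60, 0),
    (70, 80, 0), (80, 90, 1), (90, 100, 2), (100, 110, 4), (110, 120, 8),
    (120, 130, 16), (130, 140, 12),
]

def binned_hours(bins, position):
    """Total running hours, placing each failure at lo + position * (hi - lo)."""
    return sum((lo + position * (hi - lo)) * count for lo, hi, count in bins)

def binned_mean(bins, position):
    """Mean lifetime under the same placement assumption."""
    return binned_hours(bins, position) / sum(count for _, _, count in bins)

# Drop the drives that failed during the 10K-hour burn-in.
survivors = [b for b in BINS if b[0] >= 10]
rejected = [b for b in BINS if b[0] < 10]

shipped = sum(count for _, _, count in survivors)
modal_bin = max(survivors, key=lambda b: b[2])[:2]
mean_mid = binned_mean(survivors, 0.5)    # failures at interval midpoints
mean_upper = binned_mean(survivors, 1.0)  # failures at interval upper edges

# Share of all running hours contributed by the rejected drives (midpoints).
rejected_share = binned_hours(rejected, 0.5) / binned_hours(BINS, 0.5)

print(shipped, modal_bin, round(mean_mid, 1), round(mean_upper, 1))
print(f"rejected drives: {rejected_share:.1%} of total running hours")
```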

Always remember what Samuel Clemens said:

"There are three kinds of lies: lies, damned lies, and statistics."

Sorry for the lecture, but statistics are so misused in advertising that I can't help myself.

Prof. Skip
