PHASE III CLINICAL TRIAL SAMPLE-SIZE DEMONSTRATION: To Help Non-Statisticians Understand Logistical Study-Size Considerations

Link to Demo Applet, Instructions, and Explanation


PAGE THAT RUNS DEMO APPLET (Click On To Run). Note: if your browser has the right version of the "Java Runtime Environment", this will open a new "load page" browser window with no useful program display, plus two resizable, movable interactive applet windows. These may take up to about half a minute to appear. (Note that the applet is not a Windows program, but a safe Java applet. Java is designed so that an applet run in a browser has no capability to affect your system, and no harm can be done.)

To turn off the applet (and get it off your screen), just close the "load page" browser window (not the browser window with this page), or else navigate that window to a different page (by typing a new URL, etc.).

To control clutter from too many windows on your screen, you can shrink or minimize the "load page". You need to keep both of the demo applet input/output windows open, but you can resize or move them as you see fit. Other browser windows, such as this instructions page, are not needed to run the program and can of course be closed or minimized.


If the applet windows do not appear, then you don't have Java on your browser, or the version is the Microsoft version, or it is too old. (I have tested the applet, and it ran without difficulty in both Internet Explorer and Netscape Navigator on both Windows XP and Windows 98, and also in Netscape Navigator and the Mozilla browser on Linux. It did not work with the Microsoft version of Java that came with my Internet Explorer browser.)

The simplest solution is to try switching browsers. In particular, Netscape Navigator uses Sun Java, and its installer either installs Sun Java automatically or gives you the option to install it.

If that doesn't work, here are ways to get the correct Java on your browser:

JAVA-DOWNLOAD-SEEKING PAGE. If the applet doesn't run from the link above because you don't have the Java Runtime Environment, or have an old version of it, this page may direct you to the Sun Java Runtime Environment download on most browsers.

If that doesn't work, go to this Sun Microsystems link. (Download just the Runtime Environment, not the Software Development Kit. If you have an old Microsoft version of Java, you DO NOT need to uninstall it.) You download a file, and then click on it to install it. You can trust what you download because it comes from Sun Microsystems. If you are running my applet from inside a corporate network, your I.T. department may have configured your P.C. so that you cannot install the Java Runtime Environment yourself. In that case, you can try the applet at home first and, if you find the program useful enough, have the I.T. staff come over for a few minutes to install the Java Runtime Environment. (The Runtime Environment, if you don't already have it, downloads and installs quickly over a high-speed internet connection. The Linux version needs a little manual set-up, described in the install documentation.)

For Macs, the download is provided by Apple, and is here. (It is version 1.4.1, which was the latest version when I looked. My applet was written with v. 1.4.1, so this should be fine.)

An alternative to downloading the Java Runtime Environment: You can always download Netscape Navigator, and accept the prompt during the install to install the browser with Java support. Then, when you visit my Applet page with Netscape Navigator, the Applet will work.

About the Java Runtime Environment: The Sun Java Runtime Environment is a program, specific to your computer type and operating system, that takes universal (computer-type- and operating-system-independent) Java "byte code" and translates it to run on your machine. The Environment runs Java byte code when you run a Java program on your machine, and also when you click on a Java byte-code "applet" on the Internet through a web browser. The Java Runtime Environment is designed not to allow any Java applet run from a browser to do anything harmful to your computer, which is why applets are allowed to run when you simply click on a web page.


The demonstration program is designed to help people running clinical studies, and other non-statisticians involved in planning drug and medical-device development, understand the effect of study sample-size decisions on drug- and device-approval timing. It is hoped that this will aid productive interaction between such people and statisticians.


Say a company is trying to demonstrate the efficacy of a new drug (or medical device) to regulatory authorities by conducting clinical studies. Typically, each study makes the demonstration by comparing the new drug or device (the "test" treatment) to either a known-effective treatment or a placebo. (We refer to whichever of these two is used as the "reference" treatment.)

The typical demonstration of efficacy involves comparing the true mean values of meaningful random variables between the treatments. Commonly used random variables are disease cure (made numerical by 1 = cure, 0 = no cure), disease improvement (made numerical the same way), disease-level score, diastolic blood pressure, etc. For a 0/1 variable such as cure, the true mean is simply the true cure rate. The focus is on the true mean for each treatment, rather than the mean observed in the study, to avoid false assertions due to the effect of chance on the study outcome. (It is reasonable to think of the true mean for each treatment as the mean you would get by conducting an impracticably large study with millions of patients on each treatment.)

The demonstration program, for simplicity, focuses on the particular case where the random variable being measured is two-valued, such as cure/no-cure or success/no-success. When the variable has more than two values, similar principles apply, but they are not explicitly discussed here or handled by the demo program.

In the case of clinical studies involving a known-effective reference treatment (rather than a placebo), if one wants to demonstrate equivalence of the new treatment to the reference, regulatory standards typically allow, for reasons of study economics, demonstration that the new treatment is no more than a given tolerance worse (e.g., a 5% or 10% lower cure rate) than the reference. In such a case, the typical method is to construct, from the study data, a 95% two-sided confidence interval on the difference (test-ref) in the true cure rates. If the confidence interval lies entirely above the negative of the tolerance, the demonstration is made. For example, if the tolerance is 10%, confidence intervals on the difference (test-ref) in true treatment-success proportions of (-.09,.04) or (.02,.24) would make the demonstration, since their lower ends are above -.10, while (-.17,-.02) or (-.13,.12) would fail to make the demonstration.
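The interval construction and the tolerance check can be sketched in Python. This is a minimal sketch, assuming the simple large-sample (Wald) interval for a difference of two proportions; the text says only that a large-sample interval is used, so the applet's exact formula may differ. The function names and the 170/200 versus 176/200 patient counts are hypothetical.

```python
from statistics import NormalDist


def wald_ci_diff(x_test, n_test, x_ref, n_ref, level=0.95):
    """Large-sample (Wald) two-sided CI for the difference p_test - p_ref.

    x_* are numbers of successes (e.g. cures), n_* are numbers of patients.
    """
    p_test = x_test / n_test
    p_ref = x_ref / n_ref
    diff = p_test - p_ref
    # Unpooled standard error of the observed difference.
    se = (p_test * (1 - p_test) / n_test + p_ref * (1 - p_ref) / n_ref) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff - z * se, diff + z * se


def noninferiority_shown(ci, tolerance):
    """The claim is made only if the whole interval lies above -tolerance."""
    lower, _upper = ci
    return lower > -tolerance


# The example intervals from the text, with a 10% tolerance.
for ci in [(-0.09, 0.04), (0.02, 0.24), (-0.17, -0.02), (-0.13, 0.12)]:
    print(ci, noninferiority_shown(ci, 0.10))

# Hypothetical study: 170/200 cures on test, 176/200 on reference.
ci = wald_ci_diff(170, 200, 176, 200)
print(tuple(round(b, 3) for b in ci), noninferiority_shown(ci, 0.10))
```

Run as-is, it reports True, True, False, False for the four example intervals from the text, and an interval of roughly (-0.097, 0.037) for the hypothetical study, which clears -0.10 and so makes the demonstration.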

A variation on the scheme for demonstrating equivalence to a known-effective treatment is demonstrating superiority to the known treatment by a given margin. Thus, to demonstrate superiority by at least 15%, confidence intervals of (.16,.34) or (.22,.44) would demonstrate the claim, but (.11,.23) or (-.01,.24) would not.

When the reference treatment is not a known-effective treatment, but placebo, then a bare-minimum demonstration of superiority over placebo is any confidence interval (test-ref) entirely above 0, such as (.02,.23) or (.24,.34). However, since that requirement allows just a minimal effect that may not be worth the treatment cost or the safety risk, one really would like to have a confidence interval a good margin above 0. What "good margin" means depends on all the specifics of the treatment risk and disease in question.
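All three criteria above come down to one rule: the lower end of the confidence interval on test-ref must lie above a threshold, which is the negative of the tolerance for equivalence, the margin for superiority to a known treatment, and 0 for bare superiority over placebo. A short Python sketch of that rule (the helper and claim names are hypothetical labels of my own), checked against the example intervals from the text:

```python
def threshold_for(claim, margin=0.0):
    """Number the CI on test-ref must lie entirely above (hypothetical helper)."""
    if claim == "equivalence":   # no more than `margin` worse than the reference
        return -margin
    if claim == "superiority":   # better than the reference by at least `margin`
        return margin
    if claim == "placebo":       # bare-minimum superiority over placebo
        return 0.0
    raise ValueError(f"unknown claim type: {claim}")


def claim_demonstrated(ci, claim, margin=0.0):
    lower, _upper = ci
    return lower > threshold_for(claim, margin)


examples = [
    ("equivalence", 0.10, [(-0.09, 0.04), (0.02, 0.24), (-0.17, -0.02), (-0.13, 0.12)]),
    ("superiority", 0.15, [(0.16, 0.34), (0.22, 0.44), (0.11, 0.23), (-0.01, 0.24)]),
    ("placebo", 0.0, [(0.02, 0.23), (0.24, 0.34)]),
]
for claim, margin, cis in examples:
    print(claim, [claim_demonstrated(ci, claim, margin) for ci in cis])
```

For a "good margin" above placebo, one can use the superiority rule with the desired margin.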

One should note that often, particularly in the placebo-reference case, the demonstration criterion is expressed in terms of statistical tests rather than confidence intervals. Frequently, though, the test approach turns out to be the confidence-interval approach in disguise, and is logically bound to produce exactly the same results every time.
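To see the disguise concretely, here is a small Python check, assuming the test is a one-sided z-test at the 2.5% level that uses the same unpooled large-sample standard error as the interval. Under that assumption, "p-value below 0.025" and "lower end of the two-sided 95% interval above the threshold" are algebraically the same condition. (Tests that pool the two arms to estimate the standard error, such as the usual chi-square test of equal proportions, can disagree with the interval in borderline cases.) The 40-patient arms and the -0.10 threshold are hypothetical choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_975 = STD_NORMAL.inv_cdf(0.975)  # about 1.96


def diff_and_se(x_t, n_t, x_r, n_r):
    """Observed difference test-ref and its unpooled standard error."""
    p_t, p_r = x_t / n_t, x_r / n_r
    se = (p_t * (1 - p_t) / n_t + p_r * (1 - p_r) / n_r) ** 0.5
    return p_t - p_r, se


def ci_says_yes(x_t, n_t, x_r, n_r, threshold):
    """Lower end of the two-sided 95% Wald interval lies above threshold."""
    d, se = diff_and_se(x_t, n_t, x_r, n_r)
    return d - Z_975 * se > threshold


def z_test_says_yes(x_t, n_t, x_r, n_r, threshold):
    """One-sided z-test of H0: true difference <= threshold, at the 2.5% level."""
    d, se = diff_and_se(x_t, n_t, x_r, n_r)
    if se == 0:  # degenerate case: all-or-nothing outcomes in both arms
        return d > threshold
    z = (d - threshold) / se
    p_value = 1 - STD_NORMAL.cdf(z)
    return p_value < 0.025


# Compare the two rules over every possible outcome of a 40-vs-40 study.
n = 40
disagreements = [
    (x_t, x_r)
    for x_t in range(n + 1)
    for x_r in range(n + 1)
    if ci_says_yes(x_t, n, x_r, n, -0.10) != z_test_says_yes(x_t, n, x_r, n, -0.10)
]
print(len(disagreements))
```

It prints 0: none of the 1,681 possible outcomes of the hypothetical 40-versus-40 study gets a different verdict from the two rules.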

Note that, for drugs (but not devices), two studies demonstrating efficacy are usually required by regulations. Thus, we need two separate studies with confidence intervals entirely above the appropriate limit.

Now, in planning Phase III clinical studies (where the idea is to prove an efficacy claim, not just to gather additional information), one primary consideration is to have enough patients that the study does not have a high likelihood of failing to make its desired demonstration, but not so many that the study becomes prohibitively costly or takes too long. Thus, we have a trade-off: more patients give a greater likelihood of allowing the claim to be made, but at a later time, at greater cost, and with more patients put through the discomforts of a study.
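The trade-off can be made concrete with the usual normal-approximation formula for the probability that the lower end of the interval clears the threshold, Φ((p_test - p_ref - threshold)/SE - z), where SE is computed from the assumed true rates. This Python sketch assumes equal arms and a Wald-type interval; it is an approximation of my own choosing, not necessarily the calculation inside the applet, and the 85% cure rates and 10% tolerance are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(p_test, p_ref, n_per_arm, threshold, level=0.95):
    """Normal-approximation probability that the lower end of the two-sided
    Wald interval on test-ref lies above `threshold`, for true rates
    p_test and p_ref (both strictly between 0 and 1) and equal arms."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    se = (p_test * (1 - p_test) / n_per_arm + p_ref * (1 - p_ref) / n_per_arm) ** 0.5
    return STD_NORMAL.cdf((p_test - p_ref - threshold) / se - z)


def smallest_n_per_arm(p_test, p_ref, threshold, target=0.80, n_max=100_000):
    """Smallest equal per-arm sample size reaching the target probability."""
    for n in range(2, n_max + 1):
        if approx_power(p_test, p_ref, n, threshold) >= target:
            return n
    return None  # target not reachable within n_max


# Hypothetical planning case: both true cure rates 85%, 10% tolerance.
for n in (50, 100, 200, 400):
    print(n, round(approx_power(0.85, 0.85, n, -0.10), 3))
print(smallest_n_per_arm(0.85, 0.85, -0.10))
```

The probability climbs from under 30% at 50 patients per arm to about 98% at 400, and the search returns 201 patients per arm as the smallest size giving at least an 80% chance for a single study.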

The demonstration applet demonstrates precisely this trade-off.


The program has two windows, which are both resizeable for convenience.

The first window, the calculation window, performs calculations appropriate to planning study sample size. These are typically performed, in one form or another, by statisticians planning sizes for studies attempting, in part, to demonstrate efficacy claims.

The second window, the simulation window, simulates conducting a study at a given sample size 100 or 500 times, and shows the claim-demonstration record of the simulated studies. Its purpose is to help non-statisticians gain a complete and sound understanding of the meaning of the numbers in the sample-size calculations.

This is a picture of the first window, the calculation window. (If you want the applet rather than the picture, and haven't run it, click on the link at the top of this page.)

The left portion of this window allows one to select the type of claim one is trying to make (which controls the number the study confidence interval will need to lie above), and each potential sample size and treatment ratio that one wants to look at. After making these selections, one clicks the "Calculate for Above" button, and the corresponding probabilities appear on the right portion of the window.

Each probability is the probability of the study demonstrating the claim given a pair of true treatment-success rates for the test and reference treatments. The study planner will not know the true success rates, but should have some idea of their likely range from previous studies and other general knowledge. Thus, the program allows one to highlight cells where one expects the true success rates to lie, by clicking on cells or dragging across a range of cells. (Doing this to cells that are already highlighted toggles them back to unhighlighted.) The screenshot shows some such cells, highlighted in orange. Note that the orange highlighting does nothing besides highlighting: it just helps the user keep track of the particular cells of interest while switching sample sizes and other conditions with the buttons on the left.
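A grid like the one in the calculation window can be sketched in Python with a standard large-sample approximation. This is a hypothetical reconstruction: the .05 spacing and the 50%-and-above test rates follow the description of the window, the .05 to .95 range of reference rates is my assumption, and the formula is one I am assuming rather than code taken from the applet. The 2:1 allocation in the example is invented for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_claim_made(p_test, p_ref, n_test, n_ref, threshold, level=0.95):
    """Large-sample probability that a study with these true success rates
    gives a two-sided Wald interval on test-ref lying entirely above threshold."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    se = (p_test * (1 - p_test) / n_test + p_ref * (1 - p_ref) / n_ref) ** 0.5
    if se == 0:  # both true rates are 0 or 1: the outcome is certain
        return 1.0 if p_test - p_ref > threshold else 0.0
    return STD_NORMAL.cdf((p_test - p_ref - threshold) / se - z)


def probability_table(n_test, n_ref, threshold):
    """Keys are (true test rate .50-.95, true reference rate .05-.95)."""
    test_rates = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    ref_rates = [round(0.05 + 0.05 * j, 2) for j in range(19)]
    return {
        (pt, pr): prob_claim_made(pt, pr, n_test, n_ref, threshold)
        for pt in test_rates
        for pr in ref_rates
    }


# Hypothetical case: superiority over placebo, 150 test vs 75 placebo patients (2:1).
table = probability_table(150, 75, 0.0)
for pr in (0.40, 0.50, 0.60):
    print(pr, [round(table[(pt, pr)], 2) for pt in (0.60, 0.70, 0.80)])
```

Each printed row holds the reference rate fixed and shows the probability rising as the true test rate moves further above it, which is the pattern a planner scans when highlighting a plausible range of cells.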

Only probabilities where the test treatment has a true success rate of 50% or above are displayed, to save space. (The reference treatment might be placebo, so I have kept reference true treatment-success rates below 50%.) Note also that the true treatment-success rates are tabulated in steps of .05. Since the probabilities vary smoothly, one can roughly interpolate mentally. In any case, the program is designed as a demonstration, and one can consult a statistician for more accurate in-between values.

A user should note that the conditions behind the displayed calculation should be read from the small green section below the calculated probabilities, rather than from the buttons on the left. This is because the buttons on the left can be changed, and will then not match the calculations until the "Calculate for Above" button is pressed again.

Now, the numbers shown in this window can be a little abstract, so we have reinforced them by allowing studies to be simulated, so that the resulting confidence intervals and claim-demonstration status can be examined both within individual simulated studies and in aggregate across them.

When one clicks on the "100 Simulations" or "500 Simulations" button, the situation whose calculated probabilities are shown (as described on the green panel below them, not necessarily by the buttons on the left) will be simulated 100 or 500 times. The user is first prompted to click on the cell representing the true test and reference success rates to use for the simulation. (This cell gets highlighted in green, as in the screenshot.) The computer then generates a result for each patient in each of the 100 or 500 simulated studies, and the per-study and overall results are displayed in the other window, the "simulation window".
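The simulation step can be sketched in Python: each simulated patient is a success with the true rate for their arm, each simulated study gets its own large-sample interval, and the studies are tallied. The function name, the fixed seed, the Wald form of the interval, and the example rates and sizes are hypothetical; how the applet itself generates its random results is not described here.

```python
import random
from statistics import NormalDist


def simulate_studies(p_test, p_ref, n_test, n_ref, threshold,
                     n_studies=500, level=0.95, seed=1):
    """Simulate n_studies studies patient by patient; return the proportion
    of studies demonstrating the claim and the proportion of intervals that
    contain the true difference."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    z = NormalDist().inv_cdf(0.5 + level / 2)
    true_diff = p_test - p_ref
    made = covered = 0
    for _ in range(n_studies):
        # Each patient is a success (True counts as 1) with their arm's true rate.
        x_t = sum(rng.random() < p_test for _ in range(n_test))
        x_r = sum(rng.random() < p_ref for _ in range(n_ref))
        ph_t, ph_r = x_t / n_test, x_r / n_ref
        d = ph_t - ph_r
        se = (ph_t * (1 - ph_t) / n_test + ph_r * (1 - ph_r) / n_ref) ** 0.5
        lower, upper = d - z * se, d + z * se
        made += lower > threshold
        covered += lower <= true_diff <= upper
    return made / n_studies, covered / n_studies


share_made, share_covered = simulate_studies(0.70, 0.50, 100, 100, 0.0)
print(f"claim made in {share_made:.1%} of studies; "
      f"CI covered the truth in {share_covered:.1%}")
```

With these example settings the claim rate should land near the roughly 84% that the large-sample formula gives for these rates and sizes, and the coverage near 95%, though the exact figures depend on the seed.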

This is a picture of the simulation window. You can scroll the simulation window up and down to look at all 100 or 500 simulated studies. We show the critical 95% confidence interval on the difference (test-ref) in true treatment-success rates, on which the determination of claim demonstration is based. We also indicate whether the demonstration has been made. As an aside, we indicate whether the confidence interval happens to contain the true difference. It should, of course, do so at least 95% of the time. (In fact, for lower sample sizes like 25, it can sometimes contain it somewhat less than 95% of the time, since, for ease of computation, I have used a large-sample-theory confidence interval. It is a valid observation that such confidence intervals possibly should not be used with smaller sample sizes, and a statistician may therefore use an interval that contains the true value closer to, or guaranteed at least, 95% of the time. Cytel Software's StatXact or its PROC StatXact SAS extension will give such a confidence interval.)
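Because each arm's success count is binomial, the true coverage of a large-sample interval can be computed exactly, without simulation, by adding up the probabilities of every possible study outcome whose interval contains the true difference. A Python sketch, assuming the Wald form of the interval (the applet's exact interval may differ), with hypothetical rates and sizes:

```python
from math import comb
from statistics import NormalDist


def binom_pmf(n, p):
    """Probabilities of 0..n successes among n patients with true rate p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def wald_coverage(p_test, p_ref, n_test, n_ref, level=0.95):
    """Exact probability that the two-sided Wald interval on test-ref contains
    the true difference, found by summing over every possible study outcome."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    true_diff = p_test - p_ref
    pmf_t, pmf_r = binom_pmf(n_test, p_test), binom_pmf(n_ref, p_ref)
    total = 0.0
    for x_t, w_t in enumerate(pmf_t):
        ph_t = x_t / n_test
        for x_r, w_r in enumerate(pmf_r):
            ph_r = x_r / n_ref
            d = ph_t - ph_r
            se = (ph_t * (1 - ph_t) / n_test + ph_r * (1 - ph_r) / n_ref) ** 0.5
            if d - z * se <= true_diff <= d + z * se:
                total += w_t * w_r
    return total


print(round(wald_coverage(0.97, 0.90, 25, 25), 3))    # small study, rates near 1
print(round(wald_coverage(0.50, 0.50, 400, 400), 3))  # large study, mid-range rates
```

For the small study with rates near 1, the coverage comes out below 95%: the two outcomes where the reference arm has all 25 successes and the test arm has 24 or 25 already miss the true difference, with combined probability of about 0.06. For 400 patients per arm at 50%, the coverage comes out close to 95%.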

The window also gives, at the bottom, the number and proportion of simulated studies in which the claim was demonstrated. The proportion should be close to the probability in the calculation window, particularly with 500 simulations. For small sample sizes like 25 or 50, it is sometimes a little further off than one might like. This is because I have used large-sample methods in the calculations (again, for ease of computation). It is again a valid criticism that with small sample sizes, my numbers may not be good enough. (That is, with up to 2 million dollars a day riding on drug-approval timing, resources should be applied to get a more accurate number for determining sample size. Once again, StatXact or PROC StatXact will supply the exact number.)
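Part of any gap is ordinary simulation noise: the simulated proportion is itself a binomial proportion over the N simulated studies, with standard error sqrt(p(1-p)/N), where p is the true probability of demonstrating the claim. A short Python sketch (the 80% probability is a hypothetical example) shows why 500 simulations track the calculated probability more closely than 100:

```python
def monte_carlo_se(prob, n_simulations):
    """Standard error of a simulated proportion when the true probability is prob."""
    return (prob * (1 - prob) / n_simulations) ** 0.5


for n_sims in (100, 500):
    se = monte_carlo_se(0.80, n_sims)
    # Roughly 95% of simulation runs should land within about two standard errors.
    print(n_sims, "simulations: SE", round(se, 3),
          "typical range", round(0.80 - 2 * se, 3), "to", round(0.80 + 2 * se, 3))
```

The typical range is about ±8 percentage points with 100 simulations but only about ±3.6 with 500, so some discrepancy is expected even when the calculated probability is exact.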

One should note that, just as the situation behind the probabilities shown in the calculation window can differ from that on the buttons on the left of that window, the situation shown in the simulation window can differ from the one in the calculation window. This happens when one runs a simulation, then goes back to the calculation window and calculates probabilities for a new situation. Thus, one should rely on the labelling in the simulation window to know which situation was simulated.


For drugs, typically two claim-demonstrating studies are required for each drug indication. The technique above deals with the probability of each individual study supporting the claim, but the probability of both studies supporting the claim is easy to derive from it.
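Given the true success rates, the two studies' results are independent, so the probability that both demonstrate the claim is the product of the two single-study probabilities. A tiny Python sketch (the 80% figure is hypothetical). One caution: the true rates are unknown and common to both studies, so when averaging over a range of plausible rates one should multiply cell by cell first and then average, rather than squaring an averaged probability.

```python
from math import sqrt


def prob_both_studies_succeed(p_study1, p_study2):
    """Given the true success rates, the two studies' outcomes are independent,
    so the chance that both demonstrate the claim is the product."""
    return p_study1 * p_study2


def per_study_prob_needed(target_both):
    """Per-study probability needed (identical studies) to reach target_both."""
    return sqrt(target_both)


print(round(prob_both_studies_succeed(0.80, 0.80), 2))  # 0.64
print(round(per_study_prob_needed(0.80), 3))             # 0.894
```

So two studies each with an 80% chance give only a 64% chance that both succeed, and each study needs roughly an 89% chance for the pair to reach 80%.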

Support of the claim of efficacy is not the only goal of the study. The study must also support the safety of the test treatment, which is usually assessed with a less formal criterion than efficacy. Thus, besides supporting efficacy, the sample size needs to be large enough for a sufficiently good check on safety. Study planners need to take this into consideration.

To the extent that the goals of a study can be expressed precisely in financial terms (maximum profit, or perhaps some good combination of profit and cash-flow stability, within ethical constraints), there is a problem getting complete and precise financial numbers, stemming from more than just the not-exactly-known true treatment-success rates. Though we may have a good estimate of profit if the studies support the efficacy claim (through projected sales), what happens if one or more studies miss the claim is somewhat vague. In particular, additional studies may or may not be done, depending perhaps on the additional information about the true success rates obtained from the failed studies. So exact numbers cannot be determined. But approximate numbers, still crucial to planning, can usually be determined.

Non-binary measures: For categorical and continuous variables, the same kind of sample sizing can be done, though there is a usually manageable complication due to the additional statistical parameters. In many of these cases, exact confidence intervals will not be an option, at least with commonly available software. For calculating the probability of demonstrating the claim, if one wants the most precise numbers, in many cases they will not come from standard software, and one may have to do custom simulations.

Parsimonious analysis techniques: In the important case where less elementary analysis techniques, such as the various forms of regression or stratification, are used for the sake of parsimony, things do get a bit more difficult. There are more unknown parameters, and certain things may happen, such as lack of fit of the parsimonious model forcing a fall-back to a less parsimonious technique. It's more complicated, but not intractable. It takes resources to analyze this kind of thing, and standard software will not usually do it. But with up to 2 million dollars a day at stake, it may be appropriate to apply the resources.

One-sided Confidence Intervals: A common criterion uses 95% two-sided confidence intervals, as in my demonstration. However, one-sided 95% confidence intervals on test-ref (unbounded above) are O.K., if the regulator accepts them. Particularly when two studies are required, the chance of a false demonstration is quite low.
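The difference between the two conventions is just the normal quantile used for the lower bound: about 1.960 for a two-sided 95% interval and about 1.645 for a one-sided 95% bound. The Python sketch below, with a hypothetical observed difference and standard error, also illustrates why two required studies keep false demonstrations rare. Under large-sample theory, if the true difference sits exactly at the threshold, each study falsely clears it about 2.5% (two-sided) or 5% (one-sided) of the time, and two independent studies must both do so.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lower_bound(diff, se, level=0.95, one_sided=False):
    """Lower confidence bound on test-ref; a one-sided bound is unbounded above."""
    alpha = 1 - level
    z = STD_NORMAL.inv_cdf(1 - alpha) if one_sided else STD_NORMAL.inv_cdf(1 - alpha / 2)
    return diff - z * se


# Hypothetical study result: observed difference -0.03 with standard error 0.034.
print(round(lower_bound(-0.03, 0.034), 4))                  # two-sided 95%, z about 1.960
print(round(lower_bound(-0.03, 0.034, one_sided=True), 4))  # one-sided 95%, z about 1.645

# Chance of a false demonstration when the true difference sits exactly at the
# threshold, per study and for two independent studies.
for label, per_study in (("two-sided 95%", 0.025), ("one-sided 95%", 0.05)):
    print(f"{label}: {per_study:.1%} per study, {per_study ** 2:.4%} for both studies")
```

The one-sided bound is always the higher of the two, which makes the claim easier to demonstrate, while the two-study requirement still holds the false-demonstration chance to a fraction of a percent.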

Phase I bioequivalence: Since the criterion for demonstrating bioequivalence is usually based on confidence intervals clearing tolerance limits, the same basic method applies to these studies. Due to the smaller sample sizes, one may prefer more precise calculation methods that do not rely on large-sample theory or assume a normal distribution.

Phase II non-demonstrational: For studies where the goal is additional information about a drug, rather than passing a well-defined approval criterion for some aspect of efficacy, the methods here do not apply. The correct approach in this case is value of information: larger sample sizes cost more and take longer to supply complete information, but give more information. These consequences translate into an impact on expected financial gain for the various possible sample sizes. The resulting financial gains may be difficult to foresee precisely, however, so some may prefer to look at something like a typical or expected single-study confidence-interval width, or an expected meta-analysis confidence-interval width. Such a confidence-interval width indicates something about the amount of information the study will provide, leaving the user somewhat able to reason intuitively about the value of that information. Note that Phase II sample sizing and results interpretation are part of the larger set of techniques that pharmaceutical companies use to guide critical development decisions, from biological and chemical exploration through pre-clinical studies onward. These techniques can be mathematically complex, must be honed to reflect reality accurately, and are part of the critical proprietary set of tools and methods that help make some pharmaceutical corporations extremely successful. Such tools do not often make it into the literature or the conferences!
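For the confidence-interval-width approach, a planning value for a single study's interval half-width can be computed from assumed true rates. A Python sketch, assuming the large-sample interval and hypothetical 50% rates in both arms:

```python
from statistics import NormalDist


def planned_half_width(p_test, p_ref, n_test, n_ref, level=0.95):
    """Approximate half-width of the Wald interval on test-ref if the observed
    rates came out equal to the assumed true rates."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * (p_test * (1 - p_test) / n_test + p_ref * (1 - p_ref) / n_ref) ** 0.5


for n in (50, 100, 200, 400):
    print(n, round(planned_half_width(0.5, 0.5, n, n), 3))
```

It gives about ±0.139 at 100 patients per arm and about ±0.069 at 400: quadrupling the sample size only halves the width.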

Marketing / Direct Marketing / Pricing: Extending our consideration beyond the pharmaceutical industry, it is not uncommon to do randomized or non-randomized studies that experiment with different approaches or prices. The method demonstrated by my demonstration program does not apply to these. What such situations have in common (perhaps with all business logistical choices) is the choice of sample size to maximize (expected) profit, within ethical constraints. In such cases, the approach to determining the size and design of the study, or even whether to do any study, is often expressed through the expected (monetary) value of the information (as in the case of Phase II clinical studies not intended for demonstration to regulators). This expected value cannot be known exactly, but one can often get a good idea of it and plan good studies using analytical techniques.

Contact info:
We welcome comments, discussion, criticisms, browser/system compatibility issues, and (hopefully not) reports of genuine bugs.

GO TO MAIN PAGE OF OUR COMPANY (N.A.S. Technical Services, Inc. We provide statistical and statistical programming services. Call: 203-794-9027)

LITTLE SIDE PROJECT DONE FOR FUN: People interested in acoustics, or who enjoy classical music, might get a kick out of my Music Spectrograms, which I put together for fun some time back.