Continuitis, Dichotomania and the Tetrachoric Coefficient of Correlation
Balancing Act: Guernsey Scrapes the Barrel
OK, so you might think that
in this age of air travel, channel tunnels and the like, this is not the
cleverest of commercial developments, since there already exists a well-proven
non-invasive solution to the problem of sea-sickness: I mean, of course, not
travelling by sea. However, some of us, in this age of global warming, are
preparing for the era of rising sea-levels and, having noticed that the typical
altitude of the world's airports is pretty pathetic, are drawing our own
conclusions. And just imagine what will happen to the Chunnel once the
sea-level reaches the height of Shakespeare Cliffs. We are preparing for the
day when the capital of
But along with
perspicacity, foresight and self-effacing modesty, I count philanthropy as one
of my many virtues and so I am prepared to give the world details not only of
my cure for sea-sickness but also of my method of proving that it works.
The cure is salt and
water. The proof of the efficacy will follow in due course but first note the
cunning convenience of the solution (H₂O, Na⁺, Cl⁻). The cure is none other
than sea-water itself and thus the more that global warming leads to rising
sea-levels and hence to the spread of sea-travel with the attendant increase in
the potential for that human misery known as mal-de-mer, the more the
prophylactic material will be to hand. It is true that the general increase in
sea-level will be due to the melting of the Antarctic and Greenland ice-caps
and thus an increase in sea-level will be accompanied by a general dilution of
the active ingredient (sodium chloride) but the rough calculations that I have
performed have shown that this effect may be ignored as unimportant.
The efficacy of
saltwater as a cure for seasickness has been established in a fully randomised
double blind parallel group trial in a single centre (Elixir Laboratories of
Pannostrum Pharmaceuticals) using cups of tea (English Breakfast at a strength
of three bags per pot) as a control. A cross-over trial from
I am pleased to
announce that the results were overwhelmingly in favour of saltwater, a highly
significant difference being found.
Carping critics (and,
I note in passing, the carp is a freshwater fish) have complained that the
result is spurious and entirely due to the use of a baseline taken after
treatment. Indeed, they claim that the "benefit" due to sea-water
arises simply because differences from "baseline" reflect, in
inverted form, the emetic effect of saltwater as captured by the baselines
rather than its protective effect as expressed in the outcomes.
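The critics' mechanism is easy to demonstrate with a small simulation. This is a minimal sketch under loudly stated assumptions: the NAUSE-style scores, the size of the emetic effect (3 points) and the effect of provocation (5 points) are all invented for illustration, and saltwater is assumed to have no effect whatsoever on the post-provocation outcome. Because the "baseline" is taken after treatment, the change from baseline nevertheless shows an apparent benefit.

```python
import random
import statistics

def simulate_change_from_baseline(n_per_arm=2000, emetic_effect=3.0, seed=1):
    """Mean 'change from baseline' per arm for hypothetical nausea scores.

    Assumptions (all invented for illustration): saltwater raises the
    pre-provocation score by `emetic_effect`, provocation adds 5 points to
    everyone, and neither treatment has any effect on the final outcome.
    """
    rng = random.Random(seed)
    means = {}
    for arm, shift in (("saltwater", emetic_effect), ("tea", 0.0)):
        changes = []
        for _ in range(n_per_arm):
            susceptibility = rng.gauss(10.0, 2.0)  # the patient's own level
            # 'baseline' measured AFTER treatment, just before provocation
            baseline = susceptibility + shift + rng.gauss(0.0, 1.0)
            # outcome after provocation: identical in both arms on average
            outcome = susceptibility + 5.0 + rng.gauss(0.0, 1.0)
            changes.append(outcome - baseline)
        means[arm] = statistics.mean(changes)
    return means

means = simulate_change_from_baseline()
# Roughly -3: the emetic effect reappears, inverted, as a 'protective' effect.
print(round(means["saltwater"] - means["tea"], 2))
```

Setting `emetic_effect=0.0` makes the spurious "benefit" vanish, which is precisely the critics' point.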
If this is the sole
criticism these small-fry can produce then, as a dab-hand at dealing with
gripes and obloquy, I leave them to flounder in the sea of their own perverted
logic.
It is true that the
baseline was taken just before provocation and some time after treatment. It is
true that at this point there was a considerable difference between the two
groups. It is true that the saltwater group had much higher NAUSE scores than
the tea group. Many of them were indeed puking their guts out before the
provocation. But I have two unanswerable replies to the criticism. First, you
can always correct baseline imbalance by subtracting the baselines. Everyone
knows this. Secondly, if you look at provocation trials in general you will see
that this approach is frequently used: with glucose provocations in diabetes,
exercise tests in angina, or methacholine, histamine or allergen challenges in
asthma. In such trials the outcome is invariably referred to (or else titrated
with reference to) a "baseline" taken before provocation but after treatment.
Percentage drop from "baseline" in FEV1 is the standard
measure in provocation trials in asthma, for example. Indeed, I am not even the
first to use this general approach in a trial of treatments to prevent nausea.
Clearly we cannot
change an approach that is already a standard because then our results would no
longer be comparable to those of others. After all, it is more important for
physicians to use procedures that are consistent with those used by other
physicians than to use procedures that correctly measure effects. Consistency
takes precedence over veracity.
In short, any
statistical technique that has won the hearts of so many physicians can't be
wrong.
Can it?
Dear
I am a physician who has
worked happily in the pharmaceutical industry for many years developing (well,
to be honest, trying to develop) drugs for hypertension. Now my life is being
made a misery by the statisticians with whom I work. They keep on giving me
conflicting advice.
There are three in
particular with whom I deal regularly. Last year I consulted them on the best
way to summarise the change from baseline results (baseline minus current) in
blood pressure for a series of open trials which we had planned for various potential
products. The first statistician I consulted,
All this conflicting advice
left me very confused. I then tried out various plans. First I tried a two-stage
plan, plan B. (I haven't spent all these years in drug development for nothing: I know by
now that the other plan is usually best.) Plan B was as follows. First I
established the value that Norman, Nick and Robin recommended. Secondly I used
the mode of the three. After a while, however, I noticed that I always ended up
using Robin's estimate. This seemed rather unsatisfactory so I consulted the
three again. Without telling them the reason for my asking, I told them that I
had three possible estimates of a treatment effect and wanted to combine them.
What should I do?
Robin told me that I should
always take the highest of the three estimates, whereas Nick said that I should
always take the median and
I now resolved on Plan A, a
three-stage procedure. Stage 1: calculate the three statistics that each had
originally recommended. Stage 2: combine these three using the three rules
suggested by Norman, Nick and Robin. (I now noticed that by the end of stage 2
Robin's rule always produced the same answer as Nick's and, although Norman's
second-stage answer hardly ever agreed with Robin's and Nick's, his first-stage
answer often agreed with their second-stage result.) Stage 3: I then agreed to
use the answer that Robin's method and Nick's approach agreed on.
I then came upon an article
which suggested that it was always important to establish the asymptotic
properties of any statistic and so decided to implement an infinite stage
estimation procedure (Plan C) using all three rules. Much to my relief I found
that as the iterations increased, Norman's estimate converged on Nick's and
Robin's and, indeed, this limit was simply Robin's first-stage (and every
subsequent stage) answer. Thus, having investigated the matter thoroughly, I
carried on with a clear conscience, always using whichever of mean and median
turned out higher.
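The convergence Perplexed describes can be checked mechanically. This sketch rests on assumptions, because the relevant parts of the letter are cut off: Norman's rule is taken to be the mean, Nick's the median and Robin's the maximum, and the three first-stage statistics are taken to be the mean, the median and the larger of the two (which is how the letter ends up describing Robin's estimate). The change-from-baseline values are hypothetical.

```python
import statistics

def combine_once(estimates):
    """Apply the three (assumed) rules to a list of estimates.

    Norman: mean (assumed; his rule is cut off in the letter)
    Nick:   median
    Robin:  maximum
    """
    return [statistics.mean(estimates), statistics.median(estimates), max(estimates)]

def iterate_rules(first_stage, n_stages=60):
    """Plan C: apply all three rules again and again to the previous stage."""
    current = list(first_stage)
    for _ in range(n_stages):
        current = combine_once(current)
    return current

# Hypothetical changes from baseline in blood pressure (mmHg).
changes = [2.0, 3.0, 4.0, 15.0]
mean = statistics.mean(changes)      # 6.0
median = statistics.median(changes)  # 3.5
first_stage = [mean, median, max(mean, median)]  # Norman, Nick, Robin

second_stage = combine_once(first_stage)
print(second_stage)                # Nick's and Robin's entries agree
print(iterate_rules(first_stage))  # all three settle on max(mean, median)
```

With these numbers Norman's first-stage answer (the mean, 6.0) coincides with Nick's and Robin's second-stage answer while his own second-stage answer does not, and iterating drives all three to the larger of mean and median.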
Unfortunately, some months
later I happened to mention what I was now doing to Norman and Nick. They were
both horrified. They said that although they hardly ever agreed with each
other, they both agreed that what Robin was doing was clearly biased and quite
unacceptable. They advised me that I must make sure that I NEVER used Robin's
statistic. This left me with no choice but to fall back on plan Z. I now
calculated Nick's,
Can you help?
Yours sincerely
Perplexed.
Dear Perplexed,
What a cock and bull
story. I don't believe a word of it. I don't even believe that you are a
physician (although it is true that you appear to like open trials and change
from baseline). You are clearly one of these damn Bayesian troublemakers and
have nothing better to do than to embarrass ageing frequentists by asking
awkward questions. And, by the way, when did you ever find two physicians who
agreed on a diagnosis?
G McP
In my time I have been
accused (even by myself) of being a frequentist but the accusation is the result
of a misunderstanding. People have assumed that they can identify what I am for
on the basis of what I am against. In fact, as a matter of principle, I am more
or less against everything, so although I am a non-Bayesian, I am also a
non-frequentist. The practicalities of life, however, force me to sail under
various flags of convenience from time to time but to ask, "what is
McPearson's philosophy of statistics?" is to make the same mistake as to
ask, "what was the religion of the Vicar of Bray?". The answer to the
latter question is, of course, that although he professed many, in truth he
owned none. (The aforementioned Vicar of Bray is not to be confused with that
minister of Tunbridge Wells, Thomas Bayes, a man who, as far as I can tell, was
true to an old faith and quite unaware that he was founding a new religion.)
I had assumed until
recently, however, that I would never actually be required to do any Bayesian
statistics and could continue to earn my living producing frequentist fables. I
was aware, of course, of the De Finetti-Lindley limit ("we shall all be
Bayesians by the year 2020", Theory of Probability, vol. 1, p. ix) but had
assumed that, since I shall (in all probability) either be retired or dead by
then, it was of no practical consequence for yours truly. Recently, however, I
was idly looking through my copy of De Finetti when I came across the
following: "A probabilistic explanation of the diffusion of heat must
take into account the fact that heat could accidentally move from a cold body
to a warmer one ... water being frozen rather than boiled when put on the
stove." (ibid., p. 214)
I suddenly realised, with a
jolt, that De Finetti (or Lindley for him) had implicitly put a probability of
1 on the 2020 prediction despite, in principle, believing that anything was
possible (even heating water to make ice). Now since the probability of the
remaining unconverted statisticians in the world (even if there is only one)
being converted exactly at midnight on 31 December, 2019 must be infinitesimal,
the only coherent conclusion possible is that De Finetti believed almost
surely that the conversion would be completed before 2020. Perhaps his
median prevision date was, in fact, very much earlier: in which case I might be
in danger of being converted before I retire.
Worried by this threat to
the McPearson philosophical inertia I started on one of those introspections of
internal coherence so beloved of the modern Bayesian. I soon came to the
disturbing conclusion that I myself was exhibiting symptoms of incipient
Bayesianism: one of my legs was decidedly Bayesian and as for my posterior...
This opened up the alarming prospect that at some stage in the future, when
asked my advice on analysis, I might find myself suggesting a Bayesian approach
with no prospect of actually being able to carry it out. (It is true
that until recently this was exactly the position in which every Bayesian found
himself but now I gather that an application of the long-run frequency
properties of random numbers, which would have made De Finetti shudder, has, via
Gibbs sampling, solved all the problems.)
I immediately resolved,
therefore, on a stringent and rigorous programme of education in practical
Bayesianism: I would try an analysis of a simple problem and overcome, in De
Finetti's phrase (at least as translated by an English word-Smith), my
"reluctance to abandon the inveterate tendency of savages to objectivise
and mythologise everything" (ibid., p. 22). (Not to be confused with the
"inveterate tendency of Savage's to subjectivise and psychologise
everything".)
In Guernsey Has a Go:
Part II, I shall probably tell you how I got on.
In Part I, I told you my
reasons for taking the momentous decision to undertake a Bayesian analysis. I
decided to choose something simple and to try and previse, as the Bayesians
would put it, the outcome of the 6th toss of a coin having tossed it
5 times.
Now I dimly remembered this
sort of thing from my undergraduate days and, if I recall correctly, the trick
was to assume a prior distribution for the probability, q, of obtaining a head and then to
update this using Bayes' theorem. I am not quite sure how the Bayesian
conjugates the verb to previse but in this case he does it with a beta.
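For comparison, here is a minimal sketch of the conjugate beta route the narrator half-remembers. It assumes a Beta(a, b) prior on q and tosses that are independent given q; with the uniform choice a = b = 1 (an assumption, since nothing in the text fixes the parameters) the predictive probability of a head after h heads in n tosses is (a + h)/(a + b + n), which is Laplace's rule of succession.

```python
from fractions import Fraction

def beta_binomial_predictive(a, b, heads, tosses):
    """P(next toss is a head) under a Beta(a, b) prior on q,
    after observing `heads` heads in `tosses` tosses."""
    return Fraction(a + heads, a + b + tosses)

for n in range(6):
    # n heads in n tosses, uniform Beta(1, 1) prior
    print(n, beta_binomial_predictive(1, 1, n, n))  # after five heads: 6/7
```

This prior makes the tosses exchangeable rather than independent, so a run of heads does move the prevision.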
However, I had been much
impressed by an article I had read in which a prominent Bayesian had taken
frequentists to task for carrying out all their analyses in a sort of
"Greek hinterland". (I quote from memory.) What with
Now, having no reason to think
otherwise, I decided to assign each of the 64 sequences a prior probability of
1/64 of occurring. Now, of course, You may think otherwise but that is Your business
and not My concern. (I, as a Bayesian, have a tendency to capitalise pronouns
but I don't care what You think. Strictly speaking, as a new convert to
subjectivist philosophy, I don't even care whether you are a Bayesian. In fact
it is a bit of a mystery as to why we Bayesians want to convert anybody. But then
"We" is in any case a meaningless concept. There is only I and I
don't care whether this digression has confused You.) I then set about
acquiring some experience with the coin. Now, as De Finetti (vol. 1, p. 141) points
out, "experience, since experience is nothing more than the acquisition of
further information, acts always and only in the way we have just described: suppressing
the alternatives that turn out to be no longer possible ..." (His
italics.)
Now of the 64 sequences, 32
end in a head. Therefore, before tossing the coin my prevision of the 6th
toss was 32/64. I tossed the coin once and it came up heads. I thus immediately
suppressed 32 alternative sequences beginning with a tail (which clearly hadn't
occurred) leaving 32 beginning with a head of which 16 ended with a head. Thus
my prevision for the 6th toss was now 16/32. (Of course, for a
single toss the number of heads can only be 0 or 1, but THINK: prevision is not
prediction any more than perversion is predilection.) I then tossed the coin and
it came up heads. This immediately eliminated 16 sequences, leaving 16
beginning with 2 heads, 8 of which ended in a head. My prevision of the 6th
toss was thus 8/16. I carried on like this, obtaining a head on each of the
next three goes and amending my prevision to 4/8, 2/4 and 1/2 which is where I
then was after the 5th toss having obtained 5 heads in a row.
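The narrator's bookkeeping can be reproduced by brute force. This sketch enumerates all 64 sequences of six tosses, gives each the prior probability 1/64 used in the text, suppresses those inconsistent with the heads observed so far and counts how many survivors end in a head. The helper name `prevision_counts` is hypothetical.

```python
from itertools import product

# All 2**6 = 64 sequences of six tosses, each with prior probability 1/64.
SEQUENCES = list(product("HT", repeat=6))

def prevision_counts(observed):
    """Return (sequences ending in H, sequences still possible) after
    suppressing every sequence inconsistent with the observed tosses."""
    k = len(observed)
    possible = [s for s in SEQUENCES if s[:k] == tuple(observed)]
    ending_in_head = sum(1 for s in possible if s[-1] == "H")
    return ending_in_head, len(possible)

for k in range(6):
    h, n = prevision_counts("H" * k)
    print(f"after {k} heads: {h}/{n}")  # 32/64, 16/32, 8/16, 4/8, 2/4, 1/2
```

Every ratio is 1/2: a uniform prior over sequences is the same thing as independent tosses of a fair coin, so no amount of experience changes the prevision of the sixth toss.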
Now this was not very
encouraging. I didn't seem to be learning anything and yet the Bayesian approach,
as all WE Bayesians know, provides the perfect solution to every
problem. I couldn't see where I had gone wrong. It is true that as I started
thinking about the problem, from time to time, My thoughts led me down byways
which seemed helpful but I soon perceived that these were heretical, involving,
as they did, meaningless speculations about the propensities of different coins
and, heaven forbid, Greek hinterlands. On the other hand, I think My behaviour
can be shown to have been perfectly coherent and as We Bayesians know, that is
all that matters.
Next issue: How I went in
search of the birthplace of Alexander the Great and met the famous young lady
of
*There was a young
lady of Thrace etc.
Shortly before the
nulliguernsian hiatus, so cruelly imposed by editorial policy upon the readers
of this journal, I described my first primitive attempts at applying Bayesian
analysis to a problem: that of tossing a coin. "What has
I was called in to help
design a clinical trial by Dr Percy Vere, that well-known trialist. Of course I
decided to start with the elicitation of priors and was astonished at the
ease with which Dr Vere provided me with the necessary information. With the
benefit of hindsight I should have been highly suspicious but I innocently
proceeded with the work. Many months later, when the trial was over, I was
recalled to analyse the data. This was, of course, simply a matter of using the
likelihood and the prior to calculate a posterior and provide my client with
the result. This I duly did. So far so simple; so far so obvious. It was here,
however, that events took an unexpected turn.
"Very nice", said
Dr Vere, "can you please now perform a meta-analysis using the data from
my previous trial?" This flummoxed me, as I had mistakenly assumed that
the trial I had worked on was the first trial in this area, but thinking
quickly, I realised there was no particular problem. "That's
unnecessary," I said, "because the prior with which you furnished me
obviously took account of the results from the previous trial. Hence the
posterior I have given you is the meta-analysis." This flummoxed him
but I was not to be let off so lightly. "How can that be?" he
replied, "Are you telling me that the prior with which I provided you was
a valid summary of the results of the previous trial? If that were the case I
would obviously be a natural statistical genius and wouldn't need you at all. I
am surprised that you have the nerve to charge for your work. I can assure you,
however, that what I gave you is a genuine prior and had nothing to do with the
results from the previous trial."
Some of these remarks were
rather surprising, if not downright peculiar, in particular the unjustified one
about emolument, as my services were, in fact, being provided free of charge to
Dr Vere, courtesy of my employers Pannostrum Pharmaceuticals. Nevertheless, at
this point I began to appreciate that perhaps a rather fuller investigation of
the problem was needed, as my client was not, in fact, coherent (as we
Bayesians put it). I asked after the data from the previous trial and was
informed that there was a statistical report available with an analysis by
Professor Smith and his assistant. Now, I happen to know Smith and know him for
a Bayesian. (No, he's not that Smith, nor that one, nor either of the other
two.) It thus seemed to me highly likely that by picking up the report I should
find the prior for the previous trial available, to which I then only needed to
add the data from both trials. Alternatively, and perhaps even simpler, if the
posterior were available, I could use that as my starting prior for the data
from the latest trial. As it transpired, both were available and to my
astonishment I discovered that the prior Dr Vere had given me was the same one that
he had given Smith. When I pointed this out to him he made unjustified
sarcastic observations about statisticians expecting different answers to the
same question. Waste not, want not, was his philosophy. He had assumed
that what was good enough as a prior for Smith should be good enough for
McPearson.
I realised that we had been
at cross-purposes all the while but that the situation was rescuable. Adding
the data from "my" trial to the under-Professor-Smith's-supervision-calculated
posterior (to use a rather Teutonic
construction) gave exactly the same result as adding the data from both trials
to the "prior". We were home and dry ... or so I thought.
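The equivalence McPearson relies on holds in any conjugate analysis. A minimal sketch, assuming a normal prior for the treatment effect and normally distributed trial estimates with known variances (all numbers and function names below are hypothetical): updating Smith's posterior with "my" trial gives the same answer as updating the original prior with both trials at once.

```python
def normal_update(prior_mean, prior_var, est, est_var):
    """Conjugate normal update of a treatment effect (sampling variance known)."""
    precision = 1.0 / prior_var + 1.0 / est_var
    mean = (prior_mean / prior_var + est / est_var) / precision
    return mean, 1.0 / precision

def pool(*trials):
    """Inverse-variance pooling of (estimate, variance) pairs."""
    precision = sum(1.0 / v for _, v in trials)
    return sum(m / v for m, v in trials) / precision, 1.0 / precision

prior = (0.0, 4.0)   # hypothetical prior: mean 0, variance 4
smith = (1.5, 0.5)   # hypothetical estimate and variance, second trial
mine = (0.8, 0.25)   # hypothetical estimate and variance, "my" trial

# Route 1: Smith's posterior becomes my prior, then add my trial.
sequential = normal_update(*normal_update(*prior, *smith), *mine)
# Route 2: add the data from both trials to the original prior.
batch = normal_update(*prior, *pool(smith, mine))
print(sequential, batch)  # both approximately (0.992, 0.16)
```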
Dr Vere was delighted.
"Excellent," he said, "can you also please include now the first
trial in this series of three? The statistician consulting that year was
Professor Jones. I can provide you with his assistant's report." Now I
realised that we were in deep trouble. I also know Jones and know him for a
frequentist. (No, he's not that Jones, nor the other one.) It seemed to me
highly unlikely that his report would contain a posterior, let alone a prior
and so it turned out to be: data, descriptive statistics, any number of tedious
laboratory shift tables, point estimates, confidence intervals and P-values
(ugh!) but nary a posterior distribution in sight.
Now I was faced with a real
dilemma. I could take the data from the three trials and add them to the prior
for the second. (It may be that Vere tried to ignore data from all trials when
determining his prior.) The danger in doing this would be if the prior for the
second were, in fact, a posterior to the first. In that case I would be
counting data from the first trial twice, a clearly inadmissible procedure. On
the other hand I could ignore the data from the first trial altogether. (After
all, the first time that Vere provided a prior he may have tried to accurately
express his beliefs but then subsequently acted under the erroneous belief that
this prior would do for every problem.) If, however, these data were not
reflected in his prior then I should be ignoring relevant information and thus
violating the principle of total information: a very serious Bayesian crime.
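How much damage does counting the first trial twice do? A minimal sketch under the same hypothetical normal-normal assumptions (the numbers are invented): if Vere's "prior" were secretly a posterior that already contained trial 1, adding trial 1 again inflates the apparent precision by exactly that trial's precision and drags the estimate towards it.

```python
def normal_update(prior_mean, prior_var, est, est_var):
    """Conjugate normal update of a treatment effect (sampling variance known)."""
    precision = 1.0 / prior_var + 1.0 / est_var
    mean = (prior_mean / prior_var + est / est_var) / precision
    return mean, 1.0 / precision

def add_trials(prior, trials):
    """Update `prior` with each (estimate, variance) pair in turn."""
    mean, var = prior
    for est, est_var in trials:
        mean, var = normal_update(mean, var, est, est_var)
    return mean, var

genuine_prior = (0.0, 4.0)                      # hypothetical
trials = [(2.0, 0.5), (1.5, 0.5), (0.8, 0.25)]  # hypothetical trials 1, 2, 3

correct = add_trials(genuine_prior, trials)
# Suppose Vere's "prior" was really the posterior after trial 1 ...
vere_prior = add_trials(genuine_prior, trials[:1])
# ... and all three trials are then added to it: trial 1 is counted twice.
double_counted = add_trials(vere_prior, trials)
print(correct, double_counted)
```

The double-counted posterior is too narrow and sits closer to trial 1's estimate, which is why the procedure is inadmissible.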
It was then, in a flash of
inspiration, that I found salvation. As any student of De Finetti will know,
everything is soluble given a wide enough resort to the device of specifying
priors. (Puzzled as to which model to use? Just introduce a meta-model with
priors over the class of models.) Faced with an uncertainty about the possible
prior beliefs of my client, all I needed to do was to introduce my
beliefs regarding his beliefs into the model. It is true that a sort of
hybrid creature arises, a chimeric* posterior (my best bet about what his
best bet ought to be), but who cares?
Thus liberated, all I had
to do was introduce a prior probability of 1 that Vere was a complete idiot.
(It's my prior and I can do what I like with it; I might hesitate to make a
similar remark about my posterior.) This freed me to use an uninformative prior
for the whole thing (after all, what do I know about medicine?), calculate a
frequentist confidence interval using all three trials, palm it off on my
client as a Bayesian credible interval and retire to the Cock and Bull for a
well-earned pint.
Cheers!
* Chimeric from chimera,
an improbable creature with the prior of a lion, the likelihood of a goat and
the posterior of a snake.
(See also Statistical Issues in Drug Development)
In my career as a medical
statistician in drug development I never found anything quite as effective in
winning disputes as the Finally Decisive Argument. For the benefit of readers
of this journal, I illustrate its force with the example of that old chestnut,
not to say canard or red herring (food for thought? appropriate for menu-driven
drug development programmes?): type II and type III sums of squares.
The following is a Socratic
dialogue between two statisticians, one of whom is of the McPearson school of
statistics and the other of whom is not.
Secundus: I see, Tertius,
that you have weighted all centres equally in your estimate of the treatment
effect. Why is that?
Tertius: It is because, Secundus, any other weighting would be entirely
arbitrary.
Secundus: This does indeed appear to be an excellent reason, Tertius. However, I
am puzzled to understand one thing: on what basis did you choose the
centres in your trial?
Tertius: Oh that is quite simple, Secundus. All the physicians concerned have
good reputations and promised to deliver an adequate number of patients.
Secundus: These are indeed excellent reasons, oh Tertius. However, I cannot
help noting that, although some physicians have indeed provided many patients,
some seem to have delivered very few patients at all.
Tertius: Indeed, some of the physicians have disappointed me, but when running
trials in future I will not use them.
Secundus: A very wise precaution, Tertius, but it seems to imply that provided
centres perform well, you do not mind which centres are in the trial.
Tertius: This is indeed true, Secundus; the main thing is to have enough
high-quality data.
Secundus: I see. So, provided only that the centres delivered enough
patients in total, you would be indifferent as to whether the trial was based on,
say, centres 1, 3 and 7, or on centres 4, 5 and 8, or on centres 1, 2, 8 and 9, or
indeed on any set of centres.
Tertius: That is indeed so, Secundus.
Secundus: And suppose, for argument's sake, that centre 3 could give you all the
patients you needed: would you use it alone?
Tertius: (Smiling.) Indeed I would, Secundus. This would make life much simpler.
Unfortunately, clinical trials don't usually work like that.
Secundus: What a pity. And if centre 4 could give you all the patients you
needed would you be happy to use that?
Tertius: Of course, Secundus. The centre is unimportant.
Secundus: But this implies, Tertius, that you are quite happy to base your
treatment estimate on centre 3 alone, if only it has enough patients, and on
centre 4 alone, if only it has enough patients, and indeed on any centre at all,
provided it has enough patients.
Tertius: (Impatiently.) Quite so. This is obvious.
Secundus: But then, your only preference amongst centres is based on the
precision of the information which they provide, not on any peculiar feature of
any given centre and, since you are otherwise indifferent between them, I fail
to understand why you insist on weighting them equally and in an inefficient
manner.
Tertius: I begin to understand your point, but what is the alternative?
Secundus: The alternative is to weight the centres in such a way that the
precision of the treatment estimate is as high as possible.
Tertius: But does that not correspond to the Type II philosophy?
Secundus: It does indeed.
Tertius: Then I am sorry, Secundus, but you have been wasting my time. That is
a dangerous heresy.
Secundus: Why so?
Tertius: Because the Finally Decisive Argument says so.
Secundus: In that case I do indeed apologise for having wasted your time,
Tertius. The Finally Decisive Argument is transcendental in nature and cannot
be defeated by mere logic.
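Secundus's point about precision can be made numerically. A minimal sketch, assuming each centre yields an unbiased treatment estimate whose variance is proportional to 1/n_A + 1/n_B (n_A and n_B being the patients on each arm) and that there is no treatment-by-centre interaction; the centre sizes are hypothetical. Equal weighting of centres, Tertius's preference, has variance equal to the sum of the centre variances divided by the square of the number of centres, whereas inverse-variance weighting, the approach Secundus describes and the dialogue associates with the Type II philosophy, has the smallest variance of any weighted average of the centre estimates.

```python
def centre_variance(n_a, n_b):
    """Variance (in units of sigma**2) of a centre's treatment difference."""
    return 1.0 / n_a + 1.0 / n_b

def equal_weight_variance(variances):
    """Variance of the unweighted average of the centre estimates."""
    k = len(variances)
    return sum(variances) / k ** 2

def precision_weight_variance(variances):
    """Variance of the inverse-variance weighted average (the minimum possible)."""
    return 1.0 / sum(1.0 / v for v in variances)

# Hypothetical centres: one large, one middling, one that barely recruited.
centres = [(40, 40), (10, 10), (2, 2)]
v = [centre_variance(a, b) for a, b in centres]  # 0.05, 0.2, 1.0
print(equal_weight_variance(v))      # about 0.139
print(precision_weight_variance(v))  # about 0.038
```

When all centres are the same size the two weightings coincide; only with unequal recruitment, the situation Tertius complains of, does equal weighting throw information away.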
Next issue: The Finally
Decisive Argument is used to prove that drug development is just like skiing. If you want to
succeed, you must stick to parallels and avoid the cross-overs.
How on earth did we ever manage
without guidelines and standard operating procedures, I wonder? It makes me
blush when I look back and consider those innocent days at Pannostrum
Pharmaceuticals, before we had the benefit of the Erewhon Statistical
Guidelines. How did we survive, for example, before we knew that it was
essential to keep a screened-patient log?
In those days, we thought
the important thing about clinical trials was that you should report the
results and characteristics of those patients you had actually experimented upon.
It is true that we had already progressed from that prehistoric naïveté when
we had thought that experimentation was defined by treatment, to a state of
relative maturity where we realised that it was defined by randomisation (see
McPearson, G., 'Early Days at Pannostrum: from "Per Protocol" to
"Intention to Treat"', Journal of Statistical Whimsy, 6,
113-124) but it never occurred to us, I am ashamed to say, to record the
patients we hadn't even included in the trial. What a blessing, therefore, that
the Erewhon Guidelines have arrived in the nick of time to inform us that
unless we record the demographic characteristics of the patients we didn't
include we don't know how to generalise our results.
Suppose, for example, you,
as a doctor, wish to know whether a particular beta-agonist will be of any use
in treating a given severely asthmatic patient. This is the age of evidence-based
medicine so, of course, you do extensive background research on the
marketed product. You discover, however, that all the trials which established
its efficacy were run in either moderate or mild asthmatics. Clearly, then, you
cannot generalise these results to your patient. However, on reading further
you note that no severe asthmatics were deliberately excluded from the trials.
They would certainly have been included, if only there had been any in the
practices in which the trials were run, but as it turns out there weren't
(rather as in the Flanders and Swann song about the rhinoceros who would use
his horn for taking stones out of a horse's hoof if only he ever met such a
quadruped thus distressed). The fact that such patients would not have been
excluded makes all the difference, of course, and means that you can generalise
the results with confidence.
It is not, however, the
excellent inferential logic behind this requirement which I wish to praise, but
the practical implications. The golden age of medical statistics is upon us.
For how are we to decide whether we have screened a patient or not? We have to
be extremely careful not to be narrow and arbitrary. Supposing a doctor is in
the habit of taking all sorts of measurements on his patients which might be
required for entry into a clinical trial. He can see at a glance by looking at
his case notes whether he can enter the patients in the trial or not. Is this
not a screening? Suppose that the clinical research associates are in the habit
of asking the doctors at various potential centres whether they have enough
suitable patients with a view to excluding those centres that don't. Is this not
also a screening? If we decide on economic or practical grounds to run the
trial in some countries but not in others, is this not too a screening? If we
run the trial today, rather than yesterday or tomorrow, have we not also indulged
in screening?
The more I have thought
through the implications of all this the more excited I have become.
"Inspired", I think, is the word. Indeed, I am now prepared to share
with you McPearson's law of screening which goes:
Every patient who is not
in your trial has been screened out of it.
And by every I mean every:
not just the patients in the centres you chose who weren't included but also
those in the centres you didn't include (physician-screening) as well as those
in the countries you never considered (international screening) and in the eras
you didn't study (temporal screening) and, of course, those who refused consent
(auto-screening). Furthermore, why distinguish between actual and potential
patients? We always screen out those who aren't yet ill (well nearly always)
from our clinical trials (health-status screening).
Now, I agree that once the
implications of McPearson's law of screening come to be appreciated,
application of the Erewhon Statistical Guidelines is going to become rather
difficult: several millions if not billions of patients will have to have their
demographic characteristics presented for every clinical trial. But can this be
bad for statisticians? Not at all. It all means more work and more work means
more money. And of course, once the implications sink in, it also means that
one will have to accept that no results can ever be generalised at all. (This
does, it has to be confessed, undermine the inferential value of this device,
but so what?) But this simply implies that all products will have to be
permanently on clinical trials and this again means more work for
statisticians. Hence, I confidently prophesy that the golden age of medical
statistics is dawning. I may even have to consider changing my name to Guineas
McPearson.
Of course, if one accepts
that the logic of clinical trials is comparative and not representative, and if
one believes that in any case generalisation has to do with that which one has
specifically studied and that to which one wishes to generalise, then a log of
patients screened is a complete irrelevancy. Nobody, however, could seriously
maintain this position.
Could they?
Next issue: How I met
Screening Lord Sutch and join the Monster Raving Loony Party.
It stands to reason, of
course, that the treatment effect in a multi-centre trial must be the
straightforward arithmetic average of the treatment effects from each centre.
Anything else would be abhorrent and illogical and to be eschewed by all
right-thinking persons working in drug development (as well as by all those
right-thinking persons in drug development who aren't working and, believe me, there
are some of those too). There are two excellent reasons as to why the treatment
effect must be defined in this way, the second of which is even more excellent
than the first. 1) The expected value of such an estimator does not depend on
the number of patients you happen to have recruited to each centre. 2) A
certain prominent regulatory authority requires it.
The reasons why the second
of these arguments is the more excellent of the two are also twofold. 1) The
first argument is only correct if you happen to condition on the centres you
actually recruited. If you consider all the centres you might have recruited
but didn't, then the expectation does depend rather intimately on the
number of patients recruited. However, it only depends on whether a
given centre contributed 0 or 1 patients on the one hand or 2 or more patients
on the other (no treatment estimate possible unless you have at least one
patient on each treatment) rather than, say, exactly how many patients it
contributed. This means, of course, that although it is not a perfectly
excellent argument it is nevertheless an excellent argument. 2) The second
argument is, however, a perfectly excellent argument. I know this
because I have never observed anybody prevail against it, whatever the context,
in all the years I have heard it used at Pannostrum Pharmaceuticals and
elsewhere in the industry. For this reason I refer to it as the Finally
Decisive Argument.
However, observing the
highest standards sometimes brings its penalties. I learned this lesson when I
first started my work at Pannostrum Pharmaceuticals' Elixir Laboratories. I was
set to work on a project developing enteric coated suppositories for dysentery
(Strombolite), in which the trials had been plagued by drop-outs. The medical
advisor on this project, Dr Durchfall, had developed a continuous outcome
measure whose details need not concern us, except that you have my complete
assurance that it had, of course, been completely validated. Now, TROT7
was a four-centre trial of two treatments which planned to recruit 36 patients
per centre. One of the centres just didn't perform at all, however, so you can
imagine my relief when I discovered that the three other centres had recruited
52 patients each. The trial as originally planned would have had variance
proportional to 4(1/36+1/36)/16 = 0.014 but now had variance proportional to
3(1/52 + 1/52)/9 = 0.013, which was slightly better even than planned. I broke
the news of these calculations to Dr Durchfall.
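For readers who would rather check Guernsey's arithmetic than trust his pocket calculator, here is a minimal Python sketch. It assumes, as the figures above do, that a centre recruiting n patients contributes a variance proportional to 1/n + 1/n, that the centres are independent, and that the overall effect is the unweighted mean of the centre effects; the function name `equal_weight_variance` is purely illustrative.

```python
from fractions import Fraction


def equal_weight_variance(centre_sizes):
    """Variance, up to the factor sigma^2, of the unweighted mean of the
    centre-specific treatment differences.

    Follows the column's convention that a centre of size n contributes
    1/n + 1/n, and assumes the centres are independent.
    """
    k = len(centre_sizes)
    per_centre = [Fraction(1, n) + Fraction(1, n) for n in centre_sizes]
    # Var(mean of k independent estimates) = (sum of their variances) / k^2
    return sum(per_centre) / k**2


planned = equal_weight_variance([36, 36, 36, 36])  # four centres as planned
actual = equal_weight_variance([52, 52, 52])       # three centres as recruited
print(round(float(planned), 3), round(float(actual), 3))  # 0.014 0.013
```

Exact fractions are used so that the comparison is not muddied by rounding; the two results are 1/72 and 1/78.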
"Great news,
Guernsey my boy und now haf I got good news for you. Centre 4 haz recruited 8
patients avder all. A liddle late und not much but zen it is bedder zan
nozing."
"But no, Dr
Durchfall," I replied, "it is worse than nothing. Now we are
really in deep ... I mean in trouble." I whipped out my pocket calculator
(this was in the early days of these devices, when they actually had no more
keys than you knew how to use). "You see," I said, "the variance
is now proportional to:
[3(1/52 + 1/52) + (1/8 + 1/8)]/16 = 0.023."
(Well I didn't actually say
all those fractions but you get the gist.) "Those extra eight patients
have actually increased the variance by more than 70%."
"Oh no,"
said Dr Durchfall, "Zis means ve haf an underpowered trial. Ze beta
vill be too high. Bud surely Guernsey zere is some mistake, how can more be
vorse than less?"
"Well it
wouldn't be," I said, "if we didn't have to weight the centres
equally. But I have talked to (or, as they would say, with) our
statisticians in the other place and they assure me that this is a case where the
Finally Decisive Argument applies."
"Vell, in zat
case ve haf no choice," said Dr Durchfall making the sign against the evil
eye.
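The contrast Guernsey draws with not having to weight the centres equally can be sketched as follows. The alternative shown here, inverse-variance (precision) weighting of the centre estimates, is my assumption about what unequal weighting would mean; under the same 1/n + 1/n convention it amounts to weighting each centre by its size. The function names are hypothetical.

```python
from fractions import Fraction


def centre_variances(centre_sizes):
    # Same convention as the column: a centre of size n contributes 1/n + 1/n.
    return [Fraction(2, n) for n in centre_sizes]


def equal_weight_variance(centre_sizes):
    # Unweighted mean of k independent centre estimates.
    v = centre_variances(centre_sizes)
    return sum(v) / len(v) ** 2


def precision_weight_variance(centre_sizes):
    # Inverse-variance (precision) weighting: Var = 1 / sum(1 / v_i).
    v = centre_variances(centre_sizes)
    return 1 / sum(1 / vi for vi in v)


three = [52, 52, 52]
four = [52, 52, 52, 8]  # centre 4 turns up with its 8 patients

print(float(equal_weight_variance(four) / equal_weight_variance(three)))  # 1.78125
print(float(precision_weight_variance(four) / precision_weight_variance(three)))  # about 0.95
```

Under equal weighting the four-centre variance is about 1.78 times the three-centre one, whereas under precision weighting the late eight patients become a small help rather than a hindrance.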
However, thinking
about it later I was able to come up with a new way of looking at the data
which was extremely helpful. I put it to Dr Durchfall like this.
"You know how we
always randomise in blocks of 4 in all of our two-group parallel trials
(although, of course, we never say so in the protocols because we don't want
the investigator to guess the block size). Well of course, it is also generally
accepted that "as was the randomisation so is the analysis". It seems
to me that we should include the block in the model. A further argument is that
one reason we don't use historical controls, even at the same centre, is that
we know that recruitment is subject to time trends. Clearly patients could
differ from block to block. Furthermore, centre 1 is a two-consultant centre: we
could have declared it as two centres if we wanted to. Perhaps we should really
regard the blocks as pseudo-centres. It is also the case that if we remove the
block effect from the model, we are removing the centre effect, since blocks are
confounded with centres. Now it stands to reason, I think, that with all these
arguments about the importance of blocks, we should weight them equally. It
would be absurd if we didn't. And by doing this we shall be able to claim we
are treating the possibility of differences between centres very seriously
indeed since not only are we allowing for differences between them but for
differences within them."
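Guernsey's pseudo-centre manoeuvre can also be checked numerically. The sketch below assumes blocks of four (two patients per arm), that each centre's recruitment splits into whole blocks (52 and 8 are both multiples of 4), and the same 1/n + 1/n variance convention used above; the names are illustrative.

```python
from fractions import Fraction

BLOCK_SIZE = 4  # randomisation blocks of 4: two patients per arm


def equal_weight_variance(unit_sizes):
    # Each unit (centre or block) of size n contributes 2/n; the units are
    # averaged with equal weights, so Var = sum(2/n) / k^2.
    return sum(Fraction(2, n) for n in unit_sizes) / len(unit_sizes) ** 2


centres = [52, 52, 52, 8]
# Every centre size is a multiple of 4, so blocks nest within centres.
blocks = [BLOCK_SIZE] * (sum(centres) // BLOCK_SIZE)  # 164 // 4 = 41 blocks

by_centre = equal_weight_variance(centres)
by_block = equal_weight_variance(blocks)
print(len(blocks), by_centre, by_block)  # 41 19/832 1/82
```

Because every block is the same size, weighting the 41 blocks equally gives every patient the same weight, so the variance drops to 1/82: exactly what one would get by ignoring the centres altogether.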
Thus was the policy
of Good Mixed Centre Practice (GMCP) introduced to Pannostrum Pharmaceuticals.
Did it work? Yes, indeed. The efficiency of the trial was miraculously restored
and the cure I had proposed worked a charm. I wish I could say the same for
Strombolite but I can't say its poor efficacy was entirely a surprise.
From the start I'd
had a gut feeling to that effect.
Apparently a certain
pharmaceutical company has now instituted a policy that no trial must have more
than 80% power. Now read on...........
What a pleasure to see that
that old custom has been revived of offering a libation to the gods: of making
sure that part of every good thing is burnt up as an offering. Assuredly this
must be a way of attracting luck and good fortune and, more important, of
averting the calamity and disaster which follows hard upon the heels of
success. The Greeks understood the necessity of this sort of thing well. Consider the
story of Polycrates, the tyrant of Samos. (That isle is famed among PSI
members not only as the birthplace of Pythagoras and Aristarchus but
also as a favourite holiday destination of all those bright young CRAs.)
Polycrates was so fortunate in everything, that Amasis the King of Egypt
advised him to avert disaster by parting with something dear to him. Polycrates
took his advice and threw a prized ring into the sea but a few days later found
it again in the stomach of a fish that had been served to him. It was clear to
all now that he was doomed and sure enough not long afterwards he got at cross
purposes with Oroetes who crucified him.
So there it is, drug
development is a tricky business. Some use a rabbit's foot to bring luck, some
throw away 36% of all just acceptable compounds. Hold on. Where did the 36%
come from? Well don't forget the two trials rule. You have got to have
significance twice. So if you use 80% power and the treatment under
investigation just has the clinically relevant effect (and you have done your
planning correctly) and you run two trials, the probability that both will be
successful is 0.8 x 0.8 = 0.64. Hence the probability that at least one is
unsuccessful is 1 - 0.64 = 0.36. (I apologise to the PSI membership and
associate membership for going through the glaringly obvious. It is not to you
that these calculations are addressed but to any non-statistician, say a
manager, who might read these remarks.)
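For the benefit of that same manager, here is the two-trials arithmetic in a few lines of Python. It assumes, as the text does, that the two trials are independent and that each has exactly 80% power at the clinically relevant difference; the function name is illustrative.

```python
def two_trial_outcomes(power):
    """Probabilities for two independent trials, each with the given power."""
    both = power * power
    exactly_one = 2 * power * (1 - power)
    neither = (1 - power) ** 2
    return both, exactly_one, neither


both, one, neither = two_trial_outcomes(0.8)
print(round(both, 2), round(1 - both, 2))  # 0.64 0.36
```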
Now of course, there is an
ethical argument in favour of not having too much power. For serious diseases
excessive power may be unethical, as it may be unacceptable for patients to continue to
be randomised to a treatment which is known to be inferior, although I
sometimes wonder if this problem doesn't really require handling in a different
way altogether and in any case, nobody develops drugs for serious illnesses:
there is, after all, no money in that. Then again, it is well known to
Bayesians that there comes a point when it is not worth increasing the power of
the test unless you also reduce the size of the test but this, of course, would
reduce the proportion which the regulator sacrifices to the gods and regulators
also need their rabbit's foot. (How else can we explain baseline testing?)
However, I think that this
sacrifice business doesn't go far enough. The ancients understood this: food,
possessions, animals are all very well but to really avert bad luck there is
nothing like sacrificing people. Therefore, I have a modest proposal to make.
It is that any manager who proposes that no trial should have more than 80%
power should go without his annual bonus whenever one of the two pivotal trials
in a drug development programme is significant and the other is not.
This should happen to 32% = (0.8 x 0.2 + 0.2 x 0.8) of all drugs having a
treatment effect equal to the clinically relevant difference. Of course, we
probably need to scale this by the number of compounds in development. Say that
there are k due to report in a given year. The manager could lose 1/k times his
bonus for every time this happens. And again perhaps he should have his
baseline annual bonus slightly increased, say by 9.5% = 2(0.05 x 0.95) to
account for the occasions where a useless drug produces the phenomenon. We
should be able to save some money on managers' salaries using this scheme. And
after all, in this era in which directors' pay rises faster than profits, every
little bit helps.
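The bonus scheme's figures follow from the probability that exactly one of two independent trials is significant, 2p(1 - p). The sketch below simply takes p = 0.8 for a drug with the clinically relevant effect and, following the column, p = 0.05 for a useless one; the function name is illustrative.

```python
def prob_split_verdict(p_significant):
    """Probability that exactly one of two independent trials is significant."""
    return 2 * p_significant * (1 - p_significant)


# Drug with exactly the clinically relevant effect, trials at 80% power.
print(round(prob_split_verdict(0.8), 3))   # 0.32
# Useless drug, taking the column's 5% chance of significance per trial.
print(round(prob_split_verdict(0.05), 3))  # 0.095
```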
This reminds me that a
cynical spy informs me that I have it all wrong. The real reason for demanding
that no trial should have more than 80% power is not to placate the fates but
to save money. I dismiss this as a vile and vicious rumour. No person having
achieved a position of prominence in drug development could seriously believe
that insisting that no trial have more than 80% power was a rational policy.
They would surely know that every case has to be reviewed on its merits and
that only hard calculation and careful thought will indicate the correct course
of action. After all it is unthinkable that a multi-billion pound business
should have its fate determined by a slogan and a formula misunderstood and
misapplied by rote.
Isn't it?
I am relying on memory
here, but I seem to recall a Roald Dahl story with this name (Dip in the Pool), which followed
the (mis)fortunes of a transatlantic passenger who made a large and unwise bet
on the ship's crossing time and then made an even more unwise attempt to
influence it. (It then turned into a dark tale of survival analysis and
censored observations.) But more relevant to my theme is, in fact, a quotation
from one of my favourite books, The Phantom Tollbooth. It occurs in that
delightful chapter 'Unfortunate Conclusions' in which, you may recall, Milo and
his companions, Tock the watchdog and the Humbug, having jumped to
Conclusions, an island in the Sea of Knowledge, find they have to swim back.
Tock and Milo emerge drenched but not so the bug, for 'you can swim all day in
the Sea of Knowledge and still come out completely dry. Most people do'.
Well, it seems to me that
with the explosion in journals and databases, not to mention (but of course I
will) the so called 'World Wide Web', we are currently swimming in data. This
brings me to my theme. Which side of the great divide are you on? Do you
believe that meta is better or do you hold instead that pooling is fooling?
Well, to nail my colours to the mast, I belong to the former school. It seems to
me that there is no other topic in medical statistics, with the possible
exceptions of cross-over trials, bioequivalence and n-of-1 studies, which has
the same capacity as this one to rot the brains. Every time that another
meta-analysis gets published in a medical journal, the editor feels it behoves
him to commission some idiot to write a sanctimonious guest editorial or
discussion piece which jaws on about publication bias, the dangers of pooling
different studies, the benefit of judgement compared to calculation, or the
importance of stratifying studies by baseline risk and so forth. (This latter
is a vile habit I can scarcely bring myself to contemplate.)
For example, it never seems
to occur to such persons that the publication bias of a meta-analysis arises
through the bias in selecting the individual studies. It is hardly possible,
therefore, to ascribe to the whole a sin which is not shared by the part. (Of
course, for the pharmaceutical industry, as we know, the 'file drawer' is
always empty, so that this is not a problem, is it?) In fact, nearly all the
problems of meta-analysis are difficulties of individual trials too. For
example, a study of meta-analyses showed that for various indications, the
results of the largest trials were rather imperfectly predicted by the
meta-analyses of the rest. This was quite enough to have several pundits
mouthing off about inherent difficulties of pooling and so forth. What nobody
seemed to want to do was ask how well the second largest trial on its own would
have done as a predictor of the largest. But I am reminded of what WC Fields
famously replied to the reporter who enquired what it was like growing old:
"It's better than the alternative."
Then what about this business
of not pooling different studies? What makes them different? The protocols, the
populations? If we can't pool them, what specific feature of the trials do we
use in coming to conclusions? It seems to me that people who object to pooling
different studies but would quite happily accept any one of them on its own, if
only it were large enough, for the purpose of informing medical decision
making, should be given thinking lessons. Furthermore, the standard of
information we require for individual published trial reports is, if this is
true, grossly inadequate. If we really feel that we can make use of those
special particular individual features of a given trial for deciding what to
prescribe for future patients in different clinics, in different years and on
different continents, then we really ought to make a much better stab at
describing these trials.
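Behind the case for pooling, and behind the question of how well the second largest trial alone would predict the largest, lies a simple fact about precision. Here is a minimal common-effect (inverse-variance) pooling sketch in Python; the trial estimates and variances are invented for illustration and come from no study mentioned here. Under the assumption of a common true effect, the pooled variance 1/sum(1/v_i) can never exceed the variance of the most precise single trial.

```python
def inverse_variance_pool(estimates, variances):
    """Common-effect (fixed-effect) pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total


# Hypothetical trials, invented for illustration: effect estimates and variances.
estimates = [0.30, 0.10, 0.25, 0.18]
variances = [0.020, 0.040, 0.050, 0.080]

pooled, pooled_var = inverse_variance_pool(estimates, variances)
print(round(pooled, 3), round(pooled_var, 4))
```

Here the pooled variance (about 0.0093) is less than half that of the best single trial (0.020); whether a common effect is a reasonable assumption is, of course, exactly what the pundits argue about.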
This is not to say, of
course, that I like these meta-analysts. Far from it. On the whole they seem to
me to be a repulsively charmless and messianic lot. (Not quite as repulsive as
the pharmacoeconomists, it is true. Which reminds me that I have not yet given
you a definition of this individual so now I will. Pharmacoeconomist: one who,
when evaluating a treatment for dysentery, enquires after the price of toilet
paper.) On the other hand, we should at least be grateful for one thing: the
meta-analyst will, I hope, finally see off that even more odious and overpaid
individual, the medical expert, as used in the hilarious expert reports which
used to grace European submissions. Which reminds me, I haven't given you my
definition of a medical expert either: one who sums up without bothering to add
(except, of course, his fee).
Negatives have their
commercial uses. For example, there is a cunning German advert for a lottery.
Two individuals discuss a third. I translate. 'You mean to say that he has not
bought a lottery ticket? How ridiculous. Then he has absolutely no chance of
winning x million marks.' This is true, of course, but somewhat beside the
point. Now I recently had cause to travel the London Underground where the
following advertisement caught my eye. 'Nothing is proven to work better than
hedakegone'. Of course, being the sort of sarcastic character I am, my
skull-cinema, to use a phrase beloved of the late John Hillaby which I believe
he adopted from the even later JB Priestley (or do I mean earlier), immediately
played a scene in which GMcP rings up the person who wrote this inspired piece
of advertising equivocation and says, 'Since nothing is proven to work better
than hedakegone, I presume that I am better off taking nothing'. This, however,
is not what the advert is meant to convey. You are meant to think, I imagine,
that hedakegone is better than most alternatives and at least as good as
anything else. At this point we can all permit ourselves a wry smile, since, as
sophisticated, mature and intelligent statisticians (almost a tautology) we all
know that equivalence cannot be claimed by default, but has to be proven. After
all, in a one-gate slalom I might be able to ski as well as Tomba. (And in a
one-gate slalom our Editor might even be able to ski as well as me, but I
digress.)
Let us not be too smug,
however. I recently read a paper on bioequivalence trials which, despite much mathematical
brilliance, was such statistical nonsense that it must have sent my blood
pressure up 30 points. (I don't know why, but there is something about trials
in which the same individual is treated more than once which encourages
'statisticians' to write nonsense. But I mustn't cross-over from the topic in
hand.) It was shown how, at the expense of some mathematical manipulation, a
test could be produced which had superior properties to the two one-sided tests
at the 5% level which are now commonly used. It started from the observation
that the two one-sided approach has an overall 'size' (type one error rate) of
less than 5%. For very small sample sizes it can be much less than 5% and if
the sample is small enough it can even be zero. (The two one-sided tests
correspond to requiring that the conventional 90% confidence limits lie within the limits of
equivalence. If the standard error is high enough, this can actually be
impossible, since the confidence interval can be wider than the interval between the limits of
equivalence. Hence, under such circumstances, the conventional test has zero
size.)
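The shrinking size of the two one-sided tests procedure can be illustrated with a normal approximation. This sketch assumes a known standard error (so z rather than t quantiles), a true difference sitting exactly on the upper equivalence limit, and an illustrative limit of delta = 0.223 (roughly ln 1.25, the usual log-scale bioequivalence bound); the function name `tost_size` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tost_size(delta, se, alpha=0.05):
    """Probability that the two one-sided tests declare equivalence when the
    true difference lies exactly on the upper limit delta.

    Normal approximation with a known standard error; equivalence is declared
    when the 100(1 - 2*alpha)% interval lies inside (-delta, delta).
    """
    z = STD_NORMAL.inv_cdf(1 - alpha)
    upper = delta - z * se   # the estimate must fall below this ...
    lower = -delta + z * se  # ... and above this
    if lower >= upper:       # interval always wider than the equivalence region
        return 0.0
    true_diff = delta
    return (STD_NORMAL.cdf((upper - true_diff) / se)
            - STD_NORMAL.cdf((lower - true_diff) / se))


# delta = 0.223 is roughly ln(1.25), an illustrative log-scale limit.
for se in (0.05, 0.10, 0.13, 0.20):
    print(se, round(tost_size(0.223, se), 4))
```

As the standard error grows the size falls below 5%, and once 1.645 × se reaches delta (here at a standard error of about 0.136) the interval can never fit inside the limits and the size is exactly zero.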
Now I know that I have made
the odd sarcastic jibe at Bayesians but this 'improved' procedure strikes me as
frequentism gone mad. You and I might think that there are occasions where you
might just accept the fact that your type I error rate is going to be less than
5%, but some Neyman-Pearson types just can't abide the thought of it. If the
size is less than 0.05 it means that they have room to manoeuvre and can change
the test to get more power. Having a type one error rate of say 3% is just
going to keep them awake at night worrying about that lost power. (Which
reminds me of a joke. Did you hear the one about the statistician who
complained about his salary slip, which showed that he had only been paid for
one day in twenty? His boss replied that this was the going rate for doing
nothing.) So N-P addicts will 'improve' their equivalence procedure to recover
the power.
Of course, as you get
smaller and smaller sample sizes, the procedure is more and more like rolling
an icosahedral die. (For the benefit of younger PSI members I should explain
that that is a regular solid with twenty triangular faces.) But so what. What
does the frequentist do when stuck for a solution? (S)he tosses a coin. (And no
doubt there are some Bayesians who think frequentists are a bunch of tossers.)
Rolling a die, tossing a coin: it's all much more interesting than analysing
data, as I am sure you will agree. Furthermore, supposing you know that your
drug is really in-equivalent to the reference. What is your best bet of proving
it is equivalent after all? Why it's simple. Not to collect any data at all.
Just roll that icosahedral die. The type I error rate is 5% so who can
complain? Of course you are likely to fail to prove equivalence but so what. If
you do prove equivalence you can say 'in a most powerful test at the 5% level
Conalol was proved to be equivalent to the reference product.' (Don't believe
me? You try finding a data-less procedure with more power than my icosahedral
die.) And that, as we all know, is scientific statistics and hence much more
impressive than the sort of misleading rubbish that advertisers put out on the
London Underground.
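Guernsey's challenge is easy to formalise. The sketch below is a deliberately absurd "test" that ignores the data and declares equivalence when a fair twenty-sided die shows a 1; its rejection probability is 1/20 whether or not the products are equivalent, so its size and its power are both 5%. The function name and the seed are illustrative.

```python
import random
from fractions import Fraction

FACES = 20  # an icosahedral die


def icosahedral_test(rng):
    """Data-free 'test': declare equivalence iff the die shows face 1.
    The outcome does not depend on the truth, so size = power = 1/20."""
    return rng.randint(1, FACES) == 1


size = Fraction(1, FACES)
print(size, float(size))  # 1/20 0.05

# Simulated check of the rejection rate (identical under any hypothesis).
rng = random.Random(2024)
trials = 100_000
rate = sum(icosahedral_test(rng) for _ in range(trials)) / trials
print(rate)
```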
STOP PRESS
Pop Charts latest
This Week's Number one: 2 Become 1. Dice Girls.
COMFREY. An herb of Saturn,
and I suppose under the sign of Capricorn, Culpeper.
To be used for what the old goat has sat on, McPearson.
I open my paper to find a
passionate plea from the chairperson of a well-known cosmic consciousness
cosmetics company, The Figure Franchise, which has me in tears. Don't let those
Eurocrats touch our natural remedies. It's outrageous what they are going to
require. Anybody selling comfrey, dog's mercury, rupture wort, Greenland scurvy
grass or the like as medicines will be required to prove their effectiveness
and safety. Outrageous! It is quite inappropriate that standards which apply to
the multi-million, global, (not to mention international), scientist-riddled
and by definition thoroughly evil pharmaceutical industry should apply to
philanthropic not-for-profit corner shops bringing the wisdom of the ages to
technology-blighted customers. After all, you can't argue with the Druids,
Phoenicians, Aztecs and so forth (well I've never found one you can argue with)
and where would the world be without mistletoe, purple and chocolate? And did
you know that if you leave your razor-blade overnight under a cardboard pyramid
it will be as good as new in the morning and that if you want to leave it under
a cardboard icosahedron (see GMcP passim) you are going to have difficulty in
constructing one?
I quite agree that it would
be inappropriate to apply industry standards to alternative medicine. Take
homeopathic medicine, for example. (A case, perhaps, where the alternative is
null.) The more you dilute it the stronger it gets. Think of the drug disposal
problem. Flush the stuff down the sink and it mixes in the sewers just
spreading wider and wider and getting more potent in the process. The fish in
the sea must be as high as hippies at the Glastonbury festival, not to mention
the whales. (Which reminds me that I don't recall that Greenpeace have ever
addressed this problem.) If pharmaceutical industry standard operating
procedures for drug disposal were applied to homeopathic medicines it would
drive them underground. And we can't want that. Kids around the world shooting
up on tincture of arnica and the like. The mind boggles. Mind you, don't let's
knock the homeopathic theory. It's an excuse worth trying if you are ever
caught over the limit. 'Honest, officer, there was less than a drop of tequila
in my margarita. It's just that the barman insisted on shaking it when he mixed
it.'
But there is a problem.
What is a natural remedy? Obviously not acetylsalicylic acid, digitalis,
reserpine. On the other hand, extract of willow bark, foxglove and rauwolfia
are clearly natural remedies. This just goes to show the importance of names,
something which Guernsey McPearson, for one, would never deny. Hyoscyamine
hydrobromide, or even atropine, are clearly dangerous substances in need of drug
regulation but belladonna and deadly nightshade? Why, any sweet shop should be
allowed to sell them. Just think of all the years that were wasted developing
cyclosporin when it could have been sold right away as Scandinavian soil mould.
But as statisticians, we
have to be careful in naming things ourselves. Just look at the mistake that
was made with the bootstrap. It sounds far too friendly. Every Tom, Dick and
Harry is at it. If only it had been called Autologous Replacement Sampling
Estimation, something no-one would dare apply an acronym to (although if they
did, comfrey might have its uses: see above), we could have kept this as the
special preserve of the statistician. After all, we, unlike philanthropic
natural-remedy-sellers with nothing but the public good at heart, must look to
our professional interests.
Stop press.
Aztecs: an apology. Guernsey McPearson would like to apologise to the Aztecs
for any doubts inadvertently cast on their wisdom. Chocolate was a great
discovery.
Mayas: an apology. Guernsey McPearson would like to apologise to the Mayas for
giving the credit for discovering chocolate to the Aztecs. He admits you were
there first but then, hey, that's drug development for you.
That chocolate site in
full: http://hp5.econ.cbs.dk/people/toha96ad/chocolate/
That FDA and complementary medicine site in full: http://cpmcnet.columbia.edu/dept/rosenthal/legal/Fed.html
Pharmacoeconomics: drug development's dismal science.
I believe that it was
Umberto Eco, or it may have been Humbert Humbert, who said, rather wittily,
that The Three Musketeers is really the story of the fourth. I
forget the finer details of Alexandre Dumas père's novel, it being some 25
years or so since I read it (en français, bien sûr!). I do recall, however,
some exciting scenes involving the nymphomaniac ex of Athos (Milady) and a
sinister executioner: evidently the double entendres of, 'having it off,' made
an impression on my teenage mind. However, I think that by the end of the book
the three musketeers (Athos, Porthos and Aramis: character shorthands for
honour, courage and sensibility) are formally joined by the fourth (d'Artagnan,
a sort of cipher for youthful spirit) so that Umberto Eco is right, indeed. (I
hope that you all appreciate what you get in this column: not just statistics
and drug development but cod literary criticism too.)
Not infrequently, I pass
through airport news-agents in search of serious entertainment for the long
haul (studiously avoiding, of course, the top-shelf magazines). I have not seen
The Three Musketeers in such establishments. What I have seen recently, on more
than one occasion and always prominently displayed, is a work of popular
fiction entitled The Regulators. As you might expect from the title, it
is a horror story. We of the PSI, of course, are well acquainted with The
Regulator and his three musketeers: quality, efficacy and safety (Ethos, Pathos
and I'm a risk) but it now seems that many feel it is time that the fourth
('debt and gain') joined the fray. In fact, opening the latest copy of the APE
(The Albion Physician's Enquirer) what do I find but a clarion call from
an eminent professor of health economics that all drugs should prove value for
money before being registered? What an excellent suggestion! It must surely be
welcomed by health economists up and down the country! For it is quite clear
that requiring value for money will require money for valuation and who shall
we call to perform this task? Why, the health economists. And of course, if a
certain university established a monopoly in such evaluations what better motto
to adopt than, 'all for one and one for all'. (By the by. What is the
difference between a health economist and an economist? You couldn't give an F?
Quite right. That is all it takes to turn utility into futility.)
But don't let us be
hypocritical about this. For, have we statisticians not mounted a most
successful campaign ourselves? It started with the introduction of integrated
reports. (The old system was that The Regulator just got to see the physician's
report: a curious mixture of plagiarisms from the statistician's report and
extravagant and false inventions.) Then we got the CPMP guidelines to require
the participation of a qualified statistician in all stages of the clinical
trial and, of course, the APE and its rival The Speculum now
have statistical review of their papers. Furthermore, it seems that even the
good old medical expert's days are numbered. He has been replaced by the
statistician's meta-analysis. So let us not begrudge another profession's
attempts to make itself indispensable. But the question I ask you, dear reader,
is this: can we have too much of a good thing? After all, it is but a bootstrap
from feathering nests to festering nethers: a fate too dire to contemplate. I
imagine the scene in the year 2010.
Dear Candidate,
We are unable to grant you
registration with the General Medical Council. It is true that you diligently
attended all your lectures. (Is this a work of fiction: Ed?) You also performed
brilliantly in your exams and gave most effective solutions to all the problems
set. Your clinical work has been of the highest standard and the quality is
excellent. Your elective was most impressive. In a hospital setting you showed
commendable tolerability when faced with awkward patients and difficult
colleagues. However, you have failed to satisfy us that you are intending to
spend several years as a junior hospital doctor working 100 hours a week for
minimal reward. There is a strong suspicion that your intention is to enter
Harley Street and charge outrageous fees. This being the case it is clear that
you will not give value for money and hence we cannot accept your registration.
Perish the thought! On the
other hand, I don't know. Have you noticed a tendency in the pharmaceutical
industry for the medically qualified to earn more for the same job and
performance than their fellow scientists? What? It had escaped your attention.
I think you need to call in the health economists!
It has been stated that we
are wrong to give the credit for building a great cathedral or palace to the
architect. It belongs instead to the stonemasons and bricklayers whose labours
raised the edifice. In my opinion, this insight has been rather hastily
attributed to Bertolt Brecht, who is usually given as the author of it. Not
enough credit has been given to the typesetters and printers not to mention
paper-mill workers and lumberjacks of this world for this observation. Of
course, had BB lived to see the age of desk-top publishing we could perhaps
legitimately give him the credit for Mother Courage, The Caucasian Chalk
Circle and Galileo, which he so plainly doesn't deserve.
According to Alfred
Hitchcock, directors should treat actors like cattle. "What's my
motivation in this scene, Mr Hitchcock?" "Just say the lines,"
he would reply. I have always felt that a similar attitude should be adopted
towards trialists. The last thing you want is some jumped-up physician with ideas
of his own. "Just follow the protocol." (If only they would!) This at
least is one of the blessings of multi-centre trials. There are so many
physicians involved that you can reasonably use the excuse that if you were to
start changing the protocol at the request of one, there would be so many
others you would have to go back and get approval from that the process would
be impossible. Nowadays, you find that many of the top Hollywood movie stars
won't appear in a film unless they are granted co-scriptwriting rights. Does
anybody seriously believe that the films that are made are any the better for
this? Not the critics. Not the directors. Probably not the general public. Will
this stop this happening? Not at all. Because of something called "box-office
pulling power". There will always be movie marketing men who will think
that if the price of getting the star's name on the cast list is letting that
star hack the script to bits, it is worth paying.
Despite that, however,
Hitchcock's policy paid off in the long run. In the end his name as director
was as great a pull as that of any of the stars in his films. And names can be
important. Take an example. A film of which I am very fond is Bill Forsyth's Gregory's
Girl. (The film has a truly excellent protocol.) In the wake of its success
there were umpteen interviews with the admittedly comely actress who played the
eponymous role. What the critics had failed to notice, however, is that the
film was not about Gregory's Girl 1 but about Gregory, and that Gordon John
Sinclair's acting in the lead role was an important part of its success. I
always felt that he was denied an acclaim that was due to him simply on the
basis of the film's title.
Does anybody think that a
pharma-industry trial is the better for having let some prima-donna
"opinion-leader" force his or her embellishments on a protocol which
has been worked and re-worked by the sponsor's clinical trial experts?
Apparently so. Hollywood is not the only industry in thrall to the marketeers.
As far as pharma marketeers are concerned, it is obvious that the product is a
world-beater as regards efficacy, tolerability and quality (especially quality
of life), even before any work has been done on it. The only thing that remains
in doubt is whether the average GP can be brought to see the benefits of this
wonder product. This is where the really difficult part of the drug development
begins. (Outsiders have no idea how hard it is to sell effective remedies to
desperately ill patients who don't have to pay for them.) There is nothing like
having an opinion leader's name on your publication to sell a drug. And if part
of the price (in addition to the fee) of getting that opinion leader to
co-operate is letting him or her hack the protocol to bits, so be it.
Well what, you may say, has
this got to do with Bertolt Brecht? Just this. There are, I am pleased to say,
a considerable number of cases where industry trials get performed without
using "opinion leaders": trials in which the physicians are quite
happy to carry out the protocol as designed without feeling they need to modify
it. This can be a great blessing. After all, the theory of experimental design
was first established in the field (literally) of agriculture and the
difference between agricultural and medical research is that the former is not
performed by farmers². The more the professional scientists lead the trials, the
better. Some of these trials are rather successful. Inevitably some of these
trials make the Press. A new breakthrough having been developed, The Daily
Sensation will inevitably wish to interview the hero in the white coat, the
Dr Kildare, who made the breakthrough. It will hardly want to be fobbed off
with the industry chemists who first synthesised the molecule, nor the team who
did the background investigation of the mechanism of action, still less the
industry physician and statistician who designed the trial. So courtesy perhaps
of the pharma marketing department one of the trialists will be pressed into
service and propelled into the media spot-light as the discoverer/developer of
the new wonder-product.
But we shouldn't gripe. It
is a generalisation of that phenomenon which Stephen Stigler very wittily and
modestly dubbed Stigler's Law of Eponymy: if a discovery is named after somebody,
then he didn't discover it. So the process is almost inevitable. You will just
have to accept that Dr Limelight from St Enema's is going to get the credit for
your protocol. And the only consolation you will get, from the coverage granted
to near irrelevant persons by the media, is that of being provided with yet
another proof, if proof were needed, that when it comes to drug development,
television and press don't know their base from their apex.
Notes
1. In any case, the point of the film is that it is not the girl that you think
who is Gregory's Girl.
2. This witty remark is due to Michael Healy. See his paper: Frank Yates,
1902-1994: The Work of a Statistician, International Statistical Review,
63, 272-288 (1995).
The Frightfully Drunk
Alcoholic's (FDA) plasma concentration has been affecting his mental
concentration and he is on his knees scrabbling around under a lamppost when
the Police Superintendent (PSI) comes by.
PSI What are you doing here?
FDA Looking for my keys which I lost 200
yards down the road.
PSI So why are you looking here?
FDA The light's better, of course!
I have had cause to
remark in previous SPINs on the lunacy that prevails whenever the topic
of bioequivalence is raised. Now we have fresh evidence that the madness
continues. A certain regulatory agency has produced a document for consultation
in which it is proposed to replace the old notion of average bioequivalence
with those of population and individual bioequivalence. And if
you haven't already heard of prescribability and switchability
you are going to hear a lot more of them in the future.
Suppose that a
physician is faced with the choice between using a brand name product (for
which an enormous amount of regulatory evidence for efficacy, quality and
safety has been provided) or a generic for a newly diagnosed patient. If he has
evidence that the generic is equivalent to the brand-name drug in the sense
that for a newly presenting patient there is no reason (other than price) to
choose between brand-name and generic, then the generic may be said to be prescribable.
Since, in practice, the generic will not have a mountain of direct evidence to
back up its claims, this requires that some good evidence has been provided
that the generic is equivalent to the brand-name drug. Until recently, it was
considered adequate to prove mean equivalence. This was not because drug
developers were unaware that two distributions could be similar in terms of
means and different in terms of variance. Far from it. It was because it was
considered that life was short and there were more important matters to look
at.
Suppose, however,
that you were concerned that the generic product, despite being the same on
average, was more variable than the innovator product. Would you not be
concerned to investigate thoroughly potential sources of such variability, for
example from batch to batch? Wouldn't you think it odd to fill a document full
of symbols for various components of variance, between, within and interactive
but have no time to consider the manufacturing process itself? Wouldn't you
think it was peculiar not to say anything at all about how the samples to be
compared should be chosen? Wouldn't allowing the sponsor to use product from a
single batch chosen by him be as illogical as looking under a lamppost for keys
you had lost two-hundred yards away?
Now suppose that you
are a physician whose patient is currently under a brand-name product and one
of those nasty purchasing agencies is putting pressure on you to switch him to
a generic. Now, it is argued that you need the concept of switchability.
This is because it is theoretically conceivable that two products could have
the same mean bioavailability but one might be relatively more bioavailable for one
kind of patient and relatively less bioavailable for another. This would be
an example of treatment by patient interaction. Now, you might ask what
business this phenomenon is of a regulatory agency. After all, we don't
just sell to the prevalence; we also sell to the incidence and, for new patients,
the concept of switchability is irrelevant. You might think that a simple label
inside the generic product 'although KopyKat is on average similar to
Bigbucksalol, there may be the odd patient who will experience difficulty if
switched from Bigbucksalol to KopyKat,' would do. How wrong you would be! The
regulator is not just there to enforce regulations but to create regulations.
With many ICH documents being finalised there is a severe danger of "all
quiet on the regulatory front," and that would never do.
Don't get me wrong. It is
not that I think that patient by treatment interaction is an unimportant topic;
far from it. However, there is one thing about interaction that I remember
rather well from my undergraduate days and the study of linear models. An
interaction is usually less important than the marginal main effects.
Studying patient by treatment interaction, say, when comparing two quite
different treatments in heterogeneous patients is much more reasonable than
say, looking for interactions when comparing two formulations of the same
treatment in healthy volunteers. Curiously, there is no requirement upon sponsors
to do anything serious about the former. In fact, you are positively
discouraged from using the sort of cross-over trials that would enable you to
seriously investigate treatment by patient interaction (and which are now
de rigueur in bioequivalence) because, as everybody knows, the parallel group
trial is the gold-standard for investigating all possible questions (except
bioequivalence).
And there is another thing
I remember about interactions. As soon as you abandon the parallelism
assumption and admit the possibility of interaction, the effect you will see
depends upon the subjects you choose. Now, unless I am very much mistaken, we
don't prescribe to healthy volunteers, let alone switch them from one
formulation to another. If switchability is a concern it is a concern for
patients. If subject by formulation interaction is possible, the effect will
depend on the subjects chosen. It seems rather remarkable that this new
guideline says nothing, therefore, about choosing subjects and in particular
that nothing is said about the absolute need to run bioequivalence studies
in patients.
Why can this be? Presumably
the light is better in healthy volunteers.
Next issue. A
do-it-yourself guide to increasing your scientific status by drafting
complicated guidelines on hitherto neglected matters. Watch out for these hot
topics.
How many stars for
significance in astrology? A user's guide to star-sign by treatment
interaction.
One Yank and it was off. The
effect of British versus American spelling on patient compliance.
A sticky problem. The effect
of chewing bubble-gum on absorption from suppositories.
Is your drug a frequent
flyer? Prescribing treatments in the age of jet lag.
Bad Vibrations. Oscillation
damping and drug shipment. An essential quality measure to avoid homeopathic
potentiation of treatments.
This is a rather complicated number but extremely popular world-wide and
turns up in the most far-flung places of medical research. It has a number of
local variants. I merely describe one here. The dance is divided into two major
sections: the forward reel and the backward reel. They have a pleasing symmetry to them.
Forward reel
1. The introduction. The medical adviser searches for a suitably precise
and sensitive "instrument". Usually a rating scale with many
categories is employed. (A typical example is the Hamilton scale for depression.)
Alternatively, a continuous measurement scale is used and highly precise
measurements are employed (e.g. FEV1 to the nearest ml).
2. The presentation. The scale is used to measure patients, subjects
etc.
3. Treading the measure. The scale is arbitrarily divided to form two
sections labelled 'responder' and 'non-responder'. (Some local variants have an
intermediate stage known as tripping along the baseline.)
4. The envoi. The dichotomies are handed over to "the
statistician".
Backward reel
1. The acceptance. The data are accepted by the statistician who now
refers to them as "binary".
2. The link. A suitable link is chosen to relate the binary data to a
continuous expectation. (This is necessary because direct models for binary
data don't work well.)
3. The model. A model is introduced into the dance. (This stage can
involve some very delicate footwork. To avoid accidents dancers are requested
to register their steps with the local caller before the dance.)
4. The analysis. The binary data are converted back to predictions on
the continuous scale chosen.
Music
Any tune at all as long as it is played on the fiddle.
Object of the exercise
As with all dances, the object is to introduce elaborate, superfluous
and complex movement. It has nothing to do with logical progression from A to B
and indeed the object is to end up (nearly) where you started. It should be
viewed as an art form, not a science.
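For the reader who would like to watch the dance rather than merely read about it, here is a minimal Python sketch. It assumes that the underlying measurement is normally distributed with unit standard deviation, that "responder" means scoring above a fixed cutpoint, and that the backward reel uses a probit link (the same latent-normal idea that lies behind the tetrachoric coefficient). The shift of 0.5, the cutpoint of 1.0, the sample sizes and the function names are all hypothetical choices for illustration. In the population, the dancers end up exactly where they started. In samples, they end up only nearly there, and they take a noisier route than simply comparing means.

```python
import random
from statistics import NormalDist, fmean, pvariance

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def responder_proportion(mean, cutpoint, sd=1.0):
    """Forward reel: share of a normal population scoring above the cutpoint."""
    return 1.0 - NormalDist(mean, sd).cdf(cutpoint)


def probit_difference(p_treated, p_control):
    """Backward reel: a probit link takes proportions back to the SD scale."""
    return STD_NORMAL.inv_cdf(p_treated) - STD_NORMAL.inv_cdf(p_control)


# Hypothetical example: true shift of 0.5 SD, 'responder' means a score above 1.0
delta, cutpoint = 0.5, 1.0
p0 = responder_proportion(0.0, cutpoint)
p1 = responder_proportion(delta, cutpoint)
recovered = probit_difference(p1, p0)  # equals delta in the population


def simulate(n=200, reps=500, seed=1):
    """Sampling variance of the direct estimate versus the dichotomised one."""
    rng = random.Random(seed)
    direct, via_binary = [], []
    for _ in range(reps):
        y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y1 = [rng.gauss(delta, 1.0) for _ in range(n)]
        direct.append(fmean(y1) - fmean(y0))
        # Dichotomise and link back. With these settings, a proportion of
        # exactly 0 or 1 (where inv_cdf would fail) is vanishingly unlikely.
        q0 = sum(y > cutpoint for y in y0) / n
        q1 = sum(y > cutpoint for y in y1) / n
        via_binary.append(probit_difference(q1, q0))
    return pvariance(direct), pvariance(via_binary)


var_direct, var_binary = simulate()
print(f"population: p0={p0:.3f}, p1={p1:.3f}, recovered shift={recovered:.3f}")
print(f"sampling variance: direct={var_direct:.4f}, via binary={var_binary:.4f}")
```

Under these assumptions, the dichotomise-and-link route recovers the shift but with a noticeably larger sampling variance than comparing the original measurements directly. That extra variance is the price of the elaborate footwork.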
Next issue. The full Monty or the art of statistical strip-tease.
Now you have no doubt been
thinking: this Guernsey McPerson has been going for a long time: how long can
he keep it up? Well that's a very personal question, madam, and none of your
business! It does, however, bring me on to a subject of current interest,
namely a currently rather infamous drug, which I shall refer to as Fullaggro,
used to treat a rather embarrassing condition, "condition I", with
increasing prevalence amongst males as they mature past middle age. I've
already said, Madam, that it's none of your business! Fullaggro is so
marvellous that not only is it a cure for "condition I" but also for
the general malaise regarding profits affecting the industry. I believe it was
William Buckley Jr who defined dancing as the vertical expression of a
horizontal desire but I think that the Fullaggro profits could be described as
the vertical expression of diagonal desire.
However, one man's profit
is another man's cost and politicians seem to be becoming very worried about
the cost of Fullaggro. Scarcely a day goes by without some politician's sound
bite on the subject, so to speak, but let's not digress to the political situation
in the US. And Fullaggro certainly is bringing the idiotic commentators out of
the woodwork. For a long time it seemed that you were not going to be able to
get Fullaggro on the NHS. The Minister's position appeared to be that you could
waste any amount of public money in your quest for treatment for condition I.
You could besiege your GP, pester consultants, be prescribed umpteen
treatments, as long as none of these were successful in curing "condition
I".
Now however, the Minister
has relented. You will be allowed your Fullaggro once a week. Has the man gone
mad? What a cock up! Let me tell you an old joke by way of explanation. (Alas,
not a GMcP original but our powers are not what they were.)
The Joke
The European Commission
were deciding on the ideal number of condoms there should be in the Euro Condom
Packet (ECP). "Four," said the Germans. "Four?" "Yes.
Monday, Tuesday, Wednesday, Thursday, but the weekend is for drinking and
relaxing." "Eight," said the French. "Eight?!!" "Yes. Monday
to Saturday and, for Sunday, twice." "Twelve," said the British.
"Twelve!!!????!!!" "Yes. January, February, ..."
You see the point. George
Mikes famously remarked that the British don't have sex; they have hot-water
bottles. Once a month, Fullaggro would have been quite enough. Now the Minister
will be encouraging thousands of elderly males to be all dressed up with
nowhere to go, four times a month, when once was probably as much as they were
ever used to.
What on earth can they make
of it all on the continent? Proof positive that Les Vaches Folles have finally
got to the British. Has no one pointed out to the cost-conscious Minister that
a lot more Fullaggro is going to be sold outside the UK than in it, and that,
since it is a British invention partly developed in Britain, some of the profits
are bound to come back to UK limited? Perhaps he should have a word with the
Chancellor.
"He says he's a
beautician and sells you nutrition.
And keeps all your dead
hair for making up underwear."
David Bowie, The Jean
Genie
Shrink with horror?
In a shock statement last
night, it was revealed that genetically modified calculations have been
entering the statistics chain for several years now. It can now be revealed that
for the past nine years statistical scientists have been splicing so-called
"prior beliefs" into data using "Bayesian methods" in an
attempt to make them more resistant to chance fluctuations. This has led to an
immediate call from statistics consumer groups that all such calculations
should be clearly labelled as having been Bayesianically modified and in fact
that a three-year moratorium on all such methods being used for public policy
should be imposed.
Frank and Stein
estimation?
Statisticians had long
postulated that Bayesian calculation was possible but until the start of the
1990s nobody had actually succeeded in doing one. It was also known that
so-called shrinkage estimators were theoretically superior to natural "raw
data". Then, a breakthrough in computation showed how, with the help of
"mixed-up calculation, muddled computing," algorithms (MCMC) the goal
of n-Stein restriction could be achieved. Within a short while, statisticians
of the so-called Bayesian persuasion throughout the world were engaged in mass
sessions of self-simulation. By this technique, brain waves are fed into the
computer, mixed with the data, subject to millions of random mutations and then
fed back into the subject's brain via a so-called graphical interface. There are
claims, however, that this technique is extremely dangerous and that, in
particular, failure to converge can result in a bad trip. Non-Bayesian
scientists are, in fact, claiming that Bayesianism is not a science but a cult,
and point to mass indoctrination sessions held at regular if infrequent
intervals in Spain during which mind-numbingly boring mantras are repeatedly
chanted by Bayesian adherents.
Freak and Twist
Statisticians of the
Bayesian school are fighting back however. They claim that there is absolutely
no cause for the public to be alarmed and that Bayesian calculations are
delicious, refreshing and nutritious; indeed, that they provide all that is
needed for a coherent diet. They point out that many naturally occurring
data-sets contain contaminants that Bayesian methods will help to down-weight.
They also claim that the common so-called "freak and twist" methods
have been contaminated with P-values for years.
The Bugs spread
Consumer groups are
unimpressed. They point out that these calculations have been released without
prior consultation. In a shock claim that has not been denied, they state that
one of these statistically modified viruses has escaped from a laboratory in
Cambridge and has spread like wildfire over the Internet appearing at thousands
of locations throughout the world and indeed that these techniques are now
being widely used by persons who have no idea at all what they are doing.
ICH and scratch
They also accuse the
world-wide regulatory body for drug development, the so-called ICH, of having
caved in to Bayesian demands without consulting the public. It seems that
according to the notorious ICH E9 guideline, "the use of Bayesian and
other approaches may be considered when the reasons for their use are clear and
when the resulting conclusions are sufficiently robust."
Storm in a t-test
A Downing Street spokesman
said that there was no need for the public to be alarmed, that successive British
governments had been controlling the release of public statistics in terms of
content and timing for years, and that there was no intention of changing this
policy.
Of Simples and
Simpletons
Simple. A medicine of one constituent.
Simpleton. One who thinks statistics can and
should be made simple. One who seeks to serve the cause of medicine by
promoting easy lies in preference to difficult truths.
All right thinking readers
of SPIN will, I am sure, be well aware that the NNT (number needed to treat)
is a moronic way to summarise the results of a clinical trial. For one thing,
it depends entirely on the background risk, hardly an ideal property to
summarise a clinical trial that is unlikely to be perfectly representative of
the target population. No two statisticians communicating with each other would
ever use such a device, preferring, I am sure, something like odds-ratios or
log-odds ratios. Recently, however, it was put to me that physicians cannot
understand odds-ratios let alone log-odds ratios and so they should not be
used.
I can't pretend that I find
this argument entirely unwelcome. I must confess to being rather jealous of all
those statisticians who get to work in survival analysis. Look at all the books
on the subject that they have produced: more than 30 when I last counted. There
is even a journal devoted to this subject alone. Now consider proportional
hazards, right censoring, Kaplan-Meier estimates, not to mention frailty
models, time-dependent covariates and counting-processes. It is pretty clear to
me that none of this is comprehensible to physicians, so we can just cut it all
out. That should wipe some smiles off a few faces.
This argument opens the
possibility of a whole new approach to medical research. Before employing any
scientific device, we should make sure that the physician understands it. We
may have our work cut out. Is that spirometer measuring forced expiratory
volume in one second as the area under a flow-rate curve? If so, please check
that the trapezoidal rule is being employed. If Simpson's rule or something
more complicated is being used, you will have to replace the spirometer because
the physician will never understand it.
And just think how much
simpler life is going to be. The average physician has a very basic understanding
of genetics, immunology and pharmacology. We can more or less cut out research
in those areas altogether, since the results can never be applied. Just think
how many ethical dilemmas we can avoid. For that matter, any physician using a
computer, say for word-processing, should be forced to demonstrate that they
have a working knowledge of how it is put together.
And in this democratic age
we can take the issue even further. No physician should be allowed to prescribe
a drug to a patient if the patient does not understand the science. This will
solve the problem of finding a cure for Alzheimer's at a stroke (not to mention
the cure for stroke) since on this basis if you have it you can't be treated.
We could more or less wipe out paediatrics as a discipline. Intensive care?
Forget about it. And as for obstetrics: how many foetuses understand forceps,
epidural blocks, caesareans or the finer points of scanning?
Yes, I think that this
policy could really make life a lot simpler.
On the other hand, an
alternative approach might be to "black box" things. We could stick
with our log-odds ratios and simply invite the physician to input a background
risk to some suitable software. Then, via the magic of computer programming,
get the computer to spew out the predicted probability of death, cure or
survival (or whatever) under the treatment.
Silly me! This is obviously
far too complicated to contemplate.
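For anyone willing to contemplate the unthinkable, here is a minimal Python sketch of such a "black box". It assumes that the trial is summarised by a single log-odds ratio that is taken to be constant across background risks. That is a modelling assumption, not something the trial itself proves. The function names and the odds ratio of 0.5 are hypothetical. The physician supplies a background risk, and the box returns the predicted risk under treatment. As a by-product, it also shows why the NNT is a poor summary of a trial: the same log-odds ratio gives very different NNTs at different background risks.

```python
import math


def risk_under_treatment(background_risk, log_odds_ratio):
    """Predicted risk under treatment from a background risk and a log-odds ratio."""
    if not 0.0 < background_risk < 1.0:
        raise ValueError("background risk must lie strictly between 0 and 1")
    log_odds = math.log(background_risk / (1.0 - background_risk))
    treated_log_odds = log_odds + log_odds_ratio  # constant-effect assumption
    return 1.0 / (1.0 + math.exp(-treated_log_odds))


def number_needed_to_treat(background_risk, log_odds_ratio):
    """NNT = 1 / absolute risk difference, so it depends on the background risk."""
    diff = background_risk - risk_under_treatment(background_risk, log_odds_ratio)
    if abs(diff) < 1e-12:
        return math.inf  # no effect: the NNT is infinite
    return 1.0 / abs(diff)


log_or = math.log(0.5)  # hypothetical trial result: odds ratio of 0.5
for p in (0.05, 0.2, 0.5):
    print(f"background risk {p:.2f}: "
          f"treated risk {risk_under_treatment(p, log_or):.3f}, "
          f"NNT {number_needed_to_treat(p, log_or):.1f}")
```

With the same hypothetical odds ratio, a background risk of 0.5 gives an NNT of 6, while a background risk of 0.2 gives 11.25. The log-odds ratio stays put; the NNT wanders off with the population.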
Statistics. The scrubber science. All anyone
wants is a quickie without the encumbrance of a meaningful relationship.
Medical statistician, OBN. A sort of physician's
flunkey.
Millennium Medicine
This issue the GMcP column
looks forward to the sort of medicine, and news items about medicine, we can
expect in this new millennium. This will include:
The Fullagro-Nicotine
Combi-Patch
Increasing awareness that
smoking is a major cause of impotence leads to a burgeoning market in
combination therapy. (Why did it take us so long to realise this side effect of
nicotine? How often have film directors given us shots of couples in bed
smoking before they had sex?) To prevent those taking nicotine patches
suffering a disastrous loss of libido, a Fullagro-nicotine patch is developed.
Fullagro cigarettes also prove to be a big hit and an unanticipated benefit
turns out to be that if you change your mind you can stuff them back in the
pack much more easily. Fullagro cigars are particularly popular in America.
Shag tobacco takes on a new meaning. Whisky distillers take the hint.
"Highland Caber," a new blend of malt and Fullagro, is particularly
popular with the Ibiza set and is ordered by saying "give me a stiff
one".
The Emergence of APRIS.
Over-exposure to Australian
soap operas leads to vast sectors of the British population being infected by
Antipodean Posterior Rising Intonation Syndrome (APRIS). This acquired speech
impediment causes every statement to be turned into a question by raising the
intonation at the end of the sentence. This can lead to confusing social
interactions. (In the following examples, the symbol ^ indicates that rising intonation follows.)
Waiter. Will you have the apple pie or the
plum pudding?
Customer. The apple pie^.
Waiter. It's a sort of covered pastry tart
with cooked apples in it.
Boy at party. Where do you live?
Home and Away addict.
Boy at party. I said where do you live.
Girl at party. What do you do?
Neighbours-watching get-a-life moron. I am a statistician^.
Girl at party. Well if you don't know, I don't
know.
The Scourge of GNUS
Genetic Nonsense Uttering
Syndrome (GNUS) is an advanced form of Alzheimer's afflicting many who work in
the pharmaceutical industry and in particular chief executive officers.
Particularly noticeable when these give public lectures in which phrases like
'human genome project', 'genotyping', 'targeted drugs', 'new opportunities',
'specific reaction', 'more precise clinical trial' and 'increased profits' are
improbably linked together with hot air.
Homeopathic
Pharmaco-Economics
The public's delight with
all things alternative leads to an increasing interest in homeopathic medicine
on the NHS. The National Institute for Clinical Excellence does a cost
efficiency analysis and rules that such medicines are reimbursable on
the NHS provided that the profit margin on the cost of active ingredient does
not exceed 1 million percent. With street prices of arnica at one pound a
milligram, homeopathic practitioners throughout the country are left wondering
how they can make a living by charging 10,000 times nothing at all.
The Cure for Nicotine
Addiction
As 'patching' sweeps the
world, there is an alarming increase in the incidence of skin cancer. People
who wear nicotine patches are particularly affected. Several eminent scientists
claim that there is no proof of any causal link: it could plausibly be that
those who are genetically disposed to "patch" are genetically
disposed to skin cancer, rather in the same way that those who are genetically
disposed to accept payment from patch manufacturers are genetically disposed to
say this sort of thing. A new cure for patch addiction is discovered. It
consists of rolling up tobacco inside a tube of paper, lighting it and
inhaling. With the help of this ingenious device, patch dependence can be
eliminated.
Call me old-fashioned, call
me unimaginative, call me a curmudgeonly stick-in-the-mud, call me a cynic,
why, yes, call me a statistician, call me what you will, but I have a very
simple view of matters. Whenever the medical advisor asks me how many patients
we need I always give him the standard answer, "too many (and then
some)". If you ask me why it is that we run clinical trials with at least
dozens, often hundreds and sometimes thousands of patients, I will tell you
that that is how many patients you need to tell that the treatment works at
all: that is to say in some patients. Yes, yes my friends this is
the sort of pessimism that statistics induces.
How much nicer to be a
physician. A physician can look at a single patient and tell you whether the
treatment worked or not. In fact the most important part of the clinical trial
is the awards ceremony at the end. That's when the doc hands out the medals: a
responder tag to him, a non-responder label to her. That's the genius of
medicine. You only have to look at a patient to tell what would have happened
to him or her, had he or she been treated differently. This is no doubt why the
medical profession managed so well without clinical trials before statisticians
came around poking their noses in everywhere and spoiling a good system.
"It is true that Mr Brown died after I treated him but he would have died
anyway, whereas Mrs Smith, who recovered, would not have done so without my
help." Don't knock it, it's a system that gave us scalding baths for cholera
and blood-letting for tuberculosis and let's face it, nobody would ever do
anything so cruel to their patients unless they were utterly convinced it was
in their best interest, so they must have been very good treatments. (And they
were also so cheap!)
This reminds me of a joke.
Two ducks in Ballymena. One says, "quack, quack". The other says,
"he said ducks not docs"* .
How much better still to be
a manager. It surely didn't escape your attention that recently geneticists
have completed sequencing the human genome. (Did you know by the way that gene
is a four-letter word and so is genome?) Expect a lot more in the way of
portentous announcements from upper management and not just from CEOs in charge
of USA limited and UK limited. Not only is the human genome project going to
deliver a lot of targets for drugs, it's going to make clinical trials a lot
cleaner because we are going to be able to screen for non-responders.
Of course, if I were going
to play the bad fairy at this particular party, I would point out that in most
clinical trials patient-by-treatment interaction is not identifiable (to
use some nasty statistical jargon) and so not distinguishable from noise. Thus
we do not know that the reason that some patients appear to respond and
some do not is anything to do with a specific reaction to the treatment. It may
be the operation of a hidden cause only temporarily associated with the patient
or it may even be due to measurement error. What we also know is that
patient-by-treatment interaction provides an upper bound to gene-by-treatment
interaction since human beings differ by more than their genes.
For example, we now know
that grapefruit juice can interfere with the elimination of some
pharmaceuticals. Therefore, unless and until we identify the gene for wanting
grapefruit for breakfast, it seems reasonable to suppose that this is one
aspect of variability (amongst many) that the human genome project is not going
to eliminate. (Of course grapefruit juice itself is eliminated but that
is another matter.)
We could have been
investigating this sort of thing all these years. We could have been running
multi-period cross-over trials in chronic diseases. We could have been using
sequences of n-of-1 trials to check whether there really is patient-by-
treatment interaction or not but of course this would have left us open to the
possibility of carry-over, a problem so serious that it prohibits all
within-patient designs.
How fortunate then that our
industry has such wise captains, who can "know" what the cause of
variability in clinical trials is without having had to run the experiments to
find out. It's a pity that they never succeeded in persuading the MCA and the
FDA to let genius, flair and insight decide whether treatments worked, rather than
being forced to waste time on all those expensive trials, collecting something
as old-fashioned as evidence. To take a related field, just think of all
those years that psychologists were carefully carrying out twin and sib studies
to see what was nature and what was nurture. If only they could have had the
wisdom of pharmaceutical industry management, think of the trouble they could
have saved themselves. That is what genius, flair and insight does for you.
Edison may have said,
"Genius is one percent inspiration, ninety nine percent
perspiration," but he was talking about genius not geneius.
This summer's best
seller. Men are
from Mars, Women are from Venus, CEOs are from Uranus.
*In the original version
the other says, "for goodness sake I'm going as quack as I can."
Meet the Archies
A paper in the Albion
Physician's Enquirer (APE) has compared meta-analyses produced by
members of the Archie Association (AA), an organisation devoted to summarising
and disseminating medical evidence, with those produced by the pharmaceutical
industry (PI). The authors, who are members of the AA, find that when compared
using an instrument created by members of the AA and validated by
members of the AA, meta-analyses carried out by other members of the AA are of
higher quality than those performed by the PI. This staggering piece of
impartial research (rather like comparing ballet shoes and ski-boots at Covent
Garden with Darcey Bussell doing the judging), deserves to be made as
widely known as possible. Yours truly was reminded that there is a long-running
programme on the wireless (the word is appropriate for something that
started fifty years ago) in which rural drama in deepest England is used as a
vehicle for imparting nuggets of wisdom to the intellectually impoverished. I
have thus proposed to the Corporation that an episode of The Archies be
devoted to this very topic and am in the privileged position of being able to
give readers of SPIN an aperçu of the script, which now follows.
The Archies: an Everyday
Story of Database Summarisers
Dramatis Personae.
Dan Archie, a stalwart
overviewer.
Doris Archie, his wife
an equally stalwart analyst.
The Reverend Man, the
local vicar.
Scene: It is a Wednesday evening in the kitchen
of Dan and Doris Archie's evidence base in the village of Umbrage near
Boretester. Dan and Doris are sitting at the table. There is a bottle in front
of them and each has a glass.
Doris. But I thought we was in favour of
farmers, Dan.
Dan. Aye, that we be, Doris, but you
mean farmers not pharmas. It be these pharmas we must look out for.
Doris. How so?
Dan. Now Doris, you haven't been
a-sleeping through the vicar's sermon again have you? Surely you remember what
he said last Sunday.
Doris. Can't say as I do. What did he
say? [There is a knock at the door. Doris gets up to answer.]
Doris. Why, hello Vicar! What a
coincidence! You'll never believe this, but we was just talking about you. Will
you come in and have a glass of Cowslip wine?
The Reverend Man (For it is he.) With
pleasure Doris, provided only that the dose is large enough. Anything else
would be unethical. [Doris rises to get a glass. Various clinking and
gurgling noises, provided courtesy of the Corporation's Stereophonics Workshop,
may be heard.]
Dan. Vicar, we was just discussing your
sermon on Sunday. Could you remind us of some of the finer points?
Rev Man. I don't want to chide you, Dan,
but surely you will remember that the important thing in summarising anything
is to include all the points, even those of lesser quality? To do otherwise is
to lay oneself open to the charge of bias. [Doris and Dan make the sign of
the cross.] Of course, you may always draw attention to the doubtful nature
of low-quality points in an aside.
Dan. [Crestfallen.] Well, tell us
the whole thing then, Vicar.
Rev Man. I was discussing "The Ten
Commandments of Overviewing", which are as follows
Doris. Is that it?
Rev Man. Well, of course, these are just
the points I covered in last Sunday's sermon. But this is an ongoing story. We
can always find more bits to add. I think you will agree that what has been
said so far is highly significant. Over the next few Sundays, as I add more and
more to this theme, I shall prove how relevant it is.
Dan. But I have heard tell as to how
there be an eleventh commandment?
Rev Man. Ah, yes Dan. You are referring to, "thou shalt
castigate as idiocy all recourse to random-effect models". That is a
highly controversial point. It has been proposed by the Oxford Movement, but
although it has a long and venerable tradition, it has not been universally
accepted as canon law. In particular, the Bishops of London and Cambridge have spoken
against it. You and Doris have only done the Alpha Course. You need to do the
Beta and Delta Courses on sample sizes first before I can discuss that one. In
the meantime I would stick to more fundamental matters, if I were you.
Doris. You mean we should be
fundamentalists?
Rev Man. [Smiling.] No not exactly,
Doris. I mean, that if you always remember to go through your quality
assessment exercises you can't go far wrong. Remember that we have been
promised that when two or three of us are gathered together we may validate
our instruments. Now that reminds me. It seems to me that it is a long time
since either of you went to the annual retreat and you know how important those
are to your religious development...
Stop Press News:
Ski-Boots Better After all.
In a recently conducted
trial in Kitzbühel, Hermann Maier (The Hermannator), using a slalom
course set by the Austrian ski-coach, claimed to have proved conclusively that
ski-boots were, contrary to previous research, better than ballet shoes. A
tight-lipped spokesperson for the Archie Association said that there was no
evidence that slalom courses had been validated, and, in the absence of
such evidence, no credence whatsoever could be given to this research.
'There's no such thing as objective marking.'
Malcolm Bradbury. The History Man
In my dim and distant
youth, in the days before I joined the Elixir Laboratories of Pannostrum
Pharmaceuticals, I taught adults for a living in an institute of higher
education: the College of Buchaillemore. The teaching load included a fair
number of ancillary courses. One that will not easily be forgotten was
'introductory mathematics and statistics' for the first year quantity surveyors
or "Brick One", as they were known. What scallywags they were:
filling in the temporary register with names such as Dick Tater, Cliff Erosion,
Hugh Jarse and Juan Kerr in the hope that their lecturer would read them out at
the next lesson. The extremely comely tutorial assistant had a busy time of it,
poor thing. A male hand would be raised for help. However, if yours truly
approached, the hand would be lowered. Still, I suppose everybody benefited: my
voluptuous assistant got exercise both physical and mental, which can only have
done her good, the quantity surveyors discovered pleasures in learning
statistics they could not have imagined and I was entertained.
It occasionally happened
that students appeared to fail on one or other of the courses that I taught. I
say appeared to fail for such occurrences were usually temporary. If for
any reason the examination board had failed to give their case a sympathetic
hearing, the director of student services, Mrs Mahen, could always come and
plead their case at the appeals board.
Mrs Mahen: The reason that John McSlacker
has failed is that he has been working nights as a barman to pay off his debts
and so has been unable to study at all. (Murmurs of sympathy all round.) Personally,
I think that speaks volumes for his sense of social responsibility.
Dr McPearson: (Suspecting from McSlacker's class
attendance record that he must have a day job too.) But is his student
grant* not sufficient? (Intakes of breath, tut-tutting at this
unsympathetic and ungentlemanly line of enquiry etc., etc.)
Mrs Mahen: (Triumphantly putting down a
smart-arse.) He had to pay off the loan obtained for the purchase of his
motorbike.
Principal Conniver: (Intervening swiftly, fully aware
that McPearson is aware that the hall of residence in which McSlacker lives is
half a mile from the College and seeing McP's next question a mile, if not half
a mile, off). Well what could be clearer than that? I think we must, in the
name of fairness, condone this failure. Who is the next student we need to
consider?
However, sometimes, in
order to obviate the necessity for an appeal and if, perhaps, there had been an
excessive number of failures on the course, there was a decision at the
examination boards to re-scale the marks or, as they say in America, "mark
on a curve". Attentive readers will know that your columnist is a
statistical totalitarian, who holds the radical view that statistical method
should be applied to everything, even statistics. Not to apply
statistical reasoning to statistics itself would, of course, be the ultimate
hypocrisy.
However, hypocrites is what
the majority of statisticians are, believing that statistical reasoning should
be applied to anything and everything but not statistics. Do you think I
am being unfair? Well, what statistician when consulted with a request to
design an experiment would say, 'don't worry about that just start and see how
you get on'? But is that not how all statisticians do simulations, which are,
after all, their own experiments? What statistician would recommend presenting
results from an experiment without measures of precision? But how many
statisticians in quoting the results of simulations give you standard errors as
well as means? If a physician comes to a statistician saying, 'I want to screen
for disease. The false positive rate of my test is 10%, the false negative rate
may be as high as 70%, the exact prevalence is unknown but the disease is
fairly rare and the treatment is not very effective', will the
statistician reply, 'excellent, go ahead!'? Yet this is exactly the approach
that PSI recommended at one time for screening for carry-over in cross-over
trials.
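For readers who like to see the arithmetic, here is a minimal sketch of why the physician's proposal is unattractive. The physician only says the disease is "fairly rare", so the 1% prevalence below is purely an illustrative assumption, and the function name `positive_predictive_value` is mine. A false negative rate of 70% means the test catches only 30% of cases, and Bayes' theorem then says how many of the positives are genuine.

```python
def positive_predictive_value(prevalence, false_positive_rate, false_negative_rate):
    """Probability that a patient who tests positive actually has the disease (Bayes' theorem)."""
    sensitivity = 1.0 - false_negative_rate
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative inputs: 10% false positives, 70% false negatives, an ASSUMED 1% prevalence.
ppv = positive_predictive_value(prevalence=0.01, false_positive_rate=0.10, false_negative_rate=0.70)
print(f"Share of positives that are real cases: {ppv:.1%}")
```

Under these assumptions roughly 3 positives in 100 are real cases, and those few then receive a treatment that is not very effective. The parallel with pre-testing for carry-over is the column's; the sketch does only the screening arithmetic.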
To return to examination
boards, my radical proposal was that since we wished to transform marks on the
interval 0-100 so that they remained on the interval 0-100 (but were
increased), we should use a logit transformation. All my colleagues were of the
unanimous opinion that this was far too difficult to contemplate.
But, I persisted. I
prepared a little table for the next examination board. A more modern version
of this is in the Excel Sheet attached. You then simply
imagined what you would consider a mark of 50 ought to be improved to and this
then predicted how all other marks should change. For example, if you thought
that 50 should really be 58, then you looked for the row which had 50 in it and
then found the column with 58. This was then the column you used for re-scaling
all marks so that, for example, a mark of 35 becomes a mark of 42.6.
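For the unconvinced, here is a minimal sketch of how such a table can be generated. It is not the original Excel sheet. It assumes every mark strictly between 0 and 100 is shifted by the same constant on the logit scale, with the constant fixed by what 50 should become, while 0 and 100 stay put. The function names are illustrative. It recovers the example in the text: if 50 becomes 58, then 35 becomes 42.6.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def rescale(mark, new_fifty):
    """Re-scale a mark on 0-100 so that 50 becomes new_fifty, via a constant logit shift."""
    if not 0 <= mark <= 100:
        raise ValueError("marks must lie in 0-100")
    if mark in (0, 100):
        return float(mark)  # the end-points map to themselves
    shift = logit(new_fifty / 100.0)  # logit(0.5) is 0, so this is the whole shift
    return 100.0 * inv_logit(logit(mark / 100.0) + shift)


def print_table(marks, targets):
    """Rows are original marks; columns are what a mark of 50 should become."""
    print("mark" + "".join(f"{t:>7}" for t in targets))
    for m in marks:
        print(f"{m:>4}" + "".join(f"{rescale(m, t):7.1f}" for t in targets))


print_table(marks=range(0, 101, 5), targets=(52, 55, 58, 60))
```

A constant shift on the logit scale is the same as multiplying every student's odds, mark/(100 - mark), by a common factor. That is why no mark can be pushed above 100 or below 0.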
It made no difference. It
was perfectly acceptable for us to teach our own mathematics and statistics
students the theory of generalised linear models and to examine them in it
(bearing in mind, of course, the necessity of being generous in marking) but
re-scaling marks using a logit transformation? Who on earth could have any
faith in something so complex? Who would understand it? How would you explain
it? My scheme was not implemented and we continued to adjust marks on the
ancient and trusted piecewise linear system.
It occurs to me, however,
that my little table might be useful after all: if not at examination boards
then to the biostatistical community. Most physicians, I am told, have the
greatest of difficulties in understanding odds ratios and seem to be as averse
to them as statisticians at an examination board. However, apparently nomograms
are very much appreciated in the same quarter. Indeed some who think physicians
will have difficulty understanding odds ratios have promoted quite complex
nomograms for sample size determination. So I have added on a little odds-ratio
coda to my table.
First catch your odds
ratio. Then once you have it, see how it transforms the mark of 50. Once you
have that, you can use the main table to see how it transforms any mark
whatsoever. Yes, folks, that's the magic of statistics for you.
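The coda comes down to two formulas. An odds ratio r takes a mark of 50 (odds of 1) to 100r/(1 + r), and it takes any mark m to the mark whose odds are r times m/(100 - m). Below is a minimal sketch with illustrative function names. The example of 50 becoming 58 corresponds to r = 58/42, roughly 1.38.

```python
def fifty_becomes(odds_ratio):
    """Where a mark of 50 (odds of 1) ends up after multiplying its odds by odds_ratio."""
    return 100.0 * odds_ratio / (1.0 + odds_ratio)


def apply_odds_ratio(mark, odds_ratio):
    """Multiply the odds mark / (100 - mark) by odds_ratio and convert back to a mark."""
    if not 0 <= mark <= 100:
        raise ValueError("marks must lie in 0-100")
    if mark in (0, 100):
        return float(mark)
    odds = odds_ratio * mark / (100.0 - mark)
    return 100.0 * odds / (1.0 + odds)


r = 58 / 42  # the odds ratio that turns 50 into 58
print(round(fifty_becomes(r), 1), round(apply_odds_ratio(35, r), 1))
```

This prints 58.0 and 42.6, so the odds ratio and the "what should 50 become?" column are two ways of naming the same transformation.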
Of course, anybody who
wishes to have a more extended version of the table is welcome to contact yours
truly who will supply it in return for a typically modest+ fee.
* Yes, in those dim and
distant days, students got grants.
+This being a highly
appropriate adjective for anything associated with GMcP
How wonderful it is that we
have such a high quality medical press. It cheered my heart recently to read
editorials in the Speculum and the Albion Physician's Enquirer
(APE) announcing that the editors of the world's leading medical journals are
banding together to put a stop to the evil machinations of the pharmaceutical
industry.
For example, it seems to be
a particularly nasty and growing habit of this industry to employ so-called Contract
Research Organisations for running and analysing trials: to use professionals
(I can scarcely bring myself to write this ugly word) when they could be
using amateurs such as academics. Yes, yes, instead of having audited data
trails, quality control, timely and high-quality data and so forth, we could go
back to the good old days of having mixtures of text and numeric data in the
same field, missing consent signatures, source data filed in the waste-paper
basket and all the other advantages of the academic approach to collecting
data. I am reminded of the wonderful scene in Chariots of Fire when the
masters of Trinity and Caius (played by Sir John Gielgud and Lindsay Anderson)
confront Harold Abrahams (Ben Cross) with his awful crime of employing a
professional coach. "Here at Cambridge we favour the approach of the amateur."
And who, when asked to compare the two forms of that game with two codes
(which the French call La Philosophie Ovale), would not have affirmed the
moral superiority of Union (as it was!) over League? Publicly paying working
men for time lost on the field? The very idea! Bungs under the table and
Masonic handshakes are the way that things should be done.
And think of the moral
dimension we shall gain. CROs are in it for the money. This makes them
inherently evil. Academics on the other hand, apart from a desire to publish as
much as possible, and thereby earn fame, promotion and a better living, have
nothing at heart but the good of mankind. Look at the history of the Nobel
Prize. Never in all the hundred years of its existence have there been any
attempts to influence judges, steal credit, re-write history, unfairly upstage
colleagues and so forth by any scientist anywhere in the world: truly Nobel
behaviour.
There is the further
advantage, the so-called Teflon factor. Mud does not stick in the
academic world. For example, if (as has regrettably often been the case)
academics have faked data, their co-authors are morally blameless. The
University involved will set up a commission, which will roundly condemn the
disgusting individual concerned. Particular stress will be laid on the awful
crime he or she has committed in bringing his or her senior colleagues into
disrepute by association when all they wanted to do was add another publication
to their CVs. The idea that these co-authors, let alone the university, still
less the whole academic community, should be tarnished by association is quite
frankly ludicrous. On the other hand, if one ambitious marketing person in one
company attempts to exert some undue influence on a publication, then the whole
of the evil and global (ugh!) pharmaceutical industry is guilty. This is
because these swine are all feeding at the same trough. Academics on the other
hand are ploughing a lonely furrow, following their divine inspiration wherever
it leads them (albeit frequently congregating on the author lists of
publications).
And there is yet another advantage.
For example, a fair way to compare standards in the pharmaceutical industry and
the medical press does not involve average quality. No, no! It is appropriate
to take the very best examples of mega-trials published in the literature (with
collaborative input from eminent statisticians) and compare these to the worst
examples of abuse in the industry. The bottom of the one distribution should be
compared with the top of the other. (It is an indictment of the total lack of
imagination of statisticians that this procedure is found in no standard
statistical textbook.)
And think of all the
creativity we can let loose as soon as we analyse things the way they do in the
Speculum and the APE. A favourite example of mine from the Speculum
some years ago illustrates the sort of innovation in analysis we could have.
Patients were treated for several months in a cross-over trial. For each
patient, for several outcomes a significance test was carried out using days
as independent replications to compare the two treatments. Patients were
treated for several months on each arm and in fact the investigators had values
of n₁ and n₂ in excess of 100 for each
patient for each outcome. The authors had significance pouring out of
their ears. (A slight criticism must be entered here. If only the authors had
used hours instead of days this procedure could have been made even more
efficient.) On the other hand, if we look at the miserable ICH E9 guidelines that
the pharmaceutical industry uses, the authors would have had to pre-specify the
analysis and would certainly not have been allowed to use the one that appeared
in the Speculum. They would not have been allowed to treat dependent
data as independent and would have to have dealt with the multiplicity problem.
Just imagine how much more rapidly medicine would progress if we could have
the editor of the Speculum run the MCA.
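Why is treating days as independent so productive of significance? Daily measurements on one patient are usually correlated, so 100 days carry far less information than 100 patients. The sketch below assumes, purely for illustration, that a patient's daily values follow a first-order autoregressive (AR(1)) process and that treatment has no effect whatsoever. The first 100 days are labelled one treatment and the next 100 the other, and a naive large-sample z-test is applied as if the days were independent. The AR(1) model and the function names are my assumptions, not the analysis used in the paper.

```python
import math
import random
import statistics


def ar1_series(n, rho, rng):
    """n consecutive daily values from a stationary AR(1) process with unit variance."""
    x = rng.gauss(0.0, 1.0)  # start in the stationary distribution
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    values = []
    for _ in range(n):
        values.append(x)
        x = rho * x + rng.gauss(0.0, innovation_sd)
    return values


def naive_p_value(a, b):
    """Two-sided P from a z-test that (wrongly) treats every day as an independent replicate."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(rho, days_per_arm=100, reps=1000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        days = ar1_series(2 * days_per_arm, rho, rng)
        # No treatment effect: the two "arms" are simply the first and second periods.
        hits += naive_p_value(days[:days_per_arm], days[days_per_arm:]) < alpha
    return hits / reps


for rho in (0.0, 0.9):
    print(f"day-to-day correlation {rho}: 'significant' in {false_positive_rate(rho):.0%} of trials")
```

With no day-to-day correlation, the naive test rejects about 5% of the time, as it should. With strong correlation, it finds a "difference" in most runs, although there is none to find.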
So I am looking forward to
this brave new and pure world. I am just hoping that that other most controlled
of all industries is sitting up and paying attention. The next time I fly to
Paris I don't want to hear "this is your captain speaking" over the
intercom. What I want to hear is "Hallo. This is Professor Norbert
Know-all, Daniel Bernoulli Professor of Aeronautics at the University of
Perfection-on-Smug. I shall be flying your plane tonight. You will be pleased
to know that together with my colleagues we have made a few improvements to the
design of the machine. The stewardesses, who are all sociology postgraduates,
will shortly be coming round with some questionnaires for you to fill out.
These when analysed, will appear in their PhD theses. Unfortunately this means
they will not have time to serve you any food or drinks and we apologise if our
pre-flight publicity has been misleading in this respect. We should be cleared
for take-off shortly and you will be pleased to know that the Hillingdon
Amateur Radio Association, who meet every Wednesday in the local scout hut, are
in charge of traffic control at Heathrow tonight."
STOP PRESS NEWS. Speculum
wins Nobel prize .... for fiction.
Invincible
Ignorance
Invincible
Ignorance. "A term in moral theology denoting ignorance of a kind which
cannot be removed by serious moral effort." The Concise Oxford
Dictionary of the Christian Church.
Company Regulatory Affairs
at Pannostrum Pharmaceuticals (CRAPP) do not usually feel the need to bring
statisticians with them when they go to see the Pangean Commission (which I
shall just refer to as the Commission from now on) at the Pangean Pharmaceutical
Evaluation Agency, (which I shall just refer to as "the Agency" from
now on). This is partly because the Agency (unlike another famous agency) does
not employ statisticians (of which more anon) but also because CRAPP have found
from bitter experience that statistical arguments are much easier to understand
if you don't have statisticians to explain them. For example, statisticians will
try and say things like, "a P-value of 0.04 does not mean that there are only
4 chances out of a 100 that the drug does not work," when every
non-statistician knows that it does. They will also baffle you by informing you
that you cannot assume that things are the same because they are not
significantly different. As regards the latter point, I once gave the head of
CRAPP this helpful analogy by way of explanation: a person who has been
acquitted of child-molesting is not automatically your first choice as
babysitter. She replied that a) her children were now grown-up, b) she had
always taken the greatest care in her choice of babysitter and c) if only I
knew how difficult it was to raise a family while pursuing a career I would not
make such hurtful remarks.
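For the benefit of the head of CRAPP, here is a minimal sketch of why the two statements differ. The probability that the drug does not work, given a significant result, depends on how plausible an effective drug was beforehand and on the power of the trial. The inputs below are illustrative assumptions: a 10% or 50% prior probability that the drug works, 80% power and significance at the 4% level. The calculation also conditions on P ≤ 0.04 rather than on P being exactly 0.04, which is a simplification.

```python
def prob_drug_useless_given_significant(prior_works, alpha, power):
    """Bayes' theorem: P(no effect | P-value <= alpha) in a simple two-hypothesis set-up."""
    false_alarms = alpha * (1.0 - prior_works)
    true_hits = power * prior_works
    return false_alarms / (false_alarms + true_hits)


for prior in (0.1, 0.5):
    risk = prob_drug_useless_given_significant(prior_works=prior, alpha=0.04, power=0.8)
    print(f"prior {prior:.0%} that it works: chance the drug does not work = {risk:.1%}")
```

Under these assumptions the answers are about 31% and about 5%. Neither is 4%, and neither could be obtained from the P-value alone.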
However, in some fit of
madness, CRAPP decided to take me along to a recent hearing at the Agency for
one of our products, Sniffgon, which we wished to extend to use in SARN
(Seasonal Allergic Runny Nose). The Agency, of course, does not employ
statisticians (of which more anon) but some of the Pangean member states do and
in any case it is of course the member states, or their representatives, via
the Commission who decide on the fate of submissions. As regards statisticians,
Hyperborea has several, Teutonea has one or two and Calcamalbion now has three
or four. Admittedly, of the Tethic states only Aegea has one but all in all
there must be at least a dozen statisticians working for Pangean agencies,
which is an average of nearly one per member state and therefore clearly
adequate. You might think, therefore, that at a meeting of the Commission you
might find a statistician or two. You'll be lucky! Each member state sends two
representatives and with only two to send, they are unlikely to send a
statistician, assuming they have one. Besides which, the Agency doesn't employ
statisticians (of which more anon), so why should the Commission send any?
This time our luck was out.
On entering, I scanned the room carefully. My heart sank. No statisticians that
I could recognise were to be seen.
This did not, however, stop
one of the assessors from launching into a long 'statistical' criticism. In one
of our trials it seems that the allergic status of patients had been
established two weeks before the start of the trial. This meant, the assessor
explained, that since the status of some patients may improve from time to
time, the trial could have consisted of a mixture of responsive and
non-responsive patients. This in turn meant that the groups could now differ
randomly and the test of significance would be invalid since the results might
be due to "random chance" and this could be the explanation of the
observed value of P=0.0003.
I explained as tactfully as
possible (tact being one of my natural virtues, along with modesty) that the
whole point of a significance test was to explain to what extent the results
could be explained by "random chance." In this respect the results
would be no more or less valid from this trial than from any other. Indeed
there were probably dozens, possibly hundreds, maybe even thousands and perhaps
as many as 30,000 different ways patients in any trial could differ from group
to group through genetics alone, if the biologists were to be believed*. The whole point of randomisation,
I continued, was to make sure that there were only two possible explanations of
any result: "random chance" or an effect of treatment. Of course, I
carried on, where covariates had been measured, and they were believed to be
predictive of outcome, they could be used to refine one's opinion as to the
extent to which "random chance" was the explanation. However, the whole
point of randomisation was that it permitted one to apply the property of the
average to the individual case to the extent that the individual case could not
be recognised as differing from the average. In the same way, insurance
companies could validly set premiums for individuals using the experience of
populations, provided that the individuals could not recognise that their risk
differed from the population average.
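The point about randomisation can be checked by simulation. This minimal sketch assumes a hypothetical population in which half the patients are responsive (mean outcome 3) and half are not (mean 0). It gives the treatment no effect at all, allocates patients at random and applies a large-sample two-sample z-test. The mixture means and function names are illustrative. From trial to trial, the arms differ by chance in their mix of responders, yet the test rejects at close to the nominal 5% rate, because that chance imbalance is exactly what the test allows for.

```python
import math
import random
import statistics


def one_null_trial(n_per_arm, rng):
    """Two-sided P-value for one trial in which treatment does nothing."""
    # Hypothetical heterogeneous population: half responsive (mean 3), half not (mean 0).
    patients = [(3.0 if rng.random() < 0.5 else 0.0) + rng.gauss(0.0, 1.0)
                for _ in range(2 * n_per_arm)]
    rng.shuffle(patients)  # random allocation to the two arms
    a, b = patients[:n_per_arm], patients[n_per_arm:]
    se = math.sqrt(statistics.variance(a) / n_per_arm + statistics.variance(b) / n_per_arm)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(reps=2000, n_per_arm=100, alpha=0.05, seed=7):
    rng = random.Random(seed)
    return sum(one_null_trial(n_per_arm, rng) < alpha for _ in range(reps)) / reps


print(f"Rejection rate with no treatment effect: {rejection_rate():.3f}")
```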
I was rather enjoying the
debate at this point, seeing it as providing me with a golden opportunity of
proselytising heathens in a place in which statisticians are famously absent
(of which more anon). My argument, however, was surprisingly badly received. It
seems that the assessor preferred to think that the reason that I was disagreeing
was not that the subject lay within my competence rather than the assessor's, but simply that I was clutching at
straws in a desperate attempt to defend a dossier whose fatal flaw had now been
exposed. It might be the case that five out of our six trials were significant.
However, only two were pivotal Phase III trials, and one of these had now been
exposed as having results that might be due to "random chance", which
left us with only one pivotal significant trial, which was not enough. I looked
around the room to see whether this nonsense was producing the same effect on
the rest of the assessors that it was on me. However, there were of course no
statisticians present, the Commission taking its cue from the Agency, which
does not employ any (of which more anon).
I was about to reply again,
when a sharp kick on the shins from the head of CRAPP warned me that it had
been decided to beat a hasty retreat. We then thanked the Agency for their
trenchant criticisms, made several placatory noises and left.
On the way back to the
office I expressed my dissatisfaction with the whole process to the head of
CRAPP. Why, I said, (this is now the "more anon") since the whole
thing was run by amateurs who clearly had no knowledge of clinical trials, did they
not employ a statistician as a sort of "clerk of the court" who could
give a technical opinion where technical advice was needed? She explained to me
that the Agency was not involved in evaluation and therefore did not need to
employ statisticians. I remarked that this was a brilliant principle. Could she
tell me who, apart from statisticians, were usually involved in assessing
regulatory dossiers? Why, she replied, I must know that physicians,
pharmacists, pharmacokineticists and so forth were all involved in this
process. In that case, I replied, since the Agency was not involved in
evaluation, I assumed it took care to employ none of the above. Don't be
ridiculous, she replied, how could the Agency perform its job unless it employed
such people and had it ever occurred to me that I had a very odd way of looking
at things and a most perverse manner of expressing them?
I am sorry, I replied
tactfully, it's just that I have this prejudice that when it comes to logic,
people who can't count don't count.
* That is assuming that genes act singly and we don't need to worry about
interactions and that a group of scientists who recently thought that there
were 100,000 genes can be trusted when they now say that there are 30,000.
Next issue. The role of Astrology in
evaluating medicines: Cancer and oncology, Gemini and infertility, Libra and
vertigo, Aquarius and urology, Virgo and impotence...Taurus and the Agency.
An anniversary is a time to
look back but also a time to look forward; and what, you are doubtless itching
to know, does GMcP see when looking forward? Well here is a riddle for you.
"What do lap-dancers and GMcP have in common?" Answer: they both see
a lot of silicone looking forward. Yes, folks, the future of PSI is silicone.
Now I know what you're
thinking: Guernsey, with his penchant for terrible puns and sarcasm (or should
that be Sarkasm), has decided to talk about the breast-implants story and its
implications for the pharmaceutical industry and we are now in for some
terrible and tasteless puns: implants go bust, storm in a D-cup, thanks for the
mammaries, and so forth. Rest assured; this article is not about that and I
shan't use any of those terrible puns: delicacy forbids. This is not to say
that this is not an important topic. Nobody is immune from the cupidity of
lawyers and the stupidity of juries. "Twenty years ago my client had
breast implants, five years ago she developed connective tissue disease. The
one clearly caused the other. We now need $10m in compensation." More than
half of this, of course, will go to the shysters. Clearly if that can work for
implants why not for pharmaceuticals? We should all tremble for our pensions.
(Or marry a lawyer as an insurance policy.) Well, I may cover this topic at a
future date (I find myself curiously attracted to it), but instead I am going
to talk about that other use of silicone: to make chips for computers. Yes,
GMcP is keeping abreast of all developments. I am going to talk about
simulation.
Simulation, virtual drug
development, in silico development: these are all crucially vital topics of the
day. We are in for a wonderful new era in drug development. This was made clear
to me the other day. I had an invitation from the Pannostrum marketing
department to join them to hear a presentation on the topic by a management
consultant. It was amazing stuff. He started out with PK data from eight
patients, fed in a possible PD model, pressed a button on his laptop and before
you could say "modern miracle" had therapeutic outcomes for 200 patients.
"So," he said, "we can now feel a lot more comfortable and
confident about the effects of the drug." It is no exaggeration to say
that all present were absolutely staggered.
Admittedly our reasons for
amazement were somewhat different. The marketing men were amazed that
information about the effects of drugs could be had so easily. I was amazed
that marketing men could be had so easily. This, despite years of having had to
put up with the nonsense they produce. I thought the time was ripe for a little
question. "Two hundred patients is rather few for a Phase III study",
I said, "would it be possible to simulate data for 1000?" The speaker
shot me a look that would have curdled milk. I could see he had noticed me for
the first time. I was rumbled: battered tweed jacket, tie at half-mast, very
old-fashioned teeth. It was quite clear that I was quite a different
proposition to the Armani suits filling the rest of the room. He'd sussed me
for a statistician. "Well, of course", he added, "simulation could
never replace a Phase III programme. However, it can make us more confident
about what we will see in a Phase III programme." The Armani suits all
nodded wisely in agreement.
Actually, I nearly agreed
with him. Why? Well because simulation is just mathematics by other means and
of course mathematics, or at least Statistics, is the way that we calculate the
implications of the work that we have done. Simulation is just a means of doing
convolutions, which is to say of performing the numerical integrations that are
necessary to mix one distribution conditional on another. Why, I have used
simulations myself to check theoretical results and the two have always agreed
most admirably, thus proving that either my theory was correct or that I had
made the same mistake in my simulation as my theory. And, of course, we even
simulate to do calculations these days. The robustniks bootstrap everything, we
use multiple imputation for missing data, and the way that Bayesians use
simulation to do their calculations bugs everybody. No, it's not using
simulation to do integration that I object to (that is done all the time); it
is using simulation for multiplication that I find objectionable. Eight
patients are eight patients and so should remain.
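The distinction can be made concrete. In this minimal sketch the model and parameter values are hypothetical. Simulation is first used for integration: drawing θ from N(0, τ²), then Y given θ from N(θ, σ²), recovers the marginal variance τ² + σ², as theory says it should. It is then used for "multiplication". Eight stand-in patients (generated at random here, not real data) are used to fit a normal model, 200 "patients" are simulated from it, and the standard error computed from the 200 is compared with the honest one from the 8.

```python
import math
import random


def mean_and_variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def marginal_variance_by_simulation(tau, sigma, draws, seed=0):
    """Integration: theta ~ N(0, tau^2), Y | theta ~ N(theta, sigma^2); exactly, Var(Y) = tau^2 + sigma^2."""
    rng = random.Random(seed)
    ys = [rng.gauss(rng.gauss(0.0, tau), sigma) for _ in range(draws)]
    return mean_and_variance(ys)[1]


def honest_and_inflated_se(n_real=8, n_fake=200, seed=0):
    """Multiplication: fit a normal model to n_real values, simulate n_fake, compare SEs of the mean."""
    rng = random.Random(seed)
    real = [rng.gauss(10.0, 2.0) for _ in range(n_real)]  # random stand-ins, not real patients
    m, v = mean_and_variance(real)
    fake = [rng.gauss(m, math.sqrt(v)) for _ in range(n_fake)]
    _, v_fake = mean_and_variance(fake)
    return math.sqrt(v / n_real), math.sqrt(v_fake / n_fake)


print(marginal_variance_by_simulation(tau=2.0, sigma=1.0, draws=200_000))  # close to 5
honest, inflated = honest_and_inflated_se()
print(f"SE from 8 patients: {honest:.3f}; SE claimed from 200 simulated: {inflated:.3f}")
```

The second standard error comes out at roughly a fifth of the first. That is precisely the extra confidence on offer, yet the information is still that of eight patients.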
However, if you can't beat
them, join them. And that is my advice to you all. We are all going to have to
learn a lot more about in silico development. At least I know that this is true
for Pannostrum Pharmaceuticals. The Marketing Department has a lot more
influence than Biostatistics and we had some wonderful memo from on high only
the other day about streamlining development, faster time to market, proof of
concept blah, blah, blah, virtual drug development, simulation. Did I forget
anything? Oh yes, pharmacogenomics and theranostics. The share price reacted
very favourably. So I have enrolled on a course that is now being run within
the company by the same group that provided the management consultant.
This led to a strange and
disturbing conversation in the GMcP household and I should be grateful if any
member of PSI, perhaps some female member with a better understanding of the
thought processes of the fair sex, could explain it to me.
GMcP. Pannostrum are paying
for me to attend a course to learn about simulation.
Mrs McP. People pay to
learn about that? I have been doing it for years.
In the year 3535,
Ain't gonna need to tell the truth, tell no lies;
Everything you think, do, or say
Is in the pill you took today.
Zager and Evans
PSI annual conference is a
great event, to be sure, but somewhat depressing for us wrinklies. Everybody
else looks so young. There is scarcely anybody there who could be expected to
remember that in August 1969 the song, 'In the Year 2525,' by one-blockbuster-wonders
Zager and Evans reached number one in the UK charts, thus repeating the trick
it had pulled off in the previous month in the US. Perhaps, at the next annual
meeting, rather than dancing to 'YMCA' for the third time of the evening, we
could have a go at Z&E.
Now, to have a top-seller
both sides of the Atlantic is something of which our marketing men are always
dreaming, but it has to be admitted that even by the recent rather torpid
standards of Pannostrum Pharmaceuticals, a launch date of 3535 is rather tardy
and not particularly impressive to the investors. Our management is becoming
increasingly anxious about performance. New ideas are what we need, apparently,
at least so we were told in a memo recently from our CEO, Sir Lancelot Pastit,
informing us that he had detailed his 'Millennium Task-Force' (MTF) to come up
with some. Call me cynical and old-fashioned, but I thought we needed new drugs
not ideas. The MTF is peopled by re-cycled marketing men, and whereas I have
never denied their ability to secure old drugs and thereby benefit the
Colombian economy, I have found them about as much use as a laptop in a kayak
when it comes to developing new ones.
In the last GMcP I told you
of my unfortunate involvement with the simulation-sellers. You would have
thought that my management would have learned from experience, but blow me (to
use a rather old-fashioned phrase) if I didn't find myself signed up for the
theranostics presentation organized by the MTF. "Theranostics?", I
hear you cry. "What can that be?" Well, let me say it could just as
well have been called 'diapeutics'. It is the synergistic fusion of diagnostics
and therapeutics to deliver personalized targeted medicine based on individual
characteristics. Or, so it was explained to me.
Wonderful, I thought. The
PK people will love this. At long last marketing is going to have to cave in
and allow us to dose drugs by bodyweight which is what we have been trying to
persuade them to do for years. Everybody knows that with our current
one-size-fits-all mentality the rugby front row don't get the full effect while
the little grannies are likely to suffer overdoses. Dosing by bodyweight is
logical and scientific.
No, I was told. It is far
too difficult to sell drugs by bodyweight. It is too complicated for the
prescribing physician and in any case the opposition with their one-dose,
once-a-day alternatives would kill us dead.
"Let me get this
straight", I said. "You will give patients a different pill based on
their individual genetic codes but you are not prepared to give a different
dose based on body-weight. You will diagnose people with gene chips but not
with bathroom scales?"
"Yes, we must exploit
the exciting promise of pharmacogenomics," came the reply, "and
Pannostrum will be at the forefront of the new theranostic technology."
Now, if I look at the
history of medicine, it seems to me that it has not been the diagnostics that
have been lacking but the therapeutics. We could diagnose diabetes more than
two millennia before we could treat it. My humble opinion is that we had better
spend our time rather urgently at Pannostrum finding some new drugs that
actually worked, rather than worrying about establishing who they worked best
for. Simultaneously finding wonderful drugs and perfect patients wasn't going
to make life much easier for us. (And don't think I don't realize, as a
statistician, that we have to think factorially. It is not main effects we
would be looking for but interactions, and in any case you can't allocate
patients their genes. These are blocks not treatments.)
However, I am nothing if
not positive and helpful and volunteered the following wonderful creative
insight to the MTF. If Sir Lancelot doesn't nominate me for a special bonus
this year, he is not the man I take him for.
This is my idea. There is
an extremely important division of human beings on the basis of genetics, which
has the potential to make a considerable difference, if not to the actual
treatment that should be given, then to the optimal dose. Furthermore, this genetic
division forms two subtypes distributed with almost equal frequency within the
human population, which is the optimal situation if you wish to tailor-make
therapy. (After all, if there are more than two subtypes it gets complicated
and if one is very rare, it is hardly worth bothering about.) This is a major
genetic difference, which has its origin in a massive chromosomal deficiency in
one subtype and which leads to a considerable difference in phenotype. The
medical importance of this can be judged by the fact that life expectancy at
birth in the deficient subtype is several years less than in the normal form.
Yet currently, for the vast majority of treatments, we take no account of this
genetic phenomenon in prescribing drugs. The only bad news is that this genetic
difference is rather easy to detect, so that we will have some problems
patenting the diagnostic part of the therapeutic strategy. On the other hand we
shall be well ahead in the therapeutic race if we start organizing our trials
to look at these differences.
I think I had the MTF quite
excited. What are these genetic subtypes?
"Oh that's
simple", I said. "They are called men and women."
Dredging for P
I write this at the end of
another festive season. So cheers! Is your glass half empty or half full? The
question is not without relevance but I must remind you that patience is a
virtue, Rome was not built in a day, make haste slowly and short cuts make long
delays, as all of us who work in drug development surely know. You will have to
wait for the relevance of this to be revealed.
Your columnist is
occasionally called upon to lecture. This is not, of course, because he is
believed to have anything interesting to say. I suspect the phenomenon is
similar to that which affects the pop charts these days. In addition to seeing
them occupied by the fit, young and beautiful, the occasional haggard has-been
can be seen on television crooning (or croaking) some duet with some nubile
young thing. Why people like this, I don't know. It may be nostalgia or simply
charity. Sometimes I feel that I am being wheeled out (not quite literally yet)
as a warning to the youngsters. 'Look what happens to you if you don't make
that career move into project management or marketing - you turn into a boring
old statistician.'
Occasionally, double-acts
have their unintended opportunities for humour - at least of the debased sort
that makes GMcP laugh. A couple of years back, when contributing to a series of
statistical lectures being given to a medical audience, I was rather depressed
at the prospect of having to follow a lissom young lady. She was getting a
terrific reaction. All my subsequent appearance was going to do was produce a
tidal wave of disappointment from the male half of the audience without any
compensatory ripple of appreciation from the ladies.
However, serendipity is
the greatest factor in any success, as anybody who has made any survey of the
performances and abilities of pharmaceutical CEOs must surely conclude. I was
offered an unexpected gift. The young lady explained that the concept of the 5%
significance level, like all great ideas, occurred in the bath. Before Fisher
and Yates, tables were tabulated in terms of the statistic rather than the
significance level. It was for copyright reasons that Fisher decided on reverse
tabulation (like anti-logs), using significance levels rather than values of
the statistic. But what level to use? This is where the bath comes in. Fisher
spotted his five toes - et voila - 5%.
This, of course, was a
godsend to yours truly, who opened his lecture by explaining that he was
tempted to provide the missing explanation as to how Fisher, in his bath, came
upon the idea of the 1% level of significance, but that good taste had
prevailed. From that moment onwards, the success of the lecture was assured.
And this story brings me to the point, so to speak, of this piece. In fact, the
purpose of this rambling is precisely to discuss significance levels and also
P-values.
Now, I don't know how
things are where you work, but here at Pannostrum Pharmaceuticals we
occasionally get disappointing results. In fact, we often get poor results, but
since experience has taught us to expect them, we are only occasionally
disappointed. By a poor result, of course, what I mean is P > 0.05. However,
to explain what our medical advisors do with such results I am going to have to
make yet another diversion, this time to recount an old joke by Bennett Cerf
(slightly adapted) on the two brothers: a pessimist (surely a statistician) and
an optimist (who must have been a medical advisor). On opening their Christmas
presents, the statistician, a great connoisseur of malt whisky, is glum to find
a crate of rare cask-strength Glen Kinchie in numbered bottles. 'What a
terrible hangover I shall have', he remarks. The medic, equally a fan of malt
whisky, receives a bottle of Irn Bru, famed throughout Caledonia as a cure for
hangovers. 'Goody, goody. Somebody has bought me a crate of malt.'
So what do we do at
Pannostrum? We have two standard devices. The first is, 'the one that got
away'. Its presence is marked by statements of this sort, 'unfortunately the
result was not quite significant (P = 0.11) due to the trial being too small'.
This is what we employ medical advisors for, of course, to know that the reason
the result is not significant is not because the drug doesn't work but simply
because the trial isn't large enough. Who needs patients? A medical advisor is
worth a hundred of them: in fact, worth more than a hundred patients because
with the patients you never know what the results might show. Before you sneer,
however, ask yourself if you have ever fallen into this trap: you have a
meta-analysis of results to date which show significance and confidently expect
that when the next trial comes in the overall analysis will be even more
significant.
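The trap can be made quantitative. What follows is a minimal sketch under a deliberately generous, hypothetical assumption: a fixed-effect meta-analysis whose true effect is exactly the current pooled estimate. Here `r` is the information (precision) of the next trial relative to all the trials to date, and the function names and the pooled z of 2.5 are invented for illustration. Even under this rosy assumption, the next trial can quite easily leave the pooled result less significant than before.

```python
import random
from statistics import NormalDist


def prob_less_significant(z_current: float, r: float) -> float:
    """Chance that adding one more trial lowers the pooled z-statistic of a
    fixed-effect meta-analysis (i.e. raises its one-sided P-value).

    Assumes z_current > 0 and that the true effect equals the current pooled
    estimate. r is the precision of the next trial divided by the combined
    precision of the trials to date.
    """
    # New pooled z = (w1*b1 + w2*b2) / sqrt(w1 + w2), with b2 ~ N(b1, 1/w2).
    # Standardising the event "new z < z_current" gives this closed form.
    arg = z_current * ((1 + r) ** 0.5 - 1 - r) / r ** 0.5
    return NormalDist().cdf(arg)


def simulate_less_significant(z_current: float, r: float,
                              n_sims: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo check of prob_less_significant under the same assumptions."""
    rng = random.Random(seed)
    w1, w2 = 1.0, r              # only the ratio of precisions matters
    b1 = z_current / w1 ** 0.5   # current pooled estimate
    hits = 0
    for _ in range(n_sims):
        b2 = rng.gauss(b1, (1 / w2) ** 0.5)            # next trial's estimate
        z_new = (w1 * b1 + w2 * b2) / (w1 + w2) ** 0.5
        hits += z_new < z_current
    return hits / n_sims


if __name__ == "__main__":
    for r in (0.1, 0.25, 1.0):
        print(r, round(prob_less_significant(2.5, r), 3),
              round(simulate_less_significant(2.5, r), 3))
```

With a pooled z of 2.5 and a next trial carrying a quarter of the information so far, the chance of the pooled result becoming less significant comes out at roughly one in four, and it drifts towards one half as the new trial gets relatively smaller. Because the meta-analyses we get excited about are apt to be those whose estimates came out on the high side, reality is likely to be harsher still.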
The second is, 'the cheque
is in the post'. For this we have statements of the form, 'the result showed a
trend towards significance, p=0.09'. This is really rather similar to the previous
one. A P-value, despite the name, is a position not a motion. Is the glass half
empty or half full? 'The wind bloweth where it listeth, and thou hearest the
sound thereof, but canst not tell whence it cometh, and whither it goeth'. But
as far as our medical advisors are concerned, P-values are a northerly gale and
the direction they are heading is 'down'.
I upset a medical advisor
the other day by slipping in, just for a joke, the following statement. 'The
result should be treated with caution, as it shows a trend towards
non-significance, p=0.03'. 'Why did you do this, Guernsey?' she asked.
'Symmetry and logic,' I replied. 'If 0.05 is a magic boundary and results such
as 0.09 are trending to cross it into significance, then to keep the
thermodynamic balance the results below it must be trending into
non-significance. By the way,' I added, 'why is it that when we have P=0.09 for
a side-effect, we never describe it as "trending towards
significance"?' 'You have a very perverse and unhelpful way of looking
at things,' was the reply. I had failed to appreciate that medical advisors are
to P-values what Maxwell's demon is to thermodynamics
(see http://www.maxwellian.demon.co.uk/name.html
for an explanation of the latter). They selectively open the little gate of
significance to let the important results through.
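Guernsey's symmetry argument has a serious core: a P-value says where a result is, not where it is going. One standard way to see this is to ask how often an identical repeat trial would reach two-sided P < 0.05. The sketch below uses a normal approximation and assumes, generously, that the true effect is exactly the one observed. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_repeat_significant(p_two_sided: float, alpha: float = 0.05) -> float:
    """Probability that an identical repeat trial achieves two-sided P < alpha
    in the same direction, assuming the true effect equals the observed one."""
    z_obs = STD_NORMAL.inv_cdf(1 - p_two_sided / 2)   # observed z-statistic
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    # Under the assumption, the repeat trial's z-statistic is N(z_obs, 1).
    return 1 - STD_NORMAL.cdf(z_crit - z_obs)


if __name__ == "__main__":
    for p in (0.09, 0.05, 0.03):
        print(p, round(prob_repeat_significant(p), 2))
```

Under these assumptions a result with P = 0.09 is followed by a significant repeat only about 40% of the time, while a result with P = 0.03 fails to repeat its significance about 42% of the time. The 'trend towards non-significance' is no less real than the trend towards significance.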
Is the glass half empty or
half full? I find it difficult to say. If the wine is being poured into it you
might have a case for saying it is half full. When the wine is being drunk, it
is clearly half empty. Most of my glasses are (very briefly) half empty. Which
reminds me. I got a rather fine bottle of The Glenlivet for Christmas.
Are any of you out there working on a cure for hangovers?
"You cannot hope
to bribe or twist
thank God! the
British Journalist
But, seeing what
the man will do
unbribed, there's
no occasion to"
Humbert Wolfe
Benjamin
Franklin once said, "But in this world nothing can be said to be certain, except
death and taxes" and somewhat later, John Maynard Keynes remarked that
"in the long run we are all dead". These sayings, you may think, are
examples of that phenomenon whereby if you are famous or infamous enough the
blindingly obvious or mundane is put down as a witty or profound remark and
preserved for posterity. "I think I could eat one of Bellamy's veal
pies", Pitt; "we had better wait and see", Asquith; "This
will never do", Francis Lord Jeffrey and so forth. However, I am going to
demonstrate to you that far from its being accepted as obvious, many of our
fellow citizens, that is to say the vast majority who suffer from the grave
disability of not being statisticians, deny this truth. I am referring here to
the death bit, not the taxes. Whenever I view my tax bill, my bile rises and I
come to the jaundiced conclusion (a case of hepato-fiscal-toxicity) that
Benjamin Franklin was not right after all and that whole swathes of society are
not paying taxes at all: not only living at the expense of others but dying at
their expense too. However, I mustn't get started on the subject of taxes. Let
me get back to death.
A
consequence of the Keynes dictum is that if you study enough patients long
enough some of them will die almost surely, to use probabilistic jargon.
A corollary is that just because some patients have died who took a drug, it
doesn't mean that it's the drug what did it. However, try telling that to the
journalists of London's famous street magazine, CAPITAL LETTERS, which
only a few years back ran a story under the heading "STIFF STIFFS" as
follows. "They thought that Fullagro was a wonder cure for impotence but
now only a year after launch, 31 users are dead. Who is to blame?" Who
indeed? I blame it on the education system in this country, which does not
stress clearly enough the importance of thought. I wrote in protest to CAPITAL
LETTERS pointing out that several former readers of that journal must now
be dead and asking if they would be withdrawing the magazine pending an
investigation. What was missing, if we were to make any sense of these figures,
was an estimate of exposure. I got a reply from the journalist concerned saying
there was no need to be patronising and he was well aware of the argument that
so many thousands died every year who had drunk cups of tea. To which I would
say, being aware is one thing and drawing the consequences is another.
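The missing 'estimate of exposure' is what turns a body count into evidence. Here is a minimal sketch, assuming deaths occur as a Poisson process at a known background rate. The expected number of deaths is person-years of exposure times the background rate. The chance of at least one death is 1 - exp(-expected), which tends to 1 as exposure grows (Keynes's dictum again). The observed count is then judged against the expected one. The 31 deaths come from the story, but the exposure and background rate below are invented purely for illustration, as are the function names.

```python
import math


def expected_deaths(person_years: float, rate_per_1000_py: float) -> float:
    """Deaths expected from background mortality alone."""
    return person_years * rate_per_1000_py / 1000


def prob_at_least_one(expected: float) -> float:
    """P(at least one death) for a Poisson count with the given mean."""
    return 1 - math.exp(-expected)


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), expected > 0.
    Each term is computed in log space to avoid underflow for large means."""
    lower = sum(
        math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - lower)


if __name__ == "__main__":
    # Hypothetical exposure: 100,000 users followed for one year each, with a
    # made-up background mortality of 5 deaths per 1,000 person-years.
    exp_deaths = expected_deaths(100_000, 5)
    observed = 31
    print("expected deaths:", exp_deaths)
    print("observed/expected ratio:", observed / exp_deaths)
    print("P(at least one death):", prob_at_least_one(exp_deaths))
    print("P(>= 31 deaths by chance):", poisson_upper_tail(observed, exp_deaths))
```

With these made-up figures the 31 deaths would be far fewer than the 500 expected. With a small enough exposure the same 31 deaths might be too many. Without the exposure, the count alone says nothing either way.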
In fact,
there have now been a number of formal studies of Fullagro and they have come
to the rather baffling conclusion that there are fewer deaths than expected.
This, of course, is a challenge to the marketing department of the company
concerned, as it is a reversal of the usual state of affairs. Quality of life
is generally seen as the last ditch face-saver you resort to when you have not
managed to improve survival after all. Here we had a drug that was designed to
boost quality of life but which is actually causing men to live longer. Perhaps
it is giving them something to live for.
To return
to Keynes's dictum, however, it would be nice to think that it is only those
who work outside the pharmaceutical industry who cannot see the implications.
Unfortunately, this is to reckon without the marketing department of Pannostrum
Pharmaceuticals, a group of individuals who seem to have been born without the
organ of logic. A few years back we registered a treatment in asthma which now,
courtesy of all their creativity, is sold under the brand name Zeffer.
"Zeffer: it's a breath of fresh air."; "Get your second wind
with Zeffer"; "Zeffer: a blow against asthma," and so forth.
This is not the point of my story. The point is that as the drug was launched I
had to sign off on a huge uncontrolled (ugh!!!) phase IV study. "What is
the point of this piece of nonsense?" I enquired diplomatically. "It
is to give a number of key physicians vital experience with this product,"
they replied disingenuously. "Have we got our story ready for the deaths
that will occur?" I asked. "There you go again. Pessimistic as
usual. You know that the phase III trials showed no excess risk."
"That's not the point," I replied. "Zeffer is not an elixir of
life". I did not convince them. The trial went ahead. It turned out that,
just like the rest of mankind, patients on Zeffer were not immortal. My
prediction was proved correct. The deaths arrived and we were left scrabbling
to explain to the health authorities that the rate was no higher than expected.
But are
we statisticians guiltless of failing to think this issue through? I think not.
Just because drugs can't guarantee immortality doesn't mean that any safety
record at all is acceptable. We need to be studying the problem actively. Do
you think that the way we summarise safety data-bases is useful: all tables and
no analysis? Medium and long-term trials flung together? Controlled studies of
different durations pooled with their open follow-ups? Is that what we got our
qualifications in statistics to do?
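Flinging studies of different durations together can do more than blur the picture: it can reverse it. The sketch below uses two invented trials, with every number hypothetical. One is a short trial randomised 3:1 to drug, and the other a long trial randomised 1:3, where risks are identical in both arms within each trial and higher in the long trial because events accumulate with time. Crude pooling makes the drug look protective. A stratified estimate keeps each comparison within its own trial and finds no difference. The Mantel-Haenszel risk ratio is used here as one standard choice among several, not necessarily the analysis Guernsey had in mind.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Event counts and patient numbers by arm for one (hypothetical) trial."""
    drug_events: int
    drug_n: int
    placebo_events: int
    placebo_n: int


def crude_risk_ratio(trials: list[Trial]) -> float:
    """Risk ratio after simply adding up events and patients across trials."""
    drug_risk = sum(t.drug_events for t in trials) / sum(t.drug_n for t in trials)
    placebo_risk = (sum(t.placebo_events for t in trials)
                    / sum(t.placebo_n for t in trials))
    return drug_risk / placebo_risk


def mantel_haenszel_risk_ratio(trials: list[Trial]) -> float:
    """Mantel-Haenszel risk ratio, which compares arms only within each trial."""
    num = den = 0.0
    for t in trials:
        total = t.drug_n + t.placebo_n
        num += t.drug_events * t.placebo_n / total
        den += t.placebo_events * t.drug_n / total
    return num / den


if __name__ == "__main__":
    short_trial = Trial(6, 300, 2, 100)    # 2% risk in both arms, 3:1 allocation
    long_trial = Trial(10, 100, 30, 300)   # 10% risk in both arms, 1:3 allocation
    pooled = [short_trial, long_trial]
    print("crude RR:", crude_risk_ratio(pooled))                      # drug looks protective
    print("Mantel-Haenszel RR:", mantel_haenszel_risk_ratio(pooled))  # no difference
```

The crude pooled table compares mostly short-term drug patients with mostly long-term placebo patients. The stratified estimate refuses to make that comparison.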
There is
a bafflingly illogical point of view, to which I have sometimes heard even
statisticians subscribe, that since safety data are rarely in the form of a
targeted single variable for a controlled clinical trial, they are therefore
beyond statistical analysis, as if statistical sophistication were necessary to
interpret simple matters but de trop for complex issues. Au contraire.
Jimmy Savage once said that a statistical model should be as big as an
elephant. Guernsey McPearson once said that any damn fool can analyse a
randomised clinical trial and frequently does. When the situation is complex,
formal analysis is precisely what is needed: put down the pea-shooter and get
out the elephant gun. So here is my advice to any head of statistics who hears
any of his statisticians defend the point of view that because the data are not
in the form produced by a planned RCT, they are only suitable for tabulation
and not statistical analysis.
Send them
to work for marketing.
Or better
still sack them. Let's have them on the street selling CAPITAL LETTERS.
See SPIN passim, in
particular "Guernsey McPearson" (1999) Hard Times and Stiff
Competition, SPIN, March 1999, 9.