Well, it’s
been far too long since I’ve posted anything here, but after a bit of an
absence I’m back again and ready to get spreading the word of endometriosis
research. Firstly a quick note on where I’ve been. I finally finished my PhD
(more on that in the future), which required a year or so of pretty hard concentration
in the final race to the finish. This meant I didn’t have as much time to write
all the other things I like, such as this blog! But I didn’t forget about it
and have been constantly reading all the latest endo research and occasionally updating
the @EndoUpdate Twitter account ready for my return here.
For this
blog post I’d like to talk about something that is not directly related to endometriosis, but is still very important,
which I think everyone needs to know a bit about, and that is the criticism of
evidence. We use our judgement of evidence in our everyday lives all the time,
it’s what helps us survive, and stops us getting all our money stolen by an
email claiming to be from a Nigerian prince looking to offer us millions in
exchange for all our bank details. For some things the definitive proof of
whether something is true or not can be fairly straightforward, or it can be
hard to ascertain, but by employing well tested methods of evidence analysis,
we can arrive at a more accurate conclusion.
Let’s
suppose you meet someone who claims to be able to turn iron into gold, and when
you challenge them to do it, they can’t, then that’s pretty good evidence they
weren’t telling the truth. But what if they did do it, what if the person waved
their hands and instead of a lump of grey metal there was now a shimmering
piece of gold? Would you automatically assume they possessed genuine
transmutative abilities? Probably not, because the way to the truth and the way
in which we analyse evidence is a complex deductive process that can require a
great deal of testing to arrive at an acceptable level of certainty that what
we have found is correct.
Let’s go
back to the previous example, how would you test that the man who appeared to
turn iron into gold was actually doing so? More than likely you would suspect
he was performing a magic trick, so you might suggest replacing the initial
piece of iron with your own piece, or one supplied by an independent person.
You might suggest moving the activity to a controlled environment, where
multiple people can observe from different positions. You might think of ways to
test the supposed ‘gold’ afterwards to check that the outcome was genuine, or
suggest filming the trick with a slow motion camera. Either way you would come up with
a list of ways to test the idea that this person can turn iron into gold and
each way would give you a piece of evidence that would lead to an overall
conclusion. The trick is judging what quality of evidence each of those tests
would provide. In the biological and medical sciences the assessment of
evidence quality is extremely important, because in many cases there are
people’s lives resting on the outcome of certain experiments or trials. One of
the staples of evidence quality judgement in this field is the hierarchy
of evidence, which often takes the form of a pyramid like the one below
(although there are minor variations on this, the basic premise remains the
same), with the highest quality of evidence at the top and the weakest at the
bottom.
This is a
simplified example of what constitutes good and bad evidence and is being somewhat
superseded by other evidence classification systems such as the GRADE
system, which I won’t go into here, but I recommend following the link and
reading a bit about.
Let’s talk a
little bit about each stage of the hierarchy of evidence and what they mean. At
the bottom of the pyramid you have case reports, opinion papers and letters. The
last two form expert opinion which,
although it may be well informed, is still only the view of an individual or small
group who may be biased towards ideas they favour themselves, or who represent one
specific viewpoint out of many equally or more valid ones. Case reports are
individual reports of something occurring; we see these fairly frequently in
endometriosis, for example here and here are examples
published within the last few months. These are still important as they
document unusual presentations or novel ways to treat a disease, but they are
still only the experience of a small number of people.
Next are animal trials and in vitro studies. This is basic lab-based research involving
animals or models of disease using cells or tissue grown under lab conditions.
This is very useful basic research as it informs the direction for all the
steps above it in the pyramid, but it has its limitations. For example, animal
research has the obvious drawback of not being done in humans, although there are
some experiments that need to be performed in the context of the complexity of
a living organism but are simply too dangerous to test on humans. Often animal
and in vitro (cell and tissue)
experiments may only test a specific component of a disease under certain
conditions and not the disease as a whole, so while it gives us an indication
as to how pieces of the overall puzzle fit together, or helps us to answer a
specific question, it doesn’t necessarily tell us how it will translate to the
human body. This type of research is very often misinterpreted in media
outlets, for example I wrote before
about how evidence of this kind can be misrepresented or overblown in media
reports, leading to false impressions being given to the reader for the sake of
sensationalism. This type of problem is called ‘extrapolation’, in that a piece
of work focussing on something narrow, like the behaviour of cells in a lab
experiment, is widened to the belief that this same effect will be observed in
the human body. It would be like seeing a horse running really fast, and observing
the horse has legs, and you see you also have legs, and believing that
therefore you must be able to run as fast as a horse because you both have
legs.
Next up are cross-sectional studies, where
information is collected from a group, or groups, of people at a certain point
in their life to examine the relationship between a disease and whatever else
the researchers are interested in. So let’s say for example we want to know
what the diet of women with and without endometriosis is like in Eastern
Canada. We could take 1000 questionnaires and send 500 to women with endo (these
would be the cases) and 500 to women
without endometriosis (these would be the controls)
in Newfoundland, asking them about their diet, and compare the responses. Sounds
simple enough, right? Well yes, on the surface, but there are lots of ways in
which a study (and hence the evidence it provides) can be made better or worse.
For example, in this hypothetical questionnaire, is it being sent out to cases
and controls of roughly the same age? Do the cases have the same type of endo?
Do they have the same symptoms? Are they from similar cultures that would have
similar diets or very different cultures that would eat different foods? Do
they have other conditions that may be affected by diet, or that diet might
affect? How are you measuring food intake? Are you sure you have selected
enough cases and controls to make the conclusions valid? What do you do if lots
of people in one group don’t respond? These are just a few of the
considerations that should be made when designing the study and taken into account
when assessing the quality of the evidence that the results provide. If any of
those considerations aren’t accounted for then this introduces error into your results, and the more error
there is, the weaker the evidence they provide.
The next
rung on the pyramid is case-control
studies. We discussed a bit about what cases and controls are in the
cross-sectional study, but case-control studies are a bit different. For this
type of study we would still be comparing women with endometriosis to women without,
like the cross-sectional study, but the information collected on them would be
‘retrospective’ i.e. asking them questions, or looking back through medical
records, for information on things that happened in the past. The things we’re
looking at could be any number of different categories, from exposure to
environmental pollutants, to previous medications, to lifestyle and living
conditions over time and so on. Let’s imagine a fictional study where we wanted
to know if living in the countryside or the city during childhood years was associated
with endometriosis in adult life. We could use a case-control study to ask
women about where they lived and how rural/urban that place was. From this
information we could compare the case and control groups to make a judgement on
whether women with endo were more or less likely to have lived in an urban
environment when they were children. The results of this would be presented as
‘odds’, and would be reported as something like ‘women with endometriosis are X
times more likely to have lived in a city during childhood’. Now, can you see how
this could be deliberately or unintentionally misinterpreted by a website, blog
or news outlet? A finding such as this could easily become a sensationalist
article with a title like ‘city living causes endometriosis!’, but that isn’t
what the research was saying at all; it was just saying that women diagnosed
with endometriosis as adults were more likely to have lived in a city as children. It
may be that there is some causal link between city living and endometriosis,
but you would need a whole other study to confirm that idea. An important
phrase here is ‘correlation does not equal causation’. This means that just
because two factors change with one another, it does not mean one is caused by the
other.
A great example of this is at this
link, which shows that, in the US up until 2007, the age of Miss America
followed the exact same trend as the number of people killed by steam. Not very
likely that those two are actually related. Another issue with case-control
studies is that they rely on recalled information, in our example people had to
recall the details of the environment they grew up in as children, which relies
on the accuracy of human memory, which has inherent flaws. Another problem is
that the association with urban living could be due to something else entirely
(like the differences in diet between urban and rural communities), a factor
that would be overlooked because it wasn’t included in the study design.
Case-control studies are therefore good jumping off points for future research
but don’t necessarily give the whole picture and the accuracy of their
conclusions depends a great deal on how well they were designed.
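To make the ‘odds’ mentioned above a little more concrete, here is a rough sketch in Python of how an odds ratio could be worked out from a case-control study. The counts are entirely made up for illustration and the function name is just a label chosen here; a real analysis would also report a confidence interval and try to account for other factors.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical counts: 'exposed' here means 'grew up in a city'.
ratio = odds_ratio(
    exposed_cases=300, unexposed_cases=200,        # 500 women with endo
    exposed_controls=250, unexposed_controls=250,  # 500 women without
)
print(f"Odds ratio: {ratio:.2f}")
```

With these invented numbers the odds of having grown up in a city are 1.5 times higher among the cases than the controls. That is an association only; it says nothing about whether city living caused anything.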
Moving up
again, we get to the cohort study.
Previously we have seen that a cross-sectional study looks at the present,
case-control studies look at the past, but prospective cohort studies look at
the future. Let’s say we want to investigate how the diet of women with
endometriosis affects their day to day pain levels. With a cohort study you
would recruit groups of women with endometriosis who have different dietary
habits, like vegans, vegetarians, omnivores etc. then give them questionnaires
to fill out (or get them to attend regular meetings with you) to gather
information about their diet and day to day pain. This would continue for
however long you planned the study for, it could be any amount of time really, a
year, 3 years or even 20 years (studies conducted over a long period of time
are called ‘longitudinal’). However long it was, once you reached the end of
the study you would then be able to compare all the data from the different
groups and see whether one type of diet was better or worse than the other for
day to day endometriosis related pain. One thing you have to be careful of here
is ‘confounding variables’. Because people’s lives are complex and varied, diet
is unlikely to be the only thing influencing pain. Confounding variables in
this study could be all manner of other factors like: exercise level, type of
endometriosis, medication, stress, other medical conditions besides endo,
healthcare access, financial stability, support network, etc etc. Fortunately,
although these confounding variables can lower the significance of a discovery
and weaken the strength of evidence it provides, if they are taken into
consideration while planning the study and recorded as the study progresses,
they can be accounted for with fancy mathematical statistics. Another solution to the
confounding variable problem is ‘matching’ patients. This basically means
making sure the women in your chosen groups have similar characteristics, for
example making sure they are all of similar ages, BMI, activity levels and on
the same medications. This does drastically reduce the confounding problem,
but narrows the scope of your result to a very particular population.
So let’s imagine you did this study and it turned out that, over 5
years, in Caucasian women with endometriosis aged 20-30, with moderate levels
of exercise, not taking any hormonal medication or pain killers, those with a
vegetarian diet experienced less pain than vegans or omnivores. It’s a very
specific answer to a very specific question, and it would be very wrong for
someone else to take that finding and report it as ‘women with endometriosis
reduce pain with veggie diet’ because what our imaginary study found was not
applicable to all women with endo, just the cohort we selected.
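As a rough illustration of how a confounding variable can mislead, the Python sketch below uses completely invented records in which pain depends only on exercise level, not on diet, but vegetarians happen to exercise more. Comparing diets directly makes the vegetarian diet look better; comparing within each exercise level (one simple way of accounting for a confounder, sometimes called stratifying) shows no difference. The data, field layout and function name are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Entirely made-up records: (diet, exercise level, average daily pain score 0-10).
# Pain is 4 for high exercise and 6 for low exercise, whatever the diet.
records = (
    [("vegetarian", "high", 4)] * 8 + [("vegetarian", "low", 6)] * 2
    + [("omnivore", "high", 4)] * 2 + [("omnivore", "low", 6)] * 8
)


def mean_pain_by(records, key):
    """Average pain score for each group defined by key(record)."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[2])
    return {group: mean(scores) for group, scores in groups.items()}


# Crude comparison: diet only, ignoring exercise.
crude = mean_pain_by(records, key=lambda r: r[0])
# Stratified comparison: diet within each exercise level.
stratified = mean_pain_by(records, key=lambda r: (r[0], r[1]))

print("Crude:", crude)
print("Stratified:", stratified)
```

In the crude comparison vegetarians average 4.4 and omnivores 5.6, yet within each exercise level the two diets have identical pain scores. The apparent diet effect comes entirely from exercise.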
Getting into
very strong evidence territory now, we arrive at the Randomised Controlled Trial (RCT). In the
world of medicine, testing whether or not a new treatment works, be it a drug or
therapy of any kind, is the most crucial part of healing. Therefore having the
tools and skill necessary to get good evidence on a new treatment is a shining
example of how proper evidence testing can make a huge difference to people’s
lives. Imagine that you have invented a drug, you’ve gone through all the lab
testing and now you’ve got approval to test it in humans as a new treatment for
endometriosis. How do you ensure the test is fair, and provides the best
evidence that the drug does or doesn’t work? Fortunately you don’t have to
worry about that because hundreds of years of painful scientific work, often
costing the lives of many people, have refined the process to a randomised
controlled trial. So let’s break down each step to see why it is useful.
Firstly you
have a group of volunteers who should be matched like we discussed before i.e. relatively
similar in terms of age, disease type, BMI etc, who are willing to test the new
drug. The people will be allocated into different groups, but to do this they
should be ‘randomised’. That is, the way in which people are put into groups is
random, so you could identify people by numbers then use a random number
generator to put them in one group or another. This is done to prevent people
with certain characteristics (either known or unknown) all being put into one
group; essentially, it is done to make sure each group has a good mix of people.
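The idea of identifying people by number and using a random number generator to allocate them can be sketched in a few lines of Python. This is only a minimal illustration of simple randomisation; the function name, group names and seed are assumptions chosen for the example, and real trials typically use more careful schemes run by someone independent of the researchers.

```python
import random


def randomise(participant_ids, group_names=("Group 1", "Group 2"), seed=None):
    """Shuffle participants, then deal them out to the groups in turn."""
    rng = random.Random(seed)  # a fixed seed makes the example repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    allocation = {name: [] for name in group_names}
    for index, participant in enumerate(shuffled):
        allocation[group_names[index % len(group_names)]].append(participant)
    return allocation


# Twenty hypothetical volunteers, identified only by number.
groups = randomise(range(1, 21), seed=42)
for name, members in groups.items():
    print(name, sorted(members))
```

Because the order is shuffled before anyone is assigned, neither the researcher nor the volunteer gets to choose who ends up in which group, and dealing people out in turn keeps the groups the same size.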
The next
step is to assign a treatment type to each group. In the case of the new drug you
have developed, there could be two groups, Group 1 and Group 2 for example. Group 1 will be
the treatment group who will receive the real drug, while Group 2 will be the
‘control’ group who will receive a placebo or no drug at all. A control group
is essential to make sure the outcome
you are measuring (in this case let’s say it is pain relief), is due to the intervention you are testing (i.e. your
drug). The reason for this is that some people may get better or worse without
any treatment, or their symptoms might change as a matter of course. In essence
without a control group how would you know that the drug you’re testing was
responsible for pain relief and not just people getting better by themselves?
Placebos play an important part
in drug testing. These are things with no medical value whatsoever (like sugar
pills) given to patients in the control group, but they look the same as the real drug and are
administered in exactly the same way. This is to take into
account ‘the placebo effect’, a bizarre but very real effect noted in medical
trials, where some people will notice improvements even if they are just taking
sugar pills i.e. their body reacts in the same way as it would if they were taking
the real drug. This effect doesn’t happen for everyone of course,
but it happens often enough to alter the results
of a clinical trial. The placebo effect is a particular problem when testing pain
relieving drugs or anxiety and depression modulating drugs, where the
psychology of a patient can alter the presentation of their symptoms. There is
also the flipside of the placebo effect which is the ‘nocebo’ effect. This is
where someone’s symptoms will get worse if they believe what they are taking
will harm them, even if it has no active ingredients at all. A neat little animation about placebos can be
found here. You may
very well ask, why bother with a placebo at all? Just give the control group an
older version of the treatment. Well you’d be pretty sharp in this observation
because the use of placebos has raised some ethical concerns, especially when
dealing with drug trials for potentially life-saving therapies. Modern clinical
trials usually will compare old and new treatments and may just include a
placebo group to get a baseline of patient reactions to taking what they
believe is a drug treatment.
A further
step in an RCT is ‘double blinding’. You remember that the placebo and nocebo
effects are problems because they occur when people have an expectation of the
outcome; well, the same goes for those who are recording the results. If you are
a doctor monitoring patients taking a drug being trialled, and you know whether
they are taking the real drug or placebo, you are more likely to notice
positive effects in the treatment group than the placebo group, because you
too are biased by your expectation of a particular outcome.
Double-blinding removes this by making sure that neither the patient, nor the
doctors administering the treatment, know which the real drug is and which the
placebo/old drug is. Usually this is achieved by giving the medications and/or
patient groups code numbers instead of names. So patient 1 might be randomised
into group 1 and given a drug code-named ‘DRG01’ by a doctor. Neither the patient
nor the doctor knows whether group 1 is the treatment or control group and
whether drug ‘DRG01’ is the real drug or fake drug, therefore their
expectations cannot influence the results. Once the trial is complete and the
results have been collected, the researchers will be given access to what the
codes mean and they can decrypt which patient had what drug and do their
analysis.
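The coding step described above could look something like the following Python sketch. It hands out neutral code names in the style of ‘DRG01’ and keeps the key that links each code to the real treatment separate until the end. The function name and the idea of a third party holding the key are assumptions for illustration, not a description of any particular trial system.

```python
import random


def blind_labels(arms, seed=None):
    """Give each trial arm a neutral code; return the codes and the secret key."""
    rng = random.Random(seed)
    codes = [f"DRG{number:02d}" for number in range(1, len(arms) + 1)]
    rng.shuffle(codes)            # so the code number gives no hint about the arm
    key = dict(zip(codes, arms))  # assumed to be held by a third party during the trial
    return sorted(codes), key


codes, key = blind_labels(["real drug", "placebo"], seed=1)
print("Seen by doctors and patients:", codes)

# Only once the results are collected is the key opened:
unblinded = {code: key[code] for code in codes}
print("Revealed after the trial:", unblinded)
```

During the trial everyone works only with the codes, so nobody’s expectations can be attached to a particular pill.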
Once enough
randomised controlled trials and other scientific investigations have been done
on a particular subject, these can be put together into a Meta-analysis or Systematic Review. This is where experts in a
particular area gather all the available evidence about a particular subject
(like ‘does drug A perform better than drug B at relieving pain?’), review it
and write it up as an analysis of the strengths and weaknesses of all the
evidence they have found. This is considered to be the highest and most
reliable form of evidence, because the weak evidence is sorted from the strong
and by comparing the results of many strong studies we can finally arrive at a
definitive answer to the original question.
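One common way a meta-analysis combines the results of several trials is to take a weighted average, where more precise studies (those with smaller standard errors) count for more. The Python sketch below shows that idea using invented numbers; it is a simplified ‘fixed-effect’ version, and real meta-analyses also check how much the studies disagree with each other and often use more elaborate models.

```python
import math


def pooled_estimate(effects, standard_errors):
    """Fixed-effect inverse-variance pooling of study effect estimates."""
    weights = [1 / se ** 2 for se in standard_errors]  # precise studies weigh more
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical trials: change in pain score (drug minus control) and its standard error.
effects = [-1.0, -2.0, -1.5]
standard_errors = [0.5, 1.0, 0.5]

pooled, pooled_se = pooled_estimate(effects, standard_errors)
print(f"Pooled effect: {pooled:.2f} (standard error {pooled_se:.2f})")
```

Notice that the pooled standard error is smaller than that of any single study, which is one reason combining good studies gives a more reliable answer than any one of them alone.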
As important
as understanding what is evidence is
understanding what is not evidence. There is a long list of things that are
not evidence, but here are some of the most common ones: YouTube videos,
personal anecdotes (something like “oh a friend of a friend tried this and it
really helped”), gut feelings, websites that don’t reference their sources (or
that use weak evidence and over interpretation), random one offs and blind luck
(we’ve all heard anecdotes like “my grandad smoked 20 cigarettes a day and
lived until he was 90!” while ignoring the thousands upon thousands of people
who die early due to smoking related illness) and most news reports about
medicine. These are not sources of evidence in any way, shape or form, and that
which is proposed without evidence can be dismissed without evidence. Unfortunately
our brain can trick us, or be tricked, into thinking in certain ways that mean
we shun evidence in favour of unreliable information. A great book on the
subject is ‘Thinking, Fast and Slow’ by Daniel Kahneman, which, to not do it
justice, explains how our brains like to save mental energy where they can.
When presented with an argument our brains will default to the easiest
conclusion based on our previously held beliefs rather than go for the mentally
taxing complex and logical process of information evaluation. Like I said, I
haven’t done the book justice, but if you’re interested in why your brain
thinks the way it does, it’s worth a read.
A factor
that plays into how we respond to evidence is something that is very human and
difficult to get rid of, and that is bias. We are all biased in some way, we
think our sports team is the best, our kids are the best, our country is the
best and even sometimes that our thoughts and ideas are the best and we will
often defend those biases against even the most compelling evidence. Scientists
are just as prone to bias as anyone else and, although we have systems of
evidence assessment like those I have talked about today, it still creeps in. One of the
best examples is ‘confirmation bias’, which is looking for evidence that
supports our beliefs while ignoring evidence that disproves them. It also
encompasses being more critical of evidence that disproves what we believe
while ignoring the flaws in evidence that supports them (a good example of this
is ‘cherry picking’, where someone will go through hundreds of pieces of
evidence until they find one or two that support what they believe). I have
seen this many times in my professional life, and I’ll admit I’ve found myself
doing it sometimes as well, so it’s a difficult trait to rid yourself of.
Bias is also
prominently evident in scientific publishing, where there is a huge bias
towards only positive results being accepted for publication. It is based on
the (false) assumption that negative results are inherently not worth as much
as positive findings. However, knowing that something doesn’t work (like a drug
for endometriosis) is just as important as knowing it does work. When we are
dealing with people’s health, we must try to put these biases aside for the
good of those being treated. If this is not the case and a person, or
organisation, is ignoring evidence in favour of their own biases, then they do
not have the patients’ wellbeing at heart, and are instead concerned with
something else (like reputation, fame, or most commonly, money).
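To see why publishing only positive results is a problem, here is a small simulation in Python. It imagines many studies of a drug that truly does nothing, so every result is just random noise, and then ‘publishes’ only the ones that happen to look clearly positive. Everything here (the number of studies, the cut-off, the function name) is an assumption made up for the illustration.

```python
import random
from statistics import mean


def simulate_publication_bias(n_studies=1000, seed=0):
    """Simulate studies of a drug with NO real effect; 'publish' only positive ones."""
    rng = random.Random(seed)
    # Each study's estimated effect is pure noise around a true effect of zero,
    # measured in units of its own standard error.
    estimates = [rng.gauss(0, 1) for _ in range(n_studies)]
    # Keep only the results that look 'significantly' positive.
    published = [e for e in estimates if e > 1.96]
    return estimates, published


estimates, published = simulate_publication_bias()
print(f"All studies:       mean effect {mean(estimates):.2f} ({len(estimates)} studies)")
print(f"Published studies: mean effect {mean(published):.2f} ({len(published)} studies)")
```

Taken together, the studies average out to roughly no effect, but anyone reading only the ‘published’ ones would come away believing the drug works.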
Ok so let’s
suppose you are a research scientist, you’ve got your snazzy lab coat and have
been working hard on a drug that you think would be a great treatment for severe
period pain women with endometriosis suffer with. The imaginary drug you have
developed is the very creatively named ‘Drug X’. This drug is based on a
compound that is naturally found in bananas, but you have modified the
chemistry of the compound to make it easier to absorb by the body, so it’s a
different compound from the naturally occurring one. Over the years and years
you’ve done all your scientific experiments and have managed to perform a
well-designed clinical trial which has shown that Drug X is actually effective
at reducing severe period pain in
women with endometriosis after they have had surgery, you even manage to get
the results published in a prestigious journal. Can you take a guess, based on
what we’ve learned so far, about what could happen to your results? There is a
chain of interpretation of your research, which is very close to the
interpretation of research I have actually seen happen in real life, that
can lead to your findings being misinterpreted and used to spread false
information. Below is an example of what could happen after the initial
research paper was published.
Although this particular example is based on entirely
fictional research, you may see similarities between this and real world reporting
of scientific research. I’ve used this as an example of something you may come
across in day to day life and with what we have learned throughout this post,
hopefully you’ll cast a more suspicious and critical eye over any such claims
in the future. In addition, it is often very difficult for someone outside of
academic/research institutions to get hold of an original article due to
restricted access and most articles being behind expensive paywalls. Similarly,
academic writing is often quite specialised and difficult to understand unless
you happen to be an expert in that field, so the majority of people rely on
interpretation of this research elsewhere and put their trust in others to
interpret the research correctly. What I’ve been discussing today hopefully
gives you a better idea of how to interpret the interpretations and exercise
caution when reading news about endometriosis.
People often
think of scientists as being very rigid and closed minded, and in some cases
that may be true, but a good scientist is one who is willing to accept any
conclusion based on the proper evidence, and the more remarkable the claim, the
more remarkable the evidence required to prove it. Similarly a real scientist
is willing to change their mind based on new evidence, even if it means
rejecting a deeply cherished belief. Of course there are some things we simply
don’t know yet, and our natural instinct is to try and fill those gaps in our
knowledge with some sort of explanation, but sometimes it’s ok to say “I don’t
know” and wait until the answer is proven. Those gaps in our knowledge though
are ripe fruit for those who push fraudulent information and quack remedies.
Scientists certainly don’t have all the answers, but the ones we do have are
the best ones (for now).
If you read that
someone is claiming they have a cure for endometriosis, hopefully what I’ve
written here will give you the basis to understand the evidence they present to
support that claim (if any) and how strong or weak that evidence is. There are
also further reading resources to really sharpen your evidence assessing
skills, such as these few to get started:
The Systems to Rate the
Strength of Scientific Evidence – by the Agency for Healthcare Research and
Quality
Assessing
the Strength of Evidence – by gov.uk
A favourite
quote of mine, which has been attributed to many people over the years is “keep your mind open – but not so much that
your brain falls out”