Monday 18 June 2018

Prove It


Well, it’s been far too long since I’ve posted anything here, but after a bit of an absence I’m back again and ready to get spreading the word of endometriosis research. Firstly a quick note on where I’ve been. I finally finished my PhD (more on that in the future), which required a year or so of pretty hard concentration in the final race to the finish. This meant I didn’t have as much time to write all the other things I like, such as this blog! But I didn’t forget about it and have been constantly reading all the latest endo research and occasionally updating the @EndoUpdate Twitter account ready for my return here.

For this blog post I’d like to talk about something that is not directly related to endometriosis, but is still very important and which I think everyone needs to know a bit about: the criticism of evidence. We use our judgement of evidence in our everyday lives all the time; it’s what helps us survive and stops us getting all our money stolen by an email claiming to be from a Nigerian prince looking to offer us millions in exchange for all our bank details. For some things the definitive proof of whether something is true or not can be fairly straightforward, for others it can be hard to ascertain, but by employing well tested methods of evidence analysis we can arrive at a more accurate conclusion.

Let’s suppose you meet someone who claims to be able to turn iron into gold. If you challenge them to do it and they can’t, then that’s pretty good evidence they weren’t telling the truth. But what if they did do it, what if the person waved their hands and instead of a lump of grey metal there was now a shimmering piece of gold? Would you automatically assume they possessed genuine transmutative abilities? Probably not, because the way to the truth and the way in which we analyse evidence is a complex deductive process that can require a great deal of testing to arrive at an acceptable level of certainty that what we have found is correct.

Let’s go back to the previous example, how would you test that the man who appeared to turn iron into gold was actually doing so? More than likely you would suspect he was performing a magic trick, so you might suggest replacing the initial piece of iron with your own piece, or one supplied by an independent person. You might suggest moving the activity to a controlled environment, where multiple people can observe at different positions, you might think of ways to test the supposed ‘gold’ afterwards to check that the outcome was genuine, or filming the trick with a slow motion camera. Either way you would come up with a list of ways to test the idea that this person can turn iron into gold and each way would give you a piece of evidence that would lead to an overall conclusion. The trick is judging what quality of evidence each of those tests would provide. In the biological and medical sciences the assessment of evidence quality is extremely important, because in many cases there are people’s lives resting on the outcome of certain experiments or trials. One of the staples of evidence quality judgement in this field is the hierarchy of evidence, which often takes the form of a pyramid like the one below (although there are minor variations on this, the basic premise remains the same), with the highest quality of evidence at the top and the weakest at the bottom.


This is a simplified example of what constitutes good and bad evidence and is being somewhat superseded by other evidence classification systems such as the GRADE system, which I won’t go into here, but I recommend following the link and reading a bit about.

Let’s talk a little bit about each stage of the hierarchy of evidence and what they mean. At the bottom of the pyramid you have case reports, opinion papers and letters. The last two form expert opinion that, although it may be well informed, is still only the view of an individual or small group who may be biased towards ideas they favour themselves or represent one specific view point out of many, equally or more valid ones. Case reports are individual reports of something occurring. We see these fairly frequently in endometriosis, for example here and here are examples published within the last few months. These are still important as they document unusual presentations or novel ways to treat a disease, but they are still only the experience of a small number of people.

Next are animal trials and in vitro studies. This is basic lab based research involving animals or models of disease using cells or tissue grown under lab conditions. This is very useful basic research as it informs the direction for all the above steps in the pyramid, but it has its limitations. For example, animal research has the obvious drawback of not being done in humans, but there are some experiments that need to be performed in the context of the complexity of a living organism and are simply too dangerous to test on humans. Often animal and in vitro (cell and tissue) experiments may only test a specific component of a disease under certain conditions and not the disease as a whole, so while it gives us an indication as to how pieces of the overall puzzle fit together, or helps us to answer a specific question, it doesn’t necessarily tell us how it will translate to the human body. This type of research is very often misinterpreted in media outlets, for example I wrote before about how evidence of this kind can be misrepresented or overblown in media reports, leading to false impressions being given to the reader for the sake of sensationalism. This type of problem is called ‘extrapolation’, in that a piece of work focussing on something narrow, like the behaviour of cells in a lab experiment, is widened to the belief that this same effect will be observed in the human body. It would be like seeing a horse running really fast, observing that the horse has legs, seeing that you also have legs, and believing that therefore you must be able to run as fast as a horse.

Next up are cross-sectional studies, where information is collected from a group, or groups, of people at a certain point in their life to examine the relationship between a disease and whatever else the researchers are interested in. So let’s say for example we want to know what the diet of women with and without endometriosis is like in Eastern Canada. We could have 1000 questionnaires and send 500 to women with endo (these would be the cases) and 500 to women without endometriosis (these would be the controls) in Newfoundland asking them about their diet and compare the responses. Sounds simple enough right? Well yes, on the surface, but there are lots of ways in which a study (and hence the evidence it provides) can be made better or worse. For example, in this hypothetical questionnaire, is it being sent out to cases and controls of roughly the same age? Do the cases have the same type of endo? Do they have the same symptoms? Are they from similar cultures that would have similar diets or very different cultures that would eat different foods? Do they have other conditions that may be affected by diet, or that diet might affect? How are you measuring food intake? Are you sure you have selected enough cases and controls to make the conclusions valid? What do you do if lots of people in one group don’t respond? These are just a few of the considerations that should be made when designing the study and taken into account when assessing the quality of the evidence that the results provide. If any of those considerations aren’t accounted for then this introduces error into your results, and the more error there is, the weaker the evidence it provides.

The next rung on the pyramid is case-control studies. We discussed a bit about what cases and controls are in the cross-sectional study, but case-control studies are a bit different. For this type of study we would still be comparing women with endometriosis to women without, like the cross-sectional study, but the information collected on them would be ‘retrospective’ i.e. asking them questions, or looking back through medical records, for information on things that happened in the past. The things we’re looking at could be any number of different categories, from exposure to environmental pollutants, to previous medications, to lifestyle and living conditions over time and so on. Let’s imagine a fictional study where we wanted to know if living in the countryside or the city during childhood years was associated with endometriosis in adult life. We could use a case-control study to ask women about where they lived and how rural/urban that place was. From this information we could compare the case and control groups to make a judgement on whether women with endo were more or less likely to have lived in an urban environment when they were children. The results of this would be presented as ‘odds’, and would be reported as something like ‘women with endometriosis are X times more likely to have lived in a city during childhood’. Now, can you see how this could be deliberately or unintentionally misinterpreted by a website, blog or news outlet? A finding such as this could easily become a sensationalist article with a title like ‘city living causes endometriosis!’, but that isn’t what the research was saying at all, it was just saying that women diagnosed with endometriosis as adults were more likely to have lived in a city as a child. It may be that there is some causal link between city living and endometriosis, but you would need a whole other study to confirm that idea. An important phrase here is ‘correlation does not equal causation’.
This means that just because two factors change with one another, does not mean one is caused by the other.
A great example of this is at this link, which shows that, in the US up until 2007, the age of Miss America followed the exact same trend as the number of people killed by steam. Not very likely that those two are actually related. Another issue with case-control studies is that they rely on recalled information, in our example people had to recall the details of the environment they grew up in as children, which relies on the accuracy of human memory, which has inherent flaws. Another problem is that the association with urban living could be due to something else entirely (like the differences in diet between urban and rural communities), a factor that would be overlooked because it wasn’t included in the study design. Case-control studies are therefore good jumping off points for future research but don’t necessarily give the whole picture and the accuracy of their conclusions depends a great deal on how well they were designed. 
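To make the ‘odds’ idea from the fictional city versus countryside study a bit more concrete, here is a minimal sketch of how an odds ratio could be worked out from the four counts such a study would give you. The function name and all of the numbers are made up purely for illustration, they are not from any real study.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four counts of a simple case-control table."""
    # Odds of the exposure (city childhood) among women with endometriosis
    odds_in_cases = exposed_cases / unexposed_cases
    # Odds of the same exposure among women without endometriosis
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Invented counts: 500 cases and 500 controls asked where they grew up
city_cases, rural_cases = 300, 200
city_controls, rural_controls = 250, 250

print(odds_ratio(city_cases, rural_cases, city_controls, rural_controls))  # 1.5
```

An odds ratio of 1 would mean no association, and with these invented numbers the odds of a city childhood are 1.5 times higher in the case group. Notice that nothing in this calculation says anything about cause, it only describes how the two groups differ.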

Moving up again, we get to the cohort study. Previously we have seen that a cross-sectional study looks at the present, case-control studies look at the past, but prospective cohort studies look at the future. Let’s say we want to investigate how the diet of women with endometriosis affects their day to day pain levels. With a cohort study you would recruit groups of women with endometriosis who have different dietary habits, like vegans, vegetarians, omnivores etc. then give them questionnaires to fill out (or get them to attend regular meetings with you) to gather information about their diet and day to day pain. This would continue for however long you planned the study for, it could be any amount of time really, a year, 3 years or even 20 years (studies conducted over a long period of time are called ‘longitudinal’). However long it was, once you reached the end of the study you would then be able to compare all the data from the different groups and see whether one type of diet was better or worse than the other for day to day endometriosis related pain. One thing you have to be careful of here is ‘confounding variables’. Because people’s lives are complex and varied, diet is unlikely to be the only thing influencing pain. Confounding variables in this study could be all manner of other factors like: exercise level, type of endometriosis, medication, stress, other medical conditions besides endo, healthcare access, financial stability, support network, etc. Fortunately, although these confounding variables can lower the significance of a discovery and weaken the strength of evidence it provides, if they are taken into consideration while planning the study and recorded as the study progresses, they can be accounted for with fancy mathematical statistics.
A solution to the confounding variable problem is ‘matching’ patients. This basically means making sure the women in your chosen groups have similar characteristics, for example making sure they are all of similar ages, BMI, activity levels and on the same medications. This does drastically reduce the confounding problem, but narrows the scope of your result to a very particular population.
So let’s imagine you did this study and it turned out that, over 5 years, in Caucasian women with endometriosis aged 20-30, with moderate levels of exercise, not taking any hormonal medication or pain killers, those with a vegetarian diet experienced less pain than vegans or omnivores. It’s a very specific answer to a very specific question, and it would be very wrong for someone else to take that finding and report it as ‘women with endometriosis reduce pain with veggie diet’ because what our imaginary study found was not applicable to all women with endo, just the cohort we selected.
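To show why confounding variables matter in a study like this, here is a small sketch using invented data in which exercise, not diet, is what actually drives pain. Comparing within groups of similar exercise level (a simple form of stratification, and only one of several ways statisticians deal with confounding) makes the apparent diet effect disappear. The records, scores and function name are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Invented records: (diet, exercise level, average daily pain score from 0 to 10).
# In this made-up data the vegetarians happen to exercise more.
records = [
    ("vegetarian", "high", 3), ("vegetarian", "high", 3),
    ("vegetarian", "high", 3), ("vegetarian", "low", 6),
    ("omnivore", "high", 3), ("omnivore", "low", 6),
    ("omnivore", "low", 6), ("omnivore", "low", 6),
]


def mean_pain(records, group_by):
    """Average pain score for each group produced by the group_by function."""
    groups = defaultdict(list)
    for diet, exercise, pain in records:
        groups[group_by(diet, exercise)].append(pain)
    return {group: mean(scores) for group, scores in groups.items()}


# Crude comparison: diet only, ignoring exercise
crude = mean_pain(records, lambda diet, exercise: diet)
# Stratified comparison: diet within each exercise level
stratified = mean_pain(records, lambda diet, exercise: (diet, exercise))

print(crude)
print(stratified)
```

In the crude comparison the vegetarian group looks better off, but once you compare like with like (high exercisers with high exercisers, low with low) the two diets have the same average pain in this invented data set.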

Getting into very strong evidence territory now we arrive at the Randomised Controlled Trial (RCT). In the world of medicine, testing whether or not a new treatment works, be it a drug or therapy of any kind, is the most crucial part of healing. Therefore having the tools and skill necessary to get good evidence on a new treatment is a shining example of how proper evidence testing can make a huge difference to people’s lives. Imagine that you have invented a drug, you’ve gone through all the lab testing and now you’ve got approval to test it in humans as a new treatment for endometriosis. How do you ensure the test is fair, and provides the best evidence that the drug does or doesn’t work? Fortunately you don’t have to worry about that because hundreds of years of painful scientific work, often costing the lives of many people, have refined the process into the randomised controlled trial. So let’s break down each step to see why it is useful.

Firstly you have a group of volunteers who should be matched like we discussed before i.e. relatively similar in terms of age, disease type, BMI etc, who are willing to test the new drug. The people will be allocated into different groups, but to do this they should be ‘randomised’. That is, the way in which people are put into groups is random, so you could identify people by numbers then use a random number generator to put them in one group or another. This is done to prevent people with certain characteristics (either known or unknown) all being put into one group, essentially it is done to make sure each group has a good mix of people.
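As a rough illustration of the ‘identify people by numbers then use a random number generator’ idea, here is a minimal sketch of simple randomisation into two equal sized groups. Real trials often use more elaborate schemes, and the function name, the group names and the seed are just assumptions for this example.

```python
import random


def randomise(participant_ids, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)  # a fixed seed only makes this example repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "group_1": sorted(shuffled[:half]),
        "group_2": sorted(shuffled[half:]),
    }


# Twenty hypothetical volunteers, identified only by number
groups = randomise(range(1, 21), seed=42)
print(groups["group_1"])
print(groups["group_2"])
```

Because the shuffle decides who goes where, neither the researcher nor the volunteer can steer a particular kind of person into a particular group.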

The next step is to assign a treatment type to each group. In the case of the new drug you have developed, this would be Group 1 and Group 2 for example. Group 1 will be the treatment group who will receive the real drug, while Group 2 will be the ‘control’ group who will receive a placebo or no drug at all. A control group is essential to make sure the outcome you are measuring (in this case let’s say it is pain relief), is due to the intervention you are testing (i.e. your drug). The reason for this is that some people may get better or worse without any treatment, or their symptoms might change as a matter of course. In essence without a control group how would you know that the drug you’re testing was responsible for pain relief and not just people getting better by themselves?
Placebos play an important part in drug testing. These are things with no medical value whatsoever (like sugar pills) given to patients in the control group, but they look the same as the real drug and are administered in exactly the same way. This is to take into account ‘the placebo effect’, a bizarre but very real effect noted in medical trials, where some people will notice improvements even if they are just taking sugar pills i.e. their body reacts in the same way it would as if it was taking the real drug. This effect doesn’t happen for everyone of course, it only occurs in a very small number of people, but it is enough to alter the results of a clinical trial. The placebo effect is a particular problem when testing pain relieving drugs or anxiety and depression modulating drugs, where the psychology of a patient can alter the presentation of their symptoms. There is also the flipside of the placebo effect which is the ‘nocebo’ effect. This is where someone’s symptoms will get worse if they believe what they are taking will harm them, even if it has no active ingredients at all. A neat little animation about placebos can be found here. You may very well ask, why bother with a placebo at all? Just give the control group an older version of the treatment. Well you’d be pretty sharp in this observation because the use of placebos has raised some ethical concerns, especially when dealing with drug trials for potentially life-saving therapies. Modern clinical trials usually will compare old and new treatments and may just include a placebo group to get a baseline of patient reactions to taking what they believe is a drug treatment.

A further step in an RCT is ‘double blinding’. You remember that the placebo and nocebo effect are problems because they occur when people have an expectation of the outcome, well the same goes for those who are recording the results. If you are a doctor monitoring patients taking a drug being trialled, and you know whether they are taking the real drug or placebo, you are more likely to notice positive effects in the treatment group than the placebo group, because you too are biased by your expectation of a particular outcome. Double-blinding removes this by making sure that neither the patients, nor the doctors administering the treatment, know which is the real drug and which is the placebo/old drug. Usually this is achieved by giving the medications and/or patient groups code numbers instead of names. So patient 1 might be randomised into group 1 and given a drug code named ‘DRG01’ by a doctor. Neither the patient nor the doctor knows whether group 1 is the treatment or control group and whether drug ‘DRG01’ is the real drug or fake drug, therefore their expectations cannot influence the results. Once the trial is complete and the results have been collected, the researchers will be given access to what the codes mean and they can decode which patient had what drug and do their analysis.
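The code number idea can be sketched in a few lines. This is only a toy illustration under my own assumptions: the second code ‘DRG02’, the function names and the idea of a separate ‘key’ kept away from the doctors and patients are all invented for the example, and real trials handle this with far more safeguards.

```python
import random


def blind(allocation, seed=None):
    """Replace treatment names with neutral codes.

    allocation maps patient number -> treatment name (exactly two treatments).
    Returns (labels, key): labels is what doctors and patients see,
    key is kept sealed until the trial is over.
    """
    treatments = sorted(set(allocation.values()))
    if len(treatments) != 2:
        raise ValueError("this sketch only handles two treatments")
    rng = random.Random(seed)
    codes = ["DRG01", "DRG02"]  # 'DRG02' is a made-up companion code
    rng.shuffle(codes)  # which code means which treatment is itself random
    code_for = dict(zip(treatments, codes))
    labels = {patient: code_for[treatment] for patient, treatment in allocation.items()}
    key = {code: treatment for treatment, code in code_for.items()}
    return labels, key


def unblind(labels, key):
    """After the trial, translate the codes back into treatment names."""
    return {patient: key[code] for patient, code in labels.items()}


allocation = {1: "real drug", 2: "placebo", 3: "real drug", 4: "placebo"}
labels, key = blind(allocation, seed=7)
print(labels)  # what the doctor and patient see: codes only
print(unblind(labels, key) == allocation)  # True
```

During the trial only the labels are in circulation, so nobody recording symptoms can tell which code is the real drug.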

Once enough randomised controlled trials and other scientific investigations have been done on a particular subject, these can be put together into a Meta-analysis or Systematic Review. This is where experts in a particular area gather all the available evidence about a particular subject (like ‘does drug A perform better than drug B at relieving pain?’), review it and write it up as an analysis of the strengths and weaknesses of all the evidence they have found. This is considered to be the highest and most reliable form of evidence, because the weak evidence is sorted from the strong and by comparing the results of many strong studies we can finally arrive at a definitive answer to the original question.
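I haven’t gone into how the numbers from different trials are actually combined, but one common approach in quantitative meta-analysis is to give more precise studies more weight (so called inverse-variance weighting in a fixed-effect model). The sketch below assumes that approach, and the three ‘studies’ are invented numbers, not real trial results.

```python
import math


def pool_fixed_effect(effects, standard_errors):
    """Inverse-variance weighted average of study effects (fixed-effect model).

    Returns the pooled effect and its standard error.
    """
    # A study with a smaller standard error (more precise) gets a bigger weight
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Three invented trials: change in pain score (negative means less pain)
effects = [-1.0, -0.5, -0.8]
standard_errors = [0.5, 0.25, 0.4]

pooled, pooled_se = pool_fixed_effect(effects, standard_errors)
print(round(pooled, 2), round(pooled_se, 2))
```

The pooled estimate sits closest to the most precise study, and its standard error is smaller than that of any single study, which is part of why combining good trials gives a more reliable answer than any one of them alone.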

As important as understanding what is evidence, is understanding what is not evidence. There is a long list of things that are not evidence, but here are some of the most common ones: YouTube videos, personal anecdotes (something like “oh a friend of a friend tried this and it really helped”), gut feelings, websites that don’t reference their sources (or that use weak evidence and over interpretation), random one offs and blind luck (we’ve all heard anecdotes like “my grandad smoked 20 cigarettes a day and lived until he was 90!” while ignoring the thousands upon thousands of people who die early due to smoking related illness) and most news reports about medicine. These are not sources of evidence in any way, shape or form, and that which is proposed without evidence can be dismissed without evidence. Unfortunately our brain can trick us, or be tricked, into thinking in certain ways that mean we shun evidence in favour of unreliable information. A great book on the subject is ‘Thinking, Fast and Slow’ by Daniel Kahneman, which, to not do it justice, explains how our brains like to save mental energy where they can. When presented with an argument our brains will default to the easiest conclusion based on our previously held beliefs rather than go for the mentally taxing complex and logical process of information evaluation. Like I said, I haven’t done the book justice, but if you’re interested in why your brain thinks the way it does, it’s worth a read.

A factor that plays into how we respond to evidence is something that is very human and difficult to get rid of, and that is bias. We are all biased in some way, we think our sports team is the best, our kids are the best, our country is the best and even sometimes that our thoughts and ideas are the best and we will often defend those biases against even the most compelling evidence. Scientists are just as prone to bias as anyone else and, although we have the systems of evidence assessment I have talked about today, it still creeps in. One of the best examples is ‘confirmation bias’, which is looking for evidence that supports our beliefs while ignoring evidence that disproves them. It also encompasses being more critical of evidence that disproves what we believe while ignoring the flaws in evidence that supports it (a good example of this is ‘cherry picking’, where someone will go through hundreds of pieces of evidence until they find one or two that support what they believe). I have seen this many times in my professional life, and I’ll admit I’ve found myself doing it sometimes as well, so it’s a difficult trait to rid yourself of.

Bias is also prominently evident in scientific publishing, where there is a huge bias towards only positive results being accepted for publication. It is based on the (false) assumption that negative results are inherently not worth as much as positive findings. However knowing that something doesn’t work (like a drug for endometriosis) is just as important as knowing it does work. When we are dealing with people’s health, we must try to put these biases aside for the good of those being treated. If this is not the case and a person, or organisation, is ignoring evidence in favour of their own biases, then they do not have the patients’ wellbeing at heart, and are instead concerned with something else (like reputation, fame, or most commonly, money).

Ok so let’s suppose you are a research scientist, you’ve got your snazzy lab coat and have been working hard on a drug that you think would be a great treatment for the severe period pain women with endometriosis suffer with. The imaginary drug you have developed is the very creatively named ‘Drug X’. This drug is based on a compound that is naturally found in bananas, but you have modified the chemistry of the compound to make it easier to absorb by the body, so it’s a different compound from the naturally occurring one. Over the years and years you’ve done all your scientific experiments and have managed to perform a well-designed clinical trial which has shown that Drug X is actually effective at reducing severe period pain in women with endometriosis after they have had surgery, you even manage to get the results published in a prestigious journal. Can you take a guess, based on what we’ve learned so far, about what could happen to your results? There is a chain of interpretation of your research, which is very close to the interpretation of research I have actually seen happen in real life, that can lead to your findings being misinterpreted and used to spread false information. Below is an example of what could happen after the initial research paper was published.


Although this particular example is based on entirely fictional research, you may see similarities between this and real world reporting of scientific research. I’ve used this as an example of something you may come across in day to day life and with what we have learned throughout this post, hopefully you’ll cast a more suspicious and critical eye over any such claims in the future. In addition, it is often very difficult for someone outside of academic/research institutions to get hold of an original article due to restricted access and most articles being behind expensive paywalls. Similarly, academic writing is often quite specialised and difficult to understand unless you happen to be an expert in that field, so the majority of people rely on interpretation of this research elsewhere and put their trust in others to interpret the research correctly. What I’ve been discussing today hopefully gives you a better idea of how to interpret the interpretations and exercise caution when reading news about endometriosis.

People often think of scientists as being very rigid and closed minded, and in some cases that may be true, but a good scientist is one who is willing to accept any conclusion based on the proper evidence, and the more remarkable the claim, the more remarkable the evidence required to prove it. Similarly a real scientist is willing to change their mind based on new evidence, even if it means rejecting a deeply cherished belief. Of course there are some things we simply don’t know yet, and our natural instinct is to try and fill those gaps in our knowledge with some sort of explanation, but sometimes it’s ok to say “I don’t know” and wait until the answer is proven. Those gaps in our knowledge though are ripe fruit for those who push fraudulent information and quack remedies. Scientists certainly don’t have all the answers, but the ones we do have are the best ones (for now).


If you read that someone is claiming they have a cure for endometriosis, hopefully what I’ve written here will give you the basis to understand the evidence they present to support that claim (if any) and how strong or weak that evidence is. There are also further reading resources to really sharpen your evidence assessing skills, such as these few to get started:

Ben Goldacre’s Books – Bad Science and Bad Pharma and accompanying blog

The Systems to Rate the Strength of Scientific Evidence – by the Agency for Healthcare Research and Quality


A favourite quote of mine, which has been attributed to many people over the years, is “keep your mind open – but not so much that your brain falls out”.