PatientPlus articles are written for doctors and so the language can be technical. However, some people find that they add depth to the articles found in the other sections of this website which are written for non-medical people.
Different levels of evidence - critical reading
Evidence-based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients.
In criminal law, a jury may spend hours or even days weighing up the evidence in a case before bringing a simple verdict of guilty or not guilty. In Scotland there is an additional option of not proven, which in English law has to be given as not guilty. In criminal law the standard of proof is beyond reasonable doubt, whilst in civil law the balance of probabilities will suffice. Medicine also requires a diligent weighing and assessment of the evidence, but the verdict tends to be more complex than simply effective or ineffective and is often given to reflect the weight of the evidence.
Questions to be asked include:
- What is the evidence?
- How reliable is the methodology? Bad methodology is rarely obvious from reading the paper or the editor would not have published it. It usually comes to light some time later.
- How convincing is the result? Perhaps the P value was rather unimpressive or perhaps a small sample gave a wide confidence interval. Was one given?
- Are there alternative explanations? Association and causation are not the same.
- Is there selective publication? This is suggested when many papers show unspectacular results yet none shows the opposite effect, and it can make a meta-analysis misleading.
- Is there a conflict of interest? Just because a pharmaceutical company finances a study of its products does not mean that the paper is corrupt or invalid but the reader may be a little more wary.
Usually the evidence is not a single paper but many, perhaps with conflicting results.
Potential pitfalls:
- Lack of evidence of efficacy should not be confused with evidence of lack of efficacy. The former says that there is insufficient evidence to pass judgement. The latter says it does not work.
- The more data is pooled the less relevant it becomes to individual patients.1 Meta-analysis can be a useful tool but it has some important limitations.
- The evidence-based approach may be more applicable to the use of drugs than to other treatment modalities, particularly holistic therapies. That is not to suggest that complementary therapies are not amenable to the scientific method.
- Medical journals cannot always prevent papers from being ghostwritten by pharmaceutical companies.2
- Case study methodology (which evidence-based medicine has virtually replaced) focuses on individual patients rather than populations.
A good doctor uses both individual clinical expertise and the best available external evidence, and neither alone is enough.
Making decisions:
When faced with evidence a doctor has to ask three questions:
- Is the evidence valid?
- Is it important?
- Is it applicable to the patient in front of me?
Is it valid?
Fortunately, when researching a topic, it is rarely necessary to perform a full literature search as it is often possible to find that the subject has already been discussed and the evidence distilled. There may have been a leading article, written by an expert, in a recent journal. Unfortunately, leading articles are often written by those with a vested interest, even an axe to grind, and all too often there is a lack of objectivity. This may be apparent from reading the article. A systematic review, as its name implies, is a rather different matter. The search of the literature is very thorough. The evidence is weighed more judiciously and the presentation should be unbiased.
Producing a systematic review is a long and arduous task. Sometimes such reviews appear in journals and they can be found by a PubMed search. Enter the subject and "review" in the PubMed search box. The Cochrane Collaboration produces many systematic reviews of high quality with reviewers from all over the world. Many different organisations, such as NICE, Prodigy, Bandolier, SIGN and the York Centre for Reviews and Dissemination, review evidence-based medicine and produce guidelines. Other examples include the Royal College of Physicians' review of the management of stroke3 or the Royal College of Obstetricians and Gynaecologists' review of intrapartum fetal monitoring.4 Each stratifies the level of evidence with every recommendation that it makes and, whilst there is much similarity in nomenclature, there is no uniformity. This is something that would be well worth addressing at an international level in the near future. Hence, before starting to read such recommendations, it is worthwhile checking what their levels of evidence mean.
As an illustration, the following is taken from the NICE clinical guidelines with regard to COPD.5 The hierarchy of evidence and the recommendation gradings relate to the strength of the literature, not to clinical importance.
Hierarchy of evidence:
| Level of evidence | Type of evidence |
|---|---|
| Ia | Evidence from systematic reviews or meta-analysis of RCTs |
| Ib | Evidence from at least one RCT |
| IIa | Evidence from at least one controlled study without randomisation |
| IIb | Evidence from at least one other type of quasi-experimental study |
| III | Evidence from non-experimental descriptive studies, such as comparative studies, correlation studies and case control studies |
| IV | Evidence from expert committee reports or opinions and/or clinical experience of respected authorities |
| NICE | Evidence from NICE guidelines or Health Technology Appraisal programme |
| HSC | Evidence from Health Service Circulars |
Grading of recommendations:
| Grade | Basis of recommendation |
|---|---|
| A | Based on hierarchy I evidence |
| B | Based on hierarchy II evidence or extrapolated from hierarchy I evidence |
| C | Based on hierarchy III evidence or extrapolated from hierarchy I or II evidence |
| D | Directly based on hierarchy IV evidence or extrapolated from hierarchy I, II or III evidence |
| NICE | Evidence from NICE guidelines or Health Technology Appraisal programme |
| HSC | Evidence from Health Service Circulars |
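The relationship between the two tables above can be sketched as a simple lookup. This is only an illustration in Python: it assumes the recommendation is based directly on the evidence (extrapolation, which lowers the grade, is ignored), and the name `grade_for` is hypothetical rather than anything defined by NICE.

```python
# Hypothetical helper: map a hierarchy-of-evidence level to the grade of a
# recommendation based directly on it. Extrapolated evidence is not handled.
DIRECT_GRADE = {"I": "A", "II": "B", "III": "C", "IV": "D"}


def grade_for(level: str) -> str:
    """Return the grade for a level such as 'Ia', 'IIb', 'III' or 'IV'."""
    if level in ("NICE", "HSC"):
        return level  # these sources carry the same label in both tables
    hierarchy = level.rstrip("ab")  # 'Ia' -> 'I', 'IIb' -> 'II'
    return DIRECT_GRADE[hierarchy]


print(grade_for("Ia"), grade_for("IIb"), grade_for("III"), grade_for("IV"))
```

Other bodies use different labels, so a lookup like this would need to be rewritten for each guideline.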
A simpler system of A, B or C is recommended by the US Government Agency for Health Care Policy and Research (AHCPR).
- A: Requires at least one randomised controlled trial as part of the body of evidence.
- B: Requires availability of well-conducted clinical studies but no randomised controlled trials in the body of evidence.
- C: Requires evidence from expert committee reports or opinions and/or clinical experience of respected authorities. Indicates absence of directly applicable studies of good quality.
Expert opinion must not be confused with personal experience, which is sometimes called eminence-based medicine. Expert opinion is the lowest level of evidence and is quite correctly below experimental evidence, but in the absence of experimental evidence it is the best guide available.
Is it important? The importance of a finding depends upon the significance of the event and the level of risk.
Thus an increase of 50%, raising the incidence from 4 in 10 to 6 in 10 is very important whilst a ten-fold increase in incidence may be ignored if it raises the risk from 1 in 50 million to 1 in 5 million. The significance of the event, in the context, is also important. Hence a 15% risk of a minor complication to a life-saving procedure is probably acceptable but if there was a 15% chance that a plane would crash no one would travel by air.
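The two examples above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the function name `risk_change` is a hypothetical label chosen for illustration.

```python
def risk_change(before: float, after: float) -> tuple[float, float]:
    """Return (relative change as a multiplier, absolute change) in risk."""
    relative = after / before
    absolute = after - before
    return relative, absolute


# Common event: incidence rises from 4 in 10 to 6 in 10.
rel_common, abs_common = risk_change(4 / 10, 6 / 10)
# Rare event: incidence rises from 1 in 50 million to 1 in 5 million.
rel_rare, abs_rare = risk_change(1 / 50_000_000, 1 / 5_000_000)

print(f"common: x{rel_common:.1f}, extra cases per million = {abs_common * 1e6:,.0f}")
print(f"rare:   x{rel_rare:.1f}, extra cases per million = {abs_rare * 1e6:.2f}")
```

The 50% increase adds about 200,000 cases per million people, whereas the ten-fold increase adds about 0.18 per million, which is why the absolute change matters more than the multiplier.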
Is it relevant? Look at the subjects in the original research and ask if they are comparable with your patient. A study on the bushmen of the Kalahari may not be applicable to the commuters of Tunbridge Wells. Two areas of concern are the management of heart failure and use of statins in the elderly. For good reasons of methodology most trials for heart failure used patients with no other medical conditions. Hence they tended to be between 55 and 70 years old. The typical patient who presents in the surgery is over 80 with multiple problems and on a variety of medication. The large and original trials for statins usually had a maximum age for subjects of 75. NNT falls with increasing age and so, by extrapolation, it may be argued that they will be even more effective in the elderly. However, extrapolation may not be valid and until more recent trials in older people there was no direct evidence that statins helped people over 75. In both cases common sense and clinical judgement must prevail.
What is proof? The question sounds rather like "What is truth?" and is just as contentious. The meaning of statistics and probability is discussed in coping with uncertainty in primary care (when published). Even the most impressive P value is not absolute proof; it suggests only that it is extremely unlikely that there is no effect and that the results were obtained by chance. To understand the validity of evidence it is important to understand the possible shortcomings and common errors of methodology.
Randomised controlled trials are regarded as the gold standard of clinical research and of those, the purest of gold has double blind placebo control. This is fine for drug trials although sometimes side-effects of drugs will make it obvious who is taking what. It is more difficult for other techniques such as sham acupuncture or sham manipulation although some cunning and devious devices such as retracting acupuncture needles have been used. The more we learn about the placebo effect, the more impressive and important it becomes. A survey of RCTs published in the BMJ in 2001 found that only 17% had a placebo control. Many were educational interventions and "dummy education" does not seem feasible but the Hawthorne effect6 can be very real. Placebo surgery in which the patient is simply opened and closed would be most unlikely to receive ethics committee approval. Hence lack of placebo control should not necessarily be seen as poor methodology. When there is no placebo and the patient knows which group he is in it is called open label.
Two much more serious shortcomings are failure of randomisation and failure to analyse by intention to treat. When a patient is recruited to a trial it should not be apparent which group he will enter as such knowledge may affect recruitment. Allocation is irreversible. If a patient is unable or unwilling to undergo the treatment or intervention he is still included in the original group for the purpose of analysis. There are stories of cancer trials in which patients who were too ill to receive therapy were put in the control group. In a paper about the use of an educational package to reduce teenage pregnancy, one group was offered an educational package and the other was not. However, those who were offered but refused were transferred to the control group. The methodological requirements of RCTs are very strict and anyone wishing to conduct one must be aware of the many and onerous requirements. Usually these failures of methodology are uncovered some time after publication and, no doubt, many are never discovered.
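The effect of moving refusers into the control group, as in the teenage pregnancy example above, can be sketched with invented numbers. Everything in the block below is hypothetical: the record layout, the counts and the helper `event_rate` are assumptions made purely to illustrate why analysis by intention to treat matters.

```python
# Hypothetical records: (allocated_group, accepted_intervention, bad_outcome)
records = (
    [("intervention", True, False)] * 60     # accepted, did well
    + [("intervention", True, True)] * 10    # accepted, did badly
    + [("intervention", False, True)] * 20   # refused, did badly
    + [("intervention", False, False)] * 10  # refused, did well
    + [("control", False, True)] * 30        # control, did badly
    + [("control", False, False)] * 70       # control, did well
)


def event_rate(rows):
    """Proportion of rows with a bad outcome."""
    rows = list(rows)
    return sum(bad for _, _, bad in rows) / len(rows)


# Intention to treat: everyone is analysed in the group they were allocated to.
itt_intervention = event_rate(r for r in records if r[0] == "intervention")
itt_control = event_rate(r for r in records if r[0] == "control")

# Flawed analysis: those who refused are counted with the control group.
flawed_intervention = event_rate(
    r for r in records if r[0] == "intervention" and r[1]
)
flawed_control = event_rate(r for r in records if not r[1])

print(f"ITT:    intervention {itt_intervention:.2f} vs control {itt_control:.2f}")
print(f"Flawed: intervention {flawed_intervention:.2f} vs control {flawed_control:.2f}")
```

With these made-up figures the intention-to-treat rates are identical (0.30 in each group), whereas moving the refusers makes the intervention look effective (about 0.14 against 0.38) although it did nothing.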
Meta-analysis: The bigger a trial, the narrower will be the confidence limits and the more likely is statistical significance to be achieved. Basically, what a meta-analysis does is to take perhaps 10 trials of 100 patients and to combine the results as if it were a trial of 1,000 patients. Although such a technique rates highly with systematic reviewers it is fraught with danger. The methodology of the trials may not be identical and so they may be measuring slightly different parameters. Errors may be compounded but perhaps the greatest bias is selective publication. In 1997 a paper in The Lancet extracted just papers of good methodology on the topic of homeopathy.7 Comparatively few had a P value beyond 0.05 yet in none was the placebo found to be superior to the remedy. This should have been an overwhelming alert of selective publication but the results were summated and the conclusion was that homeopathy is effective (P<0.05). In 2001 an education paper in the BMJ used this work as an illustration of selective publication and the problem with uncritical meta-analysis.8 By using funnel plots and a technique called trim and fill, the authors found that many negative papers had gone unpublished and concluded from the evidence that homeopathy is of no value. Nowadays a good meta-analysis should contain a funnel plot with trim and fill to assess the completeness of publication.
A large, well conducted trial is far more valuable than a meta-analysis. The need to publish negative as well as positive results, especially if sample size was adequate, is emphasised. It is as important to know what does not work as to know what does.
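The narrowing of confidence limits with sample size can be illustrated numerically. The sketch below assumes an invented event rate of 20% and simply adds the trials together, which is the simplification used in the description above; real meta-analyses weight each study rather than summing patients. The function name `ci95_proportion` is hypothetical, and the interval is the simple normal-approximation (Wald) interval.

```python
import math


def ci95_proportion(events: int, n: int) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    p = events / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


single = ci95_proportion(20, 100)    # one trial of 100 patients
pooled = ci95_proportion(200, 1000)  # ten such trials naively combined

print(f"single trial: {single[0]:.3f} to {single[1]:.3f}")
print(f"pooled:       {pooled[0]:.3f} to {pooled[1]:.3f}")
```

The single trial gives an interval of roughly 0.12 to 0.28 and the pooled figure roughly 0.175 to 0.225. The interval shrinks by the square root of ten, which is why pooling reaches statistical significance so readily, and why any bias in what was pooled is so easily mistaken for a real effect.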
Longitudinal or cohort studies: Here a group of people is followed over many years to ascertain how variables such as smoking habits, exercise, occupation and geography may affect outcome. Prospective studies are more highly rated than retrospective ones although the former obviously take many years to perform. A classic of its type is Sir Richard Doll's work in which he followed up a cohort of doctors from the 1950s for many decades, reporting intermittently.9 His achievements, in terms of methodology, include a large number of subjects, narrow social spectrum, remarkably few lost to follow-up, reliable data about outcome and a very long duration of study. His decision to include only male doctors reduced one variable but makes his work less applicable to women. In the 1950s only about 10 to 15% of doctors were women.
When viewing such a study there are a number of questions to ask:
- Is it a prospective or retrospective study? The latter is more likely to produce bias.
- How big was the sample? The Million Women Study of HRT was an incredible achievement.10
- Can men and women be analysed separately? This may account for bias or it may show a sex difference.
- How reliable is the data extracted? Even death certificates can give an inaccurate picture. Lay people may report metastatic cancer in the lungs as lung cancer and stomach cancer may be any malignancy under the diaphragm.
- What is the rate of loss to follow-up? This may be another source of bias and error.
Qualitative research: Some aspects of care are not amenable to quantification although not as many as may be thought. A score of 1 to 5 may be put on replies such as yes definitely, probably yes, not sure, probably not and definitely not. Even a subjective sensation of pain can be quantified. There are still areas in which qualitative methodology is required. Simply making up a questionnaire is inadequate and qualitative tools have to be validated. This is very demanding. The easiest way is to use a tool that has previously been validated, such as the Hospital Anxiety and Depression Scale or the Geriatric Depression Scale. Unvalidated qualitative research is unlikely to appear in peer reviewed journals but it may appear through local research networks and it must be identified as such.
Association and causation: Because two things are in some way linked does not necessarily mean that one causes the other. It is worth noting the key principles for any scientific study that seeks to prove causation rather than merely association, as set out by Sir Austin Bradford Hill in 1965:
- Is there evidence from true experiments in humans?
- Is the association strong?
- Is the association consistent from study to study?
- Is the temporal relationship appropriate? Did the postulated cause precede the postulated effect?
- Is there a dose-response gradient? The more cigarettes are smoked the greater the risk of lung cancer.
- Does the association make epidemiological sense? A link between passive smoking and breast cancer is nonsensical when there is no link between active smoking and breast cancer.
- Does the association make biological sense?
- Is the association specific?
- Is the association analogous to a previously proven causal association?
Health economics and effective healthcare: Evidence-based medicine asks simply if an intervention is effective but we cannot ignore such matters as cost and value for money. It may be that clopidogrel is more effective than aspirin in preventing atherosclerosis but should we change over all our patients? The financial implications would be enormous. The NHS has a finite budget and so money that is spent in one field reduces what is available to spend in another. Numbers needed to treat (NNT) become very important. We may be told that a new treatment for cancer halves the rate of recurrence. Superficially this seems very impressive but if this means that the rate of recurrence falls from 20% to 10% it means that it is necessary to give 10 patients the new treatment to prevent 1 recurrence. If the new treatment costs £20,000 more than the conventional treatment, then the cost of preventing one recurrence is £200,000. There is further discussion about presentation and interpretation in coping with uncertainty in primary care (when published).
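The NNT and cost figures in the example above follow from simple arithmetic, sketched below using only the numbers given in the text. The function name `number_needed_to_treat` is a hypothetical label for illustration.

```python
def number_needed_to_treat(control_rate: float, treated_rate: float) -> float:
    """NNT is the reciprocal of the absolute risk reduction."""
    absolute_risk_reduction = control_rate - treated_rate
    return 1 / absolute_risk_reduction


nnt = number_needed_to_treat(0.20, 0.10)  # recurrence falls from 20% to 10%
extra_cost_per_patient = 20_000           # pounds, as in the example above
cost_per_recurrence_prevented = nnt * extra_cost_per_patient

print(f"NNT = {nnt:.0f}")
print(f"Cost per recurrence prevented = £{cost_per_recurrence_prevented:,.0f}")
```

The same halving of risk from 2% to 1% would give an NNT of 100 and a cost ten times higher, so the relative reduction alone says little about value for money.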
Cancer, myocardial infarction and death are all clear indices to measure but the quality of life is extremely important. Total hip replacement may not save a life but it usually improves the quality quite considerably. There is an index called QALY (quality adjusted life years)11 that may be used for such parameters as pain, incontinence and disability. It is another example of how "soft criteria" may be given numerical value. The management of small and pre-term infants has resulted in the salvage of many who would formerly have died but is the effort justified in terms of long term handicap? Modern medicine provides ethics with many challenges. Value for money is a judgement that we cannot ignore.
References:
1. Tonelli MR. The limits of evidence-based medicine. Respir Care. 2001 Dec;46(12):1435-40; discussion 1440-1.
2. Flanagin A, Carey LA, Fontanarosa PB, et al. Prevalence of articles with honorary authors and ghost authors in peer-reviewed medical journals. JAMA. 1998 Jul 15;280(3):222-4.
3. Royal College of Physicians. National Clinical Guidelines for Stroke, 2nd edition. June 2004.
4. Royal College of Obstetricians & Gynaecologists. The use of electronic fetal monitoring. May 2001.
5. Hierarchy of evidence and grading of recommendations. Thorax 2004;59;13-14.
6. Wickstrom G, Bendix T. The "Hawthorne effect": what did the original Hawthorne studies actually show? Scand J Work Environ Health. 2000 Aug;26(4):363-7.
7. Linde K, Clausius N, Ramirez G, et al. Are the clinical effects of homeopathy placebo effects? A meta-analysis of placebo-controlled trials. Lancet. 1997 Sep 20;350(9081):834-43.
8. Sterne JAC, Egger M, Smith GD. Systematic reviews in health care. Investigating and dealing with publication and other biases in meta-analysis. BMJ 2001;323:101-105 (14 July).
9. Doll R, Peto R, Boreham J, et al. Mortality in relation to smoking: 50 years' observations on male British doctors. BMJ. 2004 Jun 26;328(7455):1519. Epub 2004 Jun 22.
10. Beral V, Bull D, Reeves G. Endometrial cancer and hormone-replacement therapy in the Million Women Study. Lancet. 2005 Apr 30-May 6;365(9470):1543-51.
11. Johannesson M. QALYs, HYEs and individual preferences: a graphical illustration. Soc Sci Med. 1994 Dec;39(12):1623-32.
Internet:
- Greenhalgh T How to read a paper: The Medline database BMJ 1997;315:180-183 (19 July)
- Greenhalgh T How to read a paper: getting your bearings (deciding what the paper is about) BMJ 1997;315:243-246 (26 July)
- Greenhalgh T How to read a paper: Assessing the methodological quality of published papers BMJ 1997;315:305-308 (2 August)
- Greenhalgh T How to read a paper: Papers that report diagnostic or screening tests BMJ 1997;315:540-543 (30 August)
- Greenhalgh T How to read a paper: Statistics for the non-statistician. I: Different types of data need different statistical tests BMJ 1997;315:364-366 (9 August)
- Greenhalgh T How to read a paper: Statistics for the non-statistician. II: "Significant" relations and their pitfalls BMJ 1997;315:422-425 (16 August)
- Greenhalgh T How to read a paper: Papers that tell you what things cost (economic analyses) BMJ 1997;315:596-599 (6 September)
- Jones R, Kinmonth A-L, eds. Critical reading for primary care. Oxford: Oxford University Press, 1995
- Sackett DL, Richardson WS, Rosenberg WMC, Haynes RB. Evidence-based medicine: how to practice and teach EBM. London: Churchill-Livingstone, 1996.
- NICE National Institute for Clinical Excellence
- The Cochrane Collaboration
- Bandolier
- SIGN Guidelines
- Centre for Reviews and Dissemination
Acknowledgements: EMIS is grateful to the Mentor authoring team for writing this article. The final copy has passed peer review of the independent Mentor GP authoring team. © EMIS 2006.
Disclaimer: Patient UK has no control of the content of the above links. Inclusion does not imply endorsement by Patient UK.