3 hours test

UNIVERSITY COLLEGE LONDON

EXAMINATION FOR INTERNAL STUDENTS

MODULE CODE: STAT0014

ASSESSMENT PATTERN: STAT0014A6UA, STAT0014A6UC, STAT0014A7UC, STAT0014A7PC

MODULE NAME: STAT0014 – Medical Statistics 1

LEVEL: Undergraduate, Undergraduate (Masters Level), Postgraduate

DATE: 13/05/2021

TIME: 10:00

This paper is suitable for candidates who attended classes for this module in the following academic year(s):

2018/19, 2020/21


STAT0014 (Level 6 and 7): Medical Statistics I

2021

Instructions

• Answer ALL questions.

• You have three hours to complete this paper.

• After the three hours have elapsed, you have one additional hour to upload your solutions.

• You may submit only one answer to each question.

• The relative weights attached to each question are as follows: A1 (3), A2 (3), A3 (5), A4 (6), A5 (8), B1 (12), B2 (12), B3 (11).

• The numbers in square brackets indicate the relative weights attached to each part-question.

• Marks are awarded not only for the final result but also for the clarity of your answer.

Administrative details

• This is an open-book exam. You may use your course materials to answer questions.

• You may not contact the course lecturer with any questions, even if you want to clarify something or report an error on the paper. If you have any doubts about a question, make a note in your answer explaining the assumptions that you are making in answering it. You should also fill out the exam paper query form online.

• The overall word limit for this exam has been set at 3000.

• Some part-questions require text-based answers. You must adhere to this or you will lose marks.

• Some questions may ask you to approach a problem in a particular way; please take note of this. Failure to do so may result in marks being deducted.

Formatting your solutions for submission

• Some part-questions require you to type your answers instead of handwriting them. These questions state [Type] at the start of the part-question. You must follow this instruction. Failure to do so may result in marks being deducted. For questions without the [Type] instruction, you may choose to type or handwrite your answer.

• You should submit ONE PDF document that contains your solutions for all questions/part-questions. Please follow UCL’s guidance on combining text and photographed/scanned work.

• Make sure that your handwritten solutions are clear and readable in the document you submit.

https://egon.stats.ucl.ac.uk/slink/b7fa4615

Examination Paper for STAT0014 (Level 6 and 7)


Plagiarism and collusion

• You must work alone. In particular, any discussion of the paper with anyone else is not acceptable. You are encouraged to read the Department of Statistical Science’s advice on collusion and plagiarism.

• Parts of your submission will be screened via Turnitin to check for plagiarism and collusion.

• If there is any doubt as to whether the solutions you submit are entirely your own work, you may be required to participate in an investigatory viva to establish authorship.

https://www.ucl.ac.uk/statistics/sites/statistics/files/shbpc.pdf


Section A

A1

A parallel two-group trial was to be carried out in a rural area of East Africa to ascertain whether giving food supplementation during pregnancy increases birth weight. Women attending the antenatal clinic were to be randomly assigned to either receive or not receive supplementation. The researchers performed a sample size calculation and wrote the following statement.

“A sample size of 95 women per randomisation arm is sufficient to detect a clinically significant difference of 0.25 kg using a z-test with 90% power at the 5% significance level. The variability in the birth weights in the two groups of women was assumed to be the same and the sample size above allows for a 10% dropout rate.”

Read the statement above carefully and state which parameter value is missing in order to replicate the sample size calculation. Calculate this missing value. [3]
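The parameter not stated is the common standard deviation of birth weight. As a hedged sketch (not an official solution), the usual two-sample formula $n = 2\sigma^2(z_{1-\alpha/2} + z_{1-\beta})^2/\Delta^2$ can be rearranged for $\sigma$. The function name `required_sd` is hypothetical, and the sketch assumes the 10% dropout allowance was applied as $n_{\text{recruited}} = n/(1 - 0.10)$; inflating by a factor of 1.1 instead would give a slightly different value.

```python
from math import sqrt
from statistics import NormalDist


def required_sd(n_recruited_per_arm, delta, alpha=0.05, power=0.90, dropout=0.0):
    """Back-solve sigma from n = 2 * sigma**2 * (z_{1-alpha/2} + z_{1-beta})**2 / delta**2.

    Assumption: the dropout allowance was applied as n_recruited = n / (1 - dropout).
    """
    n_analysed = n_recruited_per_arm * (1 - dropout)  # women expected to complete
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)  # two-sided alpha; power = 1 - beta
    return delta * sqrt(n_analysed / (2 * z_sum ** 2))


sigma = required_sd(95, delta=0.25, alpha=0.05, power=0.90, dropout=0.10)
print(f"Implied common SD of birth weight: {sigma:.3f} kg")
```

Under these assumptions the implied standard deviation is roughly 0.5 kg.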

A2

Logistic regression was used to model the relationship between short-term mortality following CABG surgery (1 = died, 0 = alive) and the predictors age, as a categorical variable (0 = less than 65 years, 1 = 65 years or more), and sex (0 = male, 1 = female). The estimated intercept and coefficients for age and sex were $\beta_0 = -3.10$, $\beta_{\text{age}} = 1.5$ and $\beta_{\text{sex}} = -0.5$, respectively.

Data on two patients are given in the following table.

| Patient ID | Outcome | Age (years) | Sex |
|---|---|---|---|
| 1 | alive | 65 | Male |
| 2 | died | 75 | Female |

a) Calculate the probability of short-term mortality for these two patients. [2]

b) Calculate the value of the likelihood for this model based on just these two patients. [1]
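A minimal sketch of the calculation for parts a) and b), assuming the two patients are independent so that the likelihood is the product of their Bernoulli contributions ($p$ for a death, $1 - p$ for a survivor). A patient aged exactly 65 is coded as "65 years or more", following the coding stated in the question; the helper name `prob_death` is hypothetical.

```python
from math import exp

# Coefficients as given in the question
BETA_0, BETA_AGE, BETA_SEX = -3.10, 1.5, -0.5


def prob_death(age_years, female):
    """Inverse-logit of the linear predictor; age coded 1 if >= 65 years, sex coded 1 if female."""
    age_code = 1 if age_years >= 65 else 0
    sex_code = 1 if female else 0
    linear_predictor = BETA_0 + BETA_AGE * age_code + BETA_SEX * sex_code
    return 1 / (1 + exp(-linear_predictor))


# (age in years, female?, died?) for the two patients in the table
patients = [(65, False, False), (75, True, True)]

likelihood = 1.0
for age, female, died in patients:
    p = prob_death(age, female)
    # Bernoulli contribution: p if the patient died, 1 - p if alive
    likelihood *= p if died else 1 - p

print([round(prob_death(a, f), 4) for a, f, _ in patients], round(likelihood, 4))
```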


A3

In a trial comparing Acupuncture (A) with Homeopathy (H), patients suffering from chronic headache were allocated to treatment using a minimisation procedure based on sex (male/female) and type of headache (migraine/tension/cluster). The numbers of patients with each characteristic in each treatment group, after eighty-nine patients had entered the trial, are given in the table below.

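| Minimisation factor | Level | Acupuncture | Homeopathy |
|---|---|---|---|
| Sex | Male | 19 | 16 |
| Sex | Female | 26 | 28 |
| Type of headache | Migraine | 23 | 24 |
| Type of headache | Tension | 17 | 17 |
| Type of headache | Cluster | 5 | 3 |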

a) [Type] The 90th recruited patient is male and suffers from migraines. Given the information in the table above, which group should randomisation be weighted towards? Give a reason for your answer. [2]

b) [Type] If stratified randomisation was used instead of minimisation, how many strata would be needed to balance on these same factors? Justify your answer. [1]

c) In this trial, each patient’s quality of life will be measured at baseline and again 1 month after treatment. Fully specify the most suitable and efficient model for analysis of this trial. Define all terms in the model. [2]
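A hedged sketch for parts a) and b), using one common minimisation scoring rule: for each arm, sum the counts of existing patients who share the new patient's levels of each factor, then weight allocation towards the arm with the smaller total. Other scoring rules (for example Pocock and Simon's range or variance criteria) exist; the helper name `marginal_totals` is hypothetical.

```python
# Counts from the table after 89 patients
counts = {
    "Acupuncture": {"male": 19, "female": 26, "migraine": 23, "tension": 17, "cluster": 5},
    "Homeopathy": {"male": 16, "female": 28, "migraine": 24, "tension": 17, "cluster": 3},
}


def marginal_totals(counts, new_patient_levels):
    """For each arm, sum the counts of patients sharing each of the new patient's factor levels."""
    return {arm: sum(levels[f] for f in new_patient_levels) for arm, levels in counts.items()}


totals = marginal_totals(counts, ["male", "migraine"])
favoured_arm = min(totals, key=totals.get)  # weight allocation towards the smaller total

# Stratified randomisation needs one stratum per combination of factor levels
n_strata = 2 * 3  # sex (2 levels) x type of headache (3 levels)

print(totals, favoured_arm, n_strata)
```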

A4

A randomised trial compared a new cream with a placebo cream for the treatment of athlete’s foot. Thirty-nine patients were randomly allocated to the new cream and 40 to the placebo. Patients were assessed at the end of a two-week treatment period. The infection was eradicated in 29 patients in the new cream group and 23 patients in the placebo group. A difference of 10% in the proportion of patients with an eradicated infection was considered to be clinically important.

a) Calculate an absolute measure that could be used to estimate the effect of the new cream amongst people with athlete’s foot and quantify the precision of this estimate. Show your workings. [3]

b) [Type] Interpret the results from part a), state a conclusion about the clinical benefit of the new cream and comment on the size of the study. [3]
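A minimal sketch for part a), assuming the absolute measure is the risk difference (difference in proportions eradicated) with a Wald-type 95% confidence interval based on the normal approximation; other intervals, such as Newcombe's, may behave better in small samples. The function name `risk_difference_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events1, n1, events2, n2, level=0.95):
    """Difference in proportions p1 - p2 with a Wald confidence interval."""
    p1, p2 = events1 / n1, events2 / n2
    rd = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rd, (rd - z * se, rd + z * se)


# New cream: 29/39 eradicated; placebo: 23/40 eradicated
rd, (lo, hi) = risk_difference_ci(29, 39, 23, 40)
print(f"Risk difference {rd:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Comparing the interval with both 0 and the clinically important difference of 0.10 is what part b) asks about.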


A5

A cohort study was carried out to investigate the long-term outcome ‘loss of vision’ for patients following treatment for radiologically confirmed retinal artery occlusion. The table shows the results for the two most common treatments.

| Treatment | Loss of vision: Yes | Loss of vision: No | Person-months of follow-up | Rate of loss of vision (per year) |
|---|---|---|---|---|
| Intravenous thrombolysis (IT) | 71 | 50 | A | 7.07 |
| Antithrombotic treatment (AT) | 117 | 66 | 183.0 | B |

Relative rate of loss of vision (IT compared with AT): C

For all calculations show your workings. Given the information above:

a) Calculate the values of A, B and C in the table. [2]

b) Calculate the relative risk of loss of vision. How do the relative risk and relative rate estimates compare and what does this tell you about the study? [2]

c) Calculate a 95% confidence interval for the relative rate given in the table. Then draw a conclusion regarding the relative effect of IT compared with AT. [3]

d) [Type] Explain why a modelling approach might be preferred for analysis of data produced by this study design. [1]
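A hedged sketch of the arithmetic in parts a) to c). It assumes rates are per person-year, so rate = events / (person-months / 12), and that the 95% confidence interval for the rate ratio is formed on the log scale with standard error $\sqrt{1/d_{IT} + 1/d_{AT}}$ (normal approximation). The variable names are illustrative only.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Values from the table (IT = intravenous thrombolysis, AT = antithrombotic treatment)
events_it, no_loss_it, rate_it = 71, 50, 7.07  # rate per person-year
events_at, no_loss_at, person_months_at = 117, 66, 183.0

# A: person-months implied by the IT rate (rate = events / person-years)
person_months_it = events_it / rate_it * 12
# B: AT rate per person-year
rate_at = events_at / (person_months_at / 12)
# C: relative rate, IT compared with AT
rel_rate = rate_it / rate_at

# Relative risk uses numbers of patients rather than follow-up time
risk_ratio = (events_it / (events_it + no_loss_it)) / (events_at / (events_at + no_loss_at))

# 95% CI for the rate ratio on the log scale, SE = sqrt(1/d_IT + 1/d_AT)
se_log = sqrt(1 / events_it + 1 / events_at)
z = NormalDist().inv_cdf(0.975)
ci = (exp(log(rel_rate) - z * se_log), exp(log(rel_rate) + z * se_log))

print(round(person_months_it, 1), round(rate_at, 2), round(rel_rate, 3), round(risk_ratio, 3), ci)
```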


Section B

B1

A clinical trial of treatment for chronic back pain allocated 184 subjects to either 10 weekly sessions of physiotherapy or a self-help booklet. Treatment was allocated using simple randomisation. The outcome was a pain score, measured at baseline and at 6-month follow-up. A score of 24 represents the worst imaginable pain and 0 represents no pain. The following table summarises the baseline and follow-up pain scores.

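| Pain score | Baseline N | Baseline mean | Baseline SD | 6-month follow-up* N | 6-month follow-up* mean | 6-month follow-up* SD |
|---|---|---|---|---|---|---|
| Leaflet | 96 | 5.646 | 3.931 | 88 | 3.739 | 4.484 |
| Physiotherapy | 88 | 6.591 | 3.996 | 81 | 2.988 | 3.441 |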

* Where available.

a) [Type] Comment on the imbalance in numbers randomised to the groups in this trial and how this might have been prevented. [2]

b) [Type] Based on the information in the table above, identify two potential sources of bias and explain how such problems can impact the trial. [3]

The data from the trial were analysed in 3 ways. Results are given in the table below:

| Analysis | N | Treatment effect estimate | Standard error |
|---|---|---|---|
| Unadjusted | 169 | -0.751 | 0.619 |
| Change (6 month – baseline) | 166 | -2.114 | 0.667 |
| ANCOVA, adjusted for baseline | 166 | -1.351 | 0.587 |

Note: correlation between change scores and baseline scores = -0.53.

c) [Type] Explain in detail why, for this trial:

i) the unadjusted treatment estimate is smaller than the estimates from the other analyses; [1]

ii) the change analysis produces a larger treatment estimate than the ANCOVA analysis. [2]

d) [Type] From the table, identify the analysis you consider to be most appropriate for this trial and interpret its results. [3]

e) [Type] A further analysis for this trial compared the change outcome between groups adjusting for baseline pain score. How would you expect the estimate from this analysis to compare with those given in the table? [1]
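The comparison in part e) can be checked numerically. The sketch below uses synthetic data only (not the trial data) and a small hypothetical least-squares helper `ols`, written out to keep the example dependency-free. Regressing the change score on group and baseline gives the same group coefficient as ANCOVA on the follow-up score, because subtracting baseline from the outcome only shifts the baseline coefficient by 1.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * k
    for r in reversed(range(k)):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef


# Synthetic data only: group 1 = physiotherapy, 0 = leaflet
rng = random.Random(2021)
group = [i % 2 for i in range(200)]
baseline = [rng.gauss(6, 4) for _ in group]
follow_up = [2 + 0.5 * x - 1.0 * g + rng.gauss(0, 3) for x, g in zip(baseline, group)]
change = [f - x for f, x in zip(follow_up, baseline)]

X = [[1.0, g, x] for g, x in zip(group, baseline)]  # intercept, group, baseline
ancova = ols(X, follow_up)        # follow-up ~ group + baseline
change_adjusted = ols(X, change)  # change ~ group + baseline

print(f"ANCOVA group effect:          {ancova[1]:.4f}")
print(f"Change-adjusted group effect: {change_adjusted[1]:.4f}")
print(f"Baseline slopes: {ancova[2]:.4f} vs {change_adjusted[2]:.4f} (differ by 1)")
```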


B2

It is hypothesised that there is an association between melanoma and a high-fat diet in adults over 40 years of age. Researchers decided to investigate this hypothesis as follows. First, they identified 745 subjects with melanoma from oncology hospitals and 1718 subjects without melanoma from general practitioners’ lists. Then they asked all subjects about their past dietary habits and recorded their responses. Amongst the subjects with melanoma, 105 said they had a high-fat diet, and amongst the subjects without melanoma, 98 said they had a high-fat diet.

a) [Type] What type of study design is used in this study? In the context of this particular study, define one type of bias that may arise and describe how it might affect the results of this study. [3]

b) Calculate an appropriate measure of association for this type of study, along with a 95% confidence interval, to answer the research question. Interpret your results in the context of this study. [4]

c) The following table shows the mean age in years (SD) for subjects with and without melanoma, along with the difference in mean age and a 95% confidence interval.

| | Subjects with melanoma | Subjects without melanoma | Difference in means (95% CI) | p-value* |
|---|---|---|---|---|
| Mean age (SD) | 65.3 (15.2) | 55.8 (14.9) | 9.5 (8.2 to 10.78) | <0.01 |

* Using a t-test.

Based on these data, can it be concluded that age is a confounder in the relationship between melanoma and a high-fat diet? Justify your answer. [2]

d) [Type] If age and gender were thought to be confounders, briefly describe how confounding could be handled at the stage of study design. Name two possible approaches for the analysis of these data. [2]

e) [Type] Explain why the approach you used in part b) would not be suitable for analysing data that were collected after applying the process in part d). [1]
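A minimal sketch for part b), assuming the odds ratio from the 2×2 table (cases: 105 of 745 with a high-fat diet; controls: 98 of 1718) with Woolf's log-based 95% confidence interval. The function name `odds_ratio_ci` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with Woolf (log-scale) CI.

    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return odds_ratio, (exp(log(odds_ratio) - z * se_log), exp(log(odds_ratio) + z * se_log))


or_, (lo, hi) = odds_ratio_ci(105, 745 - 105, 98, 1718 - 98)
print(f"OR = {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```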


B3

A double-blind randomised clinical trial is carried out to investigate the efficacy of a new experimental treatment in reducing mortality in patients with bowel cancer. Thirty patients in Group A receive standard treatment, and thirty patients in Group B receive the new treatment. Patients were followed up over a period of 5 years. The table below shows the survival time (months) and sex for the first 10 patients in each group.

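| Group A ID | Group A survival time (days) | Group A sex | Group B ID | Group B survival time (days) | Group B sex |
|---|---|---|---|---|---|
| 1 | 5 | F | 11 | 1* | F |
| 2 | 9 | M | 12 | 13 | F |
| 3 | 10* | F | 13 | 15* | M |
| 4 | 12* | M | 14 | 19 | M |
| 5 | 14* | M | 15 | 21* | M |
| 6 | 18 | F | 16 | 22* | F |
| 7 | 23* | F | 17 | 24* | F |
| 8 | 25* | M | 18 | 26* | M |
| 9 | 27 | M | 19 | 30 | F |
| 10 | 35* | F | 20 | 32* | M |

Note: * denotes patients who were lost to follow-up.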

a) Based on just these data, calculate the survivor function for the group of female patients. [3]

b) The researchers plan to perform a non-parametric statistical test to compare the survival times in the two groups. Which test are they planning to use? Complete the first four rows of the following table designed to calculate the corresponding test statistic based on just the patients in the table above. [3]

| Interval | Group A: Died | Group A: Total | Group B: Died | Group B: Total | Group A: E(Death) | Group B: E(Death) |
|---|---|---|---|---|---|---|

c) In the complete dataset of 60 patients, over the 5-year follow-up period, 14 patients died in Group A and 10 patients died in Group B. The expected number of events in Group A was 12.56. Using the same statistical test as in part b), calculate the value of the test statistic for the complete dataset and, based on this, make a statement regarding the comparison of the two treatments. [2]


d) The researchers suspect that the effect of treatment may differ by sex. Specify a regression model that could be used to investigate this. [2]

e) [Type] Some patients dropped out of the study because they felt too sick and decided to discontinue their participation. Comment on the appropriateness of the analyses in parts a) to d). [1]
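For parts a) to c), a hedged sketch of the standard calculations. It treats patients lost to follow-up as censored, removes deaths and censorings at a given time from the risk set only after that time, builds one log-rank row per distinct death time, and uses the simple $\sum (O - E)^2 / E$ form of the log-rank statistic (the variance-based form used by many software packages gives a slightly different value). Times are used exactly as listed in the table; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# (survival time, died?, sex); died = 0 means lost to follow-up (censored)
group_a = [(5, 1, "F"), (9, 1, "M"), (10, 0, "F"), (12, 0, "M"), (14, 0, "M"),
           (18, 1, "F"), (23, 0, "F"), (25, 0, "M"), (27, 1, "M"), (35, 0, "F")]
group_b = [(1, 0, "F"), (13, 1, "F"), (15, 0, "M"), (19, 1, "M"), (21, 0, "M"),
           (22, 0, "F"), (24, 0, "F"), (26, 0, "M"), (30, 1, "F"), (32, 0, "M")]


def kaplan_meier(records):
    """Kaplan-Meier estimate from (time, died) pairs; returns [(t, S(t))] at each death time."""
    at_risk = len(records)
    surv = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, died in records if time == t and died)
        leaving = sum(1 for time, _ in records if time == t)
        if deaths:
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # everyone with time t leaves the risk set after t
    return curve


def logrank_rows(group_a, group_b):
    """One row per distinct death time: (t, died_A, at_risk_A, died_B, at_risk_B, E_A, E_B)."""
    death_times = sorted({t for t, died, _ in group_a + group_b if died})
    rows = []
    for t in death_times:
        n_a = sum(1 for s, _, _ in group_a if s >= t)
        n_b = sum(1 for s, _, _ in group_b if s >= t)
        d_a = sum(1 for s, died, _ in group_a if s == t and died)
        d_b = sum(1 for s, died, _ in group_b if s == t and died)
        d, n = d_a + d_b, n_a + n_b
        rows.append((t, d_a, n_a, d_b, n_b, d * n_a / n, d * n_b / n))
    return rows


def logrank_chi2(observed_a, expected_a, total_deaths):
    """Simple form: sum over both groups of (O - E)^2 / E, 1 degree of freedom."""
    observed_b = total_deaths - observed_a
    expected_b = total_deaths - expected_a
    return ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)


females = [(t, died) for t, died, sex in group_a + group_b if sex == "F"]
for t, s in kaplan_meier(females):
    print(f"S({t}) = {s:.4f}")

for row in logrank_rows(group_a, group_b)[:4]:
    print(row)

# Complete-dataset figures from part c): O_A = 14, E_A = 12.56, 14 + 10 = 24 deaths
chi2 = logrank_chi2(14, 12.56, 24)
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))  # chi-square(1) upper tail via the normal
print(round(chi2, 3), round(p_value, 3))
```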