Neovascular Age-related Macular Degeneration Visual Outcomes
Overview
Aim: to explore 12 year dataset of patients with treatment-naive neovascular age-related macular degeneration (AMD) who received intravitreal anti-VEGF therapy (Fu et al)
Session 1
Pre-requisites:
- Install R and RStudio, as per r4ds guidance
- Install the tidyverse set of packages (run install.packages(“tidyverse”) in the R console)
Topics:
- Starting a new R project
- R basics (data types, functions, variables)
- Scripts
- Importing tabular data from a csv to a data frame in R
- Selecting columns and filtering for rows using select() and filter() (from the dplyr package, tidyverse)
- Viewing and summarising data
- Boxplots with ggplot
Start your notebook by loading any required packages and importing raw data
Explore data
Look at the first few rows using head() (in RStudio consider also using View() for an interactive table viewer)
Code
head(amd)# A tibble: 6 × 16
...1 X gender ethnicity age_group va_inj1 va_inj1_group date_inj1
<dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 1 f unknown/other 70-79 55 50-69 Pre-2013
2 2 2 f unknown/other 70-79 55 50-69 Pre-2013
3 3 3 f unknown/other 70-79 55 50-69 Pre-2013
4 4 4 f unknown/other 70-79 55 50-69 Pre-2013
5 5 5 f se_asian 70-79 75 >=70 Post-2013
6 6 6 f se_asian 70-79 75 >=70 Post-2013
# ℹ 8 more variables: mean_inj_interval <dbl>, loaded <chr>, time <dbl>,
# injnum <dbl>, injgiven <dbl>, va <dbl>, regimen <chr>, anon_id <chr>
Summarise with summary()
Code
summary(amd) ...1 X gender ethnicity
Min. : 1 Min. : 1 Length:118255 Length:118255
1st Qu.: 29564 1st Qu.: 29564 Class :character Class :character
Median : 59128 Median : 59128 Mode :character Mode :character
Mean : 59128 Mean : 59128
3rd Qu.: 88692 3rd Qu.: 88692
Max. :118255 Max. :118255
age_group va_inj1 va_inj1_group date_inj1
Length:118255 Min. : 0.00 Length:118255 Length:118255
Class :character 1st Qu.: 46.00 Class :character Class :character
Mode :character Median : 58.00 Mode :character Mode :character
Mean : 55.41
3rd Qu.: 69.00
Max. :100.00
mean_inj_interval loaded time injnum
Min. : 14.0 Length:118255 Min. : 0.0 Min. : 1.000
1st Qu.: 28.0 Class :character 1st Qu.: 147.0 1st Qu.: 3.000
Median : 30.5 Mode :character Median : 447.0 Median : 7.000
Mean : 51.4 Mean : 633.9 Mean : 8.883
3rd Qu.: 45.5 3rd Qu.: 930.0 3rd Qu.:12.000
Max. :1600.0 Max. :4042.0 Max. :83.000
NA's :1807
injgiven va regimen anon_id
Min. :1 Min. : 0.00 Length:118255 Length:118255
1st Qu.:1 1st Qu.: 48.00 Class :character Class :character
Median :1 Median : 62.00 Mode :character Mode :character
Mean :1 Mean : 58.66
3rd Qu.:1 3rd Qu.: 73.00
Max. :1 Max. :100.00
NA's :38286 NA's :1725
Selecting columns and filtering rows
- Select visual acuity, age group and injection number columns
- Filter for first injections only
- N rows in full dataset: 118255
- N rows in filtered dataset: 7802
Plots
Box and whisker plots
Code
amd_first_injection_vas |>
ggplot(aes(x = age_group, y = va)) +
geom_boxplot(aes(fill = age_group)) +
theme_bw()Histogram
Code
amd_first_injection_vas |>
ggplot(aes(x = va)) +
geom_histogram(aes(fill = age_group), alpha = 0.7, binwidth = 5) +
theme_bw()Session 2
Topics:
- More data wrangling
- Creating a manuscript ‘Table 1’
- Linear regression
More wrangling - VA at month 12
A common workflow is to summarise data by groups e.g. N eyes by treatment regimen:
Code
amd |>
group_by(regimen) |>
# `n_distinct(anon_id)` ensures we get number of eyes by treatment regimen, rather than number of observations which would be `summarise(n_observations = n())` or `summarise(n_observations = length(anon_id))`
summarise(n_ids = n_distinct(anon_id))# A tibble: 2 × 2
regimen n_ids
<chr> <int>
1 Aflibercept only 3951
2 Ranabizumab only 3851
Here we will use group_by() with mutate() to create a new column recording time from baseline, grouped by eye ID:
Code
amd_12_month <- amd |>
# first group by eye
group_by(anon_id) |>
# create a new column with time (in days) from 1 year. We use `abs()` to convert negative values to positive
mutate(time_from_12_months = abs(365 - time)) |>
# take one row of data per eye, the one closest to 1 year
slice_min(time_from_12_months, with_ties = FALSE, na_rm = TRUE) |>
# remember to ungroup
ungroup() |>
# VA must be within one month of 1 year to be included
filter(time_from_12_months < 30) |>
# rename va column to indicate this is now VA at 12 month only
rename(va_12m = va)Baseline vision is already recorded under va_inj1. If it wasn’t then we could have achieved the same result using a join, instead of slice_min():
Code
# data frame with just (i) ID and (ii) vision at baseline columns (for demonstration)
amd_0_month <- amd_first_injection_vas |>
select(anon_id,
va_0m = va)
amd_12_month |>
# remove baseline vision column for demonstration purposes
select(-va_inj1) |>
# join data frame with baseline vision
left_join(amd_0_month,
by = "anon_id") |>
# select columns (for demonstration)
select(anon_id,
starts_with("va_"))# A tibble: 4,252 × 4
anon_id va_inj1_group va_12m va_0m
<chr> <chr> <dbl> <dbl>
1 00081b40ab38fa7bada9c48fc3eed2e4c1e1741548bfd4039… 36-49 35 46
2 0018ee619b6a77d07e2466e55ca84d9d20041ede600858580… >=70 61 70
3 0024430dd21ec7d2bf85b4722f629852132cbc29de700c0fe… 50-69 77 65
4 0036e526db61d937c27b6fe3977cae8ac1ef021ca5874584c… 50-69 20 50
5 0059f04c0892d8148063b2259161d74db885e30531206ba8c… >=70 65 83
6 006ef90502f583eb8d93753e536e681da423c309ef1d1c618… >=70 70 73
7 008843ff08f8321eb9bc435b204696880c2aa7fce3528c1c7… 50-69 69 60
8 008d118de9f1ac67424401d079ca0904741906a9ac2978820… >=70 77 80
9 00b8ae6605b5854a429cb9cb1cb51276999f582f5d6f84c32… 50-69 74 68
10 00e95e77dcb22ff50775c69a78028c8ac0f18cdb9975fefe3… 36-49 65 48
# ℹ 4,242 more rows
Create table 1
By hand:
Code
# A tibble: 1 × 2
median_baseline_va median_12month_va
<dbl> <dbl>
1 58 65
Using gtsummary::tbl_summary():
Code
amd_12_month |>
select(gender,
age_group,
ethnicity,
va_inj1,
va_12m,
date_inj1,
regimen) |>
tbl_summary()| Characteristic | N = 4,2521 |
|---|---|
| gender | |
| f | 2,587 (61%) |
| m | 1,665 (39%) |
| age_group | |
| >80 | 2,109 (50%) |
| 50-59 | 123 (2.9%) |
| 60-69 | 558 (13%) |
| 70-79 | 1,462 (34%) |
| ethnicity | |
| afrocarribean | 84 (2.0%) |
| caucasian | 2,651 (62%) |
| mixed | 15 (0.4%) |
| se_asian | 367 (8.6%) |
| unknown/other | 1,135 (27%) |
| va_inj1 | 58 (46, 69) |
| va_12m | 65 (50, 73) |
| Unknown | 80 |
| date_inj1 | |
| Post-2013 | 2,622 (62%) |
| Pre-2013 | 1,630 (38%) |
| regimen | |
| Aflibercept only | 2,419 (57%) |
| Ranabizumab only | 1,833 (43%) |
| 1 n (%); Median (Q1, Q3) | |
…and by date of baseline injection, with group comparison:
Code
amd_12_month |>
select(gender,
age_group,
ethnicity,
va_inj1,
va_12m,
date_inj1,
regimen) |>
tbl_summary(by = "date_inj1") |>
add_p() |>
bold_p()| Characteristic |
Post-2013 N = 2,6221 |
Pre-2013 N = 1,6301 |
p-value2 |
|---|---|---|---|
| gender | <0.001 | ||
| f | 1,537 (59%) | 1,050 (64%) | |
| m | 1,085 (41%) | 580 (36%) | |
| age_group | 0.2 | ||
| >80 | 1,323 (50%) | 786 (48%) | |
| 50-59 | 72 (2.7%) | 51 (3.1%) | |
| 60-69 | 354 (14%) | 204 (13%) | |
| 70-79 | 873 (33%) | 589 (36%) | |
| ethnicity | <0.001 | ||
| afrocarribean | 50 (1.9%) | 34 (2.1%) | |
| caucasian | 1,468 (56%) | 1,183 (73%) | |
| mixed | 8 (0.3%) | 7 (0.4%) | |
| se_asian | 238 (9.1%) | 129 (7.9%) | |
| unknown/other | 858 (33%) | 277 (17%) | |
| va_inj1 | 59 (46, 69) | 55 (46, 65) | 0.020 |
| va_12m | 65 (51, 75) | 62 (50, 73) | <0.001 |
| Unknown | 74 | 6 | |
| regimen | <0.001 | ||
| Aflibercept only | 2,419 (92%) | 0 (0%) | |
| Ranabizumab only | 203 (7.7%) | 1,630 (100%) | |
| 1 n (%); Median (Q1, Q3) | |||
| 2 Pearson’s Chi-squared test; Wilcoxon rank sum test | |||
Linear regression
Is baseline vision associated with vision at 12 months?
Visualise:
Code
amd_12_month |>
ggplot(aes(x = va_inj1,
y = va_12m)) +
geom_point(alpha = 0.3,
position = "jitter") +
geom_smooth(method = "lm") +
theme_bw()`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 80 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_point()`).
Linear regression model with one predictor (baseline VA) for VA at 12 months (outcome), summarised using summary():
Call:
lm(formula = va_12m ~ va_inj1, data = amd_12_month)
Residuals:
Min 1Q Median 3Q Max
-68.878 -7.525 1.475 8.983 54.274
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.26517 0.80469 25.18 <2e-16 ***
va_inj1 0.72304 0.01399 51.69 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14 on 4170 degrees of freedom
(80 observations deleted due to missingness)
Multiple R-squared: 0.3905, Adjusted R-squared: 0.3903
F-statistic: 2671 on 1 and 4170 DF, p-value: < 2.2e-16
Can also summarise using tbl_regression():
Code
tbl_regression(lm_12m_va) |>
bold_p()| Characteristic | Beta | 95% CI1 | p-value |
|---|---|---|---|
| va_inj1 | 0.72 | 0.70, 0.75 | <0.001 |
| 1 CI = Confidence Interval | |||