Neovascular Age-related Macular Degeneration Visual Outcomes

Author

Me

Overview

Aim: to explore 12 year dataset of patients with treatment-naive neovascular age-related macular degeneration (AMD) who received intravitreal anti-VEGF therapy (Fu et al)

Session 1

Tip

Pre-requisites:

  • Install R and RStudio, as per r4ds guidance
  • Install the tidyverse set of packages (run install.packages(“tidyverse”) in the R console)

Topics:

  • Starting a new R project
  • R basics (data types, functions, variables)
  • Scripts
  • Importing tabular data from a csv to a data frame in R
  • Selecting columns and filtering for rows using select() and filter() (from the dplyr package, tidyverse)
  • Viewing and summarising data
  • Boxplots with ggplot

Start your notebook by loading any required packages and importing raw data

Code
library(tidyverse)
library(dplyr) # sdf

amd <- read_csv("MEH_AMD_survivaloutcomes_database.csv")

Explore data

Look at the first few rows using head() (in RStudio consider also using View() for an interactive table viewer)

Code
head(amd)
# A tibble: 6 × 16
   ...1     X gender ethnicity     age_group va_inj1 va_inj1_group date_inj1
  <dbl> <dbl> <chr>  <chr>         <chr>       <dbl> <chr>         <chr>    
1     1     1 f      unknown/other 70-79          55 50-69         Pre-2013 
2     2     2 f      unknown/other 70-79          55 50-69         Pre-2013 
3     3     3 f      unknown/other 70-79          55 50-69         Pre-2013 
4     4     4 f      unknown/other 70-79          55 50-69         Pre-2013 
5     5     5 f      se_asian      70-79          75 >=70          Post-2013
6     6     6 f      se_asian      70-79          75 >=70          Post-2013
# ℹ 8 more variables: mean_inj_interval <dbl>, loaded <chr>, time <dbl>,
#   injnum <dbl>, injgiven <dbl>, va <dbl>, regimen <chr>, anon_id <chr>

Summarise with summary()

Code
summary(amd)
      ...1              X             gender           ethnicity        
 Min.   :     1   Min.   :     1   Length:118255      Length:118255     
 1st Qu.: 29564   1st Qu.: 29564   Class :character   Class :character  
 Median : 59128   Median : 59128   Mode  :character   Mode  :character  
 Mean   : 59128   Mean   : 59128                                        
 3rd Qu.: 88692   3rd Qu.: 88692                                        
 Max.   :118255   Max.   :118255                                        
                                                                        
  age_group            va_inj1       va_inj1_group       date_inj1        
 Length:118255      Min.   :  0.00   Length:118255      Length:118255     
 Class :character   1st Qu.: 46.00   Class :character   Class :character  
 Mode  :character   Median : 58.00   Mode  :character   Mode  :character  
                    Mean   : 55.41                                        
                    3rd Qu.: 69.00                                        
                    Max.   :100.00                                        
                                                                          
 mean_inj_interval    loaded               time            injnum      
 Min.   :  14.0    Length:118255      Min.   :   0.0   Min.   : 1.000  
 1st Qu.:  28.0    Class :character   1st Qu.: 147.0   1st Qu.: 3.000  
 Median :  30.5    Mode  :character   Median : 447.0   Median : 7.000  
 Mean   :  51.4                       Mean   : 633.9   Mean   : 8.883  
 3rd Qu.:  45.5                       3rd Qu.: 930.0   3rd Qu.:12.000  
 Max.   :1600.0                       Max.   :4042.0   Max.   :83.000  
 NA's   :1807                                                          
    injgiven           va           regimen            anon_id         
 Min.   :1       Min.   :  0.00   Length:118255      Length:118255     
 1st Qu.:1       1st Qu.: 48.00   Class :character   Class :character  
 Median :1       Median : 62.00   Mode  :character   Mode  :character  
 Mean   :1       Mean   : 58.66                                        
 3rd Qu.:1       3rd Qu.: 73.00                                        
 Max.   :1       Max.   :100.00                                        
 NA's   :38286   NA's   :1725                                          

Selecting columns and filtering rows

  • Select visual acuity, age group and injection number columns
  • Filter for first injections only
Code
amd_first_injection_vas <- amd |>
  select(anon_id,
         va, 
         time, 
         age_group) |>
  filter(time == 0)
  • N rows in full dataset: 118255
  • N rows in filtered dataset: 7802

Plots

Box and whisker plots

Code
amd_first_injection_vas |>
    ggplot(aes(x = age_group, y = va)) +
    geom_boxplot(aes(fill = age_group)) +
    theme_bw()

Histogram

Code
amd_first_injection_vas |>
    ggplot(aes(x = va)) +
    geom_histogram(aes(fill = age_group), alpha = 0.7, binwidth = 5) +
    theme_bw()

Session 2

Code
# Note: while we call library() here, in general you should always put *all* library() calls and variables at the start of your script/notebook
library(gtsummary)
library(flextable)
Tip

Topics:

  • More data wrangling
  • Creating a manuscript ‘Table 1’
  • Linear regression

More wrangling - VA at month 12

A common workflow is to summarise data by groups e.g. N eyes by treatment regimen:

Code
amd |>
  group_by(regimen) |>
  # `n_distinct(anon_id)` ensures we get number of eyes by treatment regimen, rather than number of observations which would be `summarise(n_observations = n())` or `summarise(n_observations = length(anon_id))`
  summarise(n_ids = n_distinct(anon_id))
# A tibble: 2 × 2
  regimen          n_ids
  <chr>            <int>
1 Aflibercept only  3951
2 Ranabizumab only  3851

Here we will use group_by() with mutate() to create a new column recording time from baseline, grouped by eye ID:

Code
amd_12_month <- amd |>
  # first group by eye
  group_by(anon_id) |>
  
  # create a new column with time (in days) from 1 year. We use `abs()` to convert negative values to positive
  mutate(time_from_12_months = abs(365 - time)) |>
  
  # take one row of data per eye, the one closest to 1 year
  slice_min(time_from_12_months, with_ties = FALSE, na_rm = TRUE) |>
  
  # remember to ungroup
  ungroup() |>

  # VA must be within one month of 1 year to be included
  filter(time_from_12_months < 30) |>
  
  # rename va column to indicate this is now VA at 12 month only
  rename(va_12m = va)

Baseline vision is already recorded under va_inj1. If it wasn’t then we could have achieved the same result using a join, instead of slice_min():

Code
# data frame with just (i) ID and (ii) vision at baseline columns (for demonstration)
amd_0_month <- amd_first_injection_vas |>
  select(anon_id,
         va_0m = va)

amd_12_month |>
  # remove baseline vision column for demonstration purposes
  select(-va_inj1) |>
  
  # join data frame with baseline vision
  left_join(amd_0_month,
            by = "anon_id") |>
  
  # select columns (for demonstration)
  select(anon_id,
         starts_with("va_"))
# A tibble: 4,252 × 4
   anon_id                                            va_inj1_group va_12m va_0m
   <chr>                                              <chr>          <dbl> <dbl>
 1 00081b40ab38fa7bada9c48fc3eed2e4c1e1741548bfd4039… 36-49             35    46
 2 0018ee619b6a77d07e2466e55ca84d9d20041ede600858580… >=70              61    70
 3 0024430dd21ec7d2bf85b4722f629852132cbc29de700c0fe… 50-69             77    65
 4 0036e526db61d937c27b6fe3977cae8ac1ef021ca5874584c… 50-69             20    50
 5 0059f04c0892d8148063b2259161d74db885e30531206ba8c… >=70              65    83
 6 006ef90502f583eb8d93753e536e681da423c309ef1d1c618… >=70              70    73
 7 008843ff08f8321eb9bc435b204696880c2aa7fce3528c1c7… 50-69             69    60
 8 008d118de9f1ac67424401d079ca0904741906a9ac2978820… >=70              77    80
 9 00b8ae6605b5854a429cb9cb1cb51276999f582f5d6f84c32… 50-69             74    68
10 00e95e77dcb22ff50775c69a78028c8ac0f18cdb9975fefe3… 36-49             65    48
# ℹ 4,242 more rows

Create table 1

By hand:

Code
amd_12_month |>
  summarise(median_baseline_va = median(va_inj1, na.rm = TRUE),
            median_12month_va = median(va_12m, na.rm = TRUE))
# A tibble: 1 × 2
  median_baseline_va median_12month_va
               <dbl>             <dbl>
1                 58                65

Using gtsummary::tbl_summary():

Code
amd_12_month |>
  select(gender,
         age_group,
         ethnicity,
         va_inj1,
         va_12m,
         date_inj1,
         regimen) |>
  tbl_summary()
Characteristic N = 4,2521
gender
    f 2,587 (61%)
    m 1,665 (39%)
age_group
    >80 2,109 (50%)
    50-59 123 (2.9%)
    60-69 558 (13%)
    70-79 1,462 (34%)
ethnicity
    afrocarribean 84 (2.0%)
    caucasian 2,651 (62%)
    mixed 15 (0.4%)
    se_asian 367 (8.6%)
    unknown/other 1,135 (27%)
va_inj1 58 (46, 69)
va_12m 65 (50, 73)
    Unknown 80
date_inj1
    Post-2013 2,622 (62%)
    Pre-2013 1,630 (38%)
regimen
    Aflibercept only 2,419 (57%)
    Ranabizumab only 1,833 (43%)
1 n (%); Median (Q1, Q3)

…and by date of baseline injection, with group comparison:

Code
amd_12_month |>
  select(gender,
         age_group,
         ethnicity,
         va_inj1,
         va_12m,
         date_inj1,
         regimen) |>
  tbl_summary(by = "date_inj1") |>
  add_p() |>
  bold_p()
Characteristic Post-2013
N = 2,6221
Pre-2013
N = 1,6301
p-value2
gender

<0.001
    f 1,537 (59%) 1,050 (64%)
    m 1,085 (41%) 580 (36%)
age_group

0.2
    >80 1,323 (50%) 786 (48%)
    50-59 72 (2.7%) 51 (3.1%)
    60-69 354 (14%) 204 (13%)
    70-79 873 (33%) 589 (36%)
ethnicity

<0.001
    afrocarribean 50 (1.9%) 34 (2.1%)
    caucasian 1,468 (56%) 1,183 (73%)
    mixed 8 (0.3%) 7 (0.4%)
    se_asian 238 (9.1%) 129 (7.9%)
    unknown/other 858 (33%) 277 (17%)
va_inj1 59 (46, 69) 55 (46, 65) 0.020
va_12m 65 (51, 75) 62 (50, 73) <0.001
    Unknown 74 6
regimen

<0.001
    Aflibercept only 2,419 (92%) 0 (0%)
    Ranabizumab only 203 (7.7%) 1,630 (100%)
1 n (%); Median (Q1, Q3)
2 Pearson’s Chi-squared test; Wilcoxon rank sum test

Linear regression

Is baseline vision associated with vision at 12 months?

Visualise:

Code
amd_12_month |>
  ggplot(aes(x = va_inj1,
             y = va_12m)) +
  geom_point(alpha = 0.3, 
             position = "jitter") +
  geom_smooth(method = "lm") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 80 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_point()`).

Linear regression model with one predictor (baseline VA) for VA at 12 months (outcome), summarised using summary():

Code
lm_12m_va <- lm(va_12m ~ va_inj1, data = amd_12_month)
summary(lm_12m_va)

Call:
lm(formula = va_12m ~ va_inj1, data = amd_12_month)

Residuals:
    Min      1Q  Median      3Q     Max 
-68.878  -7.525   1.475   8.983  54.274 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 20.26517    0.80469   25.18   <2e-16 ***
va_inj1      0.72304    0.01399   51.69   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14 on 4170 degrees of freedom
  (80 observations deleted due to missingness)
Multiple R-squared:  0.3905,    Adjusted R-squared:  0.3903 
F-statistic:  2671 on 1 and 4170 DF,  p-value: < 2.2e-16

Can also summarise using tbl_regression():

Code
tbl_regression(lm_12m_va) |>
  bold_p()
Characteristic Beta 95% CI1 p-value
va_inj1 0.72 0.70, 0.75 <0.001
1 CI = Confidence Interval