ABCDscores

Setup

After installing the package, you can load it with:

library(ABCDscores)

Alternatively, you can call functions directly without loading the package by using ::, e.g., ABCDscores::name_of_function(...)

Data preparation

To compute summary scores, you’ll need to have downloaded data from the ABCD Study®. To request access to the data, visit the NIH Brain Development Cohorts (NBDC) Data Hub. Once you have access, you can use different tools to access and download the data; they are described in more detail in the ABCD data documentation.

Here we assume that you created a dataset containing the variables you want to summarize in DEAP and downloaded it in the rds format. Afterwards, unzip the dataset.rds.zip file to the working directory (or move the zip file to the working directory and use utils::unzip("dataset.rds.zip") to extract all files). The unzipped files should consist of a dataset.rds file and an Excel file with the data dictionary and categorical levels.

Load the data into R using the following command:

data <- readRDS("dataset.rds")

Computing summary scores

Score naming convention

Before computing summary scores, it is important to understand the structure and nomenclature of the functions in the package:

For any given summary score, the function to compute it is named compute_<score_name>(). For example, the function to compute the score fc_p_psb_mean is named compute_fc_p_psb_mean().
For any given measure/table, there exists a high-level compute_<table_name>_all() function that computes all scores for that measure/table. For example, the function to compute all scores for the fc_p_psb measure/table is named compute_fc_p_psb_all().
For any given summary score function, certain columns—the columns that are being summarized and in some cases additional columns like age or sex—are required to be present in the data for the score to be computed. The function documentation lists the required columns for a given function. In addition, the columns that a summary score function summarizes are typically provided as a character vector named vars_<measure_name>. For example, the vector with the columns that are summarized by the fc_p_psb_mean function is named vars_fc_p_psb.
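The naming rules above can be sketched as simple string construction. This is only an illustration of the convention: the variable names score_name, table_name, fn_score, fn_all, and vars_name are made up for this example, and the lookup at the end runs only if ABCDscores is installed.

```r
# Derive the function and vector names from a score name and a table name
score_name <- "fc_p_psb_mean"
table_name <- "fc_p_psb"

fn_score  <- paste0("compute_", score_name)        # function for a single score
fn_all    <- paste0("compute_", table_name, "_all") # function for all scores of the table
vars_name <- paste0("vars_", table_name)            # vector of summarized columns

print(c(fn_score, fn_all, vars_name))

# Optional: fetch the function by name if the package is available
if (requireNamespace("ABCDscores", quietly = TRUE)) {
  compute_fn <- getExportedValue("ABCDscores", fn_score)
}
```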

The reference page provides a list of all available functions and their parameters.

Basic usage

After reading in the data, we can start to compute summary scores. As an example, we will demonstrate how to compute the two summary scores for the fc_p_psb measure/table (fc_p_psb_mean and fc_p_psb_nm) in two different ways:

using the specific functions to compute one score at a time
using the _all() function to compute all scores for the measure/table at once.

When we refer to the documentation for compute_fc_p_psb_mean(), we see that it requires the following variables: fc_p_psb_001, fc_p_psb_002, and fc_p_psb_003. If these variables are part of the dataset created in and downloaded from DEAP, they should be present in the data after reading in dataset.rds as demonstrated above.
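Before calling a score function on downloaded data, it can help to confirm that the required columns are actually present. The following base R sketch writes a small stand-in file in place of dataset.rds, reads it back, and lists any required columns that are missing. The object names loaded, required, and missing_cols are illustrative only, and the stand-in data deliberately omits one column.

```r
# Stand-in for dataset.rds: a tiny file that lacks fc_p_psb_003
tmp <- tempfile(fileext = ".rds")
saveRDS(
  data.frame(
    fc_p_psb_001 = c("1", "2"),
    fc_p_psb_002 = c("1", NA),
    stringsAsFactors = FALSE
  ),
  tmp
)

loaded <- readRDS(tmp)

# Columns listed in the documentation of compute_fc_p_psb_mean()
required <- c("fc_p_psb_001", "fc_p_psb_002", "fc_p_psb_003")

# Required columns that are not in the loaded data
missing_cols <- setdiff(required, names(loaded))
print(missing_cols)
```

An empty result means that all required columns are available.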

Here, for demonstration purposes, we will create a dummy data frame with these columns:

data <- tibble::tibble(
  fc_p_psb_001 = c("1", "2", "3", "4", "5"),
  fc_p_psb_002 = c("1", NA, "3", "4", NA),
  fc_p_psb_003 = c("1", "2", "2", "4", NA)
)
data
#> # A tibble: 5 × 3
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003
#>   <chr>        <chr>        <chr>       
#> 1 1            1            1           
#> 2 2            NA           2           
#> 3 3            3            2           
#> 4 4            4            4           
#> 5 5            NA           NA

For most summary score functions, only the data argument (input data frame) is required, i.e., we can just use the function like this:

compute_fc_p_psb_mean(data)
#> # A tibble: 5 × 4
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003 fc_p_psb_mean
#>   <chr>        <chr>        <chr>                <dbl>
#> 1 1            1            1                     1   
#> 2 2            NA           2                    NA   
#> 3 3            3            2                     2.67
#> 4 4            4            4                     4   
#> 5 5            NA           NA                   NA

We can do the same using compute_fc_p_psb_nm(), which counts the number of missing values per row:

compute_fc_p_psb_nm(data)
#> # A tibble: 5 × 4
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003 fc_p_psb_nm
#>   <chr>        <chr>        <chr>              <int>
#> 1 1            1            1                      0
#> 2 2            NA           2                      1
#> 3 3            3            2                      0
#> 4 4            4            4                      0
#> 5 5            NA           NA                     2
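For intuition, the same count of missing values can be reproduced in base R. This is a sketch of the idea, not the package's implementation; the data frame simply mirrors the dummy data used in this section.

```r
# Same dummy data as above, as a base R data frame
data <- data.frame(
  fc_p_psb_001 = c("1", "2", "3", "4", "5"),
  fc_p_psb_002 = c("1", NA, "3", "4", NA),
  fc_p_psb_003 = c("1", "2", "2", "4", NA),
  stringsAsFactors = FALSE
)
vars <- c("fc_p_psb_001", "fc_p_psb_002", "fc_p_psb_003")

# Count the NA values in each row across the summarized columns
n_missing <- as.integer(rowSums(is.na(data[vars])))
print(n_missing)
```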

We can also compute both scores at the same time by chaining the function calls using the pipe operator:

data |>
  compute_fc_p_psb_mean() |>
  compute_fc_p_psb_nm()
#> # A tibble: 5 × 5
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003 fc_p_psb_mean fc_p_psb_nm
#>   <chr>        <chr>        <chr>                <dbl>       <int>
#> 1 1            1            1                     1              0
#> 2 2            NA           2                    NA              1
#> 3 3            3            2                     2.67           0
#> 4 4            4            4                     4              0
#> 5 5            NA           NA                   NA              2

Lastly, if we want to compute all scores for the measure with one function call, we can use the compute_<table_name>_all() function for the fc_p_psb table:

compute_fc_p_psb_all(data)
#> # A tibble: 5 × 5
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003 fc_p_psb_mean fc_p_psb_nm
#>   <chr>        <chr>        <chr>                <dbl>       <int>
#> 1 1            1            1                     1              0
#> 2 2            NA           2                    NA              1
#> 3 3            3            2                     2.67           0
#> 4 4            4            4                     4              0
#> 5 5            NA           NA                   NA              2

Important parameters and customization

`data`

The data argument is the input data frame that contains the columns required to compute the score. The required columns are documented in the function documentation for each score.

`name`

The name argument is used to specify the name of the output score. The default value for this parameter is the official name of the column in the released data, but users can override it with a custom name.

compute_fc_p_psb_mean(data, name = "my_custom_name")
#> # A tibble: 5 × 4
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003 my_custom_name
#>   <chr>        <chr>        <chr>                 <dbl>
#> 1 1            1            1                      1   
#> 2 2            NA           2                     NA   
#> 3 3            3            2                      2.67
#> 4 4            4            4                      4   
#> 5 5            NA           NA                    NA

For example, this is useful when the data frame specified in data contains the official summary score that one is trying to reproduce. In this case, the user is required to specify a different name; otherwise the function will return an error.
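The reproduction scenario can be sketched in base R. The helper add_score() below is a hypothetical stand-in that only mimics the name collision; the package's actual check and error message may differ.

```r
# Hypothetical helper: refuse to overwrite an existing column
add_score <- function(data, score, name) {
  if (name %in% names(data)) {
    stop("Column '", name, "' already exists; choose a different name.")
  }
  data[[name]] <- score
  data
}

# Data that already contains the official score
released <- data.frame(
  fc_p_psb_001  = c(1, 2),
  fc_p_psb_mean = c(1, 2)
)

# Reusing the official name fails
collision <- tryCatch(
  add_score(released, c(1, 2), "fc_p_psb_mean"),
  error = function(e) "error"
)

# A custom name works and allows a side-by-side comparison
checked <- add_score(released, c(1, 2), "fc_p_psb_mean_check")
scores_match <- isTRUE(all.equal(checked$fc_p_psb_mean, checked$fc_p_psb_mean_check))
print(scores_match)
```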

`combine`

The combine argument is used to specify whether to combine the output score with the input data frame. The default value is TRUE, which means the output score is appended as a new column on the right-hand side of the input data frame. If the argument is set to FALSE, the output score is returned as a single-column data frame:

compute_fc_p_psb_mean(data, combine = FALSE)
#> # A tibble: 5 × 1
#>   fc_p_psb_mean
#>           <dbl>
#> 1          1   
#> 2         NA   
#> 3          2.67
#> 4          4   
#> 5         NA
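The two return shapes can be illustrated with a small base R sketch. The helper return_score() is hypothetical and only mirrors the behavior described above; it is not taken from the package.

```r
# Hypothetical helper: return the score alone or appended to the input
return_score <- function(data, score, name, combine = TRUE) {
  score_df <- data.frame(score)
  names(score_df) <- name
  if (combine) cbind(data, score_df) else score_df
}

input <- data.frame(a = c(1, 2), b = c(3, 4))
score <- rowMeans(input)

combined <- return_score(input, score, "ab_mean")                   # input columns plus score
alone    <- return_score(input, score, "ab_mean", combine = FALSE)  # score column only

print(names(combined))
print(names(alone))
```

Returning only the score column can be convenient when several scores are computed separately and bound together afterwards.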

`max_na`

The max_na argument is used to specify the maximum number of missing values across all summarized variables a given row (or participant/event) can have for the summary score to still be computed. If the number of missing values in a row exceeds the specified value, the score for that row is set to NA. Depending on the summary score, the number of missing values allowed may vary and not all summary score functions have this argument.

NULL: No limit on missing values.
0: No missing values allowed.
1: At most one missing value allowed.
…

For most summary scores in the ABCD data resource, max_na is set to a number that ensures that at least 80% of the variables that the given score summarizes have a non-missing value. Users can set the max_na argument themselves if they want to compute the summary score in a more lenient or more restrictive manner.
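As a rough illustration of the 80% rule, the threshold can be derived from the number of summarized variables. The helper max_na_for() is hypothetical and not part of ABCDscores; it assumes the rule means "at most one fifth of the variables may be missing, rounded down". The actual default for a given score is stated in its function documentation.

```r
# Hypothetical helper: largest number of missing values that still leaves
# at least 80% of the variables non-missing
max_na_for <- function(n_vars) {
  floor(n_vars / 5)
}

max_na_for(3)   # 0
max_na_for(5)   # 1
max_na_for(10)  # 2
```

For a score that summarizes three variables, this gives 0, which is consistent with the default of compute_fc_p_psb_mean().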

As an example, let’s explore how the summary score changes when we set the max_na argument to 1 (above we used the default, which in the case of compute_fc_p_psb_mean() is 0). Now a score is computed for the second row, which has one missing value, but not for the last row, which has two missing values:

compute_fc_p_psb_mean(data, max_na = 1)
#> # A tibble: 5 × 4
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003 fc_p_psb_mean
#>   <chr>        <chr>        <chr>                <dbl>
#> 1 1            1            1                     1   
#> 2 2            NA           2                     2   
#> 3 3            3            2                     2.67
#> 4 4            4            4                     4   
#> 5 5            NA           NA                   NA

When we change max_na to 2, a score is also computed for the last row:

compute_fc_p_psb_mean(data, max_na = 2)
#> # A tibble: 5 × 4
#>   fc_p_psb_001 fc_p_psb_002 fc_p_psb_003 fc_p_psb_mean
#>   <chr>        <chr>        <chr>                <dbl>
#> 1 1            1            1                     1   
#> 2 2            NA           2                     2   
#> 3 3            3            2                     2.67
#> 4 4            4            4                     4   
#> 5 5            NA           NA                    5
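To make the effect of max_na concrete, the following base R sketch re-implements a row mean with a missing-value threshold. It is not the package's implementation; row_mean_max_na() is a hypothetical helper written for this illustration, applied to a base R copy of the dummy data.

```r
# Hypothetical helper: row mean that is set to NA when a row has more
# than max_na missing values (max_na = NULL means no limit)
row_mean_max_na <- function(data, vars, max_na = 0) {
  values <- as.matrix(data[vars])
  storage.mode(values) <- "double"           # convert "1", "2", ... to numbers
  n_missing <- rowSums(is.na(values))
  score <- rowMeans(values, na.rm = TRUE)
  score[is.nan(score)] <- NA_real_           # rows with no values at all
  if (!is.null(max_na)) {
    score[n_missing > max_na] <- NA_real_
  }
  unname(score)
}

data <- data.frame(
  fc_p_psb_001 = c("1", "2", "3", "4", "5"),
  fc_p_psb_002 = c("1", NA, "3", "4", NA),
  fc_p_psb_003 = c("1", "2", "2", "4", NA),
  stringsAsFactors = FALSE
)
vars <- c("fc_p_psb_001", "fc_p_psb_002", "fc_p_psb_003")

mean_na0 <- row_mean_max_na(data, vars, max_na = 0)  # rows 2 and 5 are NA
mean_na1 <- row_mean_max_na(data, vars, max_na = 1)  # only row 5 is NA
mean_na2 <- row_mean_max_na(data, vars, max_na = 2)  # no row is NA
print(round(mean_na2, 2))
```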

`exclude`

The exclude argument is used to specify values that should be excluded from the computation of the score. Some specific values in the data might be considered missing, e.g., coded non-responses like “Don’t know” (999), “Decline to answer” (777), etc. This argument allows the user to specify these values so that they are treated as missing values during the computation of the score. Importantly, the max_na argument counts both values that are actually missing (NA) and values specified in the exclude argument. Not all score functions have this argument.

In this example, we use another score function, compute_mh_p_abcl__afs__frnd_sum(), which has the exclude argument. We first construct a dummy data frame:

data <- tibble::tibble(
  mh_p_abcl__frnd_001 = c(1, 2, 3, 4, 5),
  mh_p_abcl__frnd_002 = c(1, 777, 3, 4, 777),
  mh_p_abcl__frnd_003 = c(1, 2, NA, 4, 777),
  mh_p_abcl__frnd_004 = c(1, 2, 3, 4, 999),
)
data
#> # A tibble: 5 × 4
#>   mh_p_abcl__frnd_001 mh_p_abcl__frnd_002 mh_p_abcl__frnd_003
#>                 <dbl>               <dbl>               <dbl>
#> 1                   1                   1                   1
#> 2                   2                 777                   2
#> 3                   3                   3                  NA
#> 4                   4                   4                   4
#> 5                   5                 777                 777
#> # ℹ 1 more variable: mh_p_abcl__frnd_004 <dbl>

When we compute the score, a value is returned only for rows 1 and 4, because all other rows contain 777, 999, or NA values.

compute_mh_p_abcl__afs__frnd_sum(data, exclude = c("777", "999"))
#> # A tibble: 5 × 5
#>   mh_p_abcl__frnd_001 mh_p_abcl__frnd_002 mh_p_abcl__frnd_003
#>                 <dbl>               <dbl>               <dbl>
#> 1                   1                   1                   1
#> 2                   2                 777                   2
#> 3                   3                   3                  NA
#> 4                   4                   4                   4
#> 5                   5                 777                 777
#> # ℹ 2 more variables: mh_p_abcl__frnd_004 <dbl>, mh_p_abcl__afs__frnd_sum <int>

We can also exclude custom values. For example, if we additionally exclude 4, a score is computed only for the first row.

compute_mh_p_abcl__afs__frnd_sum(data, exclude = c("777", "999", "4"))
#> # A tibble: 5 × 5
#>   mh_p_abcl__frnd_001 mh_p_abcl__frnd_002 mh_p_abcl__frnd_003
#>                 <dbl>               <dbl>               <dbl>
#> 1                   1                   1                   1
#> 2                   2                 777                   2
#> 3                   3                   3                  NA
#> 4                   4                   4                   4
#> 5                   5                 777                 777
#> # ℹ 2 more variables: mh_p_abcl__frnd_004 <dbl>, mh_p_abcl__afs__frnd_sum <int>
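Because the printed output above truncates the score column, the following base R sketch shows how excluded values interact with the missing-value threshold. It is not the package's implementation; row_sum_exclude() is a hypothetical helper, and it assumes that no missing values are allowed (max_na = 0), in line with the results described above.

```r
# Hypothetical helper: row sum where excluded codes count as missing
row_sum_exclude <- function(data, vars, exclude = NULL, max_na = 0) {
  values <- as.matrix(data[vars])
  # Treat excluded codes (compared as strings) like NA
  values[as.character(values) %in% exclude] <- NA
  n_missing <- rowSums(is.na(values))
  score <- rowSums(values, na.rm = TRUE)
  score[n_missing > max_na] <- NA
  unname(score)
}

data <- data.frame(
  mh_p_abcl__frnd_001 = c(1, 2, 3, 4, 5),
  mh_p_abcl__frnd_002 = c(1, 777, 3, 4, 777),
  mh_p_abcl__frnd_003 = c(1, 2, NA, 4, 777),
  mh_p_abcl__frnd_004 = c(1, 2, 3, 4, 999)
)
vars <- names(data)

sum_default <- row_sum_exclude(data, vars, exclude = c("777", "999"))
sum_custom  <- row_sum_exclude(data, vars, exclude = c("777", "999", "4"))

print(sum_default)  # score for rows 1 and 4 only
print(sum_custom)   # score for row 1 only
```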

Utility functions

The compute_<score_name>() functions are the main functions to compute summary scores, with one summary score function for each score (with a few exceptions that are documented in the other vignettes). To keep their code concise, the main functions often rely on utility functions. These utility functions are not necessarily meant to be used directly by users of this package, but they are documented and exported for transparency and reproducibility. For the documentation of these functions, see the reference page.