Quickstart
There are two options for using nbdctools to create datasets:
- The R-backed create_dataset wrapper, which has full parity with the R create_dataset behavior and forwards all extra keyword arguments to NBDCtools::create_dataset.
- The pure-Python create_dataset_py, which provides metadata-driven joins from tabulated files. It only joins the dataset and does not support further processing steps such as variable type coercion, labeling, or missingness handling.
Read the R wrapper guide and pure-Python guide for more details on each workflow, including practical guidance on choosing between them.
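To make the idea of a metadata-driven join concrete, here is a minimal standard-library sketch. It is not the nbdctools implementation: the data dictionary layout, the key columns participant_id and session_id, the inlined tables, and the helper join_by_metadata are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical data dictionary: variable name -> table that stores it.
DATA_DICTIONARY = {
    "var1": "table1",
    "var2": "table1",
    "var3": "table2",
}

# Hypothetical tabulated files, inlined as TSV text so the sketch is self-contained.
TABLES = {
    "table1": "participant_id\tsession_id\tvar1\tvar2\nP1\tS1\t1\ta\nP2\tS1\t2\tb\n",
    "table2": "participant_id\tsession_id\tvar3\nP1\tS1\t0.5\nP3\tS1\t0.7\n",
}

# Assumed join keys; the real key columns depend on the study.
KEYS = ("participant_id", "session_id")


def join_by_metadata(variables, dictionary, tables, keys=KEYS):
    """Outer-join the requested variables across tables on the key columns."""
    # Group the requested variables by the table the dictionary assigns them to.
    by_table = {}
    for var in variables:
        by_table.setdefault(dictionary[var], []).append(var)

    # Accumulate one output row per distinct key tuple.
    merged = {}
    for table, cols in by_table.items():
        reader = csv.DictReader(io.StringIO(tables[table]), delimiter="\t")
        for row in reader:
            key = tuple(row[k] for k in keys)
            out = merged.setdefault(key, dict(zip(keys, key)))
            out.update({c: row[c] for c in cols})

    # Variables absent for a given key are filled with None.
    return [
        {**{v: None for v in variables}, **row}
        for _, row in sorted(merged.items())
    ]


rows = join_by_metadata(["var1", "var2", "var3"], DATA_DICTIONARY, TABLES)
for row in rows:
    print(row)
```

Every value in this sketch stays a string as read from the file. That is the kind of gap that later processing steps such as type coercion and missingness handling are meant to close.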
R-backed wrapper
from nbdctools import NBDCtoolsRError, create_dataset
try:
    df = create_dataset(
        dir_data="/path/to/tabulated/data",
        study="abcd",
        vars=["var1", "var2", "var3"],
        tables=["table1", "table2"],
        release="latest",
    )
    print(df.shape)
except NBDCtoolsRError as exc:
    print(exc)
    print(exc.r_message)
Pure-Python
from nbdctools import create_dataset_py, load_metadata, download_metadata
# Download the metadata; this is only needed the first time
download_metadata(type="dds")
# Once the metadata is available locally, load it into the session
dds = load_metadata("lst_dds.rds")
# Choose the study and release of interest
dd = dds["abcd"]["6.1"]
df = create_dataset_py(
    dir_data="/path/to/tabulated/data",
    dd=dd,
    tables=["table1", "table2"],
    vars=["var1", "var2", "var3"],
)
print(df.shape)