Administrative Supplement to Support Collaborations to Improve AI/ML-Readiness

SUMMARY This application is for an administrative supplement (revision) to an existing award, R33AG068931, "Advanced Development and Utilization of Assembled Aging Trajectory Files from Multiple Datasets." The goal of the parent study is to create a comprehensive research repository of aging trajectory datasets and to demonstrate their utility for aging research at Rutgers University through four specific aims: 1) harmonizing and merging multiple datasets to generate the data infrastructure needed to understand change over time in care settings, geriatric syndromes, physical functioning, and shared risk factors at multiple levels and across multiple domains; 2) developing state-of-the-art analytic methods to identify patterns of aging trajectories experienced by older adults during the final years of life and their association with shared risk factors and distal outcomes; 3) discovering multilevel and potentially interactive predictors of trajectories using both model-based approaches and machine learning algorithms to predict specific outcomes; and 4) disseminating the resources generated, including datasets, documentation, source code, and methodology.
For the supplement, new work in the CMS Virtual Research Data Center (VRDC) will create AI/ML-ready datasets, workflows, and source code for data cleaning and pre-processing, breaking down the silos between researchers working in the VRDC and those working in institutional data enclaves at universities. Data harmonization procedures need to be customized to the server architecture and resources of each data warehouse, necessitating VRDC-specific workflows and code to ensure timely access and reproducibility.
In this project, data are made AI/ML-ready in four stages: 1) the cohort of patients to be studied is defined, and key inclusion and exclusion criteria variables are selected; 2) data pre-processing steps include data cleaning, data annotation, formatting, standardizing taxonomies, variable transformation, data rescaling/normalization, variable aggregation, variable decomposition, and variable selection, with a focus on variables important for measuring and reducing health disparities and improving minority health; 3) feature extraction and engineering include generating derived variables (e.g., intercept, slope, average, etc.) from irregularly spaced individual trajectories; and 4) Medicare datasets are merged with publicly available data to add socioeconomic and environmental context, and data variable relationships are mapped to produce a final, AI/ML-ready dataset.
Supplement Aim. Develop and implement code for data pre-processing, data fine-tuning and precision, missing data imputation, data connectivity, and fully established hierarchical relationships for the AI/ML framework to interactively model late-life aging trajectories and selected outcomes in a cohort of Medicare beneficiaries.
Completion of this work will contribute to the NIH vision of a modernized and integrated biomedical data ecosystem that adopts the latest data science technologies and best practice guidelines, including FAIR (findable, accessible, interoperable, reusable) principles and open-source development.
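
As an illustration of stage 3 (derived variables from irregularly spaced trajectories), the following is a minimal sketch, not the project's actual code. It assumes each record is a hypothetical (person_id, time, value) tuple, with time measured from an arbitrary origin, and fits an ordinary least-squares line per person. The function name `trajectory_features` and the output field names are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def trajectory_features(records):
    """Summarize each person's repeated measurements (hypothetical layout).

    records: iterable of (person_id, time, value) tuples.
    Returns {person_id: {"n_obs", "average", "intercept", "slope"}}.
    """
    # Group the observations by person.
    by_person = defaultdict(list)
    for person_id, time, value in records:
        by_person[person_id].append((float(time), float(value)))

    features = {}
    for person_id, points in by_person.items():
        times = [t for t, _ in points]
        values = [v for _, v in points]
        t_bar, v_bar = mean(times), mean(values)

        # Least-squares fit on the observed times, so unequal spacing
        # between visits needs no special handling.
        ss_t = sum((t - t_bar) ** 2 for t in times)
        if ss_t > 0:
            slope = sum((t - t_bar) * (v - v_bar) for t, v in points) / ss_t
            intercept = v_bar - slope * t_bar  # fitted value at time 0
        else:
            # One distinct time point cannot identify a slope.
            slope, intercept = None, None

        features[person_id] = {
            "n_obs": len(points),
            "average": v_bar,
            "intercept": intercept,
            "slope": slope,
        }
    return features


# Usage with made-up values: person "A" has three unevenly spaced visits,
# person "B" has a single visit and therefore no estimable slope.
example = [("A", 0, 10), ("A", 1, 12), ("A", 4, 18), ("B", 2, 5)]
derived = trajectory_features(example)
```

A per-person regression is only one way to obtain an intercept and slope; a mixed-effects model would borrow strength across individuals, and the abstract does not say which approach the project uses.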

Clin-STAR Database
