Utilization of PEDSnet to create a novel classifier for children with nephrotic syndrome using aggregated clinical and genetic testing data

Nephrotic syndrome (NS) is a leading cause of acquired end-stage kidney disease (ESKD) in children. The incidence of NS is approximately 2-5 per 100,000 children, and its pathogenesis is not fully understood. Children with NS are classified based on 1) treatment (steroid) response, 2) renal biopsy findings, and/or 3) genetic mutation. However, the correlation among these classifications remains unclear, and treatment guidelines are empiric, resulting in “trial and error” approaches to therapy. Children with NS experience significant morbidity and mortality, including infections, dyslipidemia, hypertension, deep vein thrombosis, and stroke. They additionally have complications related to immunomodulatory therapies. The majority of children with steroid-resistant nephrotic syndrome (SRNS) have a poor prognosis and progress to ESKD. A subset of these patients experience disease recurrence in the newly transplanted kidney. The overall goal of this study is to generate a novel classification tool to predict disease prognosis and therapeutic responsiveness for children with NS.

Precision medicine approaches to the treatment of SRNS have focused on monogenic forms of disease. Mutations in more than 53 genes are associated with NS and are detected in up to 30% of cases of SRNS [8-12]. The majority of these genes are integral to the structure of the kidney podocyte, the cell that is defective in NS. However, not all forms of monogenic NS are equal in their disease phenotype [10,12-14]. Although monogenic NS was previously assumed to be nonresponsive to immunomodulatory medications, national and European cohort studies have shown that some patients with monogenic NS achieve disease remission with immunosuppressive therapy. While patients with monogenic NS generally progress to ESKD faster than those with non-genetic NS [10], they rarely experience relapse post-kidney transplantation. These studies are foundational to overall precision approaches to SRNS; however, more granular information on clinical outcomes related to each affected gene is needed in order to truly create precision approaches to treatment.

Furthermore, the relationship between genotype and clinical phenotype is likely complex. Clinical characteristics, such as age and serum albumin level at presentation, degree of proteinuria, and rapid rate of progression, may predict recurrence of disease post-kidney transplantation [3]. Complex interactions between genetic and clinical elements require an analytic approach suited to multiple layers of data. The proposed study will utilize machine learning to analyze clinical and genetic data obtained from electronic health record (EHR) review in order to create a novel classifier for children with NS.

Hypothesis: Aggregated EHR and genetic data can be used to create a novel classifier for children with NS that predicts disease prognosis and therapeutic responsiveness.

Application of machine learning techniques requires a large population. The national PEDSnet network of 8 pediatric health systems, with aggregated data on more than 6.5 million patients, provides a unique opportunity to learn from a large clinical pediatric population. Previous work by Denburg et al. developed a computable phenotype algorithm (positive predictive value [PPV] 92%) for the identification of approximately 3000 patients with NS [15]. This project will utilize the algorithm to identify and analyze data from children who have had genetic testing for monogenic forms of NS as part of their clinical care. Clinical and genetic data will be aggregated with clinical outcomes to establish a novel classifier for children with NS. Specific aims for this project are:

Aim 1. Describe children with SRNS in PEDSnet who have had clinical genetic testing.

Aim 1a. Describe monogenic forms of SRNS within the PEDSnet database. We will describe the age at presentation, renal biopsy findings, medication exposures, and associated diagnoses for this population.

Aim 1b. Determine whether children with monogenic forms of SRNS achieve partial vs. complete remission of disease. If patients are able to achieve remission, we will determine therapeutic responsiveness.

Aim 1c. Identify the proportion of children with monogenic forms of SRNS who progress to ESKD as well as the rate of progression. Determine the proportion of children who experience relapse post-kidney transplant.

Aim 2. Apply machine learning to clinical EHR data, genetic testing results, and therapeutic outcomes to develop a novel classifier for children with NS.

Aim 2a. Determine biologically relevant signals for “feature selection” (e.g., genetic mutation, age at presentation) that are associated with prognosis and/or therapeutic response and may be used for machine learning models.

Aim 2b. We will use approximately 80% of the cohort as “training” data, and 20% of the cohort as “test” data for classifier development. Future studies will be required to validate the classifier in a separate cohort.
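
The 80/20 partition described in Aim 2b can be illustrated with a minimal sketch. This is not the study's implementation. It assumes a stratified split, so that the outcome (here a hypothetical `eskd` flag) appears in similar proportions in the training and test sets; the proposal does not specify stratification. The function name, field names, and toy cohort are all invented for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key, test_fraction=0.2, seed=0):
    """Split records into (train, test) lists, preserving label proportions.

    Hypothetical helper for illustration; stratifying on the outcome is an
    assumption, not something the proposal states.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group records by outcome label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy so the input is not mutated
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy cohort (fabricated for the example): 100 patients, 30 with ESKD.
cohort = [{"patient_id": i, "eskd": i < 30} for i in range(100)]
train, test = stratified_split(cohort, label_key="eskd")
print(len(train), len(test))
```

With this toy cohort the split yields 80 training and 20 test records, and the test set holds 6 of the 30 ESKD cases, matching the 30% rate in the full cohort. Stratifying matters when an outcome is uncommon, because a purely random 20% sample of a small cohort could by chance contain very few patients with that outcome.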

Foundation for future clinical trials: This study will use machine learning techniques to create a novel classifier utilizing clinical and genetic testing results to predict patient prognosis and therapeutic responsiveness. The classifier may serve to identify those patients who are unlikely to respond to any therapy; these patients are most likely to benefit from early enrollment in clinical trials of novel therapeutics.