Deep sequence models tend to memorize geometrically; it is unclear why.
[Preprint Link Available Soon]
Abstract: We present a clean and analyzable phenomenon that contrasts the predominant associative view of Transformer memory with a nascent geometric view. Concretely, we construct an in-weights path-finding task where a next-token learner succeeds in planning ahead, despite the task being adversarially constructed. This observation is incompatible with memory as strictly a store of local associations; instead, training with gradient descent must have synthesized a geometry of global relationships from witnessing mere local associations. While such a geometric memory may seem intuitive in hindsight, we argue that its emergence cannot be easily explained by various pressures, be they statistical, architectural, or supervisory. To make sense of this, we draw a connection to an open question in the simpler Node2Vec algorithm, and we provide empirical clues toward a closed-form solution for the graph embeddings that are learned. Our insight is that global geometry arises from a spectral bias that, in contrast to prevailing intuition, does not require low dimensionality of the embeddings. Our study raises open questions concerning implicit reasoning and the bias of gradient-based memorization, while offering a simple example for analysis. Our findings also call for revisiting theoretical abstractions of parametric memory in Transformers.
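To make the contrast between local associations and global geometry concrete, here is a minimal sketch of what an in-weights path-finding setup for a next-token learner could look like. Every detail below (graph size, the `edge`/`path` document formats, the BFS query construction) is an assumption for illustration only; the paper's actual task is adversarially constructed and is not specified here.

```python
# Hypothetical sketch: training documents expose only local associations
# (single edges), while evaluation queries require global plans (multi-hop
# paths). All formats and parameters are illustrative assumptions.
import random
from collections import deque

def random_graph(n_nodes=50, n_edges=150, seed=0):
    """Sample a random undirected graph as an adjacency list."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n_nodes)}
    while sum(len(s) for s in adj.values()) // 2 < n_edges:
        u, v = rng.sample(range(n_nodes), 2)
        adj[u].add(v)
        adj[v].add(u)
    return adj

def shortest_path(adj, src, dst):
    """BFS shortest path: the global structure a geometric memory must capture."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = [u]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None  # dst unreachable from src

adj = random_graph()

# Training data: one document per edge, i.e. purely local associations.
edge_docs = [f"edge {u} {v}" for u in adj for v in adj[u] if u < v]

# Evaluation query: the model must emit a multi-hop path token by token.
src, dst = 3, 41
path = shortest_path(adj, src, dst)
print("train doc example:", edge_docs[0])
print("eval query:", f"path {src} {dst} :", " ".join(map(str, path or [])))
```

Succeeding on such queries from edge-only training would require the weights to encode more than a lookup table of edges, which is the sense in which a "geometric" memory goes beyond an associative one.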
