2025

OLMo accuracy vs. Dolma estimated co-occurrence frequency on the CASI dataset. Each dot shows a jargon-expansion pair.

Diagnosing our datasets: How does my language model learn clinical information?

Large language models (LLMs) have performed well across various clinical natural language processing tasks, despite not being directly trained on electronic health record (EHR) data. In this work, we examine how popular open-source LLMs learn clinical information from large mined corpora through two crucial but understudied lenses: (1) their interpretation of clinical jargon, a foundational…

Cover of Critical Care Medicine

Development of a Core Critical Care Data Dictionary With Common Data Elements to Characterize Critical Illness and Injuries Using a Modified Delphi Method

OBJECTIVES: To develop the first core Critical Care Data Dictionary (C2D2) with common data elements (CDEs) to characterize critical illness and injuries. DESIGN: Group consensus process using modified Delphi approach. SETTING: Electronic surveys and in-person meetings. SUBJECTS: A multidisciplinary workgroup of clinicians and researchers with expertise in the care of the critically ill and injured…

Illustration of a non-invasive brain imaging system using SPAD arrays to measure cerebral blood flow.

Beneath the surface: revealing deep-tissue blood flow in human subjects with massively parallelized diffuse correlation spectroscopy

Diffuse correlation spectroscopy (DCS) allows label-free, non-invasive investigation of microvascular dynamics deep within tissue, such as cerebral blood flow (CBF). However, the signal-to-noise ratio (SNR) in DCS limits its effective cerebral sensitivity in adults, in which the depth to the brain, through the scalp and skull, is substantially larger than in infants. Therefore, we aim to…

Cover of Nature Reviews Electrical Engineering

Meet the winners of the 2024 Sony Women in Technology Award

Technology research is the driving force of the innovations that shape the world. Sony Group Corporation (Sony) and Nature partnered to launch the Sony Women in Technology Award to recognize three outstanding early- to mid-career researchers from the field of technology. They interviewed the winners of the inaugural 2024 award on the inspirations behind their outstanding…

Velocity-time curve with feature points and scatter plot correlations.

Impact of inlet velocity waveform shape on hemodynamics

Monitoring disease development in arteries, which supply oxygen and nutrients to the body, is crucial and can be assessed using hemodynamic metrics. Hemodynamic metrics can be calculated via computational fluid dynamics simulation of patient-specific geometries. These simulations are known to be heavily influenced by boundary conditions, such as time-dependent inlet flow. However, the effects of…

Workflow diagram of offline modeling and online planning for blood flow.

Real-time virtual intervention for simple and serial coronary artery disease using the HarVI framework

Virtual planning tools that provide intuitive user interaction and immediate hemodynamic feedback are crucial for cardiologists to effectively treat coronary artery disease. Current FDA-approved tools for coronary intervention planning require days of preliminary processing and rely on conventional 2D displays for hemodynamic evaluation. Immersion offered by extended reality (XR) has been found to benefit intervention…

T-cell examples

Establishing a massively parallel computational model of the adaptive immune response

Parallel agent-based models of the adaptive immune response can efficiently recapitulate emerging spatiotemporal properties of T-cell motility during clonal selection across multiple length and time scales. Here, we present a distributed, three-dimensional (3D) computational model of T-cell priming, and associated parallel data structures and algorithms that enable fully deterministic cell simulations at scale. We demonstrate…

Hand holding a computer chip

High-performance computing at a crossroads

Over the past four decades, high-performance computing (HPC) has enabled considerable advances in scientific discovery and engineering, spurring technological development across the globe. However, with the demand for precision and fidelity of computational models continuing to grow, HPC faces bottlenecks in data handling, algorithm efficiency, and the scalability of new architectures, especially in fields such…

Origin of fusion oncoproteins

FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking

Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional…

PepPrCLIP model training and evaluation

De novo design of peptide binders to conformationally diverse targets with contrastive language modeling

Designing binders to target undruggable proteins presents a formidable challenge in drug discovery. In this work, we provide an algorithmic framework to design short, target-binding linear peptides, requiring only the amino acid sequence of the target protein. To do this, we propose a process to generate naturalistic peptide candidates through Gaussian perturbation of the peptidic…