Modeling Variability in Multimodal Speech Analysis Across the Psychosis Spectrum
Overview
This study presents a multimodal model that integrates acoustic and linguistic speech features to predict symptom severity and psychosis-related traits across the psychosis spectrum. The model reached an F1-score of 83% and used uncertainty estimates to identify reliable speech markers such as pitch variability and fluency disruptions.
Background
Speech analysis offers valuable behavioral insights into psychosis but is challenged by variability across individuals and contexts, limiting its diagnostic utility. Psychosis exists on a spectrum from high schizotypy to clinical psychosis, necessitating models that can adapt to diverse speech patterns. Prior research has identified speech abnormalities as potential biomarkers, yet integrating multiple speech modalities and accounting for uncertainty remains underexplored. This study addresses these gaps by developing a calibrated multimodal approach to improve symptom prediction and interpretability.
Data Highlights
Metric                              Value
Participants                        114 (32 early psychosis, 82 low/high schizotypy)
Language                            German
Tasks                               Structured and narrative speech tasks
Model F1-score                      83%
Expected Calibration Error (ECE)    0.045
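For readers unfamiliar with the calibration metric above, ECE is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below shows that standard equal-width binning definition. The function name, the choice of 10 bins, and the toy inputs are assumptions for illustration; the summary does not state how the study computed its ECE of 0.045.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative; the bin count is an assumption).

    confidences: predicted probability of the chosen class, in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Bin edges 0, 1/n, ..., 1; each sample goes into the bin (lo, hi]
    # that contains its confidence (0.0 falls into the first bin).
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        accuracy = correct[mask].mean()
        avg_confidence = confidences[mask].mean()
        # Weight the accuracy/confidence gap by the share of samples in the bin.
        ece += mask.mean() * abs(accuracy - avg_confidence)
    return float(ece)


# Toy example with made-up values (not data from the study).
toy_confidences = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
toy_correct = [1, 1, 1, 0, 1, 0]
toy_ece = expected_calibration_error(toy_confidences, toy_correct)
```

A lower value means predicted confidence tracks observed accuracy more closely; a perfectly calibrated model would score 0.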
Key Findings
The multimodal model integrates acoustic and linguistic features to predict psychosis symptom severity effectively.
Uncertainty estimation allows the model to adaptively weight speech modalities based on signal reliability and context.
Speech markers such as pitch variability, fluency disruptions, and spectral instability reliably indicate symptom expression.
The model demonstrated robust and well-calibrated performance across a sample spanning early psychosis and schizotypy.
Accounting for variability in speech improves both accuracy and interpretability of psychosis assessment tools.
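The idea of weighting modalities by reliability can be illustrated with inverse-variance fusion, a common textbook way to combine estimates that each come with an uncertainty. This is a minimal sketch under that assumption, not the authors' mechanism, which this summary does not describe. The function name and the example numbers are hypothetical.

```python
import numpy as np


def fuse_by_inverse_variance(means, variances):
    """Combine per-modality predictions, trusting low-variance ones more.

    means:     one predicted severity score per modality
    variances: the predictive variance attached to each score (must be > 0)
    Returns (fused_mean, fused_variance, weights).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Precision = 1 / variance; a more certain modality gets a larger weight.
    precisions = 1.0 / variances
    weights = precisions / precisions.sum()

    fused_mean = float(np.sum(weights * means))
    fused_variance = float(1.0 / precisions.sum())
    return fused_mean, fused_variance, weights


# Hypothetical case: the acoustic estimate is more certain than the linguistic one.
acoustic_score, acoustic_var = 2.0, 1.0
linguistic_score, linguistic_var = 4.0, 3.0
fused_mean, fused_var, weights = fuse_by_inverse_variance(
    [acoustic_score, linguistic_score], [acoustic_var, linguistic_var]
)
```

In this example the acoustic estimate receives three quarters of the weight, so the fused score of 2.5 sits closer to it. A noisy recording that inflated the acoustic variance would shift the weight toward the linguistic estimate.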
Clinical Implications
Incorporating multimodal speech analysis with uncertainty estimation can enhance early detection and monitoring of psychosis spectrum disorders. Clinicians may leverage reliable speech markers identified by the model to support diagnostic evaluations and track symptom progression. This approach also offers potential for scalable, non-invasive assessment tools adaptable to diverse clinical contexts.
Conclusion
This study demonstrates that modeling variability and uncertainty in multimodal speech features improves the prediction and interpretability of psychosis-related symptoms. Such advances pave the way for more precise and accessible speech-based biomarkers across the psychosis spectrum.
by Morteza Rohanian, Roya Hüppi, Farhad Nooralahzadeh, Noemi Dannecker, Yves Pauli, Werner Surbeck, Iris Sommer, Wolfram Hinzen, Nicolas Langer, Michael Krauthammer, Philipp Homan