2026 |
Tianle Yang, Chengzhe Sun, Phil Rose, Cassandra L. Jacobs, Siwei Lyu: Assessing the Ability of Neural TTS Systems to Model Consonant-Induced F0 Perturbation. Computer Speech and Language 100. This study tests the ability of neural TTS models to reproduce consonant-induced f0 perturbation. Results show accurate reproduction for high-frequency words but poor generalization to low-frequency items, suggesting that the TTS architectures examined rely more on lexical-level memorization than on abstract segmental-prosodic encoding. A copy of this paper can be downloaded before 5th May 2026 from: |
2025 |
Tianle Yang, Chengzhe Sun, Siwei Lyu, Phil Rose: Forensic deepfake audio detection using segmental speech features. 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Hawaii. This study explores the potential of using acoustic features of segmental speech sounds to detect deepfake audio. The results demonstrate that certain segmental features commonly used in forensic voice comparison are effective in identifying deepfakes, whereas some global features provide little value. |
2024 |
Applications of the Likelihood Ratio Framework in Forensic Speech Science Cases Involving Disputed Utterances, Tampering and Voice Lineups. Peer-reviewed paper in Olga Maxwell & Rikke Bundgaard-Nielsen (eds.) 19th Australasian Int'l conf. on Speech Science and Technology, Melbourne. Examples are given of the use of the likelihood ratio framework in real world case-work involving disputed utterances and tampering, and its theoretical application in voice lineups. Some of the problems in application are pointed out, especially with respect to estimation of priors. [pdf] |
2022 |
Likelihood Ratio-based Forensic Semi-automatic Speaker Identification with Alveolar Fricative Spectra in a Real-world Case. Peer-reviewed paper accepted for 18th Australasian Int'l conf. on Speech Science and Technology, Canberra. [this is intended as a useful summary of the carbombing case in the 2017 report below] [pdf] |
2021 |
Forensic Semi-automatic Voice Comparison - an Explanation using Chinese Speech Sounds. This paper was originally commissioned as a chapter for the Palgrave Handbook of Chinese Language Studies, but I withdrew it due to incompetent copyediting. [pdf] |
2018 |
(with ZHANG Cuiling) Conversational Style Mismatch: its Effect on the Evidential Strength of Long-term F0 in Forensic Voice Comparison. Proc. 17th Australasian Int'l conf. on Speech Science and Technology, Sydney (SST 2018), pp. 157-160. [full paper peer review]. [pdf] |
2017 |
Likelihood ratio-based forensic voice comparison with higher level features: research and reality. In Eduardo Lleida & Luis J. Rodriguez-Fuentes (eds.) Recent Advances in Speaker and Language Recognition and Characterisation. Computer Speech and Language Special Issue: 476-502. [This is an extended and revised version of Bruce Wang's and my Odyssey 2016 paper. It includes additional results of testing with another Cantonese higher level feature - bandlimited cepstral spectrum of nasal /m/ - and discussion of more real-world forensic voice comparison cases.] |
2017 |
Report in a real Forensic Voice Comparison case involving the (non-terrorist) blowing up of a car. This is the anonymised report of an interesting forensic voice comparison I did in 2017, where the aim was exculpation by identification. The case was interesting for several reasons, one of which was the feature used: it compared questioned and known data with respect to the spectra of their alveolar fricatives /s/ and /z/, parametrised with band-limited cepstral coefficients between 1 kHz and 8 kHz. Another reason was the sparseness of the questioned data - one token each of [s] and [z]. The report also contains a section on validation, which many now consider an essential part of any case-work (you need to be able to demonstrate that your system does what you claim). |
2017 |
Evaluating Strength of Evidence in Voice Lineups. This is a PowerPoint presentation at a festschrift-symposium to honour Andy Butcher on his seventieth birthday. I address how to properly – that is rationally – evaluate the outcome of a voice lineup. If an earwitness picks out the suspect in a lineup of 10 – what evidential value should you logically assign to that? I make use of a 2002 paper by D. Hodgson (Judge of the NSW Court of Appeal): A LAWYER LOOKS AT BAYES’ THEOREM. The Australian Law Journal 76: 109-118. Hodgson’s paper is an early attempt to explain to the legal profession how to rationally evaluate the prosecution and defence hypotheses, given the (numerical) evidence adduced in their respective favour. Justice Hodgson uses a gedankenexperiment of a lineup to explain, with Bayes’ theorem, how identification through a properly-conducted lineup can be very powerful, and becomes more powerful as the number in the lineup increases. This presentation explains how to quantify the strength of evidence in a voice lineup and then adduces some real data on speaker recognition by naïve unfamiliar listeners to see how useful the strength of evidence can be in a real world ‘multiple auditory confrontation’. |
2016 |
(with Wang Xiao) Cantonese forensic voice comparison with higher-level features: likelihood ratio-based validation using F-pattern and tonal F0 trajectories over a disyllabic hexaphone. Accepted after peer review for Odyssey 2016 Speaker and Language Recognition Workshop, Bilbao. |
2016 |
Forensic speech science report in a case involving the alleged tampering of a recording. |
2015 |
Forensic Voice Comparison with Monophthongal Formant Trajectories - a likelihood ratio-based discrimination of "Schwa" vowel acoustics in a close social group of young Australian females. Proc. Int'l Conf. on Acoustics Speech & Signal Processing (ICASSP 40) Brisbane, pp.4819-4823. [pdf] |
2013 |
Where the Science Ends and the Law Begins: likelihood ratio-based forensic voice [pdf] |
2013 |
More is better: Likelihood ratio-based forensic voice comparison with vocalic segmental cepstra frontends. Int'l Journal of Speech Language and the Law 20/1: 77-116. [pdf] |
2013 |
David Vandyke, Phil Rose, Michael Wagner. The Voice Source in Forensic-Voice-Comparison: a Likelihood-Ratio based Investigation with the Challenging YAFM Database. Paper presented at International Association of Forensic Phonetics and Acoustics Conference, Tampa, Florida. |
2012 |
Morrison, G. S., Rose, P., Zhang, C. Protocol for the collection of databases of recordings for forensic-voice-comparison research and practice. Australian Journal of Forensic Sciences, 44/2, 155–167. [pdf] |
2012 |
Where the Science Ends and the Law Begins: Theory and Reality in Likelihood Ratio-based Forensic Voice Comparison. In D. T. Toledano, A. O. Giménez, A. Teixeira, J. González-Rodríguez, L. Hernández-Gómez, R. S. Segundo-Hernández & D. Ramos-Castro (eds.) Proceedings of Iberspeech 2012 VII Jornadas en Tecnología del Habla and III Iberian SLTech Workshop, pp. 27-39. ISBN: 84-616-1535-2. http://iberspeech2012.ii.uam.es/ |
2012 |
Caiyu Wang & Phil Rose (2012) Likelihood Ratio-based Forensic Voice Comparison with Cantonese /i/ F-pattern and Tonal F0. In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw & I. Yuen (eds.), Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp. 209-212. Australian Speech Science & Technology Association: Sydney, ISSN 1039-0227. |
2012 |
Jialin Pang & Phil Rose (2012) Likelihood Ratio-based Forensic Voice Comparison with the Cantonese Diphthong /ei/ F-pattern. In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw & I. Yuen (eds.), Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp. 205-208. Australian Speech Science & Technology Association: Sydney, ISSN 1039-0227. |
2012 |
Joanne Jingwen Li & Phil Rose (2012) Likelihood Ratio-based Forensic Voice Comparison with F-pattern and Tonal F0 from the Cantonese /eu/ Diphthong. In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw & I. Yuen (eds.), Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp. 201-204. Australian Speech Science & Technology Association: Sydney, ISSN 1039-0227. |
2012 |
Aishu Chen & Phil Rose (2012) Likelihood Ratio-based Forensic Voice Comparison with the Cantonese Triphthong /iau/. In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw & I. Yuen (eds.), Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp. 197-200. Australian Speech Science & Technology Association: Sydney, ISSN 1039-0227. |
2012 |
Ruijuan Zheng & Phil Rose (2012) Likelihood Ratio-based Forensic Voice Comparison with Cantonese Short-term Fundamental Frequency Distribution Parameters. In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw & I. Yuen (eds.), Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp. 153-156. Australian Speech Science & Technology Association: Sydney, ISSN 1039-0227. |
2012 |
Alex C.Y. Yim & Phil Rose (2012) Are Nasals Better? Likelihood Ratio-based Forensic Voice Comparison with Segmental Cepstra from Cantonese and Japanese Syllabic/Mora Nasals. In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw & I. Yuen (eds.), Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp. 5-8. Australian Speech Science & Technology Association: Sydney, ISSN 1039-0227. |
2012 |
"Yes, not too bad – Likelihood Ratio-Based Forensic Voice Comparison in a $150 Million Telephone Fraud." In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw & I. Yuen (eds.), Proceedings of the 14th Australasian International Conference on Speech Science and Technology, pp. 161-164. Australian Speech Science & Technology Association: Sydney, ISSN 1039-0227. |
2012 |
"The Likelihood Ratio goes to Monte Carlo: the effect of reference sample size on likelihood-ratio estimates." University of New South Wales Forensic Speech Science Conference. |
2011 |
Forensic Voice Comparison with Japanese Vowels – a likelihood ratio-based approach using segmental cepstra. In Wai-Sum Lee & Eric Zee (eds.), Proc. 17th International Congress of Phonetic Sciences, Hong Kong: 1718-1721. |
2011 |
Forensic Voice Comparison with Secular Shibboleths – a hybrid fused GMM-Multivariate likelihood-ratio-based approach using alveolo-palatal fricative cepstral spectra. Proc. International Conference on Acoustics Speech & Signal Processing (ICASSP), IEEE. 5900-5903. pdf with embedded video [this will play when downloaded] |
2010 |
G.S. Morrison, J. Epps, P. Rose, T. Thiruvaran, C. Zhang. Measuring reliability in forensic voice comparison. Journal of the Acoustical Society of America 128/2378. |
2010 |
G.S. Morrison, C. Zhang, P. Rose. An empirical estimate of the precision of likelihood ratios from a forensic-voice-comparison system. Forensic Science International 208: 59-65. |
2010 |
Bernard’s 18 – Vowel Inventory Size and Strength of Forensic Voice Comparison Evidence. In Tabain, Fletcher, Grayden, Hajek, Butcher (eds.) Proc. 13th Australasian International Conf. Speech Science and Technology, Australasian Speech Science and Technology Association: 30-33. |
2010 |
P. Rose & E. Winter. Traditional Forensic Voice Comparison with Female Formants: Gaussian mixture model and multivariate likelihood ratio analysis. In Tabain, Fletcher, Grayden, Hajek, Butcher (eds.) Proc. 13th Australasian International Conf. Speech Science and Technology (full paper review), Australasian Speech Science and Technology Association: 42-45. |
2010 |
The Effect of Correlation on Strength of Evidence Estimates in Forensic Voice Comparison: Uni- and Multivariate Likelihood ratio-based discrimination with Australian English Vowel Acoustics. International Journal of Biometrics 2/4: 316 – 329. |
2010 |
Combining linguistic and non-linguistic information in likelihood-ratio-based forensic voice comparison. Invited presentation at Special Session on Forensic Voice Comparison, Acoustical Society of America Conference, Cancun, November 2010. The presentation contains some ideas on FVC with a combination of segmental and long-term cepstra. The approaches were developed in later publications on FVC with cepstral spectra of fricatives (2011) and vowels (2013). |
2010 |
M. Wagner et al. The Big Australian Speech Corpus (The Big ASC). Proc. Int'l Australasian Conference on Speech Science & Technology. 166-170. |
2009 |
P. Rose & G. S. Morrison. A response to the UK Position Statement on forensic speaker comparison. Intl. Journal of Speech Language and the Law, 16(1): 139 – 163. [pdf] |
2009 |
Y. Kinoshita, S. Ishihara & P. Rose. Exploring the Discriminatory Potential of F0 Distribution Parameters in Traditional Forensic Speaker Recognition. Intl. Journal of Speech Language and the Law, 16(1): 91 – 111. [This is a slightly revised version of our Odyssey 2008 paper on forensic F0] [pdf] |
2009 |
Report on Evaluation of Disputed Utterance Evidence in R v Bain, New Zealand. |
2009 |
Response to England and Wales Law Commission Paper: The Admissibility of Expert Evidence in Criminal Proceedings in England and Wales: A New Approach to the Determination of Evidentiary Reliability. |
2008 |
Morrison G, Zhang C, Rose P. Forensic Speaker Recognition in Chinese: A Multivariate Likelihood Ratio Discrimination on /i/ and /y/. In Fletcher & Loakes (eds.) Proc. 9th Annual Conference of International Speech Communication Association (Interspeech ’08) ISSN 1990-9772: 1937-1940. |
2008 |
张翠玲 (Zhang C.), Rose, P. 基于似然率方法的语音证据评价 Strength Evaluation of Forensic Speaker Recognition Evidence based on Likelihood Ratio Approach. 证据科学 [Evidence Science] 16/3: 337-342. |
2008 |
Morrison, G.S., Rose, P., Kinoshita, Y. Extraction of likelihood ratio forensic evidence from the formant trajectory of diphthongs. Journal of the Acoustical Society of America 123(5): 3877. |
2008 |
Kinoshita Y, Ishihara S., Rose, P. Beyond the Long-term Mean: Exploring the Potential of F0 Distribution Parameters in Traditional Forensic Speaker Recognition. In Brummer (ed.) Proc. '08 Odyssey Speaker and Language Recognition Conference. (full paper review) ISBN 978-0-620-40331-3. |
2007 |
Forensic Speaker Discrimination with Australian English Vowel Acoustics. Proc. Intl. Congress of Phonetic Sciences 07. |
2007 |
Gonzalez-Rodriguez, J., Rose, P., Ramos, D., Torre, D., Ortega-Garcia, J. Emulating DNA: Rigorous Quantification of Evidential Weight in Transparent and Testable Forensic Speaker Recognition. IEEE Transactions on Audio Speech and Language Processing 15/7. [abstract] |
2006 |
Rose, P., Kinoshita, Y., Alderman, T. Realistic Extrinsic Forensic Speaker Recognition with the Diphthong /ai/. In Warren & Watson (eds.) Proc. 11th Australasian International Conf. Speech Science and Technology. |
2006 |
The Intrinsic Forensic Discriminatory Power of Diphthongs. In Warren & Watson (eds.) Proc. 11th Australasian International Conf. Speech Science and Technology. |
2006 |
Accounting for Correlation in Linguistic-Acoustic Likelihood Ratio-based Forensic Speaker Discrimination. In Berkling (ed.) Proc. Odyssey Speaker and Language Recognition Workshop, Puerto Rico. |
2006 |
Forensic Speaker Recognition at the Beginning of the Twenty-first Century – an Overview and a Demonstration. Invited paper. The Australian Journal of Forensic Sciences, 37/2. |
2006 |
Catching Criminals by their Voice - Combining Automatic and Traditional Methods for Optimum Performance in Forensic Speaker Identification. Australian Research Council Grant Proposal. |
2006 |
Technical Forensic Speaker Recognition: evaluation, types and testing of evidence. Invited paper for Computer Speech and Language 20/2-3: 159-191. [pdf] |
2004 |
Rose, P., Lucy, D., Osanai, T. Linguistic-Acoustic Forensic Speaker Identification with Likelihood Ratios from a Multivariate Hierarchical Random Effects Model – A Non-Idiot’s Bayes’ Approach. In S. Cassidy (ed.) Proc. 10th Australian International Conference on Speech Science & Technology. Australian Speech Science and Technology Association: 492-497. |
2004 |
Technical Forensic Speaker Identification from a Bayesian Linguist’s Perspective. In J. Ortega-Garcia et al. (eds.) ‘Proc. of Odyssey-04, The Speaker and Language Recognition Workshop’, Toledo: 3-10. |
2003 |
The technical comparison of forensic voice samples. In Freckelton & Selby (eds.) Expert Evidence 99. Sydney, Thomson Reuters: 1051-6102. |
2003 |
Rose, P., Osanai T., Kinoshita, Y. Strength of forensic speaker identification evidence: multispeaker formant- and cepstrum-based segmental discrimination with a Bayesian likelihood ratio as threshold. The International Journal of Speech Language and the Law 10/2: 179-202. |
2002 |
Forensic Speaker Identification. Taylor & Francis. New York, London. Now also published by Taylor & Francis CRC Press on the web as part of their FORENSICnetBASE (http://www.forensicnetbase.com/). This book has been very favourably reviewed. Reviews can be accessed here. |
2002 |
Rose, P., Osanai T., Kinoshita, Y. Strength of forensic speaker identification evidence: multispeaker formant- and cepstrum-based segmental discrimination with a Bayesian likelihood ratio as threshold. In C. Bow (ed.) Proc. 9th Australian Intl. Conf on Speech Science & Technology, Melbourne: Australian Speech Science & Technology Association: 303-308. |
2002 |
Rose, P. DNA Can't Talk - Some facts about Forensic Speaker Identification. Invited presentation, 2002 Media Science Forum - The Science of Terrorism, University of Technology, Sydney, October 29th. |
2001 |
Rose, P., Clermont, F. A Comparison of Two Acoustic Methods for Forensic Speaker Discrimination. Acoustics Australia 29/1: 31-35. |
2000 |
Rose, P., Clermont, F. Comparative performance of cepstrum- and formant-based analyses on similar-sounding speakers for forensic speaker identification. In Michael Barlow (ed.) Proceedings of the 8th Australian International Speech Science and Technology Conference Canberra: Australian Speech Science and Technology Association: 172-177. |
2000 |
Ann Kumar, P. Rose ‘Lexical Evidence for Early Contact between Indonesian Languages and Japanese’. Oceanic Linguistics 39/2: 219 – 255. |
1999 |
Long- and Short-term within-speaker differences in the formants of Australian hello. Journal of the International Phonetic Association 29/1: 1-31. |
1999 |
Differences and Distinguishability in the Acoustic Characteristics of Hello in Voices of Similar-sounding speakers - a Forensic Phonetic Investigation. Australian Review of Applied Linguistics 21(2): 1-42. |
1998 |
A Forensic Phonetic Investigation into Non-contemporaneous Variation in the F-pattern of Similar-sounding Speakers. In Robert Mannell and Jordi Robert-Ribes (eds.) Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP): Vol 2: 217-220. |
1997 |
Identifying Criminals by their Voice: the emerging applied discipline of Forensic Phonetics. Australian Language Matters 5/2: 6-7. |
1996 |
Observations on Forensic Speaker Recognition. Proc. 6th International Criminal Law Congress. Organising Committee, 6th ICLC. |
1996 |
Speaker Verification Under Realistic Forensic Conditions. In Paul McCormack & Alison Russell (eds.) Proceedings of the 6th Australian International Conf. on Speech Science and Technology, Australian Speech Science and Technology Association: 109-114. |
1996 |
Rose, P. and Simmons, A. F-pattern variability in Disguise and Over the Telephone - Comparisons for Forensic Speaker Identification. In Paul McCormack & Alison Russell (eds.) Proceedings of the 6th Australian International Conf. on Speech Science and Technology, Australian Speech Science and Technology Association: 121-126. |
1995 |
Rose, P. and Duncan, S. 'Naive auditory identification and discrimination of similar voices by familiar listeners'. Forensic Linguistics 2/1: 1-17. |
1995 |
S. Ran, B. Millar, I. Macleod, P. Rose 'Automatic Vowel Quality Description Using Four Primary Cardinal Vowels'. Proc. 13th Intl. Congress of Phonetic Sciences, Vol. 3, pp. 318-321. |
1994 |
S. Ran, B. Millar, I. Macleod, P. Rose 'Automatic Vowel Quality Description using a Cardinal Vowel Reference Model'. In Roberto Togneri (ed.) Proc. 5th Australian Intl. Conf. on Speech Science and Technology. Australian Speech Science and Technology Association, 387-392. |
1986 |
M. O'Kane, J. Gillis, P. Rose, M. Wagner 'Deciphering Speech Waveforms'. Proc. Int'l Conf. on Acoustics Speech & Signal Processing (ICASSP), vol. 11, IEEE, 2227-2230. |