Phil Rose - Numerical Data for Download

This page contains links to some of the numerical data-sets I have used in my publications on tonal acoustics and forensic speaker identification, and also in the tone demos on this web-site. Many of the data-sets contain multispeaker data, and many include data from non-contemporaneous recordings.

Since these data may be useful in testing hypotheses, I am making them freely available (but with no warranty whatsoever) to anyone wanting to do research on tones or forensic voice comparison.

If you do make use of the data, please reference them by including the following information:

Phil Rose, date, dataset name, URL. E.g. Rose, Phil (2018) 'Multispeaker long-term F0 distribution parameters in Mandarin', https://philjohnrose.net/numerical_data/index.html

Clicking on "download data" will download a zip file with the data in .txt files.

Clicking on the paper's name will take you to its publication details and the link to the document itself.


Last updated on: May 21, 2020

ACOUSTIC DATA for TONE

2019

'VoiceSauce interharmonic noise and spectral slope measurements for Cantonese Tones' Data to accompany my Interspeech 2020 submission on Voice quality parameters in Cantonese tone.

There are two files. One contains 23 voice quality measurements of spectral slope and interharmonic noise for the three level Cantonese tones (high level Ia, mid level IIIa and lower-mid level IIIb) for 5 Cantonese males and 5 Cantonese females. The measurements were extracted with VoiceSauce. The other file contains the percent identification errors for each of the ten Cantonese speakers in an experiment trying to identify their three tones from the combinations of spectral slope and interharmonic noise in the first file. There are four different conditions: (1) linear discriminant analysis on raw data, (2) linear discriminant analysis on raw data using leave-one-out cross-validation, (3) linear discriminant analysis on bootstrapped data and (4) probabilistic Fisher discriminant analysis on bootstrapped data. The last condition gives the best results and is reported in the paper.

download data

 

2019

'Tiantai-Sanhezhen citation tone acoustics.' F0 and duration measurements for the citation tone tokens of the speaker in the demo of Tiantai-Sanhezhen tones. There are two data sets for citation tones elicited in two different ways ('shengyun' and 'grouped').

download data

 

2019

'Wencheng citation tone and disyllabic tone sandhi acoustics.' F0 and duration measurements for the citation tone and disyllabic tone sandhi tokens of the Wencheng 文成 speaker in the demo of Wencheng tones and tone sandhi. These data were also used in Rose 2016 Complexities of Tonal Realisation in a Right-Dominant Chinese Wu Dialect - Disyllabic Tone Sandhi in a Speaker from Wencheng.

download data

 

2019

'Wenzhou citation tone and disyllabic tone sandhi acoustics.' F0 and duration measurements for the citation tone and disyllabic tone sandhi tokens of the Wenzhou 温州 speaker in the demo of Wenzhou tones and tone sandhi. I have also used these data in several papers on Wenzhou tone sandhi: Wenzhou Dialect Disyllabic Lexical Tone Sandhi with First Syllable Entering Tones; Tonal Complexity as Conditioning Factor: More Depressing Wenzhou Dialect Disyllabic Lexical Tone Sandhi; Independent depressor and register effects in Wu dialect tonology: Evidence from Wenzhou tone sandhi; "Defying Explanation"? - Accounting for Tones in Wenzhou Dialect Disyllabic Lexical Tone Sandhi.

download data

 

2019

'Multispeaker Shanghai citation tone acoustics.' Mean F0 and duration measurements for the citation tones of 16 young Shanghai speakers (8 male, 8 female). Some speakers were measured by Prof Zhu Xiaonong for his 1999 Lincom book Shanghai Tonetics and its 2004 Chinese version 上海声调实验录. The data were used in several papers on normalisation, e.g. Rose 2016 Comparing Normalisation Strategies for Citation Tone F0 in Four Chinese Dialects and Rose 1993 A Linguistic Phonetic Acoustic Analysis of Shanghai Tones.

download data

 

2019

'Multispeaker Cantonese citation tone acoustics.' Mean F0 and duration measurements for the citation tones of 10 young Cantonese speakers (5 male, 5 female). The data were used in several papers on normalisation, e.g. Rose 2016 Comparing Normalisation Strategies for Citation Tone F0 in Four Chinese Dialects and Rose 2000 Hong Kong Cantonese Citation Tone Acoustics: A Linguistic-Tonetic Study.

download data

 

2019

'Multispeaker Zhangzhou citation tone acoustics.' Mean F0 and duration measurements for the citation tones of 21 Zhangzhou 漳州 speakers (9 male, 12 female). These data are based on the measurements in Huang Yishan's 2018 Australian National University Ph.D. thesis Tones in Zhangzhou: Pitch and Beyond (which should be acknowledged in all references). The data were described in Huang et al.'s 2016 paper Normalization of Zhangzhou Citation Tones and used in Rose 2016 Comparing Normalisation Strategies for Citation Tone F0 in Four Chinese Dialects.

download data

 

2019

'Multispeaker Fuzhou citation tone acoustics.' Mean F0 and duration measurements for the citation tones of 10 Fuzhou 福州 speakers (5 male, 5 female). These were extracted and combined from two sources (which should be acknowledged in all references): Cathryn Donohue's 2013 Lincom book Fuzhou Tonal Acoustics and Tonology; and Peng Gongguan's 2011 City University of Hong Kong Ph.D. thesis A Phonetic Study of Fuzhou Chinese. The data were used in Rose 2016 Comparing Normalisation Strategies for Citation Tone F0 in Four Chinese Dialects.

download data

ACOUSTIC DATA for SPEAKER IDENTIFICATION

2020

'Multispeaker segmental cepstra and formants in Japanese'.

These are the numerical data that were used in several early papers investigating, with likelihood ratio-based approaches, the strength of evidence expected from the spectral properties of speech segments (i.e. how well speakers can be identified using likelihood ratios).
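For readers unfamiliar with the idea, a likelihood ratio compares how probable an observed value is under a same-speaker model with how probable it is under a model of the relevant population. The sketch below is a deliberately simplified univariate Gaussian illustration and is not the method of the papers listed here, which use multivariate models. The function names, parameters and numbers are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x, suspect_mean, within_sd, population_mean, population_sd):
    """Simplified LR: p(x | same speaker) / p(x | random speaker from the population).

    Assumption for illustration only: both models are univariate Gaussians
    with known parameters.
    """
    numerator = gaussian_pdf(x, suspect_mean, within_sd)
    denominator = gaussian_pdf(x, population_mean, population_sd)
    return numerator / denominator


# Toy numbers, not taken from any of the data sets.
lr_support = likelihood_ratio(1.0, suspect_mean=1.0, within_sd=0.5,
                              population_mean=0.0, population_sd=2.0)
lr_neutral = likelihood_ratio(1.0, suspect_mean=0.0, within_sd=2.0,
                              population_mean=0.0, population_sd=2.0)
```

A value above 1 supports the same-speaker hypothesis, a value below 1 supports the different-speaker hypothesis, and a value of 1 (as in `lr_neutral`, where the two models coincide) supports neither.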

The data represent the quantification, in four different ways, of the spectral properties of three Japanese segments for 60 male speakers for two non-contemporaneous landline telephone recordings. They were extracted by Dr. Takashi Osanai of the Japanese National Research Institute of Police Science (NRIPS) from an early NRIPS database while he was a research fellow at the (then) Department of Linguistics at the Australian National University.

The segments are:
the syllable-final moraic nasal /N/
the voiceless alveolopalatal fricative /ɕ/ (represented as /sh/)
the long mid back rounded vowel /oo/

The segments were taken from several different words in the database, such as ginkoo 'bank', moshimoshi 'hello' and san 'three'.

Their spectra are quantified in the following ways:
First five formants/poles
First 12 LPCCs
First 12 cepstrally mean subtracted LPCCs
First 12 LPCCs band-limited over a nominal landline telephone channel (from 250 Hz to 3.5 kHz).

The file names are largely self-explanatory. E.g. N_cms_lpcc_60_spks.txt contains the cepstrally mean subtracted LPCCs for /N/, and sh_raw_blcc_250_3500.txt contains the band-limited LPCCs, without cepstral mean subtraction, for /ɕ/.

The first three columns encode the speaker (spk), the recording (sess) and the word from which the segment was taken (tok).
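As a minimal sketch of reading one of these files, the following assumes (this is not stated on the page) that each row is whitespace-delimited, that the three index columns come first in the order given above, and that any header row begins with "spk". The function names and the example rows are hypothetical.

```python
def parse_rows(lines):
    """Split each data row into its spk, sess, tok labels and a list of float values."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and an assumed header row beginning with "spk".
        if not fields or fields[0].lower() == "spk":
            continue
        spk, sess, tok = fields[:3]
        values = [float(v) for v in fields[3:]]
        rows.append({"spk": spk, "sess": sess, "tok": tok, "values": values})
    return rows


def group_by_speaker_session(rows):
    """Collect the value vectors belonging to each (speaker, recording) pair."""
    grouped = {}
    for row in rows:
        grouped.setdefault((row["spk"], row["sess"]), []).append(row["values"])
    return grouped


# Made-up rows in the assumed layout; not real data from the download.
example = [
    "spk sess tok c1 c2",
    "1 1 1 0.52 -0.31",
    "1 2 1 0.48 -0.27",
    "2 1 1 0.10 0.05",
]
rows = parse_rows(example)
grouped = group_by_speaker_session(rows)
```

Because `parse_rows` accepts any iterable of lines, an open file handle for one of the downloaded .txt files can be passed in directly. If the real files use a different delimiter or header, the split and skip logic would need adjusting.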

The papers in which the data were used are:

2013 Combining linguistic and non-linguistic information in likelihood-ratio-based forensic voice comparison. [shows how you can get a better result by combining information from both segmental and long-term cepstra. The approaches were developed in later publications on FVC with cepstral spectra of fricatives and vowels 2011].

2004 Linguistic-Acoustic Forensic Speaker Identification with Likelihood Ratios from a Multivariate Hierarchical Random Effects Model – A Non-Idiot’s Bayes’ Approach

2003 Strength of forensic speaker identification evidence: multispeaker formant- and cepstrum-based segmental discrimination with a Bayesian likelihood ratio as threshold

download data

 

2018

'Multispeaker long-term F0 distribution parameters in Mandarin'. Long term F0 distribution parameters (mode, mean, skew, kurtosis, maximum probability density) from non-contemporaneous recordings of multi-style interactions (informal conversations, simulated police interrogations and information exchanges) of 90 male Chinese speakers. Used in Rose and Zhang 2018: Conversational Style Mismatch: its Effect on the Evidential Strength of Long-Term F0 in Forensic Voice Comparison.
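To make the parameters concrete, here is a small sketch of how the mean, skew and kurtosis of a set of F0 values could be computed from moments. It assumes population (biased) moment estimators and non-excess kurtosis; the page does not say which estimators were used for the published values. The mode and maximum probability density require a density estimate and are omitted. The function name and F0 values are illustrative only.

```python
def distribution_moments(f0_values):
    """Return (mean, skew, kurtosis) using population moment estimators."""
    n = len(f0_values)
    mean = sum(f0_values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in f0_values) / n
    m3 = sum((x - mean) ** 3 for x in f0_values) / n
    m4 = sum((x - mean) ** 4 for x in f0_values) / n
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess: a normal distribution gives 3
    return mean, skew, kurtosis


# Toy, symmetric F0 values in Hz; not taken from the data set.
mean_f0, skew_f0, kurt_f0 = distribution_moments([100.0, 110.0, 120.0, 130.0, 140.0])
```

For this symmetric toy sample the skew is zero; real long-term F0 distributions are typically positively skewed.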

download data

 

2015

'Multispeaker young female schwa formant pattern'. F1, F2 & F3 values & trajectory coefficients for the Australian English long schwa vowel from non-contemporaneous recordings of map-task interactions of young similar-sounding Australian female speakers. Used in Rose 2015: Forensic Voice Comparison with Monophthongal Formant Trajectories - A Likelihood Ratio-based Discrimination of "Schwa" Vowel Acoustics in a Close Social Group of Young Australian Females.

download data

 

2004

'Multispeaker General Australian female tense vowel formants'. F1, F2 & F3 values for the five tense quasi-monophthongal vowels (Wells's FLEECE, GOOSE, BATH/START/PALM, NURSE, THOUGHT lexical sets), from non-contemporaneous recordings of 20 General Australian female speakers. Measurements made by Elaine Winter for her 2009 Australian National University MA thesis “Forensic speaker comparison with Australian female voices: A likelihood ratio-based discrimination using F-Pattern”. Used in Rose & Winter 2010: Traditional Forensic Voice Comparison with Female Formants: Gaussian mixture model and multivariate likelihood ratio analysis.

download data

 

2019

'Multispeaker Japanese long vowel cepstral coefficients'. First 12 LPC cepstral coefficients (no CC0), both raw and cepstrally mean subtracted, for the five long Japanese vowels from non-contemporaneous telephone recordings of 297 Japanese male speakers. The recordings from which the data were extracted were made in ca. 1995 by Dr. Takashi Osanai of the Japanese National Research Institute of Police Science. His speakers were from 11 different prefectures all over Japan.

The raw CCs were extracted by Dr. Mehrdad Khodai-Joopari and used in his 2006 University of New South Wales Ph.D. thesis: Forensic speaker analysis and identification by computer. A Bayesian approach anchored in the cepstral domain.

I derived the vocalic cepstrally mean subtracted data using mean CCs from the whole of the speakers' recordings (not just their vowels). The cms-CC data are also indexed according to the prefectures and dialect sub-groups (after Shibatani) of the speakers, which can be used for dialectological analysis.
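The subtraction step can be illustrated as follows. This is a minimal sketch, not the code used to produce the data: it assumes each frame is a list of cepstral coefficients and that the mean vector is computed over a reference set of frames standing in for the whole recording. The function names and toy numbers are hypothetical.

```python
def mean_vector(frames):
    """Coefficient-wise mean over a list of equal-length cepstral frames."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cepstral_mean_subtract(frames, reference_frames):
    """Subtract the mean of reference_frames from every frame in frames."""
    mean = mean_vector(reference_frames)
    return [[c - m for c, m in zip(frame, mean)] for frame in frames]


# Toy two-coefficient frames; the whole "recording" supplies the mean,
# while only the first two frames play the role of vowel frames.
recording = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
vowel_frames = recording[:2]
cms_frames = cepstral_mean_subtract(vowel_frames, recording)
```

Taking the mean over the whole recording rather than over the vowel frames alone means the subtracted vowel values do not themselves average to zero, which mirrors the choice described above.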

The data were used in Rose 2013: More is better: Likelihood ratio-based forensic voice comparison with vocalic segmental cepstra frontends. The purpose of the paper was to demonstrate how automatic parameters like cepstral spectra can also be used to do likelihood ratio-based forensic voice comparison on segments, like vowels.

download data

 

2019

'Multispeaker Japanese sh cepstral coefficients'. First 12 cepstrally mean subtracted LPC cepstral coefficients (no CC0) for the spectra of voiceless alveolopalatal fricatives [ɕ] from non-contemporaneous telephone recordings of 99 Japanese male speakers. The recordings from which the data were extracted were made in ca. 1995 by Dr. Takashi Osanai of the Japanese National Research Institute of Police Science. His speakers were from 11 different prefectures all over Japan. Used in Rose (2011): Forensic Voice Comparison with Secular Shibboleths – a hybrid fused GMM-Multivariate likelihood-ratio-based approach using alveolo-palatal fricative cepstral spectra. The purpose of this paper was to test whether likelihood ratio-based forensic voice comparison could be done using information from voiceless fricative spectra.

download data