TY - JOUR
T1 - The Italian Roots in Australian Soil (IRIAS) multilingual speech corpus: speech variation in two generations of Italo-Australians
AU - Galatà, Vincenzo
AU - Avesani, Cinzia
AU - Best, Catherine T.
AU - Di Biase, Bruno
AU - Vayra, Mario
PY - 2022
Y1 - 2022
N2 - We present and describe the Italian Roots in Australian Soil (IRIAS) speech corpus. Following a sociophonetic approach, our aim is to extend and complement the frequently investigated macro-structures of lexical, syntactic and morphological interactions among immigrants' languages and common sociolinguistic investigations about immigrants' language attitudes. We first discuss and motivate the creation of the IRIAS corpus. We then focus on the specific methodological issues we addressed in compiling a corpus of natural spontaneous speech collected in Veneto or Calabrese dialects, Italian and English from first and second generation Italo-Australian speakers originating from two specific regions in Italy (Veneto and Calabria). A detailed description of the IRIAS corpus follows, including its design, collection procedure and processing. The latter focuses on novel manual and automatic solutions we implemented to overcome the challenging dearth of existing resources. These solutions help advance work on spontaneous speech data. We conclude by providing some insights on what has been achieved thus far as well as the analyses currently being carried out on subsets of the IRIAS corpus.
UR - http://hdl.handle.net/1959.7/uws:59580
U2 - 10.1007/s10579-021-09539-3
DO - 10.1007/s10579-021-09539-3
M3 - Article
SN - 1574-020X
VL - 56
SP - 37
EP - 78
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
IS - 1
ER -