This is a speech dataset of 3,467 short audio clips of a single speaker, Ms. Ông Siù-iông (王秀容), who was invited to record in 2018-2019. A transcription is provided for each clip. Clips vary in length from 0.2 to 69.8 seconds and have a total length of approximately 4 hours 44 minutes.
The dataset is designed to balance language contexts as far as possible, from prose articles to everyday conversation. The articles were published in 2015. Each sentence is given in both Hàn-jī (Chinese characters) and Lô-má-jī (romanization).
The audio is licensed under CC BY-SA. The transcriptions, however, may be used only for speech synthesis, whether academic or commercial, and the Hàn-jī lines of the transcriptions may not be modified. For any use other than speech synthesis, please contact the authors of the articles.
This project received a grant from Taiwan's Ministry of Culture (Bûn-huà-pōo) in 2018.
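To-guân bûn-huà beh siū lâng khíng-tīng, gí-giân tī kong líng-hi̍k ê thîng-hiān sī tsin tiōng-iàu ê khai-sí.
多元文化欲受人肯定,語言佇公領域的呈現是真重要的開始。
(English: For cultural diversity to be affirmed, the presence of languages in the public sphere is a very important beginning.)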
The manifest is named sui-siann.csv. Each record in the file corresponds to one audio clip, with fields separated by commas (,). It contains:
The recordings are available in two formats.
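As a minimal sketch of working with the comma-separated manifest sui-siann.csv, the Python below parses the records and sums the clip lengths. It assumes the file is UTF-8 with a header row and that the 長短 (length) column holds seconds; the `clip` column name in the usage note is purely hypothetical, and the real header may differ.

```python
import csv


def parse_manifest(lines):
    """Parse comma-separated manifest lines (header row first) into dicts."""
    return list(csv.DictReader(lines))


def total_seconds(rows, length_column="長短"):
    """Sum clip lengths; assumes the column stores seconds as numbers."""
    return sum(float(row[length_column]) for row in rows)


def load_manifest(path):
    """Read a manifest file from disk (assumed to be UTF-8 encoded)."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_manifest(f)
```

Calling `load_manifest("sui-siann.csv")` would return one dict per audio clip, and `total_seconds` on that list would give the overall duration if the assumptions above hold.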
| Statistic | Value |
| --- | --- |
| Number of audio clips | 3,467 |
| Total words (including punctuation) | 43,698 |
| Total syllables (including punctuation) | 62,355 |
| Distinct syllables | 2,048 |
| Distinct initial-final combinations (siann-ūn) | 957 |
| Total duration | 4:44:02 (17,042 s) |
| Mean clip length | 4.9 s |
| Shortest clip | 0.2 s |
| Longest clip | 69.8 s |
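The duration figures above can be cross-checked with a little arithmetic. The sketch below converts a number of seconds to H:MM:SS and derives the mean clip length from the total; the helper name `format_duration` is introduced here only for illustration.

```python
def format_duration(seconds):
    """Format a whole number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


clips = 3467          # number of audio clips
total_seconds = 17042  # total duration in seconds

print(format_duration(total_seconds))    # 4:44:02
print(round(total_seconds / clips, 1))   # 4.9
```

Both results agree with the reported total duration and mean clip length.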
| Item | Value |
| --- | --- |
| Location | Muscene Studio 瓦器錄音室 Studio C |
| Noise floor | -65 dB |
| RT60 | 25 ms |
| Microphone | Audio-Technica AT4050 |
| ADDA | Antelope Zen Studio |
| Preamp | Antelope Zen Studio |
| Gain | 33 |
| Dynamic range | -2 to -9 dB |
The audio is licensed under CC BY-SA 4.0. For the transcriptions, please observe the following:
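With respect and thanks to the many authors who agreed to license their works:
Thanks to Professors 謝舒凱, 馮怡蓁, and 邱振豪 of the Graduate Institute of Linguistics, National Taiwan University, and Professors 謝豐帆 and 張月琴 of the Institute of Linguistics, National Tsing Hua University, for their support. Many thanks to Muscene Studio 瓦器錄音室 for their guidance. Most important of all, thanks to the many senior colleagues who helped track down the authors' contact information.
Thanks also to Taiwan's Ministry of Culture (Bûn-huà-pōo) for its 語言多樣性友善環境補助 (grant for a language-diversity-friendly environment).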
0.2.1 (2021/03/12):
Found that version 0.2 contained 22 kHz audio; re-exported the data as 48 kHz, 24-bit.
Added the 長短 (Tn̂g-té, length) column.
3,467 files in total, duration 4:44:02 (17,042 s).
0.2 (2019/11/30):
Split all the material into sentences.
Same scope as version 0.1; every sentence was cut with Kaldi according to the recording script.
3,467 files in total, duration 4:44:02 (17,042 s).
0.1 (2019/5/14):
All the material from the recording studio.
2,897 files in total, duration 7:00:54 (25,254 s).
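Since version 0.2.1 exists because the earlier export turned out to be 22 kHz, it can be useful to confirm the sample rate and bit depth of a downloaded clip. The sketch below uses Python's standard `wave` module and assumes the clip is an uncompressed PCM WAV file; the file name in the usage note is hypothetical.

```python
import wave


def wav_format(source):
    """Return (sample_rate_hz, bits_per_sample, channels) of a PCM WAV.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as w:
        return w.getframerate(), w.getsampwidth() * 8, w.getnchannels()
```

For a clip from the 48 kHz, 24-bit export, `wav_format("clip.wav")` would be expected to start with `48000, 24`; the channel count is not stated in this document.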
CC BY-SA 4.0
We invited 瓦器錄音室 (Muscene Studio) to explain the ins and outs of recording this corpus. We hope it is of some help to producers and voice actors.