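Deteksi suara kecerdasan buatan berbahasa Indonesia dengan Cross-Lingual Speech Representation Wav2vec2 dan Convolutional Neural Network [Detection of Indonesian-language AI-generated voices using Cross-Lingual Speech Representation Wav2vec2 and a Convolutional Neural Network]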

Yunus, Ahmad Juaeni (2026) Deteksi suara kecerdasan buatan berbahasa Indonesia dengan Cross-Lingual Speech Representation Wav2vec2 dan Convolutional Neural Network. Sarjana thesis, UIN Sunan Gunung Djati Bandung.

Files:
Text (Cover): 1_Cover.pdf (335kB)
Text (Abstract): 2_Abstrak.pdf (332kB)
Text (Plagiarism-free certificate): 3_skbebasplagiarism.pdf (382kB)
Text (Table of contents): 4_Daftar ISI.pdf (339kB)
Text (Chapter I): 5_BAB I.pdf (742kB)
Text (Chapter II): 6_BAB II.pdf (711kB), restricted to registered users only
Text (Chapter III): 7_BAB III.pdf (790kB), restricted to registered users only
Text (Chapter IV): 8_BAB IV.pdf (2MB), restricted to registered users only
Text (Chapter V): 9_BAB V.pdf (296kB), restricted to registered users only
Text (Bibliography): 10_DaftarPustaka.pdf (294kB), restricted to registered users only
Text (Appendices): 11_Lampiran.pdf (249kB), restricted to repository staff only

Abstract

Advances in AI-based voice cloning and deepfake audio technology pose a real threat to digital security in Indonesia, where losses from fraud based on synthetic voices have reached Rp7.8 trillion since the end of 2024, while research on AI voice detection remains very limited for Indonesian as a low-resource language. This study aims to develop and evaluate an Indonesian-language AI voice detection model (Indo-AD) capable of automatically distinguishing authentic human voices from synthetic ones.

The methodology follows the CRISP-DM framework. The dataset comprises authentic voice recordings and synthetic voices generated with the ElevenLabs, FishAudio, and MinimaxAudio platforms. The audio is processed with a Self-Supervised Learning model based on XLSR-Wav2Vec2 as an acoustic representation extractor, combined with a Convolutional Neural Network (CNN) as a binary classifier, and the model is evaluated with the Leave-One-Speaker-Out (LOSO) scheme to test its generalization capability. Experimental results show that the Indo-AD model achieves an accuracy and F1-score of 90.5%, indicating that the Self-Supervised Learning approach is effective at extracting complex acoustic patterns from raw audio signals, with potential for implementation as an audio-based digital security system to detect Indonesian-language voice deepfakes.
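
The Leave-One-Speaker-Out evaluation and the two reported metrics can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `loso_folds` and `accuracy_and_f1`, the toy speaker IDs, and the convention that label 1 means "synthetic" are all assumptions made for illustration. The thesis's actual fold construction and metric averaging are not stated in the abstract.

```python
from collections import defaultdict


def loso_folds(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    Assumption: every clip carries a speaker ID, so all clips of the
    held-out speaker are excluded from training in that fold.
    """
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)
    for spk in sorted(by_speaker):
        test_idx = by_speaker[spk]
        train_idx = sorted(
            i for other, idxs in by_speaker.items() if other != spk for i in idxs
        )
        yield spk, train_idx, test_idx


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy data (hypothetical): six clips from three speakers.
speaker_ids = ["spk_a", "spk_a", "spk_b", "spk_b", "spk_c", "spk_c"]
folds = list(loso_folds(speaker_ids))

# Hypothetical predictions for one fold; 1 = synthetic, 0 = human.
acc, f1 = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
```

Because no speaker appears in both the training and test indices of a fold, a score obtained this way reflects performance on voices the classifier has not heard, which is the generalization property the abstract refers to.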

Item Type: Thesis (Sarjana)
Uncontrolled Keywords: Deepfake audio; Self-Supervised Learning; XLSR-Wav2Vec2; CNN; AI voice detection; Indonesian language
Subjects: Data Processing, Computer Science
Special Computer Methods > Artificial Intelligence
Special Computer Methods > Digital Audio
Divisions: Fakultas Sains dan Teknologi > Program Studi Teknik Informatika
Depositing User: Yunus Ahmad Juaeni
Date Deposited: 17 Jun 2026 07:12
Last Modified: 17 Jun 2026 07:12
URI: https://digilib.uinsgd.ac.id/id/eprint/132694
