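Deyana, Mochamad Rajib (2019) Document embedding menggunakan paragraph vector untuk clustering terjemahan ayat-ayat Al-Qur'an [Document embedding using paragraph vector for clustering translations of Qur'anic verses]. Sarjana (undergraduate) thesis, UIN Sunan Gunung Djati Bandung.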
Abstract
The rapid development of information technology produces very large and diverse data sets, and data mining is used to extract useful information from them. For text data, the objects of interest can be words, sentences, paragraphs, or documents, and text embedding converts these objects into numeric form. Data mining methods built on prediction-based embeddings have rarely been applied to religious texts, which is the main motivation of this study. Word2vec and paragraph vector are used to produce a numerical representation of Indonesian-language translations and interpretations (tafsir) of the Qur'an. On an analogy test consisting of 173,002 semantic questions and 208,920 syntactic questions, the word vectors produced by paragraph vector answer 47.04% and 56.75% of the questions correctly, respectively. Clustering is then used to group the document vectors obtained from the translated subjects of the Qur'an. CLARANS yielded 8 groups of Qur'anic subjects, the result with the best internal cluster measurements: a Silhouette Coefficient of 0.0965 and a Davies-Bouldin Index of 1.8038.
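The analogy test mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common vector-offset scheme (for a question "a is to b as c is to ?", pick the word whose vector is closest in cosine similarity to b - a + c); the abstract does not state which scoring rule the thesis used. The vectors, the English words, and the function names below are hypothetical toy values for illustration, not data or code from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def answer_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset rule.

    The three question words are excluded from the candidates,
    which is the usual convention for this kind of test.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(answer_analogy(vectors, a, b, c) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)


# Hypothetical 3-dimensional word vectors (assumption: toy values only).
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

# Hypothetical semantic questions in (a, b, c, expected) form.
TOY_QUESTIONS = [
    ("man", "king", "woman", "queen"),
    ("king", "queen", "man", "woman"),
]

if __name__ == "__main__":
    print(analogy_accuracy(TOY_VECTORS, TOY_QUESTIONS))
```

The percentages in the abstract (47.04% and 56.75%) correspond to this kind of accuracy, computed separately over the semantic and the syntactic question sets.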
Item Type: | Thesis (Sarjana) |
---|---|
Uncontrolled Keywords: | data mining; text embedding; Qur'an exegesis (tafsir Al-Qur’an); word2vec; paragraph vector; analogy test; clustering; CLARANS; internal cluster measurement |
Subjects: | Mathematics > Data Processing and Analysis of Mathematics; Applied mathematics > Programming Mathematics; Applied mathematics > Special Topics of Applied Mathematics |
Divisions: | Fakultas Sains dan Teknologi > Program Studi Matematika |
Depositing User: | Mochamad Rajib Deyana |
Date Deposited: | 09 Sep 2020 02:50 |
Last Modified: | 09 Sep 2020 02:50 |
URI: | https://digilib.uinsgd.ac.id/id/eprint/33167 |