Comparison of line detection accuracy of Min-Cost Flow Network and Tesseract on digital images based on different segmentation methods

Bahari, Andi Malia Fadilah (2023) Perbandingan akurasi deteksi baris Min-Cost Flow Network dan Tesseract pada citra digital berdasarkan perbedaan metode segmentasi. Sarjana thesis, UIN Sunan Gunung Djati Bandung.

Abstract

Optical Character Recognition (OCR) research has branched into separate focuses according to the type of image being processed, such as Natural Scene Image and Document Image. This specialization exists because each image type poses its own problems and therefore needs its own effective pipeline. Earlier research nevertheless includes traditional text detection methods that can handle both image types as long as the images are not distorted. These traditional systems rely on two approaches to detect candidate characters, Sliding Window and Connected Component Analysis, combined with a variety of segmentation methods. However, they have not been tested on Document Image datasets that contain a relatively large amount of text. One traditional-method study applied Min-Cost Flow Network (MCFN) to arrange candidate characters into lines, with Tesseract as the final filter on those candidates, to Document Images. That study reported lower results than earlier tests on Natural Scene Images, owing to problems in the character segmentation step. This research examines the text line detection accuracy of the traditional method using MCFN and Tesseract on infographics, integrating different segmentation methods to determine which one yields the best accuracy. The highest average accuracy was obtained with the Connected Component Labeling segmentation method: Recall 35.29%, Precision 69.10%, and F-Score 41.02%, with the TS parameter (the distance between characters) set to 2.
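For readers interpreting the reported figures, the following is a minimal sketch of how Precision, Recall, and F-Score are commonly computed per image and then averaged. It is not taken from the thesis: the function names and the match counts are hypothetical, and the abstract does not state how its averages were formed. The sketch shows that the mean of per-image F-Scores generally differs from the F-Score of the mean Precision and Recall. This would be consistent with the reported F-Score of 41.02% being lower than the harmonic mean of the reported Precision and Recall (about 46.7%), though that reading is an assumption.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, f_score) for one image from match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def macro_average(per_image_counts):
    """Average precision, recall, and F-score over images (one tuple each)."""
    scores = [precision_recall_f(*counts) for counts in per_image_counts]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical (tp, fp, fn) counts for two images; NOT data from the thesis.
counts = [(9, 1, 1), (1, 1, 9)]

avg_p, avg_r, avg_f = macro_average(counts)

# F-score computed from the averaged precision and recall, for comparison.
f_of_averages = 2 * avg_p * avg_r / (avg_p + avg_r)

print(f"mean precision = {avg_p:.4f}")
print(f"mean recall    = {avg_r:.4f}")
print(f"mean F-score   = {avg_f:.4f}")
print(f"F of the means = {f_of_averages:.4f}")
```

In this toy case the mean of the per-image F-Scores comes out lower than the F-Score of the averaged Precision and Recall, because an image with very low recall pulls its own F-Score down sharply.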

Item Type: Thesis (Sarjana)
Uncontrolled Keywords: Optical Character Recognition; Segmentation; Line Detection; MCFN; Tesseract
Subjects: Systems > Computer Modeling and Simulation
Data Processing, Computer Science > Computer and Human
Special Computer Methods > Computer Pattern Recognition
Applied Physics > Computer Engineering
Office Services > Records Management
Color and Related Technology
Divisions: Fakultas Sains dan Teknologi > Program Studi Teknik Informatika
Depositing User: Andi Malia Fadilah Bahari
Date Deposited: 29 Dec 2023 04:52
Last Modified: 29 Dec 2023 04:52
URI: https://digilib.uinsgd.ac.id/id/eprint/83679
