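Farzana, Aria Nalini (2026) Laporan Kerja Praktek Integrasi Fitur Text-to-Speech (TTS) Multilingual dalam Sistem Rekomendasi Ergonomis [Internship Report: Integration of a Multilingual Text-to-Speech (TTS) Feature into an Ergonomic Recommendation System]. Project Report. [s.n.], [s.l.]. (Unpublished)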
Text: 5025221016-Project_Report.pdf - Accepted Version (1MB). Restricted to repository staff only.
Abstract
Rapid advances in artificial intelligence are opening new opportunities for technology accessibility, including ergonomic recommendation systems that help reduce the risk of musculoskeletal disorders. The existing system presents its ergonomic recommendations as text only, which is less effective and less accessible when users are engaged in physical activity and need a hands-free experience. Multilingual support is also needed so that the information is conveyed more accurately and reaches a wider user demographic. This internship project focuses on developing a multilingual Text-to-Speech (TTS) module, integrated with automatic translation, that converts ergonomic recommendation text into audio in the user's chosen language. The methodology comprises problem formulation, literature study, analysis and design, implementation, and testing and evaluation. In the exploration stage, TTS approaches were compared, including trials of a locally trained model (SpeechT5) and cloud-based services, to assess voice quality, resource requirements, and ease of integration. The final backend is implemented in Flask as a pipeline that receives the recommendation text (the prototype uses dummy text), translates it with the GoogleTranslator API of the deep-translator library, and synthesizes speech with cloud-based TTS (Google AI Studio and ElevenLabs) or Edge TTS as provider options. Testing covered four languages, Indonesian (id), English (en), Japanese (ja), and Korean (ko), and recorded end-to-end performance. Edge TTS took 1456-5903 ms, Google AI Studio 1335-6221 ms, and ElevenLabs 3577-7768 ms. In the audio quality assessment on a 1-5 scale, Google AI Studio scored highest; Edge TTS already performed well in English, but its Indonesian output still tended to sound robotic.
The dummy system provides endpoints for generating audio and retrieving it from the output folder, and the audio is presented to the client through an interface. This modular design allows easy integration into the existing ergonomic recommendation system because both are built on the same framework.
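The translate-then-synthesize pipeline and its end-to-end timing can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the `TtsResult` record, and the `{provider}_{lang}.mp3` file naming are all hypothetical, and the translator and synthesizer are offline stand-ins so that the example runs without network access or third-party packages. In a real setup the translator could wrap deep-translator's `GoogleTranslator(source="auto", target=lang).translate(text)`, and each provider entry would call the corresponding TTS service.

```python
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict

# Hypothetical interfaces; the report does not publish its own.
Translator = Callable[[str, str], str]      # (text, target_lang) -> translated text
Synthesizer = Callable[[str, str], bytes]   # (text, lang) -> encoded audio bytes

# The four languages tested in the report.
SUPPORTED_LANGS = ("id", "en", "ja", "ko")


@dataclass
class TtsResult:
    provider: str
    lang: str
    audio_path: Path
    elapsed_ms: float


def run_pipeline(
    text: str,
    lang: str,
    provider: str,
    translate: Translator,
    providers: Dict[str, Synthesizer],
    output_dir: Path,
) -> TtsResult:
    """Translate the text, synthesize audio, save it, and time the whole run."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang}")
    if provider not in providers:
        raise KeyError(f"unknown TTS provider: {provider}")

    start = time.perf_counter()
    translated = translate(text, lang)
    audio = providers[provider](translated, lang)
    output_dir.mkdir(parents=True, exist_ok=True)
    audio_path = output_dir / f"{provider}_{lang}.mp3"  # assumed naming scheme
    audio_path.write_bytes(audio)
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    return TtsResult(provider, lang, audio_path, elapsed_ms)


# Offline stand-ins (assumptions): a real translator and TTS provider go here.
def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def fake_synthesize(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_pipeline(
            "Straighten your back and relax your shoulders.",
            "ja",
            "demo",
            fake_translate,
            {"demo": fake_synthesize},
            Path(tmp),
        )
        print(result.provider, result.lang, result.audio_path.name)
```

Keeping the providers in a dictionary of callables is one way to make them interchangeable, so the same timing code can be applied to each provider and language combination.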
| Item Type: | Monograph (Project Report) |
|---|---|
| Uncontrolled Keywords: | ergonomics, posture recommendations, multilingual, text-to-speech, Flask |
| Subjects: | T Technology > T Technology (General) > T56.8 Project Management; T Technology > T Technology (General) > T58.6 Management information systems |
| Divisions: | Faculty of Industrial Technology > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
| Depositing User: | Aria Nalini Farzana |
| Date Deposited: | 22 Jan 2026 00:58 |
| Last Modified: | 22 Jan 2026 00:58 |
| URI: | http://repository.its.ac.id/id/eprint/130036 |
