Model-Based Clustering with the Multivariate t-Distribution Using the Integrated Completed Likelihood and Minimum Message Length Criteria (Grouping Provinces in Indonesia by Labor Market Indicators, 2010-2015)

Agustini, Mety (2017) Model-Based Clustering dengan Distribusi T-Multivariat menggunakan Kriteria Integrated Completed Likelihood dan Minimum Message Length (Pengelompokan Provinsi Di Indonesia Menurut Indikator Pasar Tenaga Kerja Tahun 2010-2015). Masters thesis, Institut Teknologi Sepuluh Nopember.

1315201717-Master-Theses.pdf - Published Version


Abstract

Cluster analysis is a widely used statistical tool for identifying groups within a data set. The most commonly used clustering methods are based on distance measures. However, distance-based grouping becomes very difficult when the objects overlap. This study proposes a model-based clustering (MBC) approach built on finite mixture models. The method assumes that the data are generated from several probability distributions, with each resulting cluster represented by one of those distributions.
The multivariate t-distribution is used in model-based clustering to accommodate outliers, as it is considered better suited to handling them than the multivariate normal distribution. The best model among the candidate models is selected using two criteria: integrated completed likelihood (ICL) and minimum message length (MML). The optimal MBC-ICL grouping is used to analyze the Indonesian labor market based on the employment-by-industry indicator (data subset k2 0815). The optimal RMBC-MML grouping is used for the analysis based on the EPR, vulnerable employment, and informal-sector employment indicators (data subset k5 0815).
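
As an illustration of how an ICL-type criterion ranks candidate mixture models, the following is a minimal sketch and not the thesis's implementation. It assumes the common BIC-style approximation in which ICL equals BIC minus twice the entropy of the posterior membership probabilities, with BIC taken as 2 log L - k log n so that larger values are better. The function name `icl_bic`, the toy membership matrices, and the log-likelihood and parameter counts are all hypothetical.

```python
import math


def icl_bic(log_lik, n_params, responsibilities):
    """BIC-style approximation of ICL (assumed convention: larger is better).

    log_lik          -- maximized mixture log-likelihood
    n_params         -- number of free parameters in the fitted mixture
    responsibilities -- one row per observation, holding the posterior
                        probability of belonging to each cluster
    """
    n = len(responsibilities)
    bic = 2.0 * log_lik - n_params * math.log(n)
    # Entropy of the soft assignments: near 0 when clusters are well
    # separated, large when observations sit between clusters.
    entropy = -sum(
        t * math.log(t) for row in responsibilities for t in row if t > 0.0
    )
    return bic - 2.0 * entropy


# Hypothetical toy inputs: same fit quality, different cluster separation.
crisp = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03], [0.01, 0.99]]
fuzzy = [[0.55, 0.45], [0.50, 0.50], [0.60, 0.40], [0.45, 0.55]]

icl_crisp = icl_bic(log_lik=-10.0, n_params=5, responsibilities=crisp)
icl_fuzzy = icl_bic(log_lik=-10.0, n_params=5, responsibilities=fuzzy)
```

Under this convention the model with well-separated clusters scores higher than the one with overlapping clusters even though both have the same log-likelihood, which is the sense in which ICL favors partitions that are clearly assigned. Other sources flip the sign so that smaller is better, or replace the soft entropy with hard maximum a posteriori assignments.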

Item Type: Thesis (Masters)
Uncontrolled Keywords: integrated completed likelihood; key indicators of the labor market; minimum message length; model-based clustering; multivariate t distribution
Subjects: H Social Sciences > HA Statistics
Q Science > QA Mathematics > QA278 Cluster Analysis
Divisions: Faculty of Mathematics and Science > Statistics > (S2) Master Theses
Depositing User: - METY AGUSTINI
Date Deposited: 10 Mar 2017 04:47
Last Modified: 10 Mar 2017 04:47
URI: http://repository.its.ac.id/id/eprint/3514
