Browsing by Author "Gunawardana, A."

A comparison of distance-based and model-based clustering methods
(Faculty of Science, University of Kelaniya, Sri Lanka, 2023) Nadeekantha, H. A. D. D.; Kavinga, H. W. B.; Gunawardana, A.; Dissanayaka, D. M. P. V.
Most statistical techniques assume that the sample data are homogeneous. However, real-world samples are not always homogeneous. The existence of subgroups within a population leads to non-homogeneity of the sample. In this case, it is not accurate to model the population using a single probability distribution. Hence, it is essential to check the homogeneity of the sample. Clustering, an unsupervised learning technique, is used to discover a population's subgroups and assign each observation to a specific cluster. Clustering algorithms can be broadly divided into two groups, namely model-based and distance-based algorithms. Model-based algorithms assume a probability distribution for clustering, while distance-based algorithms use a distance metric to assign observations to clusters. The literature suggests, on the basis of summary statistics and visualizations, that model-based clustering methods perform better than distance-based methods. In this study, an inference-based procedure was used to assess this claim. To compare the performance of model-based and distance-based algorithms, an extensive simulation study was conducted. In the simulation study, two univariate Gaussian mixtures with different parameter settings (mean, standard deviation, and sample size) were combined to generate a non-homogeneous sample. Then, model-based and distance-based algorithms were applied to the same simulated datasets with different cluster structures, for which the actual cluster memberships were known. Further, the effect of the bimodality conditions of Gaussian mixtures on both clustering methods was examined. To assess the performance of the two methods, two measures were computed: the ability to identify the correct number of clusters, termed Cluster Identification Ability (CIA), and the ability to assign observations to their correct clusters (clustering accuracy).
CIA was computed as the percentage of iterations that identified the correct number of clusters, and clustering accuracy was measured using the Adjusted Rand Index (ARI). For most of the simulation settings, both methods required a sample size of less than 200 to achieve high clustering accuracy (a mean ARI of approximately 0.8). For example, a simulation setting with a mean difference of 3.1 and a standard deviation of 0.5 required sample sizes of 20 and 10 for the model-based and distance-based methods, respectively. The minimum sample size needed for high clustering accuracy varies by method, although in some cases the two are approximately the same. The inference-based study, performed using the paired Wilcoxon signed-rank test, indicated that the claim “the model-based method outperforms the distance-based method, or both perform similarly” is valid 82.7% of the time at the 5% level of significance. In conclusion, the CIA and clustering ability of the model-based method increased with sample size when the mixture satisfied the bimodality conditions. For the distance-based method, both abilities decreased as the sample size increased when the sample did not satisfy the bimodality conditions.
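The accuracy measure described above can be illustrated with a minimal Python sketch. It is not the authors' code: it computes the ARI from the standard pair-counting formula and runs a small simulation for the distance-based side only, using a two-cluster one-dimensional k-means. The function names, the equal split between the two components, the number of repetitions, and the seed are all assumptions made for illustration; the mean difference of 3.1 and standard deviation of 0.5 are taken from the abstract.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of the same observations."""
    n = len(truth)
    cells = Counter(zip(truth, pred))            # contingency table cells
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                    # degenerate: both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def simulate_mixture(n_per_group, mean_diff, sd, rng):
    """Draw n_per_group points from N(0, sd) and n_per_group from N(mean_diff, sd).

    ASSUMPTION: equal component sizes; the abstract does not state the split.
    """
    data = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    data += [rng.gauss(mean_diff, sd) for _ in range(n_per_group)]
    labels = [0] * n_per_group + [1] * n_per_group
    return data, labels


def kmeans_1d(data, iters=50):
    """Two-cluster Lloyd's algorithm in one dimension (a distance-based method)."""
    c0, c1 = min(data), max(data)                # deterministic start at the extremes
    for _ in range(iters):
        assign = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in data]
        g0 = [x for x, a in zip(data, assign) if a == 0]
        g1 = [x for x, a in zip(data, assign) if a == 1]
        new0, new1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new0, new1) == (c0, c1):             # centres stopped moving
            break
        c0, c1 = new0, new1
    return assign


# Mean ARI over repeated samples of total size 10 (5 per component).
rng = random.Random(1)
aris = []
for _ in range(200):
    data, truth = simulate_mixture(5, mean_diff=3.1, sd=0.5, rng=rng)
    aris.append(adjusted_rand_index(truth, kmeans_1d(data)))
mean_ari = sum(aris) / len(aris)
print(round(mean_ari, 3))
```

A model-based counterpart would fit a Gaussian mixture to the same samples and score its labels with the same `adjusted_rand_index` function, which is what makes a paired comparison of the two methods possible.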
Correlation Between the Teachers’ and Students’ Motivation in Second Language Achievement: A Review of Literature
(Faculty of Graduate Studies, University of Kelaniya, Sri Lanka, 2016) Jayarathne, P.; Gunawardana, A.
In the domain of second language (L2) learning achievement, motivation is considered a key determinant. Over the past decades, numerous studies have been conducted in the educational milieu and a considerable number of models have been presented, focusing mainly on students’ motivation. However, this raises the question of whether students’ motivation alone is sufficient for the successful acquisition of L2. As Deci and Ryan (1985) claim, to develop intrinsic motivation in learners, which leads to better language performance, the learners need to perceive the learning environment to be “informational” rather than “controlling”, and the learning context has to be autonomy-supporting in that it facilitates self-determination on the part of the learner. It follows that teachers play a vital role in creating a conducive learning environment. Accordingly, the present study is based on the hypothesis that teachers’ motivation is a predominant variable in L2 achievement because teachers’ autonomous motivation towards teaching predicts students’ autonomous motivation towards learning. The key objectives of this study are to review the limited literature available on social-contextual conditions that have an impact on teachers’ motivation and to analyse the correlation between teachers’ and students’ motivation in L2 achievement. The literature indicates that teachers are mostly motivated by factors such as student achievement, teachers’ perception of their status in society, a positive atmosphere in school, constructive evaluation, a sense of self-fulfillment, and effective administration and management. Further, previous studies affirm that teachers’ motivational teaching practice leads to improved levels of L2 achievement.
Consistent with these findings, it is concluded that autonomously motivated teachers stimulate their learners towards learning, and that when teachers are more supportive of autonomy and less controlling, students demonstrate higher levels of intrinsic motivation and self-determination.
Estimating COVID-19 prevalence in Sri Lanka
(Faculty of Science, University of Kelaniya, Sri Lanka, 2023) Erandi, J. D. T.; Liyanage, U. P.; Gunawardana, A.
Throughout the ages, humanity has had to face numerous crises and diseases. Among them, COVID-19 can be considered one of the most fatal diseases ever, and it has caused significant damage to the entire world. Moreover, due to the nature of the modes of virus transmission, controlling COVID-19 infection among people is a challenging task, and the spread of the virus therefore still persists globally, albeit with less severity. Hence, an effective and accurate control measure is essential. The profile of coronavirus progression in a sub-region can change due to numerous factors such as population density, public mobility, and available health facilities. Thus, at any given time, the prevalence of the virus is highly likely to differ across sub-regions. This study attempts to construct a suitable sampling design to capture the prevalence of COVID-19 by modifying the stratified sampling technique so that the estimated sample size adapts to the changing population of infected cases. This adaptation is essential because an increase in infected cases boosts the spread of the virus, and standard sampling techniques do not address such dynamic population conditions when determining the sample size. Further, the study bridges the gap between the reported and actual infections per day, thereby giving accurate estimates of virus distribution and prevalence. The coronavirus progression over a region has a skewed pattern, which should also be considered in the weight allocation method. Thus, the weights are determined based on the first derivative of reported infected cases, which reflects the recent dynamics of the infected cases. Consequently, larger weights were assigned when the virus progression increased, and smaller weights were assigned when it decreased. After that, the sample size for each sub-region was calculated by the modified stratified sampling method.
To illustrate the accuracy of the sampling design, simulated data from different epidemic scenarios, such as community spread, cluster spread, and border spread, were used. This simulation allowed us to test the robustness of the technique for different states of the virus progression based on the infected cases. The sample size obtained through this dynamic sampling technique exhibits a direct correlation with the fluctuations in the number of infected cases, increasing as cases rise and decreasing as they decline. In conclusion, the study yields a novel sampling technique that is sensitive to the dynamic nature of population sizes and can be applied straightforwardly to real-world data as well. Thus, this modified stratified sampling technique can be considered an accurate sampling technique for capturing the actual prevalence of COVID-19.
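The idea of tilting a stratified allocation by the recent trend in reported cases can be sketched as follows. The abstract does not give the weight formula, so the tilt factor used here (one plus the average daily change divided by the average case level, with a lower bound) is purely an assumption for illustration, as are the function names, the `floor` parameter, and the example figures.

```python
def first_derivative(cases):
    """Average backward difference of daily reported cases (cases per day)."""
    diffs = [b - a for a, b in zip(cases, cases[1:])]
    return sum(diffs) / len(diffs)


def dynamic_allocation(total_n, populations, recent_cases, floor=0.1):
    """Split total_n across sub-regions, tilting proportional allocation by trend.

    ASSUMPTION: the tilt factor 1 + slope / level is hypothetical; the
    abstract only says that weights follow the first derivative of cases.
    """
    raw = {}
    for region, pop in populations.items():
        cases = recent_cases[region]
        slope = first_derivative(cases)
        level = max(sum(cases) / len(cases), 1.0)    # guard against division by zero
        raw[region] = pop * max(1.0 + slope / level, floor)
    total = sum(raw.values())
    # Rounding means the parts may differ from total_n by a unit or two.
    return {region: round(total_n * w / total) for region, w in raw.items()}


# Two equally sized sub-regions: cases rising in A, falling in B.
populations = {"A": 1000, "B": 1000}
recent_cases = {"A": [10, 20, 30, 40], "B": [40, 30, 20, 10]}
allocation = dynamic_allocation(500, populations, recent_cases)
print(allocation)
```

Under this assumed rule the sub-region with rising cases receives the larger share even though both populations are equal, which mirrors the behaviour the abstract describes.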
Nonparametric multiple comparisons and simultaneous confidence intervals for multivariate designs
(4th International Research Symposium on Pure and Applied Sciences, Faculty of Science, University of Kelaniya, Sri Lanka, 2019) Gunawardana, A.; Konietschke, F.
Over the last half-century, the use of multivariate designs has grown rapidly in many scientific disciplines. Such designs can have more than two possibly correlated response variables (endpoints) observed on each experimental unit and should allow comparisons across different treatment groups. Existing parametric tests in multivariate data analysis are based on the assumption that the observations follow multivariate normal distributions with equal covariance matrices across the groups. Such assumptions, however, are impossible to justify in real observations, e.g., for skewed data or ordered categorical data. In fact, existing parametric methods that rely on the assumption of equal covariance matrices tend to be highly liberal or conservative when the covariance matrices of the different groups actually differ. Therefore, a nonparametric approach that remains valid when the covariance matrices differ, even under the null hypothesis of no treatment effect, is desirable. In this study, purely nonparametric methods that overcome the existing gaps have been introduced. The procedures are robust in the sense that they assume neither a specific data distribution nor identical covariance matrices across the treatment groups, and flexible in the sense that the inference method can be adjusted to specific research questions; in particular, the methods are consonant, coherent, and compatible. To test hypotheses formulated in terms of purely nonparametric treatment effects, pseudo-rank based multiple tests are derived. The results are achieved by computing the distribution of normalized rank-means under general but fixed alternatives. Instead of quadratic forms, t-test type statistics are used as test statistics, and their asymptotic joint distribution has been computed in closed form.
Small-sample approximations based on a method-of-moments multivariate t-approximation achieve accurate control of the multiple type-I error rate and power comparable to existing global testing procedures. To illustrate the application of the proposed tests, a part of an immunotoxicity study on the effects of silicone on the immune system is considered. The study involved three treatment groups of mice, and five clinical chemistry endpoints were measured on each mouse after the treatment. To answer the main question, that is, quantifying (significant) differences between the treatment groups under each endpoint in order to draw biological conclusions on the effects of silicone, the multiple hypotheses are tested using many-to-one comparisons.
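For readers unfamiliar with pseudo-ranks, the following minimal Python sketch shows how they and the associated relative effects can be computed for a single endpoint. It assumes the usual definition, in which each observation is compared against the unweighted mean of the group-wise empirical distribution functions; the abstract does not spell out its effect definition, so this choice, the function names, and the toy data are assumptions. In a multivariate design the same computation would be repeated per endpoint.

```python
def pseudo_ranks(groups):
    """Pseudo-ranks for one endpoint; groups is a list of samples (lists of numbers)."""
    a = len(groups)
    n_total = sum(len(g) for g in groups)

    def count(x, y):
        # normalized count function: 1 if y < x, 1/2 if tied, 0 otherwise
        return 1.0 if y < x else 0.5 if y == x else 0.0

    def unweighted_cdf(x):
        # unweighted mean of the groups' normalized empirical distribution functions
        return sum(sum(count(x, y) for y in g) / len(g) for g in groups) / a

    return [[0.5 + n_total * unweighted_cdf(x) for x in g] for g in groups]


def relative_effects(groups):
    """Estimated relative effect of each group with respect to the unweighted mean."""
    n_total = sum(len(g) for g in groups)
    ranks = pseudo_ranks(groups)
    return [(sum(r) / len(r) - 0.5) / n_total for r in ranks]


# Toy example: three groups with unequal sizes.
samples = [[1.2, 0.8, 1.5], [2.1, 1.9], [3.4, 2.8, 3.0, 3.9]]
print(relative_effects(samples))
```

Because every group contributes equally to the reference distribution, the estimated effects do not depend on how the total sample size is divided among the groups, unlike effects based on ordinary ranks. A value of 0.5 indicates no tendency towards larger or smaller values than the reference.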
Wild bootstrapping rank-based procedure: Multiple testing on multivariate data
(Faculty of Science, University of Kelaniya, Sri Lanka, 2020) Gunawardana, A.; Konietschke, F.
Multivariate data occur in many scientific applications, for example in agriculture, biology, clinical studies in medicine, or the social sciences. They arise when two or more possibly correlated response variables are measured on the same experimental unit. In addition, the experimental units in a study design might be stratified into several treatment groups. Such a design is called a multivariate factorial design and should allow comparisons across different treatment groups. In statistical practice, the evaluation of a multivariate factorial design includes not only the question of whether there is a treatment effect between the groups in any of the responses but also, if such an effect is observed, between which groups and in which responses the differences exist. That is, testing the global null hypothesis (all treatment groups have the same effect across all responses) alone is not sufficient; multiple comparisons between the treatment groups are also of practical importance. To date, the available nonparametric methods of multivariate analysis test hypotheses formulated in terms of the distribution functions of the data and thus assume identical covariance matrices across the groups. Moreover, they cannot provide adjusted p-values and compatible simultaneous confidence intervals (SCIs) for the multiple tests. In the present work, rank-based tests that overcome the existing gaps have been derived to test hypotheses formulated in terms of purely nonparametric treatment effects. Thus, the new approaches can be used for testing the global null hypothesis as well as for performing multiple comparisons and for computing compatible SCIs. Due to the complexity of multivariate factorial designs and the small sample sizes common in statistical practice, small-sample approximations of the test statistics are of particular importance.
Therefore, a modern resampling method, namely a wild bootstrap approach, has been introduced. It can be seen from the resampling algorithm that the resampling version of the test statistic does not require the estimation of the correlation matrix of the test statistics. Also, the critical values from the resampling distribution are used in the construction of rank-based multiple contrast tests and SCIs. The asymptotic validity of the wild bootstrap approach has been derived, and its behavior was analyzed in an extensive simulation study in which different data distributions with different covariance structures and sample sizes were considered. The simulation results show that the wild bootstrap method tends to be more robust, controls the multiple type-I error rate quite accurately, and has power comparable to rank-based MANOVA-type tests in all the investigated scenarios. Furthermore, a real data example illustrates the application of the proposed tests.
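The general mechanism of a wild bootstrap for a maximum-type statistic can be sketched in a few lines of Python. This is a simplified illustration and not the authors' procedure: it works on raw centred observations instead of ranks, compares only two groups, and uses Rademacher (plus or minus one) multipliers. The function names, the simulated data, and the default number of resamples are assumptions. What it does show is why no correlation matrix needs to be estimated: each unit's whole response vector is multiplied by one random sign, so the dependence between endpoints is reproduced automatically in every resample.

```python
import random
from math import ceil, sqrt


def t_stats(group_a, group_b):
    """Welch-type t statistic per endpoint; a group is a list of unit vectors."""
    stats = []
    for j in range(len(group_a[0])):
        parts = []
        for group in (group_a, group_b):
            col = [u[j] for u in group]
            mean = sum(col) / len(col)
            var = sum((x - mean) ** 2 for x in col) / (len(col) - 1)
            parts.append((mean, var / len(col)))
        (ma, sa), (mb, sb) = parts
        stats.append((ma - mb) / sqrt(sa + sb))
    return stats


def center(group):
    """Subtract the group's endpoint means from every unit vector."""
    d = len(group[0])
    means = [sum(u[j] for u in group) / len(group) for j in range(d)]
    return [[u[j] - means[j] for j in range(d)] for u in group]


def wild_bootstrap_critical_value(group_a, group_b, alpha=0.05, reps=1000, seed=0):
    """(1 - alpha) quantile of the resampled max |t| over all endpoints."""
    rng = random.Random(seed)
    ca, cb = center(group_a), center(group_b)

    def flip(group):
        # one Rademacher sign per unit, shared by all of its endpoints
        signs = [rng.choice((-1.0, 1.0)) for _ in group]
        return [[s * x for x in u] for s, u in zip(signs, group)]

    maxima = sorted(
        max(abs(t) for t in t_stats(flip(ca), flip(cb))) for _ in range(reps)
    )
    return maxima[min(ceil((1 - alpha) * reps), reps) - 1]


# Illustrative data only: two groups of 15 units with 3 endpoints each.
data_rng = random.Random(42)
group_a = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(15)]
group_b = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(15)]
crit = wild_bootstrap_critical_value(group_a, group_b)
observed = max(abs(t) for t in t_stats(group_a, group_b))
print(observed > crit)  # True would reject the global null at level alpha
```

The same critical value can be reused for every endpoint, which is how a single resampling distribution yields both multiplicity-adjusted decisions and simultaneous confidence intervals in max-type procedures.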