International Conference on Applied and Pure Sciences (ICAPS)

Permanent URI for this community: http://repository.kln.ac.lk/handle/123456789/21779

Now showing 1 - 10 of 14

Introducing a novel hybrid algorithm to resolve class imbalance problem for binary classification in two-dimensional space
(Faculty of Science, University of Kelaniya Sri Lanka, 2024) Madhuwanthi, U. S. P.; Chandrasekara, N. V.
Classification is the task of categorizing data into predefined classes based on their features. The class imbalance problem (CIP), in which instances are unevenly distributed across the classes of the response variable, affects many real-world datasets. Typically, the minority (positive) class, which is often the class of interest, has significantly fewer instances than the majority (negative) class. This imbalance biases predictions towards the majority class. Techniques such as oversampling, under-sampling, and hybrid methods can be used to handle CIP. Oversampling increases the number of minority class instances by either duplicating existing instances or generating synthetic examples, while under-sampling reduces the number of majority class instances. However, oversampling alone causes data replication, while under-sampling alone causes loss of valuable information. The objective of the study is to propose a novel hybrid resampling technique to handle CIP that overcomes these disadvantages. The study focuses on binary classification problems, in which the target variable has only two classes. The proposed algorithm applies oversampling and under-sampling to the imbalanced data together, leveling the number of instances in both the majority and minority classes to half the size of the original dataset using a quartile-based approach. The proposed technique is evaluated on the Pima Indian Diabetes medical dataset, which has an imbalanced class distribution.
Logistic regression was employed to identify the two most influential variables for testing in two-dimensional space. A Support Vector Machine (SVM) with a polynomial kernel, one of the simplest kernel functions, was used as the classifier, with an 85%/15% training-testing split. Accuracy, recall, precision, and F-measure were used to assess the effectiveness of the approach. The proposed algorithm was compared with the oversampling techniques ROS, SMOTE, and ADASYN; the under-sampling techniques RUS, NCL, and Tomek Links; and the hybrid technique SMOTETomek. Recall was averaged over 100 iterations. The proposed algorithm obtained the highest average recall, 86.96%, compared with 42% for ROS, 42.57% for SMOTE, 47.1% for ADASYN, 40.7% for RUS, 73.7% for NCL, 27.46% for Tomek Links, and 49.95% for SMOTETomek. The experimental results demonstrate significant improvements in classification performance with the proposed algorithm compared to existing oversampling, under-sampling, and hybrid techniques for handling class imbalance. Future studies will extend this work to multi-class classification problems and increase the number of explanatory variables.
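The leveling step described above can be sketched as follows. This is only an illustration of resampling both classes to half the original dataset size: rows are chosen at random here, which is an assumption, because the abstract does not describe the quartile-based selection rule. The function names `hybrid_resample` and `recall` are hypothetical.

```python
import random


def hybrid_resample(majority, minority, seed=0):
    """Level both classes to half the size of the original dataset.

    Assumption: rows are picked at random. The study's quartile-based
    selection is not described in the abstract and is NOT reproduced here.
    Expects len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2
    # Under-sample the majority class without replacement.
    new_majority = rng.sample(majority, target)
    # Over-sample the minority class: keep every original row, then
    # duplicate randomly chosen rows until the target size is reached.
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return new_majority, minority + extra


def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


# Toy data (made up): 8 majority rows and 2 minority rows.
toy_majority = [("neg", i) for i in range(8)]
toy_minority = [("pos", i) for i in range(2)]
balanced_majority, balanced_minority = hybrid_resample(toy_majority, toy_minority)
```

With the toy data, each class ends up with five rows, half of the original ten.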
Modeling and forecasting global oil price on Sri Lankan inflation rate
(Faculty of Science, University of Kelaniya Sri Lanka, 2024) Priyadarshana, D. A. D. S.; Wijesekara, J. M. C. D.; Chandrasekara, N. V.
Inflation serves as a key indicator of overall economic well-being. It is the continuing increase in the general level of prices for goods and services over time. Moderate inflation may indicate high economic growth, while high inflation is usually damaging to both long-term economic growth and financial stability. Since 1977, Sri Lanka has been under continuous inflationary pressure due to power outages, energy shortages, reduced production in the agricultural sector, and other factors. Furthermore, world oil market prices have been fluctuating, owing to changes in crude oil taxes, refining and transport costs, and other related factors. All these dynamics bear directly on Sri Lanka's inflation. Policymakers, corporate leaders, and the general public therefore need to understand the dynamics of inflation. This study addresses a research gap: the impact of Gasoline Unl 92 (PATROL) and Gasoil 500ppm (DIESEL) prices in the Singapore World Oil Market on the inflation rate in Sri Lanka (NCPI). Previous studies have not covered this, which is the novelty of this research. The main objective of this study is to develop a predictive model illustrating the influence of global oil prices on the inflation rate in Sri Lanka. Monthly data were gathered from the Central Bank of Sri Lanka and Singapore Platts, covering the period from January 2015 to December 2021. The Pearson correlation matrix showed moderate relationships between the inflation rate and the prices of Gasoline Unl 92 and Gasoil 500ppm. All the time series variables were made stationary through log transformation and first differencing, and stationarity was checked with the ADF, PP, and KPSS tests.
Residual diagnostics for the time series regression model showed no violation of the assumptions: there was no multicollinearity, autocorrelation, serial correlation, or heteroscedasticity, and the residuals were normally distributed. The final predictive model, Δlog(NCPI)_t = 0.0040 + 0.3453 Δlog(NCPI)_{t-1} + 0.4247 Δlog(NCPI)_{t-3} + 0.0511 Δlog(DIESEL)_t + 0.0355 Δlog(PATROL)_t + 0.0283 Δlog(PATROL)_{t-1}, includes lagged terms of past inflation together with gasoline and gas oil prices. It proved quite accurate, with an RMSE of 1.729, an MAE of 1.289, and a MAPE of 0.901, and an R² of 53%. The model underlines the strong influence of world oil prices on Sri Lanka's inflation dynamics. As a limitation, the study considered only the global prices of petrol and diesel as determinants of Sri Lanka's inflation rate; other factors were not included. The findings can help policymakers devise appropriate strategies for managing inflation in the economy. Future researchers can improve this model using different methodological approaches and by considering further aspects of the global oil market.
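As a worked illustration, the reported model can be evaluated for a single month as below. The coefficients are taken from the abstract; the function names, the use of the natural logarithm, and the input values are assumptions made for this sketch only.

```python
import math


def dlog(current, previous):
    """First difference of the log series (natural log assumed)."""
    return math.log(current) - math.log(previous)


def predict_dlog_ncpi(ncpi_lag1, ncpi_lag3, diesel, petrol, petrol_lag1):
    """One-step prediction of dlog(NCPI)_t from the reported coefficients.

    All arguments are log-differenced values:
      ncpi_lag1   = dlog(NCPI) at t-1
      ncpi_lag3   = dlog(NCPI) at t-3
      diesel      = dlog(DIESEL) at t
      petrol      = dlog(PATROL) at t
      petrol_lag1 = dlog(PATROL) at t-1
    """
    return (0.0040
            + 0.3453 * ncpi_lag1
            + 0.4247 * ncpi_lag3
            + 0.0511 * diesel
            + 0.0355 * petrol
            + 0.0283 * petrol_lag1)


def to_level(previous_ncpi, predicted_dlog):
    """Convert a predicted log difference back to an index level."""
    return previous_ncpi * math.exp(predicted_dlog)


# Made-up inputs, for illustration only.
example = predict_dlog_ncpi(0.01, 0.0, 0.0, 0.0, 0.0)
```

When every input is zero, the prediction reduces to the intercept, 0.0040.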
An application of time series techniques to forecast the Open market weekly average retail price of lime in Sri Lanka
(Faculty of Science, University of Kelaniya Sri Lanka, 2023) Wickramarathne, R. A. S.; Wickramanayaka, M. P. A. T.; Mahanama, K. R. T. S.; Chandrasekara, N. V.
Limes are known for their acidic and tangy flavour and are commonly used in cooking, as a garnish, or to add flavour to drinks. The lime market in Sri Lanka is highly volatile, with prices fluctuating significantly on a weekly basis. The main objective of this study is to forecast the weekly lime price in Sri Lanka. Even though some research has been conducted on forecasting fruit prices in Sri Lanka, there is currently a lack of research on forecasting lime prices. The weekly price of lime from the 1st week of January 2010 to the 3rd week of February 2023 was considered for this study (632 observations). The first 600 observations were used as the training set and the remaining 32 observations as the testing set. The time series plot of the weekly lime price indicates a slight upward trend and a non-constant variance with a seasonal pattern. The presence of a seasonal pattern motivated the development of a Seasonal Autoregressive Integrated Moving Average (SARIMA) model. Among the candidate models compared using Akaike’s Information Criterion (AIC), ARIMA(1,1,2)(0,1,1)[24] gave the minimum AIC value (-1.125469). The assumptions on autocorrelation and heteroscedasticity were not violated, but the normality assumption was violated. Although its performance measures were very low, ARIMA(1,1,2)(0,1,1)[24] was identified as the better model, with a mean absolute error of 40.799, a mean absolute percentage error of 7.543, and a root mean squared error of 49.793. The results obtained from this analysis would be helpful to mitigate price risks and uncertainties in the lime industry.
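The three error measures reported above follow their standard definitions, sketched below. The function names and the sample numbers are hypothetical; they are not data from the study.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Made-up prices and forecasts, for illustration only.
actual_prices = [100.0, 200.0]
forecast_prices = [110.0, 180.0]
```

For these two made-up points, the absolute errors are 10 and 20, so the MAE is 15 and the MAPE is 10%.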
The effect of food commodity price fluctuation on inflation in Sri Lanka
(Faculty of Science, University of Kelaniya Sri Lanka, 2023) Nadeekantha, H. A. D. D.; Lakshitha, W. A. D. M.; Chandrasekara, N. V.
In Sri Lanka, the intersection of inflation and food price fluctuations holds profound significance, affecting not only the nation's economic stability but also the daily lives of its citizens. While existing research has extensively focused on the impact of rice prices on inflation, no published studies have been found that specifically investigate the influence of fluctuations in vegetable and fish commodity prices on inflation. Hence, there is a research gap in understanding the effect of these price fluctuations on inflation. The objective of this research is therefore to examine the effect of price fluctuations in the most-consumed vegetable and fish commodities on inflation using suitable techniques. The study focuses on key commodities, including beetroot, cabbage, potato, and various fish types (Seer, Mullet, Kelawalla, and Hurulla). Monthly data from January 2014 to June 2022, sourced from the Central Bank of Sri Lanka and the Department of Census and Statistics, were used for the analysis, with no missing values. Inflation was measured using the National Consumer Price Index (NCPI). Since all the monthly time series of fish prices, vegetable prices, and NCPI were non-stationary, the first difference of the logarithm was taken for every series, and the stationarity of the transformed series was confirmed by both graphical and theoretical techniques. The lag structures for the fish and vegetable models were then investigated to find the optimal lag lengths. Based on the optimal lag length, the cointegration test for both models indicated long-run relationships among several of the time series. Hence, two Vector Error Correction (VEC) models were fitted, one for each group of food commodity prices (fish and vegetables), since VEC models are well suited to examining the relationships between food commodity prices and inflation over time. Strong cointegration relationships were identified within these two groups.
According to the VEC Granger causality test, beetroot, cabbages and potatoes Granger-cause NCPI, but cabbages and the other selected fish types do not. The impulse response function was used to study the impact on inflation. Price shocks to the Hurulla fish type have a significantly more positive impact on inflation than shocks to the other fish types, Seer, Mullet, and Kelawalla. Beetroot price shocks have a significantly more positive impact on inflation than shocks to the other vegetable types, potatoes, tomatoes, and cabbage. In the model fitted for fish prices, the forecast error variance decompositions show that the percentage of forecasting errors for NCPI increases over time for each type of fish. In the model fitted for vegetable prices, the percentage also increases with time, but it remains smaller than for fish. Sri Lanka needs effective strategies and policies to mitigate the challenges of unstable inflation, and understanding the effect of price fluctuations on inflation empowers policymakers to craft targeted strategies that mitigate the impact of inflation on daily life.
Study on tension detection and acceptance of glove liners
(Faculty of Science, University of Kelaniya Sri Lanka, 2023) Pathirana, G. P. N. M.; Jayasundara, D. D. M.; Chandrasekara, N. V.
The glove industry plays a leading role in the Sri Lankan economy. The quality of the final product is crucial in mass production. Significant shrinkage or extension of a glove can cause great losses to the company by increasing the number of defective products. The dimensions of knitted liners vary due to various factors in the knitting process. The Six Sigma “DMAIC” approach was used to find a solution to this problem. This research investigated how the tension of the main yarn and the yarn conditioning time affect liner dimension changes at a controlled temperature and humidity level. The total length, cuff length, and cuff width of the liners were considered when measuring dimension changes. Relevant data was gathered from a leading glove manufacturing company in Sri Lanka. A Randomized Complete Block Design with 9-12 replicates was set up, with yarn conditioning times as blocks and tension ranges as treatments. Analysis of Variance suggested that there is a significant difference among the population means in all three dimensions. Hence, a multiple comparison test (Tukey’s test) was used to compare means. The results confirmed that changes in yarn conditioning time had a significant impact on total length and cuff width. Moreover, factorial designs suggested that the interactions of tension and yarn conditioning time had a significant impact on the dimensions of knitted glove liners. As the tension increased, both the total length and the cuff length of the liners decreased. In contrast, an increase in the tension of the main yarn caused the cuff width to increase. Liners knitted with low-conditioned yarns had significantly different dimensions from liners knitted with yarns that had been conditioned for at least 24 hours. Generally, industries determine the optimal tension values of the main yarn manually using test gloves, which is time-consuming and costly.
As a solution, this research used statistical modelling concepts, which aided in the development of a model to predict the level of tension required when the relevant liner length parameters and conditioning times were provided. Multiple linear regression and data mining techniques were used, and the models were compared. By having the lowest Root Mean Square Error, the Generalized Regression Neural Network (GRNN) outperformed the regression model and decision tree model. The error of the implemented GRNN model is 0.1521, and the independent variables explained more than 90% of the mean tension.
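A Generalized Regression Neural Network predicts with a kernel-weighted average of the training targets. The sketch below shows that general idea with a Gaussian kernel; it is not the study's fitted model, and the function name, the smoothing parameter `sigma`, and the toy data are assumptions.

```python
import math


def grnn_predict(x, train_X, train_y, sigma):
    """Predict a target as the Gaussian-kernel-weighted mean of train_y.

    x        : list of input features for one query point
    train_X  : list of training feature lists
    train_y  : list of training targets (e.g. mean tension)
    sigma    : smoothing (spread) parameter, assumed > 0
    Note: if the query is very far from every training point relative to
    sigma, all weights can underflow to zero and the division fails.
    """
    weights = []
    for xi in train_X:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-squared_distance / (2 * sigma ** 2)))
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


# Toy one-feature data (made up): targets 0.0 at x=0 and 1.0 at x=1.
toy_X = [[0.0], [1.0]]
toy_y = [0.0, 1.0]
midpoint_prediction = grnn_predict([0.5], toy_X, toy_y, sigma=0.5)
```

A query halfway between the two toy points is equally distant from both, so the prediction is the plain average of the two targets. Smaller values of `sigma` make the prediction follow the nearest training points more closely.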
Exploring data mining avenues in β-Thalassemia carrier identification
(Faculty of Science, University of Kelaniya Sri Lanka, 2023) Subasinghe, G. K.; Chandrasekara, N. V.; Premawardhena, A. P.
Thalassemia is a genetic blood disorder that affects the production of haemoglobin and is a global health problem. Compared with many other nations in the region, Sri Lanka has a high prevalence of thalassemia. The traditional methods for identifying thalassemia carriers, such as genetic and blood tests, are expensive and time-consuming and may not be available to all demographic groups. The use of data mining models for thalassemia carrier detection is still in its infancy, and there are few studies on its efficacy. Therefore, it is vital to investigate the efficacy and accuracy of data mining approaches for detecting thalassemia carriers, as well as the viability of employing these methods in clinical practice. The objective of this study is to develop a time-efficient model to detect β-thalassemia carriers, which can reduce the time taken to reach a decision, and to develop the model into a decision support tool. Earlier detection will also help individuals to be referred for the necessary treatment. The study uses data obtained from Hemal's Adolescent and Adult Thalassemia Care Centre, Mahara, one of the treatment centres for thalassemia. Data from 343 individuals, collected from August 2019 to December 2019, were considered as the study population. When processing the dataset, 112 (36%) individuals were declared β-thalassemia carriers, whereas 200 (64%) were identified as β-thalassemia non-carriers. Eight blood parameters (RBC, HGB, HCT, MCV, MCH, MCHC, RDW, and HbA2) were identified by reviewing the literature, and the Chi-square and Mann-Whitney U tests were used to identify the association between the variables at the 5% level of significance.
A random over-sampling technique was used to overcome the class imbalance problem in the dataset, and models were fitted under two data selection methods: Method 1, model fitting before handling the class imbalance problem, and Method 2, model fitting with the random over-sampling technique. Then 80% of the data was used for training the models, and 20% was used for evaluation. Support Vector Machine (SVM) and Probabilistic Neural Network (PNN) models were used to detect β-thalassemia carriers. The better-performing models were obtained under Method 2, and the PNN model fitted under Method 2 (PNN Model 2) exhibited 98.75% overall classification accuracy. The network architecture of this PNN model consisted of eight nodes in the input layer, 320 nodes in the pattern layer, two nodes in the summation layer, and two nodes in the output layer. The fitted PNN Model 2 can be used as a cost-effective and time-saving option to detect β-thalassemia carriers in a few seconds with acceptable accuracy and can be implemented as a decision support tool. However, it is recommended to get advice from a medical doctor for further investigation.
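The layered structure described above can be sketched as a minimal Probabilistic Neural Network. In the standard PNN design, the pattern layer holds one node per training row, the summation layer holds one node per class, and the output layer picks the class with the larger score. The function name, the `sigma` spread parameter, and the toy data are assumptions; this is not the study's fitted model.

```python
import math


def pnn_classify(x, train_X, train_y, sigma):
    """Classify x with a Gaussian-kernel Probabilistic Neural Network.

    Pattern layer  : one Gaussian activation per training row.
    Summation layer: average activation per class.
    Output layer   : class with the highest average activation.
    """
    sums = {}
    counts = {}
    for xi, label in zip(train_X, train_y):
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
        activation = math.exp(-squared_distance / (2 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + activation
        counts[label] = counts.get(label, 0) + 1
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get)


# Toy two-feature data (made up); the study used eight blood parameters.
toy_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
toy_y = ["non-carrier", "non-carrier", "carrier", "carrier"]
near_first_cluster = pnn_classify([0.2, 0.3], toy_X, toy_y, sigma=1.0)
near_second_cluster = pnn_classify([5.0, 5.5], toy_X, toy_y, sigma=1.0)
```

Each query is assigned the label of the cluster it lies closest to, because the nearby training rows dominate the class averages.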
Identification of factors leading to elephant deaths in human-elephant conflicts
(Faculty of Science, University of Kelaniya Sri Lanka, 2023) Lakshitha, W. A. D. M.; Chandrasekara, N. V.; Kavinga, H. W. B.; Withanage, N.
Human-elephant conflicts (HEC) have been one of the main challenges that Sri Lanka has faced for several decades. According to the official data of the Department of Wildlife Conservation (DWC), the number of elephant deaths due to HEC per year is higher than the number of human deaths. This research focused on the North Central Province, where the highest number of elephant deaths has been recorded. The objectives of this research are to identify the main factors that have affected the deaths of elephants and to identify suitable models to predict the causes of elephant deaths due to human-elephant conflict. Although there has been much research related to HEC worldwide, no published studies were found in the literature that used advanced statistical techniques such as Multinomial Logistic Regression (MLR), LASSO regression, Decision Tree (DT), Support Vector Machine (SVM), and Probabilistic Neural Network (PNN). This research addresses that gap by constructing models for classifying the causes of elephant deaths resulting from HEC. Data was collected from various departments, including DWC, the Department of Meteorology, and the crop calendar of the Department of Agriculture. Pearson's Chi-square and Fisher's exact tests were used to identify the association between the cause of death and influencing factors. Five variables (elephant age group, grass levels, gender, rainfall season, and place of death) were found to significantly influence the cause of death of an elephant. MLR and Data Mining (DM) techniques were initially used, but because multicollinearity arose in MLR, the LASSO technique was employed as a remedial method. To overcome the class imbalance problem, 90% of the data were randomly selected for model building while maintaining the class ratio of the response variable, and the remaining 10% of the data were used for testing.
Two performance measures, overall classification accuracy (OCA) and Misclassification Percentage of Critical Cases (MPCC), were used to evaluate and compare the classification potential of the models. The final MLR, LASSO, DT, SVM with polynomial kernel, SVM with Gaussian kernel, and PNN (spread 0.801) models achieved OCAs of 42.30%, 50%, 53.84%, 69.23%, 73.07%, and 73.07%, and MPCCs of 34.61%, 30.76%, 7.69%, 11.53%, 19.23%, and 26.92%, respectively. The SVM model with Gaussian kernel, with a high OCA (73.07%) and an MPCC of 19.23%, was selected as the better model, since the PNN showed a higher MPCC of about 26.92%. These findings will be helpful for authorities in their future and existing projects.
A statistical approach to assess faceted blue sapphire gemstones
(Faculty of Science, University of Kelaniya Sri Lanka, 2023) Mahanama, K. R. T. S.; Chandrasekara, N. V.; Ranatunga, G. D.
The gem industry is a promising contributor to Sri Lankan economic development. The gemstone market prices are set by professional gem evaluators based on their tacit knowledge. Although the valuation of gemstones is complex due to the high variability in their characteristics, establishing a standard model that minimizes overpricing or under-pricing of gemstones helps stakeholders and preserves the reputation of the gem industry. This research aims to develop a statistical model to assess faceted blue sapphires based on affecting factors of gemstones such as colour, inclusions, cracks, cut, weight, state of treatment, and calibration. All exported gemstone records from February to September 2022 were collected from the National Gem and Jewellery Authority. A total of 881 records composed of single (409) and batch assessments (472) of faceted blue sapphire were utilized for modelling. Multiple linear regression (MLR), quantile regression (QR), support vector regression (SVR), feedforward neural network (FFNN), and generalized regression neural network (GRNN) were employed in developing pricing models. However, MLR and QR models showed a reduction of some important variables from the model. Further, the MLR model was not adequate due to the violation of the assumptions for both heteroscedasticity and autocorrelation. The performances of SVR, FFNN, and GRNN models were compared using mean squared error (MSE), root mean squared error and mean absolute percentage error. MSE for SVR, FFNN, and GRNN were 0.0697, 0.0733, and 0.0730 respectively. Even though all three models exhibit similar performances, GRNN provided a closer approximation for most of the cases. Further SVR (MSE=0.0419) and GRNN (MSE=0.0700) models were separately developed to address the most common single-piece assessment. Results revealed that the SVR model with Gaussian kernel outperforms in single assessments while GRNN provides closer predictions to all assessments. 
Future studies could develop a model using the generalized method of moments, which is widely used when the assumptions on both heteroscedasticity and autocorrelation are violated. Moreover, this study can be extended to develop statistical models for assessing other varieties of gemstones. Finally, developing and implementing a decision support application to assess gemstones would be highly beneficial.
Predicting a top rank batsman in an ODI match, using the first few balls faced: A case study
(Faculty of Science, University of Kelaniya Sri Lanka, 2022) Madhuranga, W. P. K.; Kavinga, H. W. B.; Chandrasekara, N. V.
Predicting the success of a top-rank batsman plays a crucial role in decision-making in the game of cricket, on the field as well as off the field. This research was carried out for that purpose. The proposed procedure was applied specifically to the players ranked one, two and three in the world as of August 2021. Therefore, the results cannot be generalized to a wider set of players. Among the several models tried, the Decision Tree (DT) model with a training ratio of 0.9 showed the highest accuracy, 72%, in predicting whether the batsman will be successful, i.e., will score fifty or more runs on a given day. Probabilistic Neural Network (PNN) and Support Vector Machine (SVM) models with a similar test ratio resulted in an accuracy of around 65% for the three players, Rohit Sharma, Babar Azam and Virat Kohli. PNN recorded a maximum accuracy of 64.2% when predicting the performance of Rohit Sharma, and the SVM model recorded a maximum accuracy of 59% when predicting the success of Babar Azam. The accuracy of the DT model noted above was achieved using the first five balls for Virat Kohli and Rohit Sharma and the first seven balls for Babar Azam. The findings of the study can be used to make accurate decisions in the game of cricket.
Forecasting foreign exchange reserves in Sri Lanka
(Faculty of Science, University of Kelaniya Sri Lanka, 2022) Jayawardhana, K. J. U. M.; Wijesuriya, H. P. A. D.; Kaushalya, R. A. D.; Chandrasekara, N. V.
Foreign exchange reserves are mainly used by governments to stabilize the exchange rate and balance international payments. They also play a major role in the current financial crisis in Sri Lanka. The purpose of this study was to build a suitable forecasting model and to detect factors affecting foreign exchange reserves in the context of Sri Lanka. The findings can be used to suggest policy measures the government could take for the overall improvement of foreign exchange reserves. Monthly data on foreign exchange reserves, the United States Dollar (USD) exchange rate, foreign direct investments (FDI), gold reserves, imports, inflation rate, remittance, and total exports from January 2010 to September 2021 were used for the model fitting procedure. Cubic spline interpolation was used to transform quarterly data on gold reserves into monthly data. The preliminary analysis identified a significant association between the foreign reserves and the predictor variables exchange rate, FDI, gold reserves, imports, and remittance. The Augmented Dickey-Fuller (ADF), Kwiatkowski-Phillips-Schmidt-Shin (KPSS), and Phillips-Perron (PP) unit root tests were used to examine stationarity. A time series regression model was fitted; its residual diagnostics satisfied the assumptions on multicollinearity, homoscedasticity, serial correlation, and autocorrelation, but not normality. The Johansen cointegration test was then used to test for cointegration and revealed a long-run equilibrium. Hence, a vector error correction (VEC) model was fitted, which satisfied the residual assumptions on serial correlation and heteroscedasticity, but not normality. The VEC model's forecasts have a Mean Absolute Percentage Error (MAPE) of 5.30%, indicating that the VEC model is better for forecasting than the fitted time series regression model, which has a MAPE of 9.52%.
The results of the analysis further revealed that remittances to Sri Lanka and the foreign reserves of seven months earlier have a significant positive impact on foreign exchange reserves.
