Software Engineering Teaching Unit

Permanent URI for this collection: http://repository.kln.ac.lk/handle/123456789/26470

A Comparative Study of Two-Stage Intrusion Detection Using Modern Machine Learning Approaches on the CSE-CIC-IDS2018 Dataset
(MDPI, 2025) Hewapathirana, I. U.
Intrusion detection is a critical component of cybersecurity, enabling timely identification and mitigation of network threats. This study proposes a novel two-stage intrusion detection framework using the CSE-CIC-IDS2018 dataset, a comprehensive and realistic benchmark for network traffic analysis. The research explores two distinct approaches: the stacked autoencoder (SAE) approach and the Apache Spark-based (ASpark) approach. Each of these approaches employs a unique feature representation technique. The SAE approach leverages an autoencoder to learn non-linear, data-driven feature representations. In contrast, the ASpark approach uses principal component analysis (PCA) to reduce dimensionality and retain 95% of the data variance. In both approaches, a binary classifier first identifies benign and attack traffic, generating probability scores that are subsequently used as features alongside the reduced feature set to train a multi-class classifier for predicting specific attack types. The results demonstrate that the SAE approach achieves superior accuracy and robustness, particularly for complex attack types such as DoS attacks, including SlowHTTPTest, FTP-BruteForce, and Infilteration. The SAE approach consistently outperforms ASpark in terms of precision, recall, and F1-scores, highlighting its ability to handle overlapping feature spaces effectively. However, the ASpark approach excels in computational efficiency, completing classification tasks significantly faster than SAE, making it suitable for real-time or large-scale applications. Both methods show strong performance for distinct and well-separated attack types, such as DDOS attack-HOIC and SSH-Bruteforce. This research contributes to the field by introducing a balanced and effective two-stage framework, leveraging modern machine learning models and addressing class imbalance through a hybrid resampling strategy. 
The findings emphasize the complementary nature of the two approaches, suggesting that a combined model could achieve a balance between accuracy and computational efficiency. This work provides valuable insights for designing scalable, high-performance intrusion detection systems in modern network environments.
Leveraging Artificial Intelligence for Ethical Social Media Influencer Communication
(2024) Hewapathirana, I. U.
This chapter explores the connections between artificial intelligence (AI) and the ethical dimensions of influencer communication on social media. The ethical aspects are evaluated according to the criteria outlined in the Professional Code of Ethics of the Public Relations Society of America (PRSA). The study reviews the multiple aspects of influencer communication, including emerging challenges and legal implications resulting from the continued development of AI in social media. Furthermore, a dataset was collected from the social media platform Reddit, and a case study analysis was performed using the NodeXL software. This empirical investigation aims to investigate social media users' perspectives on specific ethical concerns associated with integrating artificial intelligence (AI). The findings presented in this chapter provide scholars with an advanced understanding of AI capabilities, offer industry professionals valuable guidance for ethical decision-making, and offer lawmakers guidance for developing regulatory frameworks.
TourismXplorer: Interactive Dashboard for Data-Driven Decision-Making in Sri Lanka’s Tourism Industry
(2024) Thilakarathna, W. A. S. M. S.; Hewapathirana, I. U.
The tourism industry is a critical component of Sri Lanka’s economy, necessitating advanced tools for data-driven decision-making to enhance strategic planning and operational efficiency. This study presents the development of a comprehensive tourism dashboard designed specifically for tourism businesses in Sri Lanka. The dashboard offers a holistic view of the tourism landscape by integrating diverse data sources, including annual statistical reports (2018-2023), climate variables from the Sri Lanka Meteorological Department, and TripAdvisor reviews. The novelty of this research lies in its multifaceted data integration, advanced visualization techniques, and predictive analytics capabilities. The dashboard provides stakeholders with real-time and historical insights into tourism dynamics. It includes key performance indicators (KPIs) such as tourist arrivals, revenue, expenditure, accommodation statistics, climate impact, visitor demographics, and sentiment analysis from reviews. Visualizations range from line, pie, and bar charts to shape maps, heat maps, and word clouds, enhancing data accessibility and interpretability. A standout feature of the dashboard is its predictive analytics page, which allows users to forecast tourist arrivals based on selected explanatory variables such as climate data and customer sentiments. This predictive ability enables stakeholders to simulate various scenarios and better prepare for future trends, making the dashboard an invaluable tool for strategic decision-making. The dashboard’s user-friendly interface and customizable filtering options allow users to tailor their analyses based on specific criteria, such as year, region, and visitor attributes. This targeted approach ensures that tourism businesses can leverage the dashboard for practical decision-making, aligning with sustainable tourism development goals by monitoring environmental and social impacts. 
This research advances the field of tourism analytics and provides a practical tool for enhancing the strategic and operational capabilities of tourism businesses in Sri Lanka. Future enhancements may include the incorporation of more sophisticated predictive models, which would further improve the dashboard’s utility.
Development of a machine learning model for air quality forecasting: leveraging long-term meteorological data analysis to predict air quality index in Colombo District
(2024) Rathnayaka, R.M.S.I.; Hewapathirana, I. U.
Air quality is a critical aspect of environmental health, directly impacting individuals and the broader ecosystem. Therefore, real-time monitoring and an understanding of the factors influencing air quality are crucial. The most typical reasons for air pollution are vehicle emissions, organic waste burning, and petroleum refining. However, other factors have arisen as causes of air pollution. Although meteorological factors are natural phenomena, they have been changing detrimentally due to human actions. Extreme meteorological events may significantly influence air quality. In Sri Lanka, a region with its own set of environmental challenges, understanding the dynamics of air quality is important. Over the past decade, Sri Lanka has witnessed notable shifts in weather patterns, with potential implications for human well-being. Available data indicates that Colombo often experiences high levels of air pollution. Recognizing these factors, this research introduced a model for real-time forecasting of the Air Quality Index (AQI) based on meteorological factors, emphasizing the Colombo district. The research focused on the period from 2020 to 2023, using a dataset that includes daily meteorological factors (wind speed, temperature, atmospheric pressure, rainfall, and relative humidity) alongside daily AQI values for the Colombo district. A temporal analysis identifies long-term trends and patterns in air quality. The study leveraged five machine learning algorithms (Linear Regression, Random Forest Regression, Gradient Boosting Regression, Support Vector Regression, and Long Short-Term Memory Network) to develop models for predicting air quality based on meteorological factors. It also evaluated the performance of these machine learning models using metrics such as Mean Squared Error, Root Mean Squared Error, Mean Absolute Error, and R-Squared to determine each model’s reliability in predicting the AQI. 
In conclusion, this research aims to discuss the role of weather variables in shaping air quality in the Colombo district. The outcomes contribute to understanding air quality in Sri Lanka and the broader global discourse on utilizing advanced technologies for environmental monitoring and forecasting. With insights into the predominant weather factors influencing air quality, decision-makers can formulate policies to improve the region’s air quality based on seasonal weather pattern changes.
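The four evaluation metrics named in this abstract can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample AQI values are hypothetical and are not taken from the study; the formulas are the standard definitions of each metric.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # same units as the AQI itself
    mae = sum(abs(e) for e in errors) / n      # mean absolute error

    # R-squared: 1 minus residual variation over total variation.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical daily AQI observations and model predictions.
observed = [50, 60, 70, 80]
predicted = [52, 58, 71, 79]
metrics = regression_metrics(observed, predicted)
```

RMSE and MAE are expressed in AQI units, which makes them easier to interpret than MSE, while R-squared gives a scale-free comparison across models.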
Deep learning-based correctness assessment for the Tadasana (Mountain Pose) Yoga Asana
(2024) Gayan, V.G.N.; Hewapathirana, I. U.
Yoga has become increasingly popular worldwide, but practicing without proper guidance can lead to incorrect posture alignment, reducing effectiveness and increasing injury risk. This research aimed to address this issue by developing a deep learning-based system that relies on the MediaPipe framework to assess the correctness of the Tadasana yoga asana and provide real-time feedback for improvement. The MediaPipe framework was selected for the proposed study for its outstanding real-time performance (75.9% mean average precision on the COCO dataset) and cross-platform efficiency. Using MediaPipe, a custom-developed web app analyzed more than 50 professional yoga instruction videos to extract crucial body angles for each Tadasana step, generating the dataset for the yoga pose angle calculation algorithm. This approach accounts for MediaPipe’s inherent variability in landmark detection, ensuring robust angle calculations. The primary goals of this study were to develop an accurate pose estimation and angle calculation algorithm specifically optimized for Tadasana, as well as a comprehensive, real-time feedback mechanism for pose correction. The proposed system integrated MediaPipe’s pose estimation capabilities with a custom angle calculation algorithm and a rule-based feedback system. An extensive evaluation was conducted using more than 100 images of correct and incorrect poses for each of the three Tadasana steps. The system demonstrated promising results, achieving accuracy scores of 78%, 75%, and 72% for steps 1, 2, and 3, respectively. It was observed that the system’s performance varied based on factors such as image quality and environmental conditions. This study demonstrates the feasibility and potential of using deep learning and computer vision techniques for precise yoga pose correction. 
Future work will focus on enhancing the system’s robustness across diverse conditions, expanding its capabilities to encompass a wider range of yoga poses, and implementing real-time video analysis for feedback generation. These advancements could significantly enhance the accessibility and effectiveness of remote yoga instruction, making proper technique more attainable for practitioners.
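The abstract does not give the details of the angle calculation or the feedback rules. As a rough illustration only, a joint angle can be computed from three 2D landmark coordinates and compared against a target with a tolerance. The function names, the landmark coordinates, the target angle, and the tolerance below are all assumptions, not the authors' algorithm.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    ray_a = math.atan2(a[1] - b[1], a[0] - b[0])
    ray_c = math.atan2(c[1] - b[1], c[0] - b[0])
    angle = abs(math.degrees(ray_c - ray_a))
    # Fold reflex angles back into the 0..180 degree range.
    return 360.0 - angle if angle > 180.0 else angle


def angle_feedback(measured, target, tolerance=10.0):
    """Hypothetical rule: accept the joint if it is within the tolerance."""
    if abs(measured - target) <= tolerance:
        return "ok"
    return "adjust"


# Hypothetical (x, y) landmarks for hip, knee and ankle of a straight leg.
hip, knee, ankle = (0.50, 0.50), (0.50, 0.70), (0.51, 0.90)
knee_angle = joint_angle(hip, knee, ankle)
print(angle_feedback(knee_angle, target=180.0))
```

Averaging target angles over many reference videos, as the abstract describes, would be one way to choose `target` and `tolerance` so that ordinary jitter in landmark detection does not trigger corrections.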
Empowering influence discovery: Utilizing machine learning for social media influencer identification
(2024) Devyanjalee, D.D.W.N.; Hewapathirana, I. U.
In today’s dynamic digital landscape, influencer marketing has become a cornerstone of marketing strategies, leveraging social media platforms to engage with audiences. Accurately identifying influencers within social media platforms poses a formidable challenge. Traditional machine learning approaches relying solely on metrics such as network analysis and user profile data often fall short in capturing the dynamics of influencer impact and resonance with audiences. To address this gap, this study aimed to enhance influencer identification accuracy by leveraging both user profile and engagement metrics alongside text analysis. The methodology adopts a sequential explanatory design, combining quantitative analysis of user profile metrics with qualitative analysis of text-related factors. Data collection from social media platforms, particularly X, comprises user profile and social data. The quantitative phase employs established algorithms like the PageRank algorithm to identify top influencers based on user profile data, while machine learning models (logistic regression, decision trees, and random forest) are trained using user profile data to discern influential user profiles. The qualitative phase involves text analysis techniques, including keyword matching and lemmatization, to extract valuable insights from tweets. Machine learning models are then trained using both user profile and social data alongside text analysis data to discern influential user profiles. The models are then compared to assess the impact of incorporating engagement metrics with text analysis. Findings from this study indicate that while user profile metrics alone exhibit high accuracy in influencer identification, with the random forest model achieving an F1 score of 0.90, the incorporation of engagement metrics introduces complexities affecting model performance, resulting in an F1 score of 0.70. 
The random forest model emerges as the most robust performer, maintaining high accuracy despite these challenges. This research contributes to advancing influencer identification strategies within digital marketing, offering insights into the effectiveness of integrating both user profile and engagement metrics with text analysis for capturing the true essence of influencer influence and resonance with audiences. The findings underscore the challenges of leveraging engagement metrics for influencer identification and highlight the need for further refinement of methodologies to empower marketers in navigating the complexities of the ever-evolving digital landscape.
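PageRank, mentioned in this abstract as the algorithm for ranking top influencers, can be sketched with a short power iteration. The graph below is hypothetical (an edge from one account to another stands for "follows" or "mentions"), and the damping factor of 0.85 is the conventional default rather than a value reported by the study.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps every node to the list of nodes it links to; each node
    must appear as a key, even if it has no outgoing links.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the same baseline "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical accounts: a and b both point to c, and c points back to a.
follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(follows)
top_first = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, account `c` ranks highest because it receives links from both other accounts, which is the intuition behind using PageRank to surface influential users.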
A Case Study in Financial Fraud Detection using Big Data Analytics
(2021) Boteju, W. P. A.; Hewapathirana, I. U.
The financial industry is currently undergoing digital transformations across products, services and business models. This digitization is aimed at automating most of the manual financial transactions and other relevant services. Therefore, spotting fraud in financial transactions has become an important priority for all financial institutions. With the advances in modern technology and global communication, fraud has increased significantly, causing great damage. The focus of this paper is to experiment with different approaches for detecting fraudulent activities in a real-world dataset of financial payment transactions. The dataset is obtained from Kaggle and consists of 6 million transaction records and 10 features with the transaction label as ‘fraudulent’ or ‘non-fraudulent’. These features are investigated using exploratory data analysis and only 6 are retained for the experiment, such as payment-type, account-balance, and transaction-amount. Two supervised machine learning algorithms, the random forest and the support vector classifier, are employed for detecting fraudulent transactions. The dataset is large and requires high computational power to process and train machine learning algorithms. Furthermore, another challenge is the highly imbalanced distribution between the fraudulent (0.1%) and the non-fraudulent (99.9%) classes. The goal of this research is to solve both these issues. In order to handle class imbalance, the effects of oversampling the minority class data using the synthetic minority oversampling technique (SMOTE) and undersampling the majority class using random undersampling are investigated. Computational efficiency is achieved through the Apache Spark implementation, which provides distributed processing for big data workloads. 
The best performance is obtained using the random forest algorithm on the oversampled dataset with an accuracy of 99.95%, F1-score of 0.9994, recall of 0.9994, geometric mean of 99.94%, and a model training time of 13.9 minutes. This paper provides valuable insights on dealing with large-scale, highly imbalanced datasets for predicting financial fraud and generating alerts.
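The reason this abstract reports recall, F1-score, and geometric mean alongside accuracy can be shown with a small sketch. The function name `imbalance_metrics` and the confusion-matrix counts are hypothetical; they mirror the 0.1% fraud rate mentioned above but are not results from the paper.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Accuracy, recall, F1 and geometric mean from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # fraud caught
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # non-fraud cleared
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Geometric mean of the two per-class rates: zero if either class fails.
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "g_mean": g_mean}


# A model that labels every one of 1000 transactions "non-fraudulent"
# when exactly 1 of them (0.1%) is fraudulent.
naive = imbalance_metrics(tp=0, fp=0, tn=999, fn=1)
print(naive)
```

The naive model scores 99.9% accuracy while catching no fraud at all, so its recall, F1-score, and geometric mean are all zero. This is why accuracy alone is a weak indicator on data this imbalanced.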
