Annual Research Symposium (ARS)

Permanent URI for this community: http://repository.kln.ac.lk/handle/123456789/154

Search Results

Now showing 1 - 10 of 25
  • Item
    Performance of k-mean data mining algorithm with the use of WEKA-parallel
    (University of Kelaniya, 2013) Gunasekara, R.P.T.H.; Dias, N.G.J.; Wijegunasekara, M.C.
    This study focuses on enhancing the performance of the k-means data mining algorithm using parallel programming methodologies. To quantify the benefit of parallelization, the k-means algorithm was first run in WEKA on a stand-alone machine and its performance was then compared with that of k-means in WEKA-parallel. Data mining is the process of discovering patterns in databases and datasets drawn from areas such as finance, the retail industry, science, statistics, medicine, artificial intelligence and neuroscience. To discover patterns in large data sets, clustering algorithms such as k-means, k-medoid and balanced iterative reducing and clustering using hierarchies (BIRCH) are used. In data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters (where k is the number of selected groups) such that each observation belongs to the cluster with the nearest mean. The grouping is done by minimizing the sum of squared Euclidean distances between items and the corresponding centroid (the centre of mass of the cluster). As data sets grow exponentially, high-performance technologies are needed to analyze them and recognize their patterns. The algorithms used for these processes must scan the data records several times iteratively, so the process is very time consuming and memory intensive at large scale. While studying how to enhance the performance of data mining algorithms, it was identified that the parallel data mining algorithms developed so far were based on distributed, cluster or grid computing environments. Nowadays, algorithms are required to target multi-core processors in order to utilize the full computational power of the hardware.
The widely used machine learning and data mining software WEKA was first chosen to analyze clusters and measure the performance of the k-means algorithm. The k-means clustering algorithm was applied to an electricity consumption dataset to generate k clusters. As a result, the dataset was partitioned into k clusters along with their mean values, and the time taken to build the clusters was recorded. (The dataset consists of 30000 entries and was collected from the Ceylon Electricity Board.) Secondly, to reduce the time consumed, we selected a parallel environment using WEKA-parallel, a variant of WEKA for multi-core programming that can connect several server and client machines; threads are passed among the machines to carry out the task. WEKA-parallel was installed and configured on several distributed server machines with one client machine, and the same electricity consumption dataset was clustered with k-means. The speed of building clusters increased when the parallel software was used, but the mean values of the clusters did not exactly match those obtained previously. By visualizing both sets of clusters, it was identified that some border elements of the first set of clusters had jumped to other clusters, and those elements changed the cluster means. The experiment was done on a single Core i3, 3.3 GHz machine running the Linux operating system to find the execution time taken to create k clusters using WEKA for several different datasets. The same experiment was repeated on a cluster of machines with similar specifications to measure the execution time taken to create k clusters in a parallel environment using WEKA-parallel, varying the number of machines in the cluster. According to the results, WEKA-parallel significantly improves the speed of k-means clustering.
The results of the experiment for a dataset on the electricity consumption of consumers in the North Western Province are shown in Table 1. This study shows that the use of WEKA-parallel and parallel programming methodologies significantly improves the performance of the k-means data mining algorithm for building clusters.
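The clustering step described above, partitioning n observations into k clusters by minimizing the squared Euclidean distance to each centroid, can be sketched in plain Python. This is an illustrative implementation of the standard Lloyd's iteration on hypothetical sample data, not WEKA's implementation:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # squared Euclidean distance from every point to every centroid
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# two well-separated groups of 2-D points (hypothetical data)
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(data, k=2)
```

The border-element effect reported in the abstract corresponds to points whose nearest centroid changes between runs with different initializations.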
  • Item
    Designing an Automatic Speech Recognition System to recognize frequently used sentences in Sinhala
    (University of Kelaniya, 2013) Samankula, W.G.D.M.; Dias, N.G.J.
    There are millions of people with visual impairments as well as motor impairments caused by old age, sickness or accidents, and they face many challenges in their day-to-day lives. Even at home, a simple task such as controlling the radio, refrigerator or fan becomes difficult, because they have to use a white cane or wheelchair to move, or get assistance from others. The aim of this research is to develop a speaker-independent continuous speech recognition system capable of understanding human speech in the Sinhala language rather than a foreign language, because the majority of people in Sri Lanka speak Sinhala. To achieve this goal, human speech signals have to be recognized and converted into effective commands to operate equipment. The Hidden Markov Model Toolkit (HTK), based on the Hidden Markov Model (HMM), a statistical approach, is used to develop the system; HTK is used for the data preparation, training, testing and analysis phases of the recognition process. Twenty-five Sinhala sentences of 2, 3 or 4 words that are frequently used in day-to-day activities at home were prepared. The recording process was carried out with 10 native speakers (5 female and 5 male) in a quiet environment, and 800 speech samples were collected for training from 4 males and 4 females, each speaking every sentence 4 times. The experimental results show 94.00% sentence-level accuracy and 97.85% word-level accuracy using a mono-phone based acoustic model, and 99.00% sentence-level accuracy and 99.69% word-level accuracy using a tri-phone based acoustic model.
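HTK performs the HMM training and decoding internally; as a minimal illustration of the statistical core on which such recognizers rest, the following sketch (illustrative Python, not HTK code) computes the likelihood of an observation sequence under an HMM with the forward algorithm in log space:

```python
import numpy as np

def forward_log_likelihood(log_pi, log_A, log_B_obs):
    """Forward algorithm in log space: total log-probability of an
    observation sequence under an HMM.
    log_pi[j]       = log initial probability of state j
    log_A[i, j]     = log P(state j | state i)
    log_B_obs[t, j] = log P(observation at time t | state j)
    """
    alpha = log_pi + log_B_obs[0]
    for t in range(1, len(log_B_obs)):
        # sum over predecessor states, then emit the next observation
        alpha = log_B_obs[t] + np.logaddexp.reduce(alpha[:, None] + log_A,
                                                   axis=0)
    return np.logaddexp.reduce(alpha)
```

In a recognizer, one such model is trained per phone (mono-phone) or phone-in-context (tri-phone), and the model set scoring an utterance highest determines the recognized sentence.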
  • Item
    The use of Artificial Neural Network for the Prediction of Particular Subject Marks of Final Examination
    (University of Kelaniya, 2012) Shanika, K.D.T.; Dias, N.G.J.
    In a competitive world, a student must make the best choices and, in limited time, achieve good results. If a student who is weak in a subject can predict the final result before sitting the examination, that prediction helps them meet the challenge. The objective of this research is to predict a student's marks for a particular subject in the final examination. In this application, after the inputs are given, the student can see the mark s/he is likely to obtain for the subject in the final examination. The output depends on input details related to the examination, such as the number of courses registered, the number of assignments done, the number of days to the final examination, the assignment marks and the stress of the student at that moment. Because of the difficulty of measuring a student's stress at the examination, we do not consider stress as a factor. Theoretical principles and multilayer neural network training have been used to predict results. The neural network approach to prediction is based on the type of learning mechanism applied to generate the output from the network. The learning here is supervised learning, in which the desired response is known to the system, i.e., the system is trained with a priori information to obtain the desired output. In this type of learning, if the computed output does not match the desired output, the difference between the two is determined and used to modify the network parameters so that the correct output is produced. The backpropagation algorithm, with bias units, is used to train the artificial neural network until the error falls below 0.001. When the neural network is trained, the data set is encoded into weights distributed over the network rather than stored in a particular location.
Removing a few neurons from a trained network, i.e., decreasing the inputs, does not greatly affect the overall performance of the network, and the system does not need to maintain a database to store the training data or the data used to predict results. This research can be extended to more intelligent applications by training with more data. At present, there is no application for predicting subject marks with minimal error before the examination, so this application will be useful for students who want to achieve high marks by knowing a predicted result in advance.
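The supervised training loop described above, a forward pass, an error computation, and backpropagation through bias-equipped layers until the error falls below 0.001, can be sketched as follows. This is an illustrative one-hidden-layer network trained on hypothetical data, not the authors' model or inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical training set: predict a mark-like score from two features
X = rng.uniform(0, 1, size=(20, 2))
y = (0.3 * X[:, 0] + 0.5 * X[:, 1]).reshape(-1, 1)

# one hidden layer, with bias terms as in the abstract
W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)
lr = 0.5

for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)        # forward pass
    out = h @ W2 + b2               # linear output unit
    err = out - y
    mse = (err ** 2).mean()
    if mse < 0.001:                 # stopping rule from the abstract
        break
    # backpropagate the error signal, then update weights and biases
    g_out = 2 * err / len(X)
    g_h = (g_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ g_out; b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * X.T @ g_h;   b1 -= lr * g_h.sum(axis=0)
```

After training, the mapping from inputs to predicted marks lives entirely in W1, b1, W2 and b2, which is the "encoded into weights" property the abstract mentions.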
  • Item
    Live Video Coverage over Wireless Ad-hoc Networks
    (University of Kelaniya, 2012) Ratnayake, D.S.L.U.; Dias, N.G.J.
    In the contemporary world, live video coverage usually involves more than one camera and one projector, and most of the devices may not be stationary. Wired composite or component video cables perform poorly with moving nodes (cameras/multimedia projectors) and need heavy video mixers to manage all nodes. With tools like Wi-Fi enabled micro-controller boards, laptops, netbooks and other lightweight devices, we can replace the wired communication in live video coverage. By attaching each node (camera or projector) to a wireless-enabled device and having one or more wireless-enabled laptops act as video mixers, the whole network becomes a wireless ad-hoc network. This means any node can be added when needed and any node can change its role as the need arises; for example, a laptop attached to a multimedia projector can be used as a video mixer if needed. This increases the mobility and range of every node. In this study, we discuss our implementation of wireless live video coverage. Every node joins a zero-configuration, auto-detecting network: the user only needs to turn on the service, and the node automatically detects the network along with the other nodes and configures itself. We used Python socket programming on the Linux platform to implement this network, and the GStreamer media framework built into Ubuntu Linux to handle media codecs. We also discuss how we approached improving quality of service while working within bandwidth bottlenecks.
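The zero-configuration node detection described above can be sketched with UDP announcements over Python sockets. The message format and port number below are hypothetical, and this is a simplified illustration rather than the authors' implementation:

```python
import json
import socket

DISCOVERY_PORT = 50555  # hypothetical port reserved for node discovery

def announce(role, addr="<broadcast>", port=DISCOVERY_PORT):
    """Broadcast this node's role (camera, projector, mixer) so that
    peers on the ad-hoc network can auto-detect it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(json.dumps({"role": role}).encode(), (addr, port))
    s.close()

def listen_once(bind_addr="", port=DISCOVERY_PORT, timeout=5.0):
    """Receive one announcement; return (role, sender IP address)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((bind_addr, port))
    s.settimeout(timeout)
    data, sender = s.recvfrom(4096)
    s.close()
    return json.loads(data.decode())["role"], sender[0]
```

A node that wants to change role, as the paper allows, simply re-announces itself with the new role; peers update their node tables from the latest announcement.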
  • Item
    Evaluation of Stochastic Based Tagging Approach for Sinhala Language
    (University of Kelaniya, 2012) Jayaweera, A.J.P.M.P.; Dias, N.G.J.
    Part of Speech (POS) tagging is one of the fundamental and important steps of any Natural Language Processing (NLP) task, from speech recognition to machine translation, text to speech, spelling and grammar checking, and language-based information retrieval on the Web. Tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a sentence based on its morphological and syntactic properties. Sinhala is a morphologically complex and agglutinative language which shares many features with other South Asian languages such as Hindi, Tamil and Bengali. In the Sinhala language, words are inflected with various grammatical features; most features are affixed postpositionally to the root word. Automatically assigning a tag to each word in a language like Sinhala is therefore very complex. The objective of this paper is to evaluate the stochastic tagging approach for the Sinhala language, which uses statistical methods to assign tags to each word in a sentence. The approach discussed in the paper is based on a well-known stochastic tagging technique, the Hidden Markov Model (HMM), which selects the best tag sequence for a complete sentence rather than tagging word by word. Prior work shows that the HMM-based approach has been widely used in tagging research for other languages. The tagger presented here takes a sentence, a tag set and a corpus as input and gives the tagged sentence as output. The tagging process is done by computing the tag sequence probability P(ti|ti-1) and a word-likelihood probability P(wi|ti) from the given corpus, where the linguistic knowledge is automatically extracted from the annotated corpus. In this research, we have used the tagset and the corpus developed by UCSC/LRTL (2005) under the PAN Localization Project; the current tagset consists of 29 morpho-syntactic tags. An algorithm is presented in this paper for implementing a POS tagging system for the Sinhala language.
The evaluation was done using a 14549-word tagged corpus, with test text extracted from different sources. The approach produced tag sequences with accuracy between 80% and 97%. From these results, we conclude that the stochastic tagging approach is well suited to the Sinhala language, although further research is needed to improve its tagging accuracy.
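The tagging procedure described above, estimating P(ti|ti-1) and P(wi|ti) by counting over an annotated corpus and then selecting the best tag sequence for a whole sentence, can be sketched as follows. The toy English corpus is a hypothetical stand-in for the UCSC/LRTL Sinhala corpus, which is not reproduced here:

```python
from collections import defaultdict

# toy annotated corpus (hypothetical stand-in for the real corpus)
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

trans = defaultdict(lambda: defaultdict(int))  # count(t_{i-1} -> t_i)
emit = defaultdict(lambda: defaultdict(int))   # count(tag -> word)
for sent in corpus:
    prev = "<s>"
    for word, tag in sent:
        trans[prev][tag] += 1
        emit[tag][word] += 1
        prev = tag

def p_trans(prev, tag):
    total = sum(trans[prev].values())
    return trans[prev][tag] / total if total else 0.0

def p_emit(tag, word):
    total = sum(emit[tag].values())
    return emit[tag][word] / total if total else 0.0

def viterbi(words):
    """Select the best tag sequence for the whole sentence, maximising
    the product of P(t_i | t_{i-1}) * P(w_i | t_i) over all words."""
    tags = list(emit)
    best = {t: (p_trans("<s>", t) * p_emit(t, words[0]), [t]) for t in tags}
    for w in words[1:]:
        best = {t: max(((best[p][0] * p_trans(p, t) * p_emit(t, w),
                         best[p][1] + [t]) for p in tags),
                       key=lambda x: x[0])
                for t in tags}
    return max(best.values(), key=lambda x: x[0])[1]
```

A production tagger would add smoothing for unseen words and work in log space, but the structure, counts to probabilities to sentence-level Viterbi decoding, is the one the abstract describes.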
  • Item
    An Android Application in Searching for Hospitals
    (University of Kelaniya, 2012) Chandrasena, A.M.D.; Dias, N.G.J.
    This research focuses on the Android application development techniques needed to implement a mobile application that can search for information about hospitals along with their exact or nearest locations. Since there is no application available to developers that explains such techniques, this research presents such a development. A number of different applications already provide the user with information about a place he or she wants to visit, but these applications are limited to desktops. The objective of this research is to develop such an application for Android mobile devices. The application helps users find the locations of hospitals together with hospital and doctors' information. With this application, Android users can search for any hospital in the country and view its exact or nearest location on Google Maps in satellite or map view. It is an information service accessible from Android mobile devices through the mobile network, making use of the geographical position of any hospital in the country. Users can also search for information about doctors, such as the day, time and hospital at which they can be channelled, according to their specialty. Data is inserted into the database by the administrator through a service-based web interface, and the Android application then fetches that data according to the given details. The application integrates Google Maps through the Android API to display the locations of hospitals using their coordinates. People face many difficulties in finding information about hospitals for a variety of reasons. Therefore, the Hospital Search Application for Android Mobiles was developed to find information about hospitals and doctors, providing a solution for people who have difficulty finding such service providers and places.
  • Item
    A Plug-in for Joomla which Plays Videos and Audios without External Links
    (University of Kelaniya, 2012) Alexander, K.K.D.P.C.; Dias, N.G.J.
    Joomla has become one of the most commonly used Content Management Systems, enabling users to build websites on their own. Plug-ins, modules and components extend the functionality of Joomla, and each of these extension types is rapidly improving to support the various needs of Joomla users. This research focuses on creating a plug-in which enables the user to play videos and audio without searching other websites for video and audio files. Today, almost all video players developed for Joomla play videos or audio by following an external link to another website where the media is hosted. The objective of this research is to avoid that and instead load the video or audio from the local machine, making playback faster and more reliable. Once installed, the plug-in provides separate folders to hold videos and audio, which can be populated initially by the administrator or later through the uploading method provided by the plug-in. When a user decides to watch a specific media file, he/she simply selects the desired file from the playlist provided by the plug-in; the selected file is then retrieved from its file location and played. The important point of this retrieval is that the player reads the file not from an external link but from a folder installed together with the plug-in. When the site is first created, the administrator can save videos and audio related to the site's purpose in those separate folders. Avoiding external links when retrieving the media files needed by a website makes the site faster and easier for its users; connection problems and other issues in playing media files would then not discourage frequent use of the website, and its popularity would gradually increase.
  • Item
    Genetic algorithm approach for Sinhala speech recognition
    (University of Kelaniya, 2011) Priyadarshani, P.G.N.; Dias, N.G.J.
    Speech recognition is the ability to understand spoken words and convert them into text. Nowadays there is a considerable tendency to develop ASR systems capable of tracking and identifying human speech in local languages, because people prefer to use their native language. Even though there is a dire need for Sinhala speech recognition, work on it is still in its early stages. Here we have applied a Genetic Algorithm (GA) for the automatic recognition of isolated Sinhala words, with Mel Frequency Cepstral Coefficients (MFCC) used to model the speech signal. A GA is not a mathematically guided algorithm; in fact, it is a stochastic nonlinear process. Generally, a GA involves three operations, selection, crossover and mutation, that emulate natural genetic behaviour. The purpose of selection is to determine which genes to retain or delete in each generation, based on their degree of fitness. Although there are several selection methods, we used elitist selection, as we observed that retaining a number of the best individuals for the next generation improves recognition capability. Individuals that are not selected to reproduce may be lost, but the fittest individual always survives. Crossover (reproduction) is the process of exchanging parts of chromosomes to create the next generation. Rather than two-point or uniform crossover, in this work we used one-point crossover with probability 0.80 to prevent unnecessary crossover. A mutation is a change of a gene at a randomly determined locus; the altered gene may strengthen or weaken recognition. The mutation probability is usually very low, and each offspring is subjected to mutation with probability 0.01. The reference dictionary (learning corpus) is the population managed by our genetic algorithm. Initially, we selected ten Sinhala words as the vocabulary, with 24 repetitions of each word from three speakers.
Therefore, the dictionary is made up of 240 individuals. This population is divided into 10 sub-populations (one per word), and the initial population for each word to be recognized is chosen at random; an initial population consists of all occurrences of a word, i.e., 24 individuals. To evaluate performance, we carried out two types of tests: 6 repetitions of each word by the three speakers who participated in the learning process, and 10 repetitions of each word by a completely new speaker. The first test showed that our GA can handle multiple speakers, and the second showed that it is speaker independent. Word recognition for registered speakers was stronger than for an unregistered speaker; however, the results indicated satisfactory precision even in the speaker-independent case.
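The GA operators described above, elitist selection, one-point crossover with probability 0.80 and per-gene mutation with probability 0.01, can be illustrated on a toy bitstring-matching problem. The fitness function here is a hypothetical stand-in for the MFCC-based comparison used in the actual recognizer:

```python
import random

random.seed(42)

TARGET = [1, 0, 1, 1, 0, 1, 0, 1]      # stand-in "reference" pattern
P_CROSS, P_MUT, ELITE = 0.80, 0.01, 2  # operator rates from the abstract

def fitness(ind):
    # degree of match against the reference (hypothetical stand-in for
    # the MFCC-based distance in the real recognizer)
    return sum(a == b for a, b in zip(ind, TARGET))

def evolve(pop, generations=200):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [ind[:] for ind in pop[:ELITE]]          # elitist selection
        while len(nxt) < len(pop):
            p1, p2 = random.sample(pop[: len(pop) // 2], 2)
            c = p1[:]
            if random.random() < P_CROSS:              # one-point crossover
                cut = random.randrange(1, len(p1))
                c = p1[:cut] + p2[cut:]
            # per-gene mutation: flip a bit with low probability
            c = [g ^ 1 if random.random() < P_MUT else g for g in c]
            nxt.append(c)
        pop = nxt
    return max(pop, key=fitness)

pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(24)]
best = evolve(pop)
```

The population size of 24 mirrors the 24 occurrences per word in the paper's sub-populations; the elite copies guarantee that, as the abstract notes, the fittest individual always survives.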
  • Item
    Part of Speech (POS) tagger for Sinhala language
    (University of Kelaniya, 2011) Jayaweera, A.J.P.M.P.; Dias, N.G.J.
    Sinhala is a morphologically complex and agglutinative language; most features of a word are affixed postpositionally to the root word. This paper presents a POS (Part Of Speech) tagger for the Sinhala language using a Hidden Markov Model (HMM). Part of Speech tagging, the process of assigning a part-of-speech or other lexical class marker to each word in a sentence, is one of the fundamental and important steps of any natural language processing task. It is important in every area of natural language processing (NLP), from speech recognition to machine translation, spelling and grammar checking, and language-based information retrieval on the web. The tagger takes a sentence, a tagset and a corpus as input and gives the tagged sentence as output. The tagging process is done by computing the tag sequence probability P(ti|ti-1) and a word-likelihood probability P(wi|ti) from the given corpus, where the linguistic knowledge is automatically extracted from the annotated corpus. In this research, we use the tagset and the corpus developed by UCSC/LRTL (2005) under the PAN Localization project; the current tagset consists of 29 morpho-syntactic tags. An algorithm is presented in this paper for the implementation of a POS tagging system for the Sinhala language, which achieves a success rate of more than 80%.
  • Item
    Development of a Linear-Model based Computer Software for Least Cost Poultry ration formulation
    (University of Kelaniya, 2008) Piyaratne, M.K.D.K.; Dias, N.G.J.; Attapattu, M.
    This study was based on the development of a user friendly, linear-model based computer software system for least cost poultry ration formulation. The software developed in this work used most recent advancements in the field of poultry nutrition and feeding, and developed to suit the local conditions. Sixty locally available feed ingredients were used and thirty nutrients which are most important to poultry growth were considered. Standard linear programming (LP) model for least cost ration formulation was used to analyze and determine the most efficient way of compounding the least cost ration. A mathematical model was constructed, taking into consideration nutrient composition of each of the available ingredient, costs and nutrient requirements of the birds2• Since the ideal protein (IP) concept is becoming popular as a mean of increasing the utilization efficiency of dietary proteins by poultry, NRC (N ational Research Council) and IICP (Ideal Illinois Chick Protein) ideal proteins were also included in broiler rations for calculations. Therefore, although the initial database was based on NRC recommendations users can freely customize ingredient levels and nutrient requirements as and when they required. Ration balancing can be done with 100% equal requirements up to 10-12 major nutrients based to least cost. The standard nutrient requirement levels can be customized and researchers can do experiments with different requirement levels. Therefore, this software can be a very useful tool for researchers, nutritionists as well as teachers. Amino acid profile selection feature allows researchers to formulate experimental rations with various amino acid levels and protein levels. The software can be run under Microsoft Windows environment and users are able to print and save results as well as initial database information. The software has been successfully installed, tested and evaluated successfully with several research projects.