IPRC - 2016

Permanent URI for this collection: http://repository.kln.ac.lk/handle/123456789/157

  • Item
    Comparison of Part of Speech taggers for Sinhala Language
    (Faculty of Graduate Studies, University of Kelaniya, Sri Lanka, 2016) Jayaweera, M.; Dias, N.G.J.
    Part of Speech (POS) tagging is an important tool for processing natural languages and one of the basic analytical models used in many natural language processing applications. It is the process of marking up a word in a corpus as corresponding to a particular part of speech, such as noun, verb, adjective or adverb. The automatic assignment of descriptors to given tokens is called tagging; the descriptor is called a tag. A tag may indicate one of the parts of speech categories as well as semantic information, so tagging is a kind of classification. The process of assigning one of the parts of speech to a given word is called parts of speech tagging, commonly referred to as POS tagging. In grammar, a part of speech (also known as a word class, lexical class, or lexical category) is a linguistic category of words (or, more precisely, lexical items), generally defined by the syntactic or morphological behaviour of the lexical item in the language. Each part of speech explains not what the word is, but how the word is used; in fact, the same word can be a noun in one sentence and a verb or adjective in another. In most natural languages, noun and verb are common linguistic categories; almost all languages have the lexical categories noun and verb, but beyond these there are significant variations between languages. The significance of the part of speech for language processing is that it provides a considerable amount of information about the word and its neighbours. There are different approaches to the problem of assigning a part of speech tag to each word of a natural language sentence. The most widely used methods for English are statistical methods, that is, Hidden Markov Model (HMM) based tagging, and rule-based or transformation-based methods. Subsequent research has added various modifications to these basic approaches to improve the performance of taggers for English. In this paper, we present a comparison of the different studies on POS tagging carried out for the Sinhala language. For Sinhala, four works on developing a POS tagger have been reported. In 2004, an HMM-based POS tagger using a bigram model was proposed and reported an accuracy of only 60%. Another HMM-based approach was tried for Sinhala in 2013 and reported an accuracy of 62%. In 2016, a further study reported an accuracy of 72% using a hybrid approach that combined a bigram HMM with rules for predicting the relevant tag for unknown words. The tagger that we have developed is based on a trigram HMM approach, which uses knowledge of the distribution of words and parts of speech categories to predict the relevant tag for unknown words. The Witten-Bell discounting technique was used for smoothing, and our approach gave an accuracy of 91.50% with a corpus of 90,551 annotated words.
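    Since the abstract names the exact smoothing technique, a small illustration may help. The following is a minimal sketch of interpolated Witten-Bell smoothing over tag trigrams, assuming a toy tag set and toy training data; the Viterbi decoder and the unknown-word handling used in the paper are omitted, and the helper names (train_transition_model, witten_bell) are our own, not the authors'.

    ```python
    from collections import defaultdict

    def train_transition_model(tagged_sentences):
        """Collect tag n-gram counts of orders 1-3 from tagged sentences.

        tagged_sentences: iterable of [(word, tag), ...] lists.
        Counts are keyed by history tuple; () holds the unigram counts."""
        counts = defaultdict(lambda: defaultdict(int))
        for sent in tagged_sentences:
            tags = ["<s>", "<s>"] + [t for _, t in sent] + ["</s>"]
            for i in range(2, len(tags)):
                counts[()][tags[i]] += 1                          # unigram
                counts[(tags[i - 1],)][tags[i]] += 1              # bigram
                counts[(tags[i - 2], tags[i - 1])][tags[i]] += 1  # trigram
        return counts

    def witten_bell(counts):
        """Return P(tag | history) with interpolated Witten-Bell discounting."""
        totals = {h: sum(c.values()) for h, c in counts.items()}
        types = {h: len(c) for h, c in counts.items()}
        vocab = len(counts[()])  # number of distinct tags seen

        def prob(tag, history):
            if history and history not in counts:   # unseen history: back off
                return prob(tag, history[1:])
            # recursively interpolate with the shorter history
            lower = prob(tag, history[1:]) if history else 1.0 / vocab
            c = counts[history].get(tag, 0)
            return (c + types[history] * lower) / (totals[history] + types[history])
        return prob

    # toy usage (the tags are placeholders, not the paper's Sinhala tag set)
    counts = train_transition_model([[("mama", "PRP"), ("gedara", "NN"), ("yanawa", "VB")]])
    p = witten_bell(counts)
    print(p("VB", ("PRP", "NN")))   # smoothed trigram probability
    ```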
  • Item
    An improved method to isolate Vehicle License Plate
    (Faculty of Graduate Studies, University of Kelaniya, Sri Lanka, 2016) Ashan, M.K.B.; Dias, N.G.J.
    In a License Plate Recognition (LPR) system, vehicle license plate isolation is one of the major tasks. By sending the isolated license plate image into an Optical Character Recognition (OCR) system, the license plate can be recognized. Locating the license plate in a vehicle image, the non-uniformity of license plates, and captured images that contain skewed license plates are the key problems in license plate isolation. The work proposed in this paper is a solution to the vehicle license plate isolation problem. The first phase of the isolation process is the conversion of the input image into grayscale, which reduces the colour image to its luminance information. In the second phase, the boundaries of the objects in the image are improved by filling unwanted holes; this filling process is called dilation. Next, edge processing is performed on the dilated image both horizontally and vertically and, by drawing histograms for these two projections, the probable candidates for the license plate location are identified. However, there may be consecutive columns and rows with drastically changing histogram values; these are smoothed in the next phase. The low histogram value regions are then identified as unwanted regions and, by removing them, the probable candidate regions are obtained. The most probable candidate for the license plate is taken to be the region with the highest histogram values: the closely spaced line of letters on a license plate against a plain background generates higher histogram values there than in other regions. Finally, our algorithm works on images with different levels of illumination and on skewed images, and its efficiency is significantly increased, to around 80%.
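    To make the pipeline concrete, here is a rough sketch of the row-wise edge-projection step using OpenCV, under our own assumptions: the 3x3 dilation kernel, the Sobel operator, the moving-average window and the 50% cut-off are illustrative choices rather than the paper's parameters, and the column-wise projection inside the detected band would follow the same pattern.

    ```python
    import cv2
    import numpy as np

    def locate_plate_rows(image_path, window=15):
        """Return candidate top and bottom rows of the license plate band."""
        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # phase 1: grayscale
        dilated = cv2.dilate(gray, np.ones((3, 3), np.uint8))  # phase 2: dilation

        # phase 3: vertical edges respond strongly to the dense letter strokes
        edges = np.abs(cv2.Sobel(dilated, cv2.CV_64F, 1, 0, ksize=3))

        # horizontal projection: one histogram value per image row
        row_hist = edges.sum(axis=1)

        # phase 4: smooth drastically changing consecutive values
        row_hist = np.convolve(row_hist, np.ones(window) / window, mode="same")

        # phase 5: drop low-valued regions, keep the highest-valued band
        keep = row_hist >= 0.5 * row_hist.max()   # assumed 50% cut-off
        rows = np.flatnonzero(keep)
        return int(rows.min()), int(rows.max())

    # y0, y1 = locate_plate_rows("vehicle.jpg")   # band likely containing the plate
    ```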
  • Item
    New Processing Model for Operating Systems
    (Faculty of Graduate Studies, University of Kelaniya, Sri Lanka, 2016) Weerakoon, C.; Karunananda, A.; Dias, N.G.J.
    The computer plays a vital role in executing programs to solve problems. For each such program, a process must be created and all the required resources allocated to it; the management of these processes is one of the most important jobs of an operating system. By observing the different behaviours that processes display, researchers have introduced a variety of processing models, such as the two-state, three-state, five-state and seven-state models, to increase the processing power of the computer. Here, the state of a process relates to the current task that the process performs, and the term used for a state can change from one operating system to another. Although these models have brought improvements, they have so far failed to produce a processing model that fully utilizes the underlying hardware architecture. Meanwhile, our observations of real-world scenarios revealed that the way the human mind works is rather different from the way the processing models incorporated into computers have worked until now. The human mind conditionally evolves over time by drawing associations among existing and newly arriving data and instructions. With this insight, our research introduces a new eight-state processing model, which executes continuously depending on the presented conditions, to enhance the processing power of the system. One additional state, named "Terminate", and four new actions, namely Ready-to-Ready, Ready-to-Terminate, Exit-to-Ready/Suspend and Exit-to-Ready, have been introduced to the existing seven-state processing model. In addition, two of the existing actions, New-to-Ready/Suspend and New-to-Ready, have been modified. In making these changes, a set of fifteen of the twenty-four causal relations in the Buddhist theory of mind, which can be exploited in explaining any phenomenon, has been applied. To depict the changes to each action and to carry out the experiments, the corresponding algorithms are being designed, and these algorithms are to be integrated into the kernel of the operating system. After this implementation, the new processing model can be compared with the existing model by executing the same program multiple times on the operating system with and without the new model and recording the time taken in each round. Then the dependent two-sample t-test, which is more powerful and descriptive, can be applied to the results. Further, to check the quality of the new model, a parametric test can be applied to the results of a survey conducted on a single group of users who have worked on the operating system with and without the new processing model.
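    As a reading aid only, the sketch below encodes the proposed eight-state model as a transition table. The abstract names the new "Terminate" state, the four new transitions and the two modified ones; the remaining transitions are our assumption of the conventional seven-state model, not something the paper specifies.

    ```python
    from enum import Enum, auto

    class State(Enum):
        NEW = auto()
        READY = auto()
        RUNNING = auto()
        BLOCKED = auto()
        READY_SUSPEND = auto()
        BLOCKED_SUSPEND = auto()
        EXIT = auto()
        TERMINATE = auto()          # the additional eighth state

    # Transitions named in the abstract are commented; the rest follow the
    # standard seven-state model and are assumed here.
    TRANSITIONS = {
        (State.NEW, State.READY),            # modified in the proposed model
        (State.NEW, State.READY_SUSPEND),    # modified in the proposed model
        (State.READY, State.READY),          # new: Ready-to-Ready
        (State.READY, State.TERMINATE),      # new: Ready-to-Terminate
        (State.EXIT, State.READY),           # new: Exit-to-Ready
        (State.EXIT, State.READY_SUSPEND),   # new: Exit-to-Ready/Suspend
        (State.READY, State.RUNNING),
        (State.RUNNING, State.READY),
        (State.RUNNING, State.BLOCKED),
        (State.RUNNING, State.EXIT),
        (State.BLOCKED, State.READY),
        (State.READY, State.READY_SUSPEND),
        (State.READY_SUSPEND, State.READY),
        (State.BLOCKED, State.BLOCKED_SUSPEND),
        (State.BLOCKED_SUSPEND, State.BLOCKED),
        (State.BLOCKED_SUSPEND, State.READY_SUSPEND),
    }

    def can_move(src: State, dst: State) -> bool:
        """True if the sketched model permits the transition src -> dst."""
        return (src, dst) in TRANSITIONS

    print(can_move(State.EXIT, State.READY))   # True under the new model
    ```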
  • Item
    Optimization of SpdK-means Algorithm
    (Faculty of Graduate Studies, University of Kelaniya, Sri Lanka, 2016) Gunasekara, R.P.T.H.; Wijegunasekara, M.C.; Dias, N.G.J.
    This study was carried out to enhance the performance of the k-means data-mining algorithm using parallel programming methodologies. As a result, the Speedup k-means (SpdK-means) algorithm, an extension of the k-means algorithm, was implemented to reduce the cluster building time. Although SpdK-means speeds up the cluster building process, its main drawback was that the cumulative density of the clusters it created differed from the initial population. This means that some elements (data points) were missed out in the clustering process, which reduces cluster quality. The aim of this paper is to discuss how this drawback was identified and how the SpdK-means algorithm was optimized to overcome it. The SpdK-means clustering algorithm was applied to three datasets gathered from a Ceylon Electricity Board dataset, varying the number of clusters k. For k = 2, 3 and 4 there was no significant difference between the cumulative cluster density and the initial dataset. When the number of clusters was more than 4 (i.e., when k >= 5), there was a significant difference in cluster densities. The density of each cluster was recorded, and it was found that the cumulative density of all clusters differed from the initial population: about 1% of the elements of the total population were missing after the clusters were formed. To overcome this drawback, the SpdK-means clustering algorithm was studied carefully, and it was identified that elements at equal distances from several cluster centroids were missed out in intermediate iterations. When an element is at an equal distance from two or more centroids, the SpdK-means algorithm is unable to decide which cluster the element should belong to, and as a result the element is not included in any cluster. If such an element were instead included in all the clusters at equal distance, and this were repeated for all such elements, the cumulative cluster density would greatly exceed the initial population. Therefore, SpdK-means was optimized by selecting exactly one of the cluster centroids at equal distance from the element. After studying several selection methods and their outcomes, the SpdK-means algorithm was modified to find a suitable cluster for an equidistant element. Since such an element can belong to any of these clusters, no priority can be given to a particular cluster; as all the centroids are at equal distances from the element, the algorithm selects one of the equidistant centroids at random. The optimized SpdK-means algorithm successfully solved the identified problem by identifying the missing elements and including them in the correct clusters. When applied to the datasets, the number of iterations was reduced by 20% compared to the former SpdK-means algorithm, and the cluster building time was reduced by a further 10% to 12%.
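    The tie-breaking rule is easy to isolate. Below is a minimal sketch of an assignment step that places a tied element in exactly one of its nearest clusters, chosen uniformly at random, which is the fix described above; SpdK-means itself is parallel and its internals are not given in the abstract, so this serial version and its function name are our own.

    ```python
    import numpy as np

    def assign_clusters(points, centroids, rng=None):
        """Assign each point to a centroid, breaking distance ties at random.

        A plain argmin would silently favour the first of several equidistant
        centroids (and the original SpdK-means dropped such points entirely);
        here a tied point goes to one nearest cluster chosen at random, so
        the cumulative cluster density matches the initial population."""
        rng = rng or np.random.default_rng()
        # pairwise distances: points (n, d) vs centroids (k, d) -> (n, k)
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.empty(len(points), dtype=int)
        for i, row in enumerate(dist):
            nearest = np.flatnonzero(np.isclose(row, row.min()))  # all tied centroids
            labels[i] = rng.choice(nearest)   # uniform pick among the ties
        return labels

    # the middle point is equidistant from both centroids
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]])
    cents = np.array([[0.0, 0.0], [1.0, 0.0]])
    print(assign_clusters(pts, cents))   # e.g. [0 1 0] or [0 1 1]
    ```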