Science

Permanent URI for this community: http://repository.kln.ac.lk/handle/123456789/1

Search Results

Now showing 1 - 2 of 2
  • Detecting plagiarism in multiple Sinhala documents
    (International Research Conference on Smart Computing and Systems Engineering - SCSE 2018, 2018) Ganepola, G.A.U.E.; Wijayasiriwardhane, T.K.
    The availability of unlimited information resources on the Internet, together with search engines such as Google that make those resources much easier to locate, has contributed to an increase in plagiarism. Although a number of software tools are available for detecting plagiarism in multiple English documents, no such tool is yet available for the Sinhala language. This paper presents a novel language-dependent approach to detecting plagiarism in multiple Sinhala documents. It uses stemming, stop word removal and synonym replacement for text preprocessing, and term frequency-inverse document frequency (tf-idf) with cosine similarity for similarity comparison. A prototype software tool was developed and interlinked with an operational Sinhala WordNet to demonstrate the viability of the proposed approach. The prototype tool was validated against a sample of Sinhala assignments from secondary school students. The assignments were also examined by an expert to determine whether they had actually been plagiarized. When we compared the results of the prototype tool against the expert judgment, we found that the proposed approach to plagiarism detection in multiple Sinhala documents achieves an accuracy of over 80%.
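The similarity comparison named in this abstract (tf-idf weighting followed by cosine similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the documents have already been stemmed, stripped of stop words and synonym-normalized, it uses placeholder tokens rather than real Sinhala text, and it uses a smoothed idf variant chosen for illustration. The function names `tfidf_vectors` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict: term -> weight) per document.

    `docs` is a list of token lists that are assumed to be preprocessed
    already. The idf is a smoothed variant (an assumption of this sketch).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        term_counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder tokens standing in for preprocessed Sinhala words.
docs = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "e"],  # shares three of four terms with the first
    ["w", "x", "y", "z"],  # shares nothing with the first
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

In a multi-document setting, each pair of submissions would be scored this way and pairs above some threshold flagged for review; the abstract does not state the threshold used, so none is assumed here.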
  • An algorithm for plagiarism detection in Sinhala language
    (Faculty of Science, University of Kelaniya, Sri Lanka, 2016) Basnayake, S.F.; Wijekoon, H.; Wijayasiriwardhane, T.K.
    According to the Merriam-Webster dictionary, the simple definition of the verb plagiarize is “to use the words or ideas of another person as if they were your own words or ideas”. Many software tools that aid in detecting plagiarism are available for the English language, but equivalent tools are not yet available specifically for the Sinhala language. Although language-independent tools that work on many languages are available, they generally give poor results because they do not consider language-specific features. Some detection methods have been proposed for Asian languages such as Hindi, Malayalam, Arabic and Persian, which have some close relationship with, and similar properties to, the Sinhala language. All of those methods use language-specific rules, and they even outperform the commercially available tools. These findings are evidence that language-specific plagiarism detection is more effective than language-independent plagiarism detection, as some paraphrasing techniques can be used to mislead language-independent systems. The Sinhala language is constitutionally recognized as the official language of Sri Lanka, along with Tamil. Due to the complexity of the language's structure and rules of grammar, language-independent tools seem to provide poor results when used for plagiarism detection in Sinhala documents. In this research, we propose a novel plagiarism detection algorithm built around content-based methods specific to the Sinhala language. The methodology of this study follows both experimental and build approaches. The proposed plagiarism detection system has two modules, namely the text pre-processing module and the similarity detection module. The text pre-processing module pre-processes the text files to standardize the text sources using techniques such as stop word removal, number replacement, lemmatization, synonym recognition and n-gram creation. The similarity detection module then analyses the pre-processed text using the Jaccard coefficient and the cosine similarity coefficient to measure the similarity between two documents. A prototype Sinhala language plagiarism detection system will be implemented using the proposed method, and several combinations of the above techniques will be tried to discover the best combination. Testing and statistical performance evaluation will be carried out using a sample of source text files and plagiarized text files in the Sinhala language, also taking expert judgements into consideration. The intended final outcome of this research study is an effective software application for plagiarism detection in Sinhala language documents.
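
The n-gram and Jaccard coefficient steps mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the abstract does not say whether the n-grams are word-level or character-level or what value of n is used, so word trigrams are assumed here, the tokens are placeholders for already pre-processed Sinhala words, and the function names `word_ngrams` and `jaccard` are hypothetical.

```python
def word_ngrams(tokens, n=3):
    """Return the set of word n-grams (as tuples) in a token list.

    Word-level trigrams are an assumption of this sketch.
    """
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder tokens standing in for pre-processed Sinhala words.
source = ["t1", "t2", "t3", "t4", "t5"]
suspect = ["t1", "t2", "t3", "t4", "t6"]  # last word changed

score = jaccard(word_ngrams(source), word_ngrams(suspect))
print(score)  # 0.5: two shared trigrams out of four distinct trigrams
```

Because the Jaccard coefficient works on sets, it ignores how often an n-gram repeats, whereas cosine similarity over weighted vectors takes frequency into account; comparing both, as the abstract proposes, is one way to see which behaves better on a given sample.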