ARS - 2009

Permanent URI for this collection: http://repository.kln.ac.lk/handle/123456789/167

  • Item
    An analysis of sound parameters for prosodic modeling in Sinhala text to speech synthesis
    (Research Symposium 2009 - Faculty of Graduate Studies, University of Kelaniya, 2009) Dias, N.G.J.; Kumara, K.H.; Dolawattha, D.D.M.
    Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in software and/or hardware. Text-to-Speech (TTS) is one of the speech synthesis technologies. Before a synthesizer can produce an utterance, several steps have to be completed. Among them, after the basic pronunciation has been computed from the orthographic text, prosody annotation should be performed. Finding the correct intonation, stress, and duration from written text is the most challenging problem for most natural languages. These features together are called prosodic or suprasegmental features and may be considered the melody, rhythm, and emphasis of speech at the perceptual level. Unfortunately, written text usually contains very little information about these features, and some of them change dynamically during speech. However, this information must be given to the speech synthesizer, at least to some extent, through specific control characters so that it can produce sufficiently natural speech in the target language. Correct timing at the sentence level and correct grouping of words into phrases are also difficult: in many languages, prosodic phrasing is not always marked in text by punctuation, and phrasal accentuation is almost never marked. If there are no breath pauses in speech, or if they are in the wrong places, the speech may sound very unnatural, or the meaning of the sentence may even be misunderstood. As an example, in Sinhala, the input string " wïu wdjo@ ” " can be spoken in three different ways by changing the intonation pattern to angry, sad, or sarcastic, giving three different meanings to the listener. Here, intonation means how the pitch pattern, or fundamental frequency, changes during speech.
The prosody of continuous speech depends on many separate aspects, such as the meaning of the sentence and the speaker's characteristics and emotions. Pitch also depends on the speaker: with a female voice it may be twice as high as with a male voice, and with children it may be even three times as high. It is therefore clear that prosody plays a major role in speech synthesis and that any kind of speech synthesis requires a deeper treatment of prosody. In this work, in order to develop generic models for prosodic synthesis, we selected 150 possible sentences in the Sinhala language and recorded them in the above three intonation patterns (i.e. angry, sad, and sarcastic) with a female native speaker who is well trained in drama and theatre. We then computed various speech parameters for these 150 × 3 sentences using the PRAAT speech processing tool (www.praat.org). We found that, for all 150 sentences, there is an incremental pattern in duration from Angry to Sarcastic. There is no regular pattern in the Median, Mean, Standard Deviation, Minimum, or Maximum values of the Pitch parameter. Regarding pulses, we computed the Number of pulses, Number of periods, Mean period, and Standard deviation of period for each of the sound files and observed no regular pattern in the Pulses parameter. For the Voicing parameter, we computed the Fraction of locally unvoiced frames, Number of voice breaks, and Degree of voice breaks; here, too, there were no regular patterns. We then computed the Harmonicity values as Mean autocorrelation, Mean noise-to-harmonics ratio, and Mean harmonics-to-noise ratio and found no regular pattern. After computing the mean-energy intensity of each sentence, we found an incremental pattern in Intensity in the order Angry, Sarcastic, Sadness.
Finally, we computed the formant values as First formant, First bandwidth, Second formant, Second bandwidth, Third formant, Third bandwidth, Fourth formant, and Fourth bandwidth and found no regular pattern in the different formant parameters. Although there are no regular patterns in most of the above speech parameters, a more natural-sounding speech synthesizer still requires that these parameters be annotated alongside the basic pronunciation computed from the orthographic text. In future, we therefore hope to develop more generic probabilistic models based on this analysis to model the above speech parameters for Sinhala speech synthesis.
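The pattern check described in this abstract can be illustrated with a minimal Python sketch. It tests whether a measured parameter increases strictly along a given emotion order for every sentence. The function names and the sample values are hypothetical and are not data from the study; the real measurements would come from the PRAAT analysis.

```python
from typing import Mapping, Sequence


def has_incremental_pattern(values: Mapping[str, float], order: Sequence[str]) -> bool:
    """Return True if the values strictly increase along the given emotion order."""
    ordered = [values[emotion] for emotion in order]
    return all(a < b for a, b in zip(ordered, ordered[1:]))


def pattern_holds_for_all(sentences, order) -> bool:
    """Return True only if every sentence shows the incremental pattern."""
    return all(has_incremental_pattern(s, order) for s in sentences)


# Hypothetical mean-intensity values (dB) for two sentences; NOT data from the study.
intensity = [
    {"angry": 61.2, "sarcastic": 64.8, "sadness": 66.1},
    {"angry": 59.7, "sarcastic": 63.0, "sadness": 65.4},
]
order = ["angry", "sarcastic", "sadness"]

print(pattern_holds_for_all(intensity, order))
```

A parameter would be reported as having "no regular pattern" under this sketch when the check fails for at least one sentence in every candidate ordering.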
  • Item
    Designing and implementation of new computer software system for the Centre for Open and Distance Learning
    (Research Symposium 2009 - Faculty of Graduate Studies, University of Kelaniya, 2009) Dias, N.G.J.; Dolawattha, D.D.M.
    Nearly 150000 students qualify for university education in Sri Lanka annually, but only 18000 are selected to follow undergraduate courses in local universities, where education is free. The remaining students have to follow external degree programmes conducted by national universities or professional courses conducted by private-sector or government institutes, and a few go abroad for higher education. Among the students who follow external degree courses at the national universities, a large portion register annually at the University of Kelaniya. Nearly 85500 students registered from 1993 to 2008, and 13716 of them have graduated so far. We have identified that more than 10000 students have registered annually since 2005. Five different degree courses are offered, and the CODL needs to conduct 16 exams and 16 seminars for them annually. Given the rapidly growing number of students and the services rendered to them, we require a more robust, powerful, user-friendly, and reliable Computer Software System (CSS). A CSS is also required because a new exam evaluation system (NEES) has been introduced from the 2007 student batch. Under the NEES, course units carry particular credit values, and each student needs to complete a specified number of credits within a specified period of time relevant to the degree followed. The CSS is a Management Information System (MIS) type multi-user computer system working in a local network environment, and it will be operated by password-restricted users. Its main functionalities will be student registration, conducting exams, printing admissions, and printing transcripts and certificates, with the other required sub-functionalities falling under these. All functional requirements, non-functional requirements, and domain requirements were identified. The system was designed by integrating concurrency control and user authorization.
The authorized users will be CODL staff only, categorized according to their assigned jobs (e.g. student registration user, examination data entry user). The user authorization subsystem considers the different functionalities of the CSS and grants access to each user category according to its assigned job. Limitations and constraints have to be considered when developing the CSS. It will not be connected to the campus-wide network and will run on a separate server in order to avoid internet hacking and reduce the risk of internet viruses. Examination results are published on the CODL website, which runs on a separate server. The amount of data that can be stored in the database is unlimited, and the database backup facility is an important feature. The potential strengths of the CSS are its maintainability and modularity. An integrated software process model, combining two software process models, incremental development and rapid application development, was used to model the CSS. More user-friendly and interactive interfaces will be developed in the CSS. The CSS is designed using Rational Rose with object-oriented software design techniques. It was developed on the .NET Framework using VB.NET as the front-end tool and SQL Server as the back-end tool.
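The user authorization idea in this abstract, where each user category is granted only the functions that belong to its job, can be sketched as a simple permission table. This is an illustrative Python sketch, not the authors' VB.NET implementation; the category names follow the abstract, while the function names and the mapping between them are assumptions.

```python
# Hypothetical mapping from user category to the CSS functions it may use.
PERMISSIONS = {
    "student_registration_user": {"register_student"},
    "examination_data_entry_user": {"enter_exam_results", "print_admissions"},
}


def is_allowed(category: str, function: str) -> bool:
    """Return True if the given user category may perform the given function."""
    return function in PERMISSIONS.get(category, set())


print(is_allowed("student_registration_user", "register_student"))
print(is_allowed("student_registration_user", "enter_exam_results"))
```

Unknown categories fall back to an empty permission set, so access is denied by default.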
  • Item
    Design & implementation of an efficient SMS server
    (Research Symposium 2009 - Faculty of Graduate Studies, University of Kelaniya, 2009) Dias, N.G.J.; Rathnasekara, P.L.A.U.
    Short Message Service (SMS) is one of the most popular services provided by telecommunication companies all over the world. Because of the low cost and efficiency of this service compared to traditional ways of sending messages, companies nowadays use this technology heavily to send business messages to their customers and employers. The main objective of this research is to implement an SMS server using open source software with minimum resources. An SMS server basically has two main features: it can be used for sending messages, and it can be used for receiving messages and storing them in a database. Apart from these two features, the proposed server has many others, such as categorizing received messages according to type, restricting the number of messages that can be sent, as set by the administrator, preventing users from logging in to the server during administrator-defined hours, creating template messages, and allowing logins to the server only from authorized client machines (by IP address). To achieve a higher level of security, we have stored the encrypted passwords together with the usernames for validating users’ logins to the server. These data are retrieved through SQL commands using ‘data decryption’ methods. The main function of this server is sending and receiving messages using a GSM modem. The initial step was to configure the GSM modem and connect it to the server machine through a USB port. A connection should be established with the SIM card, since the functionality of the modem is handled completely by the SIM card. After a connection is established, SMS messages can be sent and received through the SIM card using ‘AT’ commands (Hayes commands). Sent and received messages are stored in the outbox table and inbox table of the database, respectively. The inbox messages are then classified according to type.
CSV file uploading was used to insert data into the database, since it is more convenient for the user. With this method, messages are stored in a queue table and then sent one by one automatically at a user-specified time. When sending a message, the server checks whether the recipient number is restricted and whether it is in the correct format. The server was built on the Apache Tomcat web server, and the web pages were created using JSP technology. MySQL database server, JDK 1.5, and Rational Rose software were used in the development of the database. The server was built using only one modem; however, it can be extended to support several modems to increase efficiency when sending messages to millions of customers using the queue. Even so, the server as developed is efficient and can be used robustly in any company or organization.
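The sending path described in this abstract (queued messages, a recipient check, then AT commands to the GSM modem) can be sketched as follows. This is a minimal Python illustration, not the authors' JSP/Tomcat implementation. `AT+CMGF=1` and `AT+CMGS` are the standard text-mode SMS commands; the number format, the function names, and the sample numbers are assumptions, and the sketch only builds the command bytes instead of talking to a real modem.

```python
import re
from collections import deque

CTRL_Z = b"\x1a"  # terminates the message body in SMS text mode

# Assumed recipient format: optional "+" followed by 9 to 15 digits.
NUMBER_FORMAT = re.compile(r"\+?\d{9,15}")


def is_valid_recipient(number: str, restricted: set) -> bool:
    """Accept a number only if it is well formed and not on the restricted list."""
    return NUMBER_FORMAT.fullmatch(number) is not None and number not in restricted


def build_at_commands(number: str, text: str) -> list:
    """Return the byte sequences a modem expects for one text-mode SMS.

    A real sender must wait for the modem's "> " prompt after AT+CMGS
    before writing the message body.
    """
    return [
        b"AT+CMGF=1\r",                           # select text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),  # start a message to this number
        text.encode("ascii", errors="replace") + CTRL_Z,
    ]


def drain_queue(queue: deque, restricted: set) -> list:
    """Take queued (number, text) pairs one by one and keep the valid ones."""
    outgoing = []
    while queue:
        number, text = queue.popleft()
        if is_valid_recipient(number, restricted):
            outgoing.append(build_at_commands(number, text))
    return outgoing


# Hypothetical queue contents.
queue = deque([
    ("+94771234567", "Hello"),
    ("12ab", "bad format"),
    ("+94770000000", "restricted"),
])
commands = drain_queue(queue, restricted={"+94770000000"})
print(len(commands))
```

Supporting several modems, as the abstract suggests, would amount to distributing the entries of `outgoing` across more than one serial connection.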
  • Item
    A tool for automatic derivation of phone transitions for the creation of a diphone database for Sinhala text to speech synthesis
    (Research Symposium 2009 - Faculty of Graduate Studies, University of Kelaniya, 2009) Kumara, K.H.; Dias, N.G.J.
    Since conventional user interfaces such as keyboards and monitors restrict the usage of computers, there is a dire need for an interface other than the keyboard and screen interface that is widely in use at present. Speech technologies promise to be the next generation of user interfaces. In general, two technologies for processing speech are needed: one is speech recognition, and the other is speech synthesis. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in software and/or hardware. Text-to-Speech (TTS) is one of the speech synthesis technologies. TTS can be defined as “the production of speech by machines, by way of the automatic phonetization of the sentences to utter”. Before a synthesizer can produce an utterance, several steps have to be completed. First, the right segments or units have to be selected. The units usually used include diphones, half-syllables, and triphones. Many synthesizers use diphones as their basic units of concatenation. A diphone is the transition between two speech sounds, obtained from natural speech. Creating a diphone database, which contains all the sound transitions in the target language, is critical in diphone TTS synthesis.
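Because a diphone is the transition between two adjacent speech sounds, the transitions needed for a word can be listed by pairing each phone with its successor. The Python sketch below illustrates that idea only; it is not the tool described in this abstract, and the phone symbols and the `_` silence marker are hypothetical.

```python
def phone_transitions(phones, boundary="_"):
    """List the diphones (adjacent phone pairs) of one word, padded with silence."""
    padded = [boundary, *phones, boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def required_diphones(words):
    """Collect the distinct diphones needed to cover a list of phone sequences."""
    needed = set()
    for phones in words:
        needed.update(phone_transitions(phones))
    return needed


# Hypothetical phone sequence for one word.
print(phone_transitions(["a", "m", "a"]))  # ['_-a', 'a-m', 'm-a', 'a-_']
```

Running `required_diphones` over a word list gives the set of transitions that recordings for the database would have to cover.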
  • Item
    A study on Linux Live CD re-mastering
    (Research Symposium 2009 - Faculty of Graduate Studies, University of Kelaniya, 2009) Priyadarshani, P. G. N.; Dias, N.G.J.
    Linux is a Unix-like operating system initially created by Linus Torvalds in 1991. The operating system originally consisted of the kernel and some GNU tools. Thereafter, Linux was developed progressively with the help of people around the world. Most interestingly, Linux provides 100% freedom to run, copy, distribute, study, change, and improve it, because it is free and open source. As a consequence, some individuals and companies began distributing Linux with their own choice of packages bundled around Linus' kernel, aimed at particular user communities. Red Hat Enterprise Linux, Debian, SUSE, Ubuntu, Fedora, CentOS, and Knoppix are some of the major distributions. For users, Live CDs are very important because they make it possible to try out a distribution without installing it and to run the distribution on any computer without harming the existing system. Given this portability, Live CDs are in greater demand than installation CDs. Moreover, Live CDs can be used to determine whether an operating system or version is compatible with specific hardware settings and certain peripherals, and so to know whether a computer or peripheral will function properly before purchasing it. People can also use a Live CD to troubleshoot hardware, and many Live CDs can save user-created files to a Windows partition, a USB drive, a network drive, or other accessible media. Even though Live CDs come packed with some software and can fulfil user requirements to some extent, a preferred Live CD may not provide an environment that is perfectly suited to a specific user, since Live CDs are dedicated to specific applications according to the requirements of thematic user communities. A Live CD therefore comes with some software that is valuable to a specific user as well as some software that the user does not need at all. Further, some software that is essential for a specific user may not be included.
On the other hand, although some Live CDs provide the facility of installing the operating system onto the computer, it is still impossible to install software that is not included with the Live CD without an Internet connection, because the dependencies that support the software need to be downloaded. The solution to these problems is to create customized Live CDs according to user requirements in order to achieve higher utility. Moreover, it is possible to upgrade a Live CD by including security patches, software updates, and so on. The main purpose of this study is to explain how to customize a Live CD by adding necessary software packages and plug-ins, removing unwanted packages, and changing the appearance while upgrading. First, we obtained an image of the original Live CD. Then, we re-mastered the core of the Live CD using built-in UNIX commands and some standard Linux tools. After the ISO image was recreated, it was burned to a CD/DVD. Finally, the customized Live CD successfully met the particular users’ requirements. To demonstrate the procedure, we selected Ubuntu, one of the best-known Linux distributions in the world.