Feature Extraction from Old Tamil Newspapers Using Histogram Minima

Kasthuri, S.; Darsha, M.; Ranathunga, L.

dc.contributor.author	Kasthuri, S.
dc.contributor.author	Darsha, M.
dc.contributor.author	Ranathunga, L.
dc.date.accessioned	2018-08-09T07:39:48Z
dc.date.available	2018-08-09T07:39:48Z
dc.date.issued	2018
dc.description.abstract	Archaeological records provide information about the history of human cultures and past events, and newspapers can be considered one of the main sources of archaeological data. However, only a few systems exist for processing old Tamil newspaper articles. An automated image processing system is proposed to support efficient and flexible searching of old Tamil newspapers. This paper presents an image processing technique that extracts features such as headlines and sub-headlines from scanned images of old Tamil newspapers. Historical newspapers deteriorate over time, which makes the contents of their scanned images difficult to read. Image quality is therefore improved by preprocessing techniques such as grayscale dilation, median filtering, and adaptive binarization, which make the required information easier to extract. Segmenting each article and identifying its heading helps to improve data manipulation. Feature extraction from the old Tamil newspaper images follows these steps: horizontal smoothing distinguishes the paragraphs from the empty space between columns; vertical smoothing distinguishes paragraphs from headlines; a logical AND operation combines the outputs of horizontal and vertical smoothing; and the height of each block is measured, followed by horizontal projection, which scans the pixels row by row to measure black-pixel density against the index of each row. The minima of this horizontal histogram identify the horizontal breaking points of the individual regions within an article. The four major horizontal regions are headlines, sub-headlines, text, and graphics. Irregular blocks may contain images embedded in text, and vertical projection can be used to distinguish these images from the text. 
In the evaluation, fifty articles with different paragraph arrangements, including images, were used. The regions in each article were first identified and counted manually; the regions identified by the system were then compared with these manual counts to obtain the measurements. Region identification achieved an efficiency of 80.09%, and headline extraction achieved an accuracy of 81.616%.	en_US
dc.identifier.citation	Kasthuri, S., Darsha, M. and Ranathunga, L. (2018). Feature Extraction from Old Tamil Newspapers Using Histogram Minima. 3rd International Conference on Advances in Computing and Technology (ICACT ‒ 2018), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka. p4.	en_US
dc.identifier.uri	http://repository.kln.ac.lk/handle/123456789/18972
dc.language.iso	en	en_US
dc.publisher	3rd International Conference on Advances in Computing and Technology (ICACT ‒ 2018), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka.	en_US
dc.subject	archaeological records	en_US
dc.subject	image processing	en_US
dc.subject	headline extraction	en_US
dc.subject	histogram analysis	en_US
dc.title	Feature Extraction from Old Tamil Newspapers Using Histogram Minima	en_US
dc.type	Article	en_US
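
The abstract names the preprocessing steps but not their parameters or the specific adaptive binarization method. The sketch below is a minimal, dependency-free Python illustration under stated assumptions: a grayscale page is a list of rows of integers with 0 = black ink and 255 = white paper, the window sizes and `offset` are hypothetical, and a simple local-mean threshold stands in for whichever adaptive method the authors actually used. The helper names (`grey_dilate`, `median_filter`, `adaptive_binarize`, `preprocess`) are illustrative, not taken from the paper.

```python
from statistics import median


def _window(img, r, c, k):
    """Yield the values of the k x k neighbourhood of (r, c), clipped at the image borders."""
    half = k // 2
    for i in range(max(0, r - half), min(len(img), r + half + 1)):
        for j in range(max(0, c - half), min(len(img[0]), c + half + 1)):
            yield img[i][j]


def grey_dilate(img, k=3):
    """Grayscale dilation (max filter). With dark ink on light paper this
    removes dark specks smaller than the window, but it also thins strokes."""
    return [[max(_window(img, r, c, k)) for c in range(len(img[0]))]
            for r in range(len(img))]


def median_filter(img, k=3):
    """Replace each pixel by its neighbourhood median (salt-and-pepper noise removal)."""
    return [[int(median(_window(img, r, c, k))) for c in range(len(img[0]))]
            for r in range(len(img))]


def adaptive_binarize(img, k=15, offset=10):
    """Assumed local-mean threshold: 1 (ink) where a pixel is darker than its
    neighbourhood mean minus `offset`, otherwise 0 (background)."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            vals = list(_window(img, r, c, k))
            row.append(1 if img[r][c] < sum(vals) / len(vals) - offset else 0)
        out.append(row)
    return out


def preprocess(gray, k_dilate=3, k_median=3, k_bin=15, offset=10):
    """Dilation -> median filter -> adaptive binarization, in the order the abstract lists them."""
    return adaptive_binarize(median_filter(grey_dilate(gray, k_dilate), k_median), k_bin, offset)
```

On a real scan the dilation window has to stay smaller than the stroke width of the printed characters, because a max filter erases dark features narrower than its window; a 3 x 3 window is assumed here for that reason.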
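
The horizontal smoothing, vertical smoothing and logical AND steps described in the abstract closely resemble the classic run-length smoothing algorithm (RLSA). Assuming that is the intended technique, the following sketch works on a binary image (1 = ink, 0 = background, for example the output of a binarization step) and fills short runs of background between ink pixels. The thresholds `h_limit` and `v_limit` are hypothetical, and filling only runs bounded by ink on both sides is one common variant, not necessarily the authors' choice. All function names are illustrative.

```python
def smooth_runs(line, limit):
    """Set to 1 every run of 0s no longer than `limit` that lies between two 1s."""
    out = list(line)
    last_ink = None
    for i, v in enumerate(line):
        if v == 1:
            # Gap length between this ink pixel and the previous one.
            if last_ink is not None and i - last_ink - 1 <= limit:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def horizontal_smooth(img, limit):
    """Smooth each row: bridges small gaps between characters and words."""
    return [smooth_runs(row, limit) for row in img]


def vertical_smooth(img, limit):
    """Smooth each column: bridges small gaps between text lines."""
    cols = [smooth_runs(col, limit) for col in zip(*img)]
    return [list(row) for row in zip(*cols)]


def rlsa(img, h_limit, v_limit):
    """Pixel-wise logical AND of the horizontally and vertically smoothed images."""
    h = horizontal_smooth(img, h_limit)
    v = vertical_smooth(img, v_limit)
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(h, v)]
```

Gaps wider than `h_limit`, such as the gutter between newspaper columns, survive the AND, so each column of text tends to become a separate solid block. Classic RLSA descriptions often add a final horizontal smoothing with a small threshold after the AND; that step is omitted here because the abstract does not mention it.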
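
The horizontal projection step can be sketched as follows, assuming a binary image with 1 = ink. The row-wise black-pixel counts form the horizontal histogram; here a "minimum" is taken to mean any row whose count is at or below a `threshold` (0 by default), and the runs of rows between minima are returned as bands together with their heights. The same routine applied to column counts within one band gives the vertical projection used to separate images from surrounding text. The function names and the threshold rule are assumptions; the sketch does not attempt to label bands as headline, sub-headline, text or graphics, since the abstract does not give the classification rule.

```python
def horizontal_profile(img):
    """Black-pixel count for each row (horizontal projection)."""
    return [sum(row) for row in img]


def vertical_profile(img):
    """Black-pixel count for each column (vertical projection)."""
    return [sum(col) for col in zip(*img)]


def split_at_minima(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, for runs where the profile
    exceeds `threshold`; indices at or below it are treated as minima (cut points)."""
    segments, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def horizontal_regions(img, threshold=0):
    """Horizontal bands of an article as (top, bottom, height) triples."""
    return [(top, bottom, bottom - top)
            for top, bottom in split_at_minima(horizontal_profile(img), threshold)]


def vertical_regions(img, top, bottom, threshold=0):
    """Column segments inside one band, e.g. to separate a picture from adjacent text."""
    return split_at_minima(vertical_profile(img[top:bottom]), threshold)
```

Raising `threshold` above 0 lets rows that contain only a few noise pixels still count as minima.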

Files

Original bundle

Name:: 04.pdf
Size:: 164.85 KB
Format:: Adobe Portable Document Format

Collections

ICACT 2018