Feature Extraction from Old Tamil Newspapers Using Histogram Minima

dc.contributor.authorKasthuri, S.
dc.contributor.authorDarsha, M.
dc.contributor.authorRanathunga, L.
dc.date.accessioned2018-08-09T07:39:48Z
dc.date.available2018-08-09T07:39:48Z
dc.date.issued2018
dc.description.abstractArchaeological records which provide information about the history of human cultures and past events. Newspapers can be considered as one of the main sources of gathering archaeological data. It can be seen that there exist only a few numbers of systems for the processing of old Tamil newspaper articles. An automated image processing system proposed as a suitable solution to the way of efficient and flexible searching approach, which can be used for old Tamil newspapers. In this paper is presented image processing technique to extract the features such as headlines and sub-headlines from old Tamil newspaper scanned images. Historical newspapers become damaged over time. The images of these newspapers become difficult to read the contents. The quality of the image improved by preprocessing techniques such as grayscale dilation, median filtering, and adaptive binarization. It helps to easily extract needed information on the image. Segment the article and identify the heading of the article will help to improve data manipulation. Feature extraction from old Tamil newspaper images followed these step processes; Horizontal smoothing is necessary to distinguish the paragraphs and empty space between each column; Vertical smoothing is implemented to distinguish between each paragraph and headlines; Logical AND operation combines the outcome of horizontal smoothing and vertical smoothing using AND operation; Height measurement of each block is followed by horizontal projection, that involves scanning of pixels through horizontal arrays to measure the black pixel density against index of each row by using horizontal histogram minima. This step identified horizontals breaking points of individual regions within an article. The four major horizontal regions are headlines, sub-headline, text, and graphics. The irregular block may contain images within texts. Vertical projection can be carried out to distinguish the images among text. In the evaluation process used fifty articles which have different format of paragraph arrangements and also include images. First, identified and got the count of regions manually. After that compared the result from identified regions and got the measurements. The region was identified with articles in the efficiency of 80.09%, headline extraction accuracy was 81.616%.en_US
dc.identifier.citationKasthuri, S., Darsha, M. and Ranathunga, L. (2018). Feature Extraction from Old Tamil Newspapers Using Histogram Minima. 3rd International Conference on Advances in Computing and Technology (ICACT ‒ 2018), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka. p4.en_US
dc.identifier.urihttp://repository.kln.ac.lk/handle/123456789/18972
dc.language.isoenen_US
dc.publisher3rd International Conference on Advances in Computing and Technology (ICACT ‒ 2018), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka.en_US
dc.subjectarchaeological recordsen_US
dc.subjectimage processingen_US
dc.subjectheadline extractionen_US
dc.subjecthistogram analysisen_US
dc.titleFeature Extraction from Old Tamil Newspapers Using Histogram Minimaen_US
dc.typeArticleen_US

Files

Original bundle

Now showing 1 - 1 of 1
Thumbnail Image
Name:
04.pdf
Size:
164.85 KB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description:

Collections