Some Methods for Annotation Localization and Writer Identification for Processing Annotated Documents

dc.contributor.advisorHarit, Gaurav
dc.creator.researcherPandey, Shilpa
dc.date.accessioned2023-12-06T10:42:05Z
dc.date.available2023-12-06T10:42:05Z
dc.date.awarded2019-05
dc.date.issued2019-05
dc.date.registered2012
dc.description.abstractDocuments containing mixed types of text content (printed and handwritten) are proliferating in business and academic environments. They frequently result from annotating printed documents such as bills, administrative forms, birth certificates, and letters. A document can be annotated in many ways. Annotations can be handwritten text, underlines, cuts, marks, special symbols of irregular shapes, and handmade drawings. The annotated text can be multi-oriented and multi-script. Several methods for extracting annotations are outlined in the literature. Most of these systems extract annotations in controlled scenarios where the layout is predictable, such as bank checks, postal addresses, drafts, and forms. Extracting handwritten annotations with non-predictable layouts in a real environment remains a difficult task because annotations can be complex: multi-oriented handwritten text and marks may overlap with printed text. This thesis aims to develop methods for localizing complex annotations in non-predictable layouts and for identifying the writer of the handwritten words. We develop methods for localizing annotations and categorizing them as textual or symbolic. We further sub-categorize symbolic annotations as underlines and encirclements, and textual annotations as marginal text and inline text. We apply our methods to localize annotations written on documents such as conference papers, articles, books, and office documents. We use statistical spectral partitioning to segment annotations from printed text. In this approach, we work with a reduced feature set to efficiently extract the annotations on a cluttered background. We develop a new feature called Envelope Straightness to enhance the feature set. This feature improves performance over state-of-the-art features. We then investigate the use of two top-down visual saliency models for categorizing annotations.
The first model uses supervised learning in the form of conditional random fields with a sparse encoding of feature vectors. The second model uses a weakly supervised learning formulation for discriminant saliency. The experimental results corroborate our hypothesis that human attention is directed towards annotated regions in an image and that top-down saliency models can therefore be learned to give high saliency values for the annotated regions. Along with supervision, these models take advantage of the structure and context of the annotations. For scenarios where multiple writers annotate the same page, we develop a method to identify the writers of the handwritten words. A sliding window technique is used to extract allographic features for sub-word portions. We formulate a supervised framework and exploit the discriminative properties of the features that belong to the same cluster. We propose a new technique for separating the ascenders and descenders of handwritten words from their core region. We use the structural properties of ascenders and descenders to identify the writers of the handwritten words. The work also contributes towards dataset creation and ground truth generation for the various problems addressed in this thesis.en_US
dc.description.notecol. ill.; including bibliographyen_US
dc.description.statementofresponsibilityby Shilpa Pandeyen_US
dc.format.accompanyingmaterialCDen_US
dc.format.extentxv, 158p.en_US
dc.identifier.accessionTP00041
dc.identifier.citationShilpa Pandey. (2019). Some Methods for Annotation Localization and Writer Identification for Processing Annotated Documents (Doctor's thesis). Indian Institute of Technology Jodhpur, Jodhpur.en_US
dc.identifier.urihttps://ir.iitj.ac.in/handle/123456789/51
dc.language.isoen
dc.publisherIndian Institute of Technology Jodhpur
dc.publisher.departmentComputer Science and Engineeringen_US
dc.publisher.placeJodhpur
dc.rights.holderIIT Jodhpur
dc.rights.licenseCC-BY-NC-SA
dc.subject.ddcMethodsen_US
dc.subject.ddcAnnotationen_US
dc.subject.ddcLocalizationen_US
dc.subject.ddcWriter Identificationen_US
dc.subject.ddcProcessing Annotateden_US
dc.subject.ddcDocumentsen_US
dc.titleSome Methods for Annotation Localization and Writer Identification for Processing Annotated Documentsen_US
dc.typeThesis
Files
Original bundle (showing 1 - 5 of 14)
01_title.pdf (54.65 KB, Adobe Portable Document Format)
02_prelim pages.pdf (317.62 KB, Adobe Portable Document Format)
03_table of contents.pdf (67.79 KB, Adobe Portable Document Format)
04_abstract.pdf (32.72 KB, Adobe Portable Document Format)
05_chapter 1.pdf (4.55 MB, Adobe Portable Document Format)