Doctoral Theses

Permanent URI for this collection

https://ir.iitj.ac.in/handle/123456789/10

Fine Grained Feature Representation Using Computer Vision Techniques for Understanding Indoor Space
(Indian Institute of Technology Jodhpur, 2021-04) Chattopadhyay, Chiranjoy; Bhatnagar, Gaurav
Understanding indoor spaces from images, videos, or other visual information has become an essential task in the current scenario. With the rapid urbanization of cities and the common user’s quick requirements, most real estate businesses have become web-based. It has become essential to understand an indoor space from its visuals and provide users with an interpretation. These users may be people looking for a house with their desired features, or architects or interior designers trying to understand a given building. A person looking for a home or office space requires a quick solution for his or her property-based needs with the desired features. Hence, it is essential to showcase properties online, with a proper display of their indoor and outdoor environment and a detailed interpretation. An interpretation provided in natural language is the most powerful way to communicate the information in the indoor space’s visuals. Another form of interpretation could be a line drawing or map for more complex indoor space visuals. While searching for a property, every potential customer looks for detailed images and a description of the number of bedrooms and bathrooms, details of the kitchen, halls, and balconies, and the global relations among them. When a house listing lacks a proper description, users tend to skip buying or renting it. A real estate website or any rental website may have thousands of listings. To reduce the manual effort of understanding these listings and to give a correct interpretation of them, it is necessary to devise an automated system that can simultaneously understand visual stimuli and generate an interpretation from them. In the case of textual interpretation, just by looking at the stream of indoor scene images for different rooms, it is impossible to find the exact number of bedrooms, bathrooms, etc. available in the house, since these images could be randomly ordered.
It is possible to have extra information about one part of the property and to miss out on details about another. It is also difficult to identify the global relationship between the rooms and their possible arrangement. Hence, instead of directly taking indoor scene images as input, a representation that contains all the required information is the next best input. This representation could be information extracted from these images as keywords, or a more precise line drawing or map that connects all the information. For indoor space images, the floor plan image is the best representation: it captures the intricate details and is the heart of all the construction drawings. In this thesis, an automated model is proposed which understands indoor space images and generates an interpretation in the form of a textual description from a floor plan image. Analyzing the floor plan of a house or any other building is a research area covered under graphics analysis within the broad scope of document image analysis. Understanding every detail of the graphics involved in floor plans and interpreting them in a form readable by the common user is not a trivial task. A floor plan image can be composed of graphics such as symbols, various forms of lines (thin, medium, thick), and text, and each of them may require different techniques for its understanding and analysis. Hence, architectural floor plan image analysis aims to extract the structural details and understand the semantics involved. These extracted details should be converted into textual modalities to have a fair interpretation of the floor plan. Developing any machine learning algorithm requires datasets for each specific task. There are existing floor plan datasets in the public domain, which were constructed primarily for the tasks of symbol spotting, retrieval, and structural analysis in floor plans.
Examples include Systems Evaluation SYnthetic Documents (SESYD), Computer Vision Center- Floor Plan (CVC-FP), and Repository Of BuildIng plaNs (ROBIN). These datasets were necessary but not sufficient for the holistic understanding and interpretation of floor plans in other modalities. Another dataset, namely Building plan Repository for Image Description Generation and Evaluation (BRIDGE), is proposed to fill those gaps. It is a large-scale floor plan dataset containing textual annotations corresponding to each image. This dataset is targeted at multiple tasks such as symbol spotting, caption generation, and paragraph generation, and is sufficient for a complete understanding and interpretation of the floor plan image. To build machine learning-based models for understanding floor plans, features such as Bag of Decors (BoD) and Local Oriented Feature Descriptor (LOFD) were proposed, which capture the features of rooms in a floor plan in the form of a sparse histogram. These features and the decor symbol spotting method generated attributes for a floor plan that were later used to create a grammar-based textual description. With the advent of advanced and more accurate artificial intelligence models for image understanding, it was the need of the hour to build such models for floor plan image understanding as well. Hence, in another attempt, experiments were conducted to generate textual descriptions from floor plan images using just image cues, by extracting features with a CNN-based model and using a hierarchical recurrent neural network-based model to generate paragraphs. To make these paragraphs more accurate and robust, another model was proposed which used a text layer in between the image features and the textual paragraphs, hence taking the image and word cues together to train a model.
When generating an interpretation of an indoor space, keeping in mind its applications for real estate businesses and architectural solutions, there might be a case where a user does not have the floor plan of the building or property, because the structure is old, because modifications were made in the past, or because the property is rented. In such cases, it might be necessary to use indoor space images directly instead of using a representation for generating an interpretation. Hence, in another part of the work, we proposed a system that could create a floor plan as a pictorial interpretation of the house, taking a stream of indoor space images as input. The system uses a conventional mobile phone’s camera to capture images and data from the phone’s IMU sensors to track the mobile device’s motion. The system uses the captured RGB images and this sensor data to generate a floor plan of the indoor space. The proposed approach has other potential applications, such as robot navigation and AR/VR applications. The generated description can also be used for door-to-door indoor navigation for a visually impaired person or a robot. Using such a model, a self-teaching system for engineering students can be developed, which could automatically generate interpretations for engineering drawings. Hence the work proposed in this thesis opens new avenues of research in engineering and architectural drawing, connecting the two modalities of images and text.
Novel and Robust Methodologies for Image Security.
(Indian Institute of Technology Jodhpur, 2019-06) Bhatnagar, Gaurav
In the digital era, the substantial proliferation of open-access networks and communication technologies has influenced the way information is gathered and processed. This has led to a significant increase in the sharing of multimedia data such as images, audio and video, which has become a widespread practice among end users. However, this convenience comes with some security and privacy concerns. Issues of illegal copying, distribution, duplication, malicious modification and forgery have increased due to the ease of access to sophisticated software. Therefore, protection of digital media is one of the major challenges to information security in today's scenario. This motivates us to develop standard and robust solutions which enhance the overall security of digital media by preventing these issues. One of the technical solutions is to make it law informants. This can be achieved by some practical solutions like image hashing, encryption and digital watermarking. Image hashing has been widely investigated in an attempt to solve the problems of image content authentication and content-based image retrieval. Moreover, perceptual hashing is advantageous in database search problems. The hashing technique helps to identify the data and examine its integrity. Encryption is an efficient technique which transforms the original data into cipher data, which can be decrypted at a later stage to reproduce the image data. This technique is effectively used to secure the confidentiality of data during transmission and to protect the content during storage. Furthermore, digital watermarking is yet another efficient technique, addressing the ownership issue for copyright protection. This technique helps to identify the original owner of the media and ensure legitimate trustworthiness.
The goal of this thesis is to investigate and analyse various aspects of image security and explore techniques for image security while ensuring the integrity, confidentiality and trustworthiness of the data. The flourishing image security technologies enable an information shift to an environment characterized by constraints such as robustness, imperceptibility etc. The fundamental attribute of image security determines the trade-off between robustness and perceptual fidelity, and between robustness and discrimination. Our research is categorized mainly into three parts. The first part aims to develop a comprehensive algorithmic framework for the identification and authentication of digital images. This framework provides a digital signature based on appropriate hash functions to estimate image similarity accurately. For this purpose, a secure perceptual hash function has been designed for content authentication, for which a new robust reference image hashing system has been proposed. The former includes saliency-based visual feature detection, whereas the latter estimates structural features based on a reference image. In addition, a chaos-based robust and secure image hashing technique has been developed. The second part of the research aims to protect the confidentiality of image data by securing its content using encryption. This framework provides a methodology to secure medical image data during transmission and storage. This is achieved by designing a secure image encryption technique for content protection, for which a biometric-inspired image encryption scheme has been proposed for medical images. This scheme presents biometric-based key generation to ensure security. The third part of our research aims to tackle the ownership issue of multimedia data with a number of copyright protection schemes based on digital watermarking techniques.
The research problem is addressed by designing a robust watermarking scheme for copyright protection, for which a robust watermarking system in the integer DCT domain has been proposed. This scheme employs a DSR-based phenomenon to enhance the performance of the system. In addition, a simple watermarking algorithm based on the lifting wavelet transform has been developed. This scheme includes random number generation to signify the efficiency of the proposed scheme. Extensive experimental and comparative analyses have been conducted to validate the efficiency and performance of the proposed solutions for image security. Several conclusions can be drawn from the thesis. The research shows that data integrity, confidentiality and copyright protection are considered the important values of image hashing, encryption and watermarking. Therefore, a combined framework of these techniques plays a vital role in image security. Moreover, a systematic development of the technologies can ensure trustworthiness in a goal-oriented approach, such as the proposed framework for authentication, while complying with relevant legislation. Finally, a number of insights used in developing the framework are provided. Also, future research directions are discussed.

Browsing Doctoral Theses by Supervisor "Bhatnagar, Gaurav"
