Doctoral Theses
Browsing Doctoral Theses by Supervisor "Chattopadhyay, Chiranjoy"
Now showing 1 - 2 of 2
Item: Content Based Analysis and Retrieval of Architectural Floor Plans (Indian Institute of Technology Jodhpur, 2018-08) Chattopadhyay, Chiranjoy

Architects often refer to existing layouts while designing new projects. This process provides insight into how similar architectural situations were solved in the past: by studying one or several previous reference projects, the architect tries to derive a solution for the current problem. The manual look-up of such similar projects through their layouts can be rather cumbersome. With many architectural projects archived in digital form, the research and development of fast automatic retrieval techniques for floor plans is the need of the hour. Floor plan analysis is a special case of document image understanding. It aims at extracting the semantic and structural details of an architectural layout by analysing the 2D image of the floor plan. Symbol spotting and retrieval in architectural layouts have been solved as individual problems in the past, and both images and sketches have been used as query modalities for the symbol spotting task. A careful analysis of existing work on architectural floor plans thus shows that retrieval in this domain is a challenging yet under-researched area with a variety of applications in today's digital scenario. Specific requirements of buyers during property rent or sale can be met using a composite, automated framework that takes into account the semantics as well as the content inside a floor plan for retrieval. Existing floor plan datasets such as Systems Evaluation SYnthetic Documents (SESYD) and the Computer Vision Center Floor Plan (CVC-FP) dataset are not fit for a retrieval task, as they do not offer enough samples, or enough variation among them, for floor plan retrieval.
Keeping in mind the current status of floor plan analysis research, this thesis proposes two publicly available benchmark datasets, the Repository Of BuildIng plaNs (ROBIN) and the Sketched Repository Of BuildIng plaNs (S-ROBIN), that will aid the community's research in floor plan analysis and retrieval. Techniques are proposed to analyse architectural floor plans and extract different features that aid content-based retrieval. In one such attempt, retrieval under the query-by-example paradigm is proposed, based on similar overall layout designs as well as the arrangement of the room decor present in the layouts; here the query is a floor plan image. A room layout segmentation and adjacent room detection algorithm is presented to represent layouts as an undirected graph, where the vertices represent the rooms and the edges represent the connectivity between them. Also, a novel graph spectral embedding feature is proposed to uniquely represent the layout of an architectural floor plan, which enables effective and efficient matching of room layouts. To match the semantic similarity between a pair of floor plans, a two-stage matching technique is proposed, and high retrieval accuracy is obtained. An interactive graphical user interface that helps users select, analyse, and retrieve similar floor plans is also proposed. In another attempt, a Convolutional Neural Network (CNN) framework that extracts both low- and high-level semantic features is proposed for floor plan retrieval. Experiments were conducted on existing public datasets as well as ROBIN. The key contributions of this approach are a novel deep learning framework to retrieve similar floor plan layouts from a repository, along with an analysis of the effect of individual convolutional layers on the floor plan retrieval task.
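As a rough illustration of the graph-based layout representation described above, the sketch below builds a room-adjacency graph's Laplacian and uses its smallest eigenvalues as a fixed-length layout signature. This is only a minimal stand-in for the graph spectral embedding feature proposed in the thesis; the adjacency matrices, the choice of eigenvalues, and the padding length `k` are illustrative assumptions, not the published method.

```python
import numpy as np

def spectral_embedding(adj, k=4):
    """Toy spectral signature for a room-adjacency graph (illustrative,
    not the thesis's exact feature). `adj` is a symmetric 0/1 matrix
    whose rows/columns are rooms; entry (i, j) = 1 means rooms i and j
    are adjacent. Returns the k smallest Laplacian eigenvalues,
    zero-padded so plans with different room counts stay comparable."""
    adj = np.asarray(adj, dtype=float)
    degree = np.diag(adj.sum(axis=1))   # degree matrix D
    laplacian = degree - adj            # graph Laplacian L = D - A
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    feature = np.zeros(k)
    n = min(k, len(eigvals))
    feature[:n] = eigvals[:n]
    return feature

# Two plans with the same connectivity pattern embed identically,
# however the rooms happen to be numbered:
plan_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hall linking two rooms
plan_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path graph, relabelled
```

Because Laplacian eigenvalues are invariant to vertex relabelling, two layouts whose rooms connect in the same pattern receive the same signature, which is what makes a spectral feature attractive for layout matching.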
Experiments have shown that deep learning frameworks work very well when the target images are themselves feature-rich; examples include natural images and textual documents. For images that are less feature-rich, however, off-the-shelf CNNs are less effective. Moreover, deep features mostly capture the global similarity of an image. It was envisaged that an effective combination of domain-specific features might give superior results. A scheme for combining the extracted features in a weighted manner, so that a particular feature can be given preference during retrieval, is therefore also proposed. A novel end-to-end framework is proposed for extracting high-level semantic features, such as area and room-wise decor arrangement, for the task of fine-grained retrieval, together with a technique for fusing the extracted high-level semantic features. Weighted feature fusion helps in setting preferences for particular characteristics of the floor plan during retrieval, satisfying specific user demands. To explore query modes other than query by example with a floor plan image, sketch-based retrieval is proposed. Sketch-based retrieval comes with its own set of challenges in terms of both representation and recognition; however, this query mode can better capture the user's intent. A composite network comprising a Cyclic Generative Adversarial Network (Cycle-GAN) and a CNN is proposed to bridge the gap between the sketch and image domains during retrieval.
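The weighted feature fusion idea above can be pictured as a late fusion of per-channel distances. Everything in this sketch is an assumption for illustration: the channel names (`layout`, `decor`), the normalised Euclidean distance, and the weighting scheme are placeholders, not the fusion method actually proposed in the thesis.

```python
import numpy as np

def fused_distance(query_feats, cand_feats, weights):
    """Weighted late fusion of per-channel feature distances (a sketch).
    Each named channel contributes a normalised Euclidean distance
    scaled by a user-chosen weight, so retrieval can prefer, say,
    overall layout over decor arrangement."""
    total = 0.0
    for name, w in weights.items():
        q = np.asarray(query_feats[name], dtype=float)
        c = np.asarray(cand_feats[name], dtype=float)
        dist = np.linalg.norm(q - c)
        scale = np.linalg.norm(q) + np.linalg.norm(c) + 1e-9  # keep channels comparable
        total += w * (dist / scale)
    return total
```

Ranking candidates by `fused_distance` with layout-heavy weights then favours plans whose layout channel matches the query even when the decor differs, which is the user-preference behaviour described above.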
An improved approach using autoencoders in conjunction with Cyclic Generative Adversarial Networks is proposed, which outperforms all other state-of-the-art techniques for sketch-based floor plan retrieval through an efficient domain mapping approach.

Item: Fine Grained Feature Representation Using Computer Vision Techniques for Understanding Indoor Space (Indian Institute of Technology Jodhpur, 2021-04) Chattopadhyay, Chiranjoy; Bhatnagar, Gaurav

Understanding indoor spaces from images, videos, or other visual information has become an essential task in the current scenario. With the rapid urbanization of cities and users' demand for quick answers, most real estate businesses have moved to the web. It has become essential to understand an indoor space from its visuals and provide users with an interpretation. These users may be house-hunters looking for desired features, or architects and interior designers trying to understand a given building. A person looking for a home or office space requires a quick solution for his or her property-based needs. Hence, it is essential to showcase properties online, with a proper display of their indoor and outdoor environments and a detailed interpretation. An interpretation provided in natural language is the most powerful way to communicate the information in an indoor space's visuals; another form of interpretation could be a line drawing or map for more complex indoor space visuals. While searching for a property, every potential customer looks for detailed images and descriptions covering the number of bedrooms and bathrooms, details of the kitchen, halls, and balconies, and the global relations among them. When a house listing lacks a proper description, users tend to skip buying or renting it. A real estate or rental website may have thousands of listings.
To reduce the manual effort of understanding these listings and giving each a correct interpretation, an automated system is required that can understand visual stimuli and generate an interpretation from them. For textual interpretation, just by looking at a stream of indoor scene images of different rooms it is impossible to determine the exact number of bedrooms, bathrooms, etc. in the house, since these images could be randomly ordered. It is possible to have extra information about one part of the property and miss out on details of another, and it is also difficult to identify the global relationship between the rooms and their possible arrangement. Hence, instead of directly taking indoor scene images as input, the next best input is a representation that contains all the required information. This representation could be information extracted from these images as keywords, or a more precise line drawing or map that connects all the information. For an indoor space, the floor plan image is the best such representation: it captures the intricate details and is the heart of all construction drawings. In this thesis, an automated model is proposed that understands indoor space images and generates an interpretation in the form of a textual description from a floor plan image. Analysing the floor plan of a house or any other building is a research area covered under graphics analysis, within the broad scope of document image analysis. Understanding every detail of the graphics involved in floor plans and interpreting them in a form readable by the common user is not a trivial task. A floor plan image can be composed of graphics such as symbols, various forms of lines (thin, medium, thick), and text, and each may require different techniques for its understanding and analysis. Hence, architectural floor plan image analysis aims to extract the structural details and understand the semantics involved.
These extracted details should be converted into a textual modality to obtain a fair interpretation of the floor plan. Developing any machine learning algorithm requires datasets for each specific task. Existing floor plan datasets in the public domain were constructed primarily for symbol spotting, retrieval, and structural analysis; examples include Systems Evaluation SYnthetic Documents (SESYD), Computer Vision Center Floor Plan (CVC-FP), and the Repository Of BuildIng plaNs (ROBIN). These datasets were necessary, but not sufficient, for a holistic understanding and interpretation of floor plans in other modalities. Another dataset, the Building plan Repository for Image Description Generation and Evaluation (BRIDGE), is proposed to fill those gaps. It is a large-scale floor plan dataset containing textual annotations corresponding to each image, targeted at multiple tasks such as symbol spotting, caption generation, and paragraph generation, and sufficient for a complete understanding and interpretation of a floor plan image. To build machine learning based models for understanding floor plans, features such as Bag of Decors (BoD) and Local Oriented Feature Descriptor (LOFD) were proposed, which capture the features of the rooms in a floor plan in the form of a sparse histogram. These features, together with the decor symbol spotting method, generate attributes for a floor plan that are later used to create a grammar-based textual description. With the advent of more advanced and accurate artificial intelligence models for image understanding, it was the need of the hour to build such models for floor plan image understanding as well. Hence, in another attempt, experiments were conducted to generate textual descriptions from floor plan images using image cues alone, extracting features with a CNN-based model and generating paragraphs with a hierarchical recurrent neural network.
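The Bag of Decors feature mentioned above can be pictured as a sparse count histogram over a decor vocabulary. The vocabulary and input format below are invented for illustration; the actual BoD descriptor is defined in the thesis, not here.

```python
from collections import Counter

# Hypothetical decor vocabulary; the thesis's actual symbol set will differ.
DECOR_VOCAB = ["bed", "sofa", "sink", "table", "bathtub", "door"]

def bag_of_decors(spotted_symbols):
    """Sparse histogram of decor symbols, in the spirit of Bag of Decors
    (BoD). `spotted_symbols` is a flat list of labels produced by a
    symbol-spotting step; labels outside the vocabulary are ignored."""
    counts = Counter(s for s in spotted_symbols if s in DECOR_VOCAB)
    return [counts[d] for d in DECOR_VOCAB]
```

Such a histogram can then feed a retrieval distance, or serve as attributes for a grammar-based description (a nonzero bed count, for instance, suggesting a bedroom).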
To make these paragraphs more accurate and robust, another model was proposed that inserts a text layer between the image features and the textual paragraphs, thus training the model on image and word cues together. When generating an interpretation of an indoor space, keeping its applications in real estate and architectural solutions in mind, a user might not have the floor plan of a building or property, for instance for old structures, properties modified in the past, or rented property. In such cases, it might be necessary to use indoor space images directly, rather than a representation, to generate an interpretation. Hence, in another part of the work, we propose a system that creates a floor plan as a pictorial interpretation of the house, taking a stream of indoor space images as input. The system uses a conventional mobile phone's camera to capture images and data from the phone's IMU sensors to track the device's motion. From the captured RGB images and the IMU data, the system generates a floor plan of the indoor space. The proposed approach has other potential applications, such as robot navigation and AR/VR. The generated description can also be used for door-to-door indoor navigation by a visually impaired person or a robot. Using such a model, a self-teaching system for engineering students can be developed that automatically generates interpretations of engineering drawings. Hence, the proposed work in this thesis opens new avenues of research in engineering and architectural drawing, connecting the two modalities of images and text.
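The IMU-assisted capture described above ultimately amounts to chaining per-step motion estimates into a 2D trajectory that a reconstruction step can turn into walls. The sketch below is a toy dead-reckoning loop under strong assumptions: headings and step distances are taken as already estimated from the IMU, with no drift correction, which is far simpler than the proposed system.

```python
import math

def camera_trajectory(steps):
    """Toy dead reckoning: `steps` is a list of (heading_radians,
    distance_metres) motion estimates, assumed already derived from
    IMU data. Chains them into 2D capture positions from the origin."""
    x, y = 0.0, 0.0
    positions = [(x, y)]
    for heading, dist in steps:
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
        positions.append((x, y))
    return positions

# Walking a 1 m square returns (up to floating-point error) to the start:
square = [(0.0, 1.0), (math.pi / 2, 1.0), (math.pi, 1.0), (3 * math.pi / 2, 1.0)]
```

A real system would fuse such estimates with visual cues from the RGB frames to suppress IMU drift before fitting room boundaries.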