Recent Submissions
Role of Scene text in Image Semantics
(Indian Institute of Technology Jodhpur, 2022-08) Harit, Gaurav
Since the advent of the printing press, text has slowly made inroads into the world we have
built for ourselves. The symbolic nature of text allows it to explain ideas more succinctly. Thus scene
text often occurs naturally in images (e.g., street or storefront images). Further, it is
also embedded into images to drive home clear takeaway points (e.g., printed posters, advertisement
images). In both cases, though, it brings in crucial contextual information that aids in interpreting
such images. However, despite the pervasiveness of scene text in our everyday images [Dey et al., 2021]
and the rich information it carries, early works in visual understanding tasks like Image
Classification, Captioning, and Visual Question Answering (VQA) [Antol et al., 2015] did not
leverage the scene text content of images. This can be attributed to the challenges of detecting
and recognizing scene text in the wild. However, maturing research in scene text recognition has
improved the ability to read text in natural images, thus making the scene text content
more accessible. This easy accessibility of scene text content, coupled with the recent advances in
multimodal architectures [Hu et al., 2020], provides a unique opportunity to incorporate scene text
into visual understanding tasks.
As our first point of investigation [Dey et al., 2021], we propose to jointly use scene text
and visual channels for robust semantic interpretation of images. We not only extract and encode
visual and scene text cues but also model their interplay to generate a contextual encoding with rich
semantics. The contextual encoding thus generated is applied to retrieval and classification tasks
on multimedia images with scene text content, to demonstrate its effectiveness. In the retrieval
framework, we augment the contextual semantic representation with scene text cues to mitigate
vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant
or erroneous scene text recognition, we apply query-based attention to the text channel. We show
that our multi-channel approach, involving contextual semantics and scene text, improves upon
the absolute accuracy of the current state-of-the-art methods on Advertisement Images Dataset by
8.9% in the relevant statement retrieval task and by 5% in the topic classification task. Our results
confirm our initial hypothesis that scene text plays an essential role in the semantic understanding
of images. These results encourage us to extend our framework to more challenging tasks, like
Text-VQA [Singh et al., 2019a], that explicitly require us to read and reason with the scene text of
an image. However, the scene text words come from a long-tailed distribution, giving such tasks
zero-shot characteristics. We hypothesize that the zero-shot nature of these tasks can benefit from
leveraging external knowledge corresponding to the scene text.
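The query-based attention over the text channel mentioned above can be illustrated with a minimal sketch. It assumes scaled dot-product attention between a query vector and recognized-token embeddings; the function names, the toy vectors, and the scaling choice are illustrative assumptions, not the architecture used in the thesis.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def query_attention(query, token_embeddings):
    """Weight each scene text token by its similarity to the query.

    Returns the attention weights and the weighted sum of token embeddings.
    Tokens that are irrelevant to the query receive small weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale
              for tok in token_embeddings]
    weights = softmax(scores)
    pooled = [sum(w * tok[i] for w, tok in zip(weights, token_embeddings))
              for i in range(len(query))]
    return weights, pooled

# Hypothetical 2-d embeddings: one token aligned with the query,
# one unrelated, one opposed (e.g. a misrecognized word).
query = [1.0, 0.0]
tokens = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
weights, pooled = query_attention(query, tokens)
```

In this toy setup the aligned token dominates the pooled representation, which is the intended effect of down-weighting irrelevant or erroneous recognitions.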
The open-ended question answering task of Text-VQA often requires reading and reasoning
about rarely seen or completely unseen scene text content of an image. We address this zero-shot
nature of the task by proposing the generalized use of external knowledge to augment our
understanding of the scene text. We design a framework [Dey et al., 2022] to extract, validate, and
reason with knowledge using a standard multimodal transformer for vision language understanding
tasks. Through empirical evidence and qualitative results, we demonstrate how external knowledge
can highlight instance-only cues and thus help deal with training data bias, improve answer entity
type correctness, and detect multi-word named entities. We generate results comparable to the
state-of-the-art on three publicly available datasets under the constraints of similar upstream
Optical Character Recognition (OCR) systems and training data. Through our experiments, we
observe that this external knowledge not only provides invaluable information about unseen scene
text elements but also augments the understanding of the text in general with detailed verbose
descriptions. Our knowledge-enabled model is robust to novel text, predicts answers with improved
entity type correctness, and can even recognize multi-word entities. However, the knowledge pipeline is susceptible to erroneous OCR tokens, which can lead to false positives or complete
misses. This also explains how our performance on the datasets is correlated with the particular
OCR systems used.
Our investigation highlights the challenges and benefits of incorporating scene text into
image understanding tasks. We validate our various hypotheses through empirical evidence across
five different publicly available standard datasets. We conclude with a discussion on the implicit
bias in these datasets for scene text, and propose data augmentation and a novel training scheme
to deal with it.
Education for the Embodied Human: An Enquiry into Human Nature and Education
(Indian Institute of Technology Jodhpur, 2023-02) Hari Narayanan V
The present work is a philosophical inquiry into the interconnection between theories of human nature and education. It is often argued that current mainstream education rests on a dualistic assumption that mind and body, and human and world, are distinct and, therefore, does not value embodied forms of knowing. This dualistic notion creates a rift between learners and their environments, substantially impacting their learning. It also results in an exam-oriented and achievement-based education system that is not conducive to developing children's critical thinking and exploratory mindset. One major factor that gives rise to this condition is an inadequate understanding of human nature. Any educational activity presupposes some conception of human nature, because a particular philosophy of human nature shapes and influences a particular philosophy of education.
Therefore, understanding human nature is of paramount importance for designing or transforming education. An examination of several theories of human nature reveals that they are mostly the result of philosophical speculation and have underlying dualistic assumptions. However, recent empirical studies in cognitive science suggest that human beings are fundamentally embodied and embedded in the world. Being embodied means that the mind is not separate from the body and the world but is dynamically coupled with both. If human beings are fundamentally embodied, then education should also acknowledge this fact in its practices. But even though there is increasing evidence in support of embodiment, we do not sufficiently appreciate it in our day-to-day lives or in educational discourse. This is the result of various psychological, neurological, and social factors.
To address this problem, a two-way approach is presented, termed the outer and inner curriculum. The outer curriculum employs an "outside-in" approach, in which pedagogies are designed and imparted as per embodied principles. However, changing external pedagogies alone will not help much without realizing our own embodied nature. For this purpose, an inner curriculum is required to make changes "inside out". The inner curriculum fundamentally helps us to realize our own embodied nature, which has significant salutary effects. Therefore, the inner curriculum is seen as complementary to the outer curriculum. The core content of the inner curriculum is mindfulness meditative practices, which help us become aware of our thoughts and sensations and, in turn, embrace our own embodied nature and inextricable relationship with the world. This kind of embodied approach to education, with a focus on both outer and inner curricula, helps to create a more democratic, collaborative, and holistic learning environment, thereby fulfilling the vision of educational thinkers such as John Dewey, Paulo Freire, and Jiddu Krishnamurti.
From primary microcephaly-associated CPAP as centriole size/number regulator to microtubules-targeting novel chemotype
(Indian Institute of Technology Jodhpur, 2023-07) Singh, Priyanka
Centrioles are cylindrical microtubule-based structures embedded in a proteinaceous matrix called pericentriolar material (PCM). Together, this structure is referred to as the centrosome. In animal cells, centrosomes play a crucial role in cellular functions such as cell division and motility. Abnormalities in centrosome-associated proteins have been linked to human diseases, including cancer and neurodevelopmental disorders. Mutations in the core centriole protein, Centrosomal P4.1-associated protein (CPAP), have been associated with primary microcephaly (MCPH6), a disorder characterized by reduced brain size and cognitive disability. This study focuses on understanding the impact of CPAP mutations on centrosome and spindle organization in primary microcephaly. Specifically, it investigates the effects of two MCPH-associated mutations, E1235V and D1196N, in the CPAP G-box domain. The study reveals that E1235V causes increased centriole length, while D1196N leads to an increase in centriole number. Interestingly, E1235V does not localize at the centriole, whereas D1196N maintains its centriolar localization despite reduced interaction with the upstream centriole protein, STIL, similar to E1235V. This suggests the involvement of an alternate route involving the proximal parent centriole protein, CEP152. Moreover, we demonstrate that centriole abnormalities result in multipolar spindle formation and decreased cell viability. These findings shed light on the importance of regions within CPAP outside the direct microtubule-interacting domains in influencing centriole organization, providing valuable insights into the molecular mechanisms underlying primary microcephaly.
The second part of the thesis work explores the development of novel chemical scaffolds for chemotherapeutics. The cell division machinery, comprising centrosomes and microtubules, is crucially regulated during the cell cycle. Dysregulation of these structures can lead to human diseases, including cancer. Paclitaxel, a microtubule-targeting anticancer drug, has clinical approval but faces challenges due to the development of resistance in many cancer types. Hence, there is a need to identify new chemical scaffolds for designing effective anticancer drugs. In this study, a novel S-aryl dithiocarbamate chemical scaffold is identified as a potent anticancer compound with promising pharmacophore properties. The lead compound exhibits an IC50 of <0.5 µM in lung and cervical cancer cells. It stabilizes microtubules, resulting in p53-p21-dependent cell cycle arrest in the G2/M stage and cellular apoptosis. Interestingly, the lead compound shows comparable docking parameters to paclitaxel in the taxol-binding pocket of β-tubulin. These findings present a promising alternative scaffold that can be further modified to enhance efficacy and potency as an anticancer drug.
Study of Organic and Quantum Dots-Based Resistive Memory and Synaptic Devices
(Indian Institute of Technology Jodhpur, 2023-09) Sahu, Satyajit
The data storage requirement in the digital world is increasing day by day with the
advancement of the Internet of Things (IoT). The current generation of silicon-based memory
technology is facing serious problems in terms of performance, data storage density, power
consumption, data processing time, cost-effectiveness, and so on. Conventional flash memory
is one type of non-volatile memory that relies on tunneling through the oxide layer, consumes
high power, and has a long response time. The hard disk drive (HDD) is a widely used memory
in the current era, but its main disadvantage is the need to find the particular magnetic domain where
the data is saved, which raises the memory response time to a few milliseconds. The limitations
of conventional memories can be tackled by next-generation memories. Ferroelectric random
access memory (FeRAM), phase change memory (PCM), magnetic random access memory
(MRAM) and resistive random access memory (RRAM) are considered next-generation
memories and have the potential to solve the problem. Among these memories, non-volatile
RRAM is an option that provides high-density and low-power data storage capabilities. The
information is stored in terms of resistance, where the high resistance state (HRS) is bit 0 and
the low resistance state (LRS) is bit 1. Conventional computers are based on the von Neumann architecture,
where the processor units are separated from the memory unit and connected via a data bus. This
causes a delay in response time and limits scaling beyond a certain size, a constraint called the von Neumann
bottleneck. Resistive memory has the capability to overcome the von Neumann
bottleneck, because it can simultaneously process and store data, similar to the
biological brain.
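The resistance-based encoding described above (HRS as bit 0, LRS as bit 1) can be sketched as a simple threshold read. The resistance values, the geometric-mean threshold, and the function names below are hypothetical choices for illustration and are not taken from the thesis.

```python
import math

def geometric_threshold(r_lrs_ohm, r_hrs_ohm):
    """Midpoint between the two states on a log scale (an assumed read threshold)."""
    return math.sqrt(r_lrs_ohm * r_hrs_ohm)

def read_bit(resistance_ohm, threshold_ohm):
    """Low resistance state (LRS) reads as 1, high resistance state (HRS) as 0."""
    return 1 if resistance_ohm < threshold_ohm else 0

# Hypothetical device with nominal LRS = 1 kOhm and HRS = 100 MOhm.
R_LRS = 1e3
R_HRS = 1e8
threshold = geometric_threshold(R_LRS, R_HRS)

# Measured cell resistances with some spread around the nominal states.
cells = [2e3, 5e7, 8e2, 1e8]
bits = [read_bit(r, threshold) for r in cells]
```

A wide gap between the two resistance states leaves a large margin around the threshold, which is why a high on-off ratio makes the read tolerant to device-to-device variation.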
The small molecule-based RRAM device is a point of interest because of its capability to be
used in high-density data storage devices. A small organic molecule, 5-mercapto-1-methyl
tetrazole (MMT), has been used with a poly(4-vinyl pyridine) (PVP) polymer matrix as the
active layer of an RRAM device. MMT at different weight ratios in PVP was
studied for RRAM application, which revealed invariant RRAM properties. The maximum
on-off current ratio for all the devices is 10^5, suggesting that the MMT molecule does not show
any change in its characteristic properties when surrounded by an insulating material. When
the device was fabricated without the polymer matrix, the surface morphology of the device
completely changed as it was filled with large holes. These holes provide short-circuited
pathways for the current by forming the direct metal contact between the top and bottom
electrodes.
Size miniaturization of electronic devices can be achieved using organic small molecules as
well as inorganic nanoparticle quantum dots (QDs). CdS QDs were synthesized and their RRAM
properties studied. A device with an Al/CdS+PVP/ITO metal-insulator-metal (MIM) structure was fabricated,
which shows extremely good switching properties. A data retention capability of 60000
seconds and 300 endurance cycles were observed. The charge-trapping mechanism is associated
with the resistive switching (RS) property.
With the development of artificial intelligence and ultra-high-speed computing, a solution
to the von Neumann bottleneck is needed. In this regard, resistive memory can provide a solution,
as RRAM has the capability to act as a biological synapse that can store, process and transfer
data. This attracts researchers to study resistive memory devices. Here, a resistive memory device
based on a composite of the small organic molecule trimesic acid (TMA) and PVP was fabricated. It
shows excellent resistive switching with a high on-off ratio, excellent stability and data
storage capability. Pulse transient measurements on the device demonstrated the capability
of neuromorphic computation. The gradual set and reset process and change of conductance
with an applied pulse confirmed the neuromorphic application. Paired pulse facilitation
shows that the device can behave like the human brain. The redox active molecule and its
change in conformation are the reasons for the switching behaviour of the device that led to
the neuromorphic application of the device.
For an important application like RRAM, it is crucial to understand the mechanism at the
nanoscale and control the resistive switching (RS) by various means. Different models have
been proposed to explain the RS behaviour of the material. The first is the electrode effect on the
switching, which includes contact type and charge trapping/de-trapping near the electrode-material
interface. The other mechanisms are conducting filament formation, electrochemical
metallization, ionic diffusion, and oxidation-reduction of the materials. There are many
disagreements on the proposed models of RS, and resolving them requires different
experimental techniques. Scanning tunnelling microscopy (STM) is one of the best tools for understanding surface properties,
as well as studying the local density of states (LDOS) of the material. Using
STM to understand RS in materials at the nanoscale
would therefore be very helpful. The RS properties and capabilities of neuromorphic computing
of a single AgInS2 quantum dot have been studied in this chapter with the help of STM and scanning tunnelling spectroscopy
(STS). The bandgap of the material and its temperature
dependence have been studied, suggesting nonlinear variation below the Debye temperature
and linear variation above it. The STS shows the change of
conducting states after applying localized pulses. The devices made from the quantum dots
replicate these properties as well. The neuromorphic application of the device was tested by
using the pulse transient measurement that mimics the learning and forgetting of information
through the gradual set and reset process. The localized ionic transport is involved in the RS
mechanism.
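The gradual set and reset behaviour under repeated pulses, described above as mimicking learning and forgetting, can be sketched with a simple phenomenological update rule. The rule (each pulse moves the conductance a fixed fraction of the remaining distance toward a bound), the bounds, and the rate are assumptions chosen for illustration; they are not the measured device response.

```python
def apply_pulses(g_start, n_pulses, rate, g_target):
    """Return the conductance after each pulse, starting from g_start.

    Each pulse closes a fixed fraction `rate` of the gap to `g_target`,
    which gives a gradual, saturating change rather than an abrupt switch.
    """
    g = g_start
    trace = [g]
    for _ in range(n_pulses):
        g += rate * (g_target - g)
        trace.append(g)
    return trace

# Hypothetical conductance bounds in arbitrary units.
G_MIN, G_MAX = 1.0, 10.0

# Potentiation ("learning"): set pulses push the conductance toward G_MAX.
potentiation = apply_pulses(G_MIN, 20, 0.2, G_MAX)

# Depression ("forgetting"): reset pulses pull it back toward G_MIN.
depression = apply_pulses(potentiation[-1], 20, 0.2, G_MIN)
```

Plotting `potentiation` followed by `depression` against pulse number would give the rise-and-decay shape that pulse transient measurements are typically compared against.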
Development of metal oxide based formaldehyde (HCHO) sensors using laser ablated nanoparticles
(Indian Institute of Technology Jodhpur, 2023-07) Kumar, Mahesh; Singh, Jitendra
In this thesis, we report, for the first time, a state-of-the-art method for synthesizing metal oxide nanoparticles in atmospheric air using laser ablation for the rapid prototyping of formaldehyde (HCHO) gas sensors. Formaldehyde is one of the most common indoor air pollutants and causes various adverse health problems if its concentration exceeds 0.75 ppm, as per Occupational Safety and Health Administration (OSHA), USA, guidelines. It has also been noticed that some dishonest fish merchants use formalin solution (formaldehyde gas dissolved in water) to preserve freshly caught fish and prevent spoilage during transportation to the market. Various health issues have occurred due to the ingestion of formalin-contaminated fish. Thus, we need a miniature, low-cost, ultra-sensitive formaldehyde sensor for the development of smart or IoT-enabled portable systems that measure the formaldehyde level in indoor areas as well as its presence in fish. A micro hotplate is an essential part of any metal oxide based gas sensor. Hence, in the first part of the study, a coplanar Au microheater based gas sensor platform was fabricated by laser micropatterning using a 355 nm Q-switched solid-state laser source. The heat distribution profile of the fabricated micro hotplate was observed with an IR thermal imaging camera. Furthermore, the long-term reliability and the power versus operating temperature characteristics of the micro hotplate were systematically studied. In addition, the heat distribution profile of a Nichrome heater based gas sensor platform, having a hollow alumina tube on which two gold electrodes had been printed at each end, was also investigated. In the next set of analyses, the formaldehyde gas sensing performance was studied using pristine SnO2 and ZnO metal oxide materials. Formaldehyde gas sensing behaviour was studied by depositing a thin film of SnO2 onto the external surface of the alumina tube based gas sensor platform.
The sputter-deposited SnO2 thin film sensor exhibited a gas response of 1.2 towards 1 ppm of formaldehyde vapor with a response time of ~32 s and a recovery time of ~72 s at 300°C. Next, to explore the formaldehyde sensing capabilities of nanoparticles, we synthesized pristine ZnO nanoparticles (NPs) by scanning a high-power laser beam over the top surface of a ZnO pellet in open air, and the laser-ablated ZnO NPs were directly deposited onto the alumina tube based gas sensor platform. The gas-sensing properties of the ZnO NPs were carefully investigated in the presence of formaldehyde gas molecules. The ZnO NPs-based sensor exhibited a response of about 1.8 towards 50 ppm formaldehyde gas at 350°C with a response time of 25 s and a recovery time of 12 s. To further enhance the sensitivity and selectivity towards formaldehyde gas, we fabricated an n-ZnO/n-SnO2 n-n heterojunction by sputtering a SnO2 thin film (physical vapor deposition, PVD) onto the alumina tube based gas sensor platform and decorating it with ZnO nanoparticles. After decoration with laser-ablated ZnO nanoparticles, the thin film SnO2 sensor exhibited a high response of 20 towards 50 ppm of formaldehyde with quick response (4 s) and recovery (30 s) times at a lower operating temperature (250°C) than that of pure SnO2. Following these good results with the heterojunction, the research was extended further. The p-type NiO NPs were synthesized in atmospheric air by laser ablation of a cylindrical solid Ni pellet. We fabricated a p-NiO/n-SnO2 p-n heterojunction by decorating a sputter-deposited n-type SnO2 thin film with laser-ablated NiO nanoparticles. We explored the formaldehyde sensing behaviour of the NiO/SnO2 sensor and compared it with that of the pristine SnO2 sensor. The NiO/SnO2 sensor exhibited a higher response of about 29.8 towards 50 ppm formaldehyde with fast response and recovery times (3 s and 90 s, respectively) at a lower operating temperature (about 210°C) and with good selectivity. In the last part of the thesis, the enhanced formaldehyde sensing mechanism of the ZnO/SnO2 and NiO/SnO2 sensors is described. From the experimental gas sensing performance data of the NiO/SnO2 sensors, we also extracted various gas sensing parameters, such as response time (τres), recovery time (τrec), surface coverage (θ), and the adsorption (Ka) and desorption (Kd) rate constants, using the Langmuir gas adsorption-desorption model via curve fitting.
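The parameter extraction described above can be illustrated with a minimal sketch, assuming first-order Langmuir kinetics, dθ/dt = Ka·C·(1 − θ) − Kd·θ. Under that assumption the coverage rises as θ(t) = θeq·(1 − exp(−t/τres)) with τres = 1/(Ka·C + Kd), and decays during recovery (C = 0) as θ(t) = θ0·exp(−Kd·t), so τrec = 1/Kd. The synthetic data, the rate constants, and the log-linear least-squares fit below are illustrative assumptions; the thesis does not state which fitting routine was used.

```python
import math

def fit_decay_rate(times, values):
    """Least-squares slope of ln(values) against times, returned as a positive rate."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return -num / den

def langmuir_parameters(times, adsorption, theta_eq, recovery, conc_ppm):
    """Extract tau_res, tau_rec, Ka and Kd from response and recovery curves."""
    # During exposure, theta_eq - theta(t) decays at the rate Ka*C + Kd.
    rate_on = fit_decay_rate(times, [theta_eq - th for th in adsorption])
    # During recovery, theta(t) decays at the rate Kd.
    kd = fit_decay_rate(times, recovery)
    ka = (rate_on - kd) / conc_ppm
    return {"tau_res": 1.0 / rate_on, "tau_rec": 1.0 / kd, "Ka": ka, "Kd": kd}

# Synthetic, noise-free curves generated from hypothetical constants.
KA_TRUE, KD_TRUE, CONC = 0.002, 0.1, 50.0   # 1/(ppm*s), 1/s, ppm
rate_true = KA_TRUE * CONC + KD_TRUE
theta_eq = KA_TRUE * CONC / rate_true
times = [0.5 * i for i in range(20)]
adsorption = [theta_eq * (1.0 - math.exp(-rate_true * t)) for t in times]
recovery = [theta_eq * math.exp(-KD_TRUE * t) for t in times]

params = langmuir_parameters(times, adsorption, theta_eq, recovery, CONC)
```

With real sensor data the curves are noisy and the equilibrium coverage is not known in advance, so a nonlinear fit of all parameters at once would usually replace the log-linear shortcut used here.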