Abstract
The search and retrieval of relevant images and videos from large repositories of multimedia is acknowledged as one of the hard challenges of computer science. With existing pattern recognition solutions, one cannot obtain a detailed, semantic description of a given multimedia document. Limitations persist in feature extraction and classification schemes, along with the incompatibility of representations across domains. The situation will most likely remain so for several years to come.
Towards addressing this challenge, we observe that several multimedia collections come with parallel information that is: i) semantic in nature, ii) weakly aligned with the multimedia and iii) freely available. For example, the content of a news broadcast is also available in the form of newspaper articles. If a correspondence could be obtained between the videos and such parallel information, one could access one medium using the other, which opens up immense possibilities for information extraction and retrieval. However, finding the mapping between the two sources of data is challenging, owing to the unknown semantic hierarchy within each medium and the difficulty of matching information across modalities. In this thesis, we propose novel algorithms that address these challenges.
Different <Multimedia, Parallel Information> pairs require different alignment techniques, depending on the granularity at which entities can be matched across them. We choose four such pairs, with the parallel information in the text domain, such that the data is both challenging and available on a large scale. Specifically, our multimedia consists of movies, broadcast sports videos and document images, with the parallel text coming from scripts, commentaries and language resources. As we proceed from one pair to the next, the problem grows in complexity, owing to a progressive relaxation of the temporal binding between the parallel information and the multimedia. By addressing this challenge, we build solutions that perform increasingly fine-grained alignment between multimedia and text data.
The framework that we propose begins with the assumption that we can segment both the multimedia and the text into meaningful entities that correspond to each other. The problem, then, is to identify features and learn to match a text-entity to a multimedia-segment (and vice versa). Such a matching scheme can be refined using additional constraints, such as temporal ordering and occurrence statistics. We build algorithms that align i) movies and scripts, where sentences from the script are aligned to their respective video-shots, and ii) document images and a lexicon, where the words of the dictionary are mapped to clusters of word-images extracted from the scanned books.
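To illustrate how the temporal-ordering constraint can drive such a matching, the following is a minimal sketch (not the implementation developed in the thesis) that assigns a sequence of text-entities to a sequence of multimedia-segments by dynamic programming. The feature representations and the similarity function `sim` are hypothetical placeholders.

```python
import numpy as np

def align(text_entities, segments, sim):
    """Assign each text-entity a distinct segment, with segment indices
    strictly increasing (the temporal-ordering constraint), maximising
    the total similarity. Assumes len(segments) >= len(text_entities)."""
    n, m = len(text_entities), len(segments)
    NEG = -1e18
    best = np.full((n + 1, m + 1), NEG)
    best[0, :] = 0.0                       # zero entities align with any prefix
    matched = np.zeros((n + 1, m + 1), dtype=bool)
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            skip = best[i, j - 1]          # leave segment j-1 unmatched
            take = best[i - 1, j - 1] + sim(text_entities[i - 1], segments[j - 1])
            if take >= skip:
                best[i, j], matched[i, j] = take, True
            else:
                best[i, j] = skip
    assignment, i, j = [0] * n, n, m
    while i > 0:                           # backtrack the optimal assignment
        if matched[i, j]:
            assignment[i - 1] = j - 1
            i, j = i - 1, j - 1
        else:
            j -= 1
    return assignment
```

The quadratic-time recurrence above encodes only the ordering constraint; further cues, such as the occurrence statistics mentioned above, would enter through the similarity function or additional penalty terms.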
Further, we relax the above assumption, so that a segmentation of the multimedia is not available a priori. The problem now is to perform joint inference of segmentation and annotation. We address this problem by building an over-complete representation of the multimedia: a large number of putative segmentations are matched against the information extracted from the parallel text, and the joint inference is achieved through dynamic programming. This approach was successfully demonstrated on i) Cricket videos, which were segmented and annotated with information from online commentaries, and ii) word-images, where sub-words called Character N-Grams are accurately segmented and labelled using the text equivalent of the word.
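A minimal sketch of such a joint inference, again with hypothetical names, is given below: dynamic programming simultaneously selects segment boundaries from the over-complete set of candidates and assigns each resulting segment its label from the parallel text. The function `score`, which rates a candidate segment against a label, stands in for the learned matching model.

```python
import numpy as np

def joint_segment_and_annotate(num_frames, labels, score):
    """Partition frames [0, num_frames) into len(labels) contiguous
    segments, label k going to segment k, maximising the total score.
    `score(lo, hi, label)` rates candidate segment [lo, hi) against a label."""
    K, T = len(labels), num_frames
    NEG = -1e18
    best = np.full((K + 1, T + 1), NEG)
    best[0, 0] = 0.0
    cut = np.zeros((K + 1, T + 1), dtype=int)
    for k in range(1, K + 1):
        for t in range(k, T + 1):          # segment k ends at frame t
            for t0 in range(k - 1, t):     # candidate start of segment k
                cand = best[k - 1, t0] + score(t0, t, labels[k - 1])
                if cand > best[k, t]:
                    best[k, t], cut[k, t] = cand, t0
    segments, t = [], T
    for k in range(K, 0, -1):              # backtrack the optimal boundaries
        t0 = cut[k, t]
        segments.append((t0, t, labels[k - 1]))
        t = t0
    return list(reversed(segments))
```

The over-complete representation enters through the set of (t0, t) candidates: in practice only plausible boundary pairs would be scored, rather than all of them, which keeps the computation well below the worst-case O(K·T²).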
As a consequence of the approaches proposed in this thesis, we were able to demonstrate text-based retrieval systems over large multimedia collections. Retrieval at this semantic level is made possible by the annotation with parallel text information. Our work also yields a large set of labelled multimedia, which could be used by sophisticated machine learning algorithms to learn new concepts.