Abstract
In 1957, Noam Chomsky’s revolutionary treatise, Syntactic Structures [45], established the foundations for the formal study of natural language syntax. Such a study requires a mathematical paradigm in which the grammatical structure of a language can be analyzed apart from the semantic conditions responsible for the plausibility or truth-value of a sentence; hence Chomsky’s famous example, colourless green ideas sleep furiously. The work of Chomskyan linguistics in differentiating surface and deep structures promoted the idea of a rigid structural integrity to language, one often challenged by semanticists. Nonetheless, it established syntax as the medium for the study of natural language, underscoring the need to understand sentence structure as a means of analyzing language in a typologically uniform and relevant manner.
In this thesis, we study the computational and representational offshoots of syntactic structures. The development of formal grammars allows mathematicians to bring the simply typed lambda calculus to bear on the study of constituency grammar. This subfield of mathematical linguistics, known as categorial grammar, studies the functions responsible for combining constituents in a grammatical manner, with a view towards composition and compositional semantics. Theoretical work in this direction was motivated by frameworks such as AB-grammars [4, 10], the Lambek calculus [76], Combinatory Categorial Grammar [124], Tree Adjoining Grammar [65], and Abstract Categorial Grammars [53].
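As a rough illustration, not specific to any one of these frameworks, the function-application core of a categorial grammar can be sketched as a toy AB-grammar-style derivability checker. The lexicon, category names, and example sentence below are hypothetical, and only the two application rules (X/Y + Y → X and Y + X\Y → X) are implemented:

```python
def combine(x, y):
    """Try forward application (X/Y + Y -> X) and backward application (Y + X\\Y -> X)."""
    results = []
    if isinstance(x, tuple) and x[0] == '/' and x[2] == y:
        results.append(x[1])   # forward application
    if isinstance(y, tuple) and y[0] == '\\' and y[2] == x:
        results.append(y[1])   # backward application
    return results

def parses(cats):
    """All categories derivable for the whole sequence, via CKY over binary combination."""
    n = len(cats)
    chart = {(i, i + 1): {cats[i]} for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for k in range(i + 1, i + span):
                for x in chart[(i, k)]:
                    for y in chart[(k, i + span)]:
                        cell.update(combine(x, y))
            chart[(i, i + span)] = cell
    return chart[(0, n)]

# Hypothetical toy lexicon: a transitive verb typed (S\NP)/NP, in the usual notation.
VP = ('\\', 'S', 'NP')                     # S\NP: consumes a subject NP to its left
lexicon = {'John': 'NP', 'Mary': 'NP', 'likes': ('/', VP, 'NP')}

cats = [lexicon[w] for w in 'John likes Mary'.split()]
print(parses(cats))   # {'S'}: the sentence is derivable
```

The point of the sketch is only that grammaticality becomes a question of whether the sequence of lexical categories can be functionally combined into the sentence category S.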
Hindi has been largely underrepresented in these formal analyses of grammar, partly due to a lack of exposure and partly due to the strongly motivated syntactic analyses already available under the Paninian Grammatical Framework [17]. In this thesis, therefore, I use a simplification of the Lambek calculus framework, known as pregroup grammar [78], to understand the syntactic phenomena underlying the language when it is considered independently of semantics. Pregroup grammars are a simple form of categorial grammar that has been used to study languages such as Japanese, Sanskrit, and Persian, along with English, French, Hungarian, and others, ensuring a diverse body of literature and a rich set of operators and operations for representing constituency information.
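To illustrate the pregroup reduction mechanism, here is a minimal sketch in which a simple type carries a base symbol and an adjoint order (-1 for a left adjoint a^l, 0 for a plain type, +1 for a right adjoint a^r), and adjacent pairs a^(k) a^(k+1) contract to the unit. The toy lexicon and the greedy one-pass strategy are illustrative assumptions; for lexicons that repeat base types in ambiguous ways, a full search over contraction orders may be needed:

```python
def reduce_types(types):
    """One-pass stack reduction: contract an adjacent pair (a, k)(a, k+1),
    i.e. a^(k) a^(k+1) -> 1, whenever it appears at the top of the stack."""
    stack = []
    for base, order in types:
        if stack and stack[-1][0] == base and stack[-1][1] + 1 == order:
            stack.pop()                  # contraction a^(k) a^(k+1) -> 1
        else:
            stack.append((base, order))
    return stack

# Hypothetical toy lexicon over basic types n (noun phrase) and s (sentence).
lexicon = {
    'John': [('n', 0)],
    'Mary': [('n', 0)],
    'likes': [('n', 1), ('s', 0), ('n', -1)],   # n^r s n^l: a transitive verb
}

sentence = [t for w in 'John likes Mary'.split() for t in lexicon[w]]
print(reduce_types(sentence))   # [('s', 0)]: the string reduces to the sentence type
```

A word string is judged grammatical exactly when the concatenation of its lexical types reduces to the single sentence type s, which is the checkable criterion this thesis exploits for Hindi.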
In this thesis, I study the basic syntactic features of the Hindi language in terms of word order, noun phrase characteristics, and verb phrase characteristics, and I formalize the agreement rules between them. I explore the representation of these properties in a consistent manner, relying on lexical semantic characteristics as little as possible. I then delve into the interoperability of constituency representations for relatively free word order, their exploration in dependency syntax, and some possible applications of an advance in constituency representation: the development of a hybrid grammaticality checker.