The main assignment in this course is the presentation of original research. Examples of projects undertaken by groups of students in the Spring of 2001 are given in the abstracts and reports below. This information is provided for the benefit of students wishing to take the course in the future. Please do not cite the reports as literature; contact the students directly for more information and for pointers to publications resulting from this work.
Class-Based Word Clustering Models Using Kernel Machines
The goal is to estimate class-based word clustering models using kernel machines. Our initial approach is to determine whether support vector machines can be integrated into decision-tree methods for the classification of words, and in particular whether support vector machines can be used to determine the best cuts at each decision point. Prior work (Bennett, K.P. and Blue, J.A., "A support vector machine approach to decision trees," IJCNN, IEEE, 1998) demonstrated that this could be done for preclassified data, which of course is unavailable for words. Instead, we must extend this approach to be analogous to the entropy-reduction techniques commonly used for decision trees. Further work explores whether the number of classes built by decision trees can be parameterized and properly estimated using notions from structural risk minimization. This is a large open problem (Jelinek, F., "Statistical Methods for Speech Recognition," Cambridge, MA: MIT Press, 1997, pp. 169-170).
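As a concrete illustration of the entropy-reduction criterion mentioned above, the sketch below scores a candidate cut at a decision point by the reduction in label entropy it achieves. The labels and the cut are invented for illustration; the actual models would operate on word classes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_gain(labels, left_idx):
    """Entropy reduction achieved by cutting `labels` into left/right subsets."""
    left_set = set(left_idx)
    left = [labels[i] for i in left_set]
    right = [labels[i] for i in range(len(labels)) if i not in left_set]
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional

# Toy labels for four words; the cut {0, 1} vs. {2, 3} separates the classes
# perfectly, so its gain equals the full entropy (1 bit).
labels = ["noun", "noun", "verb", "verb"]
gain = split_gain(labels, [0, 1])
```

At each decision point one would search over candidate cuts and keep the one with the largest gain; the open question raised above is how an SVM can propose those cuts in the absence of preclassified data.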
Kernel Methods for Named-Entity Recognition
The goal of this project is to apply kernel methods to named-entity recognition using a bootstrapping algorithm currently under development. The algorithm is based on iterative learning and re-estimation of morphological and contextual patterns captured in hierarchically smoothed trees. The training data consist of short seed lists. We therefore do not start with a vector space and an initial set of labeled exemplars in that space; rather, we must build them from scratch using a two-pass procedure over a large corpus of unannotated text. We apply the bootstrapping algorithm to obtain distributional information from the seed data and the corpus, then associate the known labels with the distributions of the seed words and build a classifier starting from these distributions.
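A highly simplified sketch of the distributional step: build context-count signatures for seed words from unannotated text, then label a new word by signature similarity. The tiny corpus and seed lists below are invented for illustration; the real algorithm uses hierarchically smoothed trees rather than raw context counts.

```python
from collections import Counter

def context_counts(corpus_tokens, target, window=1):
    """Distributional signature: counts of words within `window` of `target`."""
    counts = Counter()
    for i, tok in enumerate(corpus_tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(corpus_tokens), i + window + 1)
            counts.update(t for j, t in enumerate(corpus_tokens[lo:hi], lo) if j != i)
    return counts

def similarity(c1, c2):
    """Cosine similarity between two count signatures."""
    num = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    den = (sum(v * v for v in c1.values()) ** 0.5) * (sum(v * v for v in c2.values()) ** 0.5)
    return num / den if den else 0.0

def classify(word, seeds, corpus_tokens):
    """Label `word` with the seed class whose signature is most similar."""
    sig = context_counts(corpus_tokens, word)
    return max(seeds, key=lambda lbl: max(
        similarity(sig, context_counts(corpus_tokens, s)) for s in seeds[lbl]))

# Illustrative corpus and seed lists (not real training data).
corpus = "mr smith visited paris and mr jones visited london".split()
seeds = {"PERSON": ["smith"], "LOCATION": ["paris"]}
label = classify("jones", seeds, corpus)
```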
Document Semantics Dimensionality Reduction
Recent work has augmented n-gram language models with other information sources, such as longer-distance syntactic and semantic constraints. Latent semantic analysis (LSA) is a model of word semantic similarity based on word co-occurrence tendencies, and has been successful in information retrieval and NLP applications such as spelling correction. The basic idea of LSA is to represent each document and word in a lower-dimensional space obtained by singular value decomposition of a large sparse term-document matrix. From the cosine distance between document vectors, between word vectors, or between document and word vectors, we can derive the semantic similarity of documents and words, and predict the next possible word given the current document history. The focus of this project is to investigate the use of kernels in the LSA framework to make more effective use of the lower-dimensional representation of documents and words in information retrieval and speech recognition.
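A minimal sketch of the SVD step described above, using an illustrative toy term-document matrix (the terms, counts, and rank k = 2 are assumptions for the example):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents (counts invented).
terms = ["kernel", "svm", "gene", "splice"]
A = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 3.],
              [0., 1., 2.]])

# Rank-k truncation of the SVD gives the reduced "latent semantic" space.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * S[:k]      # term representations
doc_vecs = Vt[:k, :].T * S[:k]    # document representations

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Terms that co-occur in the same documents end up close in the latent space.
sim_kernel_svm = cosine(term_vecs[0], term_vecs[1])
sim_kernel_gene = cosine(term_vecs[0], term_vecs[2])
```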
Dynamic Time-Warping Kernels for Sequence Recognition
Support vector machines make use of a kernel to compute inner products in a higher-dimensional space. Vectors in the data space are usually restricted to have the same dimension. In this project we try to alleviate this limitation by introducing time-warped kernels for discriminating between input patterns of different lengths. This is particularly useful in speech/phoneme recognition, where the duration of a phoneme varies across utterances. We will address the problem from two perspectives: using a conventional kernel representation but building time warping into the kernel definition itself; and by modeling trajectories, comparing input patterns with different trajectories, and integrating scores to arrive at kernel values. The difficulty with both approaches is in ensuring a kernel that satisfies Mercer's conditions.
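A sketch of the first perspective, assuming a Gaussian function of the dynamic-time-warping distance as the candidate kernel; as noted above, such a construction is not guaranteed to satisfy Mercer's conditions:

```python
import math

def dtw(x, y):
    """Dynamic-time-warping distance between two 1-D sequences of any lengths."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_kernel(x, y, gamma=0.5):
    """Gaussian-of-DTW similarity; NOT guaranteed to be a valid Mercer kernel."""
    return math.exp(-gamma * dtw(x, y))

a = [0.0, 1.0, 2.0, 1.0]
b = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0]   # same shape as `a`, different duration
c = [3.0, 3.0, 3.0]                  # unrelated pattern
```

Because warping absorbs the duration difference, `a` and `b` score as highly similar despite having different lengths, which is exactly the property needed for phonemes of varying duration.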
Kernels for Splice Site Prediction
This project addresses the problem of splice site prediction, a crucial step in gene finding in eukaryotic organisms. A splice site is either an acceptor or a donor. Previous efforts to solve the splice-site prediction problem have focused on neural networks, Markov models, and rule-based systems. We aim to outperform current approaches using support vector machines trained to classify simple representations of sequence windows into acceptor/non-acceptor or donor/non-donor sites. Data for this project are publicly available.
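A minimal sketch of the proposed setup, using scikit-learn's SVC on one-hot-encoded sequence windows. The sequences below are invented for illustration (real training data would come from the public splice-site datasets mentioned above):

```python
import numpy as np
from sklearn.svm import SVC

BASES = "ACGT"

def one_hot(window):
    """Encode a fixed-length nucleotide window as a flat one-hot vector."""
    v = np.zeros(len(window) * 4)
    for i, b in enumerate(window):
        v[i * 4 + BASES.index(b)] = 1.0
    return v

# Toy training data (illustrative only): donor sites canonically contain "GT".
donors = ["AAGTCC", "TAGTAC", "CAGTGG", "GAGTTT"]
non_donors = ["AACCCC", "TATAAC", "CACCGG", "GAAATT"]
X = np.array([one_hot(w) for w in donors + non_donors])
y = np.array([1] * len(donors) + [0] * len(non_donors))

clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict([one_hot("TTGTAA")])[0]
```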
Probabilistic Assignment of Parameters to Microsurgical "Gestemes"
This project departs from previous deterministic assignments of parameter specifications for sub-states of microsurgical tasks. Gesture recognition with HMMs has previously been explored by other groups. Using clustering on test sets (visually annotated footage of microsurgery synchronized with inertial and/or force information from an instrumented tool), the back-end HMM is expected to give improved present-state estimates (for interface feedback and task augmentation) and/or superior segmentation of the task into its constituent "gestemes" (i.e., small gestures).
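The HMM back-end's segmentation step can be sketched with a standard Viterbi decoder. The two-state "gesteme" model and discrete observations below are illustrative assumptions, not the project's actual models:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for discrete observations under an HMM (log domain)."""
    n_states = len(pi)
    T = len(obs)
    logd = np.full((T, n_states), -np.inf)   # best log-probability so far
    back = np.zeros((T, n_states), dtype=int)
    logd[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = logd[t - 1] + np.log(A[:, j])
            back[t, j] = int(np.argmax(scores))
            logd[t, j] = scores[back[t, j]] + np.log(B[j, obs[t]])
    path = [int(np.argmax(logd[-1]))]
    for t in range(T - 1, 0, -1):            # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Illustrative 2-gesteme HMM: state 0 = "reach", state 1 = "grasp".
pi = np.array([0.9, 0.1])                    # initial state probabilities
A = np.array([[0.8, 0.2], [0.1, 0.9]])       # transitions (sticky states)
B = np.array([[0.9, 0.1], [0.2, 0.8]])       # emissions for observations 0/1
segmentation = viterbi([0, 0, 1, 1, 1], pi, A, B)
```

The decoded state sequence is the segmentation of the task into gestemes; sub-state parameters could then be estimated per segment rather than assigned deterministically.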
Kernel Methods for Reinforcement Learning

This project explores the use of kernel methods in reinforcement learning, and Q-learning in particular, for robotics applications. The goal is to make a walking (biped) robot learn to step over obstacles. There is a finite number of agent actions (leg motions) corresponding to states encoding the robot's spatial location. Sensory information includes four leg sensors (two for each leg) and the distance to the obstacle (from a stereo camera). Punishment or reward is imparted when the agent reaches the obstacle. Kernels are used to provide a continuous topological representation of states and state-action pairs in Q-learning.
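One way to realize a kernel-smoothed representation of state-action values is to define Q(s, a) as a kernel-weighted average over stored samples, so that nearby states share value estimates. The one-dimensional state, two actions, and one-step reward below are assumptions for illustration only:

```python
import math
import random

random.seed(0)

ACTIONS = [0, 1]   # illustrative: 0 = normal step, 1 = high step over obstacle

def kernel(s1, s2, bw=0.5):
    """Gaussian kernel on continuous states."""
    return math.exp(-((s1 - s2) ** 2) / (2 * bw ** 2))

class KernelQ:
    """Q-function represented as a kernel-weighted average over stored samples."""
    def __init__(self):
        self.samples = []                      # (state, action, q-value) triples
    def q(self, s, a):
        num = den = 0.0
        for s_i, a_i, q_i in self.samples:
            if a_i == a:
                w = kernel(s, s_i)
                num += w * q_i
                den += w
        return num / den if den else 0.0
    def update(self, s, a, target, lr=0.5):
        self.samples.append((s, a, self.q(s, a) + lr * (target - self.q(s, a))))

# Toy one-step task: state s is distance to the obstacle; a high step taken
# near the obstacle (s around 1.0) earns the reward.
Q = KernelQ()
for _ in range(200):
    s = random.uniform(0.0, 2.0)
    a = random.choice(ACTIONS)
    reward = 1.0 if (a == 1 and abs(s - 1.0) < 0.3) else 0.0
    Q.update(s, a, reward)   # one-step task, so the target is the reward itself

best = max(ACTIONS, key=lambda a: Q.q(1.0, a))
```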
Kernel Discriminant Analysis
This project investigates kernel extensions of LDA (linear, or Fisher, discriminant analysis) for nonlinear dimensionality reduction. Kernel Fisher discriminants have previously been introduced in the literature; the contribution of this project is to investigate sparse formulations of kernel discriminant analysis (KDA) and to apply KDA to problems of interest in speech recognition.
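A sketch of the two-class kernel Fisher discriminant, the non-sparse baseline such a project would start from. The RBF kernel, the ridge regularizer, and the toy data are illustrative choices:

```python
import numpy as np

def rbf(X1, X2, gamma=1.0):
    """RBF kernel matrix between the rows of X1 and X2."""
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kfd_fit(X, y, gamma=1.0, reg=1e-3):
    """Two-class kernel Fisher discriminant: returns dual coefficients alpha."""
    K = rbf(X, X, gamma)
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    M0, M1 = K[:, idx0].mean(axis=1), K[:, idx1].mean(axis=1)
    # Within-class scatter in feature space, plus a ridge term for stability.
    N = np.zeros_like(K)
    for idx in (idx0, idx1):
        Kc, n = K[:, idx], len(idx)
        N += Kc @ (np.eye(n) - np.full((n, n), 1.0 / n)) @ Kc.T
    return np.linalg.solve(N + reg * np.eye(len(X)), M1 - M0)

def kfd_project(alpha, Xtrain, Xnew, gamma=1.0):
    """1-D discriminant projection of new points."""
    return rbf(Xnew, Xtrain, gamma) @ alpha

# Toy data that LDA alone cannot separate well: class 1 is tightly clustered
# inside the broad spread of class 0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(scale=2.0, size=(20, 2)),
               rng.normal(scale=0.3, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)
alpha = kfd_fit(X, y)
proj = kfd_project(alpha, X, X)
```

A sparse formulation would constrain most entries of `alpha` to zero, so that the discriminant depends on only a few training points, in the spirit of support vector expansions.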
Kernel Independent Component Analysis

Any linear algorithm on a vector space that is expressed only in terms of dot products can be transformed into a general nonlinear version using the "kernel trick." A good example is the kernel nonlinear extension of PCA. Like PCA, linear ICA decorrelates mixtures of sources, but it also separates them into independent components. We propose to use kernel methods to extend ICA to nonlinear transformations (although ICA already relies on nonlinearities to derive higher-order moments). Using an overcomplete representation in feature space, nonlinear ICA can offer a better solution to the problem of separating more sources than sensors, as is often encountered in practice when the number of sources is not known in advance.
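The kernel PCA example mentioned above can be sketched as follows; the RBF kernel and the toy data are illustrative assumptions:

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Nonlinear PCA via the kernel trick: eigendecompose the centered Gram matrix."""
    K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    n = len(X)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]  # keep top eigenpairs
    vals, vecs = vals[order], vecs[:, order]
    # Projection of the training data onto the leading kernel principal axes.
    return Kc @ (vecs / np.sqrt(np.maximum(vals, 1e-12)))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
Z = kernel_pca(X, n_components=2)
```

The same dot-product-only structure is what a kernelized ICA would exploit; the added difficulty is enforcing statistical independence, not just decorrelation, of the extracted components.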