Analysis of E. coli Promoters Using Support Vector Machine
Abstract
Support Vector Machines (SVMs) are a family of learning algorithms currently
considered among the most effective methods in many real-world applications.
The theory behind SVMs was developed in the sixties and seventies by Vapnik and
Chervonenkis, but the first practical implementation was only published in the
early nineties. Since then the method has gained more and more attention in the
machine learning community, thanks to its ability to outperform most other
learning algorithms (including neural networks) in many applications. As a
result it has been successfully applied to all sorts of classification
problems, ranging from handwritten character recognition to speaker
identification and face detection in images.
SVMs have been applied to many biological problems, including gene expression
data analysis and protein classification. Some claim that biological data mining
is one of the most promising uses of SVMs, particularly because of the high
dimensionality of the data. As a result, research on SVMs in computational
biology is the object of much effort today, driven mainly by researchers coming
from the machine learning community. One can expect SVMs to become a standard
tool for bioinformaticians in the near future. Recently, the prediction of
promoters has attracted many researchers’ attention. Unfortunately, most previous
prediction algorithms did not provide high enough sensitivity and specificity.
This is where SVMs stand out from the crowd.
Our main idea is to use computing power to enumerate all possible patterns that
are candidate features of promoters and to train the SVM on them. Once trained,
the SVM can determine whether a test sequence is a promoter or not. In most
practical applications, an SVM relies on a kernel function that maps the data
into a high-dimensional feature space, which we study in detail in this thesis.
Many types of kernels are in use, viz. linear, polynomial, additive, spline,
Gaussian, etc. In this thesis, the intention is to understand the basics and
workings of SVMs and to explore the performance of different kernels on the
promoter recognition problem. In our experimental results, the radial basis
function (Gaussian) kernel clearly outperforms the others on the promoter
prediction problem.
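
To make the kernel comparison concrete, the following is a minimal sketch of three of the kernels named above, applied to one-hot-encoded DNA strings. It is an illustration only, not the implementation used in the thesis: the encoding, the function names (one_hot, linear_kernel, polynomial_kernel, gaussian_kernel), the parameter values (degree, coef0, gamma) and the two example hexamers are all assumptions chosen for this example.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector, four entries per base (assumed encoding)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def linear_kernel(x, y):
    """Plain dot product; on one-hot vectors it counts matching positions."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(x, y) + coef0) ** degree

def gaussian_kernel(x, y, gamma=0.1):
    """exp(-gamma * ||x - y||^2), the radial basis function kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two hypothetical hexamers that differ at a single position.
a = one_hot("TATAAT")
b = one_hot("TATGAT")

print("linear:    ", linear_kernel(a, b))
print("polynomial:", polynomial_kernel(a, b))
print("gaussian:  ", gaussian_kernel(a, b))
```

Each function returns a similarity score for a pair of sequences. An SVM only ever sees the data through such scores, so swapping the kernel changes the feature space in which the separating boundary is sought without changing the rest of the training procedure.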
