Analysis of E. coli Promoters Using Support Vector Machine

Taneja, Jasneet Singh

Files

156.pdf (852.92 KB)

Date

2007-03-08T07:17:15Z

Authors

Taneja, Jasneet Singh

Supervisors

Salaria, R. S.

Abstract

Support Vector Machines (SVMs) are a family of learning algorithms that are currently considered among the most efficient methods for many real-world applications. The theory behind SVMs was developed in the 1960s and 1970s by Vapnik and Chervonenkis, but the first practical implementation was not published until the early 1990s. Since then the method has gained increasing attention in the machine learning community, thanks to its ability to outperform most other learning algorithms (including neural networks) in many applications. As a result, it has been successfully applied to all sorts of classification problems, ranging from handwritten character recognition to speaker identification and face detection in images. SVMs have also been applied to many biological problems, including gene expression data analysis and protein classification. Some researchers claim that biological data mining is one of the most promising uses of SVMs, particularly because of the high dimensionality of the data. Consequently, research on SVMs in computational biology currently attracts much effort, mainly from researchers in the machine learning community. One can expect SVMs to become a standard tool for bioinformaticians in the near future.

Recently, promoter prediction has attracted the attention of many researchers. Unfortunately, most previous prediction algorithms did not achieve sufficiently high sensitivity and specificity, and this is where SVMs stand out. Our main idea is to use computing power to compute the candidate patterns that may serve as features of promoters (the training phase of the SVM). Once trained, the SVM can determine whether a test sequence is a promoter or not.

In most practical applications, an SVM relies on a kernel function that maps the data into a high-dimensional feature space; kernels are studied in detail in this report. Many types of kernels are in use, e.g. linear, polynomial, additive, spline, and Gaussian kernels. This thesis aims to explain the basics and workings of SVMs and to explore the performance of different kernels on the promoter recognition problem. The experimental results show that the radial basis function (Gaussian) kernel clearly outperforms the other kernels on the promoter prediction problem.
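The abstract names several kernels but gives neither their formulas nor the way DNA sequences are turned into vectors. The following is a minimal sketch under stated assumptions, not the thesis's actual method. It assumes a one-hot encoding with four 0/1 slots per base (a common choice that the abstract does not confirm) and uses the standard textbook forms of the linear, polynomial and Gaussian (RBF) kernels. The names `one_hot`, `gram_matrix` and `explicit_degree2_features`, as well as the parameter values `degree=2`, `coef0=1.0` and `gamma=0.1`, are hypothetical and chosen only for illustration. The last function makes the "high-dimensional feature space" concrete. For the homogeneous degree-2 kernel, (x·y)² equals an ordinary dot product after x is mapped to all pairwise products x_i x_j.

```python
import math
from itertools import product

# Assumed encoding (not stated in the abstract): one-hot, 4 slots per base.
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector with one slot per (position, base)."""
    vec = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base {base!r}")
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def dot(x, y):
    """Plain inner product; vectors must come from equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y  (for one-hot vectors: number of matching positions)
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # K(x, y) = (x . y + coef0) ** degree; degree and coef0 are illustrative
    return (dot(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.1):
    # K(x, y) = exp(-gamma * ||x - y||^2), the radial basis function kernel
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(seqs, kernel):
    """Pairwise values K[i][j] = kernel(seq_i, seq_j), what a kernel SVM trains on."""
    vecs = [one_hot(s) for s in seqs]
    return [[kernel(u, v) for v in vecs] for u in vecs]


def explicit_degree2_features(x):
    # Explicit feature map for the homogeneous kernel (x . y) ** 2:
    # every ordered product x_i * x_j, i.e. len(x) ** 2 features.
    return [a * b for a, b in product(x, repeat=2)]


if __name__ == "__main__":
    seqs = ["TATAAT", "TATACT", "GCGCGC"]
    for name, k in [("linear", linear_kernel),
                    ("polynomial", polynomial_kernel),
                    ("gaussian", gaussian_kernel)]:
        print(name, gram_matrix(seqs, k))

    x, y = one_hot("TATAAT"), one_hot("TATACT")
    # Kernel trick check: the same number computed implicitly and explicitly.
    print(dot(x, y) ** 2,
          dot(explicit_degree2_features(x), explicit_degree2_features(y)))
```

As an example, take the E. coli -10 consensus `TATAAT` and the one-base variant `TATACT`. They share five of six positions, so their linear kernel value is 5, their squared distance is 2, and their Gaussian kernel value with `gamma=0.1` is exp(-0.2), about 0.82. `GCGCGC` matches `TATAAT` at no position, which gives a linear value of 0 and a Gaussian value of exp(-1.2), about 0.30. With one-hot vectors the Gaussian kernel depends only on the number m of mismatched positions (K = exp(-2·gamma·m)), so it behaves like a smooth similarity score that decays with Hamming distance.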

Keywords

Support Vector Machines, Promoters

URI

http://hdl.handle.net/123456789/156

Collections

Masters Theses@CSED
