Efficient Pattern Matching with Flexible Wildcard Gaps and One Off Constraint

Anu, Dahiya

Efficient Pattern Matching with Flexible Wildcard Gaps and One Off Constraint

dc.contributor.author	Anu, Dahiya
dc.contributor.supervisor	Garg, Deepak
dc.date.accessioned	2014-08-04T08:52:09Z
dc.date.available	2014-08-04T08:52:09Z
dc.date.issued	2014-08-04T08:52:09Z
dc.description	ME, CSED	en
dc.description.abstract	Deoxyribonucleic acid (DNA) is the storehouse of all information and genetic instructions used in the development and functioning of a cell. The amount of DNA that is being extracted from the organism is increasing at a faster rate. With the considerable increase in the amount of biosequence data, there is need to develop new methods to extract knowledge from the data. Pattern matching is a basic operation in finding knowledge from large amount of biosequence data. Finding patterns help in analyzing the property of a sequence. Analyzing the DNA sequence can help in identifying the genetic diseases. Promoter and intron in a DNA sequence does not occur consecutively but with a gap of 30-50 characters between them. So Pattern matching with wildcards is of great significance in bioinformatics. This thesis focuses on the problem of maximal pattern matching with flexible wildcard gaps and length constraints under the one-off condition. The problem is to find the maximum number of occurrences of a pattern P with user specified wildcard gap between every two consecutive letters of P in a biological sequence S under the one-off condition and constraint on the overall length of the matching occurrence. To obtain the optimal solution for this problem is difficult. For this problem, no complete solution has been developed so far. All algorithms are based on greedy approaches. In this work, different existing algorithms for solving the problem of maximal pattern matching with flexible wildcard gaps and length constraints under the one-off condition have been studied along with their merits and de-merits. These algorithms are then compared on the basis of data structure used by them, technique incorporated in the algorithm, time and space complexities. A heuristic algorithm, MOGO, based on the Nettree data structure has been proposed to solve this problem. Theoretical analysis and experimental results demonstrate that this algorithm performs better than the existing algorithms in most of the cases when tested on real world biological sequences.	en
dc.format.extent	2750795 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10266/2814
dc.language.iso	en	en
dc.subject	Pattern Matching	en
dc.subject	Wildcard Gaps	en
dc.title	Efficient Pattern Matching with Flexible Wildcard Gaps and One Off Constraint	en
dc.type	Thesis	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: 2814.pdf
Size:: 2.62 MB
Format:: Adobe Portable Document Format

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 1.78 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Masters Theses@CSED