Efficient Pattern Matching with Flexible Wildcard Gaps and One Off Constraint

dc.contributor.authorAnu, Dahiya
dc.contributor.supervisorGarg, Deepak
dc.date.accessioned2014-08-04T08:52:09Z
dc.date.available2014-08-04T08:52:09Z
dc.date.issued2014-08-04T08:52:09Z
dc.descriptionME, CSEDen
dc.description.abstractDeoxyribonucleic acid (DNA) is the storehouse of all information and genetic instructions used in the development and functioning of a cell. The amount of DNA that is being extracted from the organism is increasing at a faster rate. With the considerable increase in the amount of biosequence data, there is need to develop new methods to extract knowledge from the data. Pattern matching is a basic operation in finding knowledge from large amount of biosequence data. Finding patterns help in analyzing the property of a sequence. Analyzing the DNA sequence can help in identifying the genetic diseases. Promoter and intron in a DNA sequence does not occur consecutively but with a gap of 30-50 characters between them. So Pattern matching with wildcards is of great significance in bioinformatics. This thesis focuses on the problem of maximal pattern matching with flexible wildcard gaps and length constraints under the one-off condition. The problem is to find the maximum number of occurrences of a pattern P with user specified wildcard gap between every two consecutive letters of P in a biological sequence S under the one-off condition and constraint on the overall length of the matching occurrence. To obtain the optimal solution for this problem is difficult. For this problem, no complete solution has been developed so far. All algorithms are based on greedy approaches. In this work, different existing algorithms for solving the problem of maximal pattern matching with flexible wildcard gaps and length constraints under the one-off condition have been studied along with their merits and de-merits. These algorithms are then compared on the basis of data structure used by them, technique incorporated in the algorithm, time and space complexities. A heuristic algorithm, MOGO, based on the Nettree data structure has been proposed to solve this problem. Theoretical analysis and experimental results demonstrate that this algorithm performs better than the existing algorithms in most of the cases when tested on real world biological sequences.en
dc.format.extent2750795 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10266/2814
dc.language.isoenen
dc.subjectPattern Matchingen
dc.subjectWildcard Gapsen
dc.titleEfficient Pattern Matching with Flexible Wildcard Gaps and One Off Constrainten
dc.typeThesisen

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
2814.pdf
Size:
2.62 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.78 KB
Format:
Item-specific license agreed upon to submission
Description: