Alban MANCHERON and Irena RUSU: Pattern discovery allowing gaps, substitution matrices and multiple score functions. In Gary BENSON and Roderic PAGE, editors: Algorithms in Bioinformatics. Proceedings of the 3rd International Workshop on Algorithms in BioInformatics (WABI), volume 2812 of Lecture Notes in Bioinformatics (LNBI), pages 129-145. Springer-Verlag, 2003.

Pattern discovery has many applications in finding functionally or structurally important regions in biological sequences (binding sites, regulatory sites, protein signatures, etc.). In this paper we present a new pattern discovery algorithm with the following features:
- it finds, in exactly the same manner and without any prior specification, both contiguous patterns and patterns with fixed-length gaps (i.e. runs of one or more consecutive wildcards);
- it accepts any pairwise score function, thus offering multiple ways to define or constrain the type of pattern searched for; in particular, one can use substitution matrices (PAM, BLOSUM) to compare amino acids, exact matching to compare nucleotides, or equivalence sets in either case.
We describe the algorithm, compare it to other algorithms, and report test results on discovering binding sites for the DNA-binding proteins ArgR, LexA, PurR and TyrR in E. coli, and promoter sites in a set of dicot plants.
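
To make the two features concrete, the sketch below shows how a gapped pattern can be scored against a sequence with an interchangeable pairwise score function. It is only an illustration of the idea stated in the abstract, not the algorithm from the paper: it checks a given pattern rather than discovering one. The names `exact_score`, `equivalence_score`, `matrix_score`, `window_score` and `occurrences`, the `.` wildcard symbol, the threshold, and the toy matrix values are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

# A pairwise score function maps two symbols to an integer score.
Score = Callable[[str, str], int]

WILDCARD = "."  # assumed symbol for one gap position (not from the paper)


def exact_score(a: str, b: str) -> int:
    """Exact matching, e.g. for nucleotides: 1 if identical, else 0."""
    return 1 if a == b else 0


def equivalence_score(classes: List[str]) -> Score:
    """Build a score that treats symbols in the same class as matching."""
    def score(a: str, b: str) -> int:
        if a == b:
            return 1
        return 1 if any(a in c and b in c for c in classes) else 0
    return score


def matrix_score(matrix: Dict[Tuple[str, str], int]) -> Score:
    """Build a score from a symmetric substitution table (missing pairs score 0)."""
    def score(a: str, b: str) -> int:
        return matrix.get((a, b), matrix.get((b, a), 0))
    return score


def window_score(pattern: str, window: str, score: Score) -> int:
    """Sum pairwise scores over the non-wildcard positions of the pattern."""
    return sum(score(p, w) for p, w in zip(pattern, window) if p != WILDCARD)


def occurrences(pattern: str, sequence: str, score: Score, threshold: int) -> List[int]:
    """Start positions whose window scores at least `threshold` against the pattern."""
    m = len(pattern)
    return [
        i
        for i in range(len(sequence) - m + 1)
        if window_score(pattern, sequence[i:i + m], score) >= threshold
    ]


# Hypothetical toy values, NOT taken from PAM or BLOSUM.
toy_matrix = {("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2}

purine_pyrimidine = equivalence_score(["AG", "CT"])
hits = occurrences("AC..G", "TACTTGAACGAGT", exact_score, threshold=3)
```

With exact matching and a threshold of 3, `hits` is `[1, 7]`: the pattern `AC..G` has two wildcard positions, so the windows `ACTTG` and `ACGAG` both match. Swapping in `purine_pyrimidine` or `matrix_score(toy_matrix)` changes which windows count as occurrences without touching the matching code, which is the flexibility the second feature describes.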
