Querying Large Collections of Semistructured Data
... to distinguish text documents and rank them, math symbols do not contain much semantic information on their own. Unfortunately, considering the structure of mathematical
expressions to calculate relevance scores of documents results in ranking algorithms that
are computationally more expensive than ...
Using extended feature objects for partial similarity
... In the literature, there is a lot of work on similarity search
of geometry data. In computational geometry, researchers focus on the theoretical aspects of the 1 : 1 similarity problem.
Most of the proposed algorithms are based on similarity measures inadequate for our application [AB 92]. Another a ...
Additional file 1
... each structure gene’s promoter sequences (multiple occurrences in one sequence are
counted as one appearance). “CACGTG” represents oligos which have at most one
nucleotide mismatch with “cacgtg” (all possible oligos are 4X6); “xxxx” represents oligos
composed of 4 nucleotides (all possible oligos ...
Annotation Strategy Guide - GEP Community Server
... Zooming in on this region, we can see there is some support for 54,810 (where the tblastn
alignment terminates) as the end coordinate for the first exon (Figure 3). However, looking
a little further downstream, you can see there is much stronger support to place the end of
the exon at 54,816. While ...
Analysis of Cross Sequence Similarities for Multiple - PolyU
... In Figure 2(a), a set of 12 nucleotides ‘ACGCTTACGCAT’ is a sample sequence.
The subsequence ‘ACGCTT’ shown between 1 and 6 indicates the first six bases of the
sample sequence while the subsequence ‘ACGCAT’ listed between 7 and 12 is the 7th to
12th bases of the sample sequence. The vertical line l ...
GENtle, a free multi-purpose molecular biology tool
... information. This trend notwithstanding, general purpose software for these tasks often
suffers from severe drawbacks. Free software exists, but is often hard to set up and
operate for users on today’s point-and-click interfaces, and usually leads to the
application of a patch-work of multiple, only ...
A genetic algorithm for the automatic generation of
... it is not the only concern. In cases where notes can be
placed on the fretboard in multiple positions without significant differences in playability, the position chosen by
a professional could seem essentially arbitrary. Because
each guitar string has a slight but noticeable difference in
timbre, t ...
wsp Gene Sequences from the Wolbachia of Filarial Nematodes
... The PCR products obtained with primers WSPintF
and WSPintR from nematode Wolbachia were sequenced directly, and the sequences were aligned to the
wsp gene available for arthropod Wolbachia. We also
tried to align wsp to the gene sequences available for the
major outer membrane proteins of Anaplasma, ...
Analysis of expressed sequence tags from the Huperzia serrata leaf
... tetraketides (Morita et al. 2007, Wanibuchi et al. 2007).
However, as of January 2009, there were only 10
nucleotide sequences from H. serrata available in the
NCBI database. The limited information on the genetic
contents of this plant triggered our efforts to construct
a cDNA library from the H. ...
The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two strings, such as nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981. Like the Needleman–Wunsch algorithm, of which it is a variation, Smith–Waterman is a dynamic programming algorithm. As such, it is guaranteed to find the optimal local alignment with respect to the scoring system being used (which includes the substitution matrix and the gap-scoring scheme). The main difference from the Needleman–Wunsch algorithm is that negative scoring matrix cells are set to zero, which renders the (thus positively scoring) local alignments visible. Backtracking starts at the highest scoring matrix cell and proceeds until a cell with score zero is encountered, yielding the highest scoring local alignment. In practice, the algorithm is rarely implemented exactly as described, because improved alternatives are now available that have better scaling (Gotoh, 1982) and are more accurate (Altschul and Erickson, 1986).
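The steps described above (fill the scoring matrix, clamp negative cells to zero, backtrack from the highest-scoring cell until a zero is reached) can be sketched in Python. This is a minimal illustration of the basic algorithm only, not the Gotoh or Altschul–Erickson refinements. The function name `smith_waterman`, the simple match/mismatch scores used in place of a substitution matrix, and the linear gap penalty are all assumptions made for the example.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Assumed scoring scheme: a fixed match/mismatch score and a linear
    gap penalty. These defaults are illustrative, not prescribed values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Scoring matrix; the first row and column stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = H[i - 1][j - 1] + sub   # align a[i-1] with b[j-1]
            up = H[i - 1][j] + gap         # gap in b
            left = H[i][j - 1] + gap       # gap in a
            # Negative cells are set to zero: this is what makes it local.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backtrack from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA")
    print(score)
    print(top)
    print(bottom)
```

The sketch uses O(mn) time and memory for sequences of lengths m and n. When several alignments tie for the best score it reports only one of them, preferring a diagonal step during backtracking.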