ABSTRACT
Existing sequence mining algorithms mostly focus on mining for
subsequences. However, a large class of applications, such as biological
DNA and protein motif mining, require efficient mining of “approximate”
patterns that are contiguous. The few existing algorithms that can be
applied to such contiguous approximate pattern mining have
drawbacks like poor scalability, lack of guarantees in finding the pattern,
and difficulty in adapting to other applications. In this paper, we present
a new algorithm called Flexible and Accurate Motif DEtector (FLAME).
FLAME is a flexible suffix-tree-based algorithm that can be used to find
frequent patterns with a variety of definitions of motif (pattern) models. It
is also accurate, as it always finds the pattern if it exists. Using both real
and synthetic data sets, we demonstrate that FLAME is fast, scalable,
and outperforms existing algorithms on a variety of performance metrics.
In addition, based on FLAME, we also address a more general problem,
named extended structured motif extraction, which allows mining
frequent combinations of motifs under relaxed constraints.
ALGORITHM - FLAME (modelTree, dataTree, l, d, k)
    model = modelTree.FirstNode()
    While (model ≠ modelTree.LastModel())
        Evaluate_Support(model, dataTree)
        If ( isValid(model) ) Print “Found Model: ”, model
        Else If (model.support() < k)
            modelTree.PruneAt(model)
        model = NextNode(model, modelTree)
    End While
End

Sub Evaluate_Support (model, dataTree)
    newsymbol = last symbol of model.String
    oldmatches = model.Parent().Matches()
    newmatches = EmptyMatches()
    If (model.Parent() == root)
        newmatches = Expand_Matches(root, newsymbol, dataTree)
    Else
        ForEach match x in oldmatches
            newmatches = newmatches U Expand_Matches(x, newsymbol, dataTree)
        End ForEach
    model.SetMatches(newmatches)
    Return

Sub Expand_Matches (x, newsymbol, dataTree)
    Let Y = Set of all single character expansions of x.String in dataTree
    ForEach element b in Y
        If b’s last symbol ≠ newsymbol
            b.mismatches ++
        If b.mismatches > max_mismatches
            Remove b from Y
    End ForEach
    Return Y
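The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: a plain dictionary of substring counts stands in for the data suffix tree, a depth-first recursion over model prefixes stands in for the model suffix tree, and the names build_counts, expand_matches and flame are hypothetical. It assumes that the support of a model is the number of data substrings of the same length that differ from it in at most d positions.

```python
from collections import defaultdict


def build_counts(sequences, max_len):
    """Count every substring of length 1..max_len.

    Assumption: this dictionary is a simple stand-in for the data
    suffix tree with per-node counts described in the text.
    """
    counts = defaultdict(int)
    for seq in sequences:
        for start in range(len(seq)):
            for end in range(start + 1, min(start + max_len, len(seq)) + 1):
                counts[seq[start:end]] += 1
    return dict(counts)


def expand_matches(old_matches, new_symbol, counts, alphabet, max_mismatches):
    """Extend each match by one character, as in Expand_Matches.

    old_matches maps a data substring to its mismatch count against
    the parent model; an extension is kept only if it occurs in the
    data and stays within the mismatch budget.
    """
    new_matches = {}
    for x, mismatches in old_matches.items():
        for symbol in alphabet:
            b = x + symbol
            if b not in counts:
                continue  # this expansion does not occur in the data
            cost = mismatches + (symbol != new_symbol)
            if cost <= max_mismatches:
                new_matches[b] = cost
    return new_matches


def flame(sequences, alphabet, L, d, k):
    """Return {model: support} for models of length L with support >= k."""
    counts = build_counts(sequences, L)
    found = {}

    def visit(model, matches):
        for symbol in alphabet:
            child = model + symbol
            child_matches = expand_matches(matches, symbol, counts, alphabet, d)
            support = sum(counts[b] for b in child_matches)
            if support < k:
                continue  # prune: no longer model below this prefix can reach k
            if len(child) == L:
                found[child] = support
            else:
                visit(child, child_matches)

    visit("", {"": 0})  # the root model matches the empty string with 0 mismatches
    return found


if __name__ == "__main__":
    print(flame(["ACGTACGT"], "ACGT", L=3, d=0, k=2))
```

Pruning is safe in this sketch because every occurrence of a longer match contains an occurrence of its prefix, so extending a model can never increase its support.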
EXISTING SYSTEM
Existing sequence mining algorithms mostly focus on mining for
subsequences. Existing algorithms for structured motif mining can mine
these patterns only if the user specifies the minimum and maximum
number of gaps between the simple motifs.
Disadvantages:
1) Poor scalability,
2) Lack of guarantees in finding the pattern,
3) Difficulty in adapting to other applications.
PROPOSED SYSTEM
This method is primarily focused on finding pairs (or sets) of motifs
that co-occur in the data set within a short distance of each other. This
method only considers a simple mismatch-based definition of noise, and
does not consider other more complex motif models.
Advantages:
1) FLAME is able to identify many true biological motifs.
2) FLAME never misses any matches.
MODULES
1. Doctor Module.
2. Admin Module.
3. Technician Module.
4. FLAMES Module.
Doctor Module:
This module is used to send mail to other doctors, the admin and lab
technicians. Doctors can view patient entry details and patient test details,
edit their personal details, and search test results using the FLAME algorithm.
Admin Module:
This module is used to enter patient and doctor registration details
and to send each doctor's username and password by mail. The admin can view
test details and send and view mails using the inbox. The admin acts as an
intermediary between doctors and lab technicians.
Technician Module:
This module is used to enter patient test results and to edit
those details. The lab technician can send mails to others and view mails
in the inbox. The lab technician works separately and is not allowed to access
other doctors' or patients' details without admin permission.
FLAMES Module:
This module is used to find the (L, M, s, k) motifs.
For ease of exposition, we explain the algorithm using an (L, d, k) model,
and then describe how we extend it to the full-fledged (L, M, s, k) model.
The approach we take in FLAME explores the space of all possible
models. In order to carry out this exploration in an efficient way, we first
construct two suffix trees: a suffix tree on the actual data set that
contains counts in each node (called the data suffix tree), and a suffix
tree on the set of all possible model strings (called the model suffix tree).
This second set is typically the set of all strings of length L over the
alphabet.
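To make the model space concrete, the following brute-force sketch enumerates every string of length L over the alphabet and counts its approximate occurrences directly. It is only a reference illustration: it assumes, as the pseudocode parameters suggest, that an (L, d, k) motif is a model of length L with at least k occurrences within d mismatches, and the function names hamming, support and brute_force_motifs are hypothetical. It uses no suffix trees, so it shows what is being searched for rather than how FLAME searches efficiently.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def support(model, sequences, d):
    """Count the windows of len(model) within d mismatches of the model."""
    length = len(model)
    return sum(
        1
        for seq in sequences
        for i in range(len(seq) - length + 1)
        if hamming(model, seq[i:i + length]) <= d
    )


def brute_force_motifs(sequences, alphabet, L, d, k):
    """Check every string of length L over the alphabet (the model space)."""
    result = {}
    for letters in product(alphabet, repeat=L):
        model = "".join(letters)
        s = support(model, sequences, d)
        if s >= k:
            result[model] = s
    return result


if __name__ == "__main__":
    print(brute_force_motifs(["ACGTACGT"], "ACGT", L=3, d=0, k=2))
```

The model space grows as the alphabet size raised to the power L, which is why the text organizes it as a tree that can be pruned rather than scanned exhaustively.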
SYSTEM SPECIFICATION
Hardware Requirements:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 14" Colour Monitor.
• Mouse : Optical Mouse.
• RAM : 512 MB.
• Keyboard : 101 Keyboard.
Software Requirements:
• Operating System : Windows XP.
• Coding Language : ASP.Net with C#.
• Database : SQL Server 2005.