Efficient and Accurate Discovery of Patterns in Sequence Data Sets
ABSTRACT
Existing sequence mining algorithms mostly focus on mining for subsequences.
However, a large class of applications, such as biological DNA and protein motif mining,
require efficient mining of “approximate” patterns that are contiguous. The few existing
algorithms that can be applied to mine such contiguous approximate patterns have
drawbacks such as poor scalability, lack of guarantees in finding the pattern, and difficulty in
adapting to other applications. In this paper, we present a new algorithm called Flexible and
Accurate Motif DEtector (FLAME). FLAME is a flexible suffix-tree-based algorithm that can
be used to find frequent patterns with a variety of definitions of motif (pattern) models. It is also
accurate, as it always finds the pattern if it exists. Using both real and synthetic data sets, we
demonstrate that FLAME is fast, scalable, and outperforms existing algorithms on a variety of
performance metrics. In addition, based on FLAME, we also address a more general problem,
named extended structured motif extraction, which allows mining frequent combinations of
motifs under relaxed constraints.
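A brute-force reference helps pin down what mining contiguous approximate patterns means, and why exhaustively exploring the model space guarantees that no pattern is missed. The sketch below assumes the usual reading of an (L, d, k) model: a model string of length L is a motif if at least k contiguous length-L windows of the data lie within Hamming distance d of it (at most d mismatches). The class and method names are hypothetical, and this is not FLAME itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Brute-force reference for contiguous approximate pattern mining under an
 * assumed (L, d, k) reading: a model of length L is a motif if at least k
 * length-L windows of the data are within Hamming distance d of it.
 */
final class BruteForceMotifs {

    /** Hamming distance between two equal-length strings. */
    static int hamming(String a, String b) {
        int dist = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) dist++;
        }
        return dist;
    }

    /** Counts the contiguous windows of the data that approximately match the model. */
    static int countMatches(String data, String model, int d) {
        int len = model.length();
        int count = 0;
        for (int start = 0; start + len <= data.length(); start++) {
            if (hamming(data.substring(start, start + len), model) <= d) count++;
        }
        return count;
    }

    /** Enumerates every string of length L over the alphabet and keeps the frequent ones. */
    static List<String> findMotifs(String data, char[] alphabet, int L, int d, int k) {
        List<String> motifs = new ArrayList<>();
        enumerate(data, alphabet, L, d, k, new StringBuilder(), motifs);
        return motifs;
    }

    private static void enumerate(String data, char[] alphabet, int L, int d, int k,
                                  StringBuilder prefix, List<String> out) {
        if (prefix.length() == L) {
            String model = prefix.toString();
            if (countMatches(data, model, d) >= k) out.add(model);
            return;
        }
        for (char c : alphabet) {
            prefix.append(c);
            enumerate(data, alphabet, L, d, k, prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }

    public static void main(String[] args) {
        char[] dna = {'A', 'C', 'G', 'T'};
        System.out.println(findMotifs("ACGTTACGATACGC", dna, 4, 1, 3));
    }
}
```

Because it tries all |Σ|^L model strings and scans every window for each one, the running time grows as O(|Σ|^L · n · L) for a data set of length n, which is why an index-based approach is needed for realistic data.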
EXISTING SYSTEM
Existing sequence mining algorithms mostly focus on mining for subsequences. Existing
algorithms for structured motif mining (combinations of simple motifs separated by gaps) can mine
these patterns only if the user specifies the minimum and maximum gap between the simple motifs.
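To illustrate this limitation, here is a minimal sketch of a structured-motif check in which the caller must supply the gap bounds up front. It assumes a structured motif made of two simple motifs, each allowed up to d mismatches, with the gap measured as the number of characters between the end of the first motif and the start of the second. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a structured-motif occurrence count with user-specified gap bounds.
 * Assumption: two simple motifs m1 and m2, each allowed up to d mismatches,
 * with the gap between the end of m1 and the start of m2 in [minGap, maxGap].
 */
final class StructuredMotif {

    /** Start positions of length-|m| windows within Hamming distance d of m. */
    static List<Integer> occurrences(String seq, String m, int d) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + m.length() <= seq.length(); i++) {
            int mismatches = 0;
            // Stop comparing as soon as the mismatch budget is exceeded.
            for (int j = 0; j < m.length() && mismatches <= d; j++) {
                if (seq.charAt(i + j) != m.charAt(j)) mismatches++;
            }
            if (mismatches <= d) starts.add(i);
        }
        return starts;
    }

    /** Counts (m1, m2) occurrence pairs whose gap lies within [minGap, maxGap]. */
    static int countStructured(String seq, String m1, String m2, int d, int minGap, int maxGap) {
        List<Integer> first = occurrences(seq, m1, d);
        List<Integer> second = occurrences(seq, m2, d);
        int count = 0;
        for (int i : first) {
            int end = i + m1.length();
            for (int j : second) {
                int gap = j - end;
                if (gap >= minGap && gap <= maxGap) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "AAA", then two characters ("CG"), then "TTT".
        String seq = "AAACGTTT";
        System.out.println(countStructured(seq, "AAA", "TTT", 0, 1, 3));
        System.out.println(countStructured(seq, "AAA", "TTT", 0, 3, 5));
    }
}
```

With gap bounds [1, 3] the occurrence in the example is found; with [3, 5] it is not, even though both simple motifs are present, so a badly chosen range silently hides real patterns.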
Disadvantages:
1) Poor scalability.
2) Lack of guarantees in finding the pattern.
3) Difficulty in adapting to other applications.
PROPOSED SYSTEM
Contact: 040 - 40274843, 09533694296
Email id: [email protected], Website: www.logicsystems.org.in
This method is primarily focused on finding pairs (or sets) of motifs that co-occur in the
data set within a short distance of each other. It considers only a simple mismatch-based definition of noise and does not consider other, more complex motif models.
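Once the occurrence positions of two motifs are known, counting how often they co-occur within a short distance reduces to a merge-style scan. The sketch below assumes both position lists are sorted in ascending order and that "within distance maxDist" means the start positions differ by at most maxDist. It uses a two-pointer sliding window, a technique chosen for this illustration rather than the exact procedure of the method described above. Names are hypothetical.

```java
/**
 * Sketch: counting co-occurrences of two motifs within a short distance.
 * Assumptions: start positions of each motif are already known (for example,
 * from a mismatch-based search) and sorted ascending.
 */
final class CoOccurrence {

    /** Counts pairs (x in a, y in b) with |x - y| <= maxDist. */
    static long countPairs(int[] a, int[] b, int maxDist) {
        long pairs = 0;
        int lo = 0; // first index in b with b[lo] >= pos - maxDist
        int hi = 0; // first index in b with b[hi] >  pos + maxDist
        for (int pos : a) {
            while (lo < b.length && b[lo] < pos - maxDist) lo++;
            if (hi < lo) hi = lo; // window may jump forward past the old upper bound
            while (hi < b.length && b[hi] <= pos + maxDist) hi++;
            pairs += hi - lo; // every b in [lo, hi) is close enough to pos
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[] motifA = {2, 40, 100};
        int[] motifB = {5, 38, 60, 103};
        System.out.println(countPairs(motifA, motifB, 5));
    }
}
```

Because both pointers only move forward, the scan runs in O(|a| + |b|) time instead of comparing every pair of occurrences.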
Advantages:
1) Experiments on real data sets show that FLAME is able to identify many true biological motifs.
2) FLAME never misses any matches: it always finds a pattern if one exists.
MODULES
1. Doctor Module.
2. Admin Module.
3. Technician Module.
4. FLAMES Module.
Doctor Module:
This module is used by doctors to send mail to other doctors, the admin, and lab technicians.
Doctors can view patient entry details and patient test details, edit their personal details, and search test
results using the FLAME algorithm.
Admin Module:
This module is used by the admin to enter patient and doctor registration details and to send each
doctor's username and password by mail. The admin can also view test details and send and view mail
using the inbox. The admin acts as an intermediary between doctors and lab technicians.
Technician Module:
This module is used by lab technicians to enter patient test results and to edit those details. A lab
technician can also send mail to others and view received mail in the inbox. The lab technician
works independently and is not allowed to access other doctors' or patients' details without admin
permission.
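The rule that a lab technician cannot see other doctors' or patients' details without admin permission can be expressed as a small permission check. This is a hypothetical sketch: the class name, the record kinds, and keeping per-technician approvals in an in-memory map are assumptions for illustration, not the project's actual design.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: a lab technician may always work with test results,
 * but needs an explicit admin approval to view patient or doctor details.
 */
final class TechnicianAccess {

    enum RecordKind { TEST_RESULTS, PATIENT_DETAILS, DOCTOR_DETAILS }

    // Technician ID -> kinds of records the admin has approved for that technician.
    private final Map<String, Set<RecordKind>> approvals = new HashMap<>();

    /** Admin action: allow one technician to view one kind of record. */
    void approve(String technicianId, RecordKind kind) {
        approvals.computeIfAbsent(technicianId, id -> EnumSet.noneOf(RecordKind.class)).add(kind);
    }

    /** Admin action: withdraw a previous approval. */
    void revoke(String technicianId, RecordKind kind) {
        Set<RecordKind> kinds = approvals.get(technicianId);
        if (kinds != null) {
            kinds.remove(kind);
        }
    }

    /** Test results are always visible; anything else requires an admin approval. */
    boolean canView(String technicianId, RecordKind kind) {
        if (kind == RecordKind.TEST_RESULTS) {
            return true;
        }
        Set<RecordKind> kinds = approvals.get(technicianId);
        return kinds != null && kinds.contains(kind);
    }

    public static void main(String[] args) {
        TechnicianAccess access = new TechnicianAccess();
        System.out.println(access.canView("tech1", RecordKind.PATIENT_DETAILS)); // false
        access.approve("tech1", RecordKind.PATIENT_DETAILS);
        System.out.println(access.canView("tech1", RecordKind.PATIENT_DETAILS)); // true
        System.out.println(access.canView("tech2", RecordKind.PATIENT_DETAILS)); // false
    }
}
```

Approvals are scoped per technician, so granting access to one technician does not open the same records to another.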
FLAMES Module:
This module is used to find (L, M, s, k) motifs. For ease of exposition,
we explain the algorithm using an (L, d, k) model and then describe how we extend it to the
full-fledged (L, M, s, k) model. The approach we take in FLAME explores the space of all
possible models. To carry out this exploration efficiently, we first construct two
suffix trees: a suffix tree on the actual data set that contains counts in each node (called the data
suffix tree), and a suffix tree on the set of all possible model strings (called the model suffix
tree). This set of model strings is typically the set of all strings of length L over the alphabet.
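The idea of exploring the model tree and the data tree together can be sketched as follows. This is a simplified illustration under stated assumptions, not FLAME's implementation. It assumes an (L, d, k) reading in which L is the model length, d the maximum number of mismatches, and k the minimum number of matching length-L windows. It replaces the data suffix tree with an uncompressed trie of all length-L windows whose nodes store counts, and it walks the model suffix tree (all strings of length L) implicitly by depth-first search instead of building it. The (L, M, s, k) extension is not shown, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified FLAME-style search for (L, d, k) motifs: a count-annotated trie
 * of the data's length-L windows is traversed together with an implicit
 * depth-first walk over all model strings of length L.
 */
final class FlameSketch {

    /** Trie node; count = number of length-L windows whose prefix reaches this node. */
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        int count;
    }

    /** A data-trie node reached by the current model prefix, with mismatches used so far. */
    private static final class Frontier {
        final Node node;
        final int mismatches;
        Frontier(Node node, int mismatches) { this.node = node; this.mismatches = mismatches; }
    }

    /** Builds the depth-L data trie; the root count is the total number of windows. */
    static Node buildDataTrie(String data, int L) {
        Node root = new Node();
        for (int i = 0; i + L <= data.length(); i++) {
            Node cur = root;
            cur.count++;
            for (int j = i; j < i + L; j++) {
                cur = cur.children.computeIfAbsent(data.charAt(j), ch -> new Node());
                cur.count++;
            }
        }
        return root;
    }

    static List<String> findMotifs(String data, char[] alphabet, int L, int d, int k) {
        Node root = buildDataTrie(data, L);
        List<Frontier> start = new ArrayList<>();
        start.add(new Frontier(root, 0));
        List<String> motifs = new ArrayList<>();
        explore(start, alphabet, L, d, k, new StringBuilder(), motifs);
        return motifs;
    }

    /** Extends the model prefix one symbol at a time, pruning when support drops below k. */
    private static void explore(List<Frontier> frontier, char[] alphabet, int L, int d, int k,
                                StringBuilder model, List<String> out) {
        if (model.length() == L) {
            // At depth L the support equals the exact number of matching windows.
            out.add(model.toString());
            return;
        }
        for (char c : alphabet) {
            List<Frontier> next = new ArrayList<>();
            int support = 0;
            for (Frontier f : frontier) {
                for (Map.Entry<Character, Node> e : f.node.children.entrySet()) {
                    int mm = f.mismatches + (e.getKey() == c ? 0 : 1);
                    if (mm <= d) {
                        next.add(new Frontier(e.getValue(), mm));
                        support += e.getValue().count;
                    }
                }
            }
            // support is an upper bound on matches for any model with this prefix.
            if (support >= k) {
                model.append(c);
                explore(next, alphabet, L, d, k, model, out);
                model.deleteCharAt(model.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        char[] dna = {'A', 'C', 'G', 'T'};
        System.out.println(findMotifs("ACGTTACGATACGC", dna, 4, 1, 3));
    }
}
```

The pruning step is what lets this avoid much of the brute-force work: every window that matches a full model must pass through one of the surviving data-trie nodes, so the sum of their counts bounds the final support from above, and an entire subtree of the model space can be skipped as soon as that bound drops below k.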
SYSTEM SPECIFICATION:
H/W SYSTEM CONFIGURATION: Processor
-Pentium –III
 Speed
- 1.1 Ghz
 RAM
- 256 MB(min)
 Hard Disk
- 20 GB
S/W System Configuration:
Operating System
: Windows XP

Front End
: Html, Javascript

Server
: Tomcat 6.0

Database
: MySql

Language
: Java