Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Analysis of Complex Systems John Sherwood Period 2 Abstract My project is involved with using data mining techniques on the internet in order to gather enough information for the use for a genetic algorithm in trend analysis of a complex system; e.g. the stock market Scope The most fundamental element of my program is creating a correlation between news about a company and its stock and the price of the stock itself. In order to do this, a huge amount of data on both stock prices and news regarding companies must be processed into a quantitative format, and then extensively analyzed. Expected Results In this project, I expect to at the very least have a very useful genetic algorithm, that given a list of independant and dependant data, can generate equations to create a tentative correlation. While the extremely chaotic nature of the specific application may prevent quantitative success in this instance, I do expect to have success on general terms. Other's Work Due to the very lucrative nature of a program that could predict the stock market: Many have tried All have failed Procedures Differs for each part of program Data Mining Analysis Determination of data parsing sequences to extract information from HTML (Hypertext Markup Language) Quantitative tests of success XML parser Data classification Trial and error tests Evaluation algorithms Discriminant generation Design Several program segments: Data mining algorithms Price data logger News parser Data analysis algorithms Command Shell Genetic unifies elements of program Graph Generator Written in PHP to build PNG graphs of data Program Tests XML parser tests prove successful in parsing properly formatted XML/XHTML, sufficient success in parsing malformed XHTML Stock logging working perfectly Equation refinement algorithms work perfectly on test data Algorithms Different program segments use different algorithms Data mining algorithm Discriminant Generation Algorithm XML parsing algorithm Equation Refinement algorithm Data Mining Based on XML parser to convert XHTML code to programming objects New algorithm allows for parsing of Google Finance pages Different algorithms required to parse different websites for information XML Parser Two potential paradigms Iterative Uses a set of flags to determine what action to take with each character Recursive Splits XML document into sets of tags and processes each tag's child elements Problems Malformed XHTML forces extensive testing of data mining for each new site Huge variety in formatting of different websites makes mining slow Results and Conclusions Actual data instead of test data may be fed into regression algorithms in a matter of days after Reuters and a few other select websites are made parseable with new algorithms.