Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Analysis of Complex Systems John Sherwood Period 2 Abstract My project is involved with using data mining techniques on the internet in order to gather enough information for the use for a genetic algorithm in trend analysis of a complex system; e.g. the stock market Scope The most fundamental element of my program is creating a correlation between news about a company and its stock and the price of the stock itself. In order to do this, a huge amount of data on both stock prices and news regarding companies must be processed into a quantitative format, and then extensively analyzed. Expected Results In this project, I expect to at the very least have a very useful genetic algorithm, that given a list of independant and dependant data, can generate equations to create a tentative correlation. While the extremely chaotic nature of the specific application may prevent quantitative success in this instance, I do expect to have success on general terms. Other's Work Due to the very lucrative nature of a program that could predict the stock market: Many have tried All have failed Procedures Differs for each part of program Data Mining Analysis Determination of data parsing sequences to extract information from HTML (Hypertext Markup Language) Quantitative tests of success XML parser Data classification Trial and error tests Evaluation algorithms Discriminant generation Design Several program segments: Data mining algorithms Price data logger News parser Data analysis algorithms Heuristic Generator Equation Regression Command Shell Genetic unifies elements of program Graph Generator Written in PHP to build PNG graphs of data Program Tests XML parser tests prove successful in parsing properly formatted XML/XHTML, sufficient success in parsing malformed XHTML Stock price logging working perfectly Generalized equations work but are semitimeframe specific Algorithms Different program segments use different algorithms Data mining algorithm Discriminant (Heuristic) Generation Algorithm XML parsing algorithm Equation Refinement algorithm Data Mining Based on XML parser to convert XHTML code to programming objects New algorithm allows for parsing of Google Finance pages – following links to other sites for more data Different algorithms required to parse different websites for information XML Parser Two potential paradigms Iterative Uses a set of flags to determine what action to take with each character Recursive Splits XML document into sets of tags and processes each tag's child elements Problems Malformed XHTML forces extensive testing of data mining for each new site Huge variety in formatting of different websites makes mining problematic to make sure that only news data is mined Results and Conclusions The equations generated by my equation refiner are accurate in the timeframe they are generated for, then become less accurate as time passes (time in the past or future relative to the generation timeframe), implying that the effect of news information on stock prices is non-constant and relative based on your current timeframe