50 Top Free Data Mining Software

50 Top Free Data Mining Software: Data mining is the computational process of discovering patterns in large
data sets, using methods from artificial intelligence, machine learning, statistical analysis, and database
systems, with the goal of extracting information from a data set and transforming it into an understandable
structure for further use. Orange, Weka, Rattle GUI, Apache Mahout, SCaViS, RapidMiner, R, ML-Flex, Databionic
ESOM Tools, Natural Language Toolkit, SenticNet API, ELKI, UIMA, KNIME, Vowpal Wabbit, GNU Octave, CMSR Data
Miner, Mlpy, MALLET, Shogun, Scikit-learn, LIBSVM, LIBLINEAR, Lattice Miner, Dlib, Jubatus, KEEL,
Gnome-datamine-tools, Alteryx Project Edition, OpenNN, ADaM, ROSETTA, ADaMSoft, Anaconda, yooreeka, AstroML,
streamDM, jHepWork, TraMineR, ARMiner, arules, CLUTO and TANAGRA are some of the top free data mining
software, in no particular order.
You may also like to review the top free data analysis software list: Top Free Data Analysis Software.
You may also like to review the top proprietary data mining software list: Top Data Mining Software.
1.Orange
Orange is a component based data mining and machine learning software suite written in the Python language. It
is an open source data visualization and analysis tool for novices and experts. Data mining can be done through
visual programming or Python scripting. It has components for machine learning, and there are add-ons for
bioinformatics and text mining. It is also packed with features for data analytics and offers different
visualizations, from scatterplots, bar charts and trees to dendrograms, networks and heatmaps. Orange remembers
the user's choices, suggests the most frequently used combinations, and intelligently chooses which
communication channels between widgets to use. Orange uses common Python open source libraries for scientific
computing, such as numpy, scipy and scikit-learn, while its graphical user interface operates within the
cross-platform Qt framework. The default installation includes a number of machine learning, preprocessing and
data visualization algorithms in 6 widget sets: data, visualize, classify, regression, evaluate and
unsupervised. Additional functionalities are available as add-ons for bioinformatics, data fusion and
text mining.
Dataiku DSS is a collaborative data science platform that enables teams to explore, prototype, build, and
deliver their own data products more efficiently. Dataiku DSS provides an interactive visual interface where
users can point, click, and build, or use languages like SQL to wrangle data, model, easily re-run workflows,
visualize results, and get up-to-date insights on demand. It provides tools to draft data preparation and
modeling in seconds, lets teams leverage their favorite ML libraries (scikit-learn, R, MLlib, H2O, and so on),
and automates their work in a completely customizable interface.
2.Weka
Weka is a suite of machine learning software applications written in the Java programming language. Weka stands
for Waikato Environment for Knowledge Analysis. It is a collection of machine learning algorithms for data mining
tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains
tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Weka
provides access to SQL databases using Java Database Connectivity and can process the result returned by a
database query. It is not capable of multi-relational data mining, but there is separate software for converting a
collection of linked database tables into a single table that is suitable for processing using Weka.
3.Rattle GUI
Rattle GUI is a free and open source software providing a graphical user interface (GUI) for Data Mining using the
R statistical programming language. Rattle provides considerable data mining functionality by exposing the
power of the R Statistical Software through a graphical user interface.
4.Apache Mahout
Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed
or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering,
clustering and classification. Many of the implementations use the Apache Hadoop platform. Mahout also provides
algorithms for Scala + Apache Spark, H2O and Apache Flink, as well as Samsara, a vector math experimentation
environment with R-like syntax which works at scale.
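To make the collaborative filtering idea concrete, here is a minimal single-machine sketch in plain Python. It is not Mahout code and does not use Mahout's API. The `RATINGS` data and the `cosine` and `predict` helpers are hypothetical names chosen for illustration, and the approach shown (user-based filtering with cosine similarity) is only one of several common variants.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not taken from the article.
RATINGS = {
    "ann": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob": {"a": 4.0, "b": 3.0, "c": 5.0, "d": 4.0},
    "cat": {"a": 1.0, "b": 5.0, "d": 2.0},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None
```

Calling `predict(RATINGS, "ann", "d")` estimates a rating for an item that "ann" has not rated, leaning toward the opinion of the user whose past ratings look most like hers.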
5.SCaViS
SCaViS is a Java cross platform data analysis framework developed at Argonne National Laboratory. SCaViS can be
used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations,
function minimization, linear algebra, and solving systems of linear and differential equations. Linear,
non-linear and symbolic regression are also available.
6.RapidMiner
RapidMiner provides an integrated environment for machine learning, data mining, text mining, predictive
analytics and business analytics. RapidMiner is used for business, industrial applications, research, education,
training, rapid prototyping, and application development and has more than 600 enterprise customers and more
than 250,000 active users.
7.R
R is a language and environment for statistical computing and graphics.
8.ML-Flex
ML-Flex is a software package that enables users to integrate with third party machine learning packages written
in any programming language, execute classification analyses in parallel across multiple computing nodes, and
produce HTML reports of classification results.
ML-Flex (http://ml
9.Databionic ESOM Tools
Databionic ESOM Tools is a suite of programs to perform data mining tasks like clustering, visualization, and
classification with Emergent Self Organizing Maps (ESOM).
10.NLTK (Natural Language Toolkit)
NLTK, the Natural Language Toolkit, is a suite of libraries and programs for symbolic and statistical natural
language processing (NLP) for the Python language.
11.SenticNet API
SenticNet API is a semantic and affective resource for opinion mining and sentiment analysis.
12.ELKI
ELKI is a university research project with advanced cluster analysis and outlier detection methods written in the
Java language. ELKI provides a large collection of highly parameterizable algorithms, in order to allow easy and
fair evaluation and benchmarking of algorithms. In ELKI, data mining algorithms and data management tasks are
separated and allow for an independent evaluation.
ELKI (http://elki.dbs.i
13.UIMA
UIMA stands for Unstructured Information Management Architecture. UIMA is a component framework for analyzing
unstructured content such as text, audio and video and was originally developed by IBM. UIMA enables applications
to be decomposed into components. Each component implements interfaces defined by the framework and
provides self-describing metadata via XML descriptor files. The framework manages these components and the
data flow between them.
14.KNIME
KNIME, the Konstanz Information Miner, is a user friendly and comprehensive data analytics framework which
offers capabilities for the entire analysis process: data access, data transformation, initial investigation,
powerful predictive analytics, visualisation and reporting.
Knime is a chemical structure miner and web search engine. (
16.Vowpal Wabbit
Vowpal Wabbit is an open source, fast, out-of-core learning system library and program developed originally at
Yahoo! Research, and currently at Microsoft Research. Vowpal Wabbit is notable as an efficient, scalable
implementation of online machine learning, with support for a number of machine learning reductions,
importance weighting, and a selection of different loss functions and optimization algorithms.
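The online learning style described above can be sketched in a few lines of plain Python. This is not Vowpal Wabbit's code or API. `OnlineLinearLearner`, `squared_loss_grad` and `logistic_loss_grad` are hypothetical names, and the plain stochastic gradient step is an assumption made for illustration. The sketch only shows the general pattern: one example at a time, a pluggable loss, and an optional importance weight per example.

```python
import math


def squared_loss_grad(pred, y):
    """Derivative of 0.5 * (pred - y)**2 with respect to pred."""
    return pred - y


def logistic_loss_grad(pred, y):
    """Derivative of log(1 + exp(-y * pred)) with respect to pred, y in {-1, +1}."""
    return -y / (1.0 + math.exp(y * pred))


class OnlineLinearLearner:
    """Linear model updated one example at a time (illustrative only)."""

    def __init__(self, n_features, loss_grad, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.loss_grad = loss_grad
        self.learning_rate = learning_rate

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def learn(self, x, y, importance=1.0):
        """Take one stochastic gradient step on a single example."""
        step = self.learning_rate * importance * self.loss_grad(self.predict(x), y)
        self.weights = [w - step * xi for w, xi in zip(self.weights, x)]
```

Because each call to `learn` touches only one example, the data never has to fit in memory at once, which is the point of an out-of-core learner.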
17.GraphLab
GraphLab is a graph-based, high performance, distributed computation framework written in C++. It is used in a
broad range of data mining tasks and is reported to outperform other abstractions by orders of magnitude.
18.GNU Octave
GNU Octave is a high level programming language, primarily intended for numerical computations. It provides a
command line interface for solving linear and nonlinear problems numerically, and for performing other
numerical experiments using a language that is mostly compatible with MATLAB.
19.CMSR Data Miner
CMSR Data Miner Suite provides an integrated environment for predictive modeling, segmentation, data
visualization, statistical data analysis, and rule-based model evaluation. It also provides an integrated
analytics and rule-engine environment for advanced power users.
20.Mlpy
Mlpy is an open source Python machine learning library built on top of NumPy/SciPy and the GNU Scientific
Library. Mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised
problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility,
usability and efficiency.
21.MALLET
MALLET is an integrated collection of Java code useful for statistical natural language processing, document
classification, cluster analysis, information extraction, topic modeling and other machine learning applications
to text.
22.Shogun
Shogun is a free, open source toolbox written in C++. It offers numerous algorithms and data structures for
machine learning problems. The focus of Shogun is on kernel machines such as support vector machines for
regression and classification problems. Shogun also offers a full implementation of Hidden Markov models.
23.Scikit-learn
Scikit-learn is an open source machine learning library for the Python programming language. It features various
classification, regression and clustering algorithms including support vector machines, random forests, gradient
boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries
NumPy and SciPy.
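As an illustration of one of the clustering algorithms named above, here is a small k-means (Lloyd's algorithm) sketch in plain Python. It deliberately does not call scikit-learn, so it says nothing about that library's interface or defaults. The `kmeans` and `sq_dist` names, the toy `POINTS`, and the deterministic "first k points" initialisation are all assumptions made to keep the example short and reproducible.

```python
# Hypothetical toy data: two well separated groups of 2D points.
POINTS = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

Production libraries add smarter initialisation and vectorised arithmetic, but the two alternating steps are the same.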
24.LIBSVM and LIBLINEAR
LIBSVM and LIBLINEAR are two popular open source machine learning libraries, both developed at the National
Taiwan University. LIBSVM implements the SMO algorithm for kernelized support vector machines (SVMs),
supporting classification and regression. LIBLINEAR implements linear SVMs and logistic regression models
trained using a coordinate descent algorithm.
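For readers curious what coordinate descent looks like for a linear SVM, the following is a minimal plain-Python sketch of dual coordinate descent for a hinge-loss linear SVM without a bias term. It is not LIBLINEAR's code and omits the shrinking and random ordering a real solver would use. The function name `train_linear_svm` and its parameters are hypothetical.

```python
def train_linear_svm(X, y, C=1.0, epochs=50):
    """Dual coordinate descent for a hinge-loss linear SVM (no bias term).

    X is a list of feature lists, y a list of labels in {-1, +1}.
    Returns the primal weight vector w.
    """
    n, d = len(X), len(X[0])
    alpha = [0.0] * n                       # one dual variable per example
    w = [0.0] * d                           # kept equal to sum(alpha_i * y_i * x_i)
    q = [sum(v * v for v in x) for x in X]  # diagonal of the kernel matrix
    for _ in range(epochs):
        for i in range(n):
            if q[i] == 0.0:
                continue
            # Gradient of the dual objective along coordinate i.
            g = y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) - 1.0
            # Newton step on that coordinate, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] - g / q[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta:
                for j in range(d):
                    w[j] += delta * y[i] * X[i][j]
                alpha[i] = new_alpha
    return w
```

A constant feature can be appended to every row of `X` if an intercept is wanted, since the sketch itself has no bias term.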
25.Lattice Miner
Lattice Miner is a formal concept analysis software tool for the construction, visualization and manipulation of
concept lattices. Lattice Miner also allows the drawing of nested line diagrams.
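Formal concept analysis can be illustrated with a tiny example. The sketch below enumerates the formal concepts (the nodes of a concept lattice) of a toy context by brute force. It is not Lattice Miner code. The `CONTEXT` data and the helper names are hypothetical, and the subset enumeration is exponential, so it is only suitable for very small inputs.

```python
from itertools import combinations

# Hypothetical toy context: object -> set of attributes it has.
CONTEXT = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}


def common_attributes(context, objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(context, attributes):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Return all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(context, subset)
            extent = objects_having(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

Each returned pair is a maximal group of objects together with exactly the attributes they all share. Ordering the pairs by set inclusion of their extents gives the concept lattice that a tool like this draws.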
26.Dlib
Dlib is a general purpose cross platform open source software library written in the C++ programming language.
Its design is heavily influenced by ideas from design by contract and component-based software engineering.
27.Jubatus
Jubatus is an open source online machine learning framework with features like classification, recommendation,
regression, anomaly detection and graph mining.
28.KEEL
KEEL stands for Knowledge Extraction based on Evolutionary Learning and is a suite of machine learning software
tools, developed under the Spanish National Project. KEEL provides a simple GUI based on data flow to design
experiments with different datasets and computational intelligence algorithms in order to assess the behavior of
the algorithms.
29.Gnome datamine tools
Gnome datamine tools is a growing collection of tools packaged to provide a freely available single collection of
data mining tools.
30.Modular toolkit for Data Processing (MDP)
The Modular toolkit for Data Processing (MDP) is a library of widely used data processing algorithms that can be
combined according to a pipeline analogy to build more complex data processing software.
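The pipeline analogy can be shown with a short plain-Python sketch. The classes below are not MDP's API. `CenterNode`, `ScaleNode` and `Pipeline` are hypothetical names invented for this example, which only shows how small train/execute units can be chained so that each node is trained on the output of the one before it.

```python
class CenterNode:
    """Subtracts the per-column mean learned during training."""

    def train(self, rows):
        self.means = [sum(col) / len(rows) for col in zip(*rows)]

    def execute(self, rows):
        return [[v - m for v, m in zip(row, self.means)] for row in rows]


class ScaleNode:
    """Divides each column by its largest absolute value seen in training."""

    def train(self, rows):
        # Fall back to 1.0 for all-zero columns to avoid dividing by zero.
        self.scales = [max(abs(v) for v in col) or 1.0 for col in zip(*rows)]

    def execute(self, rows):
        return [[v / s for v, s in zip(row, self.scales)] for row in rows]


class Pipeline:
    """Chains nodes; each node is trained on the previous node's output."""

    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, rows):
        for node in self.nodes:
            node.train(rows)
            rows = node.execute(rows)

    def execute(self, rows):
        for node in self.nodes:
            rows = node.execute(rows)
        return rows
```

Swapping, adding or removing a node changes the processing without touching the other nodes, which is the appeal of the pipeline design.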
31.Fityk
Fityk is a curve fitting and data analysis application, predominantly used to fit analytical, bell-shaped
functions to experimental data. It is positioned to fill the gap between general plotting software and programs
specific for one
Fityk (http://
32.Pandas
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data
analysis tools for the Python programming language.
33.PyBrain
PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still
powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare
your algorithms.
34.MiningMart
MiningMart processes data from relational databases. MiningMart currently supports PostgreSQL, MySQL and
35.Alteryx Project Edition
Alteryx Project Edition comes with over 150 tools to blend, cleanse and analyze data. Project Edition includes a
number of predictive (R language) drag-and-drop tools that can be built into analytic workflows. Some of the
most useful of these are those related to A/B testing. These tools can be used to pilot a change, for instance a
new menu, a promotion, or a new web layout.
36.OpenNN
OpenNN is an open source class library written in C++ which implements neural networks. The library is intended
for advanced users, with high C++ and machine learning skills. OpenNN provides an effective framework for the
research and development of data mining and predictive analytics algorithms and applications.
37.ADaM
Algorithm Development and Mining System (ADaM) is used to apply data mining technologies to remotely-sensed
and other scientific data. The mining and image processing toolkits consist of interoperable components that can
be linked together in a variety of ways for application to diverse problem domains.
38.DataMelt
DataMelt is free mathematics software which can be used for numeric computation, statistics, symbolic
calculations, data analysis and data visualization.
39.ROSETTA
ROSETTA is a toolkit for analyzing tabular data within the framework of rough set theory. ROSETTA is designed to
support the overall data mining and knowledge discovery process: from initial browsing and preprocessing of the
data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to
validation and analysis of the induced rules or patterns.
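The central idea of rough set theory, approximating a target set of rows using only what the chosen attributes can distinguish, fits in a few lines of plain Python. This sketch is not ROSETTA code. The `approximations` function and the patient-style data in the usage note are hypothetical, and it covers only lower and upper approximations, not reduct or rule computation.

```python
from collections import defaultdict


def approximations(rows, attributes, target):
    """Return the (lower, upper) approximation of `target`.

    rows: dict mapping a row name to a dict of attribute values.
    attributes: attribute names used to tell rows apart.
    target: set of row names to approximate.
    """
    # Rows with identical values on `attributes` are indiscernible.
    blocks = defaultdict(set)
    for name, row in rows.items():
        blocks[tuple(row[a] for a in attributes)].add(name)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:
            lower |= block   # block lies entirely inside the target
        if block & target:
            upper |= block   # block overlaps the target
    return lower, upper
```

For example, if rows `p1` and `p2` share the same symptoms but only `p1` is in the target set, neither can be placed in the lower approximation, while both end up in the upper one. The gap between the two sets is the region the attributes cannot decide.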
40.ADaMSoft
ADaMSoft is a free and open source data mining software developed in Java. It contains data management
methods and it can create ready to use reports. It can read data from several sources and it can write the results
in different formats.
41.Anaconda
Anaconda is an open data science platform powered by Python. The open source version of Anaconda is a high
performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala
packages for data science. There is also access to over 720 packages that can easily be installed with conda, the
package, dependency and environment manager that is included in Anaconda. The included packages cover stats,
data mining, machine learning, deep learning, simulation and optimization, geospatial, text and NLP, graph and
network, and image analysis. Featured packages include: NumPy, SciPy, pandas, scikit-learn, Numba, PyTables,
h5py, Matplotlib, Jupyter (formerly IPython), Spyder, Qt/PySide, VTK, Numexpr, Cython, Theano, scikit-image,
NLTK, NetworkX, IRKernel, dplyr, shiny, ggplot2, tidyr, caret, nnet.
42.yooreeka
yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The
algorithms covered are: clustering (hierarchical agglomerative and divisive, partitional); classification
(Bayesian, decision trees, neural networks, rule based); recommendation (collaborative filtering, content
based); and search (PageRank, DocRank and personalization).
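Of the search algorithms listed, PageRank is easy to sketch. The following plain-Python power iteration is not yooreeka's implementation. The `pagerank` function name, the damping factor of 0.85 and the even redistribution of rank from pages without outgoing links are common textbook choices assumed here for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links: dict mapping a page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank
```

Pages that collect links from many or highly ranked pages end up with a larger share of the total rank than pages nobody links to.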
43.AstroML
AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and
matplotlib. It contains a growing library of statistical and machine learning routines for analyzing astronomical
data in Python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and
visualizing astronomical datasets.
44.jHepWork
jHepWork is an environment for scientific computation, data analysis and data visualization. It is fully
multiplatform, 100% Java, and integrated with the Jython (Python) scripting language.
45.streamDM
streamDM is a new open source software for mining big data streams using Spark Streaming, started at Huawei
Noah’s Ark Lab. Spark Streaming is an extension of the core Spark API that enables stream processing from a
variety of sources.
46.TraMineR
TraMineR is an R package for mining, describing and visualizing sequences of states or events. Its primary aim is
the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family
trajectories.
47.ARMiner
ARMiner is a client-server data mining application specialized in finding association rules. ARMiner has been
developed at UMass/Boston as a Software Engineering project.
48.arules
arules provides the infrastructure for representing, manipulating and analyzing transaction data and patterns in
frequent itemsets and association rules.
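The notions of frequent itemsets and association rules can be made concrete with a small sketch. arules itself is an R package, so the plain-Python code below does not reflect its interface. The function names are hypothetical, and the brute-force enumeration stands in for the pruned search (such as Apriori) that a real miner would use.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force search for itemsets whose support is at least `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(transactions, candidate)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)
```

With transactions given as a list of sets of item names, `confidence(transactions, {"bread"}, {"milk"})` answers the question "of the baskets containing bread, what fraction also contain milk?".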
49.CLUTO
CLUTO is a software package for clustering low- and high-dimensional datasets and for analyzing the
characteristics of the various clusters. CLUTO is well-suited for clustering data sets arising in many diverse
application areas, including information retrieval and customer purchasing transactions.
50.TANAGRA
TANAGRA is a data mining software for academic and research purposes. It offers several data mining
methods from the areas of exploratory data analysis, statistical learning, machine learning and databases.
TANAGRA contains some supervised learning but also other paradigms such as clustering, factorial analysis,
parametric and nonparametric statistics, association rules, and feature selection and construction algorithms.
5 Reviews
March 17, 2014 at 9:23 am
Hello bud, on your data mining softwares witch 1 would u recommend for email mining? Thank you
April 1, 2014 at 11:50 pm
Do any of these have non-English capabilities?
July 29, 2014 at 12:52 am
Hi buddy! Are there any attempts to do cloud based data analytics softwares? I think such a thing can solve the problem Phoenix had mentioned.
K R Chin
January 25, 2015 at 6:14 pm
I’d like to know if there are any data mining programs which could be used to predict terrorist activities or analyze material movements (shipping,
purchases, and orders) to search for indicators of suspicious activity.
I’m a security consultant and advisor, this sort of information would be useful in my consultations.
March 5, 2015 at 4:00 pm
Hi KR Chin,
To predict any activity you need to know which variables you want to base your prediction on. You also need historical data to run your predictive
analysis and find the possible correlations between different events. I know that somewhere in the US the police uses crime predictions based on
historical criminality data (New Orleans if I am not mistaken)… bottom line: you need data to get the info! have fun
About The Author: imanuel
Copyright © 2017 Predictive Analytics Today, All Rights Reserved. DMCA Protected and Monitored.