Tuesday, January 5, 2010

Why use R Statistical Software

From: http://datamining.togaware.com/survivor/Pros_Cons.html

* R is the most comprehensive statistical analysis package available. It incorporates all of the standard statistical tests, models, and analyses, and provides a comprehensive language for managing and manipulating data.

* R is a programming language and environment developed for statistical analysis by practising statisticians and researchers.

* R is developed by a core team of some 10 developers, including some of the world's leading statisticians.

* The validity of the R software is ensured through openly validated and comprehensive governance, as documented for the US Food and Drug Administration in XXXX. Because R, unlike commercial software, is open source, it has been reviewed by many internationally renowned statisticians and computational scientists.

* R has over 1400 packages available, specialising in topics ranging from econometrics and data mining to spatial analysis and bioinformatics.

* R is free and open source software allowing anyone to use and, importantly, to modify it. R is licensed under the GNU General Public License, with Copyright held by The R Foundation for Statistical Computing.

* Anyone can freely download and install the R software and even freely modify the software, or look at the code behind the software to learn how things are done.

* Anyone is welcome to provide bug fixes, code enhancements, and new packages, and the wealth of quality packages available for R is a testament to this approach to software development and sharing.

* R integrates well with packages written in other languages, including Java (hence the RWeka package), Fortran (hence randomForest), C (hence arules), C++, and Python.

* The R command line is much more powerful than a graphical user interface.

* R is cross platform. R runs on many operating systems and different hardware. It is popularly used on GNU/Linux, Macintosh, and MS/Windows, running on both 32-bit and 64-bit processors.

* R has active user groups where questions can be asked and are often answered quickly, frequently by the very people who developed the environment. This support is second to none. Have you ever tried getting support from people who really know SAS or are core developers of SAS?

* New books for R (the Springer Use R! series) are emerging and there will soon be a very good library of books for using R.

* There are no license restrictions (other than those ensuring our freedom to use it at our own discretion), so you can run R anywhere and at any time.

* R probably has the most complete collection of statistical functions of any statistical or data mining package. New technology and ideas often appear first in R.

* The graphic capabilities of R are outstanding, providing a fully programmable graphics language which surpasses most other statistical and graphical packages.

* A very active email list, with some of the world's leading statisticians actively responding, is available for anyone to join. Questions are quickly answered and the archive provides a wealth of user solutions and examples. Be sure to read the Posting Guide first.

* Being open source, the R source code is peer reviewed, and anyone is welcome to review it and suggest improvements. Bugs are fixed very quickly. Consequently, R is a rock-solid product. New packages provided with R do go through a life cycle, often beginning as somewhat lower-quality tools, but usually quickly evolving into top-quality products.

* R plays well with many other tools, importing data, for example, from CSV files, SAS, and SPSS, or directly from MS/Excel, MS/Access, Oracle, MySQL, and SQLite. It can also produce graphics output in PDF, JPG, PNG, and SVG formats, and table output for LaTeX and HTML.
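
To make the points about data import and graphics output a little more concrete, here is a minimal sketch using only functions that ship with base R. It is an illustration, not a recipe from the original page: the data values and the names csv_path, dat, fit, and pdf_path are made up for this example, and a temporary file stands in for a real CSV file.

```r
# Hypothetical sample data, written to a temporary CSV file so that
# the example is self-contained (a real analysis would start from an
# existing file).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1)),
          csv_path, row.names = FALSE)

# Import the CSV file into a data frame.
dat <- read.csv(csv_path)

# Fit a simple linear model, one of the standard analyses built in to R.
fit <- lm(y ~ x, data = dat)

# Send graphics output to a PDF file: open the device, plot, close it.
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)
plot(dat$x, dat$y, main = "y against x", xlab = "x", ylab = "y")
abline(fit)            # overlay the fitted regression line
invisible(dev.off())   # close the PDF device so the file is written
```

Swapping pdf() for png() or svg() would target the other graphics formats mentioned above, assuming the R build at hand supports those devices.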