Common statistical software packages differ considerably in their strengths, weaknesses, and handling. The decision as to which system is the best fit should be made with care, because changing to a new system can result in high costs for things like new licenses and re-training. This article introduces and contrasts the market leaders – R, Python, SAS, SPSS, and STATA – to illustrate their relative pros and cons and make the decision a bit easier.
R is a popular, open-source statistics environment that can be extended by packages almost at will. R is commonly used with RStudio, a comfortable development environment that can be used locally or in a client-server installation via a web browser. R applications can also be used directly and interactively on the web via Shiny.
Strengths
- Very large range of functions (well over 2,000 packages)
- New statistical methods are quickly implemented
- Very easy to automate and integrate (for example, with Git, LaTeX, ODBC, Oracle R Enterprise, teradataR, Apache Hadoop, MicroStrategy)
- Very good community support, as well as fee-based support via third-party providers
- Extensive help resources freely available (manuals, tutorials, and so on)
- Very powerful and flexible scripting language (e.g., support for object-oriented programming)
- All common platforms are supported (Windows, Linux, macOS, …)
- Future-proof due to a very large, active developer community
Weaknesses
- Getting familiar with R syntax presents a barrier to entry
- Stability and quality of little-used packages are often not as high as those of the core distribution
- Powerful hardware is required when working with very large data sets
Licensing model and cost
R is free and open source; there are no usage fees.
Originally, R was only a low-cost alternative for those who could not afford a commercial statistics program. R has outgrown this perception and now trumps the commercial competition in terms of functionality, flexibility, and integrability with other applications. Many competitors (e.g., SPSS) have reacted by integrating R into their programs. The criticism that R is much harder to learn and use than its commercial competitors is less valid today thanks to RStudio. R is a particularly good choice for frequent users who plan to work more extensively with statistics and don't want to be restricted by their statistics software.
Python is a fully functional, open, interpreted programming language that has become an equally viable alternative for data science projects in recent years. Python is particularly well suited to the Deep Learning and Machine Learning fields, and it is also practical as statistics software through the use of packages, which can easily be installed. A variety of development environments are available, such as Jupyter, Spyder, and PyCharm. Python is a widely used language that is also popular in fields like web development.
Strengths
- Powerful, fully functional programming language
- Supports object-oriented, structured, and functional programming
- Mature programming language with established tooling, for example for unit testing and debugging
- A large number of stable packages in the data science sector and beyond
- Readable, clean syntax
- Constant development by a large developer community
- Full availability of the latest Deep Learning and Machine Learning methods
- Very easy to automate (e.g., via scripts or a web server)
- Fully integrable with other tools (e.g., Git, Teradata, PySpark, Hadoop, KNIME)
- Excellent support from a large and constantly growing community
- Visualizations are appealing and easy to create
- Professional development environments are available
- Future-proof due to continued growth in use in scientific and commercial fields
Weaknesses
- Not all statistical methods are available
- Some development environments for statistics are still in their infancy
- High barrier to entry due to being a “full” programming language
Licensing model and cost
Python is free to use. However, in some specialized areas (e.g., text mining), not all packages are licensed for commercial use.
Python stands out in this summary because it is a complete programming language suitable for a wide range of applications. In recent years, thanks to a large number of high-performance packages, it has also developed into a serious statistics tool, and its popularity continues to grow. In particular, Python is indispensable for methods that originate in computer science, such as Deep Learning. Its advantages are also clear for automation and for interaction with other programs (which can themselves be written in Python). Learning Python means being prepared to learn a complete programming language, though thanks to the language's popularity, many good tutorials and training courses are available. However, a development environment tailored specifically to data science, on the level of RStudio, does not (yet) exist.
See the comparison for SAS, SPSS and STATA on the INWT Statistics website.
“Originally Posted on July 25, 2019 – What’s the Best Statistical Software? A Comparison of R, Python, SAS, SPSS and STATA”
Disclosure: Interactive Brokers
Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from INWT Statistics and is being posted with permission from INWT Statistics. The views expressed in this material are solely those of the author and/or INWT Statistics and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.
Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.