    R vs Python: A Comparison for Data Science and Programming

    R or Python Usage

    Python was created by Guido van Rossum, a computer programmer, and first released in 1991. Python has influential libraries for mathematics, statistics, and artificial intelligence. You can think of Python as a specialist in machine learning. However, Python is not yet entirely mature for econometrics and communication. Python is the best tool for machine learning integration and deployment, but not for business analytics.

    R, by contrast, was developed by academics and scientists. It is designed to address problems in statistics, machine learning, and data science. R is well suited to data science because of its powerful communication libraries. In addition, R comes with many packages for time series analysis, panel data, and data mining. For these tasks, it is hard to find a better tool than R.

    In our opinion, if you are a beginner in data science with the necessary statistical foundation, you need to ask yourself the following two questions:

    • Do I want to learn how the algorithm works?
    • Do I want to deploy the model?

    If your answer to both questions is yes, you should probably learn Python first. On the one hand, Python includes great libraries for manipulating matrices and coding algorithms. As a beginner, it might be easier to learn how to build a model from scratch and then switch to the functions from the machine learning libraries. On the other hand, if you already know the algorithm or want to go into data analysis right away, then both R and Python are fine to begin with. R has an advantage if you are going to focus on statistical methods.
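
    The idea of building a model from scratch before moving to library functions can be illustrated with a small sketch. The example below is not taken from any particular library; it fits a one-feature linear regression with the closed-form least-squares formulas in plain Python. The function names `fit_simple_linear_regression` and `predict`, as well as the toy data, are assumptions made for illustration.

```python
# Minimal sketch: simple linear regression (one feature) fitted from scratch
# with the closed-form least-squares solution. All names are illustrative.


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance of x and y, and unnormalized variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # prints: 2.0 1.0
```

    Once the mechanics are clear, the same fit is usually delegated to a machine learning library rather than hand-written code.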

    Second, if you want to do more than statistics, such as deployment and reproducibility, Python is the better choice. R is more suitable for your work if you need to write a report or create a dashboard.
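
    As a rough illustration of what deployment and reproducibility can mean in practice, the sketch below stores fitted model parameters in a JSON file and reloads them to make a prediction, using only the Python standard library. The parameter values, the file name `linear_model.json`, and the helper names are hypothetical.

```python
# Minimal sketch: persist model parameters and reload them for prediction.
# The file name, parameter values, and helper names are hypothetical.
import json
import os
import tempfile


def save_model(params, path):
    """Write the model parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    """Read the model parameters back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict_from_params(params, x):
    """Apply a linear model y = slope * x + intercept."""
    return params["slope"] * x + params["intercept"]


params = {"slope": 2.0, "intercept": 1.0}
path = os.path.join(tempfile.gettempdir(), "linear_model.json")
save_model(params, path)

# A separate process (for example, a web service) could run only this part.
restored = load_model(path)
print(predict_from_params(restored, 3.0))
```

    Keeping the parameters in a plain file means the training step and the prediction step can run in different programs and still give the same result.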

    In a nutshell, the statistical gap between R and Python is narrowing. Most of the work can be done in either language. Choose the one that suits your needs, but also consider the tool your colleagues are using. It is better when all of you speak the same language. Once you know your first programming language, learning the second one is simpler.


    Difference between R and Python

    | Parameter | R | Python |
    |---|---|---|
    | Objective | Data analysis and statistics | Deployment and production |
    | Primary users | Scholars and R&D | Programmers and developers |
    | Flexibility | Easy to use available libraries | Easy to construct new models from scratch, e.g., matrix computation and optimization |
    | Learning curve | Difficult at the beginning | Linear and smooth |
    | Popularity of programming language (percentage change) | 4.23% in 2018 | 21.69% in 2018 |
    | Average salary | $99,000 | $100,000 |
    | Integration | Runs locally | Well integrated with apps |
    | Task | Easy to get primary results | Good for deploying algorithms |
    | Database size | Handles huge size | Handles huge size |
    | IDE | RStudio | Spyder, IPython Notebook |
    | Important packages and libraries | tidyverse, ggplot2, caret, zoo | pandas, scipy, scikit-learn, TensorFlow |
    | Disadvantages | Slow; high learning curve; dependencies between libraries | Not as many libraries as R |

    Advantages of R

    • Graphs are made to talk. R makes it beautiful
    • Large catalog for data analysis
    • GitHub interface
    • RMarkdown
    • Shiny

    Advantages of Python

    • Jupyter notebook: Notebooks help to share data with colleagues
    • Mathematical computation
    • Deployment
    • Code readability
    • Speed
    • Functions in Python

    Conclusion

    In the end, the choice between R and Python depends on:

    • The objectives of your mission: statistical analysis or deployment
    • The amount of time you can invest
    • The tool most used in your company or industry