Every few days, I get a message or an email from a newcomer who wants to learn machine learning. The dilemma they cannot get past is not about datasets or the problems they want to solve. It is far more rudimentary, and it plagues almost everyone who wants to join the machine learning universe.
“Should I use R or should I use Python?”
That’s the dilemma. The answer you get depends on whom you ask. A statistician at heart will probably swear by R, which was created for statisticians, but there are always outliers who will not leave Python’s side, whatever their statistical beliefs. The same holds for those who have tamed Python and would work in R only under duress.
The RedMonk Programming Language Rankings for January 2020 place Python and R as the two most popular programming languages for statistics and advanced forms of data analysis. It’s a comparison that has spawned several camps in support of each language, each questioning the other’s processing prowess. As with most comparisons where matters of the heart supersede those of the mind, this battle always ends in indecision. The simple reason is the desire for a one-stop solution to all our problems, or at least all our data-analysis problems. Unfortunately, there is no one-stop solution.
Both these languages have evolved over the last two decades, are well known and have good user and contributor bases. However, if we must answer which of them is the “best”, there is only one answer: “it depends”.
There have been plenty of capability-wise discussions around R and Python, but I largely believe that both are capable programming languages and that their usage differs based on the user’s comfort or on the scenario.
Let me explain this “it depends” from a user’s perspective and give my view:
 For first-time users:
 If you are a researcher, statistician or somebody from a non-software field, then you might find R interesting and easy to learn. R makes it quite easy to perform analysis, and packages are easy to find and install, especially in RStudio. Though the language has a steep learning curve for complex functionality, it is easy to begin with. Let us just say that the entry barrier for R is low.
 If you are a developer, tester or a person with experience in software, you might find Python easier to work with. Python evolved as a scripting language and can handle both backend and frontend programs, especially with frameworks such as Django and BeeWare.
 For someone who wants to build a product:
 If you are a product manager, technology lead or anyone planning to build a product using the power of machine learning and AI, then you might want to consider Python. The reason is that Python is robust for deployment and has optimized packages and methodologies for computing large-scale mathematical algorithms. It has been widely used to build all kinds of applications, which gives it a proven track record of large-scale deployability compared to R. I know that I am moving into controversial territory with these hypotheses, but they are just that: hypotheses.
 For someone who is into business consulting (data-driven):
 Consulting as a craft evaluates all the available aspects of a problem and cleverly chooses the right approach. A consultative approach dictates that we can’t tie ourselves down to one of the two platforms. In my opinion, consultants could use:
 Python when:
 They have a huge data table that takes up more than 70% of the system’s RAM
 They use non-parametric and other black-box models
 They have to implement deep learning
 R when:
 They have to perform first-hand exploratory data analysis (EDA) on comparatively small datasets
 They have to work with state-of-the-art algorithms, as researchers usually publish their packages on CRAN
 They need great visualizations
 For a data scientist:
 The work of data scientists, and the value that they bring, is not limited to a handful of businesses, sectors or even technologies. They work across many technologies and fields, from NLP to computer vision to deep learning. Without treating each of these as a separate skill set (although I believe they should be separated), we can evaluate which of the two programming languages should be used:
 Python for:
 Deep learning
 NLP
 Building products
 Deploying large-scale and complex algorithms (which can also be done in R, but is more complex to learn there)
 Computer Vision

 R for:
 Deploying algorithms with statistical inferences (Time series, parametric forms for example)
 Doing statistical research
 Building great visualizations
 Conducting market research analysis
 Empirical research
This is not an exhaustive list. The key point of this comparison is that the choice of the language should not be fixed but should be altered based on the purpose and the persona of the user.
Let us see some differences based on examples:
Statistics in base languages

 Base R ships with a set of functions that come in handy for a statistician, for example:
 Quantiles
 t-test
 ANOVA (aov)
 Linear models (lm)
 These functions are not built into Python; we must import packages such as pandas, NumPy, etc.
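As a rough illustration of this point, here is a minimal Python sketch. It assumes made-up sample data (`group_a` and `group_b` are hypothetical) and deliberately uses only the standard library, which still has to be imported. The `statistics` module offers quartiles, but there is no ready-made t-test, so the helper `welch_t_statistic` (a name invented for this sketch) computes Welch's t statistic by hand. A p-value would additionally need the t distribution, which is where a third-party package such as SciPy usually comes in.

```python
import math
import statistics

# Hypothetical sample data, made up for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3]
group_b = [6.2, 6.8, 6.5, 7.0, 6.1, 6.9, 6.4]


def welch_t_statistic(x, y):
    """Return Welch's two-sample t statistic (unequal variances)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    # statistics.variance() is the sample variance (n - 1 in the denominator).
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    standard_error = math.sqrt(var_x / len(x) + var_y / len(y))
    return (mean_x - mean_y) / standard_error


# Three cut points that split the data into four equal-probability groups.
# The interpolation rule may differ from the default used by R's quantile().
quartiles_a = statistics.quantiles(group_a, n=4, method="inclusive")
t_stat = welch_t_statistic(group_a, group_b)

print("Quartiles of group_a:", quartiles_a)
print("Welch t statistic:", round(t_stat, 3))
```

In R, the equivalent calls need no imports at all, which is the convenience being described here.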
Interpretation of ML algorithms
Statistical interpretation can be easier in R, while Python offers a wider choice of solvers/optimizers. For example:
 Logistic regression
 In R, if we use glm, the summary gives us these outputs to interpret:
 Deviance residuals
 Coefficients and significance values (z and p values, flagged at the 90%, 95% and 99% significance levels)
 Null/residual deviance
 AIC (Akaike information criterion)
 In Python, if we use scikit-learn for logistic regression:
 We will not get significance values, which are key to the analysis; we must derive them using an external function
 On the other hand, it offers solvers such as liblinear, saga, sag and newton-cg, along with regularization
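To make "deriving the significance values" concrete, here is a minimal sketch under stated assumptions. The data (`x`, `y`) is hypothetical, there is a single predictor plus an intercept, and the model is fitted with a hand-written Newton-Raphson loop in pure Python rather than with scikit-learn, so that the example runs without third-party packages. The function names are invented for this sketch. It reports the estimate, standard error, Wald z value and two-sided p value for each coefficient, which is the kind of table R's glm summary prints.

```python
import math
from statistics import NormalDist

# Hypothetical data: hours studied (x) and exam passed (1) or failed (0).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def score_and_information(b0, b1):
    """Return the log-likelihood gradient and the 2x2 information matrix entries."""
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return (g0, g1), (h00, h01, h11)


def fit_logistic(iterations=25):
    """Fit y ~ intercept + slope * x by Newton-Raphson; return estimates and standard errors."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: coefficients += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard errors come from the inverse information matrix at the final estimates.
    _, (h00, h01, h11) = score_and_information(b0, b1)
    det = h00 * h11 - h01 * h01
    return {"intercept": (b0, math.sqrt(h11 / det)), "slope": (b1, math.sqrt(h00 / det))}


def wald_summary(fit):
    """Compute the Wald z value and two-sided p value for each coefficient."""
    rows = {}
    for name, (estimate, std_error) in fit.items():
        z = estimate / std_error
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        rows[name] = (estimate, std_error, z, p_value)
    return rows


summary = wald_summary(fit_logistic())
print(f"{'term':<10}{'estimate':>10}{'std.err':>10}{'z':>8}{'p':>8}")
for name, (estimate, std_error, z, p_value) in summary.items():
    print(f"{name:<10}{estimate:>10.3f}{std_error:>10.3f}{z:>8.2f}{p_value:>8.3f}")
```

In practice one would more likely reach for a statistics-oriented package such as statsmodels, which reports these values directly. The sketch only shows what has to be computed when the modelling library does not provide them.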
Reading large data sets

 Reading large data sets is usually faster in Python than in R if we consider the regular data manipulation tools. R needs to read the data sequentially most of the time.
 Let’s check by importing a large CSV from this dataset:
 In R – time taken is 4.15 minutes
 In Python – time taken is 77.15 seconds
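A timing like this can be reproduced with a small harness. The following is only a sketch: it generates a hypothetical CSV file instead of using the dataset from the comparison, and it uses the standard library's `csv` reader rather than a data-frame tool such as pandas, so the absolute numbers will not match those quoted here. The helper names `write_sample_csv` and `timed_read` are invented for illustration.

```python
import csv
import os
import tempfile
import time


def write_sample_csv(path, n_rows):
    """Create a hypothetical CSV file with a header and three numeric columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "value", "squared"])
        for i in range(n_rows):
            writer.writerow([i, i * 0.5, i * i])


def timed_read(path):
    """Read every data row of the CSV and return (row_count, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        row_count = sum(1 for _ in reader)
    elapsed = time.perf_counter() - start
    return row_count, elapsed


# The temporary directory and its contents are removed when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    write_sample_csv(csv_path, n_rows=100_000)
    rows, seconds = timed_read(csv_path)

print(f"Read {rows} rows in {seconds:.3f} seconds")
```

Timings depend heavily on the machine, the file and the reader used, so a fair comparison should run both languages on the same file and hardware, ideally several times.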

We can’t always say that one language is better, but we can make a language work better if we know it more deeply. Take the merge function in R versus the merge function in pandas: the pandas version was written with the drawbacks of R’s merge function in mind and thus has a better algorithm in place.
Deep learning
While we can perform deep learning in both programming languages, Python easily wins with its TensorFlow, Keras and Theano packages. The same thing can be done in R, but it is difficult to find the exact way to implement it directly. The number of commits in the Python packages is also far higher than in the R ones (for example, compare TensorFlow, Keras and scikit-learn in Python with H2O and mlr in R). N.B. Though a TensorFlow package is also available in R, it has only been around since October 2019 and does not have as many commits.
Speed
Let us take an example. Suppose I want to implement a dCNN (deep convolutional neural network). The solution is just a search away with Python’s TensorFlow, while in R you will probably spend half a day searching for a way to implement it and getting rid of the errors that follow.
Visualization
Visualization in R, especially with Hadley Wickham’s ggplot2 and its more than 50 visualization types, is very handy and powerful compared with matplotlib. The visualization capabilities of R and its supporting packages are quite evolved and mature, while Python’s visualization packages are still going through massive commits to catch up with R’s capabilities.
Some points came up from an analysis of the Stack Overflow data available on Kaggle, where prediction models were used to ask: based on different features, can we predict which kind of user you are? Some of the insights that came out are:

 If you are looking to move towards Linux next year, you are more likely a Python user
 If you studied statistics, you are more likely to go with R, and if you studied computer science, you might lean towards Python
 If you are young (18-24 years old), you are more likely a Python user
 If you participate in coding competitions, you are more likely to be a Python user
 If you want an Android next year, you are more likely a Python user
 If you want to learn SQL next year, you are more likely an R user
 If you use MS Office, you are more likely an R user
 If you want a Raspberry Pi next year, you are more likely a Python user
 If you are a full-time student, you are more likely to be a Python user
 If you are using Agile methodology, you are more likely to be a Python user
 If you are more worried than excited about AI, then you are more likely to be an R user
 People who have been coding for 3-11 years are more likely to be Python users, while R seems to be the flavour of choice for those with over 12 years of experience
 People using Python and R are more often moderately happy than people using only one of the languages
CONCLUSION
Everything points back to the same answer, “it depends”, which is not much help in itself but does prove one thing: there is no need for the comparison. The decision should be based on the time in hand, the purpose of the usage or the stage of learning that you are in. Keeping ourselves updated time and again would help us take this decision faster and better.
Today, Python’s user base has shot up well above R’s. More and more Python packages are being released to bring it on par with R’s 12,000 packages. Popularity-wise, the number of questions asked about Python on Quora and Stack Overflow has increased substantially over the last couple of years. The job market is also inclined towards Python, although a fairly large user base still prefers R.
Personally, I believe both are equally capable languages. It is just that the ease of getting a job done is totally based on the user and the job at hand.