Course: Data Science & Big Data Analytics
Week 5 Discussion
Several big data visualization tools have been evaluated in this week's paper. While the focus was primarily on R and Python with GUI
Running Head: R AND PYTHON.
Substantive replies needed for the below 2 posts:
Post 1:
1 day ago
Karthik
Week 5 discussion
R and Python Languages
R and Python are open-source programming languages, each backed by a large community that continuously adds new libraries to its catalog. R is used mainly for statistical analysis, while Python offers a more general approach to data science. Both are considered state of the art for data science work. According to Vallat (2018), Python is a general-purpose programming language with readable syntax, whereas R was created by statisticians and reflects their specific language. R has a bigger ecosystem for data analytics, which makes it a strong choice for specialized analytical work, and it has excellent tools for communicating results. Python, on the other hand, can do much of what R does and also offers deployment tools for implementing machine learning at a larger scale.
The two languages each have merits and demerits. Python is oriented mainly toward machine learning and has very influential libraries, while R covers both data science and machine learning. R also has powerful libraries for communicating results, while Python is stronger for deployment. One of Python's disadvantages is that it is less mature than R for communication and econometrics and lacks some of the libraries R offers. R, for its part, has dependencies between its libraries, which makes its learning curve steeper (Morandat, Hill, Osvald & Vitek, 2012). A typical R setup is RStudio with knitr, while a typical Python setup is the IPython notebook with scikit-learn. Since I have never used either language, I am looking forward to using them for big data analysis and data visualization, because of Python's high speed and R's more extensive package catalog.
References
Morandat, F., Hill, B., Osvald, L., & Vitek, J. (2012, June). Evaluating the design of the R language. In European Conference on Object-Oriented Programming (pp. 104-131). Springer, Berlin, Heidelberg.
Vallat, R. (2018). Pingouin: statistics in Python. Journal of Open Source Software, 3(31), 1026.
Post 2
Venkat Raju
RE: Week 5 Discussion
R vs Python Pros and Cons
R and Python have become very popular visualization tools in the field of big data analytics, and both are now among the most powerful tools for data analytics and visualization. Both programming languages are open source and used by a large community, particularly for data science. They are also updated regularly, with new libraries added over time (Fahad & Yahya, 2018). This makes each of them significant and unique in its own way.
Nonetheless, the two tools differ in important ways. For instance, R was developed primarily for statistical analysis, while Python provides a general approach to big data analytics. Each language has its pros and cons. Python is popular in the data science field because it is easy to use, supports multiple programming paradigms, and has a shorter execution time than R (Costa, 2020). R, on the other hand, has a more robust set of analysis tools than Python, a wide range of packages that extend its capabilities, and powerful data import options. However, R is somewhat harder to learn than Python (Karakan, 2020).
R has strong statistical foundations and interactive visualization capabilities. In my case, I have worked with both tools when carrying out data analytics in my profession. I prefer R, as I believe it is more powerful for data analysis. Its packages make data analysis and visualization much simpler. A good example is creating a regression model: the R code is much shorter and simpler than the Python code.
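As a brief illustration of the "multiple programming paradigms" point, the sketch below computes the same mean in three styles. The sample values and the `Sample` class are hypothetical and are not taken from any of the cited sources.

```python
from functools import reduce

# Hypothetical sample values, used only for illustration.
values = [2.0, 4.0, 6.0, 8.0]

# Procedural style: explicit loop with a running total.
total = 0.0
for v in values:
    total += v
mean_procedural = total / len(values)

# Functional style: fold the list into a sum with reduce.
mean_functional = reduce(lambda acc, v: acc + v, values, 0.0) / len(values)


# Object-oriented style: data and behavior bundled in a class.
class Sample:
    def __init__(self, data):
        self.data = list(data)

    def mean(self):
        return sum(self.data) / len(self.data)


mean_oo = Sample(values).mean()

print(mean_procedural, mean_functional, mean_oo)
```

All three approaches give the same result, so the choice between them is largely a matter of style and of how the surrounding code is organized.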
Example
R code
# Load the built-in cars dataset (columns: speed, dist)
data("cars")
# Build a linear regression model of stopping distance on speed, using the full data
linearMod <- lm(dist ~ speed, data = cars)
# Print the fitted coefficients
print(linearMod)
Python code
# Import the scikit-learn modules used below
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
# Loading inbuilt boston dataset
# (load_boston was removed in scikit-learn 1.2, so this call needs an older release)
boston = datasets.load_boston(return_X_y=False)
# Defining feature matrix (X) and response vector (y)
X = boston.data
y = boston.target
# Splitting into training and testing datasets (40% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
# Creating linear regression object
reg = linear_model.LinearRegression()
# Training the model on the training split
reg.fit(X_train, y_train)
# Regression coefficients
print('Coefficients: \n', reg.coef_)
References
Costa, C. D. (2020, August 21). Python vs(and) R for data science. Medium. https://towardsdatascience.com/python-vs-and-r-for-data-science-4a32580846a4
Fahad, S. A., & Yahya, A. E. (2018). Big data visualization: Allotting by R and Python with GUI tools. 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE). https://doi.org/10.1109/icscee.2018.8538413
Karakan, B. (2020, May 16). Python vs R for data science: And the winner is. Medium. https://medium.com/@datadrivenscience/python-vs-r-for-data-science-and-the-winner-is-3ebb1a968197