Course: Data Science & Big Data Analytics Week 5 Discussion Several Big Data Visualization tools have been evaluated in this week's paper. While the focus was primarily on R and Python with GUI

Running Head: R AND PYTHON.

Substantive replies needed for the below 2 posts:

Post 1:



Karthik

Week 5 discussion



R and Python Languages

R and Python are open-source programming languages, each backed by a large community, and new libraries are continuously added to their catalogs. R is used mainly for statistical analysis, whereas Python offers a more general approach to data science. Both are considered state of the art for data science programming. According to Vallat (2018), Python is a general-purpose programming language with readable syntax. R, by contrast, was created by statisticians and encompasses their specific language, and it has a bigger ecosystem for data analytics, which makes it well suited to specialized analytical work. R also has excellent tools for communicating results. Python can do much of what R does and additionally offers deployment tools for implementing machine learning at a larger scale.

Both languages have merits and demerits. Python is focused on machine learning and has very influential libraries, while R covers both data science and machine learning. R also has powerful libraries for communicating results, whereas Python's strength is deployment. One of Python's disadvantages is that it is less mature for communication and econometrics and lacks some of the libraries available in R. R, for its part, has dependencies between libraries and a steep learning curve (Morandat, Hill, Osvald & Vitek, 2012). A typical R setup is RStudio with knitr, while a typical Python setup is the IPython notebook with scikit-learn. Since I have never used either language, I am looking forward to using them for big data analysis and data visualization, drawn by Python's high speed and R's more extensive data catalogs.

References

Morandat, F., Hill, B., Osvald, L., & Vitek, J. (2012, June). Evaluating the design of the R language. In European Conference on Object-Oriented Programming (pp. 104-131). Springer, Berlin, Heidelberg.

Vallat, R. (2018). Pingouin: statistics in Python. Journal of Open-Source Software, 3(31), 1026.









Post 2:



Venkat Raju

RE: Week 5 Discussion



R vs Python Pros and Cons

R and Python are very popular visualization tools in the field of big data analytics and have become some of the most powerful tools for data analytics and visualization. Both programming languages are open source and used by a large community, particularly for data science. Both are also regularly updated, with new libraries added (Fahad & Yahya, 2018). This makes each of them significant and unique in its own way.

Nonetheless, the two tools differ in some respects. For instance, R was developed primarily for statistical analysis, while Python provides a general approach to big data analytics. Each language has its pros and cons. Python is popular in the data science field because it is easy to use, supports multiple programming paradigms, and has shorter execution times than R (Costa, 2020). R, on the other hand, has a more robust set of analysis tools than Python, a wide range of packages that extend its capabilities, and powerful data import options. However, R is somewhat harder to learn than Python (Karakan, 2020).
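To make the "multiple programming paradigms" point concrete, here is a minimal sketch that computes the same mean in procedural, functional, and object-oriented style. The sample values and the names mean_procedural, mean_functional, and Sample are made up for illustration and do not come from the cited sources.

```python
from functools import reduce

values = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample data


# Procedural style: explicit loop and accumulator
def mean_procedural(xs):
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)


# Functional style: fold the list with reduce
def mean_functional(xs):
    return reduce(lambda acc, x: acc + x, xs, 0.0) / len(xs)


# Object-oriented style: data and behaviour bundled in a class
class Sample:
    def __init__(self, xs):
        self.xs = list(xs)

    def mean(self):
        return sum(self.xs) / len(self.xs)


print(mean_procedural(values), mean_functional(values), Sample(values).mean())
# prints: 5.0 5.0 5.0
```

All three versions give the same answer; which style reads best is largely a matter of taste and of the surrounding codebase.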

R has a strong statistical foundation and interactive visualization capabilities. I have worked with both tools when carrying out data analytics in my profession, and I prefer R because I believe it is more powerful for data analysis. The language has powerful packages that make data analysis and visualization much simpler. A good example is creating a regression model: the R code is much shorter and simpler than the Python equivalent.

Example

R code

# Load the built-in cars dataset (columns: speed, dist)
data("cars")

# Build linear regression model on full data
linearMod <- lm(dist ~ speed, data = cars)

# Print the fitted intercept and slope
print(linearMod)
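As a rough illustration of what a one-predictor fit like lm(dist ~ speed) estimates, the sketch below computes the ordinary least squares intercept and slope by hand in Python. The toy data and the helper name fit_simple_ols are assumptions for illustration only; this is not how R implements lm internally.

```python
# Hypothetical toy data standing in for a predictor (x) and a response (y).
# The points lie exactly on y = 1 + 2x, so the fit is easy to check by eye.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)
# prints: 1.0 2.0
```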

 

Python code

# Importing the scikit-learn modules used below
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# Loading inbuilt diabetes dataset (load_boston was removed in scikit-learn 1.2)
diabetes = datasets.load_diabetes(return_X_y=False)

# Defining feature matrix (X) and response vector (y)
X = diabetes.data
y = diabetes.target

# Splitting training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Creating linear regression object
reg = linear_model.LinearRegression()

# Training the model
reg.fit(X_train, y_train)

# Regression coefficients
print('Coefficients: \n', reg.coef_)
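The split above creates a test set, but the snippet only prints coefficients. As a hedged sketch of how the held-out rows could be used, the plain-Python example below fits a line on the training rows and computes R squared on the test rows. The helper names and the noiseless toy data are hypothetical; in scikit-learn the analogous step would be reg.score(X_test, y_test), which returns R squared for LinearRegression.

```python
import random


def train_test_split_simple(rows, test_size=0.4, seed=1):
    """Shuffle rows and hold out a fraction for testing (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Least squares intercept and slope for (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(rows, intercept, slope):
    """Share of the variance in y explained by the fitted line on these rows."""
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1.0 - ss_res / ss_tot


# Hypothetical noiseless data on the line y = 1 + 2x
rows = [(float(x), 1.0 + 2.0 * x) for x in range(10)]
train, test = train_test_split_simple(rows)
intercept, slope = fit_line(train)
test_r2 = r_squared(test, intercept, slope)
print(len(train), len(test))  # prints: 6 4
print(test_r2)  # close to 1.0 because the toy data has no noise
```

With real data the test score would normally be lower than the training score, and that gap is the reason for holding rows out in the first place.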

References

Costa, C. D. (2020, August 21). Python vs(and) R for data science. Medium. https://towardsdatascience.com/python-vs-and-r-for-data-science-4a32580846a4

Fahad, S. A., & Yahya, A. E. (2018). Big data visualization: Allotting by R and Python with GUI tools. 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE). https://doi.org/10.1109/icscee.2018.8538413

Karakan, B. (2020, May 16). Python vs R for data science: And the winner is. Medium. https://medium.com/@datadrivenscience/python-vs-r-for-data-science-and-the-winner-is-3ebb1a968197