Several Big Data Visualization tools have been evaluated in this week's paper. While the focus was primarily on R and Python with GUI tools, new tools are being introduced every day. Compare and con

Two substantive response posts are required. A substantive post will do at least TWO of the following:

  • Ask an interesting, thoughtful question pertaining to the topic

  • Answer a question (in detail) posted by another student or the instructor

  • Provide extensive additional information on the topic

  • Explain, define, or analyze the topic in detail

  • Share an applicable personal experience

  • Provide an outside source (for example, an article from the UC Library) that applies to the topic, along with additional information about the topic or the source (please cite properly in APA)

  • Make an argument concerning the topic.

Response needed for post 1:

R vs Python

Data visualization is a graphical representation of data. It helps communicate the relationships between data items to viewers. According to Jagadish et al., the human brain can understand an instruction up to 60,000 times faster when it is presented as an image. Data visualization helps viewers make sense of collected data by letting them take in a large amount of information at once, sometimes in real time (Fahad et al., 2018). Data visualization tools are available in both R and Python, with some differences between them.

The R programming language is primarily meant for data analysis and comes with basic packages installed by default, including the graphics package. The graphics package provides more than 100 functions for generating basic visualizations. These straightforward functions are easy to use, enabling users to create various plots quickly, including bar charts, box plots, and histograms. In addition to these essential functions, R has numerous libraries such as Plotly, lattice, and ggplot2 (Frampton et al., 2017) that help produce a variety of graphs and even make them interactive. Among them, ggplot2 is the most popular. It provides an elegant and flexible way to create charts, letting users add objects to a graph in layers and make intuitive design choices ('Data Visualization in R vs. Python', 2019).

Python, unlike R, is a general-purpose programming language used by programmers from many disciplines. Python does not come with default visualization tools, but libraries such as Matplotlib and Seaborn are available for it. Matplotlib is the most widely used visualization library in Python. It is based on NumPy arrays and was originally developed as a Python alternative to MATLAB ('Data Visualization in R vs. Python', 2019). One drawback cited for the library is that it lacks the customization capabilities of R's ggplot2. Seaborn can be used as an extension to Matplotlib to enable this kind of personalization. The combination is roughly equivalent to ggplot2, allowing users to add features in layers.
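
A minimal sketch of the Seaborn-on-Matplotlib combination described above, assuming both libraries (plus pandas) are installed. The data values and the function name build_layered_chart are invented for illustration and do not come from the cited source. Seaborn draws the bars onto a Matplotlib Axes, and ordinary Matplotlib calls then add a reference line, title, and legend on top, loosely in the spirit of adding layers in ggplot2.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def build_layered_chart():
    """Draw a bar chart with Seaborn, then add Matplotlib elements on top of it."""
    # Hypothetical data, invented purely for illustration.
    df = pd.DataFrame({"tool": ["A", "B", "C"], "score": [3.0, 5.0, 4.0]})

    # Seaborn sets the overall look of the figure.
    sns.set_theme(style="whitegrid")
    fig, ax = plt.subplots(figsize=(5, 3))

    # First layer: Seaborn bars drawn onto the Matplotlib Axes.
    sns.barplot(data=df, x="tool", y="score", ax=ax)

    # Further layers: plain Matplotlib calls on the same Axes.
    ax.axhline(df["score"].mean(), color="red", linestyle="--", label="mean score")
    ax.set_title("Scores by tool")
    ax.legend()
    return fig, ax


fig, ax = build_layered_chart()
```

Because Seaborn returns and works on regular Matplotlib objects, the resulting figure can be saved or adjusted further with the usual Matplotlib methods, for example fig.savefig with a file name of your choice.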

To conclude, both R and Python are good data visualization tools. One thing to keep in mind is that R was developed specifically for data analysis and provides packages meant for visualization, so customizing graphs is an inherent feature of R. Python achieves the same with the help of packages such as Seaborn, and with fewer lines of code than R ('Data Visualization in R vs. Python', 2019). In my opinion, the choice between Python and R is really up to the users, who can pick whichever they are more comfortable with.

Response needed for post 2:

R was developed by statisticians for statistical analysis and graphical output. It is free, open source, and supported by a large community that keeps it updated. R provides high-quality graphics, with around 12,000 packages available on CRAN ("R Vs Python", 2020), and many libraries for presenting the output of statistical analysis. The analytical power of R is virtually unmatched; one of its strongest competitors, SAS, requires three programming languages to accomplish the same tasks that R can do with one (Ozgur et al., 2017).

 

Python is a free, object-oriented programming language suited to high-level programming. It is used mostly for general-purpose programming and supports a large standard library and modules for projects large and small. Python can be used to develop and implement machine learning and deep learning models and provides excellent APIs for machine learning. It offers a number of libraries, such as Seaborn, Pandas, and NumPy, for data science tasks. Python also has a built-in debugging facility that helps programmers debug code easily.

 

Python is easy for a programmer to learn because its syntax is easy for other programmers to read. On the other hand, R, as software, does not require a programmer to understand it, and those who are less concerned with coding and more concerned with strictly producing data models may find it easier to work with (Ozgur et al., 2017). However, the majority of United States colleges include Python as a degree requirement for computer science or IT, which makes Python relatively popular compared to R (Ozgur et al., 2017). Both Python and R are fast, but on larger data R can be much faster than Python when it uses the lapply function.
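
For readers who know Python better than R, here is a rough analog of what R's lapply does: it applies a function to every element of a list and returns a list of the results. This is only a sketch to show the idea. The helper names lapply and column_means and the sample numbers are made up for illustration, and the example does not measure speed or support any claim about relative performance.

```python
def lapply(items, func):
    """Apply func to every element and return a new list, loosely mirroring R's lapply."""
    return [func(item) for item in items]


def column_means(columns):
    """Compute the mean of each column, where each column is a non-empty list of numbers."""
    return lapply(columns, lambda values: sum(values) / len(values))


# Hypothetical data: three small numeric columns.
columns = [[1, 2, 3], [10, 20, 30], [5, 5, 5]]
means = column_means(columns)
print(means)  # prints [2.0, 20.0, 5.0]
```

In everyday Python the same thing is usually written directly as a list comprehension or with the built-in map function, so a separate helper like this is rarely needed.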

 

In my personal experience as a software engineer, I have used multiple object-oriented languages, such as C++, Java, and Python. Because all of these languages were part of my coursework, I am very familiar with their syntax and with OOP concepts. R is very different from these languages in both its syntax and how it works. Although R is the best for visualizing data, I prefer Python because it has multiple libraries for data visualization, such as Matplotlib, Seaborn, and Plotly. In addition, Python is useful in several kinds of development where R is not very suitable, such as machine learning, deep learning, and general application development.