Coding for Data Science with R-Studio: A Comprehensive Guide to Data Visualization

A guide to coding for data science using R-Studio, focusing on data visualization techniques. It covers essential R functions and the ggplot2 library, detailing how to create various plots such as scatter plots, box plots, line charts, bar charts, histograms, and pie charts. It includes practical examples and syntax explanations for implementing data visualization in R. It also explains the mean, median and mode functions.

Typology: Study notes

2024/2025

Uploaded on 06/10/2025

noel-b-daniel

Coding for Data Science using R-Studio

Data visualization is an essential part of data science. Data can be visualized as scatter plots, box plots, time series plots, bar charts, histograms, pie charts, etc. Base R has built-in functions to draw all of the above, and they can also be drawn with a package named ggplot2.
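
As a rough illustration of the ggplot2 route mentioned above, the sketch below draws a scatter plot of the built-in mtcars data. It assumes the ggplot2 package is installed (install.packages("ggplot2")); the choice of columns and the axis labels are only illustrative and are not taken from the notes.

```r
# Minimal ggplot2 sketch (assumes ggplot2 is installed; skipped otherwise).
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Map displacement to the x-axis and horsepower to the y-axis,
  # then add a point layer to get a scatter plot.
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = disp, y = hp)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Displacement", y = "Horsepower",
                  title = "Displacement vs Horsepower")
  print(p)  # draws the plot on the active graphics device
}
```

The same data can be drawn with the base plot() function; ggplot2 builds the plot by adding layers with +.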

Plot function

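x = c(1, 2, 3, 4, 5)
y = c(2, 3, 4, 5, 6)
plot(x, y)                        # points only (the default)
plot(x, y, type = "l")            # lines only
plot(x, y, type = "b")            # both points and lines
plot(x, y, type = "b", pch = 12)  # pch selects the point symbol; it ranges from 0 to 25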
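
To see what each pch value looks like, one option is to draw all 26 symbols in a single row. This is only a sketch; the variable name pch_values and the layout choices (axis limits, label offset) are assumptions made for the illustration.

```r
# Draw one point for each pch value from 0 to 25.
pch_values <- 0:25
plot(pch_values, rep(1, length(pch_values)),
     pch = pch_values,
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "pch value", ylab = "",
     main = "Point symbols for pch = 0 to 25")
# Print each pch number just above its symbol.
text(pch_values, 1.2, labels = pch_values, cex = 0.7)
```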

Scatter plot

Example
input <- mtcars[, c('disp', 'hp')]   # disp is engine displacement, hp is horsepower
plot(x = input$disp, y = input$hp,
     xlab = "Displacement",
     ylab = "Horsepower",
     xlim = c(50, 500), ylim = c(40, 350),
     main = "Displacement vs Horsepower")

Box plot in R

A box plot is a graphical technique for summarizing a set of data measured on an interval scale. It shows the median, the quartiles, and any outliers of the data. Box plots are used extensively in descriptive data analysis.
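
The numbers behind a box plot can be inspected directly. The sketch below uses boxplot.stats() from base R on the Temp column of the built-in airquality data set; the names given to the five values are labels added here for readability.

```r
# Compute the summary values that a box plot would draw for one variable.
temp <- airquality$Temp
stats <- boxplot.stats(temp)

# stats$stats holds five values: the ends of the whiskers,
# the two hinges (edges of the box) and the median.
five <- stats$stats
names(five) <- c("lower whisker", "lower hinge", "median",
                 "upper hinge", "upper whisker")
print(five)

# Points beyond the whiskers are reported separately as outliers.
print(stats$out)
```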

A boxplot in R is created using the boxplot() function.
The basic syntax is boxplot(x, data, notch, varwidth, names, main)
x - is the vector or formula.
data - is the data frame.
notch - is a logical value. Set as TRUE to draw a notch.
varwidth - is a logical value. Set as TRUE to draw the width of each box in proportion to the sample size.
names - are the group labels which will be printed under each boxplot.
main - is used to give a title to the graph.

Example
temperature = airquality$Temp
wind = airquality$Wind   # column names are case-sensitive: Wind, not wind
boxplot(temperature, wind,
        main = "heading",
        names = c("Temperature", "Windspeed"),
        col = c("orange", "red"),
        border = "brown")

Another example
# One box per month; the groups are labelled with the month numbers 5 to 9
boxplot(Temp ~ Month, data = airquality,
        main = "Heading",
        xlab = "Month Number",
        ylab = "Degree Fahrenheit",
        col = c("orange", "red"),
        border = "brown")

Bar Chart in R

The bar chart is represented by using the barplot() function.
The basic syntax is barplot(H, xlab, ylab, main, names.arg, col)
H - is a vector or matrix containing numeric values used in the bar chart.
xlab - is the label for the x-axis.
ylab - is the label for the y-axis.
main - is the title of the bar chart.
names.arg - is a vector of names appearing under each bar.
col - is used to give colours to the bars in the graph.

Example
temperature <- c(7, 12, 35, 25, 40)
months <- c("jan", "feb", "mar", "april", "may")
color = c("yellow", "red", "blue", "orange", "green")
barplot(temperature, names.arg = months,
        xlab = "Months", ylab = "temperature",
        main = "heading", col = color)
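
Since H may also be a matrix, a grouped bar chart can be sketched as follows. The two cities and the second row of values are made up for this illustration; beside = TRUE places the bars of each column next to each other instead of stacking them.

```r
# Hypothetical temperatures for two cities over five months.
temps <- matrix(c(7, 12, 35, 25, 40,
                  10, 15, 30, 28, 38),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("City A", "City B"),
                                c("jan", "feb", "mar", "april", "may")))

# Each column becomes a group of bars, one bar per row.
# barplot() returns the x positions of the bar midpoints.
mids <- barplot(temps, beside = TRUE,
                col = c("orange", "blue"),
                legend.text = rownames(temps),
                xlab = "Months", ylab = "temperature",
                main = "heading")
```

The column names of the matrix are used as the labels under each group, so names.arg is not needed here.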

Histogram in R

The histogram is represented by using the hist() function.
The basic syntax is hist(v, main, xlab, xlim, ylim, breaks, col, border)
v - is a vector containing numeric values used in the histogram.
main - indicates the title of the chart.
col - is used to set the colour of the bars.
border - is used to set the border colour of each bar.
xlab - is used to give a description of the x-axis.
xlim - is used to specify the range of values on the x-axis.
ylim - is used to specify the range of values on the y-axis.
breaks - is used to specify the breakpoints between the bars (or a suggested number of bars), which determines the width of each bar.

Example
k <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)
hist(k, xlab = "weight", col = "yellow", border = "blue")
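
The effect of the breaks argument can be checked without drawing anything. The sketch below reuses the vector k from the example and asks hist() for bars of width 10; the breakpoints 0, 10, ..., 50 are chosen here only for illustration.

```r
k <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(k, breaks = seq(0, 50, by = 10), plot = FALSE)

print(h$breaks)  # the breakpoints: 0 10 20 30 40 50
print(h$counts)  # values per bar:  2 3 2 3 1
```

Passing the same breaks to hist(k, breaks = seq(0, 50, by = 10), xlab = "weight") draws the corresponding chart with five bars.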

Pie Chart in R

Example
y <- c(22, 57, 35, 88)
labels <- c("Volleyball", "Football", "Basketball", "Cricket")
pie(y, labels)

Another Example
y <- c(22, 57, 35, 88)
labels <- c("Volleyball", "Football", "Basketball", "Cricket")
pie(y, labels, main = "sports pie chart", col = rainbow(length(y)))