A comprehensive guide to coding for data science using RStudio, focusing on data visualization techniques. It covers essential R functions and the ggplot2 library, and details how to create scatter plots, box plots, line charts, bar charts, histograms and pie charts, with practical examples and syntax explanations. It also explains the mean, median and mode functions.
Typology: Study notes

Data visualization is an essential part of data science. Common visualizations include scatter plots, box plots, time series plots, bar charts, histograms and pie charts. Base R has built-in functions for all of these, and they can also be drawn with the ggplot2 package.
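Scatter plots and line plots are drawn with the plot() function:
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)
plot(x, y)                        # points only (scatter plot)
plot(x, y, type = "l")            # lines only
plot(x, y, type = "b")            # both points and lines
plot(x, y, type = "b", pch = 12)  # pch chooses the point symbol; the standard symbols are numbered 0 to 25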
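To see what the pch codes actually look like, the following minimal sketch draws all 26 standard symbols in one row with their codes written underneath. It uses only base R graphics; the variable name pch_values and the cex/yaxt settings are illustrative choices, not part of the original notes.

```r
# Draw every standard point symbol (pch = 0 to 25) so they can be compared.
pch_values <- 0:25
plot(pch_values, rep(1, length(pch_values)),
     pch = pch_values, cex = 2, ylim = c(0.5, 1.5),
     xlab = "pch value", ylab = "", yaxt = "n",
     main = "Point symbols selected by pch")
# Write each code just below its symbol.
text(pch_values, rep(0.8, length(pch_values)), labels = pch_values)
```

Symbols 21 to 25 can also be filled using the bg argument.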
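Example: a scatter plot of engine displacement (disp) against horsepower (hp) from the built-in mtcars data set.
input <- mtcars[, c("disp", "hp")]
plot(x = input$disp, y = input$hp,
     xlab = "Displacement (cu. in.)", ylab = "Horsepower",
     xlim = c(50, 500), ylim = c(40, 340),
     main = "Displacement vs Horsepower")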
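Hand-picked xlim and ylim values can silently cut points off the plot, so it can help to check each variable's spread with range() first. This is a minimal sketch using only base R and the built-in mtcars data; disp_range and hp_range are illustrative names.

```r
# Check the spread of each variable before choosing axis limits.
input <- mtcars[, c("disp", "hp")]
disp_range <- range(input$disp)   # smallest and largest displacement
hp_range <- range(input$hp)       # smallest and largest horsepower
print(disp_range)
print(hp_range)

# Limits taken from the data cannot clip any point.
plot(x = input$disp, y = input$hp,
     xlab = "Displacement (cu. in.)", ylab = "Horsepower",
     xlim = disp_range, ylim = hp_range,
     main = "Displacement vs Horsepower")
```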
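A box plot is a graphical summary of a set of data measured on an interval scale: the box spans the lower to the upper quartile, a line inside it marks the median, and whiskers extend towards the smallest and largest values. Box plots are used extensively in descriptive data analysis.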
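The numbers a box plot is built from can be printed directly. In this minimal sketch, fivenum() gives Tukey's five-number summary of the airquality temperatures, and boxplot(..., plot = FALSE) returns the statistics it would draw without drawing anything. The names temps, five and b are illustrative.

```r
# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum.
temps <- airquality$Temp
five <- fivenum(temps)
print(five)

# boxplot() computes its box (hinges) and median line from the same summary;
# the whiskers differ from the min/max only when outliers are present.
b <- boxplot(temps, plot = FALSE)
print(b$stats[, 1])
```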
A box plot in R is created with the boxplot() function. The basic syntax is boxplot(x, data, notch, varwidth, names, main), where:
• x - is a vector or a formula.
• data - is the data frame.
• notch - is a logical value; set it to TRUE to draw a notch.
• varwidth - is a logical value; set it to TRUE to draw each box with a width proportional to the square root of its sample size.
• names - are the group labels printed under each box.
• main - gives the graph a title.
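The notch and varwidth arguments are easiest to understand side by side. This minimal sketch draws monthly temperatures from the built-in airquality data with both switched on, and then reads back the counts and medians that boxplot() returns invisibly. The variable name b is illustrative.

```r
# Notched boxes of monthly temperature; box widths scale with group size.
b <- boxplot(Temp ~ Month, data = airquality,
             notch = TRUE,      # notches that do not overlap suggest the medians differ
             varwidth = TRUE,   # width proportional to sqrt(number of observations)
             main = "Temperature by month")

# boxplot() also returns the numbers it drew.
print(b$n)           # observations per month
print(b$stats[3, ])  # medians (third row of the stats matrix)
```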
Example:
temperature <- airquality$Temp
wind <- airquality$Wind   # column names are case-sensitive: Wind, not wind
boxplot(temperature, wind, main = "Temperature and wind speed",
        names = c("Temperature", "Windspeed"),
        col = c("orange", "red"), border = "brown")
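boxplot() also accepts a named list, in which case the list names become the group labels and the names argument can be left out. A minimal sketch with the same two airquality columns:

```r
# A named list supplies the group labels itself, so names= is not needed.
b <- boxplot(list(Temperature = airquality$Temp, Windspeed = airquality$Wind),
             col = c("orange", "red"), border = "brown",
             main = "Temperature and wind speed")
print(b$names)
```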
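Another example, with one box per month using a formula (the airquality data covers May to September, so there are five groups and five labels):
boxplot(Temp ~ Month, data = airquality, main = "Temperature by month",
        names = c("May", "Jun", "Jul", "Aug", "Sep"),
        xlab = "Month", ylab = "Degrees Fahrenheit",
        col = c("orange", "red"), border = "brown")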
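Rather than typing the month labels, they can be derived from the data with R's built-in month.abb constant. A minimal sketch; months_present, month_labels and b are illustrative names.

```r
# Build the month labels from the data instead of typing them.
months_present <- sort(unique(airquality$Month))  # 5, 6, 7, 8, 9
month_labels <- month.abb[months_present]          # "May" ... "Sep"
b <- boxplot(Temp ~ Month, data = airquality,
             names = month_labels,
             xlab = "Month", ylab = "Degrees Fahrenheit",
             col = c("orange", "red"), border = "brown")
```

The number of labels then always matches the number of boxes, which boxplot() requires.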
A bar chart is drawn with the barplot() function. The basic syntax is barplot(H, xlab, ylab, main, names.arg, col), where:
• H - is a vector or matrix containing the numeric values used in the bar chart.
• xlab - is the label for the x axis.
• ylab - is the label for the y axis.
• main - is the title of the bar chart.
• names.arg - is a vector of names appearing under each bar.
• col - gives colours to the bars in the graph.
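When H is a matrix, barplot() draws one group of bars per column. This minimal sketch uses beside = TRUE to place the bars of each group side by side instead of stacking them. The sales figures and shop names are made up purely for illustration.

```r
# A matrix height gives one group of bars per column, one bar per row.
# The figures below are hypothetical, for illustration only.
sales <- matrix(c(5, 8, 6, 9, 4, 7), nrow = 2,
                dimnames = list(c("Shop A", "Shop B"), c("Jan", "Feb", "Mar")))
mp <- barplot(sales, beside = TRUE,          # side by side instead of stacked
              col = c("skyblue", "orange"),
              legend.text = rownames(sales),
              xlab = "Month", ylab = "Units sold", main = "Sales by shop")
print(mp)  # bar midpoints, in the same shape as the matrix
```

Dropping beside = TRUE gives a stacked bar chart with one bar per month.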
Example:
temperature <- c(7, 12, 35, 25, 40)
months <- c("jan", "feb", "mar", "april", "may")
color <- c("yellow", "red", "blue", "orange", "green")
barplot(temperature, names.arg = months, xlab = "Months", ylab = "Temperature",
        main = "Monthly temperature", col = color)
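barplot() invisibly returns the x position of each bar's centre, which makes it easy to write the values on top of the bars with text(). A minimal sketch using the same temperatures; mp is an illustrative name and ylim is widened only to leave room for the labels.

```r
temperature <- c(7, 12, 35, 25, 40)
months <- c("jan", "feb", "mar", "april", "may")
# barplot() invisibly returns the x position of each bar's centre.
mp <- barplot(temperature, names.arg = months,
              ylim = c(0, 45),  # headroom so the top label is not clipped
              col = c("yellow", "red", "blue", "orange", "green"),
              xlab = "Months", ylab = "Temperature")
# Write each value just above its bar.
text(x = mp, y = temperature, labels = temperature, pos = 3)
```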
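A histogram is drawn with the hist() function. The basic syntax is hist(v, main, xlab, xlim, ylim, breaks, col, border), where:
• v - is a vector containing the numeric values used in the histogram.
• main - is the title of the chart.
• xlab - is the label for the x axis.
• xlim, ylim - are the ranges of the x and y axes.
• breaks - gives the break points between bins, or a suggested number of bins.
• col - is the colour of the bars.
• border - is the colour of the bar borders.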
Example:
k <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)
hist(k, xlab = "weight", col = "yellow", border = "blue")
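Passing explicit break points makes the bins predictable, and hist() returns an object describing what it drew. This minimal sketch reuses the same weights with bins of width 10; h is an illustrative name. By default each bin includes its right edge, and the first bin also includes its left edge.

```r
k <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)
# Fixed break points give bins [0,10], (10,20], (20,30], (30,40], (40,50].
h <- hist(k, breaks = seq(0, 50, by = 10),
          xlab = "weight", col = "yellow", border = "blue")
print(h$breaks)  # the bin edges actually used
print(h$counts)  # values per bin: 2 3 2 3 1
```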
A pie chart is drawn with the pie() function. Example:
y <- c(22, 57, 35, 88)
labels <- c("Volleyball", "Football", "Basketball", "Cricket")
pie(y, labels)
Another example, with a title and one colour per slice:
y <- c(22, 57, 35, 88)
labels <- c("Volleyball", "Football", "Basketball", "Cricket")
pie(y, labels, main = "Sports pie chart", col = rainbow(length(y)))
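Pie slices are easier to read when each label carries its share of the total. This minimal sketch builds "name NN%" labels with round() and paste0(); pct and slice_labels are illustrative names.

```r
y <- c(22, 57, 35, 88)
sports <- c("Volleyball", "Football", "Basketball", "Cricket")
# Turn each slice into a whole-number percentage of the total.
pct <- round(100 * y / sum(y))
slice_labels <- paste0(sports, " ", pct, "%")
pie(y, labels = slice_labels, main = "Sports pie chart", col = rainbow(length(y)))
```

Because each percentage is rounded separately, the labels will not always add up to exactly 100%.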