For more than 10 years, I have been using R for data analysis, data manipulation, and graphing. To demonstrate the power of R, I have created a number of work samples. Please click on the links to read the papers.
I did an initial analysis of single-family home sales in the Madison, Wisconsin area using regression, random forests, and boosting. I used the tidyverse for data manipulation and the ggplot2 package to create charts. The boxplot of home prices by Madison school and by decade uses faceting. The table was built with the flextable package.
I performed a repeat sales analysis using the R slider package.
Madison Housing Repeat Sales Analysis
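As a minimal sketch (not the code from the paper), one way slider can support a repeat sales analysis is to pair each sale with the previous sale of the same home. The data frame, its column names (house_id, sale_year, price), and the helper price_ratio are made up for illustration.

```r
library(slider)

# Made-up sales for two homes; all names and values here are illustrative
sales <- data.frame(
  house_id  = c("A", "A", "A", "B", "B"),
  sale_year = c(2005, 2012, 2020, 2008, 2018),
  price     = c(200000, 220000, 330000, 150000, 210000)
)
sales <- sales[order(sales$house_id, sales$sale_year), ]

# Ratio of each sale price to the previous sale price of the same home.
# .before = 1 gives a window of (previous, current); .complete = TRUE returns
# NA for a home's first sale, where no previous sale exists.
price_ratio <- function(p) {
  slide_dbl(p, ~ .x[2] / .x[1], .before = 1, .complete = TRUE)
}

# Apply the sliding ratio within each home
sales$ratio <- ave(sales$price, sales$house_id, FUN = price_ratio)
print(sales)
```

Each non-missing ratio is one repeat-sale pair, which is the basic unit a repeat sales index is built from.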
I am a Dodgers baseball fan, so I decided to have some fun and compare Trea Turner to Manny Machado as measured by batting average over the past five years.
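# Load the tidyverse: tidyr provides pivot_longer() and ggplot2 provides ggplot()
library(tidyverse)
# The data: batting average by season, one row per player
infielders = data.frame(players = c('Turner', 'Machado'),
                        Y2018 = c(.271, .273),
                        Y2019 = c(.298, .256),
                        Y2020 = c(.335, .304),
                        Y2021 = c(.328, .278),
                        Y2022 = c(.298, .298)
)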
# Reshaping the data from wide to long
inf =
  infielders |>
  pivot_longer(cols = !players,
               names_to = 'years', names_prefix = 'Y', names_transform = as.integer,
               values_to = 'BA')
# Point to highlight (assumed: Turner's 2021 season, to match the annotation below)
highlight_df = subset(inf, players == 'Turner' & years == 2021)
# Line chart of batting average by year, one line per player
ggplot(inf, aes(x = years, y = BA, color = players)) +
  geom_line(linewidth = 1.3) + geom_point() +
  geom_point(data = highlight_df, aes(x = years, y = BA), color = "blue", size = 3.5) +
  labs(title = "Trea Turner versus Manny Machado", subtitle = "Annual Batting Average 2018-2022") +
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
  xlab("Years") +
  ylab("Batting Average") +
  scale_y_continuous(
    labels = scales::number_format(accuracy = 0.001)) +
  annotate("text", x = 2021, y = .335, label = "2021: Trea Turner is MLB batting champ")
As an El Cerrito resident, I have analyzed a lot of El Cerrito data in R (and Tableau). In El Cerrito Data in Pictures, I used the R package ggplot2 to make it easy for people to see how El Cerrito’s finances were doing.
A few years ago, I took the online course by Hastie and Tibshirani based on their book An Introduction to Statistical Learning. Below is some code based on what I learned.
# Plot of residuals versus fitted
plot(fitted(HS09.lm1AVGED), residuals(HS09.lm1AVGED), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lwd = 2)
Statistical Learning with R: Using the API Dataset and a Sim
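The residuals-versus-fitted plot can be tried without the API data. The following is a minimal sketch that swaps in R's built-in mtcars dataset and a hypothetical model named fit; it is a stand-in, not the model from the paper.

```r
# Stand-in model on built-in data (assumption: not the API dataset used in the paper)
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals versus fitted values, with a reference line at zero
plot(fitted(fit), residuals(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lwd = 2)

# Optional lowess smoother to make any curvature easier to see
lines(lowess(fitted(fit), residuals(fit)), lty = 2)
```

If the linear model is adequate, the points should scatter around the zero line with no clear pattern; a curved smoother suggests a missing nonlinear term.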
Before I moved to El Cerrito, I lived in Albany. I decided to see how well Albany High School students were performing on certain standardized tests. I downloaded the data from the California Department of Education website and ran a variety of regression analyses.
The Performance of Albany High School on the API: A Statistical Regression Analysis
Copyright © 2022 IRASHARENOW.COM - All Rights Reserved.