Friday, March 29, 2024

Final Project: Comparative Analysis of Fuel Efficiency in Various Vehicle Types

 


Step 1: Choosing a Dataset

Dataset: Fuel Economy Data from the US Department of Energy (http://www.fueleconomy.gov/feg/download.shtm)

Step 2: Sampling and Hypothesis

Sample Size: 250 vehicles

Null Hypothesis (H0): There is no significant difference in mean fuel efficiency (city MPG) between vehicle types.

Alternative Hypothesis (H1): There is a significant difference in mean fuel efficiency (city MPG) between at least two vehicle types.

Step 3: Write-up Summary

This study aims to determine whether different types of vehicles have statistically significant differences in fuel efficiency. Customers place a high value on fuel economy, and knowledge about the capabilities of various car models can help lawmakers and consumers alike.

This study builds on the class material on analysis of variance (ANOVA) and hypothesis testing, which laid the groundwork for choosing suitable statistical techniques to evaluate differences in fuel efficiency between vehicle types.

I will use a one-way ANOVA to answer the study question. ANOVA is well suited to this comparison because it tests for differences in means across multiple groups. The type of vehicle (compact, SUV, etc.) is the categorical variable, and fuel efficiency (city MPG) is the continuous variable.

The following R code was used to conduct the ANOVA:
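The code itself does not appear in the post, so what follows is only a minimal sketch of how such a one-way ANOVA might look in R. The data frame `vehicles`, its columns `vehicle_type` and `city_mpg`, the four vehicle classes, and the simulated values are all assumptions standing in for the 250-vehicle sample; they are not the author's data or results.

```r
# Hypothetical stand-in for the sampled fuel economy data (250 vehicles).
# Column names, classes, and values are illustrative assumptions only.
set.seed(42)
classes <- c("Compact", "Midsize", "SUV", "Pickup")
class_means <- c(Compact = 28, Midsize = 25, SUV = 20, Pickup = 17)

vehicles <- data.frame(
  vehicle_type = factor(sample(classes, 250, replace = TRUE), levels = classes)
)
vehicles$city_mpg <- rnorm(
  250,
  mean = class_means[as.character(vehicles$vehicle_type)],
  sd = 3
)

# One-way ANOVA: does mean city MPG differ between vehicle types?
fit <- aov(city_mpg ~ vehicle_type, data = vehicles)
anova_table <- summary(fit)
print(anova_table)

# p-value of the F test for the vehicle_type factor
p_value <- anova_table[[1]][["Pr(>F)"]][1]

# Pairwise follow-up comparisons, useful only if the F test rejects H0
print(TukeyHSD(fit))
```

A p-value below the chosen significance level (commonly 0.05) would lead to rejecting H0 in favor of H1.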

Step 4: Generate Visualization and Abstract

Visualization

To show the distribution of fuel efficiency for each type of vehicle, I created a boxplot. This graphical approach allows a clear comparison of the distribution and central tendency of fuel efficiency across vehicle types.

Abstract

This study uses ANOVA to determine whether there are statistically significant differences in fuel efficiency across vehicle classes. The boxplot illustrates these differences graphically and offers insight into their possible effects on customers and the car industry. The results will advance our understanding of how vehicle types differ in fuel efficiency, with consequences for consumer decisions as well as environmental concerns.
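As a rough illustration of the boxplot described above, the sketch below uses base R graphics on simulated data. The data frame `vehicles` and its columns `vehicle_type` and `city_mpg` are hypothetical placeholders, and the post does not say which plotting package was actually used.

```r
# Hypothetical data: 250 vehicles across four assumed classes.
set.seed(42)
classes <- c("Compact", "Midsize", "SUV", "Pickup")
vehicles <- data.frame(
  vehicle_type = factor(sample(classes, 250, replace = TRUE), levels = classes)
)
vehicles$city_mpg <- rnorm(250, mean = 22, sd = 4)

# One box per vehicle type: median, quartiles, whiskers, and outliers
bp <- boxplot(
  city_mpg ~ vehicle_type,
  data = vehicles,
  xlab = "Vehicle type",
  ylab = "City MPG",
  main = "Fuel efficiency by vehicle type"
)
```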

Thursday, March 28, 2024

LIS 4317: Module # 11 Assignment

 
Module # 11 Assignment



I created a scatter plot with marginal histograms for this assignment using R and the ggplot2 package. I first installed and loaded the packages I needed, including ggplot2 and ggExtra. Next, I generated data on annual budget expenditures per capita. I then created the scatter plot with ggplot2, keeping the style basic and adding a linear regression line to show the trend. Finally, I used ggExtra's ggMarginal function to add the marginal plots, specifying the histogram type and displaying them on both axes.

The result combines the scatter plot's view of the relationship between the variables with a summary of each variable's marginal distribution, giving a fuller picture of the data.
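The steps above could be sketched roughly as follows. The data frame `budget`, its columns `year` and `spend_per_capita`, and the simulated values are assumptions made for illustration; the post does not show the generated data or name its variables. The sketch assumes ggplot2 and ggExtra are already installed.

```r
library(ggplot2)
library(ggExtra)

# Hypothetical data: per capita budget expenditure for 50 years
set.seed(1)
budget <- data.frame(year = 2000:2049)
budget$spend_per_capita <- 500 + 12 * (budget$year - 2000) + rnorm(50, sd = 40)

# Basic scatter plot with a linear regression line
p <- ggplot(budget, aes(x = year, y = spend_per_capita)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Year", y = "Expenditure per capita") +
  theme_minimal()

# Add histograms in the margins of both axes
p_marginal <- ggMarginal(p, type = "histogram", margins = "both")
print(p_marginal)
```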

Wednesday, March 20, 2024

LIS 4370: Module # 11 Debugging and defensive programming in R

 

Module # 11 Debugging and defensive programming in R


Bugged Code:

tukey_multiple <- function(x) {
   outliers <- array(TRUE,dim=dim(x))
   for (j in 1:ncol(x))
    {
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j])
    }
outlier.vec <- vector(length=nrow(x))
    for (i in 1:nrow(x))
    { outlier.vec[i] <- all(outliers[i,]) } return(outlier.vec) }



Corrected Code:

tukey_multiple <- function(x) {
  # One logical flag per cell; every column is overwritten in the loop below
  outliers <- array(FALSE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # tukey.outlier() returns a logical vector, so it is assigned directly
    outliers[, j] <- tukey.outlier(x[, j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in seq_len(nrow(x))) {
    # TRUE when row i contains an outlier in at least one column
    outlier.vec[i] <- any(outliers[i, ])
  }
  return(outlier.vec)
}


Explanation:

As soon as I looked over the code, I saw that the loop's update of the outliers array might be flawed: it combines the columns with &&, which works on single logical values rather than element-wise on vectors, so it does not follow the logic of the Tukey method. I initialized the outliers array with FALSE rather than TRUE to prepare it for the output of the tukey.outlier function. I then corrected the update inside the loop so that it identifies outliers for each column of the input matrix x. Finally, I adjusted the logic inside the second loop so that it detects rows having at least one outlier. I found debugging to be a demanding but gratifying task.

After closely examining the code and understanding its underlying logic, I was able to fix the tukey_multiple function.
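For comparison, here is a hedged sketch of two alternatives. The post does not show `tukey.outlier`, so the version below is an assumed definition based on the common 1.5 × IQR rule. `tukey_multiple_all` and `tukey_multiple_any` are hypothetical names: the first keeps the bugged function's structure and only swaps `&&` for the element-wise `&` (in R 4.3.0 and later, `&&` raises an error when given inputs longer than one), so it flags rows that are outliers in every column; the second mirrors the any-column behavior of the corrected code and adds a defensive input check.

```r
# Assumed helper (not shown in the post): flags values beyond 1.5 * IQR
# from the quartiles. The course's own definition may differ.
tukey.outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Minimal repair: same structure as the bugged code, with & instead of &&.
# A row is flagged only if it is an outlier in ALL columns.
tukey_multiple_all <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  apply(outliers, 1, all)
}

# Defensive variant: validate the input, then flag rows that contain an
# outlier in ANY column.
tukey_multiple_any <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  outliers <- apply(x, 2, tukey.outlier)
  apply(outliers, 1, any)
}

# Small illustrative matrix: only column a has an extreme value (row 5)
m <- cbind(a = c(1, 2, 3, 4, 100), b = c(5, 6, 7, 8, 9))
print(tukey_multiple_all(m))
print(tukey_multiple_any(m))
```

Which variant is appropriate depends on whether a row should count as an outlier when every column flags it or when any single column does.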

LIS 4317: Module # 10 assignment

 

Module # 10 assignment


Time series data, defined as observations taken at repeated time intervals, is important in many fields, such as finance, economics, and weather forecasting. Finding patterns, drawing conclusions, and making sound decisions all depend on visualizing such data well. First, we look at the 'economics' dataset bundled with ggplot2, which contains US economic indicators over a number of years, including population, the number of unemployed persons, and the median duration of unemployment. Using a time series plot of unemployment, we first investigate patterns and variations in the dataset.
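A minimal sketch of such a plot is shown below. It uses the `unemploy` column of `economics`, which records the number of unemployed persons in thousands; whether the original figure plotted this column directly or a derived rate is not shown in the post, so treat the choice of variable and the labels as assumptions.

```r
library(ggplot2)

# Line plot of unemployment over time from the built-in economics data
p_unemploy <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(
    x = "Date",
    y = "Unemployed (thousands)",
    title = "US unemployment over time"
  ) +
  theme_minimal()

print(p_unemploy)
```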



By charting the median duration of unemployment over time, we can see possible long-term trends or seasonal patterns. We also explore combined visualizations, placing several time series plots together to examine how the series compare and whether they are correlated. The flexible framework of ggplot2 lets analysts and researchers gain deeper insight into the dynamics of time-dependent phenomena, which supports forecasting and decision-making in a variety of disciplines.
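The two ideas above might be sketched as follows. The loess smoother and the stacked, free-scale facets are illustrative choices rather than the layout used in the original figures; `economics_long` is the long-format version of the same dataset that ships with ggplot2.

```r
library(ggplot2)

# Median duration of unemployment (weeks) with a smoothed trend line
p_duration <- ggplot(economics, aes(x = date, y = uempmed)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Date", y = "Median duration of unemployment (weeks)")

# Combined view: two series stacked in facets that share the time axis
series <- subset(economics_long, variable %in% c("unemploy", "uempmed"))
p_combined <- ggplot(series, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  labs(x = "Date", y = NULL)

print(p_duration)
print(p_combined)
```

Free y scales are used because the two series are measured in different units.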




These visualizations help audiences read complex time series data clearly and with confidence by revealing the trends and patterns within it.

Saturday, March 16, 2024

LIS 4370: Module # 10 Building your own R package


Module # 10 Building your own R package 


GitHub Link: https://github.com/agremer/LIS4370/blob/main/DESCRIPTION%20File

DESCRIPTION File:


Package: DataVisTools
Title: Alec Gremer's Data Visualization Test Implementation
Version: 0.1.0.9000
Authors@R: person("Alec", "Gremer", email = "agremer@usf.edu", role = c("aut", "cre"))
Description: DataVisTools is an R package designed to streamline data visualization tasks by offering a versatile set of functions. It will include interactive visualization features, customizable plot aesthetics, statistical visualization tools, geospatial mapping capabilities, and time series analysis functions. Comprehensive documentation and a CC0 license will ensure ease of use and widespread adoption. Implementation will prioritize S3 classes and methods for flexibility.
Depends: R (>= 3.1.2)
License: CC0
LazyData: true



For the final project, I proposed an R package named DataVisTools, which will serve as a comprehensive toolkit for data visualization tasks. The package will offer functions that simplify creating informative and visually appealing plots for exploratory data analysis and presentation. Key features will include interactive visualization functions built on Plotly and ggplot2, customization options for plot aesthetics, statistical visualization tools such as histograms and box plots, geospatial visualization for mapping data points, and time series visualization functions for analyzing trends and seasonality. To ensure usability and clarity, DataVisTools will come with thorough documentation, including long-form vignettes and metadata. The package will be released under the CC0 License, which allows free use, modification, and distribution and does not require attribution. Implementation will primarily use S3 classes and methods for flexibility and simplicity.
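To make the S3 approach concrete, here is a small sketch of what an S3 class with a constructor and two methods could look like. The class name `vis_spec`, the constructor `new_vis_spec`, and its fields are hypothetical and invented for illustration; they are not part of the proposed package.

```r
# Hypothetical S3 class describing a simple two-variable plot
new_vis_spec <- function(data, x, y, title = "") {
  stopifnot(is.data.frame(data), x %in% names(data), y %in% names(data))
  structure(
    list(data = data, x = x, y = y, title = title),
    class = "vis_spec"
  )
}

# print method: dispatched automatically when a vis_spec object is printed
print.vis_spec <- function(x, ...) {
  cat("<vis_spec> ", x$y, " vs ", x$x, " (", nrow(x$data), " rows)\n", sep = "")
  invisible(x)
}

# plot method: draws a base R scatter plot from the stored specification
plot.vis_spec <- function(x, ...) {
  plot(x$data[[x$x]], x$data[[x$y]],
       xlab = x$x, ylab = x$y, main = x$title, ...)
  invisible(x)
}

spec <- new_vis_spec(mtcars, x = "wt", y = "mpg", title = "MPG by weight")
print(spec)
plot(spec)
```

Because S3 dispatch works on the class attribute, users can call the familiar generics `print()` and `plot()` without learning new function names.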

Monday, March 4, 2024

LIS 4370: Module # 9 Visualization in R


Module # 9 Visualization in R 


The dataset I chose for this assignment contains the total number of votes in Florida by county, saved as "Florida.csv".


Base R Graphics (Bar plot):

This kind of plot can be made with base R functions, without any additional packages. It is easy to create and is appropriate for basic visualizations. The bars display the votes for each county, which makes it simple to compare total votes across counties. Compared with the other packages, however, there are fewer options for customization.
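A base R bar plot of this kind might look like the sketch below. The contents and column names of "Florida.csv" are not shown in the post, so the data frame `florida`, its columns `county` and `total_votes`, and the vote counts are placeholder assumptions, not real election figures.

```r
# Placeholder data standing in for Florida.csv (values are not real)
florida <- data.frame(
  county = c("Alachua", "Baker", "Bay", "Bradford", "Brevard"),
  total_votes = c(1200, 300, 900, 250, 2100)
)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(
  florida$total_votes,
  names.arg = florida$county,
  las = 2,                      # rotate county labels to avoid overlap
  ylab = "Total votes",
  main = "Total votes by county"
)
```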


Lattice Package (Dot plot):

Lattice provides a high-level interface for plotting. The dot plot displays the total votes for each county using dots, which stay readable with a large number of categories, where bars can become difficult to read. Compared to base R graphics, Lattice provides greater customization options for the plot's design and arrangement.
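A lattice dot plot along these lines could be sketched as follows, again with a placeholder data frame `florida` whose column names and values are assumptions rather than the contents of "Florida.csv".

```r
library(lattice)

# Placeholder data standing in for Florida.csv (values are not real)
florida <- data.frame(
  county = c("Alachua", "Baker", "Bay", "Bradford", "Brevard"),
  total_votes = c(1200, 300, 900, 250, 2100)
)

# Counties on the vertical axis, ordered by vote total for easier reading
dp <- dotplot(
  reorder(county, total_votes) ~ total_votes,
  data = florida,
  xlab = "Total votes",
  main = "Total votes by county"
)
print(dp)
```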




ggplot2 Package (Box plot):

Because of its grammar of graphics approach, ggplot2 is a popular and robust package for making graphics in R. The box plot illustrates the distribution of total votes across counties by showing the median, the quartiles, and any outliers. ggplot2 offers extensive customization options, so users can build highly customized, publication-quality plots. Its layered approach makes it simple to aid interpretation by adding more layers to the plot.
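The layered approach can be illustrated with the sketch below, which draws a single box for the county vote totals and then adds the individual counties as a second layer. The data frame `florida`, its column names, and its values are placeholder assumptions, not the actual dataset.

```r
library(ggplot2)

# Placeholder data standing in for Florida.csv (values are not real)
florida <- data.frame(
  county = c("Alachua", "Baker", "Bay", "Bradford", "Brevard"),
  total_votes = c(1200, 300, 900, 250, 2100)
)

# Layer 1: box plot of the distribution; layer 2: one point per county
p_box <- ggplot(florida, aes(x = "", y = total_votes)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, height = 0) +
  labs(x = NULL, y = "Total votes", title = "Distribution of total votes") +
  theme_minimal()

print(p_box)
```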


Each package has its own advantages and disadvantages in terms of user-friendliness, customization, and plotting capabilities. Base R graphics are straightforward and simple to use but lack advanced customization. Lattice offers higher-level plotting functions with more flexibility. For making intricate and visually appealing plots, ggplot2 is the best solution due to its wide range of customization options. Which package to select depends on the particular needs and preferences of the user.

LIS 4317: Module #9 assignment




The following matrix was constructed in ggplot2 using the 'iris' dataset:


Each panel in this scatterplot matrix plots two variables, sepal length and sepal width, against each other. By faceting the data points according to species, we can compare the relationship between the variables across the different species of iris.
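A faceted plot of this kind might be built as in the sketch below. The theme, labels, and color mapping are assumptions, since the original figure is not reproduced here.

```r
library(ggplot2)

# Sepal length vs sepal width, one facet per iris species
p_iris <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  labs(
    x = "Sepal length (cm)",
    y = "Sepal width (cm)",
    title = "Sepal dimensions by iris species"
  ) +
  theme_bw()

print(p_iris)
```

By default `facet_wrap()` gives every panel the same axis scales, which keeps the species directly comparable.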


The five principles of design were implemented in this visualization as follows:

Alignment: I ensured proper alignment of axis labels, data points, and facets to create a sense of order and organization in the plot. Each plot in the matrix was aligned properly with consistent axis labels.

Repetition: I repeated design elements such as axis labels and faceting to create consistency and coherence throughout the visualization. Consistency in representation helps viewers understand the relationships between different variables and categories.

Contrast: I used contrast to highlight important elements and relationships. Different colors were used for the different species of iris to distinguish between them easily. This helps viewers identify patterns specific to each species.

Proximity: I grouped related elements together to improve readability and comprehension. Data points representing the same species were grouped together within each facet. This proximity allows viewers to compare the relationships between variables within each species.

Balance: I distributed elements evenly throughout the visualization to create a sense of equilibrium. Each facet in the scatterplot matrix contained enough data points to provide meaningful insights, and the visual weight of the plot was balanced by adjusting the size and spacing of facets.

“Ethical Concerns on the Deployment of Self-driving Cars”: A Policy and Ethical Case Study Analysis

Alec Gremer University of South Florida LIS4414.001U23.50440 Information Policy and Ethics Dr. John N. Gathegi, June 12th, 2023 “The Ethical...