Thursday, February 29, 2024

LIS 4370: Module # 8 Input/Output, string manipulation and plyr package

 






This R code demonstrates effective data manipulation and analysis, providing a compact pipeline for handling datasets. It starts by using the read.table() function to let the user import any dataset they want. The code then computes the average grade for every gender category with the plyr package and stores the results in a new data frame, which the write.table() function records to a CSV file called "Sorted_Average.csv". Finally, the code combines the grepl() and subset() functions to filter the dataset so that it keeps only the rows whose names begin with the letter 'i', regardless of case, and writes the filtered subset to a second CSV file called "DataSubset.csv", giving users a refined dataset that meets particular criteria.
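The steps above can be sketched as follows. The column names (Name, Gender, Grade) are assumptions, so a small inline data frame stands in for the user-supplied file:

```r
library(plyr)

# In practice the data would come from a user-chosen file, e.g.:
#   dat <- read.table("grades.txt", header = TRUE, sep = "\t")
# A small inline example stands in for it here:
dat <- data.frame(Name   = c("Ivy", "Bob", "ian", "Ann"),
                  Gender = c("F", "M", "M", "F"),
                  Grade  = c(90, 80, 70, 100),
                  stringsAsFactors = FALSE)

# Average grade for every gender category, stored in a new data frame
avg_by_gender <- ddply(dat, .(Gender), summarize,
                       AverageGrade = mean(Grade, na.rm = TRUE))

# Record the results to a CSV file
write.table(avg_by_gender, "Sorted_Average.csv",
            sep = ",", row.names = FALSE)

# Keep only rows whose Name begins with 'i', regardless of case
subset_i <- subset(dat, grepl("^i", Name, ignore.case = TRUE))
write.table(subset_i, "DataSubset.csv", sep = ",", row.names = FALSE)
```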



With a focus on readability and modularity, this code embodies a solid approach to data analysis and manipulation in R. By integrating essential functions from widely used packages such as plyr, it streamlines the importing, summarizing, and filtering of datasets and so facilitates further exploration and interpretation. Its well-organized structure and clear division of tasks make it a useful template for data scientists and analysts at different skill levels who want to draw insights from a variety of datasets.

LIS 4317: Module # 8 Correlation Analysis and ggplot2

 





In visual data representations, Few's guidelines place a strong emphasis on correctness, simplicity, and clarity. The data visualization community holds Few's principles in high regard, and they have had a big impact on how data is displayed and understood.


Best practices in data visualization align well with Few's emphasis on clarity and simplicity. His guidelines help ensure that visualizations are comprehensible and accessible to a broad audience by foregrounding the key insights and reducing needless visual clutter. The emphasis on accuracy also underscores how important it is to represent data honestly and to avoid misleading depictions.

Thursday, February 22, 2024

LIS 4370: Module # 7 R Object: S3 vs. S4 assignment




Using variables like mpg, cyl, disp, horsepower, and others, I was able to get data on different automobile models using the "mtcars" dataset in R. It is basically a data frame with rows denoting various automobile models and columns denoting their respective properties. 

The mtcars dataset is suitable for generic functions. On this dataset, you can construct functions that do tasks like generating summaries of statistics, creating visualizations, or conducting analyses.

Strictly speaking, mtcars already belongs to an OO system: a data frame is an S3 object whose class attribute is "data.frame". It is not an S4 object, however; R's S4 system requires formal class definitions, and mtcars defines no such structure of its own.


How do you tell what OO system (S3 vs. S4) an object is associated with?

The class() function returns the class or classes an object inherits from, and isS4() tells you directly which system an object belongs to: it returns TRUE for S4 objects and FALSE otherwise. S3 objects are recognizable by an ordinary class attribute combined with isS4() returning FALSE.


How do you determine the base type (like integer or list) of an object?

The typeof() function can be used to find the base type of an object in R. It returns a string naming that type, such as "integer", "double", "list", or "character".
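Applying these inspection functions to mtcars illustrates both answers:

```r
# Inspecting mtcars with class(), isS4(), and typeof()
class(mtcars)      # "data.frame" -> an S3 class
isS4(mtcars)       # FALSE        -> not an S4 object
typeof(mtcars)     # "list"       -> a data frame's base type is a list
typeof(mtcars$mpg) # "double"     -> each column has its own base type
```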


What is a generic function?

In R, a function that has many implementations based on the type of arguments it receives is called a generic function. It gives various object types a common interface via which to carry out comparable functions. To alter the behavior of the generic function, users can define methods for various classes.
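A minimal sketch of an S3 generic with two methods; the function names here are illustrative, not from the assignment:

```r
# A generic function dispatches on the class of its first argument
describe <- function(x) UseMethod("describe")

# Method for numeric vectors
describe.numeric <- function(x) {
  paste("numeric vector, mean =", round(mean(x), 2))
}

# Method for factors
describe.factor <- function(x) {
  paste("factor with", nlevels(x), "levels")
}

describe(c(1, 2, 3))           # dispatches to describe.numeric
describe(factor(c("a", "b")))  # dispatches to describe.factor
```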


What are the main differences between S3 and S4?

S3: Methods are defined and dispatched informally, based on the class attribute of the object. It lacks formal class definitions and method signatures, but it is flexible and straightforward to use.

S4: It permits multiple inheritance and requires specific class definitions. Methods are declared explicitly and are called according to the object's class and method signature. Although more complex at times, it provides improved order and control.
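A short S4 sketch showing the formal class definition and explicit method declaration described above; the Car class and its slots are hypothetical examples:

```r
library(methods)

# Formal (S4) class definition with typed slots
setClass("Car", slots = c(model = "character", mpg = "numeric"))

# Explicitly declared generic and method
setGeneric("efficiency", function(object) standardGeneric("efficiency"))
setMethod("efficiency", "Car", function(object) {
  if (object@mpg >= 25) "efficient" else "thirsty"
})

civic <- new("Car", model = "Civic", mpg = 32)
efficiency(civic)  # dispatched on the formal class "Car"
isS4(civic)        # TRUE
```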


Wednesday, February 21, 2024

LIS 4317: Module # 7 assignment




I analyzed the variable distributions and made comparisons between them using a grid of scatter plots in this visual analytics exercise using the mtcars dataset in R. Details on different car models, such as weight, horsepower (hp), and miles per gallon (mpg), are included in the mtcars dataset.





Histograms provide us a visual representation of each variable's distribution, while scatter plot grids make it easier to compare two variables at a time and see possible patterns or links.
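A sketch of the kind of scatter-plot grid and histograms described, using three mtcars variables chosen for illustration:

```r
# Select a few variables to compare
vars <- mtcars[, c("mpg", "hp", "wt")]

# Scatter-plot matrix: every pair of variables plotted against each other
pairs(vars, main = "Scatter-plot matrix of mpg, hp, and wt")

# One histogram per variable to show each distribution
par(mfrow = c(1, 3))
for (v in names(vars)) {
  hist(vars[[v]], main = paste("Histogram of", v), xlab = v)
}
par(mfrow = c(1, 1))
```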




Few's recommendations are open to debate, though, since they might not always be the most effective or appropriate strategy for every dataset or analytical objective. For example, scatter plots are beneficial for identifying correlations between variables, but they may not be ideal for datasets with a large number of variables or complex relationships. Other graphical approaches or statistical tools may be better suited in these kinds of situations.

Although Few's suggestions offer a useful framework for performing visual analytics, it's crucial to take the dataset's unique properties and the analytical goals into account when selecting which visualization approaches to employ.




Wednesday, February 14, 2024

LIS 4317: Module # 6 assignment

 



Few highlights how crucial simplicity, efficacy, and clarity are in visualizations. His ideas are followed in this example, where we have generated a basic histogram shaded in dark green. Viewers can easily comprehend the data distribution because the visualization conveys it plainly.

Yau stresses how crucial narrative and context are to data visualization. Our simple histogram represents the data distribution clearly because it contains contextual details, such as labels, annotations, and research notes, that help paint a picture of the data. Components like these are crucial to include in more intricate studies in order to produce a visualization that is more informative.
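A minimal sketch of such a histogram; the dataset and variable (mtcars$mpg) are assumed here for illustration:

```r
# Basic histogram shaded dark green, with labels for context
hist(mtcars$mpg,
     main   = "Distribution of Miles per Gallon",
     xlab   = "Miles per gallon (mpg)",
     col    = "darkgreen",
     border = "black")
```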

LIS4370: Module # 6 Doing math in R part 2



1a) Sum of matrices A and B:

> print(sum_result)

     [,1] [,2]

[1,]    7    5

[2,]    2    2


1b) Difference of matrices A and B:

> print(diff_result)

     [,1] [,2]

[1,]   -3   -3

[2,]   -2    4
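The source matrices were not reproduced above, but a pair consistent with the printed results can be recovered as (sum + diff)/2 and (sum - diff)/2:

```r
# Matrices inferred from the printed sum and difference above
A <- matrix(c(2, 0, 1, 3), nrow = 2)   # columns: (2, 0) and (1, 3)
B <- matrix(c(5, 2, 4, -1), nrow = 2)  # columns: (5, 2) and (4, -1)

sum_result  <- A + B   # reproduces the output in 1a)
diff_result <- A - B   # reproduces the output in 1b)
print(sum_result)
print(diff_result)
```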


2) Source Code: 

diag_matrix <- diag(c(4, 1, 2, 3), nrow = 4, ncol = 4)


Output:

     [,1] [,2] [,3] [,4]
[1,]    4    0    0    0
[2,]    0    1    0    0
[3,]    0    0    2    0
[4,]    0    0    0    3

3) Source Code:

# Named m to avoid masking the base matrix() function
m <- matrix(c(3, 1, 1, 1, 1,
              2, 3, 0, 0, 0,
              2, 0, 3, 0, 0,
              2, 0, 0, 3, 0,
              2, 0, 0, 0, 3),
            nrow = 5, byrow = TRUE)

print(m)

Output:
 
    [,1] [,2] [,3] [,4] [,5]
[1,]    3    1    1    1    1
[2,]    2    3    0    0    0
[3,]    2    0    3    0    0
[4,]    2    0    0    3    0
[5,]    2    0    0    0    3

Thursday, February 8, 2024

LIS 4370: Module # 5 Doing Math

 




First, two matrices, A and B, with dimensions of 10x10 and 10x100 respectively, are defined as described. The code then computes the determinant of matrix A using the det() function; a non-zero value would indicate that A is invertible, while a zero determinant means A is singular. Because matrix A's determinant is zero, the code prints the alert "Matrix A is singular and doesn't have an inverse."

Following this, the code tries to print matrix A's inverse, but because of the singularity there is nothing to print: the output shows only the header "Inverse of matrix A:" followed by a blank line. The determinant of matrix A is then printed. Finally, because matrix B is not a square matrix, its determinant cannot be calculated, and the code prints "Determinant of matrix B cannot be calculated as it is not a square matrix." All things considered, the exercise shows how to test for matrix singularity, compute a determinant, and handle non-square matrices in R.

The following console transcript was produced:

> # Define matrices A and B

> A <- matrix(1:100, nrow = 10)

> B <- matrix(1:1000, nrow = 10)

> # Calculate determinant of matrix A

> det_A <- det(A)

> # Check if matrix A is invertible

> if (det_A != 0) {

+   # Matrix A is invertible, so calculate its inverse

+   A_inv <- solve(A)

+ } else {

+   # Matrix A is singular and doesn't have an inverse

+   cat("Matrix A is singular and doesn't have an inverse.\n")

+   A_inv <- NULL

+ }

Matrix A is singular and doesn't have an inverse.

> # Print results for matrix A

> cat("Inverse of matrix A:\n")

Inverse of matrix A:

> if (!is.null(A_inv)) {

+   print(A_inv)

+ }

> cat("\nDeterminant of matrix A:", det_A, "\n")


Determinant of matrix A: 0 

> # Print message for matrix B

> cat("\nDeterminant of matrix B cannot be calculated as it is not a square matrix.\n")


Determinant of matrix B cannot be calculated as it is not a square matrix.

LIS 4317: Module # 5 assignment



For this week's assignment, I used an area (shaded-line) graph to visualize the data. There are several benefits to using an area graph to show position over time. First, it facilitates the analysis of complicated or fluctuating data by clearly highlighting the direction and magnitude of positional change over time. This type of graph is especially helpful for uncertain or probabilistic data: by encapsulating a range of possible positions within the shaded area, it offers a visual depiction of variability in the measurements. When the shaded region marks confidence intervals or uncertainty boundaries, viewers can judge the degree of potential variation and the reliability of the position estimates. Shaded line graphs also make it simpler to compare different courses or trajectories and to identify positional discrepancies between them. In general, a shaded line graph provides a thorough representation of position across time, capturing both central tendency and variability, which improves comprehension of the data.
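An illustrative sketch of a shaded-line graph with an uncertainty band; the data here are simulated for demonstration only:

```r
set.seed(42)  # reproducible simulated data

time     <- 1:20
position <- cumsum(rnorm(20, mean = 1))  # a drifting position series
lower    <- position - 1.5               # assumed lower uncertainty bound
upper    <- position + 1.5               # assumed upper uncertainty bound

# Empty plot sized to fit the band, then the shaded region, then the line
plot(time, position, type = "n", ylim = range(lower, upper),
     xlab = "Time", ylab = "Position",
     main = "Position over time with uncertainty band")
polygon(c(time, rev(time)), c(upper, rev(lower)),
        col = "lightblue", border = NA)  # fills between the bounds
lines(time, position, lwd = 2)
```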


Saturday, February 3, 2024

LIS 4317: Module # 4 assignment

 




The variables I have included in the chart above from the Monthly Modal Time Series are Collisions with a Motor Vehicle, Collisions with a Person, Primary USA Code, Ridership, Vehicle Revenue, and Vehicle Revenue Miles. My reasoning rests on what each variable can tell me about the data as a whole: Collisions with a Person, Primary USA Code, and Ridership speak to correlations involving people, while Collisions with a Motor Vehicle, Vehicle Revenue, and Vehicle Revenue Miles give more information about the correlation between vehicles and cost.

All of the variables included will be factored over the period of time of which the study was conducted. This will give me the opportunity to better illustrate the graph based off of the data collected.

LIS 4370: Module # 4 Programming structure in R

 




The first boxplot will visually show the distribution of blood pressure for patients grouped by their initial assessment, and the histogram gives an overall picture of the blood pressure distribution. A second boxplot displays how the first assessment varies across blood pressure levels, and a second histogram shows the counts of "bad" and "good" initial assessments.

Looking at these plots and identifying trends, outliers, and the central tendency of the data gives an understanding of the relationship between blood pressure and the general practitioner's initial assessment.


Source Code:


Frequency <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)

BP <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)

First <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)

Second <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)

FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)


# Boxplots and Histograms

boxplot(BP ~ First, main="Boxplot of Blood Pressure by First Assessment", xlab="First Assessment (1=bad, 0=good)", ylab="Blood Pressure")


hist(BP, main="Histogram of Blood Pressure", xlab="Blood Pressure", col="lightblue", border="black")


boxplot(First ~ BP, main="Boxplot of First Assessment by Blood Pressure", xlab="Blood Pressure", ylab="First Assessment (1=bad, 0=good)")


hist(First, main="Histogram of First Assessment", xlab="First Assessment (1=bad, 0=good)", col="lightgreen", border="black")
