Posts

Showing posts from February, 2024

LIS 4370: Module # 8 Input/Output, string manipulation and plyr package

This R code demonstrates a complete workflow for importing, summarizing, and filtering a dataset. It starts by calling the read.table() function to let the user import any dataset they want. Using the plyr package, the code computes the average grade for each gender category and stores the results in a new data frame. The write.table() function then writes those results to a CSV file called "Sorted_Average.csv". The code also uses the grepl() and subset() functions to filter the dataset so that it keeps only the rows whose names begin with the letter 'i', regardless of case. The filtered subset is written to a separate CSV file called "DataSubset.csv", giving users a refined dataset that meets particular criteria. With a focus on readability and modularity, this code embodies a so...
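The workflow described above can be sketched roughly as follows. This is a minimal illustration rather than the post's actual code: the column names (Name, Sex, Grade), the sample rows, and the temporary output directory are all assumptions, and the base R aggregate() fallback is included only in case plyr is not installed.

```r
# Hypothetical stand-in for the imported dataset; the post instead reads a
# user-chosen file with read.table().
students <- data.frame(
  Name  = c("Ivan", "isabel", "Raul", "Lina"),
  Sex   = c("Male", "Female", "Male", "Female"),
  Grade = c(80, 90, 70, 100)
)

# Average grade per gender category, using plyr when it is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  avg <- plyr::ddply(students, "Sex", plyr::summarise,
                     Grade.Average = mean(Grade))
} else {
  avg <- aggregate(Grade ~ Sex, data = students, FUN = mean)
  names(avg)[2] <- "Grade.Average"
}

# Write the averages as comma-separated values (temporary directory assumed).
out_dir <- tempdir()
write.table(avg, file.path(out_dir, "Sorted_Average.csv"),
            sep = ",", row.names = FALSE)

# Keep only rows whose Name starts with "i" or "I".
i_names <- subset(students, grepl("^i", Name, ignore.case = TRUE))
write.table(i_names, file.path(out_dir, "DataSubset.csv"),
            sep = ",", row.names = FALSE)
```

The "^i" pattern anchors the match to the start of the name; dropping the "^" would instead keep any name that contains the letter anywhere.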

LIS 4317: Module # 8 Correlation Analysis and ggplot2

Few's guidelines for visual data representation place a strong emphasis on correctness, simplicity, and clarity. The data visualization community holds Few's principles in high regard, and they have had a big impact on how data is displayed and understood. Best practices in data visualization align well with Few's emphasis on clarity and simplicity. His guidelines help ensure that visualizations are comprehensible and accessible to a broad audience by emphasizing the communication of important insights and reducing needless visual clutter. The emphasis on accuracy also highlights how crucial it is to present data honestly and avoid deceptive representations.

LIS 4370: Module # 7 R Object: S3 vs. S4 assignment

Using the "mtcars" dataset in R, I was able to get data on different automobile models through variables like mpg, cyl, disp, hp (horsepower), and others. It is a data frame with rows denoting car models and columns denoting their respective properties. The mtcars dataset is suitable for generic functions: you can write functions on it that generate summary statistics, create visualizations, or conduct analyses. R's S3 and S4 systems are object-oriented programming systems in which classes and methods for objects are defined. Because mtcars is a data frame, and "data.frame" is an S3 class, S3 generics such as summary() and print() dispatch on it directly. It is not an S4 object, so using it with S4 methods would first require defining a formal class with setClass(). How do you tell what OO system (S3 vs. S4) an object is associated with? The class() function shows an object's class, and isS4() reports whether the object belongs to the S4 system. The class or classes from which the object is derive...
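A short sketch of how these checks might look on mtcars. The S4 class CarRecord and its slots are hypothetical names introduced purely for illustration; they do not come from the post.

```r
library(methods)  # provides setClass(), new(), and isS4()

# mtcars is a data frame, which is an S3 class.
class(mtcars)      # the class attribute of the object
is.object(mtcars)  # TRUE when the object carries a class attribute
isS4(mtcars)       # FALSE for S3 objects such as data frames

# S3 generics dispatch on that class attribute.
summary(mtcars$mpg)

# Hypothetical S4 class wrapping one row of mtcars.
setClass("CarRecord", representation(model = "character", mpg = "numeric"))
car <- new("CarRecord", model = rownames(mtcars)[1], mpg = mtcars$mpg[1])
isS4(car)          # TRUE for instances of a formal S4 class
```

An object that is not S4 but has a class attribute can be treated as S3, which is the case for mtcars here.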

LIS 4317: Module # 7 assignment

In this visual analytics exercise, I used the mtcars dataset in R to analyze the variable distributions and compare them using a grid of scatter plots. The mtcars dataset includes details on different car models, such as weight (wt), horsepower (hp), and miles per gallon (mpg). Histograms give a visual representation of each variable's distribution, while scatter plot grids make it easier to compare two variables at a time and spot possible patterns or relationships. Few's recommendations are open to debate, since they might not always be the most effective or appropriate strategy for every dataset or analytical objective. For example, scatter plots are useful for identifying correlations between variables, but they may not be ideal for datasets with a large number of variables or complex relationships. Other graphical approaches or statistical tools may be more appropriate in these situations. Although Few's sug...
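One possible way to produce such a grid and the accompanying histograms is sketched below. It uses base R graphics, which is an assumption; the post does not show which plotting functions were used, and the choice of the three columns follows the variables named above.

```r
# Select the three variables mentioned in the post.
vars <- mtcars[, c("mpg", "hp", "wt")]

# Scatter plot grid: every pair of variables in one figure.
pairs(vars, main = "mtcars: mpg, hp, and wt")

# One histogram per variable, side by side.
old_par <- par(mfrow = c(1, 3))
for (v in names(vars)) {
  hist(vars[[v]], main = v, xlab = v)
}
par(old_par)  # restore the previous layout
```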

LIS 4317: Module # 6 assignment

Few highlights how crucial simplicity, efficacy, and clarity are in visualizations. This example follows Few's ideas: we generated a basic histogram shaded in dark green. The visualization conveys the data distribution well, so viewers can easily comprehend it. Yau stresses how crucial narrative and context are to data visualization. Our simple histogram represents the data distribution clearly because it contains contextual details like labels, annotations, and research details that help paint a picture of the data. These components are crucial to include in more intricate studies in order to produce a more informative picture.

LIS 4370: Module # 6 Doing math in R part 2

1a) Sum of matrices A and B:
> print(sum_result)
     [,1] [,2]
[1,]    7    5
[2,]    2    2

1b) Difference of matrices A and B:
> print(diff_result)
     [,1] [,2]
[1,]   -3   -3
[2,]   -2    4

2) Source Code:
diag_matrix <- diag(c(4, 1, 2, 3), nrow = 4, ncol = 4)
Output:

3) Source Code:
matrix <- matrix(c(3, 1, 1, 1, 1,
                   2, 3, 0, 0, 0,
                   2, 0, 3, 0, 0,
                   2, 0, 0, 3, 0,
                   2, 0, 0, 0, 3),
                 nrow = 5, byrow = TRUE)
print(matrix)
Output:
      [,1] [,2] [,3] [,...
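The excerpt does not show how A and B were defined, so the sketch below assumes two 2x2 matrices chosen only because they reproduce the printed sum and difference. It also shows an alternative way to build the 5x5 matrix from part 3 by starting from a diagonal matrix; the name m is hypothetical.

```r
# Assumed inputs (not shown in the post); picked to match the printed results.
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)

sum_result  <- A + B  # element-wise sum
diff_result <- A - B  # element-wise difference
print(sum_result)
print(diff_result)

# Part 3 alternative: 3 on the diagonal, then fill the first row and column.
m <- diag(3, 5)
m[1, 2:5] <- 1
m[2:5, 1] <- 2
print(m)
```

Building from diag() avoids typing all 25 entries, at the cost of being less visually direct than the byrow = TRUE layout in the post.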

LIS 4370: Module # 5 Doing Math

First, two matrices, A and B, with dimensions 10x10 and 10x100 respectively, are defined as described. The code then computes the determinant of matrix A using the det() function; a non-zero determinant indicates that matrix A is nonsingular, and therefore invertible. When matrix A's determinant is zero, showing that it is singular, the message "Matrix A is singular and doesn't have an inverse" is printed. The code then tries to print matrix A's inverse, but because A is singular, the output is incomplete and shows only "Inverse of matrix A:" followed by a blank line. Next, the code prints the determinant of matrix A. Because matrix B is not square, its determinant cannot be calculated, so the message "Determinant of matrix B cannot be calculated as it is not a square matrix" is printed. All things considered, the process shows how to determine matrix singula...
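The checks described above could be organized along these lines. This is a sketch under assumptions: the helper describe_matrix(), its tolerance argument, and the small example matrices are hypothetical and stand in for the 10x10 and 10x100 matrices from the post.

```r
# Classify a matrix as non-square, singular, or invertible.
# A tolerance is used because det() works in floating point.
describe_matrix <- function(m, tol = 1e-10) {
  if (nrow(m) != ncol(m)) {
    return("not square: determinant cannot be calculated")
  }
  if (abs(det(m)) < tol) {
    return("singular: no inverse")
  }
  "invertible"
}

S <- matrix(c(1, 2, 2, 4), nrow = 2)  # second column is twice the first
R <- matrix(1:6, nrow = 2)            # 2x3, so not square
C <- matrix(c(2, 0, 0, 2), nrow = 2)  # diagonal, clearly invertible

print(describe_matrix(S))
print(describe_matrix(R))
print(describe_matrix(C))

# solve() returns the inverse only when the matrix is invertible.
C_inv <- solve(C)
print(C_inv)
```

Checking for squareness and singularity before calling solve() avoids the partial output the post describes, where the "Inverse of matrix A:" label is printed with nothing after it.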

LIS 4317: Module # 5 assignment

For this week's assignment, I used the Area/Shaded Line graph to visualize the data. There are various benefits to using an area graph to see position over time. Primarily, it facilitates the analysis of complicated or fluctuating data patterns by clearly highlighting the direction and magnitude of positional change over time. This type of graph is especially helpful for uncertain or probabilistic data, because it depicts variability or uncertainty in the measurements by enclosing a range of possible positions within the shaded area. Furthermore, if the shaded region indicates confidence intervals or uncertainty bounds, viewers can gauge the degree of potential variation and the reliability of the position estimates. Colored line graphs make it simpler to compare different courses or trajectories, showing how they have changed over time and where their positions diverge. In general, shaded line ...

LIS 4317: Module # 4 assignment

The chart above uses data from the Monthly Modal Time Series, including Collisions with a Motor Vehicle, Collisions with a Person, Primary USA Code, Ridership, Vehicle Revenue, and Vehicle Revenue Miles. I chose these variables for what they can tell me about the data as a whole. The Collisions with a Person, Primary USA Code, and Ridership variables tell me more about correlations in the data related to people, while the Collisions with a Motor Vehicle, Vehicle Revenue, and Vehicle Revenue Miles variables tell me more about the correlation between vehicles and cost. All of the variables are considered over the time period in which the study was conducted. This gives me the opportunity to better illustrate the graph based on the data collected.

LIS 4370: Module # 4 Programming structure in R

The boxplot will visually show the distribution of blood pressure for patients grouped by their initial assessment. The histogram gives an overall picture of the blood pressure distribution across patients. A boxplot will display how the first assessment changes across blood pressure levels. "Bad" and "good" initial assessments will be represented by a histogram. Examining these plots for trends, outliers, and the central tendency of the data gives an understanding of the relationship between blood pressure and the general practitioner's initial assessment.

Source Code:
Frequency <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
BP <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
First <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
Second <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
# Boxplots and Histogra...
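Since the plotting code is cut off in the excerpt, here is one way the boxplot and histogram might be drawn from the vectors above. The label coding (1 = "bad", 0 = "good"), the titles, and the axis labels are assumptions, not taken from the post.

```r
# Data as given in the post.
BP    <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
First <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)

# Assumed coding: 0 = "good", 1 = "bad". The NA entry stays NA.
first_label <- factor(First, levels = c(0, 1), labels = c("good", "bad"))

# Blood pressure grouped by first assessment; the NA row is dropped.
bp_box <- boxplot(BP ~ first_label,
                  main = "Blood pressure by first assessment",
                  xlab = "First assessment",
                  ylab = "Blood pressure")

# Overall distribution of blood pressure across all ten patients.
bp_hist <- hist(BP,
                main = "Distribution of blood pressure",
                xlab = "Blood pressure")
```

Both boxplot() and hist() return their computed statistics invisibly, so bp_box$n and bp_hist$counts can be inspected to confirm how many patients fall into each group or bin.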