Sunday, September 24, 2023

Module # 6 Assignment

 



Part A.


a) To compute the mean of the population, you sum up all the values and divide by the total number of values:


Population mean = (8 + 14 + 16 + 10 + 11) / 5 = 59 / 5 = 11.8
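This arithmetic can be checked directly in R with the built-in mean() function:

```r
# Ice cream purchases for each of the five population members
population <- c(8, 14, 16, 10, 11)

# Population mean: the sum of the values divided by how many there are
sum(population) / length(population)  # 59 / 5 = 11.8
mean(population)                      # same result: 11.8
```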


b) You can use the sample() function in R to randomly select a sample of size 2 from the population, such as this one:

population <- c(8, 14, 16, 10, 11)

sample_size <- 2

sample_data <- sample(population, size = sample_size)


c) To compute the mean and standard deviation of the sample, draw it from the population and apply the mean() and sd() functions:

population <- c(8, 14, 16, 10, 11)

sample_data <- sample(population, size = 2)

mean(sample_data)

sd(sample_data)


For one possible draw, the values 8 and 11:

Sample Mean = 9.5

Sample Standard Deviation ≈ 2.12


d) The sample mean (9.5) is lower than the population mean (11.8). This suggests that, on average, the members in this sample purchased fewer ice creams during the academic year than the population as a whole, though with a sample of only 2, the sample mean can vary considerably from draw to draw.


Part B.


Does the sample proportion p have approximately a normal distribution?

The sample proportion, denoted as p̂ (p-hat), has an approximately normal distribution under certain conditions. One of the key conditions for the sample proportion to have an approximately normal distribution is related to the sample size (n) and the population proportion (p). The Central Limit Theorem (CLT) states that the sampling distribution of the sample proportion p̂ approaches a normal distribution as the sample size (n) becomes sufficiently large, regardless of the shape of the population distribution, as long as np and n(1-p) are both greater than or equal to 5.


In this case, you have n = 100 and p = 0.95.


np = 100 * 0.95 = 95

n(1-p) = 100 * (1 - 0.95) = 100 * 0.05 = 5

Both np and n(1-p) meet the threshold, although n(1-p) equals 5 exactly, so the condition is only just satisfied. This suggests that with a sample size of 100, the sampling distribution of the sample proportion p̂ is approximately normal, if only marginally so.
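As a quick check, both quantities can be computed directly in R:

```r
n <- 100   # sample size
p <- 0.95  # population proportion

np <- n * p        # 95
nq <- n * (1 - p)  # 5

# np is comfortably above 5, while n(1-p) lands exactly on the
# threshold, so the condition is met only marginally.
c(np = np, nq = nq)
```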

To have an approximately normal sampling distribution of the sample proportion p̂, you typically want both np and nq (where q = 1 - p) to be at least 5. However, the exact threshold for "approximately normal" can vary depending on the specific context and desired level of precision. In practice, a common stricter rule of thumb is to require np and nq both greater than or equal to 10 for a reasonably normal distribution.


What is the smallest value of n for which the sampling distribution of p is approximately normal?

The smallest value of n for which the sampling distribution of p̂ is approximately normal can vary depending on the specific context and desired level of approximation. However, a common rule of thumb is to aim for both np and n(1-p) to be greater than or equal to 10 for a reasonably normal distribution.

In this case, p = 0.95. Both conditions must hold at the same time, so set each one greater than or equal to 10 and solve for n:


np ≥ 10

0.95n ≥ 10

n ≥ 10 / 0.95

n ≥ 10.53


n(1-p) ≥ 10

0.05n ≥ 10

n ≥ 10 / 0.05

n ≥ 200


The second condition is the binding one: because p is close to 1, n(1-p) grows very slowly and needs a much larger n to reach 10. Since n must be a whole number, the smallest value of n for which the sampling distribution of p̂ is approximately normal is n = 200, where np = 190 and n(1-p) = 10. So, in this context, a sample size of 200 or more would be needed for the sampling distribution of p̂ to be approximately normal.
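Both rules must hold simultaneously, and with p = 0.95 the n(1-p) ≥ 10 requirement is by far the stricter one. The sketch below keeps p as the fraction 95/100 so that floating-point rounding at the boundary does not distort the comparison:

```r
p_num <- 95    # numerator of p = 95/100
p_den <- 100   # denominator

# np >= 10      <=>  n >= 10 * p_den / p_num
# n(1-p) >= 10  <=>  n >= 10 * p_den / (p_den - p_num)
n_min <- max(ceiling(10 * p_den / p_num),
             ceiling(10 * p_den / (p_den - p_num)))

n_min                            # 200
n_min * p_num / p_den            # np     = 190
n_min * (p_den - p_num) / p_den  # n(1-p) = 10
```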

Part C.


Simulated coin tossing is better done with the rbinom function than with the sample function for several reasons. rbinom is designed specifically for generating random binomial outcomes (e.g., heads or tails in a coin toss), so it is optimized for the task and generally faster than the more generic sample function. It also simplifies the simulation: you specify the number of trials (n) and the probability of success (p) directly in the call, which is more intuitive for binary events. Finally, rbinom generates binomial outcomes consistently, making it well suited to simulations that replicate an experiment many times, since the distribution of outcomes is guaranteed to follow the binomial distribution.


In contrast, the sample function is more versatile and can be used for various random sampling tasks, but it may require more code to simulate coin tossing and doesn't guarantee binomial distribution characteristics without additional effort.
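A brief sketch contrasting the two approaches for 1000 fair-coin tosses (the variable names and seed are illustrative):

```r
set.seed(42)  # illustrative seed, for reproducibility

# rbinom: 1000 independent Bernoulli trials (1 = heads, 0 = tails)
tosses_rbinom <- rbinom(n = 1000, size = 1, prob = 0.5)

# sample: the same simulation, but replacement must be spelled out
tosses_sample <- sample(c(0, 1), size = 1000, replace = TRUE)

# rbinom can also summarize a whole experiment in one call:
# the total number of heads in 1000 tosses
heads_total <- rbinom(n = 1, size = 1000, prob = 0.5)
```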

Saturday, September 16, 2023

Module #4 Probability Theory



A1. Event A: P(A) = 10%

A2. Event B: P(B) = 20%

A3. Event A1: P(A1) = 20%

A4. Event B1: P(B1) = 40%

 

B Event B1: True. The probability that it will rain on the day of Jane's wedding, given the weatherman's forecast for rain, is approximately 11.1%.

B Event B2: The answer is accurate because it is the direct result of applying Bayes' theorem to the given data. Bayes' theorem is a formula for revising a probability in light of fresh information or evidence, in this case a weather forecast. The revised likelihood that it will rain on the wedding day is determined using the prior likelihood of rain (P(A1)) as well as the conditional probabilities of the weatherman's forecast (P(B | A1) and P(B | A2)).
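As a sketch of the calculation, the figures below are assumptions taken from the standard version of this problem (it rains about 5 days a year; the weatherman forecasts rain 90% of the time when it does rain and 10% of the time when it does not); with them, Bayes' theorem reproduces the 11.1% stated above:

```r
# Assumed figures from the standard form of the problem
p_rain    <- 5 / 365     # P(A1): prior probability of rain
p_dry     <- 360 / 365   # P(A2): prior probability of no rain
p_fc_rain <- 0.9         # P(B | A1): rain forecast, given rain
p_fc_dry  <- 0.1         # P(B | A2): rain forecast, given no rain

# Bayes' theorem: P(A1 | B) = P(B | A1) P(A1) / P(B)
posterior <- (p_fc_rain * p_rain) /
  (p_fc_rain * p_rain + p_fc_dry * p_dry)

posterior  # about 0.111, i.e. roughly 11.1%
```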

 

C:

dbinom(X, size=N, prob=P)

dbinom(10, size=10, prob=0.8)


0.1073742


With this approach, the likelihood of successfully operating on all 10 patients, given a success probability of 0.8 per operation, is roughly 0.1074, or 10.74%.

Saturday, September 9, 2023

Module #3: Data Set Analysis





Set #1: 10, 2, 3, 2, 4, 2, 5

Set #2: 20, 12, 13, 12, 14, 12, 15


1) Central Tendency:


Mean :


Set #1: (10 + 2 + 3 + 2 + 4 + 2 + 5) / 7 = 28 / 7 = 4

Set #2: (20 + 12 + 13 + 12 + 14 + 12 + 15) / 7 = 98 / 7 = 14



Median:


Set #1 (sorted: 2, 2, 2, 3, 4, 5, 10): Median = 3

Set #2 (sorted: 12, 12, 12, 13, 14, 15, 20): Median = 13

Mode:


Set #1: Mode = 2

Set #2: Mode = 12



2) Variation:


Range:


Set #1: Range = 8

Set #2: Range = 8



Interquartile Range (IQR):



Set #1: IQR = Q3 - Q1 = 4.5 - 2 = 2.5 (using R's default quantile method)

Set #2: IQR = Q3 - Q1 = 14.5 - 12 = 2.5



Variance:


Set #1: Variance ≈ 8.33 (sample variance, dividing by n - 1)

Set #2: Variance ≈ 8.33 (the deviations from the mean are identical to Set #1's)



Standard Deviation:


Set #1: Standard Deviation ≈ 2.89

Set #2: Standard Deviation ≈ 2.89




Comparison of Central Tendency:


Comparing Set #1 and Set #2, the mean, median, and mode are each exactly 10 higher in Set #2, because every value in Set #2 is the corresponding Set #1 value plus 10. This shows that the typical value in Set #2 is greater.


Variation:


Because each value in Set #2 is simply the corresponding Set #1 value plus 10, every measure of spread is identical between the two sets: adding a constant shifts the data but does not change how dispersed it is.

The range (8), the interquartile range (2.5), the variance (≈8.33), and the standard deviation (≈2.89) are therefore the same for both sets; only the measures of central tendency differ.
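These figures can be reproduced in R (base R has no built-in mode function, so the mode is omitted here):

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)  # each value is set1 + 10

mean(set1)    # 4
mean(set2)    # 14
median(set1)  # 3
median(set2)  # 13

# Spread is unchanged by adding a constant:
diff(range(set1))  # 8
diff(range(set2))  # 8
IQR(set1)          # 2.5 (R's default quantile method)
IQR(set2)          # 2.5
var(set1)          # 8.333... (sample variance)
var(set2)          # 8.333...
sd(set1)           # 2.886...
sd(set2)           # 2.886...
```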

Sunday, September 3, 2023

Module #2: Calculating Vector Mean

Calculating Vector Mean


The R script creates the myMean function, which computes the average of the assignment2 vector. sum(assignment2) calculates the total of all the elements in the vector, and length(assignment2) returns how many elements it contains. The mean is then obtained by dividing the sum of the elements by the length of the vector.



The assignment2 vector's mean value is around 18.66667.
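A minimal sketch of the function described above. The original assignment2 values are not shown in the post, so the vector below is a hypothetical placeholder, chosen only so the call reproduces the reported mean of about 18.66667:

```r
# Mean of a numeric vector: the sum of the elements divided by the count
myMean <- function(x) {
  sum(x) / length(x)
}

# Hypothetical stand-in for the original assignment2 vector
assignment2 <- c(12, 20, 24)
myMean(assignment2)  # 18.66667
```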
