Assignment 5: Data management/code best practices

Due by 5:00 PM on Tuesday, October 13, 2020

To do yourself

Create an Excel spreadsheet with 4 columns and 10 rows, following proper spreadsheet conventions discussed in class.
- Let the first column be an ID number.
- Let the second column be a set of integers between 20 and 30, denoting the number of teeth someone has.
- Let the third column be a set of real numbers between 1.0 and 4.0, denoting GPA in school.
- Let the fourth column be a mix of “H” and “T” character values, denoting the tosses of a coin.
Read it into R. What is the class of this object?
Create a new object with the same content, but with the class list.
Address the fourth element of the list object. Then address the content of the fourth element.
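The difference between addressing an element of a list and addressing its content can be sketched with a small made-up data frame. The object toy_df and its values are hypothetical stand-ins for the spreadsheet. An object read from Excel may carry additional classes (for example, readxl::read_excel returns a tibble).

```r
# Hypothetical stand-in for the spreadsheet after it has been read into R.
toy_df <- data.frame(
  id    = 1:3,
  teeth = c(28L, 30L, 25L),
  gpa   = c(3.2, 2.7, 3.9),
  toss  = c("H", "T", "H"),
  stringsAsFactors = FALSE
)

class(toy_df)                    # "data.frame"

# Same content, stored as a plain list (one element per column).
toy_list <- as.list(toy_df)
class(toy_list)                  # "list"

fourth_element <- toy_list[4]    # single brackets: a list of length 1
fourth_content <- toy_list[[4]]  # double brackets: the character vector itself

class(fourth_element)            # "list"
class(fourth_content)            # "character"
```

Single brackets keep the list wrapper, while double brackets (or toy_list$toss) return what is stored inside it.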

In one line of code, calculate the mean of the second column and the third column. Do the same for the list object.
Print out the frequency distribution of the fourth column.
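One possible way to get both column means in a single call, plus a frequency table, is sketched below on made-up data. The object toy_df and its column names are hypothetical, and other functions would work equally well.

```r
# Hypothetical data with the same layout as the spreadsheet.
toy_df <- data.frame(
  id    = 1:4,
  teeth = c(28L, 30L, 25L, 22L),
  gpa   = c(3.2, 2.7, 3.9, 3.0),
  toss  = c("H", "T", "H", "H"),
  stringsAsFactors = FALSE
)

# Data frame: means of the second and third columns in one line.
col_means <- colMeans(toy_df[, 2:3])

# List: apply mean() to the second and third elements in one line.
list_means <- sapply(as.list(toy_df)[2:3], mean)

# Frequency distribution of the fourth column.
toss_freq <- table(toy_df$toss)

print(col_means)
print(list_means)
print(toss_freq)
```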

Rename the columns into something descriptive using R code. What variable naming convention are you using (refer to Code Organization Best Practices slides)?
Save the data in .csv format and submit it along with the code file.

Load the ChickWeight dataset that comes with R. Install the skimr package. Use this package to summarize the dataset in one line of code.
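A minimal sketch of this step, assuming skimr has already been installed. The install line is left commented out, and the requireNamespace() guard is only there so the snippet does not fail when the package is missing. Converting with as.data.frame() is a cautious assumption to hand skim() a plain data frame, since ChickWeight carries extra classes.

```r
# install.packages("skimr")  # one-time install, commented out on purpose

data(ChickWeight)            # built-in dataset: weight, Time, Chick, Diet

if (requireNamespace("skimr", quietly = TRUE)) {
  # The one-line summary; as.data.frame() drops the extra groupedData classes.
  print(skimr::skim(as.data.frame(ChickWeight)))
} else {
  message("skimr is not installed; run install.packages(\"skimr\") first.")
}
```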

Explore at least one of the following functions. Compare it with the skimr functionality. Which one would you prefer?

Type a list of 5 file names that follow good naming conventions (they could have an R file extension, a SAS file extension, etc.).

Let the files represent tasks in some project. Explain why your naming conventions are good and helpful to future you.

Make an empty vector of length 10 million. Set a seed for reproducibility. Populate the vector in two ways and compare:

- Use a for loop to enter a randomly-generated Exponential variable with rate parameter 3 in each position in the vector.
- Use a more efficient method, possibly writing a function, to populate the vector with randomly-generated numbers from the same distribution.
- Try parallelization approaches.

Time all methods, and make a conclusion.
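A timing comparison could be set up along the following lines. This is only a sketch: the vector length n is deliberately much smaller than 10 million so it finishes quickly, the seed value and object names are arbitrary choices, and the parallel variant is left out.

```r
n    <- 1e5   # assumption: scaled down from 10 million for a quick run
rate <- 3

# Method 1: fill a pre-allocated vector one position at a time.
set.seed(2020)
loop_time <- system.time({
  x_loop <- numeric(n)
  for (i in seq_len(n)) {
    x_loop[i] <- rexp(1, rate = rate)
  }
})

# Method 2: a single vectorized call.
set.seed(2020)
vec_time <- system.time({
  x_vec <- rexp(n, rate = rate)
})

# Elapsed wall-clock seconds for each method.
elapsed <- c(loop = loop_time[["elapsed"]], vectorized = vec_time[["elapsed"]])
print(elapsed)
```

Resetting the seed before each method makes each run reproducible on its own. Timings vary between machines and runs, so repeating the measurement a few times gives a fairer comparison.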

Last updated on October 12, 2020