I recently realised that dplyr can be used to aggregate and summarise data the same way that aggregate() does. I wrote a post on using the aggregate() function in R back in 2013, and in this post I'll contrast dplyr with aggregate(). I'll use the same ChickWeight data set as in my previous post.

```r
library(dplyr)

# The ChickWeight data frame has 578 rows and 4 columns from an experiment
# on the effect of diet on early growth of chicks.
data <- ChickWeight

# Mean weight at each time point
aggregate(data$weight, list(time = data$Time), mean)
group_by(data, Time) %>% summarise(mean = mean(weight))

# Mean weight for each combination of time point and diet
aggregate(data$weight,
          list(time = data$Time, diet = data$Diet),
          mean)
head(aggregate(weight ~ Time + Diet, data = data, mean))
group_by(data, Diet, Time) %>% summarise(mean = mean(weight))
```

Aggregating and calculating two summaries.

```r
# Mean weight and number of observations per diet
aggregate(weight ~ Diet, data = data, FUN = function(x) c(mean = mean(x), n = length(x)))
group_by(data, Diet) %>% summarise(mean = mean(weight), n = length(weight))

# Mean weight per diet, excluding diet 1
aggregate(weight ~ Diet, data = subset(data, Diet != 1), mean)
```

I prefer the dplyr approach, which allows you to "pipe" or "chain" different functions. Once you learn the dplyr functions, a.k.a. verbs, you can easily string together a nice pipeline.

To check for missing values in R you might be tempted to use the equality operator == with your vector on one side and NA on the other. If you insist, you'll get useless results: comparing a vector x with NA returns NA for every element, because any comparison with an unknown value is itself unknown. Use the is.na() function instead:

```r
is.na(x) # FALSE FALSE TRUE FALSE TRUE
```

Armed with that knowledge, let's explore how to calculate some basic summary statistics about missing values in your data. First of all, to count the total number of NAs in a vector you can simply sum() up the result of is.na().

```r
sum(is.na(x)) # 2
```

Confused why you can sum TRUE and FALSE values? R automatically converts logical vectors to integer vectors when using arithmetic functions. In the process TRUE gets turned into 1 and FALSE gets converted to 0. Thus, sum(is.na(x)) gives you the total number of missing values in x.

To get the proportion of missing values you can proceed by dividing the result of the previous operation by the length of the input vector, i.e. sum(is.na(x)) / length(x), which here gives 2 / 5 = 0.4. Does that "formula" look somehow familiar? Summing up all elements in a vector and dividing by the total number of elements, that's calculating the arithmetic mean! So, instead of using sum() and length() we can simply use mean() to get the proportion of NAs in a vector.

```r
mean(is.na(x)) # 0.4
```

Enough of vectors, though, let's look at counting missing values in a data frame. To illustrate the concepts let me first add some missing values to the mtcars dataset.

The is.na() function is generic and has a method for data frames, so you can directly pass it a data frame as input.

```r
is.na(mtcars)
# Mazda RX4          FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
# Mazda RX4 Wag      FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
# Datsun 710         FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Hornet 4 Drive     FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Hornet Sportabout  FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Valiant            FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Duster 360         FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Merc 240D          FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Merc 230           FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Merc 280           FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Merc 280C          FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
# Merc 450SE         FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Merc 450SL         FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Merc 450SLC        FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Cadillac Fleetwood FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# (output truncated)
```

As you can see, the result is a matrix of logical values. Getting the total number of NAs is then simple because sum() works with matrices as well as vectors.

```r
sum(is.na(mtcars)) # 15
```

Arguably, though, the total number of missing values in a dataset is a rather crude measure. It gets much more interesting if we look at missing values across variables and records in the dataset.
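A minimal sketch of that per-variable and per-record view, using base R's colSums()/rowSums() for counts and colMeans()/rowMeans() for proportions on the logical matrix returned by is.na(). These are the matrix counterparts of the sum() and mean() tricks used on vectors. The code that added missing values to mtcars isn't shown here, so the sketch injects its own hypothetical NAs into a copy called cars_na (an illustrative name), and its numbers will not match the total of 15.

```r
# Work on a copy so the built-in mtcars data set stays untouched
cars_na <- mtcars

# Hypothetical NAs for illustration only (not the post's actual NAs)
cars_na[c("Mazda RX4", "Mazda RX4 Wag", "Merc 280C"), c("hp", "qsec")] <- NA

# Logical matrix: one row per car, one column per variable
na_matrix <- is.na(cars_na)

# Counts: column totals (per variable) and row totals (per record)
na_per_variable <- colSums(na_matrix)
na_per_record <- rowSums(na_matrix)

# Proportions: the mean() trick applied to each column and each row
prop_per_variable <- colMeans(na_matrix)
prop_per_record <- rowMeans(na_matrix)

print(na_per_variable)
print(na_per_record[na_per_record > 0])  # only records with at least one NA
print(round(prop_per_variable, 3))
print(round(prop_per_record[prop_per_record > 0], 3))

# Number of records without any missing value
n_complete <- sum(complete.cases(cars_na))
print(n_complete)
```

Filtering on na_per_record > 0 keeps the output short; on a real dataset you would usually sort these vectors to see the worst variables or records first.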
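The vector examples in the missing-values post never show how x was created. A minimal sketch that makes them runnable, assuming a hypothetical length-5 vector x with NAs in positions 3 and 5 (the non-missing values are made up; only the NA positions affect the results), walking through the == NA pitfall, is.na(), the count, and both ways of computing the proportion:

```r
# Hypothetical input: NAs in positions 3 and 5, other values arbitrary
x <- c(4, 8, NA, 15, NA)

# Comparing with NA does not test for missingness: every element is NA
eq_na <- x == NA
print(eq_na)

# is.na() gives the logical vector we actually want
is_missing <- is.na(x)
print(is_missing)

# Count: TRUE is treated as 1 and FALSE as 0 when summed
n_missing <- sum(is_missing)

# Proportion, first via sum() / length(), then via mean()
prop_sum_length <- n_missing / length(x)
prop_mean <- mean(is_missing)

print(c(count = n_missing, sum_length = prop_sum_length, mean = prop_mean))
```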