# Motivation

Lately I have been thinking about one of the most exciting concepts for me to teach in a programming course: how to write functions. Functions have syntactical, technical, and aestetic qualities that all have to align to create a “good” function. The R for Data Science book has a great chapter that touches on many of these issues, including the problem of how to make good names for arguments. Naming things is a notoriously hard problem in programming, and this chapter has a subsection about how to pick both aesthetically good names and names that conform to the argument names in other families of functions in R.

R has several package ecosystems that often have their own conventions, so this got me thinknig: how could you disocver what arguments are commonly used in a family of packages so that you could use the same argument names in your own functions? Conceptually this should be a simple enough task since R has so many features for computing on the langauge which one of my favorite things to do!

So first we will grab all of the functions in a package, and then we can extract the argument names for each function. After some searching I found the lsf.str() function which is new to me, but will give us every function in a package. Let’s test it out with the relatively small postcards package.

The syntax for package names with lsf.str() is package:[name of package] which I don’t really understand, but it works! Also notice that the postcards package has to be loaded for lsf.str() to work. Now that we have all of the functions in this package we can get all of the arguments with args().

It’s working nicely, but this is R after all, so wouldn’t it be better to have this information in a data frame?

# Larger Packages

Now that we can get this information into a tidy data frame, we can do some exploratory data analysis. Let’s look at a much bigger and more popular set of functions like the base package.

At the top of this data frame we can see several operators and their arguments. It appears that : does not have any arguments, which is certainly not intuitive. Let’s look at all of the functions that don’t have arguments in base:

We can see that there are several operators that supposedly don’t have any arguments, plus special words in R like function, if, break, and return. There are also some classics in this list like getwd() and a personal favorite of mine: is.R().

# Individual Arguments

To get back on task, let’s look at the most common arguments in base, assuming I want to write my own functions that integrate well with the base package:

It looks like x is by far the most popular argument in base followed by .... Off the top of my head I can think of a few functions where x is the first argument, I wonder if that is generally the case?

Wow x is almost always the first argument whenever it is used! Let’s take a look at the exceptions:

Looks like most of these functions are in the grep family or they have to do with functional programming. Also I never knew about chartr(), that looks like it could be useful in the future.

# Comparing Numbers of Arguments Between Packages

Beyond just looking at argument names, we might also be concerned about how many arguments are in our functions. Is it gauche to have too many arguments? Let’s take a look in base again:

It looks like most functions in base don’t have many arguments. Let’s look at ggplot2, a very popular package that I know has several functions that use many arguments.

Let’s see which functions have the most arguments:

Whoa theme() has 95 arguments! I am going to exclude it since it is such an outlier.

As we can see here, if you were writing functions to extend ggplot2 you could get away with including lots of arguments without your users thinking that the number of arguments was too unusual.

# Conclusion

Like many of my posts, this is a proof-of-concept that this kind of thing could be done. If this inspires you to look at or to write packages differently please let me know! In general I am very interested in having more conversations about how to create best practices for writing functions, especially for people developing data science packages.