Motivation
Lately I have been thinking about one of the most exciting concepts for me to
teach in a programming course: how to write functions. Functions have
syntactical, technical, and aesthetic qualities that all have to align to
create a “good” function. The R for Data Science book has a great chapter that
touches on many of these issues, including the problem of how to make good
names for arguments.
Naming things is a notoriously hard problem in programming, and this chapter has
a subsection about how to pick both aesthetically good names and names that
conform to the argument names in other families of functions in R.
R has several package ecosystems that often have their own conventions, so this
got me thinking: how could you discover what arguments are commonly used in a
family of packages so that you could use the same argument names in your own
functions? Conceptually this should be a simple enough task since R has so many
features for computing on the language, which is one of my favorite things to
do!
So first we will grab all of the functions in a package, and then we can extract
the argument names for each function. After some searching I found the
lsf.str() function, which is new to me but will give us every function in a
package. Let’s test it out with the relatively small postcards package.
The syntax for package names with lsf.str() is package:[name of package]
which I don’t really understand, but it works! Also notice that the postcards
package has to be loaded for lsf.str() to work. Now that we have all of the
functions in this package we can get all of the arguments with args().
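Here is a rough sketch of those two steps in base R. The package: prefix is how attached packages are named on R's search path (see search()), which is also why the package has to be loaded first. Two assumptions on my part: tools stands in for postcards so the code runs on a fresh R install, and arg_names() is a hypothetical helper, not something from either package. Since args() returns a function rather than a vector of names, the helper wraps it in formals() and names().

```r
# "tools" ships with R and stands in for postcards in this sketch.
pkg <- "tools"
library(pkg, character.only = TRUE)  # the package must be attached first

# lsf.str() accepts a search-path entry of the form "package:<name>".
fun_names <- as.character(lsf.str(paste0("package:", pkg)))

# Hypothetical helper: args() returns a function with the same formals,
# or NULL for some primitives, so check for NULL before calling formals().
arg_names <- function(fun_name, pkg) {
  f <- get(fun_name, envir = as.environment(paste0("package:", pkg)))
  a <- args(f)
  if (is.null(a)) character(0) else as.character(names(formals(a)))
}

head(fun_names)                        # first few function names
arg_names("file_path_sans_ext", pkg)   # argument names of one function
```

With postcards installed, swapping pkg <- "postcards" should be the only change needed.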
It’s working nicely, but this is R after all, so wouldn’t it be better to have
this information in a data frame?
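One possible way to build that data frame, sketched in base R only so that nothing beyond a standard install is assumed. package_args() and its column names (fun, arg, position) are hypothetical choices of mine, and tools again stands in for postcards.

```r
# Hypothetical helper: one row per (function, argument) pair for an attached
# package. Functions with no reported arguments keep a single NA row.
package_args <- function(pkg) {
  entry <- paste0("package:", pkg)
  env <- as.environment(entry)
  funs <- as.character(lsf.str(entry))
  arg_list <- lapply(funs, function(fn) {
    a <- args(get(fn, envir = env))
    if (is.null(a)) character(0) else as.character(names(formals(a)))
  })
  n <- lengths(arg_list)
  data.frame(
    fun = rep(funs, times = pmax(n, 1L)),
    arg = unlist(lapply(arg_list, function(a) if (length(a)) a else NA_character_)),
    position = unlist(lapply(n, function(k) if (k > 0) seq_len(k) else NA_integer_)),
    stringsAsFactors = FALSE
  )
}

library(tools)  # stand-in for postcards
tools_args <- package_args("tools")
head(tools_args)
```

Keeping the position column around makes it easy to ask later where in the argument list a given name tends to appear.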
Larger Packages
Now that we can get this information into a tidy data frame, we can do some
exploratory data analysis. Let’s look at a much bigger and more popular set of
functions like the base package.
At the top of this data frame we can see several operators and their arguments.
It appears that : does not have any arguments, which is certainly not
intuitive. Let’s look at all of the functions that don’t have arguments in
base:
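A minimal sketch of that filter, assuming the same args() based approach as before (count_args() is a hypothetical helper). One caveat worth keeping in mind: args() returns NULL for language-level primitives such as if, so a count of zero here means "no arguments reported" rather than "takes no input".

```r
base_env <- as.environment("package:base")
base_funs <- as.character(lsf.str("package:base"))

# Hypothetical helper: number of formal arguments that args() reports.
count_args <- function(fn, env) {
  a <- args(get(fn, envir = env))
  if (is.null(a)) 0L else length(formals(a))
}

n_args <- vapply(base_funs, count_args, integer(1), env = base_env)

# Functions for which no arguments are reported.
no_arg_funs <- names(n_args)[n_args == 0]
no_arg_funs
```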
We can see that there are several operators that supposedly don’t have any
arguments, plus special words in R like function, if, break, and return.
There are also some classics in this list like getwd() and a personal favorite
of mine: is.R().
Individual Arguments
To get back on task, let’s look at the most common arguments in base, assuming
I want to write my own functions that integrate well with the base package:
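Counting argument names across base could look something like this. It is a sketch using table() rather than whatever data frame verbs you prefer, and all_args and arg_counts are names I made up for illustration.

```r
base_env <- as.environment("package:base")
base_funs <- as.character(lsf.str("package:base"))

# Collect every argument name that args() reports across base functions.
all_args <- unlist(lapply(base_funs, function(fn) {
  a <- args(get(fn, envir = base_env))
  if (is.null(a)) NULL else names(formals(a))
}))

# Tabulate the names and show the ten most frequent.
arg_counts <- sort(table(all_args), decreasing = TRUE)
head(arg_counts, 10)
```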
It looks like x is by far the most popular argument in base, followed by
... (the dots argument). Off the top of my head I can think of a few functions
where x is the first argument. I wonder if that is generally the case?
Wow, x is almost always the first argument whenever it is used! Let’s take a
look at the exceptions:
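Both checks, how often x comes first and which functions break the pattern, can be sketched with match(), which gives the position of x in each argument list. The names x_pos and exceptions are hypothetical.

```r
base_env <- as.environment("package:base")
base_funs <- as.character(lsf.str("package:base"))

# Position of "x" in each function's argument list (NA when x is absent).
x_pos <- vapply(base_funs, function(fn) {
  a <- args(get(fn, envir = base_env))
  if (is.null(a)) NA_integer_ else match("x", names(formals(a)))
}, integer(1))
x_pos <- x_pos[!is.na(x_pos)]

table(x_pos)      # how often x sits at each position
mean(x_pos == 1)  # share of x-using functions where x comes first

# Functions that take x, but not as their first argument.
exceptions <- names(x_pos)[x_pos > 1]
exceptions
```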
Looks like most of these functions are in the grep family or they have to do
with functional programming. Also, I never knew about chartr(), which looks
like it could be useful in the future.
Comparing Numbers of Arguments Between Packages
Beyond just looking at argument names, we might also be concerned about how many
arguments are in our functions. Is it gauche to have too many arguments? Let’s
take a look in base again:
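To get a feel for the distribution without plotting anything, a sketch like the following tabulates how many functions have each number of arguments. As before, the counts come from args(), so language-level primitives show up as zero.

```r
base_env <- as.environment("package:base")
base_funs <- as.character(lsf.str("package:base"))

# Number of arguments reported by args() for every function in base.
n_args <- vapply(base_funs, function(fn) {
  a <- args(get(fn, envir = base_env))
  if (is.null(a)) 0L else length(formals(a))
}, integer(1))

table(n_args)                              # functions per argument count
quantile(n_args, c(0.25, 0.5, 0.75, 0.95)) # where the bulk of functions sit
```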
It looks like most functions in base don’t have many arguments. Let’s look at
ggplot2, a very popular package that I know has several functions that use
many arguments.
Let’s see which functions have the most arguments:
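The same counting idea can be wrapped in a helper and pointed at any attached package. This is a sketch under a couple of assumptions: arg_counts() is a hypothetical helper, and the ggplot2 part is guarded with requireNamespace() so the code still runs on a machine where ggplot2 is not installed. The exact counts you see will depend on your ggplot2 version.

```r
# Hypothetical helper: argument counts for every function in an attached package.
arg_counts <- function(pkg) {
  entry <- paste0("package:", pkg)
  env <- as.environment(entry)
  funs <- as.character(lsf.str(entry))
  vapply(funs, function(fn) {
    a <- args(get(fn, envir = env))
    if (is.null(a)) 0L else length(formals(a))
  }, integer(1))
}

# Guarded so the sketch still runs where ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gg_counts <- sort(arg_counts("ggplot2"), decreasing = TRUE)
  print(head(gg_counts, 10))  # functions with the most arguments
}
```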
Whoa, theme() has 95 arguments! I am going to exclude it since it is such an
outlier.
As we can see here, if you were writing functions to extend ggplot2 you could
get away with including lots of arguments without your users thinking that the
number of arguments was too unusual.
Conclusion
Like many of my posts, this is a proof of concept that this kind of thing can
be done. If this inspires you to look at or to write packages differently,
please let me know! In general I am very interested in having more conversations about
how to create best practices for writing functions, especially for people
developing data science packages.