AARC scholars work with many datasets describing the publication outputs of research faculty. These datasets are almost always zero-inflated, or at least skewed toward the lower end of the distribution. The phenomenon is so common that we have changed how we perform regression analyses to account for it (e.g., we ran hurdle regressions in our paper on Open Access publication trends). The histogram below shows the distribution of journal articles published per scholar in departments classified as “Physics” over the past 10 years:
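Before turning to the plot, here is a minimal sketch of the two-part hurdle idea on synthetic data. The column names and covariates are hypothetical, and a plain Poisson stands in for the zero-truncated count model a strict hurdle specification would use; this is not the model from our Open Access paper.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per scholar, with a 10-year article count
# and two illustrative covariates.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "years_since_phd": rng.integers(1, 40, n),
    "grant_funded": rng.integers(0, 2, n),
})
df["articles"] = rng.poisson(3, n) * rng.integers(0, 2, n)  # zero-inflated toy counts

X = sm.add_constant(df[["years_since_phd", "grant_funded"]])

# Part 1 (the hurdle): does a scholar publish at all?
any_pub = (df["articles"] > 0).astype(int)
hurdle_part = sm.Logit(any_pub, X).fit(disp=False)

# Part 2: among publishers, how many articles? A strict hurdle model
# would use a zero-truncated count model here.
pos = df["articles"] > 0
count_part = sm.Poisson(df.loc[pos, "articles"], X.loc[pos]).fit(disp=False)

print(hurdle_part.summary())
print(count_part.summary())
```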

The data are strongly right-skewed: most scholars cluster at the left side of the plot, publishing between 0 and 100 articles over the 10 years. The mean number of articles published over 10 years is 91.7 (green vertical line) and the median is 32 (red vertical line); however, a few physicists have as many as 1,100 articles over that 10-year span. These scholars are generally associated with massive multi-institution, multi-year projects such as the experiments at CERN, so we looked at several disciplines outside the natural sciences, where such large collaborations are rare, to see whether the pattern persists. The image below shows journal articles for English Language and Literature over the same 10-year period:

Indeed, the distribution looks similar to that for physicists. In English, the mean number of journal articles per person over 10 years is 3.8 (green vertical line), while the median is 2.0 (red vertical line). A small number of English faculty members have published upwards of 50 journal articles over the 10-year period.
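Both plots follow the same recipe and are straightforward to reproduce; here is a sketch with matplotlib, where the lognormal toy series stands in for a real column of per-scholar 10-year counts:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for per-scholar 10-year article counts.
articles = pd.Series(np.random.default_rng(1).lognormal(3.0, 1.2, 2000).round())

fig, ax = plt.subplots()
ax.hist(articles, bins=50)
ax.axvline(articles.mean(), color="green", label=f"mean = {articles.mean():.1f}")
ax.axvline(articles.median(), color="red", label=f"median = {articles.median():.1f}")
ax.set_xlabel("Journal articles per scholar (10 years)")
ax.set_ylabel("Number of scholars")
ax.legend()
plt.show()
```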

The skewness of these publication metrics complicates the interpretation of discipline norms, with meaningful consequences for the faculty, administrators, and other committee members charged with comparing publishing patterns across disciplines for strategic planning. In English, the median number of journal articles published is about one half of the mean, and in Physics the median is about one third of the mean. Although means are commonly used in bibliometric comparisons, choosing the mean as the basis of comparison pulls the summary toward the few scholars at the extreme right end of the publishing distribution.
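A toy calculation makes that pull concrete: appending a single CERN-scale count to a small sample moves the mean by more than an order of magnitude while barely moving the median.

```python
import numpy as np

counts = np.array([0, 1, 2, 2, 3, 4, 5, 8, 12])
print(np.mean(counts), np.median(counts))              # ≈ 4.1  3.0

with_outlier = np.append(counts, 1100)                 # one extreme publisher
print(np.mean(with_outlier), np.median(with_outlier))  # 113.7  3.5
```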

For these reasons, we believe the median is often the more appropriate measure for intra- and interdisciplinary analyses and bibliometric comparisons. University-wide planning and evaluation are better served by focusing on discipline (or peer-group) norms such as the median than by figures that incorporate the most extreme cases and may set unrealistic publishing expectations.
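These peer-group norms are simple to compute; here is a sketch with pandas, using a hypothetical per-scholar table whose column names and values are illustrative only:

```python
import pandas as pd

# Hypothetical per-scholar table; values chosen only to echo the plots above.
df = pd.DataFrame({
    "discipline": ["Physics"] * 4 + ["English"] * 4,
    "articles_10yr": [10, 32, 95, 1100, 1, 2, 3, 50],
})

# The median is far less sensitive to the extreme right tail than the mean.
norms = df.groupby("discipline")["articles_10yr"].agg(["median", "mean"])
print(norms)
```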

(the data above are from database version AAD2019-1470)