Average rating: 5 of 5
Level of importance: 5 of 5
Level of validity: 5 of 5
Level of completeness: 5 of 5
Level of comprehensibility: 5 of 5
Competing interests: None
The paper makes a valuable contribution to the analysis of diversity data by providing a blueprint for constructing a data set with minimal sources of bias and then analyzing that data set with methods that are easy to interpret and statistically sound. I comment on these aspects of the paper, not on the conclusions of the study.
The authors took great care in constructing their data set, reducing many of the ways an analysis might lead one astray. The details provided in the paper are good guidance for other researchers. For instance, questions asked during a seminar had to be labelled according to type. The authors had several raters label each question to reduce any rater-induced bias in the labels (“two raters reviewed each video and a third rater resolved differences”). In addition, the labels were carefully defined; Table 2 is the outcome of iteration, collaboration, and discussion to make sure that all raters were on the same page. Kudos also to whoever provided the funding for this effort; I’m sure it took a lot of time.
The authors also took great care in analyzing and interpreting the data. They calculate p-values using a randomization scheme, which makes their p-values valid regardless of the underlying distribution of the data. In contrast, p-values from classical parametric methods rely on distributional assumptions that often are not satisfied, and are not even approximately satisfied when sample sizes are small. If the distributional assumptions are not satisfied, the p-values are not valid, and the conclusions are suspect. This randomization method for calculating a p-value can be applied to any test statistic, and the authors have considered several. They provide excellent detail on the importance of this technique. The only criticism one can make is that, sadly, randomization methods don’t have as much power as parametric methods; this may be the reason the paper’s results were null (no differences detected).
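To make the idea concrete for readers who have not used randomization tests, here is a minimal sketch of a two-sample permutation test in Python. This is my own illustration, not the authors’ code; the data, the group names, and the choice of a difference-in-means statistic are hypothetical.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    The observed statistic is compared against the distribution obtained
    by repeatedly reassigning the pooled observations to the two groups
    at random, which requires no distributional assumptions.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    observed = abs(np.mean(group_a) - np.mean(group_b))

    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = abs(np.mean(pooled[:n_a]) - np.mean(pooled[n_a:]))
        if stat >= observed:
            count += 1

    # Include the observed arrangement itself so the p-value is never zero.
    return (count + 1) / (n_permutations + 1)

# Hypothetical example: counts of questions asked of two groups of speakers.
questions_group_1 = np.array([3, 5, 2, 6, 4, 7])
questions_group_2 = np.array([4, 6, 5, 8, 7, 9])
print(permutation_p_value(questions_group_1, questions_group_2))
```

The same loop works with any other test statistic (a difference in medians, a count of interruptions, and so on), which is the flexibility the authors exploit.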
I am not familiar with all of the literature in this area, so I really can’t comment on that (I was forced to make a rating, and simply used quantity as a metric!).
The paper is well written, well organized, and an interesting read.
One part of the paper, a very small part, was a little disappointing and stood in contrast to the rest, which was so carefully laid out. This is section 5.1, “Are interruptions bad?” That is a very interesting question, of course. The authors write “Table 1 shows that the proportion of female pre-tenure faculty in CEE, EECS, and IEOR is higher than the proportion of women in their applicant pools. These departments also spent more time questioning women than men.” I’m not sure what to take from this. The statement relates past hiring practices to current questioning practices, which is a questionable way to address the question “Are interruptions bad?”. The CEE department data seem to provide a more direct way to answer it, since we read “In CEE, faculty presenters who received offers generally were asked more questions during their talk than presenters who did not receive offers.” There is no statistical analysis here, which is OK, I guess, since all of this is in the discussion. But I feel the authors should add some cautionary remarks here about drawing any conclusions.