Scientific sexism: the gender bias in the scientific production of the Universidade de São Paulo

ABSTRACT OBJECTIVE To investigate gender inequity in the scientific production of the Universidade de São Paulo. METHODS The study population comprised the faculty members of the Universidade de São Paulo. The Web of Science repository was the source of the publication metrics. We selected the following measures: total publications and citations, average citations per year and per item, H-index, and history of citations between 1950 and 2019. We used the name of the faculty member as a proxy for gender identity. We used descriptive statistics to characterize the metrics. We evaluated the scissors effect by selecting faculty members with a high H-index. The historical series of citations was projected until 2100. We carried out analyses for the general population and for subgroups by working time: less than 10 years, 10 to 20 years, and 20 years or more. RESULTS Of the 8,325 faculty members, we included 3,067 (36.8%). Among those included, 1,893 (61.7%) were male and 1,174 (38.3%) female. The male gender presented higher values in the publication metrics (average of articles: M = 67.0 versus F = 49.7; average of citations/year: M = 53.9 versus F = 35.9) and H-index (M = 14.5 versus F = 12.4). Among the 100 individuals with the highest H-index (≥ 37), 83% are male. The male curve grows faster in the historical series of citations, opening a gap between the groups that the projection confirms. DISCUSSION Scientific production at the Universidade de São Paulo is subject to a gender bias. Two-thirds of the faculty are male, and hiring over the past few decades perpetuates this pattern. The large majority of high-impact faculty members are male. CONCLUSION Our analysis suggests that the Universidade de São Paulo will not overcome gender inequality in scientific production without substantive affirmative action. Development does not happen by chance but through choices that are affirmative, decisive, and long-term oriented.


INTRODUCTION
"Achieving gender equality and empowering all women and girls" is the 5 th Sustainable Development Goal 1 . Gender inequality is the result of centuries of female oppression and the devaluation of women and has been perpetuated to the present day. The cost of this inequality is high: the loss of human capital due to less access by girls to education is estimated to reach up to 160 trillion dollars 2 . Those who manage to reach some level of education and enter the labor market suffer from significant wage inequality. Women and men who do the same task at the same time have different salary values 2 . In addition to earning less at work, they are primarily responsible for caring for the family and household chores 3 .
Within the scientific environment, the scenario is not much better. Phenomena that perpetuate gender inequality are well known. We can cite the "Matilda effect" 4 , a systematic pattern of ignoring, not recognizing, or hiding female scientists. In the labor market, diversity, not only of gender, but ethnicity, has also proved to be a good asset for survival and innovation 5 . In science, diversity seems even more critical, especially when knowledge is fragmented and scientists are increasingly specialized.
Universities tend to be environments of innovation, constituting different axes for research and sources of new knowledge 6 . Research at universities enables the development of new technologies and products and makes it possible to develop solutions to problems of social value. The mission of developing innovative solutions to problems of social relevance, which may not be relevant to the market, is vital for universities, especially public ones 6 . Brazilian public universities are part of the network of social facilities and contribute to reducing inequalities in our society. However, like other organizations, they can be permeable to the structural bias and social determinants that make Brazilian society one of the most unequal in the world 6 .
The Universidade de São Paulo (USP) is the largest Brazilian university and the one that contributes most to the country's scientific production, reaching the best positions in national and international rankings 7 . However, even though actions are being developed to promote gender equality at USP (the university is the only Latin American representative in the United Nations (UN) HeForShe program), the effects of structural sexism still seem to be present. Women account for approximately half of its students and 41% of its faculty. However, they occupy just over a quarter of leadership or top academic career positions 8 . Considering that scientific production is one of the determinants of progression in the academic career and consequently of institutional leadership, the present study investigates gender inequalities in USP's scientific production.

METHODS
This is a descriptive-analytical study whose objective is to evaluate metrics of scientific publication by USP professors according to gender.

DataUSP
USP is a public institution and follows rules for publishing data such as spending on salaries, equipment, and infrastructure. In addition to these data, the university has other initiatives to provide academic and administrative indicators. DataUSP is the integrated repository of these data, where it is possible to view and extract decision-support information 9 .

Web of Science (WoS)
Since 2012, one of USP's services has been access to professors' citation profiles in three repositories of publication metrics: Web of Science, Google Scholar, and Scopus. Access to the Scopus system's application programming interfaces (APIs) is restricted, which limits automated access to its information. For this reason, Scopus was not used in the present study. Google Scholar has important limitations on the quality and accuracy of the information available on the platform; thus, it was also excluded 10 . Therefore, the WoS repository was chosen as the data source for the publication metrics, collected via its Publons system.

Study Population
USP professors registered in DataUSP, with data available on the Publons platform and names that allow allocation to the "male" or "female" category, were included in the study population. Professors whose data were inconsistent between platforms, whose names did not allow gender allocation, or who had no data available on the Publons platform were excluded from the analysis.

Variables of Interest
The scientific production indicators used were: total publications, total citations, average of citations per year and item, H-index, and historical series of citations between 1950 and 2019.
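For readers unfamiliar with the indicators, the sketch below shows how an H-index and an average of citations per item follow from a list of per-publication citation counts. It uses the standard definition of the H-index (the largest h such that h publications have at least h citations each). In this study the metrics were supplied by WoS rather than recomputed, so the function names and the sample list are purely illustrative.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citations_per_item(citations):
    """Average number of citations per publication (0.0 for an empty list)."""
    return sum(citations) / len(citations) if citations else 0.0


# Illustrative counts only, not data from the study.
example = [10, 8, 5, 4, 3]
print(h_index(example))             # 4
print(citations_per_item(example))  # 6.0
```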
Data on time of service at USP, salaries and other benefits received in 2019 were also collected 11 and integrated into the database. As the time of service database does not contain USP's unique identification number, the full name of the professors was used as a variable for integrating the databases.
The gender variable was derived from the individuals' names (see footnote a), which are a relevant factor for gender self-determination. For this purpose, we used a public and open database of names 12 that records the frequency of each gender for Brazilian names and was generated from the 2010 Census data.
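A minimal sketch of how a name-frequency table can be turned into a gender allocation is given below. The counts in `NAME_COUNTS`, the use of the first given name only, and the 0.9 threshold are assumptions made for illustration; the source does not state the decision rule applied, and the toy counts are not census figures.

```python
import unicodedata

# Hypothetical toy counts as (female, male); NOT real census figures.
NAME_COUNTS = {
    "MARIA": (990, 10),
    "JOSE": (5, 995),
    "DARCI": (480, 520),
}


def normalize(name):
    """Strip accents and surrounding spaces, then upper-case the name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.strip().upper()


def classify_gender(full_name, threshold=0.9):
    """Return 'female', 'male', or None when the name does not allow allocation."""
    parts = normalize(full_name).split()
    if not parts:
        return None
    counts = NAME_COUNTS.get(parts[0])  # assumption: first given name only
    if counts is None:
        return None
    female, male = counts
    total = female + male
    if total == 0:
        return None
    if female / total >= threshold:
        return "female"
    if male / total >= threshold:
        return "male"
    return None  # ambiguous name: excluded from the study population
```

Returning `None` for unknown or ambiguous names mirrors the exclusion of professors whose names did not allow gender allocation.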

Data Collection and Study Procedures
This study's database was built from public data available on web pages, automatically extracted through API calls. All scripts used for the collection were developed in the Python language, and Figure 1 shows the step-by-step procedure and the URLs of the APIs used. From the USPdigital 13 page, professors' names and their identifiers in the WoS were collected. The identifiers then had to be translated to build the call that returns the file with the data of interest. In the translation stage, to ensure that the data extracted from DataUSP were consistent with the profiles found in the WoS, the profile name was also collected. The profile name and the professor's name were compared, and records in which the two were not compatible were excluded. All metrics provided by WoS were collected for each of the professors. The collection algorithm was executed on 20 Jun 2020. Although all the data used were public, once the study database was constituted, it was anonymized to protect the faculty's privacy and make it difficult to connect the analyzed data to the individual.
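The consistency check between the DataUSP name and the WoS profile name could look like the sketch below. The compatibility rule used here (same first and last name token after removing accents, case, and punctuation) is an assumption for illustration; the source does not describe the exact criterion, and the function names are hypothetical.

```python
import unicodedata


def name_tokens(name):
    """Lower-case the name, strip accents and punctuation, split into tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalpha() else " " for ch in plain.lower())
    return cleaned.split()


def names_compatible(datausp_name, wos_name):
    """Assumed rule: first and last tokens must match on both platforms."""
    a, b = name_tokens(datausp_name), name_tokens(wos_name)
    if not a or not b:
        return False
    return a[0] == b[0] and a[-1] == b[-1]


# Records that fail the check would be excluded from the study database.
print(names_compatible("Ana Maria de Souza", "Ana Souza"))  # True
print(names_compatible("Ana Souza", "Carlos Souza"))        # False
```

This simple rule would reject profiles stored in "surname, given name" order, so a real pipeline would need to handle that format before comparing.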

Statistical Methods
Descriptive statistics were used for each of the metrics to assess differences in the number of publications by gender. The Mann-Whitney test was used, with a significance level ≤ 0.05, to verify whether the distributions have the same pattern. Individuals with a higher H-index were selected to assess whether there is a scissors effect (reduction of female presence according to career progression). The individuals were ordered according to their respective H-indexes, and the H-index of the hundredth individual was defined as the cut-off point. All individuals with an H-index greater than or equal to the cut-off point were selected for this analysis. The hundredth highest H-index was also selected in each group (male and female) to check the distance between male and female individuals. The analyses presented above were then replicated. The faculty were categorized into H-index ranges. The gender ratio in each range was calculated with 95% confidence intervals to assess each gender's proportion according to the H-index.
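The cut-off rule and the group comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' published code: the function names are hypothetical, and the Mann-Whitney p-value uses the large-sample normal approximation with tie correction and no continuity correction. A library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math


def select_top_by_cutoff(h_indexes, position=100):
    """Return (cutoff, selected), with the H-index at the given rank as cut-off."""
    if not h_indexes:
        raise ValueError("h_indexes must not be empty")
    ranked = sorted(h_indexes, reverse=True)
    cutoff = ranked[min(position, len(ranked)) - 1]
    # Ties at the cut-off are kept, so more than `position` people may be selected.
    selected = [h for h in h_indexes if h >= cutoff]
    return cutoff, selected


def average_ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance <= 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

Keeping ties at the cut-off explains how a "hundredth highest" rule can select more than 100 individuals.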
USP units were re-categorized into three areas to compare the gender distribution by area of knowledge: humanities and social sciences (e.g., history and sociology), natural and applied sciences (e.g., biology and medicine), and formal sciences and related applied fields (e.g., mathematics and engineering). We calculated the gender distribution in the general population and the population with the highest H-indexes for each area. The historical series of citations was drawn from the average number of citations by gender. The projection until the year 2100 was calculated by third-degree polynomial regression. All analyses were performed using the general population and three subgroups according to the time of service at USP: less than 10 years, 10 to 20 years, and 20 years or more of service.
a The authors recognize the limited way in which gender is used in this work, understanding that gender is not a binary definition and that this approach can be considered reductive. With this clarification, the terminology is used since the variable "name" is more related to self-perception and self-determination of gender than biological sex.
The analyses were performed by LO-C, using the Python language and the Spyder IDE (Integrated Development Environment) for both the statistical evaluation and the creation of the graphs. AMS performed independent verification of the analysis. The code used in this analysis is published on GitHub 14 . As this work uses open data, with unrestricted access and made available by the institutions themselves, approval by the ethics committee was not required 15 .
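The third-degree polynomial projection described under Statistical Methods can be illustrated with an ordinary least-squares fit, as in the dependency-free sketch below. It is not the code published on GitHub: the function names are hypothetical, the years are rescaled to roughly [-1, 1] only to keep the normal equations well conditioned, and the series used in the usage lines is synthetic. In practice, a library routine such as `numpy.polyfit` performs the same kind of fit.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cubic(years, values):
    """Least-squares cubic fit; returns a function that predicts any year."""
    mid = (min(years) + max(years)) / 2
    half = (max(years) - min(years)) / 2
    ts = [(y - mid) / half for y in years]  # rescaled time axis
    size = 4  # coefficients of a third-degree polynomial
    gram = [[sum(t ** (i + j) for t in ts) for j in range(size)] for i in range(size)]
    rhs = [sum(v * t ** i for t, v in zip(ts, values)) for i in range(size)]
    coeffs = solve_linear_system(gram, rhs)

    def predict(year):
        t = (year - mid) / half
        return sum(c * t ** k for k, c in enumerate(coeffs))

    return predict


# Synthetic series for illustration only (not the study's citation data).
years = list(range(1950, 2020))
values = [0.001 * (y - 1950) ** 3 + 2 for y in years]
project = fit_cubic(years, values)
print(round(project(2100), 3))
```

Fitting one such curve per gender and evaluating both up to 2100 gives the kind of comparison reported for the historical series. Extrapolating a cubic this far beyond the observed range is very sensitive to the fitted coefficients.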

RESULTS
The data of 8,325 academics from 219 departments of 52 teaching units were included in the survey. A total of 3,783 individuals were registered on the WoS Publons platform (45.44%), of which 3,205 made their data publicly available. A total of 3,067 individuals (36.8%) were classified as "male" or "female" according to their names, with 1,893 male (22.74% of all faculty) and 1,174 female (14.1% of all faculty). Figure 1 shows the study flowchart, including the subgroups by the time of service.
Table 1 shows the scientific production indicators of USP faculty according to gender. Of the total of 3,067 records, 61.72% were classified as male and 38.28% as female. All indicators of scientific production are higher in the male population. This pattern is repeated in the subgroups, regardless of the time of service.
Table 2 shows the scientific production indicators among USP faculty with the highest H-index. In the general population, the cut-off point for the H-index (hundredth highest H-index) was 37 and included 112 individuals. In the subgroup with 20 or more years of service at the university, the cut-off point for the H-index was 32; between 10 and 20 years, it was 25; in the group under 10 years, it was 20. Women make up the smallest part of the stratum of individuals with the highest indices, representing only 16.96% of academics with an H-index ≥ 37 in the general population, 29.25% of those with an H-index ≥ 32 and 20 years or more of service, 28.18% of those with an H-index ≥ 25 and 10 to 20 years of service, and 22.86% of those with an H-index ≥ 20 and less than 10 years of service. In the group with less than 20 years of work at USP, the H-index presents a statistically significant difference, reaching 20 points of difference between the highest values of each gender. Other metrics with a statistically significant difference in this group are the number of publications, total citations, and average of citations per year. In the group between 10 and 20 years, the number of citations also shows a statistically significant difference. The group with less than 10 years of work, on the other hand, has a statistically significant difference in all metrics except the average number of citations per item.
Table 3 shows the ratio between genders. Regardless of the assessed H-index range or time of service at USP, there is a male predominance that the confidence intervals indicate is statistically significant. Also, the biggest differences are in the higher index ranges, regardless of the time of service at USP. The ratio of the number of men to women tends to be higher at higher H-indexes. The greatest inequality among all groups is observed among individuals with the highest H-index and less than 10 years of service at the university: there are 23 male and no female individuals.
Figure 2 shows exponential growth in the curve of average citations, with the male curve growing faster than the female curve, widening the difference between the groups over the years. The projection of these data by univariate linear regression suggests a divergent trend in the number of citations between the male and female genders. The comparative analysis of the faculty salary by gender in 2019 showed no differences (supplementary material).
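As an illustration of how a 95% confidence interval can be attached to a gender proportion within an H-index range, the sketch below uses the Wilson score interval. The choice of the Wilson method is an assumption, since the source does not state which interval was used, and the function name is hypothetical. The usage lines rely on counts implied by the text: 16.96% of 112 individuals corresponds to 19 women, and the top range of the subgroup with less than 10 years of service has 0 women among 23 individuals.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Share of women among the 112 individuals with H-index >= 37.
low, high = wilson_interval(19, 112)
print(f"{19 / 112:.4f} ({low:.4f} to {high:.4f})")

# Share of women in a range with 23 men and no women.
low_zero, high_zero = wilson_interval(0, 23)
```

Unlike the simple normal approximation, the Wilson interval remains informative when the observed count is zero, which is the situation in the highest H-index range of the most recently hired subgroup.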

DISCUSSION
Our findings indicate that scientific production at USP is subject to gender bias. Among USP faculty who have a valid record on the most prestigious international scientific platform, the Web of Science/Publons, just over a third are female. This pattern is maintained among individuals hired in the last two decades and even among those hired less than ten years ago. In general, male academics achieve higher scientific production metrics than female ones. The absolute majority (83.04%) of the group of individuals with high-performance scientific production, i.e., those with the highest H-indexes, is male. Our analyses also suggest that the differences in productivity between genders are not narrowing: the projection of the current trend for the coming decades indicates that the effect of gender bias on USP's scientific production will not be overcome in the near future.
Scientific thinking has excluded and removed women since the beginning 16 . Science has historically been defined by a patriarchal, male, white, western, and financially privileged model, where men attribute reason to themselves and emotion to women. With this postulate, the ability to do science has been removed from women and attributed to men, allegedly "endowed with reason" 17 . Over time, many women who have challenged this paradigm have been ignored, minimized, and sometimes misused by their male counterparts 10 , the so-called "Matilda effect" 4 . Currently, women remain underrepresented within the scientific workforce 18 .
In Brazil, female education was neglected for 450 years, and it was only in the twentieth century that the movement to reduce the gender gap in this area started 19 . In higher education, graduates who declare themselves cisgender women are already the majority, representing 48.1% against 40.1% of graduates who declare themselves cisgender men 20 . However, even today, we observe that, as the career progresses, the proportion of women decreases and that of men increases, in a process known as the "scissors effect" 21 . Furthermore, the problem is not just numerical. The proportion of women in leadership and decision-making spaces in science is also invariably lower than that of men 22 . Our study shows that the scissors effect is perceived both in areas dominated by women 23 and in areas where women start from an unfavorable base 18,24 . The analysis of our data confirms that the gender distribution of the 100 professors with the highest publication metrics at the university is more favorable to men, regardless of the distribution in the selected area of knowledge.
Scientific publications result from a work process beginning with the research proposal, which depends on funding to be feasible. In addition, it requires infrastructure, institutional support, and human resources. In the first stages, women are passed over both for the funding that makes the research feasible 25 and for work on the project 26 . Thus, since the beginning of their careers, women receive less investment and institutional support 27 , making it difficult to carry out relevant projects and, consequently, to produce high-impact publications. Because publications are a parameter for progress in the academic career, the smaller number of scientific publications becomes a barrier to the progression of women in the scientific career. Our data comparing populations by the length of service at the university suggest that women's lower number of publications begins in the first decade of their careers. There is no recovery in the following decades, which contributes to widening the difference between genders. This "sticky floor" delays the progression of women's careers and can be the first step towards perpetuating gender bias when analyzing the number of publications. Corroborating this idea, a study showed that the evaluation of research projects focused only on the "quality of the proposal" presents no difference between genders, but that women lose points significantly in the evaluation of the "quality of the researcher" 28 ; when the authors' names are removed, the number of articles accepted with female authors increases 29 . At the other end, women in advanced stages of their careers find it difficult to progress further, even if they have the same or even greater scientific production than their male counterparts (glass ceiling phenomenon) 22,30 .
Doing science is a social activity, which requires a network of contacts and collaboration between scientists. Some activities external to the institutions in which individuals are inserted can assist in forming these bonds. However, the responsibilities associated with the female gender often hinder or prevent women from participating in these networks, which are associated with better bibliometric results for both genders 31 and allow interactions that result in greater visibility of the participants 32 , including facilitating invitations to scientific projects 33 . Within USP itself, the need to ensure women's visibility has been discussed 34 , since male gender has been identified as a factor in the press's own choice of which science to disseminate 35 . In short, how scientists are treated varies according to gender and may unconsciously reaffirm the position of women as exogenous to the scientific body 36 .
Currently, there is the impression that the scenario has been changing since there is greater visibility for the problem. It has been increasingly discussed, with the proviso that the delay in the solution would be caused by problems in the structure of the university. However, two new problems must be avoided: aversion to the movement for change and the false sense of change, based on perceptions and anecdotal evidence. This perception of change can lead decision-makers to underestimate gender bias, accentuating the imbalance 37 , due to the "equality paradox" effect. Our data show that gender and publication metrics distribution has not changed in the last 20 years among professors at USP.
One of the strengths of this research was the use of an automated data collection script. This method enabled us to gather the publication data of all faculty at USP with an available identifier, collecting thousands of records for analysis. Also, compared to manual collection, an automated system avoids transcription errors and, if it introduces systematic errors, these would be present in all groups analyzed. Another strength is that the data source is the university's own database, which can be considered to be of high quality and to cover the analyzed universe adequately. In addition, the bibliometric data originate from a high-quality index curated by a team of editors. The Web of Science (WoS) platform is one of the largest citation databases, with 1.7 billion citation references in more than 159 million files and 254 areas of knowledge 38 . The platform maintains strict rules to select articles for its indexes, in addition to a team of specialists who curate these items. An essential factor for the Brazilian context is the use of SciELO as a regional database integrated with WoS. As several USP research projects have a local focus and impact, ensuring that these publications are accounted for in the analysis is essential.
Among the weaknesses is the impossibility of collecting other sociodemographic information from the faculty. The analysis of gender dissociated from other social determinants is known not to explain the whole phenomenon of inequality. This is especially true of the ethnic-racial dimension, considering that the socioeconomic conditions of black and brown women are worse than those of white women and, consequently, the former struggle with higher levels of inequality 39 . The lack of data from individuals who did not provide their WoS identifier is another limitation. In addition, the use of the name to define gender can be criticized. However, a comprehensive database of names was used, containing the frequency of each gender and consequently reflecting the Brazilian naming pattern. While we understand that the choice of WoS as a source generates a bias in the data, it offers higher data quality because it is a manually validated database, despite its lower coverage. Finally, we recognize that scientific production is not limited to publications and citations, but these measures have a significant impact on the career development of professors.
The research implications of our results include the need to expand studies on the mechanisms through which sexism is expressed in the university environment and to develop solutions to combat it. The possible practical implications of our findings include discussing the implementation of systemic interventions of an affirmative and countercyclical nature, such as research grants dedicated to female researchers. The goal is to ensure minimal and adequate proportions of female representation in teaching vacancies and to ensure the distribution of research incentives, particularly for women at the highest levels of their careers.
Although the Universidade de São Paulo has been developing actions to achieve gender equality, our data show that inequalities persist and will hardly be overcome without substantive affirmative action. Adopting effective systemic actions is fundamental for the achievement of gender equality in this generation of researchers. Development does not happen by chance but through choices that are affirmative, decisive, and long-term oriented.