University ranking, a single dimension that can’t summarize a million variables

Together with the GIAI research team, we have been running a higher education program in AI/Data Science since 2022. What has come to our attention over those years is that many prospective students are heavily influenced by the university rankings published by well-known outlets such as US News, The Times, the Financial Times, and many others. We were all influenced by that information too, but at some point in our graduate studies we all came to understand that rankings reflect only a fraction of a school’s capability, and that a school’s ranking matters far less than what we can bring to the table when we look for jobs.

Indeed, many of us have experienced the bias created by school rankings, which can greatly help in resume screening. If you hold a diploma from a good school, you are likely to earn many interviews. Highly ranked universities have put in decades of attention and hard work to earn that status. But we all have at least one friend whose ability falls well short of the reputation of the friend’s alma mater. Many schools also grade leniently, so GPA loses much of its credibility as well.

University Ranking, a single variable that cannot summarize the entire picture

The less sophisticated the clients are, the more they are inclined toward simple answers. Most of them are happy to see a Yes/No type of answer, and they turn away from a large table of numbers. In politics, all they want to know is whether their candidate can be elected or not. At best, they look at summarized survey data to gauge the probability of winning. Political analysts look at gender, age, and region as their key variables, with race as an additional variable in racially diverse countries like the United States.

In politics, winning or losing is all that matters, so I can understand. But I am not sure that is enough to find a winning strategy. Are you not supposed to look into the details behind the simplified number, and dig deeper by talking to people?

The same goes for university rankings. Before admission, students often have the misperception that a higher ranking guarantees complete success in life. Evidently the probability is high if the school is well renowned, but there is no guarantee. If the choice determines a large portion of your life, are you not supposed to dig deeper, at least as much as professional political analysts do?

Ranking is a time-varying factor

Blind faith in university rankings published by reputable newspapers does not always pay off. For one thing, a ranking is not a time-invariant factor. It changes. Sometimes dramatically.

For example, the rankings in a businessperson’s mind were mostly formed when he or she went to college. Rankings go up and down quite a lot, so by the time businesspeople are in their 40s, the ranking in memory may no longer be current. Disproportionate changes in the race, gender, and backgrounds of students have decisively affected university rankings over the decades. The economic situation of the surrounding neighborhood is also a key indicator. Engineering programs in the ‘Rust Belt’ were ranked high through the 1970s and 1980s. But higher wages pushed manufacturers abroad, which resulted in less business activity in the area and less funding for schools, along with a decline in university rankings. The current generation would be startled to see the Rust Belt universities’ rankings from the 1970s.

Dimensionality reduction fails to capture all information

University ranking is one particular example of dimensionality reduction in social science practice.

As we teach with PCA (Principal Component Analysis) and related factor analysis tools in AI/Data Science classes, a ranking is just simplified information about the school, which is dimensionality reduction in mathematical terms.

The reduction is a widely accepted practice in the era of ‘Big data’, but we are also aware that we lose valuable information in the process. For example, PCA first re-expresses a data set’s dimensions in terms of (co-)variance. From PCA’s perspective, only the highly (co-)varying directions matter. The eventual choice of the number of principal components discards all the explanatory power of the excluded variance dimensions. Setting aside the variance, what if the variables are 0/1 indicators, whose variance is fixed by their mean and says little about the information they carry?
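
To make the loss concrete, here is a minimal sketch with only two indicators, so the PCA eigenvalues can be written in closed form. The six schools and their scores are hypothetical numbers invented for illustration, and a real ranking would involve far more variables than two.

```python
import math

# Hypothetical scores for six schools on two indicators (illustrative numbers only).
research = [9.0, 8.0, 7.5, 6.0, 5.0, 4.5]
job_performance = [5.0, 8.5, 4.0, 7.0, 3.5, 6.0]


def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pca_variance_shares(xs, ys):
    """Share of total variance carried by the first and second principal component."""
    a = covariance(xs, xs)  # variance of the first indicator
    d = covariance(ys, ys)  # variance of the second indicator
    b = covariance(xs, ys)  # covariance between the two
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix [[a, b], [b, d]].
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    first, second = centre + radius, centre - radius
    total = first + second
    return first / total, second / total


# Keeping one component (a single "ranking" number) discards the second share.
kept, lost = pca_variance_shares(research, job_performance)
print(f"kept by one component: {kept:.2f}, discarded: {lost:.2f}")
```

Unless the two indicators are perfectly correlated, the discarded share is strictly positive. With many weakly related indicators instead of two, a single number can leave most of the variation unexplained.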

In data compression, unless there is a clear rule as with image formats (JPG, PNG, GIF, WEBP, AVIF, etc.), it is highly likely that one cannot avoid loss of information. For social science data, the loss is often huge. It can potentially create omitted variable bias.

When important variables are omitted, researchers may end up with inconsequential or unrealistic statistical results. As an audience, we see that many university rankings are wrong, partly because of missing variables.
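
A small worked example may help show how an omitted variable distorts an estimate. Everything in the sketch is an assumption made for illustration: the variable names, the data, and the "true" effects of 0.5 and 2.0 are invented, and the outcome is generated without noise so that the arithmetic is exact.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope of a one-variable least-squares regression of ys on xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical data: student effort happens to be correlated with ranking score.
ranking_score = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
student_effort = [1.0, 1.5, 3.5, 3.0, 5.5, 5.0]

# Assumed true model: the outcome depends weakly on ranking, strongly on effort.
TRUE_RANKING_EFFECT = 0.5
TRUE_EFFORT_EFFECT = 2.0
job_outcome = [
    TRUE_RANKING_EFFECT * r + TRUE_EFFORT_EFFECT * e
    for r, e in zip(ranking_score, student_effort)
]

# Regressing the outcome on ranking alone credits ranking with the effect of effort.
naive_effect = ols_slope(ranking_score, job_outcome)
# Textbook bias term: omitted effect times the slope of effort on ranking.
bias = TRUE_EFFORT_EFFECT * ols_slope(ranking_score, student_effort)

print(f"true ranking effect: {TRUE_RANKING_EFFECT:.2f}")
print(f"estimated when effort is omitted: {naive_effect:.2f} (bias {bias:.2f})")
```

The estimate from the ranking-only regression equals the true effect plus the bias term, so the ranking appears far more important than it is in the assumed model.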

Ranking contains simultaneity bias

Imagine a student with two competing offers, one from a highly ranked university and the other from a mediocre-ranked one. If all other factors are the same, the highly ranked university is the more attractive of the two. This factor can reinforce students’ choices, which can result in ‘the rich get richer and the poor get poorer’.

The tendency can be more evident if the ranking is not free from outside influence. Let’s imagine a scenario in which one news journal’s ranking decisively favors a university through an unknown connection. The highly ranked university, if the reinforcing behavior is present, will be able to attract more competitive students. The process may continue as long as the unknown connection persists.
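
The reinforcing loop can be sketched as a toy simulation. The update rule, the starting score of 50, and the feedback strength are assumptions chosen only to illustrate the mechanism. They are not estimates of how any real ranking behaves.

```python
def simulate_gap(initial_bonus, years=10, feedback=0.1):
    """Yearly score gap between two schools of equal underlying quality.

    School A starts with `initial_bonus` extra ranking points (the hypothetical
    outside favor). All parameters are illustrative assumptions.
    """
    score_a = 50.0 + initial_bonus
    score_b = 50.0
    gaps = []
    for _ in range(years):
        # Top students split in proportion to the published scores.
        share_a = score_a / (score_a + score_b)
        # Next year's score moves with the share of top students attracted.
        score_a += feedback * (share_a - 0.5) * 100
        score_b += feedback * (0.5 - share_a) * 100
        gaps.append(score_a - score_b)
    return gaps


no_favor = simulate_gap(initial_bonus=0.0)
with_favor = simulate_gap(initial_bonus=5.0)
print("gap without favor:", [round(g, 2) for g in no_favor])
print("gap with favor:   ", [round(g, 2) for g in with_favor])
```

Without the initial favor the two schools stay level. With it, the gap widens every year even though nothing about the schools themselves differs, which is the simultaneity problem: the ranking partly causes the outcome it claims to measure.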

Although I am in no position to argue that the ‘unknown connection’ does exist, Asian universities have long complained of being underappreciated, as most rankings give more credit to Western institutions. Since 2003, China has supported the Academic Ranking of World Universities (ARWU), published by Shanghai Jiao Tong University, in an attempt to counterbalance Western-oriented university ranking measures. Though the Shanghai Ranking is not as popular as the QS World University Rankings, the Times Higher Education World University Rankings, the US News University Rankings, and the Financial Times Rankings, at least it shows that Asian universities are underappreciated.

Is the data used accurate?

Immediately after a newspaper announces university rankings, almost all universities complain that the input data are not accurate. Sometimes the ranking is revised after heavy complaints. For US News, admission statistics are a key factor that schools sometimes have an incentive to misreport, as only self-reported numbers are used. More applications with fewer admissions is usually the key factor for a highly ranked program. Schools have an incentive to inflate the number of applicants, just to report a lower acceptance rate.
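
The arithmetic behind that incentive is simple. In the sketch below the applicant and admission counts are hypothetical, and the acceptance rate is taken to be admitted students divided by applicants.

```python
def acceptance_rate(admitted, applicants):
    """Acceptance rate as admitted students divided by applicants."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return admitted / applicants


# Hypothetical numbers: the same 1,000 admitted students in both cases.
admitted = 1_000
true_applicants = 10_000
reported_applicants = 12_500  # inflated count, assumed for illustration

honest_rate = acceptance_rate(admitted, true_applicants)
reported_rate = acceptance_rate(admitted, reported_applicants)
print(f"honest: {honest_rate:.1%}, reported: {reported_rate:.1%}")
```

Nothing about the program changes, yet the reported figure looks more selective. This is why self-reported inputs are a weak foundation for a ranking.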

The aforementioned ARWU relies less on admissions and more on the academic achievements of faculty. Quality of research is a key variable in ARWU, but the weighting system has been under never-ending discussion because, whatever justification SJTU provides, there is no way the weights can be scientifically allocated across time.
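
A short sketch shows why the choice of weights is contested. The schools, the two indicators, and both weighting schemes are hypothetical. They are not ARWU’s actual indicators or weights.

```python
# Hypothetical indicator scores (0-100) for three made-up schools.
schools = {
    "School A": {"research": 90, "teaching": 60},
    "School B": {"research": 70, "teaching": 85},
    "School C": {"research": 80, "teaching": 70},
}


def rank(indicator_table, weights):
    """Order schools from best to worst by weighted sum of their indicators."""
    scores = {
        name: sum(weights[key] * value for key, value in indicators.items())
        for name, indicators in indicator_table.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Two equally defensible-looking weighting schemes (both assumed).
research_heavy = rank(schools, {"research": 0.8, "teaching": 0.2})
teaching_heavy = rank(schools, {"research": 0.3, "teaching": 0.7})

print("research-heavy order:", research_heavy)
print("teaching-heavy order:", teaching_heavy)
```

The underlying data are identical in both runs, yet the first and last places swap. The final order is therefore as much a product of the weights as of the schools.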

Business needs brain, not the certificate

From time-series issues to all kinds of endogeneity, the rankings fail to meet too many of the basic statistical properties required of an accurate measure of a university’s capability. After all, we need rankings just to summarize a school’s ability to supply a quality workforce to society. In other words, businesses need brain, not the certificate.

Accreditation institutions are also well aware of the ultimate goal of education. On behalf of SIAI, the faculty have so far spoken to many accreditation agencies, and most Western ones ask us about the job performance of our graduates.