On 22 April, Baidu announced that its ERNIE Bot 4.0 has topped the Chinese list in Tsinghua University’s latest assessment and ranking of large language models (LLMs). However, it still lags behind international competitors such as OpenAI’s GPT-4 and Anthropic’s Claude-3 models.
The assessment is based on a framework called SuperBench, developed by the Basic Model Research Centre of Tsinghua University in collaboration with the Zhongguancun Laboratory. Fourteen LLMs representative of the current landscape were tested and ranked on multiple capabilities. According to the South China Morning Post, overseas LLMs such as GPT-4 and Claude-3 rank higher on multiple fronts, “including semantic comprehension, coding abilities and alignment with human commands.”
Apart from topping the list among Chinese LLMs, ERNIE Bot performs well in certain capabilities. In Chinese language comprehension, it leads the board by 0.41 points over the second-place entrant, Zhipu AI’s GLM-4. In mathematical capabilities and semantic comprehension, ERNIE 4.0 ties with Claude-3 for first place, while the two GPT-4 models rank fourth and fifth. Notably, in human command alignment, ERNIE 4.0 ranked second, only 0.03 points behind GPT-4. ERNIE 4.0 also tops the safety and security category with a score of 89.1, 1 point above second-place GPT-4 Turbo.
Baidu says ERNIE has gained over 200 million users since its launch last March, and its application programming interface (API) is called over 200 million times a day. Although a gap still exists between Chinese and international LLMs, the Tsinghua study shows that it has been narrowing. Behind ERNIE Bot and GLM-4, Alibaba’s Tongyi Qianwen 2.1 and the Kimi chatbot from start-up Moonshot AI also ranked high on the list. Zhipu AI has raised over 2.5 billion RMB (345 million USD), while Beijing-based Moonshot AI raised 1 billion USD this February. It is an exciting time, as the AI landscape shifts quickly in China and around the world.