While OpenAI has been iterating continuously and drawing widespread attention over the past few months, Google has finally dropped a bombshell of its own by unveiling its long-awaited multimodal model, Gemini, which the company touted as "the most capable model" it has ever built.
Chinese analysts say Gemini can help Google catch up with OpenAI in the generative AI race. They add that the launch also offers a lesson to Chinese AI developers: latecomers can narrow the gap through non-stop effort.
Considered the biggest competitor to ChatGPT in the market, Gemini encompasses three AI models - Gemini Nano, Gemini Pro and Gemini Ultra - each serving different purposes.
Gemini Ultra's performance exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks in large language model (LLM) research and development, ranging from natural image, audio and video comprehension to mathematical reasoning, Demis Hassabis, the co-founder and CEO of DeepMind, wrote in a Google blog post on Wednesday.
News of the release quickly made headlines around the world. Some Chinese media outlets ran the headline "AI industry ushers in further catalysis with Gemini release." Zhong Junhao, secretary-general of the Shanghai Artificial Intelligence Industry Association, told Chinese media that Gemini will deal "a disruptive blow to OpenAI" because Google has built a mature AI development ecosystem, citing its broad deployment across hardware devices, search engines and chips as well as the new-generation large model.
According to Hassabis, Gemini better understands nuanced information and can answer questions on complicated topics, including explaining its reasoning in complex subjects such as math and physics. It is also capable of understanding, explaining and generating high-quality code in the world's most popular programming languages, such as Python and Java.
Gemini Nano is a lightweight version designed for offline use on Android devices. Gemini Pro is a more powerful model that will be used in various Google AI services, and it already powers Bard. Gemini Ultra is scheduled for release next year.
Starting from December 13, developers and enterprise customers can access Gemini Pro through Google AI Studio or Vertex AI in Google Cloud. Although Gemini is currently available only in English, Google plans to expand it to other languages in the near future. Ultimately, Google aims to integrate Gemini into its search engine, ad products, Chrome browser and more on a global scale, media reports said.
Google is still trying to catch up in the field of LLMs. OpenAI has a significant advantage in this AI race thanks to its large user base, and user feedback will only accelerate its development, Xiao Yanghua, a computer science professor at Fudan University, told the Global Times.
The launch of Gemini is certainly a crucial step in Google's effort to catch up, as the model is advanced enough to reach the level of GPT-4, Xiao said. Surpassing OpenAI's GPT family, however, may prove challenging. Xiao said the industry will have to wait and see, as GPT-5 is officially on the OpenAI roadmap.
According to comparison results with OpenAI's current strongest LLM, GPT-4, released in Google's Wednesday blog post, Gemini Ultra outperforms GPT-4 in most areas, including reasoning, mathematics and coding. In MMLU (massive multitask language understanding), a general text benchmark, Gemini Ultra scored 90 percent compared with GPT-4's 86.4 percent.
The claimed improvements over GPT-4 come on tasks that Google itself selected, and most of them amount to only a few percentage points, while in audio processing Gemini is even weaker than GPT-4, Xiao noted.
Google's comparisons with GPT-4 resemble the claims of Chinese AI developers that their large models can approach or exceed GPT-3.5 on certain metrics. In July, Chinese tech giant Baidu announced that a new version of its language model, Ernie Bot, also known as Wenxin YiYan 3.5, had surpassed ChatGPT-3.5. On October 17, Baidu released Wenxin YiYan 4.0 and claimed that its overall capability was equal to GPT-4, media reports said.
Latecomers can narrow the gap through active effort, which could be the biggest lesson Chinese AI enterprises take away from Google's release of Gemini, Xiao said, recommending that domestic large-model companies stay confident and strive to catch up.
As of October, China had more than 254 companies and universities that have developed large models with more than 1 billion parameters. These entities are spread across more than 20 provinces and regions, according to a recent white paper on the AI industry released by the Beijing Science and Technology Commission, Zhongguancun Science and Technology Park Management Committee.
What matters in AI is overall capability. Surpassing a rival on individual indicators does not necessarily amount to an absolute advantage. Nevertheless, industry analysts noted that China's lag in the overall progress of LLMs can be offset by a competitive edge in new areas, such as models for more specialized industries, scenario-based models and the seamless integration of language models with robots.
Liu Wei, director of the human-machine interaction and cognitive engineering laboratory at the Beijing University of Posts and Telecommunications, told the Global Times that "there is no way out if Chinese AI developers keep acting as followers of ChatGPT or Gemini. They must strive to promote basic and original technological breakthroughs such as large-model algorithms and frameworks so as to create product differentiation."