Chinese artificial intelligence (AI) firm DeepSeek on Tuesday revealed Native Sparse Attention (NSA), a new mechanism designed to enhance the efficiency of long-context training and inference in AI models.
The move comes as global AI competition continues to heat up. Also on Tuesday, Elon Musk's xAI unveiled its latest AI model, Grok 3, claiming that early testing shows it outperforming offerings from OpenAI and DeepSeek, according to media reports.
Musk touted Grok 3's capabilities in a videoconference at the World Government Summit in Dubai last week, describing it as "scary smart" and suggesting it will outperform all existing AI solutions, according to a CNBC report.
On Tuesday, DeepSeek uploaded a paper to arXiv introducing NSA.
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant challenges. "We present NSA, a natively trainable sparse attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling," according to the paper's abstract on arXiv.
Tian Feng, former dean of Chinese AI software giant SenseTime's Intelligence Industry Research Institute, said that as the global AI competition continues to heat up, different companies have demonstrated competitive edges in different areas.
For example, according to Tian, DeepSeek's resource-efficient, open-source models excel in mathematical reasoning and software engineering tasks, while OpenAI's o1 performs better in general knowledge and problem-solving.
Chinese companies have also shown various advantages, including competitive performance and cost-effectiveness. "By leveraging alternative sources of data, developing homegrown technologies, and fostering collaboration within the domestic tech ecosystem, DeepSeek and other Chinese AI companies are able to create solutions that not only meet domestic demand but also enhance competitiveness on a global scale," Tian said.
China is now home to more than one-third of the world's AI large language models, said a whitepaper on the global digital economy released by the China Academy of Information and Communications Technology in July 2024, according to the Xinhua News Agency.