
Harbin Institute of Technology (Shenzhen) team enters the multimodal large model market; its self-developed "Ruoyu-Jiutian" tops the OpenCompass list

Release time: 2023-08-09

Reprinted from 36Kr, author: Ben@36KR

Ruoyu-Jiutian achieves multimodal fusion of text, images, audio and video

36Kr has learned that a team from the Computing and Intelligence Research Institute of Harbin Institute of Technology (Shenzhen) has founded Shenzhen Ruoyu Technology Co., Ltd. (hereinafter "Ruoyu Technology"), a company that develops multimodal large models, commercializing its research results through the school's Harbin Asset Management Co., Ltd. Ruoyu Technology's first multimodal large model, "Ruoyu-Jiutian", topped the OpenCompass multimodal large model list on its first submission.


Multimodal large model MMBench test list

01 "Ruoyu-Jiutian"

"12.3 billion parameters", "120 million image-text pairs", "5.5 million Chinese-English bilingual corpus samples", "1.2 million fine-tuning data samples", "500,000 enhanced data samples"... The improvement of core parameters has brought about a qualitative change in model capabilities. The Ruoyu-Jiutian multimodal large model has achieved remarkable performance in logical reasoning, relational reasoning, and perception capabilities. With more than 10 billion parameters, Ruoyu-Jiutian has achieved multimodal fusion of text, images, audio, and video. Its intelligent understanding and response capabilities not only cover fields such as natural language processing, computer vision, and speech recognition, but also more effectively break down the information barriers between modalities, integrating them into "Jiutian".


Multimodal large model MMBench dev list

"The Nine Heavens represents the highest heaven in ancient Chinese mythology, and symbolizes our infinite pursuit of technological progress and our yearning for an intelligent future. With its powerful understanding and response capabilities, this model transcends the boundaries of multiple modes such as text, images, audio and video, and achieves true multimodal fusion," said Dr. Sun Teng, CEO of Ruoyu Technology.

02 Establishing a top team for large models

Harbin Institute of Technology (Shenzhen) has established an asset company to encourage faculty and staff to commercialize their research results, and it offers policy support for industry-university-research cooperation. When Ruoyu Technology was founded, the school joined as a founding shareholder, providing strong support for the company's development.

Recently, IEEE Intelligent Systems, a well-known magazine in the field of artificial intelligence, announced its 2022 "AI's 10 to Watch" list. Professor Nie Liqiang was named to the list for his contributions to multimodal research. Professor Nie is a winner of the Damo Academy Young Orange Award and the TR35 China Award. He said that the achievements of HIT-Shenzhen in artificial intelligence must not remain in the laboratory; they must be put into practice to serve national defense, aerospace, and society.

Another AI expert at Ruoyu Technology is co-founder Professor Zhang Min. Professor Zhang is a specially appointed assistant to the president of Harbin Institute of Technology (Shenzhen), the first outstanding young scholar in the field of NLP in China, one of the national "Million Talents", a young and middle-aged expert with outstanding contributions to the country, and a recipient of the State Council special allowance. Harbin Institute of Technology ranks first among Chinese research institutions in NLP on CSRankings (2022-2023), an authoritative computer science ranking, and Professor Zhang is the largest contributor to that field at Harbin Institute of Technology.


Harbin Institute of Technology ranks first among institutions in mainland China in the field of NLP in CSRankings


Professor Zhang Min ranked first in the academic contribution list

Dr. Sun Teng, co-founder and CEO of Ruoyu Technology, is also a core expert on the company's R&D team. Dr. Sun's research has long focused on multimedia computing, with results published at CCF Class A conferences and in IEEE/ACM Transactions journals. Dr. Sun has a successful prior startup behind him, along with end-to-end experience in applying artificial intelligence to vertical fields and in company management. Geng Chen, another co-founder of Ruoyu Technology, serves as the company's strategic advisor. He has been named best technology analyst by New Fortune many times and built up extensive industry resources over his years as a researcher. He is responsible for the company's investment and financing, and for connecting the company with industry resources and putting them to use.

03 Core Competencies of Ruoyu Technology

"Ruoyu Technology was established at this point in time with its historical mission and ideals. As cutting-edge R&D personnel, we can deeply feel the changes that artificial intelligence will bring to the future society. The productivity explosion brought about by generative artificial intelligence will redefine the production relations in all walks of life. It is our honor and mission to have the opportunity to participate in it."

Computing power, data, and talent are the three major barriers to entry for large models. Ruoyu Technology has gathered these core elements since its inception. Its in-house R&D team, which develops its own leading talent, is able to iterate on the model independently. In the future, "Ruoyu-Jiutian" will continue to iterate under the leadership of its technical experts.

With its top entrepreneurial team, core capabilities of self-developed multimodal large models, and successful implementation experience, Ruoyu Technology says it will bring a touch of brilliance to the "Battle of 100 Models".

04 Building a general-purpose AI large model foundation

It has become an industry consensus that every sector will be reshaped on the basis of large model capabilities. Judging by OpenAI's development path, when a model is large enough, new capabilities emerge, including some that have never been seen before.

Ruoyu-Jiutian will continue to iterate in the future. Dr. Sun Teng said: "Ruoyu-Jiutian is still iterating in two opposite directions: bigger and smaller. On the one hand, we are increasing the parameter scale and looking for the point at which the capabilities of a general multimodal large model emerge; on the other hand, to meet the application needs of industry users and achieve the greatest effect with the least computing power, we must compress large models into lightweight versions and ultimately deploy them on edge computing devices."

Built on the "Ruoyu-Jiutian" multimodal large model base, Ruoyu's business model is fundamentally different from that of the AI 1.0 era. In the past, algorithms had to be developed anew for each requirement, so the business was entirely project-based. "Ruoyu-Jiutian" is a unified multimodal large model foundation, so the base does not need to be redesigned. It only needs to be fine-tuned on data from a given industry to produce the corresponding industry model. Customers can even fine-tune it a second time on their own data to meet the needs of a specific market segment.

The difficulty of multimodal large models lies in the fusion of multimodal information. Common fusion methods include relatively crude means such as linear superposition and cascading, but the fused result often performs worse than a single modality alone. This is because some technical teams lack experience in tuning multimodal data and in fusing and aligning multimodal features. Ruoyu-Jiutian has a self-developed, full-chain training framework covering multimodal feature extraction, alignment, fusion, and reasoning, as well as a comprehensive and detailed process for collecting and cleaning multimodal data. The model topped the multimodal large model list, demonstrating the team's leading strength in multimodal large models.
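As a rough illustration of the two simple fusion methods named above, the sketch below reads "linear superposition" as a weighted element-wise sum and "cascading" as concatenation of feature vectors. That reading is an assumption, as are the function names, the weights, and the toy vectors; this is not Ruoyu-Jiutian's training framework, which the article does not describe in detail.

```python
from typing import List, Sequence


def fuse_by_weighted_sum(features: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> List[float]:
    """Linear superposition: weighted element-wise sum of same-length vectors."""
    if not features:
        raise ValueError("need at least one modality vector")
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per modality")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all modality vectors must share one dimension")
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]


def fuse_by_concatenation(features: Sequence[Sequence[float]]) -> List[float]:
    """Concatenation: join the modality vectors end to end."""
    fused: List[float] = []
    for f in features:
        fused.extend(f)
    return fused


# Made-up toy vectors standing in for the outputs of a text and an image encoder.
text_vec = [1.0, 0.0, 2.0]
image_vec = [0.0, 4.0, 2.0]

summed = fuse_by_weighted_sum([text_vec, image_vec], weights=[0.5, 0.5])
joined = fuse_by_concatenation([text_vec, image_vec])
print(summed)  # one vector with the same dimension as each input
print(joined)  # one vector whose length is the sum of the input lengths
```

Neither function aligns the modalities with each other: the sum assumes the two encoders already share a feature space, and concatenation leaves any cross-modal interaction to later layers. That gap is one plausible reason such methods can underperform a single modality.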

Robots are system-level application products in the industrial field and a key deployment direction for the "Ruoyu-Jiutian" multimodal large model base. Harbin Institute of Technology has deep experience in industry-university-research cooperation in robotics. In the future, embodied robots will need to integrate multimodal information such as voice, vision, decision-making, and control to form a closed loop. The "Ruoyu-Jiutian" multimodal large model base will be further integrated with Harbin Institute of Technology's accumulated robotics research, and the company is already cooperating closely with many large listed companies in the consumer electronics and automotive fields.

With the "Ruoyu-Jiutian" multimodal large model base, Ruoyu Technology can fine-tune its existing base model to provide personalized, customized services to users in different fields. It offers pre-trained language models, pre-trained multimodal models, pre-trained vertical-field models, and other capabilities, and is committed to building the general AI platform and infrastructure of the future.

 


