The Fengshu Large Model for Wugu Fengdeng: An Innovation Engine for Knowledge Integration in the Soybean Field

This paper is a preprint and has not been certified by peer review.

Authors

Wang, M.; Ling, J.; Qiao, P.; Yang, M.; Liu, D.; He, X.; Hu, Z.; Zhu, R.; Chen, Q.

Abstract

Against the backdrop of global population growth and steadily rising food demand, accelerating agricultural modernization has become a core pathway to safeguarding food security. As the world's fourth-largest food crop, soybean (Glycine max) has strategic value for food security, livestock feed, and industrial raw materials, owing to its high protein content (over 40% of seed dry weight) and its role as an oil source. However, the soybean industry faces several challenges: long genetic breeding cycles (8-10 years with traditional methods), annual losses from diseases and pests reaching 40% (FAO data, 2023), and field management that relies on empirical decision-making. All of these call for intelligent solutions. Meanwhile, agricultural knowledge is growing explosively: more than 4,000 new soybean-related publications are added to PubMed each year, and agricultural technology Q&A platforms (Zhihu, Baidu Tieba, etc.) generate over 3,000 questions daily. This multi-source, heterogeneous knowledge is scattered across books, research literature, Q&A communities, and gene databases and lacks systematic integration, resulting in a knowledge utilization rate of less than 30%. In response, this paper proposes the "Fengshu-Agri" large model, an agricultural knowledge integration and innovation engine based on the collaboration of Retrieval-Augmented Generation (RAG) and Knowledge Graph (KG).
The model achieves breakthroughs through three core technological innovations:
1. Deep integration of multi-source heterogeneous data: For the first time, 256 agricultural monographs (covering standardized knowledge such as planting techniques and pest control), 66,772 research publications (genomics, agronomic trait analysis, etc.), 880,000 production-practice Q&A entries (covering scenarios such as pest diagnosis and environmental stress response), and 120,000 gene annotation records (functional genes, expression regulation mechanisms) are systematically integrated to construct the largest knowledge graph in the agricultural field. It contains 2 million entity nodes and models knowledge along the full chain from molecular-level gene regulation to field management.
2. Double-layer collaborative retrieval mechanism: RAG semantic vector retrieval is combined with KG structured reasoning to address the low recall rate (<65%) and semantic ambiguity of traditional systems. Specifically, dual matching is achieved through local keywords (entity-level) and global keywords (relationship-level).
3. Optimization of the domain-specific generation engine: Starting from the QWEN-QWQ32B large language model, three-stage fine-tuning (pre-training, agricultural instruction fine-tuning, and reinforcement learning from human feedback) [15] improves the professionalism of generated text by 35% (manual evaluation index) and significantly reduces LLM hallucination [12].
Experimental results show that Fengshu-Agri achieves an accuracy rate of 89.6% in soybean knowledge retrieval tasks (a 25.4% improvement over traditional RAG models), a recall rate of 87.3% (20.5% higher than pure KG reasoning models), and an F1 score of 88.4%. It can efficiently answer complex questions involving multi-entity associations (such as "the impact mechanism of CRISPR-editing the GmSWEET gene on lepidopteran pest resistance").
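To make the double-layer retrieval idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy chunk store, a toy triple list, a bag-of-words cosine similarity standing in for semantic embeddings, and a simple weighted sum (`alpha`) to merge the two signals. All names (`CHUNKS`, `TRIPLES`, `retrieve`, the relation labels) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical text chunks standing in for the vector store.
CHUNKS = {
    "c1": "GmSWEET sugar transporter genes influence seed development in soybean",
    "c2": "CRISPR editing of soybean genes uses sgRNA design and Agrobacterium transformation",
    "c3": "Lepidopteran pests cause leaf damage in soybean fields",
}

# Hypothetical knowledge graph: (head, relation, tail, supporting chunk id).
TRIPLES = [
    ("GmSWEET", "edited_by", "CRISPR", "c2"),
    ("GmSWEET", "affects_resistance_to", "lepidopteran pests", "c3"),
    ("soybean", "has_gene", "GmSWEET", "c1"),
]


def bow(text):
    """Bag-of-words vector; a stand-in for a learned semantic embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_scores(query):
    """Semantic-style score of every chunk against the query."""
    q = bow(query)
    return {cid: cosine(q, bow(text)) for cid, text in CHUNKS.items()}


def graph_scores(local_keywords, global_keywords):
    """Score chunks by entity-level (local) and relation-level (global) matches."""
    local = {k.lower() for k in local_keywords}
    glob = {k.lower() for k in global_keywords}
    scores = Counter()
    for head, rel, tail, cid in TRIPLES:
        if head.lower() in local or tail.lower() in local:
            scores[cid] += 1.0  # entity-level match
        if rel.lower() in glob:
            scores[cid] += 1.0  # relation-level match
    return scores


def retrieve(query, local_keywords, global_keywords, alpha=0.5, top_k=2):
    """Rank chunks by a weighted sum of vector and graph scores (assumed merge rule)."""
    vec = vector_scores(query)
    kg = graph_scores(local_keywords, global_keywords)
    max_kg = max(kg.values(), default=0.0) or 1.0  # avoid division by zero
    combined = {
        cid: alpha * vec[cid] + (1 - alpha) * kg[cid] / max_kg for cid in CHUNKS
    }
    return sorted(combined, key=combined.get, reverse=True)[:top_k]


if __name__ == "__main__":
    question = "CRISPR editing of GmSWEET and lepidopteran pests resistance"
    print(retrieve(question, ["GmSWEET", "lepidopteran pests"], ["affects_resistance_to"]))
```

In this toy setup, setting `alpha=1.0` reduces the ranking to vector similarity alone, while lower values let entity and relation matches from the graph reorder the results. How the actual system extracts keywords and merges the two signals is not specified in the abstract.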
Analysis of typical cases indicates that the model's responses not only cover technical principles (sgRNA design, Agrobacterium transformation procedures) but also integrate metabolomics data (70% improvement in insect resistance with no significant yield reduction), demonstrating cross-modal knowledge fusion capabilities. In the future, the model will be extended to crops such as maize and potato to build an intelligent agricultural ecosystem covering the entire process of cultivation management and breeding improvement, advancing precision agriculture toward a data-driven model.
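The retrieval metrics reported in the abstract can be cross-checked with the standard F1 formula, F1 = 2PR / (P + R). The short sketch below assumes that the reported "accuracy rate" of 89.6% plays the role of precision; the abstract does not state this explicitly, so that reading is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures from the abstract; treating "accuracy rate" as precision is an assumption.
reported_precision = 0.896
reported_recall = 0.873

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.884"
```

Under that assumption the computed value agrees with the reported F1 score of 88.4%.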
