Task Description
AgroBench is a dataset designed to measure the professional knowledge of a model acquired during pre-training in the field of agronomy.
Agronomy is in the core of agricultural production, studying various aspects of crop cultivation and developing methods for protecting agriculture from adverse natural factors. Agronomy is linked to the farming efficiency, environmental protection, and sustainable use of land resources.
The dataset is created in Russian and is entirely original. The benchmark consists of 2,935 multiple-choice questions with one or several correct answers. Each question may contain from four to eight options. The questions cover various topics (disciplines), such as botany, forage production and grassland management, reclamation farming, general genetics, general agriculture, basics of breeding, crop production, seed breeding and seed science, farming systems in different agro-landscapes, and technologies for cultivating agricultural crops.
Keywords: Agriculture, Agricultural industry, Farming, Agronomy, Botany, General Agriculture, Crop Production, General Genetics, Fundamentals of Breeding, Seed Production and Seed Science, Forage Production and Meadow Management, Reclamation Agriculture, Technologies for Cultivating Agricultural Crops, Agricultural Systems in Various Agro-Landscapes
Authors: Kuban State Agrarian University
Motivation
This task is one of eight benchmarks in the agriculture set, which is intended to assess professional knowledge in the field of agronomy. It resembles the well-known MMLU test in its structure and purpose, and is suitable for comprehensive testing of language models for the professional quality of understanding and responses. We provide a public MMLU test version of AgroBench in Russian to assess capabilities of our model on real professional tasks.
Dataset Description
Data Fields
subset
— a string indicating the subject area of a question;answer
— a string containing the letters of the correct answers, separated by commas, ranging from A to H;context
— a list of dictionaries, where each dictionary describes the role and the content within that role;role
— a string defining the role (for example, "system" or "user");content
— a string containing the message (for example, the wording of the test question with answer options for the "user" role, and a string containing instructions for the task and information about the model output format requirements for the "system" role).
Prompts
10 prompts of varying difficulty were created for this task. Example
Example:
"Ниже приведены вопросы с множественным выбором (с ответами) по теме {subset}. Напиши только букву\/буквы ответа."
Dataset Creation
All tasks in this set were written by top agronomists, professionally edited, and then manually double-checked by 3 different experts.
Metrics
Accuracy and Exact Match are used as the evaluation metrics.