MERA is a new open independent benchmark for the evaluation of fundamental models for the Russian language.

Task Description

AgroBench is a dataset designed to measure the professional knowledge of a model acquired during pre-training in the field of agronomy.

Agronomy is in the core of agricultural production, studying various aspects of crop cultivation and developing methods for protecting agriculture from adverse natural factors. Agronomy is linked to the farming efficiency, environmental protection, and sustainable use of land resources.

The dataset is created in Russian and is entirely original. The benchmark consists of 2,935 multiple-choice questions with one or several correct answers. Each question may contain from four to eight options. The questions cover various topics (disciplines), such as botany, forage production and grassland management, reclamation farming, general genetics, general agriculture, basics of breeding, crop production, seed breeding and seed science, farming systems in different agro-landscapes, and technologies for cultivating agricultural crops.

Keywords: Agriculture, Agricultural industry, Farming, Agronomy, Botany, General Agriculture, Crop Production, General Genetics, Fundamentals of Breeding, Seed Production and Seed Science, Forage Production and Meadow Management, Reclamation Agriculture, Technologies for Cultivating Agricultural Crops, Agricultural Systems in Various Agro-Landscapes

Authors: Kuban State Agrarian University

Motivation

This task is one of eight benchmarks in the agriculture set, which is intended to assess professional knowledge in the field of agronomy. It resembles the well-known MMLU test in its structure and purpose, and is suitable for comprehensive testing of language models for the professional quality of understanding and responses. We provide a public MMLU test version of AgroBench in Russian to assess capabilities of our model on real professional tasks.

Dataset Description

Data Fields

subset — a string indicating the subject area of a question;
answer — a string containing the letters of the correct answers, separated by commas, ranging from A to H;
context — a list of dictionaries, where each dictionary describes the role and the content within that role;
role — a string defining the role (for example, "system" or "user");
content — a string containing the message (for example, the wording of the test question with answer options for the "user" role, and a string containing instructions for the task and information about the model output format requirements for the "system" role).

Prompts

10 prompts of varying difficulty were created for this task. Example

Example:

"Ниже приведены вопросы с множественным выбором (с ответами) по теме {subset}. Напиши только букву\/буквы ответа."

Dataset Creation

All tasks in this set were written by top agronomists, professionally edited, and then manually double-checked by 3 different experts.

Metrics

Accuracy and Exact Match are used as the evaluation metrics.

ruTXTAgroBench