MERA is a new open independent benchmark for the evaluation of fundamental models for the Russian language.

Task Description

AquaBench is a dataset designed to assess the professional knowledge of a model during pre-training in the field of Aquaculture. Aquaculture is an important part of industrial agriculture, which is focused on aquatic breeding (fish, crustaceans, mollusks, algae). Aquacultural enterprises produce a valuable source of protein and help to preserve endangered species, such as sturgeon and salmon, by releasing fry into water bodies. It is strategically important to develop aquaculture for national food security and cultivate various aquatic species that cannot be harvested in the wild.

The dataset is created in Russian and is entirely original. It contains 1102 multiple-choice questions. Each question has from four to eight options, and one or several answers are correct. The topics cover several areas, such as industrial aquaculture, feeding of fish and aquatic organisms, mariculture (e.g. crayfish and shrimp breeding, pearl cultivation), as well as ichthyopathology (veterinary science, prevention and optimization of fish cultivation technologies).

Keywords: Agriculture, Agricultural Industry, Fishery, Industrial Aquaculture, Feeding of Fish and Other Aquatic Organisms, Mariculture, Crayfish and Shrimp Farming, Artificial Pearl Cultivation, Ichthyopathology

Authors: Kuban State Agrarian University

Motivation

This task is one of eight benchmarks in the agriculture set, which is intended to assess professional knowledge in the field of aquaculture. It resembles the well-known MMLU test in its structure and purpose, and is suitable for comprehensive testing of language models for the professional quality of understanding and responses. We provide a public MMLU test version of AquaBench in Russian to assess capabilities of our model on real professional tasks.

Dataset Description

Data Fields

subset — a string indicating the subject area of a question;
answer — a string containing the letters of the correct answers, separated by commas, ranging from A to H;
context — a list of dictionaries, where each dictionary describes the role and the content within that role;
role — a string defining the role (for example, "system" or "user");
content — a string containing the message (for example, the wording of the test question with answer options for the "user" role, and a string containing instructions for the task and information about the model output format requirements for the "system" role).

Prompts

10 prompts of varying difficulty were created for this task. Example

Example:

"Ниже приведены вопросы с множественным выбором (с ответами) по теме {subset}. Напиши только букву\/буквы ответа."

Dataset Creation

All tasks in this set were written by top aquaculturists, professionally edited, and then manually double-checked by 3 different experts.

Metrics

Accuracy and Exact Match are used as the evaluation metrics.

ruTXTAquaBench