AI Alliance launches MERA Industrial: a new standard for evaluating industry-specific LLMs on business problems
The AI Alliance has announced the launch of a new section of MERA, "MERA Industrial," a benchmark for evaluating specialized large language models (LLMs) across industries. Medical and agricultural benchmarks are already available on the platform to help companies and experts select and implement the LLMs that best fit their business objectives.
There are currently three tasks posted on the site, two of which are for agriculture:
AgroBench: a dataset designed to measure the professional knowledge a model acquired during pre-training in agronomy. It contains 2,935 original agronomy questions covering botany, forage and grassland production, reclamation agriculture, general genetics, general agriculture, basics of breeding, plant breeding, seed production and seed science, farming systems in different agro-landscapes, and crop technology.
AquaBench: a dataset designed to measure the professional knowledge a model acquired during pre-training in aquaculture. It contains 1,102 aquaculture tasks covering industrial aquaculture, fish and hydrobiont feeding, mariculture (e.g. crayfish farming, shrimp farming, pearl farming), and ichthyopathology (veterinary medicine, prevention, and optimization of fish farming technologies).
The datasets are entirely original and compiled in Russian.
The third task is in medicine and covers 17 fundamental disciplines, from cell biology to clinical practice (surgery, therapy, laboratory diagnostics, pharmacology). The test includes 270 questions and 30 practice problems for each discipline, making it possible to evaluate how closely an LLM approaches the level of a medical school graduate.
The MERA Industrial benchmark was created with the support of the academic community. Participants in the project include the Skolkovo Institute of Science and Technology, Kuban State Agrarian University, the Russian Academy of National Economy and Public Administration, Nizhny Novgorod State University of Architecture and Civil Engineering, and others. Leading experts carefully formulate the tasks to ensure:
• Reliability of information, based on confirmed sources
• Full coverage of the industry taxonomy
• Variety in the complexity and types of tasks (from academic questions to practical cases)
• Testing of all key model skills
• Original wording, with nothing borrowed from the internet
MERA Industrial is not only a tool for evaluating LLMs, but also a platform for formulating new tasks and cases, validating them, and using ready-made benchmarks to select and implement LLMs in business processes.
With the rapid development of AI and LLMs, industry benchmarks are becoming a key tool for objectively evaluating and implementing models that can solve real business challenges, from optimizing production processes to supporting decisions and improving customer service.
MERA Industrial is a new standard for transparency, credibility and efficiency in selecting AI for industries where the cost of error is particularly high.
The MERA benchmark, created with the participation of teams from Sberbank, MTS AI, Skoltech AI and the National Research University Higher School of Economics, was presented at the AI Journey international conference in 2023. The test methodology was subsequently presented at ACL, the leading academic conference on computational linguistics, which has been held since 1963 and is supported by major IT companies from around the world, including Apple, Google DeepMind, Baidu, IBM and others. Last year, the benchmark for Russian-language LLMs was improved further with new datasets, support for APIs and for features of SFT models, and an updated leaderboard with a convenient system for filtering results.