Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, prompting interest in their application as black-box optimizers. This paper asserts that LLMs possess the capability for zero-shot optimization across diverse scenarios, including multi-objective and high-dimensional problems. We introduce a novel population-based method for numerical optimization using LLMs called Large Language-Model-Based Evolutionary Optimizer (LEO). Our hypothesis is supported through numerical examples, spanning benchmark and industrial engineering problems such as supersonic nozzle shape optimization, heat transfer, and windfarm layout optimization. We compare our method to several gradient-based and gradient-free optimization approaches. While LLMs yield comparable results to state-of-the-art methods, their imaginative nature and propensity to hallucinate demand careful handling. We provide practical guidelines for obtaining reliable answers from LLMs and discuss method limitations and potential research directions.
Introduction
The advent of Large Language Models (LLMs) has sparked a revolution in generative Artificial Intelligence (AI) research [1], [2], [3]. Since the introduction of the transformer model [4], generative AI research has seen a surge of activity, and every subsequent generation of LLMs has become markedly more capable than the previous one. For example, the first decoder-only model developed by OpenAI in 2018, the Generative Pre-trained Transformer (GPT), was based on the transformer architecture and could process textual data alone [2], whereas OpenAI's latest GPT-4 model, released in 2023, is multi-modal, i.e., capable of dealing with natural language, code, as well as images [5]. Several studies since then have shown that LLMs such as GPT-4 possess strong reasoning ability [6], [7]. Studies have also shown that an LLM's performance can be further improved by techniques such as in-context learning [8], chain-of-thought prompting [9], and tree-of-thought prompting [10], [11].
Some examples that highlight the generalization capability of LLMs are: (a) Gato [12], a generalist multi-modal agent based on an LLM capable of performing several tasks; (b) Eureka [13], a human-level reward design algorithm using LLMs, which takes a gradient-free in-context learning approach to reinforcement learning for robotic applications; and (c) Voyager [14], an LLM-powered AI agent that has shown the ability to conduct autonomous exploration, skill acquisition, and discovery in an open-ended Minecraft world without human intervention. To test the reasoning and generalization ability of new generative AI models, Srivastava et al. published a suite of benchmarks containing 204 problems called the Beyond the Imitation Game benchmark (BIG-bench) [15].
This reasoning and generalization ability of LLMs has sparked interest in exploring their use as AI agents, particularly for applications in science and technology. Bran et al. [16] developed an autonomous AI agent called ChemCrow for computational chemistry research; this agent has demonstrated a remarkable ability to accomplish tasks across organic synthesis, drug discovery, and material design. Similarly, Boiko et al. [17] present an autonomous AI agent based on an LLM for chemical engineering research. Blanchard et al. [18] use masked LLMs to automate genetic-algorithm mutations for molecules, with applications in drug likeness and synthesizability. Zhang et al. [19] present AutoML-GPT, an AI agent that acts as a bridge between various AI models and dynamically trains other AI models with optimized hyperparameters. Stella et al. [20] show that generative AI models can accelerate the robot design process at both the conceptual and technical levels, and further propose a human-AI co-design strategy. Zheng et al. [21] explore the generative ability of GPT-4 as a black-box optimizer to navigate the architecture search space, making it an effective agent for neural architecture search. Singh et al. [22] study the utility of LLMs as task planners for robotic applications. Jablonka et al. [23] show that a fine-tuned GPT-3 model can outperform many other ML models for predictive chemistry, particularly in the low-data limit.
A common thread running through the studies mentioned above is the ability of LLMs to find an optimal solution for complex multi-objective optimization problems. This has attracted a great deal of attention from the scientific community, and several studies have explored the ability of LLMs to work as black-box optimizers. Melamed et al. [24] present an automatic prompt optimization framework called PROPANE. In a method called InstructZero, Chen et al. [25] optimize a low-dimensional soft prompt to an open-source LLM, which in turn generates the prompt for the black-box LLM, which then performs a zero-shot evaluation; the soft prompt is optimized using Bayesian optimization. DeepMind released a general hyperparameter optimization framework called Optformer based on the transformer architecture [26]. Automatic generation of optimized prompts is also explored by Zhou et al. [27]. Similarly, Pryzant et al. [28] explore incorporating gradient descent into automatic prompt optimization, and Chen et al. [29] introduce a discrete prompt-optimization framework incorporating human-designed feedback rules. We also see examples of using LLMs within a Reinforcement Learning (RL) framework for optimization [13], [30], [31].
While the examples mentioned so far focus on optimized prompt generation, a few studies have also explored using LLMs for mathematical optimization directly. Liu et al. [32] propose an LLM-based Evolutionary Algorithm (LMEA), in which an LLM is responsible for the selection of parent solutions, mutation, cross-over, and the generation of new solutions. Guo et al. [33] conduct an assessment of the various optimization abilities of LLMs; their study concludes that LLMs can perform optimization, including gradient descent, hill climbing, grid search, and black-box optimization, well, particularly when the sample sizes are small. Pluhacek et al. [34] present a strategy for using LLMs in swarm intelligence-based optimization algorithms. Liu et al. [35] propose an LLM-based multi-objective optimization method, where the LLM serves as a black-box search operator for Multi-Objective Evolutionary Algorithms (MOEA). Liu et al. [36] propose using LLMs for optimization algorithm evolution in a framework called AEL (Algorithm Evolution using LLMs). Liu et al. [37] adopt an automatic hill-climbing process that uses language models as black-box optimizers for vision-language models, and show that LLMs exploit an implicit gradient direction for more efficient search. The Optimization by PROmpting (OPRO) framework from Google DeepMind generates new solutions autoregressively and is seen to outperform human-designed prompts [38]. We also see examples of using LLMs for mathematical operations and optimization, for example, deep learning for symbolic mathematics [39], LLMs for symbolic regression [40], [41], and transformers for linear algebra, including matrix operations and eigen-decomposition [42]. In a framework called OptiMUS, an LLM is used for formulating and solving Mixed-Integer Linear Programming (MILP) problems [43]. Zhang et al. [44] use LLMs for hyperparameter optimization. Romera-Paredes et al. [45] introduce an evolutionary procedure called FunSearch (short for searching in function space), where a pretrained LLM is paired with a systematic evaluator for efficient function-space search.
The literature cited here demonstrates beyond doubt the ability of LLMs to perform numerical optimization. However, the following questions remain unanswered:
- Can it be proved that LLMs develop an understanding of the objective function landscape and tune their search directions by leveraging this understanding? This issue is pivotal to employing LLMs for black-box optimization, and a positive response would solidify the credibility of LLMs as black-box optimizers.
- Is it possible to devise a reliable mechanism to ensure consistent performance from LLMs without relying on the 'temperature' parameter to induce randomness? It is well known that LLMs are prone to hallucinations and repetitive output, and can succumb to 'mode collapse' when used autoregressively [46]. Therefore, a foolproof framework is essential to render LLMs a feasible method for black-box optimization.
In this paper, our goal is to showcase the ability of LLMs to perform zero-shot black-box optimization. We demonstrate that LLMs exhibit reasoning abilities and develop an understanding of the objective function landscape over the course of optimization. Via a judicious balance between exploration and exploitation strategies, elitism-based guardrails, and engineering workarounds, we develop a robust framework to solve various optimization problems of practical relevance to industry. We call this method the Large Language-Model-Based Evolutionary Optimizer (LEO). The main contributions of this paper are as follows:
1. We introduce a novel population-based, parameter-free optimization approach in which an LLM is used to generate new candidate solutions or perturb existing ones, performing exploration and exploitation of the design space. We employ elitism-based guardrails to retain the best candidate solutions in every optimization iteration. In addition, the hybrid optimization framework is assisted by engineering workarounds to overcome the limitations of LLMs (a minimal illustrative sketch of such a loop is given after this list).
2. We present the distinguishing features of our method compared to other auto-regressive, evolutionary, or population-based methods using LLMs for black-box optimization, such as Liu et al. [32], Yang et al. [38], Guo et al. [33], and Liu et al. [35] (Section 2). A detailed comparison of LEO with existing approaches in the open literature is made in Table 1, which highlights contrasting features and similarities.
3. We solve several single- and multi-objective benchmark optimization problems, and explore the ability of LLMs to solve high-dimensional problems. Additionally, we demonstrate the application of our method to several engineering problems such as aerodynamic shape optimization, heat transfer, and windfarm layout optimization (Section 3).
4. To quantify the merits of our approach, we compare our method against state-of-the-art optimization methods, both gradient-based and gradient-free (Section 3).
5. We provide evidence of the LLM's ability to reason and perform numerical optimization with the help of two tests (Section 4).
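To make the structure of such a loop concrete, the following Python sketch illustrates one way an LEO-style iteration could be organized. It is a minimal illustration under stated assumptions, not the authors' implementation: the objective, bounds, population size, and the `propose_candidates` placeholder (which stands in for the LLM prompt/response step and is mimicked here by random sampling and Gaussian perturbation so that the snippet runs end to end) are all hypothetical.

```python
# Minimal, illustrative sketch of an LEO-style loop (not the paper's code).
import numpy as np

def objective(x):
    # Sphere function as a stand-in objective; LEO treats it as a black box.
    return float(np.sum(x ** 2))

def propose_candidates(population, scores, n_new, rng, explore_ratio=0.5):
    """Placeholder for the LLM prompt/response step.

    In an LEO-style method, the population and its objective values would be
    serialized into a prompt, and the LLM would return new candidates
    (exploration) or perturbations of good candidates (exploitation). Here we
    mimic that with random sampling and Gaussian perturbation of the best point.
    """
    dim = population.shape[1]
    n_explore = int(n_new * explore_ratio)
    explore = rng.uniform(-5.0, 5.0, size=(n_explore, dim))                  # exploration
    best = population[np.argmin(scores)]
    exploit = best + rng.normal(0.0, 0.2, size=(n_new - n_explore, dim))     # exploitation
    return np.vstack([explore, exploit])

def leo_sketch(dim=2, pop_size=10, iterations=20, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    scores = np.array([objective(x) for x in population])
    for _ in range(iterations):
        candidates = propose_candidates(population, scores, pop_size, rng)
        cand_scores = np.array([objective(x) for x in candidates])
        # Elitism-based guardrail: keep the best pop_size members of the union
        # of parents and candidates, so the best solution is never lost.
        merged = np.vstack([population, candidates])
        merged_scores = np.concatenate([scores, cand_scores])
        keep = np.argsort(merged_scores)[:pop_size]
        population, scores = merged[keep], merged_scores[keep]
    return population[np.argmin(scores)], scores.min()

best_x, best_f = leo_sketch()
print(best_x, best_f)
```

The elitism step at the end of each iteration corresponds to the guardrail described in contribution 1: the best members of the merged parent and candidate sets are always retained, regardless of what the LLM proposes.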
In Section 2, we provide a detailed description of LEO. In Section 3, we highlight the performance of LEO on various benchmark optimization problems as well as engineering applications. This is followed by Section 4, wherein we discuss the reasoning ability of LLMs to sample better candidate points that accelerate convergence, followed by the challenges associated with LEO and recommendations to overcome them. Lastly, in Section 5, we present our conclusions.
Section snippets
Motivation towards a population-based approach
We begin this section by providing the motivation behind pursuing a population-based approach for solving complex non-convex optimization problems. While non-population-based or gradient-based methods are preferred for their quick turnaround time towards convergence, the final solutions are likely to get trapped in local optima for non-convex problems. In this section, we set up a quick experiment to demonstrate this idea via the LLM-assisted optimization framework without a population-based
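As a purely illustrative sketch (this is not the experiment from the paper), the following snippet shows why a single gradient trajectory struggles on a non-convex landscape: plain gradient descent on a one-dimensional Rastrigin function stalls in a local minimum from a poor starting point, while even a crude random population over the same interval sees the basin of the global optimum at x = 0. The function, starting point, learning rate, and bounds are all assumptions made for illustration.

```python
# Illustrative sketch: single-trajectory gradient descent vs. a population
# of random samples on the 1D Rastrigin function (global minimum at x = 0).
import numpy as np

def f(x):
    return 10.0 + x**2 - 10.0 * np.cos(2.0 * np.pi * x)

def grad_f(x):
    return 2.0 * x + 20.0 * np.pi * np.sin(2.0 * np.pi * x)

# Gradient descent from a poor starting point.
x = 2.5
for _ in range(5000):
    x -= 5e-4 * grad_f(x)
print("gradient descent:", x, f(x))          # typically stalls near x ~ 2, f ~ 4

# A population of random samples over the same interval.
rng = np.random.default_rng(0)
pop = rng.uniform(-3.0, 3.0, size=50)
best = pop[np.argmin(f(pop))]
print("best of population:", best, f(best))  # some samples land near x = 0
```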
Numerical results
In this section, we evaluate our proposed optimization strategy via a series of test cases, ranging from simple benchmark problems to engineering applications. These tests are classified into four categories: (a) simple benchmark function optimization problems; (b) multi-objective optimization problems; (c) high-dimensional benchmark optimization problems; and (d) industry-relevant engineering optimization problems. This allows us to scrutinize the approach for a range of problems of different
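For concreteness, the sketch below lists common examples of the kinds of objectives in the first three categories (the Rosenbrock, ZDT1, and sphere functions). These are standard benchmarks chosen for illustration and are not necessarily the exact functions used in the paper; category (d) is indicated only as a black-box wrapper around an external solver.

```python
# Hedged illustration of the four test categories; the specific functions
# shown here are common benchmarks, not necessarily those used in the paper.
import numpy as np

def rosenbrock(x):                     # (a) single-objective benchmark
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2))

def zdt1(x):                           # (b) bi-objective benchmark (ZDT1)
    f1 = x[0]
    g = 1.0 + 9.0 * np.mean(x[1:])
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2

def sphere_highdim(x):                 # (c) high-dimensional benchmark
    return float(np.sum(x**2))

# (d) engineering cases (nozzle shape, heat transfer, windfarm layout) would
# wrap an external solver behind the same black-box interface: x -> objective(s).
print(rosenbrock(np.ones(5)), zdt1(np.linspace(0.0, 1.0, 30)), sphere_highdim(np.zeros(100)))
```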
Discussion
As alluded to in the earlier sections, one of the novel aspects of LEO is its ability to reason towards generating better candidate solutions over the course of optimization. In this section, we undertake experiments to demonstrate the reasoning capabilities of LEO backed by strong evidence. We show that LLMs inherently possess attributes of reasoning that, when assisted by an elitism criterion, allow for a hybrid framework that renders faster convergence towards a global optimal solution for
Conclusions
This paper presents a population-based optimization method based on LLMs, called the Large Language-Model-Based Evolutionary Optimizer (LEO). We present a diverse set of benchmark test cases, spanning from elementary examples to multi-objective and high-dimensional numerical optimization problems. Furthermore, we illustrate the practical application of this method to industrial optimization problems, including shape optimization, heat transfer, and windfarm layout optimization.
Several key
CRediT authorship contribution statement
Shuvayan Brahmachary: Writing – review & editing, Visualization, Validation, Methodology, Investigation, Formal analysis, Conceptualization. Subodh M. Joshi: Writing – review & editing, Writing – original draft, Project administration, Methodology, Conceptualization. Aniruddha Panda: Software, Resources, Methodology, Formal analysis. Kaushik Koneripalli: Writing – review & editing, Methodology, Formal analysis, Data curation. Arun Kumar Sagotra: Validation, Investigation, Formal analysis.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors from Shell India Markets Pvt. Ltd. would like to acknowledge the financial support from Shell, United States.
References (63)
- et al., Summary of ChatGPT-related research and perspective towards the future of large language models, Meta-Radiology (2023)
- Bonan Min et al., Recent advances in natural language processing via large pre-trained language models: A survey, ACM Comput. Surv. (2023)
- Wayne Xin Zhao et al., A survey of large language models (2023)
- Ashish Vaswani et al., Attention is all you need
- OpenAI et al., GPT-4 technical report (2023)
- Jie Huang et al., Towards reasoning in large language models: A survey (2023)
- Takeshi Kojima et al., Large language models are zero-shot reasoners (2023)
- Tom Brown et al., Language models are few-shot learners
- Jason Wei et al., Chain-of-thought prompting elicits reasoning in large language models (2023)
- Shunyu Yao et al., Tree of thoughts: Deliberate problem solving with large language models, Adv. Neural Inf. Process. Syst. (2024)
- Large language model guided tree-of-thought (2023)
- A generalist agent (2022)
- Eureka: Human-level reward design via coding large language models (2023)
- Voyager: An open-ended embodied agent with large language models (2023)
- Beyond the imitation game: Quantifying and extrapolating the capabilities of language models (2023)
- ChemCrow: Augmenting large-language models with chemistry tools (2023)
- Autonomous chemical research with large language models, Nature (2023)
- Automating genetic algorithm mutations for molecules using a masked language model, IEEE Trans. Evol. Comput. (2022)
- AutoML-GPT: Automatic machine learning with GPT (2023)
- How can LLMs transform the robotic design process?, Nat. Mach. Intell. (2023)
- Can GPT-4 perform neural architecture search? (2023)
- ProgPrompt: Generating situated robot task plans using large language models (2022)
- Leveraging large language models for predictive chemistry, Nat. Mach. Intell. (2024)
- PROPANE: Prompt design as an inverse problem (2023)
- InstructZero: Efficient instruction optimization for black-box large language models (2023)
- Large language models are human-level prompt engineers (2023)
- Automatic prompt optimization with "Gradient Descent" and beam search (2023)
- Prompt optimization in multi-step tasks (PROMST): Integrating human feedback and preference alignment (2024)
- Decision transformer: Reinforcement learning via sequence modeling (2021)
- Learning to reinforcement learn (2017)
Shuvayan Brahmachary received his Ph.D. in the Fluid and Thermal Sciences stream of the Mechanical Engineering Department at the Indian Institute of Technology Guwahati in 2019. Following his Ph.D., he worked as a postdoctoral researcher in the Department of Aeronautics and Astronautics, Kyushu University, Japan until 2021. Starting January 2022, he worked as a postdoctoral research fellow in the Department of Informatics, Technical University of Munich, Germany. Presently, he is working as a scientific machine learning researcher at Shell, Bangalore. His research interests include scientific machine learning, computational fluid dynamics, and optimization.

Subodh Joshi received his Master's and Ph.D. degrees in Aerospace Engineering from the Indian Institute of Technology Bombay (IIT Bombay), India in 2018. Following his Ph.D., Subodh worked as a postdoctoral researcher at IIT Bombay (2018-19), at the INRIA Bordeaux Sud-Ouest Research Center, France (2019-20), and as a C. V. Raman Postdoctoral Fellow at IISc Bangalore, India (2020-22). Since 2022, Subodh has been working as a Scientific Machine Learning Researcher at Shell Technology Center Bangalore. His research interests include Scientific Machine Learning, numerical methods (particularly higher-order accurate schemes) for conservation and balance laws, and applications in computational physics, including aeroacoustics, fluid dynamics, industrial and multiphysics systems, and subsurface physics.

Aniruddha Panda has worked at the Shell Technology Center Bangalore, and partly in Houston, for the last 5 years. He is currently in the role of Scientific Machine Learning Researcher, and prior to this he worked extensively as a High-Performance Computing researcher. He holds a Ph.D. in computational science from Eindhoven University of Technology. His prior experience is in the areas of scientific computing, physics-inspired neural networks, deep learning with applications in geophysics (seismic processing), edge computing, and renewables. In his current role, he works in the areas of operator learning for subsurface flows, optimization for process engineering, and the use of generative AI and agentic workflows for scientific applications.

Kaushik Koneripalli received his Master's degree in Electrical Engineering from Arizona State University in 2019. Following this, he worked as a Research Engineer at Siemens and a Computer Scientist at SRI International, both in the US, before joining the Scientific Machine Learning team at Shell Bangalore, India in 2023. His broad research interests are in deep representation learning with applications rooted in Computer Vision and NLP, with his current focus being on scaling graph ML for industrial applications.

Arun Kumar Sagotra received his Ph.D. in Materials Science and Engineering from the University of New South Wales, Sydney in 2019. Following this, he worked as a Data Scientist at Hitachi Vantara and CSIRO, in India and Australia respectively. He worked as a Scientific Machine Learning Researcher at Shell, India (2023-2024). Currently, he is working at Micron Technology. His interests include Machine Learning, Computer Vision, Graph Neural Networks, NLP, and Materials Discovery.

Harshil Patel is a Scientific Machine Learning researcher at Shell Technology Center Bangalore. He has extensive experience working in the domains of natural language processing, machine vision, physics-inspired neural networks, and optimization during his 5 years of tenure at Shell. He has a Ph.D. in computational science from Eindhoven University of Technology, the Netherlands. He has a keen interest in fusing Artificial Intelligence with Physics to accelerate the scientific and engineering computational workflows.

Ankush Kumar Sharma received his Bachelor of Technology in Computer Science and Engineering from the National Institute of Technology, Hamirpur in 2013. He began his career at Samsung Research Institute, Noida as a software developer for 18 months before joining the Indian Army as an officer in the Corps of Signals in 2014, where he served until 2020. Ankush completed his Master of Technology in Software Systems (Data Analytics) from Birla Institute of Technology and Science, Pilani in 2020. Currently, he is working as a Scientific Machine Learning Researcher at Shell, India with a focus on creating end-to-end machine learning solutions. His research interests include natural language processing, geospatial analysis, knowledge graphs and generative AI exploration.

Dr. Ameya D. Jagtap is an Assistant Professor (tenure-track) in the Department of Aerospace Engineering at Worcester Polytechnic Institute (WPI), USA. Prior to WPI, he was an Assistant Professor of Applied Mathematics (Research) at Brown University for three and a half years. Dr. Jagtap holds a Ph.D. and a Master's degree in Aerospace Engineering from the Indian Institute of Science, India, and conducted postdoctoral research at TIFR-CAM and Brown University. His research focuses on the intersection of scientific computing and machine learning algorithms, with broad applications in computational physics.

Kaushic Kalyanaraman is a Scientific Machine Learning researcher at Shell Technology Center Bangalore. Kaushic has extensive experience in graph theory, Generative AI, and Scientific ML, as well as over 17 years of experience in energy engineering and economics. He has a bachelor's degree in Civil Engineering from the National University of Singapore. In his current role, he is the founder, program manager, and principal investigator of the Scientific Machine Learning Research group at Shell R&D. His area of interest is in advancing Scientific Machine Learning and Generative AI for scientific discovery and advances in energy engineering and economics.
