GitHub - gqgs/llm100kbench: LLM 100k portfolio management benchmark


LLM Investment Benchmark

A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.

Overview

This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:

  • Create new portfolios
  • List current holdings and recent context
  • Update portfolios based on model decisions

The model executions and their current context can be seen here.

Automated Weekly Runs

The active model roster is configured in models.json. The weekly GitHub Actions workflow runs the benchmark every Monday, writes model orders under orders/<model>/<date>.json, stores the market snapshot in prices/<date>.csv, updates llm100kbench.db, and regenerates this README's current portfolio section.

The project is intentionally limited to free API tiers. Models that require paid metered API access or subscriptions are archived instead of being run automatically.

Each weekly run also writes a concise decision log under logs/<model>/<date>.md, including the model used, validation status, per-trade rationale, context, and validation notes when a model response is rejected.
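The file layout described above can be sketched as a small path helper. This is only an illustration of the naming scheme, not the project's own code: the function names are hypothetical, and the `<date>` placeholder is assumed to be an ISO `YYYY-MM-DD` date, matching the date format used in the portfolio heading.

```python
from datetime import date
from pathlib import PurePosixPath


def weekly_artifact_paths(model: str, run_date: date) -> dict[str, PurePosixPath]:
    """Return the per-run file locations named in this README (hypothetical helper)."""
    stamp = run_date.isoformat()  # assumption: <date> is formatted as YYYY-MM-DD
    return {
        "orders": PurePosixPath("orders") / model / f"{stamp}.json",
        "prices": PurePosixPath("prices") / f"{stamp}.csv",
        "log": PurePosixPath("logs") / model / f"{stamp}.md",
    }


def is_scheduled_run_day(run_date: date) -> bool:
    """The workflow is described as running every Monday (weekday() == 0)."""
    return run_date.weekday() == 0


paths = weekly_artifact_paths("deepseek", date(2026, 5, 25))
for kind, path in paths.items():
    print(kind, path)
```

The prices snapshot is shared by all models for a given run, which is why its path has no model component.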

Why?

The primary objective given to the LLMs is to optimize their portfolio. Doing so requires them to evaluate the risk-reward ratio, form sound assumptions about future market conditions, and make use of tools along with their understanding of human psychology and financial market dynamics.

This benchmark may therefore be a good proxy for how well LLMs can coordinate these tasks.

Notes

  • chatgpt, deepseek, and grok are kept as continuing benchmark identities. Their exact backend model IDs are recorded in each new order's metadata.
  • perplexity is archived for future runs because its API is paid and the free chat UI is not suitable for unattended automation.
  • Claude and other paid-only APIs are not included while the project keeps the free-tier-only restriction.

Project Structure

  • cmd: Contains the main command implementations
    • create: Initialize new portfolios
    • list: Display current holdings and context
    • update: Process investment orders and update holdings
    • stocks: Fetch most recent stock prices
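To make the `update` step more concrete, here is a minimal sketch of applying a batch of orders to a holdings map. The order schema (`ticker`, `side`, `quantity`), the validation rules, and the AAPL price are all assumptions made for illustration; the real format lives in the repository's order files and may differ. Cash is modeled as a `USD` position, as in the holdings table.

```python
def apply_orders(holdings: dict, orders: list, prices: dict) -> dict:
    """Return new holdings after applying all orders, or raise ValueError.

    Hypothetical rules: no short selling, no spending more cash than held.
    The input holdings are never mutated, so a rejected batch changes nothing.
    """
    updated = dict(holdings)
    for order in orders:
        ticker, qty = order["ticker"], order["quantity"]
        if qty <= 0 or ticker not in prices:
            raise ValueError(f"invalid order: {order}")
        cost = qty * prices[ticker]
        if order["side"] == "buy":
            if cost > updated.get("USD", 0):
                raise ValueError(f"insufficient cash for: {order}")
            updated["USD"] = updated.get("USD", 0) - cost
            updated[ticker] = updated.get(ticker, 0) + qty
        elif order["side"] == "sell":
            if qty > updated.get(ticker, 0):
                raise ValueError(f"insufficient shares for: {order}")
            updated[ticker] -= qty
            updated["USD"] = updated.get("USD", 0) + cost
            if updated[ticker] == 0:
                del updated[ticker]  # drop fully closed positions
        else:
            raise ValueError(f"unknown side: {order}")
    return updated


# Illustrative inputs only: the price is made up, not market data.
start = {"USD": 100000}
prices = {"AAPL": 300}
after = apply_orders(start, [{"ticker": "AAPL", "side": "buy", "quantity": 100}], prices)
print(after)
```

Working on a copy means a rejected model response leaves the portfolio as it was, which fits the validation notes mentioned for the weekly logs.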

Prompt

The most recent prompt, with its guidelines, can be seen here and here.

Current Portfolio (2026-05-27)

Portfolio Value by Model

pie showData
    "deepseek" : 194355
    "chatgpt" : 128951
    "mistral" : 100000
    "qwen" : 100000
    "gpt-oss" : 99841
    "gemini" : 82817
    "llama" : 49848
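| Model | Ticker | Sum | Quantity |
| --- | --- | ---: | ---: |
| chatgpt | USD | 69 | 69 |
| chatgpt | AAPL | 128882 | 418 |
| deepseek | AMD | 1512 | 3 |
| deepseek | ASML | 163203 | 100 |
| deepseek | MSFT | 2912 | 7 |
| deepseek | SNPS | 26728 | 50 |
| gemini | USD | 6335 | 6335 |
| gemini | AAPL | 12950 | 42 |
| gemini | AMD | 9574 | 19 |
| gemini | ASML | 6528 | 4 |
| gemini | CRDO | 4433 | 20 |
| gemini | GFS | 5398 | 60 |
| gemini | NXPI | 9980 | 30 |
| gemini | ON | 5334 | 42 |
| gemini | QCOM | 11446 | 46 |
| gemini | TSLA | 10840 | 25 |
| mistral | USD | 100000 | 100000 |
| llama | NVDA | 49848 | 232 |
| gpt-oss | USD | 250 | 250 |
| gpt-oss | AAPL | 99591 | 323 |
| qwen | USD | 100000 | 100000 |

| Model | Total Sum | Change |
| --- | ---: | ---: |
| deepseek | 194355 | |
| chatgpt | 128951 | |
| mistral | 100000 | |
| qwen | 100000 | |
| gpt-oss | 99841 | |
| gemini | 82817 | |
| llama | 49848 | |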
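
The per-model totals are the sum of each model's holdings rows. The sketch below reproduces two of them from the figures listed above and shows one way a percentage return could be derived. The starting value of 100,000 is an assumption: it is suggested by the project name and by the all-cash mistral and qwen portfolios, but this section does not state it, and the function names are illustrative.

```python
from collections import defaultdict

STARTING_CAPITAL = 100_000  # assumption: initial value of every portfolio

# (model, ticker, sum) rows copied from the holdings listed above.
rows = [
    ("chatgpt", "USD", 69),
    ("chatgpt", "AAPL", 128882),
    ("deepseek", "AMD", 1512),
    ("deepseek", "ASML", 163203),
    ("deepseek", "MSFT", 2912),
    ("deepseek", "SNPS", 26728),
]


def totals_by_model(rows):
    """Sum the position values of each model."""
    totals = defaultdict(int)
    for model, _ticker, value in rows:
        totals[model] += value
    return dict(totals)


def pct_return(total, start=STARTING_CAPITAL):
    """Percentage gain or loss relative to the assumed starting capital."""
    return (total - start) / start * 100


totals = totals_by_model(rows)
for model, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {total} ({pct_return(total):+.2f}%)")
```

Because the listed sums are rounded to whole units, a total recomputed this way can differ from the reported one by a unit for portfolios with many positions.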