Vertex AI pricing - NFHN Reader

Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Vertex AI pricing compared to legacy product pricing

The costs for Vertex AI remain the same as they are for the legacy AI Platform and AutoML products that Vertex AI supersedes, with the following exceptions:

Legacy AI Platform Prediction and AutoML Tables predictions supported lower-cost, lower-performance machine types that aren't supported for Vertex AI Inference and AutoML tabular.
Legacy AI Platform Prediction supported scale-to-zero, which isn't supported for Vertex AI Inference.

Vertex AI also offers more ways to optimize costs, such as the following:

Optimized TensorFlow runtime.
Support for co-hosting models.
No minimum usage duration for Training and Prediction. Instead, usage is charged in 30 second increments.

Pricing for Generative AI on Vertex AI

Pricing for AutoML models

For Vertex AI AutoML models, you pay for three main activities:

Training the model
Deploying the model to an endpoint
Using the model to make predictions

Vertex AI uses predefined machine configurations for Vertex AutoML models, and the hourly rate for these activities reflects the resource usage.

The time required to train your model depends on the size and complexity of your training data. Models must be deployed before they can provide online predictions or online explanations.

You pay for each model deployed to an endpoint, even if no prediction is made. You must undeploy your model to stop incurring further charges. Models that are not deployed or have failed to deploy are not charged.

You pay only for compute hours used; if training fails for any reason other than a user-initiated cancellation, you are not billed for the time. You are charged for training time if you cancel the operation.

Select a model type below for pricing information.

Image data

Operation	Price (classification) (USD)	Price (object detection) (USD)
Training	$3.465 / 1 hour	$3.465 / 1 hour
Training (Edge on-device model)	$18.00 / 1 hour	$18.00 / 1 hour
Deployment and online prediction	$1.375 / 1 hour	$2.002 / 1 hour
Batch prediction	$2.222 / 1 hour	$2.222 / 1 hour

Tabular data

Inference charges for Vertex Explainable AI

Compute associated with Vertex Explainable AI is charged at same rate as inference. However, explanations take longer to process than normal inferences, so heavy usage of Vertex Explainable AI along with auto-scaling could result in more nodes being started, which would increase inference charges.

Vertex AI Forecast

AutoML

Stage	Pricing
Prediction	0 count to 1,000,000 count $0.20 / 1,000 count, per 1 month / account 1,000,000 count to 50,000,000 count $0.10 / 1,000 count, per 1 month / account 50,000,000 count and above $0.02 / 1,000 count, per 1 month / account
Training	$21.252 / 1 hour
Explainable AI	Explainability using Shapley values. Refer to Vertex AI Inference and Explanation pricing page.

* A prediction data point is one time point in the forecast horizon. For example, with daily granularity a 7-day horizon is 7 points per each time series.

Up to 5 prediction quantiles can be included at no additional cost.
The number of data points consumed per tier is refreshed monthly.

ARIMA+

Stage	Pricing
Prediction	$5.00 / 1,000 count
Training	$250.00 per TB x Number of Candidate Models x Number of Backtesting Windows*
Explainable AI	Explainability with time series decomposition does not add any additional cost. Explainability using Shapley values is not supported.

Refer to the BigQuery ML pricing page for additional details. Each training and prediction job incurs the cost of 1 managed pipeline run, as described in Vertex AI pricing.

* A backtesting window is created for each period in the test set. The AUTO_ARIMA_MAX_ORDER used determines the number of candidate models. It ranges from 6-42 for models with multiple time series.

Custom-trained models

Training

The tables below provide the approximate price per hour of various training configurations. You can choose a custom configuration of selected machine types. To calculate pricing, sum the costs of the virtual machines you use.

If you use Compute Engine machine types and attach accelerators, the cost of the accelerators is separate. To calculate this cost, multiply the prices in the table of accelerators below by how many machine hours of each type of accelerator you use.

Machine types

You can use Spot VMs with Vertex AI custom training. Spot VMs are billed according to Compute Engine Spot VMs pricing. There are Vertex AI custom training management fees in addition to your infrastructure usage, captured in the following tables.

You can use Compute Engine reservations with Vertex AI custom training. When using Compute Engine reservations, you're billed according to Compute Engine Pricing, including any applicable committed use discounts (CUDs). There are Vertex AI custom training management fees in addition to your infrastructure usage, captured in the following tables.

Machine type	Price (USD)
g4-standard-48	$5.1749195 / 1 hour
g4-standard-96	$10.349839 / 1 hour
g4-standard-192	$20.699678 / 1 hour
g4-standard-384	$41.399356 / 1 hour
n1-standard-4	$0.21849885 / 1 hour
n1-standard-8	$0.4369977 / 1 hour
n1-standard-16	$0.8739954 / 1 hour
n1-standard-32	$1.7479908 / 1 hour
n1-standard-64	$3.4959816 / 1 hour
n1-standard-96	$5.2439724 / 1 hour
n1-highmem-2	$0.13604845 / 1 hour
n1-highmem-4	$0.2720969 / 1 hour
n1-highmem-8	$0.5441938 / 1 hour
n1-highmem-16	$1.0883876 / 1 hour
n1-highmem-32	$2.1767752 / 1 hour
n1-highmem-64	$4.3535504 / 1 hour
n1-highmem-96	$6.5303256 / 1 hour
n1-highcpu-16	$0.65180712 / 1 hour
n1-highcpu-32	$1.30361424 / 1 hour
n1-highcpu-64	$2.60722848 / 1 hour
n1-highcpu-96	$3.91084272 / 1 hour
a2-highgpu-1g*	$4.425248914 / 1 hour
a2-highgpu-2g*	$8.850497829 / 1 hour
a2-highgpu-4g*	$17.700995658 / 1 hour
a2-highgpu-8g*	$35.401991315 / 1 hour
a2-megagpu-16g*	$65.707278915 / 1 hour
a3-highgpu-8g*	$101.007352832 / 1 hour
a3-megagpu-8g*	$106.046424032 / 1 hour
a3-ultragpu-8g*	$99.773930496 / 1 hour
a4-highgpu-8g*	$148.212 / 1 hour
e2-standard-4	$0.154126276 / 1 hour
e2-standard-8	$0.308252552 / 1 hour
e2-standard-16	$0.616505104 / 1 hour
e2-standard-32	$1.233010208 / 1 hour
e2-highmem-2	$0.103959618 / 1 hour
e2-highmem-4	$0.207919236 / 1 hour
e2-highmem-8	$0.415838472 / 1 hour
e2-highmem-16	$0.831676944 / 1 hour
e2-highcpu-16	$0.455126224 / 1 hour
e2-highcpu-32	$0.910252448 / 1 hour
n2-standard-4	$0.2233714 / 1 hour
n2-standard-8	$0.4467428 / 1 hour
n2-standard-16	$0.8934856 / 1 hour
n2-standard-32	$1.7869712 / 1 hour
n2-standard-48	$2.6804568 / 1 hour
n2-standard-64	$3.5739424 / 1 hour
n2-standard-80	$4.467428 / 1 hour
n2-highmem-2	$0.1506661 / 1 hour
n2-highmem-4	$0.3013322 / 1 hour
cloud-tpu	Pricing is determined by the accelerator type. See 'Accelerators'.
n2-highmem-8	$0.6026644 / 1 hour
n2-highmem-16	$1.2053288 / 1 hour
n2-highmem-32	$2.4106576 / 1 hour
n2-highmem-48	$3.6159864 / 1 hour
n2-highmem-64	$4.8213152 / 1 hour
n2-highmem-80	$6.026644 / 1 hour
n2-highcpu-16	$0.6596032 / 1 hour
n2-highcpu-32	$1.3192064 / 1 hour
n2-highcpu-48	$1.9788096 / 1 hour
n2-highcpu-64	$2.6384128 / 1 hour
n2-highcpu-80	$3.298016 / 1 hour
c2-standard-4	$0.2401292 / 1 hour
c2-standard-8	$0.4802584 / 1 hour
c2-standard-16	$0.9605168 / 1 hour
c2-standard-30	$1.800969 / 1 hour
c2-standard-60	$3.601938 / 1 hour
m1-ultramem-40	$7.237065 / 1 hour
m1-ultramem-80	$14.47413 / 1 hour
m1-ultramem-160	$28.94826 / 1 hour
m1-megamem-96	$12.249984 / 1 hour

*This amount includes GPU price, since this instance type always requires a fixed number of GPU accelerators.

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Accelerators

Machine type	Price (USD)	Vertex Management Fee
NVIDIA_TESLA_A100	$2.933908 / 1 hour	$0.4400862 / 1 hour
NVIDIA_TESLA_A100_80GB	$3.92808 / 1 hour	$0.589212 / 1 hour
NVIDIA_H100_80GB	$9.79655057 / 1 hour	$1.4694826 / 1 hour
NVIDIA_H200_141GB	$10.708501 / 1 hour	Unavailable
NVIDIA_H100_MEGA_80GB	$11.8959171 / 1 hour	Unavailable
NVIDIA_TESLA_L4	$0.644046276 / 1 hour	Unavailable
NVIDIA_TESLA_P4	$0.69 / 1 hour	Unavailable
NVIDIA_TESLA_P100	$1.679 / 1 hour	Unavailable
NVIDIA_TESLA_T4	$0.4025 / 1 hour	Unavailable
NVIDIA_TESLA_V100	$2.852 / 1 hour	Unavailable
TPU_V2 Single (8 cores)	$5.175 / 1 hour	Unavailable
TPU_V2 Pod (32 cores)^*	$27.60 / 1 hour	Unavailable
TPU_V3 Single (8 cores)	$9.20 / 1 hour	Unavailable
TPU_V3 Pod (32 cores)^*	$36.80 / 1 hour	Unavailable
tpu7x-standard-4t (1 chip)	$13.80 / 1 hour	Unavailable

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

* The price for training using a Cloud TPU Pod is based on the number of cores in the Pod. The number of cores in a pod is always a multiple of 32. To determine the price of training on a Pod that has more than 32 cores, take the price for a 32-core Pod, and multiply it by the number of cores, divided by 32. For example, for a 128-core Pod, the price is (32-core Pod price) * (128/32). For information about which Cloud TPU Pods are available for a specific region, see System Architecture in the Cloud TPU documentation.

Disks

Machine type	Price (USD)
pd-standard	$0.000063014 / 1 gibibyte hour
pd-ssd	$0.000267808 / 1 gibibyte hour

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

All use is subject to the Vertex AI quota policy.
You are required to store your data and program files in Google Cloud Storage buckets during the Vertex AI lifecycle. See more about Cloud Storage usage.

You are charged for training your models from the moment when resources are provisioned for a job until the job finishes.

Warning: Your training jobs are limited by the Vertex AI quota policy. If you choose a very powerful processing cluster for your first training jobs, it's likely you will exceed your quota.

Scale tiers for predefined configurations (AI Platform Training)

You can control the type of processing cluster to use when training your model. The simplest way is to choose from one of the predefined configurations called scale tiers. Read more about scale tiers.

Machine types for custom configurations

If you use Vertex AI or select CUSTOM as your scale tier for AI Platform Training, you have control over the number and type of virtual machines to use for the cluster's master, worker and parameter servers. Read more about machine types for Vertex AI and machine types for AI Platform Training.

The cost of training with a custom processing cluster is the sum of all the machines you specify. You are charged for the total time of the job, not for the active processing time of individual machines.

Gen AI Evaluation Service

For model-based metrics, charges are applied only for the prediction costs associated with the underlying autorater model. They are billed based on the input tokens that you provide in your evaluation dataset and the autorater output.

Gen AI Evaluation Service is generally available (GA). Pricing change took effect on April 14, 2025.

Metric	Pricing
Pointwise	Default autorater model Gemini 2.0 Flash
Pairwise	Default autorater model Gemini 2.0 Flash

Metric

Pricing

Pointwise

Default autorater model

Gemini 2.0 Flash

Pairwise

Default autorater model

Gemini 2.0 Flash

Computation-based metrics are charged at $0.00003 per 1k characters for input and $0.00009 per 1k characters for output. They are referred to as Automatic Metric in SKU.

Metric Name	Type
Exact Match	Computation-based
Bleu	Computation-based
Rouge	Computation-based
Tool Call Valid	Computation-based
Tool Name Match	Computation-based
Tool Parameter Key Match	Computation-based
Tool Parameter KV Match	Computation-based

Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Legacy model-based metrics are charged at $0.005 per 1k characters for input and $0.015 per 1k characters for output.

Metric Name	Type
Coherence	Pointwise
Fluency	Pointwise
Fulfillment	Pointwise
Safety	Pointwise
Groundedness	Pointwise
Summarization Quality	Pointwise
Summarization Helpfulness	Pointwise
Summarization Verbosity	Pointwise
Question Answering Quality	Pointwise
Question Answering Relevance	Pointwise
Question Answering Helpfulness	Pointwise
Question Answering Correctness	Pointwise
Pairwise Summarization Quality	Pairwise
Pairwise Question Answering Quality	Pairwise

Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Vertex AI Agent Engine

Vertex AI Agent Engine is a set of services for developers to scale agents in production. Services can be used together or a la carte. You only pay for what you use. Today, you pay for the Agent Engine runtime.

Starting February 11, 2026, billing will begin for Code Execution, Sessions, and Memory Bank.

Runtime

Pricing is based on compute (vCPU hours) and memory (GiB hours) resources used by agents that are deployed to the Agent Engine runtime. Billing is rounded to the nearest second of usage. Idle time for an agent is not billed.

Free Tier

To help you get started with the runtime, we offer a monthly free tier.

vCPU - First 180,000 vCPU-seconds (50 hours) free per month
RAM - First 360,000 GiB-seconds (100 hours) free per month

Usage based pricing

Once your monthly usage exceeds the free tier, billing begins per the rates below.

Resource	Price (USD)
vCPU	0 hour to 50 hour $0.00 (Free) / 3,600 second, per 1 month / project 50 hour and above $0.0864 / 3,600 second, per 1 month / project
RAM	0 gibibyte hour to 100 gibibyte hour $0.00 (Free) / 3,600 gibibyte second, per 1 month / project 100 gibibyte hour and above $0.009 / 3,600 gibibyte second, per 1 month / project

Code Execution

Similar to the runtime, you pay for the compute and memory required to run a sandbox. Billing is rounded to the nearest second of usage. Idle time is not billed.

Compute: $0.0864 per vCPU hour
Memory: $0.0090 per GiB-hour

Sessions

You pay based on the number of events stored in the session service. We bill for stored session events that include content. This includes the initial user request, model responses, function calls, and function responses. We do not bill for system control events (such as checkpoints) that are stored in the session service.

$0.25 per 1,000 events stored

Memory Bank

Pay based on the number of memories stored and returned.

Memory Stored: $0.25 per 1,000 memories stored / month (+ LLM costs to generate memories, paid separately)
Memory retrieval: $0.50 per 1,000 memories returned; the first 1000 memories returned per month are free

Pricing scenarios

To help you understand the cost of using Agent Engine services, we offer two hypothetical agents: a Lightweight Agent and a Standard Agent. For both scenarios, we make the following assumptions:

Free tiers: For these calculations, we assume that the runtime and memory bank free tiers have already been used within a month for prior experimentation.
Runtime Requests per Session: A "full session" or conversation consists of 10 runtime requests.
Sessions: Each runtime request generates an average of 3 session events.
Memory Bank:
Storage: At the end of each full session, 1 memory is extracted and stored.
Retrieval: We assume an average of 1 memory is returned per runtime request.
Code Execution is invoked for 30% of all runtime requests.
Billing Month: All monthly calculations are based on a 30-day month.

Additional notes:

Service modularity: While the below scenarios show the cost of Agent Engine services used together, you can choose to use services individually. For example, you can use the Agent Engine session and memory bank services without using the Agent Engine runtime.
Additional model costs: Agents require LLMs to reason and plan. LLM tokens consumed by agents are billed separately and not included in the below scenarios.
Additional tool costs: Agents require tools to take action. Tools used by agents (e.g. API calls, storage) are billed separately and not included in the below scenarios.

Hypothetical Scenarios

Scenario 1: Lightweight Internal Agent

This scenario represents agents handling low-volume, sporadic traffic.

Use Case Examples: An IT helpdesk bot for a small company, a personal productivity agent that drafts emails, or a Slack bot that provides answers from documentation.
Compute and memory required for runtime and code execution: 1 vCPU / 1 GiB RAM.
Traffic: 0.16 queries per second (10 queries per minute), totaling 432,000 requests per month.
Average Request Duration: 3 seconds

Service	Calculation	Monthly Cost
Runtime	(432,000 requests × 3 sec/req ÷ 3600 sec/hr) = 360 hours vCPU: (360 hrs × 1 vCPU × $0.0864/hr) = $31.10 RAM: (360 hrs × 1 GiB × $0.0090/hr) = $3.24	$34.34
Code Execution	(360 runtime hours × 30% usage) = 108 hrs vCPU: (108 hrs × 1 vCPU × $0.0864/hr) = $9.33 RAM: (108 hrs × 1 GiB × $0.0090/hr) = $0.97	$10.30
Sessions	432,000 requests x 3 events ÷ 1,000 × $0.25	$324
Memory Bank	Stored: (432,000 reqs ÷ 10 reqs/session × 1 memory/session ÷ 1,000) × $0.25 = $10.80 Retrieval: (432,000 reqs × 1 returned memory ÷ 1,000) × $0.50 = $216.00	$226.80
Total Estimated Monthly Cost		$595.44

Scenario 2: Standard Agent

This scenario represents a production agent integrated into a business application, handling consistent user traffic.

Use Case Examples: A customer service agent on an e-commerce site, a lead qualification bot on a B2B website, or an internal data analysis agent for sales teams.
Compute: 2 vCPU / 5 GiB RAM
Traffic: 10 queries per second (600 queries per minute), totaling 25,920,000 requests per month.
Average Request Duration: 5 seconds

Service	Calculation	Monthly Cost
Runtime	(25,920,000 requests × 5 sec/req ÷ 3600 sec/hr) = 36,000 hours vCPU: (36,000 hrs × 2 vCPU × $0.0864/hr) = $6,220.80 RAM: (36,000 hrs × 5 GiB × $0.0090/hr) = $1,620.00	$7,840.80
Code Execution	(36,000 runtime hours × 30% usage) = 10,800 hours vCPU: (10,800 hrs × 2 vCPU × $0.0864/hr) = $1,866.24 RAM: (10,800 hrs × 5 GiB × $0.0090/hr) = $486	$2,352.24
Sessions	25,920,000 requests * 3 events ÷ 1,000 × $0.25	$19,440
Memory Bank	Stored: (25,920,000 reqs ÷ 10 reqs/session × 1 memory/session ÷ 1,000) × $0.25 = $648.00 Retrieval: (25,920,000 reqs × 1 returned memory ÷ 1,000) × $0.50 = $12,960.00	$13,608
Total Estimated Monthly Cost		$43,241.04

Ray on Vertex AI

Training

Machine types

Machine type	Price (USD)
n1-standard-4	$0.2279988 / 1 hour
n1-standard-8	$0.4559976 / 1 hour
n1-standard-16	$0.9119952 / 1 hour
n1-standard-32	$1.8239904 / 1 hour
n1-standard-64	$3.6479808 / 1 hour
n1-standard-96	$5.4719712 / 1 hour
n1-highmem-2	$0.1419636 / 1 hour
n1-highmem-4	$0.2839272 / 1 hour
n1-highmem-8	$0.5678544 / 1 hour
n1-highmem-16	$1.1357088 / 1 hour
n1-highmem-32	$2.2714176 / 1 hour
n1-highmem-64	$4.5428352 / 1 hour
n1-highmem-96	$6.8142528 / 1 hour
n1-highcpu-16	$0.68014656 / 1 hour
n1-highcpu-32	$1.36029312 / 1 hour
n1-highcpu-64	$2.72058624 / 1 hour
n1-highcpu-96	$4.08087936 / 1 hour
a2-highgpu-1g*	$4.408062 / 1 hour
a2-highgpu-2g*	$8.816124 / 1 hour
a2-highgpu-4g*	$17.632248 / 1 hour
a2-highgpu-8g*	$35.264496 / 1 hour
a2-highgpu-16g*	$70.528992 / 1 hour
a3-highgpu-8g*	$105.39898088 / 1 hour
a3-megagpu-8g*	$110.65714224 / 1 hour
a4-highgpu-8g*	$148.212 / 1 hour
e2-standard-4	$0.16082748 / 1 hour
e2-standard-4	$0.32165496 / 1 hour
e2-standard-16	$0.64330992 / 1 hour
e2-standard-32	$1.28661984 / 1 hour
e2-highmem-2	$0.10847966 / 1 hour
e2-highmem-4	$0.21695932 / 1 hour
e2-highmem-8	$0.43391864 / 1 hour
e2-highmem-16	$0.86783728 / 1 hour
e2-highcpu-16	$0.4749144 / 1 hour
e2-highcpu-32	$0.9498288 / 1 hour
n2-standard-4	$0.2330832 / 1 hour
n2-standard-8	$0.4661664 / 1 hour
n2-standard-16	$0.9323328 / 1 hour
n2-standard-32	$1.8646656 / 1 hour
n2-standard-48	$2.7969984 / 1 hour
n2-standard-64	$3.7293312 / 1 hour
n2-standard-80	$4.661664 / 1 hour
n2-highmem-2	$0.1572168 / 1 hour
n2-highmem-4	$0.3144336 / 1 hour
n2-highmem-8	$0.6288672 / 1 hour
n2-highmem-16	$1.2577344 / 1 hour
n2-highmem-32	$2.5154688 / 1 hour
n2-highmem-48	$3.7732032 / 1 hour
n2-highmem-64	$5.0309376 / 1 hour
n2-highmem-80	$6.288672 / 1 hour
n2-highcpu-16	$0.6882816 / 1 hour
n2-highcpu-32	$1.3765632 / 1 hour
n2-highcpu-48	$2.0648448 / 1 hour
n2-highcpu-64	$2.7531264 / 1 hour
n2-highcpu-80	$3.441408 / 1 hour
c2-standard-4	$0.2505696 / 1 hour
c2-standard-8	$0.5011392 / 1 hour
c2-standard-16	$1.0022784 / 1 hour
c2-standard-30	$1.879272 / 1 hour
c2-standard-60	$3.758544 / 1 hour
m1-ultramem-40	$7.55172 / 1 hour
m1-ultramem-80	$15.10344 / 1 hour
m1-ultramem-160	$30.20688 / 1 hour
m1-megamem-96	$12.782592 / 1 hour
cloud-tpu	Pricing is determined by the accelerator type. See 'Accelerators'.

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Accelerators

GPU type	Price (USD)
NVIDIA_TESLA_A100	$3.5206896 / 1 hour
NVIDIA_TESLA_A100_80GB	$4.517292 / 1 hour
NVIDIA_H100_80GB	$11.75586073 / 1 hour
NVIDIA_TESLA_P4	$0.72 / 1 hour
NVIDIA_TESLA_P100	$1.752 / 1 hour
NVIDIA_TESLA_T4	$0.42 / 1 hour
NVIDIA_TESLA_V100	$2.976 / 1 hour
TPU_V2 Single (8 cores)	$5.40 / 1 hour
TPU_V2 Pod (32 cores)^*	$28.80 / 1 hour
TPU_V3 Single (8 cores)	$9.60 / 1 hour
TPU_V3 Pod (32 cores)^*	$38.40 / 1 hour

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Disks

Disk type	Price (USD)
pd-standard	$0.000065753 / 1 gibibyte hour
pd-ssd	$0.000279452 / 1 gibibyte hour

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

All use is subject to the Vertex AI quota policy.
You are required to store your data and program files in Google Cloud Storage buckets during the Vertex AI lifecycle. See more about Cloud Storage usage.

You are charged for training your models from the moment when resources are provisioned for a job until the job finishes.

Warning: Your training jobs are limited by the Vertex AI quota policy. If you choose a very powerful processing cluster for your first training jobs, it's likely you will exceed your quota.

Prediction and explanation

The following tables provide the prices of batch prediction, online prediction, and online explanation per node hour. A node hour represents the time a virtual machine spends running your prediction job or waiting in an active state (an endpoint with one or more models deployed) to handle prediction or explanation requests.

You can use Spot VMs with Vertex AI Inference. Spot VMs are billed according to Compute Engine Spot VMs pricing. There are Vertex AI Inference management fees in addition to your infrastructure usage, captured in the following tables.

You can use Compute Engine reservations with Vertex AI Inference. When using Compute Engine reservations, you're billed according to Compute Engine Pricing, including any applicable committed use discounts (CUDs). There are Vertex AI Inference management fees in addition to your infrastructure usage, captured in the following tables.

E2 Series

Machine type	Price (USD)
e2-standard-2	$0.0770564 / 1 hour
e2-standard-4	$0.1541128 / 1 hour
e2-standard-8	$0.3082256 / 1 hour
e2-standard-16	$0.6164512 / 1 hour
e2-standard-32	$1.2329024 / 1 hour
e2-highmem-2	$0.1039476 / 1 hour
e2-highmem-4	$0.2078952 / 1 hour
e2-highmem-8	$0.4157904 / 1 hour
e2-highmem-16	$0.8315808 / 1 hour
e2-highcpu-2	$0.056888 / 1 hour
e2-highcpu-4	$0.113776 / 1 hour
e2-highcpu-8	$0.227552 / 1 hour
e2-highcpu-16	$0.455104 / 1 hour
e2-highcpu-32	$0.910208 / 1 hour

N1 Series

Machine type	Price (USD)
n1-standard-2	$0.1095 / 1 hour
n1-standard-4	$0.219 / 1 hour
n1-standard-8	$0.438 / 1 hour
n1-standard-16	$0.876 / 1 hour
n1-standard-32	$1.752 / 1 hour
n1-highmem-2	$0.137 / 1 hour
n1-highmem-4	$0.274 / 1 hour
n1-highmem-8	$0.548 / 1 hour
n1-highmem-16	$1.096 / 1 hour
n1-highcpu-2	$0.081 / 1 hour
n1-highcpu-4	$0.162 / 1 hour
n1-highcpu-8	$0.324 / 1 hour
n1-highcpu-16	$0.648 / 1 hour
n1-highcpu-32	$1.296 / 1 hour

N2 Series

Machine type	Price (USD)
n2-standard-2	$0.1116854 / 1 hour
n2-standard-4	$0.2233708 / 1 hour
n2-standard-8	$0.4467416 / 1 hour
n2-standard-16	$0.8934832 / 1 hour
n2-standard-32	$1.7869664 / 1 hour
n2-highmem-2	$0.1506654 / 1 hour
n2-highmem-4	$0.3013308 / 1 hour
n2-highmem-8	$0.6026616 / 1 hour
n2-highmem-16	$1.2053232 / 1 hour
n2-highcpu-2	$0.0824504 / 1 hour
n2-highcpu-4	$0.1649008 / 1 hour
n2-highcpu-8	$0.3298016 / 1 hour
n2-highcpu-16	$0.6596032 / 1 hour
n2-highcpu-32	$1.3192064 / 1 hour

N2D Series

Machine type	Price (USD)
n2d-standard-2	$0.0971658 / 1 hour
n2d-standard-4	$0.1943316 / 1 hour
n2d-standard-8	$0.3886632 / 1 hour
n2d-standard-16	$0.7773264 / 1 hour
n2d-standard-32	$1.5546528 / 1 hour
n2d-highmem-2	$0.131077 / 1 hour
n2d-highmem-4	$0.262154 / 1 hour
n2d-highmem-8	$0.524308 / 1 hour
n2d-highmem-16	$1.048616 / 1 hour
n2d-highcpu-2	$0.0717324 / 1 hour
n2d-highcpu-4	$0.1434648 / 1 hour
n2d-highcpu-8	$0.2869296 / 1 hour
n2d-highcpu-16	$0.5738592 / 1 hour
n2d-highcpu-32	$1.1477184 / 1 hour

C2 Series

Machine type	Price (USD)
c2-standard-4	$0.240028 / 1 hour
c2-standard-8	$0.480056 / 1 hour
c2-standard-16	$0.960112 / 1 hour
c2-standard-30	$1.80021 / 1 hour
c2-standard-60	$3.60042 / 1 hour

C2D Series

Machine type	Price (USD)
c2d-standard-2	$0.1044172 / 1 hour
c2d-standard-4	$0.2088344 / 1 hour
c2d-standard-8	$0.4176688 / 1 hour
c2d-standard-16	$0.8353376 / 1 hour
c2d-standard-32	$1.6706752 / 1 hour
c2d-standard-56	$2.9236816 / 1 hour
c2d-standard-112	$5.8473632 / 1 hour
c2d-highmem-2	$0.1408396 / 1 hour
c2d-highmem-4	$0.2816792 / 1 hour
c2d-highmem-8	$0.5633584 / 1 hour
c2d-highmem-16	$1.1267168 / 1 hour
c2d-highmem-32	$2.2534336 / 1 hour
c2d-highmem-56	$3.9435088 / 1 hour
c2d-highmem-112	$7.8870176 / 1 hour
c2d-highcpu-2	$0.086206 / 1 hour
c2d-highcpu-4	$0.172412 / 1 hour
c2d-highcpu-8	$0.344824 / 1 hour
c2d-highcpu-16	$0.689648 / 1 hour
c2d-highcpu-32	$1.379296 / 1 hour
c2d-highcpu-56	$2.413768 / 1 hour
c2d-highcpu-112	$4.827536 / 1 hour

C3 Series

Machine type	Price (USD)
c3-highcpu-4	$0.19824 / 1 hour
c3-highcpu-8	$0.39648 / 1 hour
c3-highcpu-22	$1.09032 / 1 hour
c3-highcpu-44	$2.18064 / 1 hour
c3-highcpu-88	$4.36128 / 1 hour
c3-highcpu-176	$8.72256 / 1 hour

A2 Series

Machine type	Price (USD)
a2-highgpu-1g	$4.2244949 / 1 hour
a2-highgpu-2g	$8.4489898 / 1 hour
a2-highgpu-4g	$16.8979796 / 1 hour
a2-highgpu-8g	$33.7959592 / 1 hour
a2-megagpu-16g	$64.1020592 / 1 hour
a2-ultragpu-1g	$5.7818474 / 1 hour
a2-ultragpu-2g	$11.5636948 / 1 hour
a2-ultragpu-4g	$23.1273896 / 1 hour
a2-ultragpu-8g	$46.2547792 / 1 hour

When consuming from a reservation or spot capacity, billing is spread across two SKUs: the GCE SKU with the label 'vertex-ai-online-prediction' and the Vertex AI Management Fee SKU. This enables you to use your Committed Use Discounts (CUDs) in Vertex AI.

A3 Series

Machine type	Price (USD)
a3-ultragpu-8g	$96.015616 / 1 hour
a3-megagpu-8g	$106.65474 / 1 hour

A4 Series

Machine type	Price (USD)
a4-highgpu-8g	$148.212 / 1 hour

A4X Series

Machine type	Price (USD)
a4x-highgpu-4g	$74.75 / 1 hour

a4x-highgpu-4g requires at least 18 VMs.

G2 Series

Machine type	Price (USD)
g2-standard-4	$0.81293 / 1 hour
g2-standard-8	$0.98181 / 1 hour
g2-standard-12	$1.15069 / 1 hour
g2-standard-16	$1.31957 / 1 hour
g2-standard-24	$2.30138 / 1 hour
g2-standard-32	$1.99509 / 1 hour
g2-standard-48	$4.60276 / 1 hour
g2-standard-96	$9.20552 / 1 hour

G4 Series

Machine type	Price (USD)
g4-standard-48	$5.1749195 / 1 hour
g4-standard-96	$10.349839 / 1 hour
g4-standard-192	$20.699678 / 1 hour
g4-standard-384	$41.399356 / 1 hour

TPU v5e pricing

Machine type	Price (USD)
ct5lp-hightpu-1t	$1.38 / 1 hour
ct5lp-hightpu-4t	$5.52 / 1 hour
ct5lp-hightpu-8t	$5.52 / 1 hour

TPU v6e pricing

Machine type	Price (USD)
ct6e-standard-1t	$3.105 / 1 hour
ct6e-standard-4t	$12.42 / 1 hour
ct6e-standard-8t	$24.84 / 1 hour

Each machine type is charged as the following SKUs on your Google Cloud bill:

vCPU cost: measured in vCPU hours
RAM cost: measured in GB hours
GPU cost: if either built into the machine or optionally configured, measured in GPU hours

The prices for machine types are used to approximate the total hourly cost for each prediction node of a model version using that machine type.

For example, a machine type of n1-highcpu-32 includes 32 vCPUs and 32 GB of RAM. Therefore, the hourly pricing equals 32 vCPU hours + 32 GB hours.

E2 Series

Item	Price (USD)
vCPU	$0.0250826 / 1 hour
RAM	$0.0033614 / 1 gibibyte hour

N1 Series

Item	Price (USD)
vCPU	$0.036 / 1 hour
RAM	$0.005 / 1 gibibyte hour

N2 Series

Item	Price (USD)
vCPU	$0.0363527 / 1 hour
RAM	$0.0048725 / 1 gibibyte hour

N2D Series

Item	Price (USD)
vCPU	$0.0316273 / 1 hour
RAM	$0.0042389 / 1 gibibyte hour

C2 Series

Item	Price (USD)
vCPU	$0.039077 / 1 hour
RAM	$0.0052325 / 1 gibibyte hour

C2D Series

Item	Price (USD)
vCPU	$0.0339974 / 1 hour
RAM	$0.0045528 / 1 gibibyte hour

C3 Series

Item	Price (USD)
vCPU	$0.03908 / 1 hour
RAM	$0.00524 / 1 gibibyte hour

A2 Series

Item	Price (USD)
vCPU	$0.0363527 / 1 hour
RAM	$0.0048725 / 1 gibibyte hour
GPU (A100 40 GB)	$3.3741 / 1 hour
GPU (A100 80 GB)	$4.51729 / 1 hour

A3 Series

Item	Price (USD)
vCPU	$0.0293227 / 1 hour
RAM	$0.0025534 / 1 gibibyte hour
GPU (H100 80 GB)	$11.2660332 / 1 hour
GPU (H200)	$10.708501 / 1 hour

G2 Series

Item	Price (USD)
vCPU	$0.02874 / 1 hour
RAM	$0.00337 / 1 gibibyte hour
GPU (L4)	$0.64405 / 1 hour

Some machine types allow you to add optional GPU accelerators for prediction. Optional GPUs incur an additional charge, separate from those described in the previous table. View each pricing table, which describes the pricing for each type of optional GPU.

Accelerators - price per hour

GPU type	Price (USD)
NVIDIA_TESLA_P4	$0.69 / 1 hour
NVIDIA_TESLA_P100	$1.679 / 1 hour
NVIDIA_TESLA_T4	$0.402 / 1 hour
NVIDIA_TESLA_V100	$2.852 / 1 hour

Pricing is per GPU. If you use multiple GPUs per prediction node (or if your version scales to use multiple nodes),the costs scale accordingly.

AI Platform Prediction serves predictions from your model by running a number of virtual machines ("nodes"). By default, Vertex AI automatically scales the number of nodes running at any time. For online prediction, the number of nodes scales to meet demand. Each node can respond to multiple prediction requests. For batch prediction, the number of nodes scales to reduce the total time it takes to run a job. You can customize how prediction nodes scale.

You are charged for the time that each node runs for your model, including:

When the node is processing a batch prediction job.
When the node is processing an online prediction request.
When the node is in a ready state for serving online predictions.

The cost of one node running for one hour is a node hour. The table of prediction prices describes the price of a node hour, which varies across regions and between online prediction and batch prediction.

You can consume node hours in fractional increments. For example, one node running for 30 minutes costs 0.5 node hours.

Cost calculations for Compute Engine (N1) machine types

The running time of a node is billed in 30-second increments. This means that every 30 seconds, your project is billed for 30 seconds worth of whatever vCPU, RAM, and GPU resources that your node is using at that moment.

More about automatic scaling of prediction nodes

Online prediction	Batch prediction
The priority of the scaling is to reduce the latency of individual requests. The service keeps your model in a ready state for a few idle minutes after servicing a request.	The priority of the scaling is to reduce the total elapsed time of the job.
Scaling affects your total charges each month: the more numerous and frequent your requests, the more nodes will be used.	Scaling should have little effect on the price of your job, though there is some overhead involved in bringing up a new node.
You can choose to let the service scale in response to traffic (automatic scaling) or you can specify a number of nodes to run constantly to avoid latency (manual scaling). If you choose automatic scaling, the number of nodes scales automatically. For AI Platform Prediction legacy (MLS1) machine type deployments, the number of nodes can scale down to zero for no-traffic durations. Vertex AI deployments and other types of AI Platform Prediction deployments cannot scale down to zero nodes. If you choose manual scaling, you specify a number of nodes to keep running all the time. You are charged for all of the time that these nodes are running, starting at the time of deployment and persisting until you delete the model version.	You can affect scaling by setting a maximum number of nodes to use for a batch prediction job, and by setting the number of nodes to keep running for a model when you deploy it.

Online prediction

Batch prediction

The priority of the scaling is to reduce the latency of individual requests. The service keeps your model in a ready state for a few idle minutes after servicing a request.

The priority of the scaling is to reduce the total elapsed time of the job.

Scaling affects your total charges each month: the more numerous and frequent your requests, the more nodes will be used.

Scaling should have little effect on the price of your job, though there is some overhead involved in bringing up a new node.

You can choose to let the service scale in response to traffic (automatic scaling) or you can specify a number of nodes to run constantly to avoid latency (manual scaling).

If you choose automatic scaling, the number of nodes scales automatically. For AI Platform Prediction legacy (MLS1) machine type deployments, the number of nodes can scale down to zero for no-traffic durations. Vertex AI deployments and other types of AI Platform Prediction deployments cannot scale down to zero nodes.
If you choose manual scaling, you specify a number of nodes to keep running all the time. You are charged for all of the time that these nodes are running, starting at the time of deployment and persisting until you delete the model version.

You can affect scaling by setting a maximum number of nodes to use for a batch prediction job, and by setting the number of nodes to keep running for a model when you deploy it.

Batch prediction jobs are charged after job completion

Batch prediction jobs are charged after job completion, not incrementally during the job. Any Cloud Billing budget alerts that you have configured aren't triggered while a job is running. Before starting a large job, consider running some cost benchmark jobs with small input data first.

Example of a prediction calculation

A real-estate company in an Americas region runs a weekly prediction of housing values in areas it serves. In one month, it runs predictions for four weeks in batches of 3920, 4277, 3849, and 3961. Jobs are limited to one node and each instance takes an average of 0.72 seconds of processing.

First calculate the length of time that each job ran:

Each job ran for more than ten minutes, so it is charged for each minute of processing:

The total charge for the month is $0.26.

This example assumed jobs ran on a single node and took a consistent amount of time per input instance. In real usage, make sure to account for multiple nodes and use the actual amount of time each node spends running for your calculations.

Charges for Vertex Explainable AI

Feature-based explanations

Feature-based explanations come at no extra charge to prediction prices. However, explanations take longer to process than normal predictions, so heavy usage of Vertex Explainable AI along with auto-scaling could result in more nodes being started, which would increase prediction charges.

Example-based explanations

Pricing for example-based explanations consists of the following:

When you upload a model or update a model's dataset, you are billed:
per node hour for the batch prediction job that is used to generate the latent space representations of examples. This is billed at the same rate as prediction.
a cost for building or updating indexes. This cost is the same as the indexing costs for Vector Search, which is number of examples * number of dimensions * 4 bytes per float * $3.00 per GB. For example, if you have 1 million examples and 1,000 dimension latent space, the cost is $12 (1,000,000 * 1,000 * 4 * 3.00 / 1,000,000,000).
When you deploy to an endpoint, you are billed per node hour for each node in your endpoint. All compute associated with the endpoint is charged at same rate as prediction. However, because Example-based explanations require additional compute resources to serve the Vector Search index, this results in more nodes being started which increases prediction charges.

Vertex AI Neural Architecture Search

The following tables summarize the pricing in each region where Neural Architecture Search is available.

Prices

The following tables provide the price per hour of various configurations.

You can choose a predefined scale tier or a custom configuration of selected machine types. If you choose a custom configuration, sum the costs of the virtual machines you use.

Accelerator-enabled legacy machine types include the cost of the accelerators in their pricing. If you use Compute Engine machine types and attach accelerators, the cost of the accelerators is separate. To calculate this cost, multiply the prices in the following table of accelerators by the number of each type of accelerator you use.

Machine types

Machine type	Price (USD)
n1-standard-4	$0.2849985 / 1 hour
n1-standard-8	$0.569997 / 1 hour
n1-standard-16	$1.139994 / 1 hour
n1-standard-32	$2.279988 / 1 hour
n1-standard-64	$4.559976 / 1 hour
n1-standard-96	$6.839964 / 1 hour
n1-highmem-2	$0.1774545 / 1 hour
n1-highmem-4	$0.1774545 / 1 hour
n1-highmem-8	$0.709818 / 1 hour
n1-highmem-16	$1.419636 / 1 hour
n1-highmem-32	$2.839272 / 1 hour
n1-highmem-64	$5.678544 / 1 hour
n1-highmem-96	$8.517816 / 1 hour
n1-highcpu-16	$0.8501832 / 1 hour
n1-highcpu-32	$1.7003664 / 1 hour
n1-highcpu-64	$3.4007328 / 1 hour
n1-highcpu-96	$5.1010992 / 1 hour
a2-highgpu-1g	$5.641070651 / 1 hour
a2-highgpu-2g	$11.282141301 / 1 hour
a2-highgpu-4g	$22.564282603 / 1 hour
a2-highgpu-8g	$45.128565205 / 1 hour
a2-highgpu-16g	$90.257130411 / 1 hour
e2-standard-4	$0.20103426 / 1 hour
e2-standard-8	$0.40206852 / 1 hour
e2-standard-16	$0.80413704 / 1 hour
e2-standard-32	$1.60827408 / 1 hour
e2-highmem-2	$0.13559949 / 1 hour
e2-highmem-4	$0.27119898 / 1 hour
e2-highmem-8	$0.54239796 / 1 hour
e2-highmem-16	$1.08479592 / 1 hour
e2-highcpu-16	$0.59364288 / 1 hour
e2-highcpu-32	$1.18728576 / 1 hour
n2-standard-4	$0.291354 / 1 hour
n2-standard-8	$0.582708 / 1 hour
n2-standard-16	$1.165416 / 1 hour
n2-standard-32	$2.330832 / 1 hour
n2-standard-48	$3.496248 / 1 hour
n2-standard-64	$4.661664 / 1 hour
n2-standard-80	$5.82708 / 1 hour
n2-highmem-2	$0.196521 / 1 hour
n2-highmem-4	$0.393042 / 1 hour
n2-highmem-8	$0.786084 / 1 hour
n2-highmem-16	$1.572168 / 1 hour
n2-highmem-32	$3.144336 / 1 hour
n2-highmem-48	$4.716504 / 1 hour
n2-highmem-64	$6.288672 / 1 hour
n2-highmem-80	$7.86084 / 1 hour
n2-highcpu-16	$0.860352 / 1 hour
n2-highcpu-32	$1.720704 / 1 hour
n2-highcpu-64	$3.441408 / 1 hour
n2-highcpu-80	$4.30176 / 1 hour
c2-standard-4	$0.313212 / 1 hour
c2-standard-8	$0.626424 / 1 hour
c2-standard-16	$1.252848 / 1 hour
c2-standard-30	$2.34909 / 1 hour
c2-standard-60	$4.69818 / 1 hour

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Prices for a2-highgpu instances include the charges for the attached NVIDIA_TESLA_A100 Accelerators.

Accelerators

GPU type	Price (USD)
NVIDIA_TESLA_A100	$4.400862 / 1 hour
NVIDIA_TESLA_P4	$0.90 / 1 hour
NVIDIA_TESLA_P100	$2.19 / 1 hour
NVIDIA_TESLA_T4	$0.525 / 1 hour
NVIDIA_TESLA_V100	$3.72 / 1 hour

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Disks

Disk type	Price (USD)
pd-standard	$0.000082192 / 1 gibibyte hour
pd-ssd	$0.000349315 / 1 gibibyte hour

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Notes:

All use is subject to the Neural Architecture Search quota policy.
You are required to store your data and program files in Cloud Storage buckets during the Neural Architecture Search lifecycle. See more about Cloud Storage usage.
For volume-based discounts, contact the Sales team.
The disk price is only charged when you configure the disk size of each VM to be larger than 100 GB. There is no charge for the first 100 GB (the default disk size) of disk for each VM. For example, if you configure each VM to have 105 GB of disk, then you are charged for 5 GB of disk for each VM.

Required use of Cloud Storage

In addition to the costs described in this document, you are required to store data and program files in Cloud Storage buckets during the Neural Architecture Search lifecycle. This storage is subject to the Cloud Storage pricing policy.

Required use of Cloud Storage includes:

Staging your training application package.
Storing your training input data.

Note: You can use another Google Cloud service to store your input data, such as BigQuery, which has its own associated pricing.

Storing the output of your jobs. Neural Architecture Search doesn't require long-term storage of these items. You can remove the files as soon as the operation is complete.

Free operations for managing your resources

The resource management operations provided by Neural Architecture Search are available free of charge. The Neural Architecture Search quota policy does limit some of these operations.

Resource	Free operations
jobs	get, list, cancel
operations	get, list, cancel, delete

Vertex AI Pipelines

Vertex AI Pipelines charges a run execution fee of $0.03 per Pipeline Run. You are not charged the execution fee during the Preview release. You also pay for Google Cloud resources you use with Vertex AI Pipelines, such as Compute Engine resources consumed by pipeline components (charged at the same rate as for Vertex AI training). Finally, you are responsible for the cost of any services (such as Dataflow) called by your pipeline.

Vertex AI Feature Store

Vertex AI Feature Store is Generally Available (GA) since November 2023. For information on the previous version of the product go to Vertex AI Feature Store (Legacy).

New Vertex AI Feature Store

The new Vertex AI Feature Store supports functionality across 2 types of operations:

Offline operations are operations to transfer, store, retrieve and transform data in the offline store (BigQuery)
Online operations are operations to transfer data into the online store(s) and operations on data while it is in the online store(s).

Offline Operations Pricing

Since BigQuery is used for offline operations, please refer to BigQuery pricing for functionality such as ingestion to the offline store, querying the offline store, and offline storage.

Online Operations Pricing

For online operations, Vertex AI Feature Store charges for any GA features to transfer data into the online store, serve data or store data. A node-hour represents the time a virtual machine spends to complete an operation, charged to the minute.

Operation	Price (USD)
Data processing node Data processing (e.g. ingesting into any online store, monitoring, etc.)	$0.08 / 1 hour
Optimized online serving node Low latency serving and embeddings serving Each node includes 200GB of storage	$0.30 / 1 hour
Bigtable online serving node Serving with Cloud Bigtable	$0.94 / 1 hour
Bigtable online serving storage Storage for serving with Cloud Bigtable	$0.000342466 / 1 gibibyte hour

Optimized online serving and Bigtable online serving use different architectures, so their nodes are not comparable.

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Online Operations Workload Estimates

Consider the following guidelines when estimating your workloads. The number of nodes required for a given workload may differ across each serving approach.

Data processing:
Ingestion - One node can ingest approximately a minimum of 100 MiB of data per hour into a Bigtable Online Store or an Optimized Online Store if no analytical functions are used.
Bigtable online serving: Each node can support approximately 15,000 QPS and up to 5 TB of storage.
Optimized online serving: Performance is based on the machine type and replicas, which are automatically configured to minimize costs subject to the workload. Each node can have a minimum of 2 and a maximum of 6 replicas for high availability and autoscaling. You're charged for the number of replicas accordingly. For more details, see the example monthly scenarios.
For non embeddings-related workloads, each node can support approximately 500 QPS and up to 200 GB of storage.
For embeddings-related workloads, each node can support approximately 500 QPS and up to 4 GB of storage of 512 dimensional data.

You can view the number of nodes (with replicas) in the Metric Explorer:

Metric Explorer to figure out the number of nodes been used.

Example Monthly Scenarios (assuming us-central1)

Data streaming workload - Bigtable online serving with 2.5 TB of data (1 GB refreshed daily) and 1200 QPS

Operations	Monthly Usage	Monthly Cost
Data processing node	(1 GB/day) * (30 days/month) * (1,000 MB/GB) * (1 node-hr / 100 MB) = 300 node-hr	300 node-hr * ($0.08 per node-hr) = $24
Optimized online serving node	N/A	N/A
Bigtable online serving node	(1 node) * (24 hr/day) * (30 days/month) = 720 node-hr	720 node-hr * ($0.94 per node-hr) = $677
Bigtable online serving storage	(2.5 TB-month) * (1000 GB/TB) = 2500 GB-month	2500 GB-month * ($0.25 per GB-month) = $625
Total		$1,326

High QPS workload - Optimized online serving with 10GB of non-embedding data (5GB refreshed daily) and 2000QPS

Operations	Monthly Usage	Monthly Cost
Data processing node	(5 GB/day) * (30 days/month) * (1,000 MB/GB) * (1 node-hr / 100MB) = 1500 node-hr	1500 node-hr * ($0.08 per node-hr) = $120
Optimized online serving node	Roundup(10GB * (1 node / 200 GB)) = 1 * max(2 default replicas, 2000 QPS * (1 replica / 500 QPS)) = 4 total nodes * (24 hr/day) * (30days/month) =2880 node-hr	2880 node-hr * (0.30 per node-hr) = $864
Bigtable online serving node	N/A	N/A
Bigtable online serving storage	N/A	N/A
Total		$984

Embeddings serving workload - Optimized online serving with 20GB of embeddings data (2GB refreshed daily) and 800QPS

Operations	Monthly Usage	Monthly Cost
Data processing node	(2 GB/day) * (30 days/month) * (1,000 MB/GB) * (1 node-hr / 100MB) = 600 node-hr	600 node-hr * ($0.08 per node-hr) = $48
Optimized online serving node	Roundup(20GB* (1 node / 4GB) = 5 * max(2 default replicas, 800 QPS * (1 replica / 500 QPS)) = 10 total nodes * (24 hr/day) * (30days/month) = 7200 node-hr	7200 node-hr * (0.30 per node-hr) = $2160
Bigtable online serving node	N/A	N/A
Bigtable online serving storage	N/A	N/A
Total		$2,208

Vertex AI Feature Store (Legacy)

Prices for Vertex AI Feature Store (Legacy) are based on the amount of feature data in online and offline storage as well as the availability of online serving. A node per hour represents the time a virtual machine spends serving feature data or waiting in a ready state to handle feature data requests.

Operation	Price (USD)
Online storage	$0.25 per GB-month
Offline storage	$0.023 per GB-month
Online serving	$0.94 per node per hour
Batch export	$0.005 per GB
Streaming ingestion	$0.10 per GB of ingestion

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

When you enable feature value monitoring, billing includes applicable charges above in addition to applicable charges that follow:

$3.50 per GB for all data analyzed. With snapshot analysis enabled, snapshots taken for data in Vertex AI Feature Store (Legacy) are included. With import feature analysis enabled, batches of ingested data are included.
Additional charges for other Vertex AI Feature Store (Legacy) operations used with feature value monitoring include the following:
The snapshot analysis feature periodically takes a snapshot of the feature values based on your configuration for the monitoring interval.
The charge for a snapshot export is the same as a regular batch export operation.

Snapshot Analysis Example

A data scientist enables feature value monitoring for their Vertex AI Feature Store (Legacy) and turns on monitoring for a daily snapshot analysis. A pipeline runs daily for the entity types monitoring. The pipeline scans 2GB of data in Vertex AI Feature Store (Legacy) and exports a snapshot containing 0.1GB of data. The total charge for one day's analysis is:

(0.1 GB * $3.50) + (2 GB * $0.005) = $0.36

Ingestion Analysis Example

A data scientist enables feature value monitoring for their Vertex AI Feature Store (Legacy) and turns on monitoring for ingestion operations. An ingestion operation imports 1GB of data into Vertex AI Feature Store (Legacy). The total charge for feature value monitoring is:

(1 GB * $3.50) = $3.50

Vertex AI TensorBoard

To use Vertex AI TensorBoard, request that the IAM administrator of the project assign you to the role "Vertex AI TensorBoard Web App User". The Vertex AI Administrator role also has access.

Beginning in August 2023, Vertex AI TensorBoard pricing changed from a per-user monthly license of $300/month to $10 GiB/month for data storage of logs and metrics. This means no more subscription fees. You will only pay for the storage you’ve used. See the Vertex AI TensorBoard: Delete Outdated TensorBoard Experiments tutorial for how to manage storage.

Vertex AI Vizier

Vertex AI Vizier is a black-box optimization service inside Vertex AI. The Vertex AI Vizier pricing model consists of the following:

There is no charge for trials that use RANDOM_SEARCH and GRID_SEARCH. Learn more about the search algorithms.
The first 100 Vertex AI Vizier trials per calendar month are available at no charge (trials using RANDOM_SEARCH and GRID_SEARCH do not count against this total).
After 100 Vertex AI Vizier trials, subsequent trials during the same calendar month are charged at $1 per trial (trials that use RANDOM_SEARCH or GRID_SEARCH incur no charges).

Vector Search

Pricing for Vector Search Approximate Nearest Neighbor service consists of:

Per node hour pricing for each VM used to host a deployed index.
A cost for building new indexes, updating existing indexes, and using streaming index updates.

Data processed during building and updating indexes is measured in binary gigabytes (GiB), where 1 GiB is 1,073,741,824 bytes. This unit of measurement is also known as a gibibyte.

Vector Search charges $3.00 per gibibyte (GiB) of data processed in all regions. Vector Search charges $0.45/GiB ingested for Streaming Update inserts.

The following tables summarize the pricing of an index serving in each region where the Vector Search is available. The price corresponds to the machine type, by region, and is charged per node hour.

Machine type	Price (USD)
e2-standard-2	$0.0938084 / 1 hour
e2-standard-16	$0.7504672 / 1 hour
e2-highmem-16	$1.012368 / 1 hour
n2d-standard-32	$1.8926208 / 1 hour
n1-standard-16	$1.0639944 / 1 hour
n1-standard-32	$2.1279888 / 1 hour

Vector Search pricing examples

Vector Search pricing is determined by the size of your data, the amount of queries per second (QPS) you want to run, and the number of nodes you use. To get your estimated serving cost, you need to calculate your total data size. Your data size is the number of your embeddings/vectors* the number of dimensions you have* 4 bytes per dimension. After you have the size of your data you can calculate the serving cost and the building cost. The serving cost plus the building cost equals your monthly total cost.

Serving cost: # replicas/shard * # shards (~data size/shard size) * hourly cost * 730 hours
Building cost: data size(in GiB) * $3/GiB * # of updates/month

Streaming update: Vector Search uses heuristics-based metrics to determine when to trigger compaction. If the oldest uncompacted data is five days old, compaction is always triggered. You are billed for the cost of rebuilding the index at the same rate of a batch update, in addition to the streaming update costs.

Number of embeddings/vectors	Number of dimensions	Queries per second (QPS)	Machine Type	Nodes	Estimated monthly serving cost (USD)
2 million	128	100	e2-standard-2	1	$68
20 million	256	1,000	e2-standard-16	1	$547
20 million	256	3,000	e2-standard-16	3	$1,642
100 million	256	500	e2-highmem-16	2	$1,477
1 billion	100	500	e2-highmem-16	8	$5,910

All examples are based on machine types in us-central1. The cost you incur will vary with recall rate and latency requirements. The estimated monthly serving cost is directly related to the number of nodes used in the console. To learn more about configuration parameters that affect cost, see Configuration parameters which affect recall and latency.

If you have high queries per second (QPS), batching these queries can reduce total costs up to 30%-40%.

Storage-optimized Vector Search tier

Instead of managing and paying for individual VMs and separate compaction jobs, there are two SKUs:

Capacity Units (CUs): This is the primary meter for serving. A CU is a bundled unit of compute and active SSD storage, up to 1 TiB each, priced at $2.30 per CU, per hour, per replica. This single rate includes the cost of serving and ongoing index management. The service auto-scales and additional CUs are added, as the index size grows.
Write Units: To add data, there is a charge of $0.45 per GiB. The charge applies to both batch and streaming index updates.

Unit	Price
Storage-Optimized Capacity Unit	$2.30 / 1 hour
Write Unit	$0.45 / 1 gibibyte

Vertex AI Model Registry

The Vertex AI Model Registry is a central repository which tracks and lists your models and model versions. You can import models into Vertex AI and they appear in the Vertex AI Model Registry. There is no cost associated with having your models in the Model Registry. Cost is only incurred when you deploy the model to an endpoint or perform a batch prediction on the model. This cost is determined by the type of model you are deploying.

To learn more about pricing for deploying custom models from the Vertex AI Model Registry, see Custom-trained models. To learn more about pricing for deploying AutoML models, see Pricing for AutoML models.

Vertex AI Model Monitoring

Vertex AI enables you to monitor the continued effectiveness of your model after you deploy it to production. For more information, see Introduction to Vertex AI Model Monitoring.

When you use Vertex AI Model Monitoring, you are billed for the following:

$3.50 per GB for all data analyzed, including the training data provided and prediction data logged in a BigQuery table.
Charges for other Google Cloud products that you use with Model Monitoring, such as BigQuery storage or Batch Explain when attribution monitoring is enabled.

Vertex AI Model Monitoring is supported in the following regions: us-central1, europe-west4, asia-east1, and asia-southeast1. Prices are the same for all regions.

Data sizes are measured after they are converted to TfRecord format.

Training datasets incur a one-time charge when you set up a Vertex AI Model Monitoring job.

Prediction Datasets consist of logs collected from the Online Prediction service. As prediction requests arrive during different time windows, the data for each time window is collected and the sum of the data analyzed for each prediction window is used to calculate the charge.

Example: A data scientist runs model monitoring on the prediction traffic belonging to their model.

The model is trained from a BigQuery dataset. The data size after converting to TfRecord is 1.5GB.
Prediction data logged between 1:00 - 2:00 p.m. is 0.1 GB, between 3:00 - 4:00 p.m. is 0.2 GB.
The total price for setting up the model monitoring job is:
(1.5 GB * $3.50) + ((0.1 GB + 0.2 GB) * $3.50) = $6.30

Vertex AI Workbench

Select instances, managed notebooks, or user-managed notebooks for pricing information.

Instances

The tables below provide the approximate price per hour of various VM configurations. You can choose a custom configuration of selected machine types. To calculate pricing, sum the costs of the virtual machines you use.

CPUs

Machine type	Price per vCPU (USD)
N1	$0.0379332 / 1 hour
N2	$0.0379332 / 1 hour
E2	$0.026173908 / 1 hour
A2	$0.0379332 / 1 hour

Memory

Machine type	Price (USD)
N1	$0.0050844 / 1 gibibyte hour
N2	$0.0050844 / 1 gibibyte hour
E2	$0.003508236 / 1 gibibyte hour
A2	$0.0050844 / 1 gibibyte hour

Accelerators

Machine type	Price (USD)
Nvidia Tesla A100	$4.400862 / 1 hour
Nvidia Tesla A100 80GB	$4.51729 / 1 hour
Nvidia Tesla T4	$0.525 / 1 hour
Nvidia Tesla V100	$3.72 / 1 hour
Nvidia Tesla P100	$2.19 / 1 hour

Disks

Machine type	Price (USD)
Hyperdisk Extreme provisioned space	$0.000205479 / 1 gibibyte hour
Balanced provisioned space	$0.000164384 / 1 gibibyte hour
Extreme provisioned space	$0.000205479 / 1 gibibyte hour
SSD provisioned space	$0.000279452 / 1 gibibyte hour
Standard provisioned space	$0.000065753 / 1 gibibyte hour

Your Vertex AI Workbench instance incurs charges as follows:

For CPU and accelerator usage, you're charged when the instance is in the following states:
STARTING
PROVISIONING
ACTIVE
UPGRADING
ROLLBACKING
RESTORING
STOPPING
SUSPENDING
For disk storage, you're charged when the instance is in the following states:
STARTING
PROVISIONING
ACTIVE
UPGRADING
ROLLBACKING
RESTORING
STOPPING
STOPPED
SUSPENDING
SUSPENDED

Managed notebooks

Pricing is composed of the compute and storage resources that you use, management fees for your Vertex AI Workbench instances, and any additional cloud resources that you use. See the following sections for more details.

Compute and storage resources

Management fees

There are Vertex AI Workbench management fees in addition to your infrastructure usage, captured in the tables below.

SKU	Price (USD)
vCPU	$0.05 per vCPU
T4 and P4 (Standard GPU)	$0.35 per GPU
P100, V100, L4, and A100 GPU (Premium GPU)	$2.48 per GPU

User-managed notebooks

Compute and storage resources

Management fees

There are Vertex AI Workbench management fees in addition to your infrastructure usage, captured in the tables below.

SKU	Price (USD)
vCPU	$0.005 per vCPU
T4 and P4 (Standard GPU)	$0.035 per GPU
P100, V100, and A100 GPU (Premium GPU)	$0.25 per GPU

Additional Google Cloud resources

In addition to the costs mentioned previously, you also pay for any Google Cloud resources that you use. For example:

Data analysis services: You incur BigQuery costs when you issue SQL queries within a notebook (see BigQuery pricing).
Customer-managed encryption keys: You incur costs when you use customer-managed encryption keys. Each time your managed notebooks or user-managed notebooks instance uses a Cloud Key Management Service key, that operation is billed at the rate of Cloud KMS key operations (see Cloud Key Management Service pricing).

Colab Enterprise

Deep Learning Containers, Deep Learning VM, and AI Platform Pipelines

For Deep Learning Containers, Deep Learning VM Images, and AI Platform Pipelines, pricing is calculated based on the compute and storage resources that you use. These resources are charged at the same rate you currently pay for Compute Engine and Cloud Storage.

In addition to the compute and storage costs, you also pay for any Google Cloud resources that you use. For example:

Data analysis services: You incur BigQuery costs when you issue SQL queries within a notebook (see BigQuery pricing).
Customer-managed encryption keys: You incur costs when you use customer-managed encryption keys. Each time your managed notebooks or user-managed notebooks instance uses a Cloud Key Management Service key, that operation is billed at the rate of Cloud KMS key operations (see Cloud Key Management Service pricing).

Data labeling

Vertex AI enables you to request human labeling for a collection of data that you plan to use to train a custom machine learning model. Prices for the service are computed based on the type of labeling task.

For regular labeling tasks, the prices are determined by the number of annotation units.
For an image classification task, units are determined the number of images and the number of human labelers. For example, an image with 3 human labelers counts for 1 * 3 = 3 units. The price for single-label and multi-label classification are the same.
For an image bounding box task, units are determined by the number of bounding boxes identified in the images and the number of human labelers. For example, if an image with 2 bounding boxes and 3 human labelers counts for 2 * 3 = 6 units. Images without bounding boxes will not be charged.
For an image segmentation/rotated box/polyline/polygon task, units are determined in the same way as a image bounding box task.
For a video classification task, units are determined by the video length (every 5 seconds is a price unit) and the number of human labelers. For example, a 25 seconds video with 3 human labelers counts for 25 / 5 * 3 = 15 units. The price for single-label and multi-label classification are the same.
For a video object tracking task, unit are determined by the number of objects identified in the video and the number of human labelers. For example, for a video with 2 objects and 3 human labelers, it counts for 2 * 3 = 6 units. Video without objects will not be charged.
For a video action recognition task, units are determined in the same way as a video object tracking task.
For a text classification task, units are determined by text length (every 50 words is a price unit) and the number of human labelers. For example, one piece of text with 100 words and 3 human labelers counts for 100 / 50 * 3 = 6 units. The price for single-label and multi-label classification is the same.
For a text sentiment task, units are determined in the same way as a text classification task.
For a text entity extraction task, units are determined by text length (every 50 words is a price unit), the number of entities identified, and the number of human labelers. For example, a piece of text with 100 words, 2 entities identified, and 3 human labelers counts for 100 / 50 * 2 * 3 = 12 units. Text without entities will not be charged.
For image/video/text classification and text sentiment tasks, human labelers may lose track of classes if the label set size is too large. As a result, we send at most 20 classes to the human labelers at a time. For example, if the label set size of a labeling task is 40, each data item will be sent for human review 40 / 20 = 2 times, and we will charge 2 times of the price (calculated above) accordingly.
For a labeling task that enables the custom labeler feature, each data item is counted as 1 custom labeler unit.
For an active learning labeling task for data items with annotations that are generated by models (without a human labeler's help), each data item is counted as 1 active learning unit.
For an active learning labeling task for data items with annotations that are generated by human labelers, each data item is counted as a regular labeling task as described above.

The table below provides the price per 1,000 units per human labeler, based on the unit listed for each objective. Tier 1 pricing applies to the first 50,000 units per month in each Google Cloud project; Tier 2 pricing applies to the next 950,000 units per month in the project, up to 1,000,000 units. Contact us for pricing above 1,000,000 units per month.

Data type	Objective	Unit	Tier 1 price (USD)	Tier 2 price (USD)
Image	Classification	Image	$35	$25
	Bounding box	Bounding box	$63	$49
	Segmentation	Segment	$870	$850
	Rotated box	Bounding box	$86	$60
	Polygon/polyline	Polygon/Polyline	$257	$180
Video	Classification	5 sec video	$86	$60
	Object tracking	Bounding box	$86	$60
	Action recognition	Event in 30 sec video	$214	$150
Text	Classification	50 words	$129	$90
	Sentiment	50 words	$200	$140
	Entity extraction	Entity	$86	$60
Active Learning	All	Data item	$80	$56
Custom Labeler	All	Data item	$80	$56

Required use of Cloud Storage

In addition to the costs described in this document, you are required to store data and program files in Cloud Storage buckets during the Vertex AI lifecycle. This storage is subject to the Cloud Storage pricing policy.

Required use of Cloud Storage includes:

Staging your training application package for custom-trained models.
Storing your training input data.
Storing the output of your training jobs. Vertex AI does not require long-term storage of these items. You can remove the files as soon as the operation is complete.

Free operations for managing your resources

The resource management operations provided by AI Platform are available free of charge. The AI Platform quota policy does limit some of these operations.

Resource	Free operations
models	create, get, list, delete
versions	create, get, list, delete, setDefault
jobs	get, list, cancel
operations	get, list, cancel, delete

Google Cloud costs

If you store images to be analyzed in Cloud Storage or use other Google Cloud resources in tandem with Vertex AI, then you will also be billed for the use of those services.

To view your current billing status in the Google Cloud console, including usage and your current bill, see the Billing page. For more details about managing your account, see the Cloud Billing Documentation or Billing and Payments Support.

Request a custom quote

With Google Cloud's pay-as-you-go pricing, you only pay for the services you use. Connect with our sales team to get a custom quote for your organization.