AI Gateway Overview
Use popular AI models in your code, without needing to manage API keys or external accounts.
Overview
The AI Gateway service simplifies the technical and operational concerns of using AI inference in your code by removing the need to:
- Open an account with each provider you want to use.
- Maintain a separate credit balance with each provider.
- Copy the API key from each provider to your projects on Netlify.
How it works
By default, Netlify automatically sets the appropriate environment variables that AI client libraries typically use for configuration, in all Netlify compute contexts (e.g., Netlify Functions, Edge Functions, Preview Server).
These variables include:
- API keys for OpenAI, Anthropic, and Google Gemini.
- A custom base URL for each provider, to route requests via the AI Gateway service.
These variables are picked up by the official client libraries of these providers, so no extra configuration is necessary. Alternatively, if you make AI calls via a provider’s REST API, these values are easy to incorporate in your code.
When receiving a request from a client, the AI Gateway makes the call to the AI provider on your behalf. Then, it bills your Netlify account by converting the actual token usage in the request into credits, using your existing credit quota.
The AI Gateway does not store your prompts or model outputs. Learn more about Security and Privacy for AI features.
To opt out of the AI Gateway, see Using the AI Gateway below.
Support in web frameworks
When you develop server-side code with any web framework supported by Netlify (e.g., Astro, TanStack Start, Next.js, Gatsby, Nuxt), your code is packaged into Netlify Functions and Edge Functions under the hood as part of the build process.
Therefore, the environment variables above are available just as they are when you use Netlify compute primitives directly, with no further configuration required.
Using the AI Gateway
To get started, check out our Quickstart for AI Gateway.
The AI Gateway is available by default in all credit-based plans, unless:
- Netlify AI Features are turned off in your team settings.
- The environment variable `AI_GATEWAY_INJECTION` is set to `false` for a project or team.
- If you have API keys set via environment variables for any of the AI providers supported by the gateway (OpenAI, Anthropic, Google Gemini), these are not replaced: Netlify does not override usage of your own keys. You can set or remove your own keys at any point.
For full information on which environment variables are automatically set, and how to control this behavior, see Managing environment variables below.
Using official client libraries
If you’re using any of the following libraries, no configuration is required:
- OpenAI TypeScript and JavaScript API Library
- Anthropic TypeScript API Library (for Claude models)
- Google Gen AI SDK for TypeScript and JavaScript (for Google Gemini models)
Using official REST APIs
```javascript
// Anthropic Claude API
const ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
const ANTHROPIC_BASE_URL = process.env.ANTHROPIC_BASE_URL;

async function callAnthropic() {
  const response = await fetch(`${ANTHROPIC_BASE_URL}/v1/messages`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01'
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-5-20250929',
      max_tokens: 1024,
      messages: [{ role: 'user', content: 'Hello!' }]
    })
  });
  return await response.json();
}
```

```javascript
// OpenAI API
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const OPENAI_BASE_URL = process.env.OPENAI_BASE_URL;

async function callOpenAI() {
  const response = await fetch(`${OPENAI_BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: 'gpt-5',
      messages: [{ role: 'user', content: 'Hello!' }]
    })
  });
  return await response.json();
}
```

```javascript
// Google Gemini API
const GEMINI_API_KEY = process.env.GEMINI_API_KEY;
const GEMINI_BASE_URL = process.env.GOOGLE_GEMINI_BASE_URL;

async function callGemini() {
  const response = await fetch(
    `${GEMINI_BASE_URL}/v1beta/models/gemini-2.5-pro:generateContent?key=${GEMINI_API_KEY}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        contents: [{ parts: [{ text: 'Hello!' }] }]
      })
    }
  );
  return await response.json();
}
```
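For instance, the OpenAI snippet above can be wrapped in a Netlify Function handler. The following is a minimal sketch; the model choice, prompt handling, and response shape extraction are illustrative, not prescribed by the gateway:

```javascript
// A minimal Netlify Function that proxies a prompt through the
// gateway-routed OpenAI endpoint. In a real project this would live in
// netlify/functions/chat.mjs with `export default handler;` at the end.
const handler = async (req) => {
  const { prompt } = await req.json();

  // OPENAI_BASE_URL and OPENAI_API_KEY are injected by Netlify
  // when the AI Gateway is active.
  const response = await fetch(`${process.env.OPENAI_BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: 'gpt-5-mini',
      messages: [{ role: 'user', content: prompt }]
    })
  });

  const data = await response.json();
  return new Response(JSON.stringify({ reply: data.choices?.[0]?.message?.content ?? null }), {
    headers: { 'Content-Type': 'application/json' }
  });
};
```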
Using third-party client libraries
If you are using a client library that does not work out of the box with the environment variables set for the AI Gateway, you need to manually pass the API key and base URL as arguments to the library.
This is similar to manually reading and passing variable values when using a provider’s REST API. See Using official REST APIs above for the relevant variable names.
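As a sketch, you can resolve the injected values once and pass them to whatever client you use. The `resolveProviderConfig` helper and the `SomeAIClient` constructor below are illustrative names, not part of any real library; check your library’s documentation for its exact option names:

```javascript
// Resolve the gateway-injected API key and base URL for a provider,
// failing early at startup if either is missing.
function resolveProviderConfig(provider) {
  const vars = {
    openai: ['OPENAI_API_KEY', 'OPENAI_BASE_URL'],
    anthropic: ['ANTHROPIC_API_KEY', 'ANTHROPIC_BASE_URL'],
    gemini: ['GEMINI_API_KEY', 'GOOGLE_GEMINI_BASE_URL']
  }[provider];
  if (!vars) throw new Error(`Unknown provider: ${provider}`);

  const [apiKey, baseURL] = vars.map((name) => process.env[name]);
  if (!apiKey || !baseURL) throw new Error(`Missing environment variables for ${provider}`);
  return { apiKey, baseURL };
}

// Hypothetical usage with a third-party client:
// const client = new SomeAIClient(resolveProviderConfig('openai'));
```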
Managing environment variables
If you have already set an API key or base URL at the project or team level, Netlify will never override it.
When a Netlify Function or Edge Function is initialized, the following environment variables are set to the appropriate values for the AI Gateway:
- `OPENAI_API_KEY` and `OPENAI_BASE_URL`, unless either is already set by you at the project or team level.
- `ANTHROPIC_API_KEY` and `ANTHROPIC_BASE_URL`, unless either is already set by you.
- `GEMINI_API_KEY` and `GOOGLE_GEMINI_BASE_URL`, unless either is already set by you, or if either `GOOGLE_API_KEY` or `GOOGLE_VERTEX_BASE_URL` is set.

This check is done separately for each provider. For example, if you have set only `OPENAI_API_KEY` to your own API key, it will not be overridden (and `OPENAI_BASE_URL` will not be set), but the values for Anthropic and Google will be set.
The `AI_GATEWAY_API_KEY` and `AI_GATEWAY_BASE_URL` environment variables are always injected into the AI Gateway-supported runtimes. If you want to mix your own keys with Netlify’s, or you want to be explicit about using AI Gateway credentials in your calls, use these variables: they never collide with other environment variable values.
Disabling automatic environment variables
To prevent any variables from being automatically set, you can either:
- Disable AI Features at the team settings level (this requires Team Owner permissions), or
- Create an environment variable named `AI_GATEWAY_INJECTION` and set its value to `false`. You can define this variable at either the project or the team level.
Model availability
The AI Gateway supports the following AI providers and models.
| AI Provider | Model |
|---|---|
| OpenAI | gpt-5 |
| OpenAI | gpt-5-codex |
| OpenAI | gpt-5-mini |
| OpenAI | gpt-5-nano |
| OpenAI | gpt-4.1 |
| OpenAI | gpt-4.1-mini |
| OpenAI | gpt-4.1-nano |
| OpenAI | gpt-4o |
| OpenAI | gpt-4o-mini |
| OpenAI | o4-mini |
| OpenAI | o3 |
| OpenAI | o3-mini |
| OpenAI | codex-mini-latest |
| Anthropic | claude-opus-4-1-20250805 |
| Anthropic | claude-opus-4-20250514 |
| Anthropic | claude-sonnet-4-5-20250929 |
| Anthropic | claude-sonnet-4-20250514 |
| Anthropic | claude-3-7-sonnet-20250219 |
| Anthropic | claude-3-7-sonnet-latest |
| Anthropic | claude-3-5-haiku-20241022 |
| Anthropic | claude-3-5-haiku-latest |
| Anthropic | claude-3-haiku-20240307 |
| Google | gemini-2.5-pro |
| Google | gemini-flash-latest |
| Google | gemini-2.5-flash |
| Google | gemini-2.5-flash-preview-09-2025 |
| Google | gemini-flash-lite-latest |
| Google | gemini-2.5-flash-lite |
| Google | gemini-2.5-flash-lite-preview-09-2025 |
| Google | gemini-2.5-flash-image-preview |
| Google | gemini-2.0-flash |
| Google | gemini-2.0-flash-lite |
You can also programmatically access the up-to-date list in JSON format via a public API endpoint: `https://api.netlify.com/api/v1/ai-gateway/providers`.
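A quick sketch of fetching that endpoint; the exact shape of the returned JSON is not documented here, so inspect the actual payload before relying on specific fields:

```javascript
// Fetch the current list of AI Gateway providers and models as JSON.
async function getGatewayProviders() {
  const response = await fetch('https://api.netlify.com/api/v1/ai-gateway/providers');
  if (!response.ok) throw new Error(`Unexpected status: ${response.status}`);
  return await response.json();
}
```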
Pricing
To understand pricing for AI Gateway, check out our Pricing for AI features docs.
Rate limits
The AI Gateway has two types of limits: Requests Per Minute (RPM) and Tokens Per Minute (TPM). Limits are per model and differ by your account plan.
Rate limits are scoped to your account: requests made and tokens used by any project in your account count together toward your limits.
Enterprise customers have extended limits; contact your Account Manager to learn more.
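When you exceed a limit, requests are rejected until the window resets. A simple retry-with-backoff sketch is shown below; it assumes the gateway signals rate limiting with an HTTP 429 status, so confirm the actual status and any retry headers against real responses:

```javascript
// Retry a request a few times with exponential backoff when rate limited.
async function fetchWithBackoff(url, options, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429 || attempt >= retries) return response;
    // Exponential backoff: baseDelayMs, 2x, 4x, ...
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}
```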
Requests per minute (RPM)
| AI Provider | Model | Free plan | Personal plan | Pro plan |
|---|---|---|---|---|
| OpenAI | gpt-5 | 6 | 30 | 60 |
| OpenAI | gpt-5-codex | 6 | 30 | 60 |
| OpenAI | gpt-5-mini | 10 | 50 | 80 |
| OpenAI | gpt-5-nano | 50 | 100 | 150 |
| OpenAI | gpt-4.1 | 6 | 30 | 60 |
| OpenAI | gpt-4.1-mini | 10 | 50 | 80 |
| OpenAI | gpt-4.1-nano | 50 | 100 | 150 |
| OpenAI | gpt-4o | 6 | 30 | 60 |
| OpenAI | gpt-4o-mini | 50 | 100 | 150 |
| OpenAI | o4-mini | 6 | 30 | 60 |
| OpenAI | o3 | 3 | 6 | 20 |
| OpenAI | o3-mini | 6 | 30 | 60 |
| OpenAI | codex-mini-latest | 6 | 30 | 60 |
| Anthropic | claude-opus-4-1-20250805 | 3 | 6 | 20 |
| Anthropic | claude-opus-4-20250514 | 3 | 6 | 20 |
| Anthropic | claude-sonnet-4-5-20250929 | 6 | 30 | 60 |
| Anthropic | claude-sonnet-4-20250514 | 6 | 30 | 60 |
| Anthropic | claude-3-7-sonnet-20250219 | 6 | 30 | 60 |
| Anthropic | claude-3-7-sonnet-latest | 6 | 30 | 60 |
| Anthropic | claude-3-5-haiku-20241022 | 10 | 50 | 80 |
| Anthropic | claude-3-5-haiku-latest | 10 | 50 | 80 |
| Anthropic | claude-3-haiku-20240307 | 50 | 100 | 150 |
| Google | gemini-2.5-pro | 6 | 30 | 60 |
| Google | gemini-flash-latest | 10 | 50 | 80 |
| Google | gemini-2.5-flash | 10 | 50 | 80 |
| Google | gemini-2.5-flash-preview-09-2025 | 10 | 50 | 80 |
| Google | gemini-flash-lite-latest | 50 | 100 | 150 |
| Google | gemini-2.5-flash-lite | 50 | 100 | 150 |
| Google | gemini-2.5-flash-lite-preview-09-2025 | 50 | 100 | 150 |
| Google | gemini-2.0-flash | 50 | 100 | 150 |
| Google | gemini-2.0-flash-lite | 50 | 100 | 150 |
| Google | gemini-2.5-flash-image-preview | 3 | 6 | 20 |
Tokens per minute (TPM)
For TPM, both input and output tokens count toward the limit. However, cached input tokens are excluded for Anthropic models and included for other providers.
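The counting rule above can be expressed as a small helper. This is a sketch of the stated rule only, assuming a usage object where cached input tokens are reported as part of the input token count:

```javascript
// Compute how many tokens a request contributes toward the TPM limit.
// Cached input tokens are excluded for Anthropic, included for other providers.
function tpmCountedTokens(provider, usage) {
  const { inputTokens, outputTokens, cachedInputTokens = 0 } = usage;
  const excluded = provider === 'anthropic' ? cachedInputTokens : 0;
  return inputTokens + outputTokens - excluded;
}
```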
| AI Provider | Model | Free plan | Personal plan | Pro plan |
|---|---|---|---|---|
| OpenAI | gpt-5 | 18,000 | 90,000 | 180,000 |
| OpenAI | gpt-5-codex | 18,000 | 90,000 | 180,000 |
| OpenAI | gpt-5-mini | 60,000 | 300,000 | 480,000 |
| OpenAI | gpt-5-nano | 300,000 | 600,000 | 900,000 |
| OpenAI | gpt-4.1 | 18,000 | 90,000 | 180,000 |
| OpenAI | gpt-4.1-mini | 50,000 | 250,000 | 400,000 |
| OpenAI | gpt-4.1-nano | 250,000 | 500,000 | 750,000 |
| OpenAI | gpt-4o | 18,000 | 90,000 | 180,000 |
| OpenAI | gpt-4o-mini | 250,000 | 500,000 | 750,000 |
| OpenAI | o4-mini | 30,000 | 150,000 | 300,000 |
| OpenAI | o3 | 90,000 | 180,000 | 600,000 |
| OpenAI | o3-mini | 30,000 | 150,000 | 300,000 |
| OpenAI | codex-mini-latest | 30,000 | 150,000 | 300,000 |
| Anthropic | claude-opus-4-1-20250805 | 1,800 | 3,600 | 12,000 |
| Anthropic | claude-opus-4-20250514 | 1,800 | 3,600 | 12,000 |
| Anthropic | claude-sonnet-4-5-20250929 | 18,000 | 90,000 | 180,000 |
| Anthropic | claude-sonnet-4-20250514 | 18,000 | 90,000 | 180,000 |
| Anthropic | claude-3-7-sonnet-20250219 | 18,000 | 90,000 | 180,000 |
| Anthropic | claude-3-7-sonnet-latest | 18,000 | 90,000 | 180,000 |
| Anthropic | claude-3-5-haiku-20241022 | 1,200 | 6,000 | 9,600 |
| Anthropic | claude-3-5-haiku-latest | 1,200 | 6,000 | 9,600 |
| Anthropic | claude-3-haiku-20240307 | 6,000 | 12,000 | 18,000 |
| Google | gemini-2.5-pro | 24,000 | 120,000 | 240,000 |
| Google | gemini-flash-latest | 8,000 | 40,000 | 64,000 |
| Google | gemini-2.5-flash | 8,000 | 40,000 | 64,000 |
| Google | gemini-2.5-flash-preview-09-2025 | 8,000 | 40,000 | 64,000 |
| Google | gemini-flash-lite-latest | 50,000 | 100,000 | 150,000 |
| Google | gemini-2.5-flash-lite | 50,000 | 100,000 | 150,000 |
| Google | gemini-2.5-flash-lite-preview-09-2025 | 50,000 | 100,000 | 150,000 |
| Google | gemini-2.0-flash | 50,000 | 100,000 | 150,000 |
| Google | gemini-2.0-flash-lite | 50,000 | 100,000 | 150,000 |
| Google | gemini-2.5-flash-image-preview | 3,000 | 6,000 | 20,000 |
Limitations
The AI Gateway has the following limitations at this time:
- Built-in tool use (a.k.a. server tools: tools that the provider manages and runs on their servers, such as web search) is not currently supported.
  - Note: custom tools (a.k.a. client tools), which you run in your own code, are supported.
- The context window (input prompt) is limited to 200k tokens.
- Prompt caching:
  - Anthropic Claude: only the default 5-minute ephemeral cache duration is supported.
  - OpenAI: the AI Gateway sets a per-account `prompt_cache_key`.
  - Google Gemini: explicit context caching is not supported.
- The AI Gateway does not pass through any request headers (and thus you cannot enable proprietary experimental features via headers).
- Batch inference is not supported.
- Priority processing (an OpenAI feature) is not supported.