How to Deploy Claude on Google Cloud Vertex AI: Setup & Enterprise Patterns

Key Takeaways

Claude 3 and 3.5 models (Haiku, Sonnet, Opus) are available through Vertex AI Model Garden
Authentication uses Google Cloud IAM and Application Default Credentials, with Workload Identity Federation on GKE, so production needs no long-lived API keys
VPC Service Controls create a security perimeter around Vertex AI, blocking exfiltration paths
Vertex AI offers Claude through the Anthropic SDK's Vertex client and through the Vertex AI REST prediction endpoint
Data residency is controlled at the GCP project/region level — EU, US, and APAC regions supported

Google Cloud Vertex AI brought Anthropic's Claude to Google's enterprise AI platform through the Claude on Vertex partnership — a direct result of Google's substantial investment in Anthropic. For organisations running workloads on Google Cloud, this integration means Claude is available as a first-class Vertex AI model: accessible through the same SDK, governed by the same IAM policies, and protected by the same security controls as every other GCP service.

The Claude on Google Cloud Vertex AI deployment path is particularly relevant for enterprises using BigQuery for data, Looker for analytics, or Workspace for productivity — all of which can be connected to Claude through Vertex AI extensions and MCP server integrations. If your data already lives in GCP, keeping inference inside the same platform boundary simplifies your data governance posture considerably.

This guide walks through the complete deployment path: enabling the Claude models in Vertex Model Garden, configuring service accounts and IAM, making API calls via the Vertex SDK, setting up VPC Service Controls, and building production-grade architecture patterns. If you're comparing GCP vs AWS for your Claude deployment, see our complementary guide on Claude on AWS Bedrock.

What Is Vertex AI and How Does Claude Fit In?

Vertex AI is Google Cloud's unified machine learning platform. It spans everything from training custom models to calling pre-trained foundation models via managed inference endpoints. The Model Garden is the catalogue of pre-trained models — including Claude from Anthropic, Gemini from Google, and dozens of open-source models — available as pay-per-use managed APIs.

Claude on Vertex AI works differently from Claude on AWS Bedrock: Vertex uses Google's authentication infrastructure (OAuth 2.0, service accounts, Workload Identity Federation) rather than AWS IAM. For teams already operating in GCP, this is familiar and manageable. For teams trying to support both AWS and GCP, it requires maintaining separate auth configurations for each platform.

From a data privacy standpoint, Google's Customer Data Processing Addendum and Vertex AI terms state that customer data processed through Vertex AI is not used to train Google's foundation models. Anthropic retains responsibility for Claude's model behaviour. Review the Anthropic and Google DPA documents with your legal and compliance teams before deploying regulated data.

Step 1: Enable Claude in Vertex AI Model Garden

Unlike AWS Bedrock, where you request model access through the Bedrock console, Claude on Vertex AI requires enabling the Vertex AI API and accepting Anthropic's terms through the Model Garden interface. The process takes a few minutes for standard accounts.

Prerequisites

A GCP project with billing enabled
The Vertex AI API enabled (aiplatform.googleapis.com)
A GCP account with the roles/aiplatform.user role or higher
The Google Cloud CLI (gcloud) installed and authenticated

Enable the Vertex AI API

# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com \
  --project=your-project-id

# Verify it's enabled
gcloud services list --enabled \
  --filter="aiplatform" \
  --project=your-project-id

Accessing Claude in Model Garden

Navigate to the GCP Console → Vertex AI → Model Garden. Search for "Claude" and you'll find the available Anthropic models. Click on a model and accept Anthropic's usage terms. Acceptance is per-project — if you have multiple GCP projects (dev, staging, prod), you must accept separately in each.

Model availability in Vertex AI follows regional constraints. The us-central1 (Iowa) region has the broadest Claude model coverage. EU-based deployments should use europe-west4 (Netherlands) or europe-west1 (Belgium). Always verify current regional availability in the Model Garden console before finalising architecture.
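
One way to make the region decision explicit is a small lookup keyed by residency requirement. The sketch below is illustrative only: the region names come from this guide, the RESIDENCY_REGIONS table and pick_region helper are hypothetical names, and the list is not a substitute for checking Model Garden.

```python
from typing import Optional

# Hypothetical lookup: residency zone -> candidate regions named in this guide.
# This is a static table, not a live availability check.
RESIDENCY_REGIONS = {
    "US": ["us-central1"],
    "EU": ["europe-west4", "europe-west1"],
    "APAC": ["asia-southeast1", "asia-northeast1"],
}


def pick_region(residency: str, preferred: Optional[str] = None) -> str:
    """Return a region that satisfies the residency requirement.

    Raises ValueError instead of falling back to another geography,
    so a typo cannot silently move inference across borders.
    """
    zone = residency.upper()
    candidates = RESIDENCY_REGIONS.get(zone)
    if candidates is None:
        raise ValueError(f"Unknown residency zone: {residency!r}")
    if preferred is None:
        return candidates[0]
    if preferred not in candidates:
        raise ValueError(f"{preferred!r} is not an approved {zone} region")
    return preferred
```

Failing loudly on an unapproved region is a deliberate choice here: for regulated workloads a startup error is usually cheaper than a misrouted request.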

Step 2: IAM Roles and Authentication

GCP uses service accounts for machine-to-machine authentication — the GCP equivalent of AWS IAM roles. Your application (Cloud Run service, GKE pod, Cloud Function) should run as a service account with the minimum permissions required to call Vertex AI. Never use user credentials or download service account key files for production deployments.

Creating a Service Account

# Create dedicated service account for Claude inference
gcloud iam service-accounts create claude-inference-sa \
  --display-name="Claude Inference Service Account" \
  --project=your-project-id

# Grant Vertex AI User role
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:claude-inference-sa@your-project-id.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

The roles/aiplatform.user role grants permissions to call Vertex AI endpoints. For tighter control, create a custom IAM role that grants only aiplatform.endpoints.predict — the specific permission for making inference calls. This follows least-privilege principles and reduces the blast radius of a compromised service account.
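
A custom role along those lines could be described in a small definition file and then created with gcloud iam roles create. The following is a minimal sketch: the role ID claudeInferenceOnly, the title, and the file name are assumptions made for illustration, and the only permission listed is the one named above.

```python
import json

# Hypothetical role ID and file name, chosen for this example.
ROLE_ID = "claudeInferenceOnly"

role_definition = {
    "title": "Claude Inference Only",
    "description": "Allows prediction calls on Vertex AI and nothing else.",
    "stage": "GA",
    "includedPermissions": ["aiplatform.endpoints.predict"],
}


def write_role_file(path: str = "claude-inference-role.json") -> str:
    """Write the role definition to disk and return the gcloud command to apply it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(role_definition, fh, indent=2)
    return (
        f"gcloud iam roles create {ROLE_ID} "
        f"--project=your-project-id --file={path}"
    )
```

After the role exists, bind it to the service account in place of roles/aiplatform.user, using projects/your-project-id/roles/claudeInferenceOnly as the role name.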

Workload Identity Federation for GKE

For applications running on Google Kubernetes Engine (GKE), use Workload Identity Federation to bind Kubernetes service accounts to GCP service accounts. This eliminates the need for service account key files entirely — pods authenticate automatically using the pod's projected service account token.

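# Enable Workload Identity on GKE cluster
gcloud container clusters update your-cluster \
  --workload-pool=your-project-id.svc.id.goog

# Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
  claude-inference-sa@your-project-id.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:your-project-id.svc.id.goog[namespace/ksa-name]"

# Annotate the KSA so its pods know which GSA to impersonate
kubectl annotate serviceaccount ksa-name \
  --namespace namespace \
  iam.gke.io/gcp-service-account=claude-inference-sa@your-project-id.iam.gserviceaccount.com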

Step 3: Calling Claude via the Vertex AI SDK

Claude on Vertex AI is exposed through a REST prediction endpoint, and the anthropic Python package has native Vertex AI support (install it with the vertex extra: pip install "anthropic[vertex]"). If you're already using the Anthropic SDK, you can switch to Vertex by changing the client initialisation without rewriting your prompt logic.

Using the Anthropic SDK (Vertex Mode)

import anthropic

# Vertex AI client — uses Application Default Credentials automatically
client = anthropic.AnthropicVertex(
    project_id="your-project-id",
    region="us-central1"
)

message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Analyse the credit risk in this term sheet..."
        }
    ]
)

print(message.content[0].text)

The AnthropicVertex client authenticates using Application Default Credentials (ADC) — it automatically uses the service account attached to your Cloud Run service, GKE pod, or Compute Engine instance. In local development, it uses your gcloud auth application-default login credentials.

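Calling the Vertex AI REST Endpoint Directly

Claude is served through the publisher-model rawPredict endpoint rather than the vertexai GenerativeModel class, which targets Gemini models. If you prefer not to depend on the Anthropic SDK, call the endpoint with the google-auth library:

import google.auth
from google.auth.transport.requests import AuthorizedSession

PROJECT_ID = "your-project-id"
REGION = "us-central1"
MODEL = "claude-3-5-sonnet-v2@20241022"

# Application Default Credentials, scoped for Google Cloud APIs
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION}/publishers/anthropic/models/{MODEL}:rawPredict"
)

response = session.post(url, json={
    "anthropic_version": "vertex-2023-10-16",  # required by the Vertex endpoint
    "max_tokens": 1024,
    "temperature": 0.2,
    "messages": [
        {"role": "user", "content": "Summarise the key provisions in this employment agreement..."}
    ],
})
response.raise_for_status()

print(response.json()["content"][0]["text"])

Streaming with Vertex AI

import anthropic

client = anthropic.AnthropicVertex(project_id="your-project-id", region="us-central1")

# messages.stream() yields text deltas as they arrive
with client.messages.stream(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Draft an executive summary of the Q4 earnings results..."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)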

Step 4: VPC Service Controls for Data Residency

VPC Service Controls fills a role comparable to AWS PrivateLink for API protection, but it works differently. Rather than routing traffic through private endpoints, VPC Service Controls creates a security perimeter around GCP resources. Access to Vertex AI from outside the perimeter is blocked, and data inside the perimeter cannot be copied to resources outside it.

For enterprise Claude governance, VPC Service Controls enables two critical policies: ensuring Claude inference only happens from approved network contexts, and preventing credentials or model outputs from being accessed outside your organisation's boundary.

Creating a Service Perimeter

# Create access policy (org-level, done once)
gcloud access-context-manager policies create \
  --organization=org-id \
  --title="Enterprise AI Perimeter Policy"

# Create service perimeter including Vertex AI
# Note: --resources takes the project number, not the project ID
gcloud access-context-manager perimeters create claude-perimeter \
  --policy=policy-id \
  --title="Claude Inference Perimeter" \
  --resources="projects/your-project-number" \
  --restricted-services="aiplatform.googleapis.com" \
  --access-levels="accessPolicies/policy-id/accessLevels/corporate-network"

Once the perimeter is in place, Vertex AI calls can only succeed from contexts that match the access level — typically your corporate network (via Cloud VPN or Cloud Interconnect) or from within GCP using authorised service accounts. This creates a verifiable, auditable boundary around every Claude inference call.
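
The corporate-network access level referenced by the perimeter has to be defined separately, typically from a basic level spec file listing the allowed IP ranges. The sketch below renders such a spec; the basic_level_spec helper is a hypothetical name, and the CIDR range used in the usage note is a documentation range standing in for your real egress addresses.

```python
import ipaddress
from typing import List


def basic_level_spec(cidrs: List[str]) -> str:
    """Render a basic access level spec (YAML) that allows the given CIDR ranges."""
    lines = ["- ipSubnetworks:"]
    for cidr in cidrs:
        # ip_network raises ValueError for malformed ranges or ranges with host bits set
        network = ipaddress.ip_network(cidr)
        lines.append(f"  - {network}")
    return "\n".join(lines) + "\n"
```

Writing the output of basic_level_spec(["203.0.113.0/24"]) to a file such as corporate-network.yaml and passing it to gcloud access-context-manager levels create corporate-network with --basic-level-spec and --policy would create the level. Validating the ranges before they reach gcloud catches typos that would otherwise widen or break the perimeter.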

Step 5: Vertex AI Extensions — Claude with Tools and Data

Vertex AI Extensions is Google's native agentic framework for adding tool use and data connections to Claude. Extensions allow Claude to call REST APIs, query BigQuery datasets, search through Cloud Storage documents, and interact with Google Workspace. For enterprises deeply embedded in the Google ecosystem, this is the fastest path to an agentic Claude deployment.

Connecting Claude to BigQuery

The BigQuery extension lets Claude run SQL queries against your data warehouse in response to natural language questions. A financial analyst can ask "what was our Q4 revenue by product line?" and Claude translates it to SQL, executes the query, and returns a natural language answer with the underlying figures. This is a pattern we deploy frequently for financial services clients building internal analytics assistants.

For teams needing more control over the tool use architecture — custom tool definitions, complex multi-step workflows, external API integrations — the Claude tool use guide covers building tool use directly in the Anthropic API, which works on Vertex AI with the AnthropicVertex client.
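
As a rough illustration of the custom route, a tool is a JSON schema passed in the tools parameter, plus application code that executes the tool_use blocks Claude returns. In this sketch the tool name, its schema, and the stubbed handler are all hypothetical, and content blocks are shown as plain dicts for brevity (the SDK returns objects exposing the same fields as attributes).

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool definition in the Messages API tool format.
REVENUE_TOOL = {
    "name": "get_quarterly_revenue",
    "description": "Return revenue by product line for a fiscal quarter.",
    "input_schema": {
        "type": "object",
        "properties": {
            "quarter": {"type": "string", "description": "Quarter label, e.g. 2024-Q4"},
        },
        "required": ["quarter"],
    },
}


def get_quarterly_revenue(quarter: str) -> Dict[str, Any]:
    # Stub: a real implementation would query the data warehouse here.
    return {"quarter": quarter, "rows": []}


# Maps tool names to the Python callables that implement them.
HANDLERS: Dict[str, Callable[..., Any]] = {
    "get_quarterly_revenue": get_quarterly_revenue,
}


def run_tool(block: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one tool_use content block into the matching tool_result block."""
    result = HANDLERS[block["name"]](**block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": json.dumps(result),
    }
```

In use, tools=[REVENUE_TOOL] is passed to client.messages.create; when the response stops with stop_reason "tool_use", the tool_result blocks go back to Claude in a follow-up user message so it can write the final answer.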

Vertex AI Pricing for Claude Models

Vertex AI pricing for Claude follows the same token-based structure as Bedrock and the direct API. As of early 2026, indicative on-demand pricing for us-central1 is approximately:

Model	Input (per 1M tokens)	Output (per 1M tokens)
Claude 3.5 Haiku	$0.80	$4.00
Claude 3.5 Sonnet	$3.00	$15.00
Claude 3 Opus	$15.00	$75.00

Pricing parity with Bedrock is intentional — both platforms price at or near the same level as the direct Anthropic API, with minor premiums reflecting managed infrastructure costs. Google Cloud Committed Use Discounts (CUDs) do not apply to Vertex AI foundation model inference, but Google does offer negotiated enterprise pricing for large-scale deployments. If you're projecting more than $50,000/month in Claude inference costs on Vertex, contact Google Cloud sales directly for custom terms.
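
To see how these rates translate into a monthly bill, the arithmetic can be wrapped in a small estimator. The prices below are copied from the indicative table above; the estimate_cost helper and the workload figures in the usage note are illustrative assumptions.

```python
# Indicative USD prices per 1M tokens (input, output), copied from the table above.
PRICES_PER_MILLION = {
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at on-demand rates."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, one million requests a month averaging 2,000 input and 500 output tokens on Claude 3.5 Sonnet comes to 2 billion input tokens ($6,000) plus 500 million output tokens ($7,500), or estimate_cost("Claude 3.5 Sonnet", 2_000_000_000, 500_000_000) = $13,500.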

Production Architecture on Vertex AI

The Vertex AI production stack for Claude follows GCP-native patterns. Your choice of compute layer depends on traffic profile and latency requirements.

Cloud Run for Stateless Inference Services

Cloud Run is the most common deployment pattern for Claude-powered APIs on GCP. It provides auto-scaling from zero, per-request billing, and clean service account IAM. A Cloud Run service calling Vertex AI is a fully serverless architecture — no infrastructure management, no capacity planning, and cost proportional to actual usage.
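
A stateless Cloud Run service of this kind can be very small. The following sketch uses only the standard library for the HTTP layer and creates the Vertex client lazily; the environment variable names GCP_PROJECT_ID, GCP_REGION, and CLAUDE_MODEL, and the {"prompt": ...} request shape, are assumptions made for this example rather than any convention.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

_client = None  # created on first request and reused afterwards


def get_client():
    """Create the Vertex client lazily; ADC supplies the service account identity."""
    global _client
    if _client is None:
        import anthropic  # requires the anthropic[vertex] extra

        _client = anthropic.AnthropicVertex(
            project_id=os.environ["GCP_PROJECT_ID"],  # hypothetical variable name
            region=os.environ.get("GCP_REGION", "us-central1"),
        )
    return _client


def build_request(payload):
    """Validate the incoming JSON body and build Messages API arguments."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("'prompt' must be a non-empty string")
    return {
        "model": os.environ.get("CLAUDE_MODEL", "claude-3-5-sonnet-v2@20241022"),
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            args = build_request(json.loads(self.rfile.read(length)))
        except ValueError as exc:  # also covers malformed JSON
            self._reply(400, {"error": str(exc)})
            return
        message = get_client().messages.create(**args)
        self._reply(200, {"text": message.content[0].text})

    def _reply(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def main():
    # Cloud Run tells the container which port to listen on via PORT.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling main() from the container entrypoint starts the server. A production service would more likely sit behind a proper web framework with timeouts and structured logging; the point here is only that the service holds no state beyond a reusable client.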

GKE for High-Throughput Applications

For applications requiring fine-grained control over concurrency, connection pooling, and request routing, a Kubernetes deployment on GKE is the right platform. Use Workload Identity for authentication, configure horizontal pod autoscaling based on request queue depth, and implement circuit breakers for Vertex AI throttling events.
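
A circuit breaker for throttling events can be as simple as a failure counter with a cooldown. This is a minimal single-threaded sketch, not a library API: the CircuitBreaker class, its thresholds, and the injectable clock are all choices made for illustration.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow trial requests after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the behaviour can be tested without waiting
        self._failures = 0
        self._opened_at = None

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True  # circuit closed: traffic flows normally
        # Once the cooldown has elapsed, let requests through again (half-open).
        return self._clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cooldown.
            self._opened_at = self._clock()
```

A caller checks allow_request() before each Vertex AI call, fails fast or serves a fallback when it returns False, and reports the outcome with record_success() or record_failure(). Shedding load this way keeps a throttled project from being hammered by retries from every pod at once.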

Dataflow for Batch Processing

Google Cloud Dataflow (Apache Beam) is the GCP-native batch processing framework. For large-scale document processing — classifying 10 million records, extracting data from 50,000 contracts, analysing a year of customer feedback — Dataflow pipelines calling Vertex AI Batch Prediction deliver results efficiently with automatic parallelism and worker scaling.

Integrating with BigQuery ML

BigQuery ML can call Vertex AI models directly from SQL using the ML.GENERATE_TEXT function, once a remote model has been created over a Cloud resource connection that points at the Claude endpoint. This means analysts without Python skills can run Claude inference in BigQuery SQL, making Claude accessible across your data analytics organisation without building custom APIs.

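-- Call Claude from BigQuery SQL through a remote model.
-- ML.GENERATE_TEXT is a table-valued function: it takes the model,
-- a query that produces a `prompt` column, and a STRUCT of options.
SELECT
  contract_id,
  ml_generate_text_llm_result AS extracted_terms
FROM
  ML.GENERATE_TEXT(
    MODEL `your_project.your_dataset.claude_model`,
    (
      SELECT
        contract_id,
        CONCAT('Extract payment terms from this contract: ', contract_text) AS prompt
      FROM
        `your_project.your_dataset.contracts`
      WHERE
        processed = FALSE
    ),
    STRUCT(1024 AS max_output_tokens, TRUE AS flatten_json_output)
  )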

This pattern is particularly powerful for legal teams and accounting teams where the analysts live in BigQuery and BI tools — not in Python notebooks. Our Claude enterprise implementation service includes a BigQuery ML integration module for these teams.

Frequently Asked Questions

Does Google use my Vertex AI prompts to train models?

Google's Vertex AI terms state that customer prompts and responses are not used to train Google's foundation models. Anthropic's model is served by Google's infrastructure but trained and maintained by Anthropic independently. Review Google's DPA and Anthropic's terms for details specific to your compliance requirements (GDPR, HIPAA, SOC 2).

Which GCP regions support Claude on Vertex AI?

us-central1 has the broadest model coverage. EU support is available in europe-west4 and europe-west1. APAC coverage includes asia-southeast1 (Singapore) and asia-northeast1 (Tokyo). Regional availability changes as Google and Anthropic expand capacity — always verify in the Vertex AI console before deploying.

Can I use the same prompts on Vertex AI and the direct Anthropic API?

Yes. The model is identical — the same weights, same capabilities, same safety training. The only difference is the API surface. The AnthropicVertex client uses the same messages.create format, so migrating existing prompts from the Anthropic API to Vertex is typically a one-line change in client initialisation.
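
One way to keep that switch in a single place is a small factory that picks the client from configuration. This is a sketch under assumptions: the environment variable names CLAUDE_BACKEND, GCP_PROJECT_ID, and GCP_REGION are hypothetical, and the SDK import is deferred so the selection logic can be exercised without the package installed.

```python
import os
from typing import Any, Dict, Mapping, Tuple


def client_config(env: Mapping[str, str]) -> Tuple[str, Dict[str, Any]]:
    """Pick the SDK client class name and constructor kwargs from the environment."""
    if env.get("CLAUDE_BACKEND", "vertex") == "vertex":
        return "AnthropicVertex", {
            "project_id": env["GCP_PROJECT_ID"],
            "region": env.get("GCP_REGION", "us-central1"),
        }
    # Direct API: the SDK reads ANTHROPIC_API_KEY from the environment itself.
    return "Anthropic", {}


def make_client(env: Mapping[str, str] = os.environ):
    """Instantiate the selected client; everything after this call is shared code."""
    import anthropic  # deferred import keeps client_config testable on its own

    class_name, kwargs = client_config(env)
    return getattr(anthropic, class_name)(**kwargs)
```

Both clients expose messages.create, so prompt code downstream of make_client() does not change. Model identifiers do differ between the platforms, so the model name is best kept in configuration alongside the backend choice.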

How do I handle Vertex AI quota and throttling?

Vertex AI enforces per-project, per-region rate limits on model predictions. Default limits are visible in the Google Cloud console under IAM & Admin → Quotas, filtered by "Vertex AI API." Request quota increases through the console for production workloads. Implement exponential backoff and retry logic in your application code to handle RESOURCE_EXHAUSTED errors gracefully.
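
The retry logic might look like the following sketch of exponential backoff with full jitter. The ResourceExhausted class is a stand-in for whatever your client library raises on a 429 (with the Anthropic SDK that would be anthropic.RateLimitError), and the attempt count and delays are illustrative defaults rather than recommended values.

```python
import random
import time


class ResourceExhausted(Exception):
    """Stand-in for the 429 / RESOURCE_EXHAUSTED error raised by your client library."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0, sleep=time.sleep):
    """Call fn(), retrying on ResourceExhausted with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay ceiling doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the ceiling to avoid
            # synchronised retries across workers.
            sleep(random.uniform(0, ceiling))
```

The Anthropic SDK also retries some failures on its own (see its max_retries client option), so an outer wrapper like this is mainly useful for REST callers or for applying one policy across several clients.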

Is Claude available in Vertex AI Studio for testing?

Yes — Vertex AI Studio provides a browser-based playground for testing Claude prompts without writing code. It's useful for rapid prompt iteration, parameter tuning, and demonstrating capabilities to stakeholders. Studio calls are billed at standard token rates and count against your project quotas.

ClaudeImplementations Team

Claude Certified Architects with 50+ enterprise deployments across financial services, legal, healthcare, and manufacturing.