
Cloud Platforms

AWS · Azure · GCP — Conceptual Reference

Guide #23 12 Sections Multi-Cloud Comparison
Section I

Cheat Sheet

Essential CLI commands and core concepts for each cloud provider at a glance. Authenticate, list resources, and deploy—side by side.

Essential CLI Commands

AWS CLI
# authenticate
aws configure
# list S3 buckets
aws s3 ls
# list EC2 instances
aws ec2 describe-instances
# list Lambda functions
aws lambda list-functions
# list EKS clusters
aws eks list-clusters
Azure CLI
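# authenticate
az login
# list blobs in a container (account and container names are placeholders)
az storage blob list --account-name <storage-account> --container-name <container>
# list virtual machines
az vm list
# list function apps
az functionapp list
# list AKS clusters
az aks list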
GCP CLI
# authenticate
gcloud auth login
# list Cloud Storage
gsutil ls
# list Compute Engine
gcloud compute instances list
# list Cloud Functions
gcloud functions list
# list GKE clusters
gcloud container clusters list

Key Concepts

Concept | AWS | Azure | GCP
Geography | Regions → Availability Zones | Regions → Availability Zones | Regions → Zones
Identity | IAM Users & Roles | Entra ID (Azure AD) | Cloud IAM
IaC (Native) | CloudFormation | Bicep / ARM Templates | Terraform (preferred)
Account Hierarchy | Organization → OUs → Accounts | Management Groups → Subscriptions → Resource Groups | Organization → Folders → Projects
CLI Tool | aws | az | gcloud / gsutil
Console | AWS Management Console | Azure Portal | Google Cloud Console
Pro Tip
All three providers offer a free tier for new accounts. AWS gives 12 months of select services, Azure provides $200 in credits for 30 days plus 12 months of popular services, and GCP offers $300 in credits for 90 days plus an always-free tier.
Section II

The Big Three

AWS, Azure, and GCP dominate the cloud market with distinct strengths. Understanding their positioning helps you choose the right platform—or the right combination.

Cloud Market Share (2025)

AWS Market Share: 31–32%
Azure Market Share: 23–28%
GCP Market Share: 11–13%
Total Cloud Market (2025): $600B+

Provider Profiles

Amazon Web Services
250+
Services Available

The pioneer and market leader. Launched in 2006, AWS offers the broadest service catalog and the most mature ecosystem. With more than 30 regions and more than 100 Availability Zones worldwide, it provides extensive geographic coverage.

Strongest for: Startups, maximum service breadth, mature ecosystem, largest community & talent pool.
Microsoft Azure
60+
Regions Worldwide

The enterprise choice. Deep integration with Microsoft 365, Active Directory (Entra ID), and the .NET ecosystem. Azure Arc extends management to hybrid and multi-cloud. Fastest growing of the Big Three.

Strongest for: Enterprise Microsoft shops, hybrid cloud, .NET workloads, government & compliance.
Google Cloud Platform
40+
Regions Worldwide

The data and AI leader. Built on the infrastructure Google developed for its own services (Borg, Spanner, Dremel). Google created Kubernetes, and GKE is widely regarded as the gold standard. Vertex AI and TPUs give it an edge in machine learning workloads.

Strongest for: Data analytics, ML/AI workloads, Kubernetes-native architectures, BigQuery analytics.

Key Metrics Comparison

Metric | AWS | Azure | GCP
Launched | 2006 | 2010 | 2008
Market Share | 31–32% | 23–28% | 11–13%
Regions | 30+ regions, 100+ AZs | 60+ regions | 40+ regions, 121 zones
Service Count | 250+ | 200+ | 150+
Revenue (Annual) | ~$100B | ~$96B (est.) | ~$41B
Growth Rate | ~17% YoY | ~29% YoY | ~26% YoY
Key Differentiator | Breadth & maturity | Enterprise integration | Data & AI leadership
Certifications | Solutions Architect, DevOps, Specialty | AZ-900, AZ-104, AZ-305, AI-900 | Cloud Engineer, Architect, ML Engineer
Market Context
Azure’s market share range (23–28%) depends on the analyst: Gartner, Synergy Research, and Canalys use different methodologies, and some reports fold non-cloud revenue into Azure’s figure. The trend matters more than the exact number: Azure is growing fastest, GCP is catching up, and AWS retains the largest absolute share.
Section III

Compute

Virtual machines, serverless functions, containers, and PaaS offerings—compared across all three platforms. The biggest decision is not which provider, but which compute model fits your workload.

Compute Services at a Glance

Category | AWS | Azure | GCP
Virtual Machines | EC2 | Virtual Machines | Compute Engine
Serverless Functions | Lambda | Functions | Cloud Functions
Managed Kubernetes | EKS | AKS | GKE
Serverless Containers | Fargate | Container Apps | Cloud Run
PaaS | Elastic Beanstalk | App Service | App Engine
Batch Computing | AWS Batch | Azure Batch | Cloud Batch

Virtual Machines

VMs remain the backbone of cloud computing. All three providers offer per-second billing, a wide range of instance types, and spot/preemptible pricing for significant savings.

EC2 (AWS)

The original cloud VM. 750+ instance types across families (general, compute, memory, GPU, storage-optimized). Spot instances save up to 90%. Graviton (ARM) processors offer best price-performance.

Virtual Machines (Azure)

Tight Windows Server integration with Azure Hybrid Benefit (reuse on-prem licenses). Spot VMs available. Confidential computing VMs for sensitive workloads. Familiar to Windows admins.

Compute Engine (GCP)

Custom machine types (choose exact vCPU/RAM). Live migration means zero-downtime maintenance. Sustained-use discounts applied automatically. Preemptible & Spot VMs save up to 91%.

Serverless Functions

Event-driven compute with zero server management. Pay only for execution time. Ideal for APIs, event processing, and glue code between services.

Lambda (AWS)

The serverless pioneer. Up to 15-minute timeout, 10 GB memory. Lambda@Edge for CDN-based compute. Supports container images up to 10 GB. 1 million free requests/month.

Functions (Azure)

Durable Functions for stateful workflows. Consumption, Premium, and Dedicated plans. Premium plan eliminates cold starts. Native bindings to Azure services (Cosmos DB, Event Grid, Service Bus).

Cloud Functions (GCP)

2nd gen built on Cloud Run (same infrastructure). Up to 60-minute timeout, 32 GB memory. Eventarc integration for unified eventing. Concurrency support reduces cold starts.

Managed Kubernetes

Kubernetes is the industry standard for container orchestration. The managed offerings differ significantly in pricing, features, and developer experience.

Feature | EKS | AKS | GKE
Control Plane Cost | $0.10/hr (~$73/mo) | Free (Free tier) | $0.10/hr per cluster (free-tier credit offsets one zonal or Autopilot cluster)
Max Nodes | Up to 5,000 | Up to 5,000 | Up to 15,000
Cluster Startup | ~10 minutes | ~5–8 minutes | ~3–5 minutes
Autopilot Mode | No (Fargate for serverless) | No (uses node pools) | Yes (Autopilot: full node management)
Key Advantage | Deep AWS integration | Free control plane, Entra ID | K8s creator, fastest upgrades
Recommendation
GKE is widely considered the best managed Kubernetes experience. Google created Kubernetes, and GKE typically gets new K8s features first. AKS’s free control plane makes it compelling for cost-conscious teams. EKS excels when your infrastructure is already deeply integrated with AWS services.
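
To make the control-plane price difference concrete, here is a minimal sketch that converts an hourly fee into a monthly figure. The 730-hour month and the function name are assumptions made for illustration; the $0.10/hr and free rates are the ones quoted in the table above. Node costs are deliberately excluded.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for estimates


def control_plane_monthly_cost(hourly_rate: float, clusters: int = 1) -> float:
    """Monthly control-plane fee in dollars for a number of clusters."""
    return round(hourly_rate * HOURS_PER_MONTH * clusters, 2)


eks_cost = control_plane_monthly_cost(0.10)        # one cluster at $0.10/hr
aks_cost = control_plane_monthly_cost(0.0)         # free control plane
fleet_cost = control_plane_monthly_cost(0.10, 10)  # ten clusters at $0.10/hr
print(eks_cost, aks_cost, fleet_cost)
```

The fee is small for a single cluster but scales linearly, so it starts to matter for teams that run one cluster per environment or per tenant.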

Serverless Containers

Run containers without managing clusters. These services abstract away the orchestration layer, offering a middle ground between functions-as-a-service and full Kubernetes.

Fargate (AWS)

Serverless compute engine for ECS and EKS. No cluster management. Per-vCPU and per-GB memory pricing. Works with existing ECS task definitions. Supports both Linux and Windows containers.

Container Apps (Azure)

Built on Kubernetes (KEDA + Envoy). Auto-scaling including scale-to-zero. Dapr integration for microservices. Simpler than AKS while more capable than Functions. Revision-based deployments.

Cloud Run (GCP)

The developer favorite. Deploy any container with a single command. Scale to zero, pay per 100ms. Request-based autoscaling. Supports gRPC, WebSockets, and HTTP/2. GPU support for AI inference.

Platform-as-a-Service

Fully managed application hosting. Push code, get a URL. Best for teams that want to focus on application logic rather than infrastructure.

Elastic Beanstalk (AWS)

Deploys to EC2, supports Docker, Java, .NET, Node.js, Python, Ruby, Go, PHP. Environment cloning for staging. Rolling and blue/green deployments. Lowest adoption of the three PaaS options.

App Service (Azure)

The strongest PaaS offering. Built-in CI/CD, custom domains, SSL, authentication. Deployment slots for zero-downtime releases. WebJobs for background tasks. Excellent .NET and Windows support.

App Engine (GCP)

One of the earliest PaaS offerings (launched 2008). Standard environment for auto-managed runtimes, Flexible for custom Docker containers. Traffic splitting for A/B testing. Firewall rules built in.

Compute Decision Matrix
Use VMs when you need full OS control, specific hardware, or legacy workloads. Use serverless functions for event-driven, short-lived tasks. Use managed K8s for complex microservices at scale. Use serverless containers when you want container flexibility without cluster management. Use PaaS for straightforward web apps where you just want to push code.
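
The decision matrix above can be sketched as a small helper. The function name, the flag names, and the order in which the questions are asked are assumptions for illustration; real workloads often satisfy several conditions at once and need judgment rather than a lookup.

```python
def choose_compute_model(needs_os_control: bool = False,
                         event_driven: bool = False,
                         complex_microservices: bool = False,
                         containerized: bool = False) -> str:
    """Return a compute model by checking the matrix conditions in order."""
    if needs_os_control:
        return "VMs"  # full OS control, specific hardware, legacy workloads
    if event_driven:
        return "serverless functions"  # event-driven, short-lived tasks
    if complex_microservices:
        return "managed Kubernetes"  # complex microservices at scale
    if containerized:
        return "serverless containers"  # containers without cluster management
    return "PaaS"  # straightforward web apps: push code, get a URL


print(choose_compute_model(containerized=True))
```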
Section IV

Storage & Databases

Object stores, block volumes, managed databases, and data warehouses—the persistence layer that underpins every cloud application. Each provider has a killer service here.

Storage Services at a Glance

Category | AWS | Azure | GCP
Object Storage | S3 | Blob Storage | Cloud Storage
Block Storage | EBS | Managed Disks | Persistent Disk
File Storage | EFS | Azure Files | Filestore
Archive | S3 Glacier | Archive Storage | Archive Storage

Object Storage Deep Dive

Amazon S3

The gold standard. 11 nines (99.999999999%) durability. Virtually unlimited storage capacity. Multiple storage classes: Standard, Intelligent-Tiering, Glacier for archival. Free tier includes 5 GB.

# upload to S3
aws s3 cp file.zip s3://my-bucket/
# sync a directory
aws s3 sync ./data s3://my-bucket/data

Azure Blob Storage

Four access tiers for cost optimization: Hot, Cool, Cold, and Archive. Cold tier at 0.36¢/GB/mo and Archive at 0.099¢/GB/mo make it excellent for tiered data strategies.

# upload to blob
az storage blob upload \
  --file data.csv \
  --container-name mycontainer
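
As a rough illustration of what tiering means for a bill, the sketch below prices a dataset using only the Cold and Archive rates quoted above. The function name and the sample volumes are hypothetical, and the estimate covers at-rest storage only; retrieval, transaction, and early-deletion charges are not modeled.

```python
# Rates quoted above, in cents per GB per month.
PRICE_CENTS_PER_GB_MONTH = {"cold": 0.36, "archive": 0.099}


def monthly_storage_cost(gb_by_tier: dict) -> float:
    """At-rest storage cost in dollars for the given GB stored per tier."""
    cents = sum(PRICE_CENTS_PER_GB_MONTH[tier] * gb
                for tier, gb in gb_by_tier.items())
    return round(cents / 100, 2)


# Hypothetical dataset: 10 TB in Cold, 100 TB in Archive.
cost = monthly_storage_cost({"cold": 10_000, "archive": 100_000})
print(cost)
```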

Cloud Storage

Unified API across all storage classes. Per-second billing for fine-grained cost control. Multi-regional option for highest availability. Tight integration with BigQuery for analytics.

# upload to Cloud Storage
gsutil cp file.zip gs://my-bucket/
# rsync a directory
gsutil rsync -r ./data gs://my-bucket/data

Database Services

Category | AWS | Azure | GCP
Relational (managed) | RDS | Azure SQL Database | Cloud SQL
Proprietary Relational | Aurora | - | AlloyDB
Global Relational | Aurora Global | Azure SQL geo-replication | Cloud Spanner
NoSQL Document | DynamoDB | Cosmos DB | Firestore
Wide-Column | - | Cosmos DB (Cassandra) | Cloud Bigtable
In-Memory Cache | ElastiCache | Cache for Redis | Memorystore
Data Warehouse | Redshift | Synapse Analytics | BigQuery

Key Database Differentiators

Cloud Spanner: Globally distributed, strongly consistent relational database—unique in the market. Combines the structure of SQL with the horizontal scalability of NoSQL. No other provider offers a true equivalent.
Cosmos DB: Multi-model database supporting document, key-value, graph, and column-family data models. Multiple APIs (SQL, MongoDB, Cassandra, Gremlin, Table). Turnkey global distribution with five consistency levels.
BigQuery: Serverless data warehouse built on Google’s Dremel technology. Petabyte-scale analytics with no infrastructure to manage. Run ML models directly in SQL with BigQuery ML. Separate compute and storage billing.
Aurora Serverless v2: Auto-scales capacity in fine-grained increments. Pay only for the compute you use. Scales to hundreds of thousands of transactions per second. Compatible with MySQL and PostgreSQL.
DynamoDB: Single-digit millisecond latency at any scale. Fully serverless—no capacity planning required. On-demand and provisioned capacity modes. Global tables for multi-region replication.
GCP’s Killer Service
BigQuery is arguably GCP’s single most compelling service. Its serverless architecture, separation of storage and compute, and ability to query petabytes in seconds make it the go-to choice for analytics workloads. Many organizations use GCP exclusively for BigQuery while running everything else on AWS or Azure.
Section V

Networking

Virtual networks, load balancers, CDNs, and dedicated connections. GCP’s global VPC is a fundamental architectural difference that changes how you design multi-region deployments.

Networking Services at a Glance

Category | AWS | Azure | GCP
Virtual Network | VPC | VNet | VPC (global)
DNS | Route 53 | Azure DNS | Cloud DNS
CDN | CloudFront | Front Door / CDN | Cloud CDN
Load Balancer | ALB / NLB | Application Gateway / LB | Cloud Load Balancing
API Gateway | API Gateway | API Management | API Gateway
DDoS Protection | Shield | DDoS Protection | Cloud Armor
Private Connect | PrivateLink | Private Link | Private Service Connect
Dedicated Line | Direct Connect | ExpressRoute | Cloud Interconnect

The Global VPC Difference

GCP’s VPC is global. A single VPC spans all regions automatically, and subnets are regional (not zonal). This means resources in different regions can communicate over internal IPs without peering, VPN tunnels, or transit gateways. AWS and Azure VPCs/VNets are regional—cross-region communication requires explicit connectivity setup.
Architectural Impact
This is not just a convenience feature. GCP’s global VPC fundamentally simplifies multi-region architectures. No VPC peering, no transit gateways, no cross-region routing tables. You define one VPC, create subnets in whatever regions you need, and everything just routes internally.

Global Infrastructure

AWS
30+ Regions
100+ Availability Zones

A large, mature footprint. Multiple AZs per region for high availability. Local Zones and Wavelength for edge computing. Outposts for on-premises AWS infrastructure.

Microsoft Azure
60+ Regions
Broadest Global Coverage

More regions than any other provider. Paired regions for built-in disaster recovery. Availability Zones in major regions. Government and sovereign cloud offerings.

Google Cloud
40+ Regions
Global VPC Architecture

Premium tier uses Google’s private backbone network for lowest latency. Global load balancing with a single anycast IP. Subsea cables connecting continents.

Network Topology Patterns

Hub-Spoke (Azure)

Azure’s recommended pattern. A central hub VNet hosts shared services (firewalls, DNS, VPN gateways). Spoke VNets peer to the hub. Azure Virtual WAN automates this at scale. Clean separation of concerns with centralized security inspection.

Transit Gateway (AWS)

AWS Transit Gateway acts as a cloud router connecting VPCs, VPN, and Direct Connect. Supports thousands of VPCs with centralized routing. Route tables enable network segmentation. Inter-region peering for global connectivity.

Shared VPC (GCP)

A host project owns the VPC; service projects use its subnets. Centralizes network administration while keeping workloads separated. Combined with the global VPC, this eliminates most cross-region complexity found in AWS and Azure.
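
One way to see why these patterns exist is to count the connections each needs. The sketch below assumes peering is non-transitive, so a full mesh needs a link for every pair of networks; the function names and the six-network example are illustrative only.

```python
def full_mesh_peerings(networks: int) -> int:
    """Peering links needed so every network reaches every other directly."""
    return networks * (networks - 1) // 2


def hub_spoke_peerings(spokes: int) -> int:
    """Peering links when each spoke peers only with a central hub."""
    return spokes


# Six regional networks: full mesh versus one hub with five spokes.
mesh = full_mesh_peerings(6)
hub = hub_spoke_peerings(5)
print(mesh, hub)
```

A single global VPC with regional subnets needs no peering links for the same six regions, which is the simplification the global VPC section describes.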

Network Cost Trap
Data egress (outbound transfer) is where cloud networking costs catch people off guard. All three providers charge for data leaving their network, typically $0.08–$0.12/GB. Data transfer between regions within the same provider also incurs charges. Always model your egress costs before committing to a multi-region architecture.
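
A back-of-the-envelope egress model can be sketched as follows. It assumes a flat per-GB rate across the quoted $0.08 to $0.12 range; real price lists are tiered by volume and destination, so treat the output as a bracket rather than a quote. The function name and the 50 TB example are hypothetical.

```python
def egress_cost_range(gb_per_month: float,
                      low_rate: float = 0.08,
                      high_rate: float = 0.12) -> tuple:
    """Return (low, high) monthly egress cost in dollars at flat per-GB rates."""
    return (round(gb_per_month * low_rate, 2),
            round(gb_per_month * high_rate, 2))


# Hypothetical workload: 50 TB leaving the provider network each month.
low, high = egress_cost_range(50_000)
print(low, high)
```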
Section VI

Identity & Security

IAM models, key management, threat detection, and compliance. Azure’s enterprise identity dominance is the clearest competitive advantage any provider holds in any category.

Security Services at a Glance

Category | AWS | Azure | GCP
Identity Service | IAM | Entra ID (Azure AD) | Cloud IAM
Key Management | KMS | Key Vault | Cloud KMS
Secret Management | Secrets Manager | Key Vault | Secret Manager
WAF | AWS WAF | Azure WAF | Cloud Armor
SIEM | - | Microsoft Sentinel | Chronicle
Threat Detection | GuardDuty | Defender for Cloud | Security Command Center

IAM Model Differences

Each provider takes a fundamentally different approach to identity and access management. Understanding these models is critical because IAM misconfigurations are among the most common causes of cloud security breaches.

AWS IAM

Users, Groups, Roles, and Policies in a flat (non-hierarchical) model. Explicit deny always wins. Supports attribute-based access control (ABAC) with resource tags for fine-grained permissions. The most granular policy language of the three.

# list IAM users
aws iam list-users
# attach policy to role
aws iam attach-role-policy \
  --role-name MyRole \
  --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
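
The "explicit deny always wins" rule can be illustrated with a toy evaluator. This is a deliberately simplified sketch, not the actual AWS evaluation logic: statements are reduced to hypothetical (effect, action) pairs, and resources, conditions, permission boundaries, and SCPs are ignored.

```python
def evaluate(statements, action: str) -> str:
    """Evaluate (effect, action) pairs: deny beats allow, default is deny."""
    decision = "ImplicitDeny"  # nothing matched yet, so access is denied
    for effect, stmt_action in statements:
        if stmt_action not in (action, "*"):
            continue  # statement does not apply to this action
        if effect == "Deny":
            return "ExplicitDeny"  # a matching deny overrides any allow
        if effect == "Allow":
            decision = "Allow"
    return decision


policy = [("Allow", "*"), ("Deny", "s3:DeleteBucket")]
print(evaluate(policy, "s3:GetObject"), evaluate(policy, "s3:DeleteBucket"))
```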

Azure Entra ID

Hierarchical model with Management Groups, Subscriptions, Resource Groups, and Resources. Native Active Directory integration. RBAC with built-in and custom roles. The strongest enterprise identity solution—Windows, Office 365, and GitHub all tie in natively.

# list role assignments
az role assignment list \
  --assignee user@domain.com
# assign a role
az role assignment create \
  --role "Reader" \
  --assignee user@domain.com

GCP Cloud IAM

Cleanest hierarchy: Organization → Folders → Projects → Resources. Policies inherit downward: a resource’s effective allow policy combines its own bindings with those of every ancestor. Service accounts for machine identity. Workload Identity Federation for external identity providers.

# list IAM policy
gcloud projects get-iam-policy \
  my-project-id
# add IAM binding
gcloud projects add-iam-policy-binding \
  my-project-id \
  --member="user:dev@example.com" \
  --role="roles/viewer"
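
Downward inheritance through the hierarchy can be sketched as a walk from a project up to the organization, collecting role bindings along the way. The sketch assumes the additive model for allow policies, in which a lower level adds bindings and never removes inherited ones. The node names, members, and the two dictionaries are hypothetical.

```python
# Hypothetical hierarchy: each node maps to its parent (None at the root).
HIERARCHY = {"my-project": "eng-folder", "eng-folder": "example-org",
             "example-org": None}

# Hypothetical (member, role) bindings attached at each level.
BINDINGS = {
    "example-org": {("group:admins@example.com", "roles/viewer")},
    "eng-folder": {("user:dev@example.com", "roles/editor")},
    "my-project": {("user:dev@example.com", "roles/viewer")},
}


def effective_bindings(node: str) -> set:
    """Union of bindings on the node and on every ancestor above it."""
    result = set()
    while node is not None:
        result |= BINDINGS.get(node, set())
        node = HIERARCHY[node]
    return result


print(sorted(effective_bindings("my-project")))
```

Because bindings accumulate, granting a broad role at the organization or folder level reaches every project underneath it.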

Key Differentiators

Entra ID (formerly Azure AD) is the clear winner for enterprise identity. Native integration with Windows, Office 365, GitHub, and thousands of SaaS apps via SAML/OIDC. Conditional Access policies, Privileged Identity Management (PIM), and identity governance. AWS and GCP simply cannot match this depth in enterprise identity.
AWS IAM policy language is the most granular of the three. Fine-grained conditions, ABAC with tags, permission boundaries, and service control policies (SCPs) in AWS Organizations. More complex to learn, but more powerful once mastered.
GCP’s hierarchy model (Org → Folder → Project) is the cleanest. Policies cascade naturally. Projects provide a strong isolation boundary that maps well to teams and environments. Workload Identity Federation eliminates the need for service account keys.
Enterprise Identity Reality
If your organization runs on Microsoft (Active Directory, Office 365, Windows endpoints), Azure’s Entra ID integration is a massive advantage. AWS and GCP both require federation with an external IdP for enterprise scenarios. For greenfield or non-Microsoft shops, all three providers are roughly comparable—but Microsoft shops should weight Azure heavily for this reason alone.

Compliance & Certifications

All three major providers maintain comprehensive compliance programs. The core certifications are shared across providers:

SOC 2: Security Controls
ISO 27001: Info Security Mgmt
HIPAA: Healthcare Data
PCI DSS: Payment Card Data
GDPR: EU Data Protection
FedRAMP: US Federal Govt
Compliance Note
Cloud provider compliance certifications cover the infrastructure layer. You are still responsible for configuring your workloads securely (the “shared responsibility model”). AWS, Azure, and GCP all publish detailed shared responsibility matrices—review them before assuming a certification covers your entire stack.
Section VII

AI & Machine Learning

Foundation models, ML platforms, and custom AI across the cloud giants

ML & AI Services at a Glance

Category | AWS | Azure | GCP
ML Platform | SageMaker | Azure ML | Vertex AI
Foundation Models | Bedrock | OpenAI Service | Gemini API / Model Garden
AutoML | SageMaker Autopilot | AutoML | Vertex AutoML
Custom Hardware | Inferentia / Trainium | - | TPU (v2–v5p)
AI Agents | Bedrock Agents | Azure AI Studio | Vertex AI Agents

Platform Deep Dives

AWS — SageMaker & Bedrock
34%
ML Platform Market Share

SageMaker is the end-to-end ML platform that dominates the market. Model customization that once took months can be completed in days with built-in tools for training, tuning, and deployment.

Bedrock provides managed access to foundation models from Anthropic, Meta, Mistral, and Amazon’s own Nova family. Agent workflows with server-side tools landed in 2026, enabling complex multi-step AI orchestration.

The Amazon Nova 2 model family includes Sonic (speech-to-speech), Lite (1M token context window), and Omni (multimodal). OpenAI models are now also available on Bedrock, giving teams access to virtually every major foundation model through a single API.

Azure — OpenAI Service & AI Studio
29%
ML Platform Market Share

Azure OpenAI Service provides enterprise-grade REST API access to GPT-4 and powers the entire Copilot ecosystem. Azure’s close partnership with OpenAI is its strongest AI differentiator.

Azure AI Studio is the unified environment for building custom copilots and AI agents. Azure Copilot (2026) introduces agentic cloud management—an AI interface for managing Azure resources at no additional cost.

Deprecation Notice
Azure ML SDK v1 is deprecated. Support ends June 2026. Migrate to the v2 SDK and Azure AI Studio.
GCP — Vertex AI & Gemini
22%
ML Platform Market Share

Vertex AI is the most feature-rich ML platform of the three, with 200+ models available in Model Garden. It provides a unified surface for training, tuning, deploying, and monitoring models from Google and third parties.

Gemini 3 delivers state-of-the-art reasoning and multimodal capabilities. TPUs (Tensor Processing Units) are Google’s custom AI hardware, offering up to 10–100x faster performance than general-purpose GPUs on well-suited workloads, though actual gains depend heavily on the model and configuration.

GCP is 18–22% cheaper for AI training workloads as of 2026, driven by TPU efficiency and aggressive pricing on GPU instances.

ML Platform Market Share (2026)

SageMaker: 34%
Azure ML: 29%
Vertex AI: 22%
GCP’s TPU Cost Advantage
If your workload is AI/ML training, GCP deserves serious evaluation. TPUs offer dramatically better price-performance for large model training, and Vertex AI’s 18–22% cost advantage over equivalent AWS/Azure GPU instances can translate to hundreds of thousands of dollars saved annually at scale. Google’s own infrastructure (the same systems that train Gemini) is available to you.
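
To put the 18–22% figure in dollar terms, a minimal sketch follows. The annual spend is a hypothetical input and the function name is illustrative; the percentages are the ones quoted above and are applied as a flat range.

```python
def annual_savings_range(annual_training_spend: float,
                         low: float = 0.18,
                         high: float = 0.22) -> tuple:
    """Return (low, high) annual savings in whole dollars for a given spend."""
    return (round(annual_training_spend * low),
            round(annual_training_spend * high))


# Hypothetical team spending $2M per year on training compute.
savings = annual_savings_range(2_000_000)
print(savings)
```

At that spend level the range lands in the hundreds of thousands of dollars, which is the scale the note above refers to.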
Section VIII

Infrastructure as Code

Declarative infrastructure management from native tools to cross-platform standards

IaC Tools at a Glance

Category | AWS | Azure | GCP
Native IaC | CloudFormation | ARM Templates | Deployment Manager (EOL 3/2026)
Modern Native | CDK | Bicep | Infrastructure Manager
Serverless IaC | SAM | - | -
Cross-Platform | Terraform, Pulumi | Terraform, Pulumi | Terraform (preferred), Pulumi

Native IaC Tools

AWS CloudFormation + CDK

CloudFormation is AWS’s declarative IaC service using JSON or YAML templates. It provides the most comprehensive coverage of AWS services—new features typically get CloudFormation support on launch day.

CDK (Cloud Development Kit) lets you write infrastructure in TypeScript, Python, C#, Java, or Go. CDK code synthesizes into CloudFormation templates, giving you the expressiveness of real programming languages with the reliability of CloudFormation’s deployment engine.

SAM (Serverless Application Model) is a compact abstraction for serverless resources. It supports local testing via sam local invoke and generates CloudFormation under the hood. All three tools ultimately use the CloudFormation engine.

# CloudFormation YAML
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-app-bucket
      VersioningConfiguration:
        Status: Enabled
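
The idea of code that synthesizes a template can be illustrated without any AWS library. The sketch below is plain Python, not the CDK API: a hypothetical helper builds the same bucket definition as a dictionary and prints it as JSON, which CloudFormation also accepts as a template format.

```python
import json


def s3_bucket_template(logical_id: str, bucket_name: str,
                       versioned: bool = True) -> dict:
    """Build a CloudFormation-style template dict for one S3 bucket."""
    properties = {"BucketName": bucket_name}
    if versioned:
        properties["VersioningConfiguration"] = {"Status": "Enabled"}
    return {
        "Resources": {
            logical_id: {"Type": "AWS::S3::Bucket", "Properties": properties}
        }
    }


template = s3_bucket_template("MyBucket", "my-app-bucket")
print(json.dumps(template, indent=2))
```

Loops, functions, and conditionals like the `versioned` flag are what a general-purpose language adds on top of a static template.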

Azure ARM Templates + Bicep

ARM Templates are Azure’s native JSON format for infrastructure definitions. They provide full control over every Azure resource but are notoriously verbose and difficult to read.

Bicep is Microsoft’s modern declarative language that compiles to ARM. It offers dramatically simplified syntax, modules for reuse, and first-class VS Code support. As of 2026, Bicep is the recommended choice over ARM for all new Azure deployments.

Deprecation Notice
Azure Blueprints is deprecated as of July 2026. Migrate to Template Specs + Deployment Stacks for governance and repeatable deployments.
// Bicep
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'myappstorage'
  location: resourceGroup().location
  kind: 'StorageV2'
  sku: {
    name: 'Standard_LRS'
  }
}

GCP Deployment Manager → Terraform

Deployment Manager is GCP’s original native IaC tool, but it reaches end-of-life on March 31, 2026. Google has not invested in a direct successor in the same mold.

Infrastructure Manager is Google’s managed Terraform service—essentially “Terraform as a Service.” Rather than building their own IaC language, Google embraced Terraform as the recommended tool for GCP infrastructure.

The DM Convert tool is available to help migrate existing Deployment Manager configurations to Terraform HCL format.

# Terraform HCL
resource "google_storage_bucket" "default" {
  name          = "my-app-bucket"
  location      = "US"
  force_destroy = true

  versioning {
    enabled = true
  }
}

Cross-Platform Tools

Terraform by HashiCorp is the industry standard for multi-cloud IaC. Its HCL (HashiCorp Configuration Language) is declarative and purpose-built for infrastructure. The provider ecosystem covers AWS, Azure, GCP, and hundreds of other services. State management and plan/apply workflow provide safety and predictability.

Pulumi is the newer alternative that lets you write infrastructure using real programming languages—TypeScript, Python, Go, and C#. It appeals to developers who prefer familiar languages over learning HCL. Growing ecosystem but smaller community than Terraform.

Which IaC Tool Should You Choose?
Terraform is the safest bet for multi-cloud environments—it works everywhere and has the largest ecosystem. If you are Azure-only, use Bicep for its native integration and simplified syntax. If you are AWS-only, CDK is excellent for teams that prefer TypeScript or Python over YAML. For GCP, Terraform is the officially recommended path forward.
Section IX

Observability

Monitoring, logging, and tracing across cloud platforms

Observability Services at a Glance

Category | AWS | Azure | GCP
Monitoring | CloudWatch | Azure Monitor | Cloud Monitoring
Logging | CloudWatch Logs | Monitor Logs (KQL) | Cloud Logging
Tracing | X-Ray | Application Insights | Cloud Trace
Profiling | - | Application Insights | Cloud Profiler
Audit Trail | CloudTrail | Activity Log | Audit Logs

The Three Pillars of Observability

Modern observability is built on three complementary signal types. Together, they answer what happened, why it happened, and where it happened in distributed systems.

Metrics

Numerical measurements collected over time. CPU utilization, memory consumption, request latency, error rates, queue depth. Metrics are cheap to store, fast to query, and ideal for alerting. They tell you that something is wrong.

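# AWS CloudWatch: average CPU for one instance over one hour
# (instance ID and time window are placeholders)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-01-01T01:00:00Z \
  --period 300 \
  --statistics Average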

Logs

Structured or unstructured records of discrete events. Application logs, access logs, audit trails, error messages. Logs provide rich context but are expensive to store at scale. They tell you why something went wrong.

# Azure: query logs with KQL (--workspace takes the workspace ID, a GUID)
az monitor log-analytics query \
  --workspace "<workspace-id>" \
  --analytics-query "AppRequests | where toint(ResultCode) >= 500 | summarize count() by bin(TimeGenerated, 5m)"
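
For readers new to KQL, the aggregation in that query (keep server errors, then count them per five-minute bin) can be sketched in plain Python. The record shape, the function name, and the sample data are assumptions for illustration; no Azure library is involved.

```python
from collections import Counter


def count_5xx_by_bin(requests, bin_seconds: int = 300) -> dict:
    """Count server errors per time bin.

    requests: iterable of (epoch_seconds, status_code) pairs.
    Returns {bin_start_epoch_seconds: error_count}.
    """
    counts = Counter()
    for timestamp, status in requests:
        if status >= 500:
            # Floor the timestamp to the start of its bin.
            counts[timestamp - timestamp % bin_seconds] += 1
    return dict(counts)


sample = [(0, 200), (10, 500), (299, 503), (300, 502), (650, 404)]
errors = count_5xx_by_bin(sample)
print(errors)
```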

Traces

End-to-end request paths through distributed systems. Each trace shows the full journey of a request across services, with timing data for every hop. Traces reveal latency bottlenecks, failing dependencies, and cascade failures. They tell you where the problem is.

# GCP: list recent traces
gcloud trace traces list \
  --project=my-project \
  --limit=10
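
What "timing data for every hop" buys you can be shown with a toy trace. The sketch below models the child spans of a single request as simple records and picks the slowest one. The `Span` class, the service names, and the timings are hypothetical and do not follow any provider's trace format.

```python
from dataclasses import dataclass


@dataclass
class Span:
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def slowest_span(spans):
    """Return the span with the longest duration."""
    return max(spans, key=lambda span: span.duration_ms)


# Child spans of one hypothetical request, in call order.
trace = [Span("auth", 5, 25), Span("orders-db", 30, 110), Span("render", 110, 118)]
bottleneck = slowest_span(trace)
print(bottleneck.service, bottleneck.duration_ms)
```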

Provider Observability Stacks

AWS CloudWatch + X-Ray

CloudWatch is the unified observability platform for AWS. It collects metrics, logs, and alarms across all AWS services and on-premises resources. Custom metrics, dashboards, composite alarms, and anomaly detection are all built in.

X-Ray provides distributed tracing with service maps that visualize request flows. It integrates directly with CloudWatch Logs for correlated debugging—jump from a trace span to the exact log lines.

CloudTrail records every API call made in your AWS account. Essential for security auditing, governance, and compliance. All three services work together for full-stack observability.

Azure Monitor + Application Insights

Azure Monitor is the unified observability platform that merged Log Analytics and Application Insights into a single pane of glass. It covers infrastructure metrics, application telemetry, and log analytics.

Application Insights is the APM (Application Performance Monitoring) layer, now built on OpenTelemetry. Real-time Live Metrics stream provides instant visibility. Smart detection automatically identifies anomalies.

KQL (Kusto Query Language) is Azure’s powerful log query language—expressive, fast, and consistent across Monitor, Sentinel, and Defender. Learning KQL is a high-value investment for Azure teams.

Migration Notice
Microsoft Sentinel migrates to the Defender portal in July 2026. Plan your SIEM workflows accordingly.
GCP Cloud Operations

Cloud Monitoring and Cloud Logging (formerly Stackdriver) form the core of GCP’s observability stack. Tight integration with all GCP services, plus support for AWS and on-premises via agents.

Cloud Trace provides distributed tracing for latency analysis across microservices. Cloud Profiler continuously profiles production applications to identify CPU and memory hotspots with minimal overhead.

GCP’s operations suite is clean and well-integrated, but has fewer third-party integrations than CloudWatch.

Pricing Change
GCP alerting pricing starts May 2026: $0.10/month per alerting condition. Free tier covers up to 500 conditions. Plan your alerting strategy accordingly.

Third-Party & Open Source

Cloud-native observability tools are powerful, but vendor lock-in is a real concern. The industry is converging on open standards and multi-cloud solutions.

OpenTelemetry

The vendor-neutral, open-source standard for metrics, logs, and traces. Backed by the CNCF, OpenTelemetry provides SDKs for every major language and a collector that routes telemetry to any backend. Instrument once, send data anywhere. All three cloud providers now support OTLP (OpenTelemetry Protocol) natively.

Grafana LGTM Stack

The open-source observability stack: Loki (logs), Grafana (visualization), Tempo (traces), and Mimir (metrics). Runs on any cloud or on-premises. Grafana Cloud offers the managed version. Excellent choice for teams wanting cloud-agnostic observability.

Commercial Platforms

Datadog and Dynatrace are the leading commercial observability platforms. Both provide unified metrics, logs, traces, and APM with sophisticated AI-powered analysis. Popular in enterprise environments where a single platform across all clouds is valued over cost optimization.

The Multi-Cloud Observability Standard
OpenTelemetry is rapidly becoming the universal standard for cloud observability. If you are building for multi-cloud or want to avoid vendor lock-in, instrument your applications with OpenTelemetry SDKs from day one. You can route telemetry to any backend—CloudWatch, Azure Monitor, Cloud Operations, Grafana, or Datadog—without changing application code. The investment in OTel instrumentation pays for itself every time you change or add an observability backend.
Section X

Pricing & Cost

Understanding cloud economics across AWS, Azure, and GCP. Pricing models, free tiers, hidden costs, and the strategies that separate optimized bills from runaway spend.

Pricing Models Comparison

Model AWS Azure GCP
On-Demand Per-second (60s min) Per-second (60s min) Per-second (1min min)
Reserved / Committed Savings Plans & RIs (up to 72%) Reserved Instances (up to 72%) CUDs (57–70%)
Spot / Preemptible Spot (up to 90% off) Spot VMs (up to 90% off) Preemptible (up to 80% off)
Auto Discounts None None Sustained Use (up to 30%)
Hybrid Licensing n/a Azure Hybrid Benefit (28–80%) n/a
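To make the "per-second (60s min)" entry concrete, here is a minimal sketch of how a minimum charge affects short-lived instances. The function names and the hourly rate are hypothetical and chosen only for easy arithmetic; they are not real list prices.

```python
import math

def billed_seconds(runtime_seconds: float, minimum_seconds: int = 60) -> int:
    """Seconds charged under per-second billing with a minimum charge."""
    if runtime_seconds <= 0:
        return 0  # the instance never ran
    return max(minimum_seconds, math.ceil(runtime_seconds))

def on_demand_cost(runtime_seconds: float, hourly_rate: float) -> float:
    """Dollar cost of one run at the given hourly rate."""
    return billed_seconds(runtime_seconds) * hourly_rate / 3600

HYPOTHETICAL_RATE = 0.36  # dollars per hour, an assumption for illustration

# A 20-second run is billed as 60 seconds; a 90-second run is billed as 90.
for runtime in (20, 90):
    print(runtime, billed_seconds(runtime), round(on_demand_cost(runtime, HYPOTHETICAL_RATE), 4))
```

The minimum matters mostly for bursty batch jobs that start and stop many short-lived instances.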

Key Pricing Differentiators

GCP Sustained-Use Discounts: GCP is the only one of the three providers with automatic sustained-use discounts; no commitment is needed. If a VM on an eligible machine type runs for more than 25% of a month, GCP automatically applies incremental discounts of up to 30%. Zero-friction cost optimization.
Azure Hybrid Benefit: Massive savings for Microsoft shops. Bring your existing Windows Server or SQL Server licenses to Azure for 28–80% savings. Combined with Reserved Instances, this can make Azure the cheapest option for organizations already invested in Microsoft licensing.
AWS Pricing Complexity: The most complex pricing of the three—AWS averages 197 monthly price changes across its services. This requires expertise to optimize but offers maximum flexibility. Savings Plans provide the most versatile commitment model, covering EC2, Lambda, and Fargate with a single commitment.
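The sustained-use discount can be worked through as tiered pricing. The sketch below assumes a four-band schedule (each quarter of the month charged at 100%, 80%, 60%, and 40% of the base rate), which reproduces the "25% threshold" and "up to 30%" figures above. Treat the schedule as an assumption: actual tiers vary by machine type and should be checked against current GCP documentation.

```python
# Assumed schedule: (upper bound of usage band as a fraction of the month,
#                    fraction of the base rate charged within that band)
SUD_TIERS = [(0.25, 1.00), (0.50, 0.80), (0.75, 0.60), (1.00, 0.40)]

def effective_rate_multiplier(usage_fraction: float) -> float:
    """Average fraction of the base price paid for a VM that runs
    for usage_fraction of the month (0 < usage_fraction <= 1)."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    charged = 0.0
    lower = 0.0
    for upper, rate in SUD_TIERS:
        band = min(usage_fraction, upper) - lower
        if band <= 0:
            break
        charged += band * rate
        lower = upper
    return charged / usage_fraction

for usage in (0.25, 0.50, 1.00):
    discount = 1 - effective_rate_multiplier(usage)
    print(f"{usage:.0%} of month -> {discount:.0%} discount")
```

Under this schedule no discount applies at or below 25% usage, and a VM that runs the whole month pays an average of 70% of the base rate.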

Free Tier Comparison

AWS Free Tier

12 months free from account creation. EC2 750 hrs/mo (t2.micro), S3 5 GB storage, RDS $100 credits. Always free: Lambda 1M requests/mo, DynamoDB 25 GB, CloudWatch 10 custom metrics.

The broadest free tier with the most service coverage, but the 12-month clock starts immediately.

Azure Free Tier

$200 credits for 30 days to explore any service. 12 months free: B1s VM 750 hrs/mo, Blob Storage 5 GB, SQL Database 250 GB. Always free: Functions 1M requests/mo, Cosmos DB 1000 RU/s.

The upfront credit model lets you test expensive services risk-free before committing.

GCP Free Tier

$300 credits for new users (90-day expiry). Always free: one e2-micro VM instance per month (select US regions), Cloud Functions 2M invocations/mo, BigQuery 1 TB queries/mo, Cloud Storage 5 GB.

The always-free e2-micro VM is unique—a perpetually free compute instance for lightweight workloads.

Kubernetes Control Plane Pricing

$0.10/hr
EKS per cluster + EC2 nodes (most expensive)
FREE
AKS control plane (most cost-effective)
$0.10/hr
GKE per cluster, Standard or Autopilot (free-tier credit covers one zonal or Autopilot cluster)
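As a rough worked example, the hourly control plane fees above translate into monthly figures as follows. The 730-hour month is an assumption commonly used for estimates, and node costs and free-tier credits are deliberately left out.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for estimates

# Hourly control plane rates in dollars, taken from the figures above
CONTROL_PLANE_RATE = {"EKS": 0.10, "AKS": 0.00, "GKE": 0.10}

def monthly_control_plane_cost(provider: str, clusters: int = 1) -> float:
    """Monthly control plane fee in dollars, excluding worker nodes."""
    return round(CONTROL_PLANE_RATE[provider] * HOURS_PER_MONTH * clusters, 2)

for provider in CONTROL_PLANE_RATE:
    print(provider, monthly_control_plane_cost(provider, clusters=3))
```

The fee is small per cluster but adds up for teams that run one cluster per environment or per tenant.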
The Hidden Trap: Data Egress Costs
Data egress (outbound transfer) is the most common source of surprise cloud bills across all three providers. Ingress is free, but egress is metered—and charges compound quickly at scale. Organizations typically overspend by 30–40% in their first year, with egress fees being a primary contributor. Always model your data transfer patterns before choosing a provider or architecture. Use private endpoints, CDNs, and regional placement to minimize cross-region and internet egress.
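Modeling data transfer can start as simply as the sketch below. The per-GB rate and free allowance are placeholder values, not quotes from any provider's price list; substitute the rates for your provider, region, and destination.

```python
def monthly_egress_cost(gb_out: float, rate_per_gb: float, free_gb: float = 0.0) -> float:
    """Dollar cost of outbound transfer after a free allowance."""
    billable_gb = max(0.0, gb_out - free_gb)
    return billable_gb * rate_per_gb

HYPOTHETICAL_RATE = 0.09   # dollars per GB, an assumption for illustration
HYPOTHETICAL_FREE_GB = 100  # monthly free allowance, also an assumption

# Cost grows linearly with volume once the allowance is used up.
for gb in (50, 1_000, 10_000):
    cost = monthly_egress_cost(gb, HYPOTHETICAL_RATE, HYPOTHETICAL_FREE_GB)
    print(f"{gb:>6} GB -> ${cost:,.2f}")
```

Running the same function per traffic path (internet, cross-region, cross-zone) with its own rate gives a first-order view of which paths dominate the bill.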
Section XI

Architecture Patterns

Common cloud architectures and well-architected frameworks. The blueprints that translate business requirements into resilient, cost-effective infrastructure across AWS, Azure, and GCP.

Well-Architected Frameworks

Each provider publishes a Well-Architected Framework—a set of design principles and best practices. The pillars overlap significantly but reflect each provider’s priorities.

AWS (6 Pillars)

Operational Excellence — Run and monitor systems to deliver business value and continuously improve processes.

Security — Protect information, systems, and assets through risk assessment and mitigation.

Reliability — Recover from failure and meet demand dynamically.

Performance Efficiency — Use resources efficiently as demand and technology evolve.

Cost Optimization — Avoid unnecessary costs and understand spending.

Sustainability — Minimize environmental impact of cloud workloads. AWS is the only provider with a dedicated sustainability pillar.

Azure (5 Pillars)

Reliability — Ensure workloads are resilient and available.

Security — Protect workloads from threats with defense-in-depth.

Cost Optimization — Manage costs to maximize delivered value.

Operational Excellence — Streamline operations with DevOps practices.

Performance Efficiency — Adapt to changes in demand efficiently.

Azure places Reliability first, reflecting Microsoft’s enterprise focus on uptime and availability SLAs.

GCP (Different Approach)

Operational Excellence — Efficient deployment, operation, and monitoring of services.

Security, Privacy & Compliance — Maximize data security and regulatory compliance.

Reliability — Design resilient, highly available workloads.

Cost Optimization — Optimize resource usage and reduce waste.

Performance Optimization — Design for latency, throughput, and resource utilization.

GCP uniquely combines Security, Privacy, and Compliance into a single pillar, reflecting Google’s data-centric worldview.

Common Architecture Patterns

Three-Tier Web Application

Pattern: Presentation → Business Logic → Data. The most common cloud architecture for web applications and APIs.

  • AWS CloudFront + ALB + EC2/ECS + RDS
  • Azure Front Door + App Service + SQL Database
  • GCP Cloud CDN + Cloud Run + Cloud SQL

Serverless

Pattern: Event-driven, fully managed, pay-per-invocation. No servers to manage; scaling is automatic, bounded by per-account concurrency quotas.

  • AWS API Gateway + Lambda + DynamoDB + S3
  • Azure API Management + Functions + Cosmos DB + Blob
  • GCP API Gateway + Cloud Functions + Firestore + Cloud Storage
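The serverless pattern reduces application code to a single entry point that receives an event and returns a response. The sketch below follows the AWS Lambda handler signature and the API Gateway proxy response shape; the greeting logic and the hand-built test event are illustrative assumptions, and Azure Functions and Cloud Functions use different signatures.

```python
import json

def handler(event, context):
    """Lambda-style entry point: invoked once per API request."""
    # API Gateway sends None for queryStringParameters when there are none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }

# Local invocation with a hand-built event; no cloud account needed.
response = handler({"queryStringParameters": {"name": "cloud"}}, None)
print(response["statusCode"], response["body"])
```

Because the handler is a plain function, it can be unit-tested locally before it is wired to a gateway, a database, or object storage.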

Microservices on Kubernetes

Pattern: Containerized services orchestrated by Kubernetes, with service mesh, event-driven communication, and independent scaling.

  • AWS EKS + Fargate + ALB + EventBridge
  • Azure AKS + Container Apps + Service Bus + Event Grid
  • GCP GKE + Cloud Run + Pub/Sub

Data Lake / Lakehouse

Pattern: Bronze → Silver → Gold medallion architecture. Ingest raw data, refine progressively, serve for analytics and ML.

  • AWS S3 + Glue + Athena + Redshift
  • Azure Data Lake Storage + Synapse (Medallion architecture)
  • GCP Cloud Storage + BigQuery (most streamlined)
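The medallion flow can be sketched in a few lines of plain Python. The records, field names, and cleaning rules here are invented for illustration; a real pipeline would run equivalent steps in Glue, Synapse, or BigQuery over files in object storage.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested, including bad rows.
bronze = [
    {"user": "a", "amount": "10.5"},
    {"user": "b", "amount": "oops"},   # unparseable amount
    {"user": "a", "amount": "4.5"},
    {"user": None, "amount": "3"},     # missing user
]

def to_silver(rows):
    """Silver: validated, typed records; invalid rows are dropped."""
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        if row["user"]:
            silver.append({"user": row["user"], "amount": amount})
    return silver

def to_gold(rows):
    """Gold: business-level aggregate, here total amount per user."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)

gold = to_gold(to_silver(bronze))
print(gold)
```

Keeping the bronze layer untouched means the silver and gold steps can be rerun with new rules without re-ingesting the source data.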

Multi-Cloud Adoption

87%
Enterprises use multi-cloud strategy
8%
Remain single-cloud only
Architecture Is About Constraints, Not Choices
The best architecture is not the one with the most services or the most advanced patterns. It is the one that fits your team’s capabilities, your organization’s compliance requirements, your budget constraints, and your users’ expectations. Start simple. Earn complexity through proven need, not anticipated scale. Every additional service is a dependency to monitor, secure, and pay for.
Section XII

Decision Framework

Choosing the right cloud for your workload. A structured approach to evaluating AWS, Azure, and GCP based on your organization, team, and technical requirements.

Decision Matrix

Factor Choose AWS Choose Azure Choose GCP
Org Type Startups, cloud-native Enterprise, Microsoft shops Data/ML-focused teams
Identity Custom / Cognito Active Directory / Entra ID Google Workspace
Strength Broadest services (250+) Hybrid cloud (Arc) Data analytics (BigQuery)
Containers EKS ecosystem AKS (free control plane) GKE (gold standard)
AI / ML Bedrock + SageMaker OpenAI Service + Copilot Vertex AI + TPUs
IaC CloudFormation / CDK Bicep Terraform
Pricing Edge Spot instances (up to 90% off) Hybrid Benefit (up to 80% off) Auto sustained discounts
Best For Maximum flexibility Microsoft ecosystem Developer experience
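One way to apply a matrix like this is weighted scoring: weight each factor by how much it matters to your organization, score each provider per factor, and rank the totals. The weights and scores below are invented for a hypothetical Microsoft-centric team and are not a recommendation.

```python
def rank_providers(weights, scores):
    """Return (provider, total) pairs sorted from best to worst fit."""
    totals = {
        provider: sum(weights[factor] * by_factor.get(factor, 0) for factor in weights)
        for provider, by_factor in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical inputs: weights are 1 to 3, scores are 1 to 5.
weights = {"identity": 3, "containers": 1, "data_analytics": 2}
scores = {
    "AWS":   {"identity": 2, "containers": 4, "data_analytics": 3},
    "Azure": {"identity": 5, "containers": 4, "data_analytics": 3},
    "GCP":   {"identity": 2, "containers": 5, "data_analytics": 5},
}

ranking = rank_providers(weights, scores)
for provider, total in ranking:
    print(provider, total)
```

The value of the exercise is less the final number than the forced conversation about which factors carry the most weight.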

When to Choose Each Provider

Choose AWS If…

You need the broadest service catalog (250+ services). You are a startup already in the AWS ecosystem. You require maximum flexibility and advanced networking (EFA for HPC). You want the strongest MLOps tooling with SageMaker. You need the most mature serverless ecosystem (Lambda, Step Functions, EventBridge).

42%
of enterprises prefer AWS as primary cloud
Choose Azure If…

You are a Microsoft-centric enterprise with Active Directory, Office 365, and Windows Server. You need hybrid cloud capabilities (Azure Arc). You want Windows Server and SQL Server integration with Hybrid Benefit savings. You need a free Kubernetes control plane (AKS). OpenAI integration is critical for your AI strategy. You need Entra ID for enterprise identity.

36%
of enterprises prefer Azure as primary cloud
Choose GCP If…

Your workloads are data and analytics heavy (BigQuery is unmatched). AI/ML is your primary focus (Vertex AI, TPUs). You need Kubernetes excellence (GKE, from the company that created Kubernetes). You want the cleanest developer experience and CLI tooling. You prefer automatic cost optimization with sustained-use discounts. You value simplicity over breadth of services.

12%
of enterprises prefer GCP as primary cloud

Multi-Cloud Strategy

The majority of enterprises do not choose a single cloud. Instead, they adopt a multi-cloud strategy that leverages the strengths of each provider for different workloads.

87%
Enterprises adopt multi-cloud
Avoid Lock-in
Reduce vendor dependency
Best-of-Breed
Use each provider’s strength
Terraform
Unifying IaC layer across clouds
The Bottom Line
The best cloud is the one your team can operate, your organization can afford, and your users never notice. Technology choices matter far less than operational maturity, organizational alignment, and the discipline to keep things simple until complexity is earned.
Guide: Cloud Platforms
Subject: AWS · Azure · GCP Reference
Format: Conceptual Reference
Sections: 12
Created: 2026-02-10
Version: Multi-Cloud 2026
Style: Chinese Porcelain
Number: 23
Status: Active