12-01 Cloud Providers - AWS, Azure, and GCP Required
Cloud providers offer on-demand computing resources and services that power modern applications. This lecture compares the major cloud providers and their offerings.
Major Cloud Providers
The Big Three
The cloud computing market is dominated by three major providers:
- Amazon Web Services (AWS) - Market leader
- Microsoft Azure - Strong enterprise presence
- Google Cloud Platform (GCP) - Innovation leader
Amazon Web Services (AWS)
Market Position: The largest public cloud by far
Usage: Powers millions of applications
- Netflix: Video streaming infrastructure
- Reddit: Social platform backend
- Spotify: Music streaming services
- Millions of other applications
[!WARNING] Critical Infrastructure
When AWS goes down, half of the internet goes down. Example: The infamous S3 outage in February 2017 affected thousands of websites and services.
AWS Key Services
Compute:
- EC2 (Elastic Compute Cloud): Virtual machines
- Lambda: Serverless functions
- ECS/EKS: Container orchestration
Storage:
- S3 (Simple Storage Service): Object storage (powers much of the internet)
- EBS: Block storage for EC2
- Glacier: Archival storage
Database:
- RDS: Relational databases (MySQL, PostgreSQL, etc.)
- DynamoDB: NoSQL database
- Aurora: High-performance relational database
Big Data:
- EMR (Elastic MapReduce): Managed Hadoop/Spark
- Redshift: Data warehouse
- Kinesis: Real-time data streaming
Microsoft Azure
Market Position: Second largest, strong in enterprise
Key Strengths:
- Integration with Microsoft products (Office 365, Active Directory)
- Hybrid cloud capabilities
- Enterprise support
Azure Key Services
Compute:
- Virtual Machines: Similar to AWS EC2
- App Service: PaaS for web apps
- Azure Functions: Serverless
Storage:
- Azure Storage: Blob, file, queue, table storage
- Azure Data Lake: Big data storage
Database:
- Azure SQL Database: Managed SQL Server
- Cosmos DB: Globally distributed NoSQL
- Azure Database for PostgreSQL/MySQL
Big Data:
- HDInsight: Managed Hadoop/Spark/Kafka
- Azure Synapse Analytics: Data warehouse
- Azure Databricks: Apache Spark platform
Google Cloud Platform (GCP)
Market Position: Third largest, innovation leader
Key Strengths:
- Advanced data analytics and ML
- Kubernetes (originated from Google)
- Global network infrastructure
GCP Key Services
Compute:
- Compute Engine: Virtual machines
- Cloud Functions: Serverless
- GKE (Google Kubernetes Engine): Managed Kubernetes
Storage:
- Cloud Storage: Object storage
- Persistent Disk: Block storage
Database:
- Cloud SQL: Managed MySQL/PostgreSQL
- Cloud Spanner: Globally distributed database
- BigTable: NoSQL for big data
- BigQuery: Serverless data warehouse
Big Data:
- Dataproc: Managed Hadoop/Spark
- Dataflow: Stream and batch processing
- Pub/Sub: Messaging service
Feature Parity
Key Insight: All clouds try to compete on features, so they all end up having extremely similar feature sets.
| Service Type | AWS | Azure | GCP |
|---|---|---|---|
| VMs | EC2 | Virtual Machines | Compute Engine |
| Serverless | Lambda | Functions | Cloud Functions |
| Containers | ECS/EKS | AKS | GKE |
| Object Storage | S3 | Blob Storage | Cloud Storage |
| NoSQL | DynamoDB | Cosmos DB | BigTable |
| Data Warehouse | Redshift | Synapse | BigQuery |
| Big Data | EMR | HDInsight | Dataproc |
Virtual Machines Comparison
AWS EC2
Range: From tiny to gigantic
- T2.Nano: 1 vCPU, 512 MB RAM
- X1.32xlarge: 128 vCPU, 2000 GB RAM
Features:
- ✓ GPUs available (useful for deep learning)
- ✓ Priced per-second
- ✓ On-Demand: Pay for what you use
- ✓ Spot Instances: Auction for unused capacity (much cheaper)
- Caveat: VM may be shut down with short notice
Azure Virtual Machines
Similar to AWS:
- ✓ GPUs available
- ⚠ Max 32 vCPUs (currently)
- ⚠ Max 800 GB RAM (currently)
- Note: Most applications won’t hit these limits
Google Compute Engine
Features:
- ✓ Largest: 96 vCPU, 624 GB RAM
- ✓ Custom-sized machines (unique feature)
- ✓ Per-second billing
- ✓ Sustained use discounts
Storage Services
Object Storage
All three providers offer massive-scale object storage:
AWS S3:
- Most widely used
- Powers much of the internet (e.g., Imgur)
- 11 9’s of durability (99.999999999%)
Azure Blob Storage:
- Similar to S3
- Integrated with Azure ecosystem
Google Cloud Storage:
- Similar to S3
- Multiple storage classes
Use Cases
Web Assets → S3/Blob/GCS
Backups → Glacier/Archive/Coldline
Big Data → Data Lake on S3/ADLS/GCS
Hosted Data Processing
All providers offer managed big data services:
Amazon EMR
- Managed Hadoop, Spark, HBase, Presto, Hive
- Automatic cluster scaling and provisioning
Microsoft HDInsight
- Managed Hadoop, Spark, Kafka, HBase
- Integration with Azure services
Google Dataproc
- Managed Hadoop and Spark
- Fast cluster creation (90 seconds)
- Per-second billing
Database Services
Managed Relational Databases
AWS RDS:
- MySQL, PostgreSQL, MariaDB, Oracle, SQL Server
- Aurora (AWS proprietary, MySQL/PostgreSQL compatible)
Azure:
- Azure SQL Database (SQL Server)
- Azure Database for MySQL/PostgreSQL
GCP:
- Cloud SQL (MySQL, PostgreSQL, SQL Server)
NoSQL Databases
AWS DynamoDB:
- Key-value and document database
- Serverless, auto-scaling
Azure Cosmos DB:
- Multi-model database
- Global distribution
- Multiple consistency levels
GCP BigTable:
- Wide-column NoSQL
- Powers Google Search, Maps, Gmail
GCP BigQuery:
- Serverless data warehouse
- Petabyte-scale analytics
- SQL interface
Cloud Regions and Availability
Azure Regions
- 60+ Azure Regions worldwide
- Most extensive global presence
Explore: Azure Datacenter Map
Reasons to Select a Region
1. Cost
- Egress fees: Moving data between regions costs money
- Regional pricing: Some services cheaper in certain regions
- Tip: Use pricing calculators to estimate costs
2. Available Resources
- Not all services available in all regions
- Check “Products by Region” documentation
3. Security and Compliance
- Government clouds: Special regions for government data
- Data sovereignty: Keep data in specific countries
- Compliance: GDPR, HIPAA, etc.
4. Speed (Latency)
- Pick closest region to your users
- Lower latency = better user experience
Availability Zones
Definition: Physically separate locations within each region
Purpose: Tolerance to local failures
- Earthquakes
- Floods
- Fires
- Power outages
Characteristics:
- Redundancy and logical isolation
- High-performance network between zones
- Round-trip latency < 2ms (Azure)
┌─────────────────────────────────────┐
│ Region: East US │
│ ┌───────────┐ ┌───────────┐ │
│ │ Zone 1 │ │ Zone 2 │ │
│ │ (DC 1,2) │ │ (DC 3,4) │ │
│ └───────────┘ └───────────┘ │
│ ┌───────────┐ │
│ │ Zone 3 │ │
│ │ (DC 5,6) │ │
│ └───────────┘ │
└─────────────────────────────────────┘
Unique Features
Google Cloud Platform
Cloud Spanner:
- Planet-scale distributed database
- Strong consistency (CP system)
- Horizontal scalability
Tensor Processing Unit (TPU):
- Custom hardware for deep learning
- Faster than GPUs for specific workloads
BigQuery:
- Serverless data warehouse
- Analyze petabytes in seconds
AWS
Absurdly large feature set:
- 200+ services
- Most mature ecosystem
FPGAs (Field-Programmable Gate Arrays):
- Custom hardware acceleration
- F1 instances
Global Infrastructure:
- Most regions and availability zones
Azure
Hybrid Cloud:
- Azure Stack (on-premises Azure)
- Azure Arc (manage resources anywhere)
Enterprise Integration:
- Active Directory integration
- Office 365 integration
- Strong Windows support
Cloud Security
Key Security Features
Data Storage
- Encryption at rest: All providers encrypt stored data
- Encryption in transit: HTTPS/TLS for data transfer
- Regulatory standards: Compliance with GDPR, HIPAA, SOC 2, etc.
Compliance
- Certifications: ISO 27001, PCI DSS, FedRAMP
- Audit logs: Track all access and changes
- Data residency: Control where data is stored
Data Migration
- Secure transfer: How to move sensitive data?
- Physical transfer: AWS Snowball, Azure Data Box
- Network transfer: VPN, Direct Connect, ExpressRoute
Cloud Permissions
- IAM (Identity and Access Management)
- Role-based access control (RBAC)
- Principle of least privilege
- Example: Students don’t get sudo access!
DDoS Mitigation
- AWS Shield: DDoS protection
- Azure DDoS Protection: Network security
- GCP Cloud Armor: Web application firewall
High Scalability
- Auto-scaling with security settings
- Load balancing across zones
- Redundancy for high availability
Top Benefits of Cloud Computing
- Cost Efficiency
- Pay only for what you use
- No upfront capital expenditure
- Economies of scale
- Scalability
- Scale up/down on demand
- Handle traffic spikes
- Global reach
- Reliability
- High availability (99.9%+ uptime)
- Disaster recovery
- Automatic backups
- Performance
- Latest hardware
- Global CDN
- Low latency
- Security
- Professional security teams
- Compliance certifications
- Regular updates
- Innovation
- Access to latest technologies
- AI/ML services
- Managed services
Choosing a Cloud Provider
Decision Factors
Existing Infrastructure:
- Already using Microsoft? → Azure
- Already using Google Workspace? → GCP
- Need broadest service selection? → AWS
Specific Requirements:
- Best data analytics? → GCP (BigQuery)
- Best enterprise integration? → Azure
- Most mature ecosystem? → AWS
Cost:
- Use pricing calculators
- Consider egress fees
- Look for committed use discounts
Skills:
- Team expertise
- Training availability
- Community support
Multi-Cloud Strategy
Many organizations use multiple clouds:
Benefits:
- ✓ Avoid vendor lock-in
- ✓ Use best service from each provider
- ✓ Geographic coverage
Challenges:
- ⚠ Increased complexity
- ⚠ Higher management overhead
- ⚠ Data transfer costs
Summary
Cloud providers offer similar core services with unique strengths:
- AWS: Largest, most mature, broadest service selection
- Azure: Best for enterprises, hybrid cloud, Microsoft integration
- GCP: Innovation leader, best data analytics, Kubernetes expertise
Key Considerations:
- Feature parity across providers
- Regional availability and compliance
- Security and compliance features
- Cost optimization strategies
Choose based on your specific requirements, existing infrastructure, and team expertise.