Big Data has become a cornerstone of modern computing, enabling organizations to extract insights from massive datasets that were previously impossible to process. This lecture covers the fundamentals of big data and the platforms that make big data processing possible.
Big Data refers to:
[!NOTE] It’s not just about size
Big data is characterized by multiple dimensions beyond just volume.
┌─────────────────────────────────────────┐
│  Big Data = Volume + Variety +          │
│             Velocity + Veracity         │
└─────────────────────────────────────────┘
Data generated by human activities:
Data from monitoring equipment and environments:
Data from advanced instruments:
Purpose: Business/system operations
┌─────────────────────────────────┐
│  Operational Database           │
│  - Current state                │
│  - Frequent updates             │
│  - Transaction focused          │
│  - Normalized schema            │
└─────────────────────────────────┘
Purpose: Understanding and optimization
┌─────────────────────────────────┐
│  Analytical Data Warehouse      │
│  - Historical data              │
│  - Read-optimized               │
│  - Aggregated views             │
│  - Denormalized schema          │
└─────────────────────────────────┘
Both types matter: Big data platforms must handle both operational and analytical workloads.
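The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a recommended architecture: the table names (`customers`, `orders`, `sales_by_region`) and the sample rows are assumptions made up for the example. The operational side keeps normalized tables and applies a small update to the current state; the analytical side materializes a denormalized, aggregated table that is only read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational side: normalized tables that hold the current state.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "north"), (2, "south")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1, 20.0, "paid"), (2, 1, 5.0, "open"), (3, 2, 12.5, "paid")],
)

# Typical operational workload: a small, frequent, transaction-focused update.
conn.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (2,))
conn.commit()

# Analytical side: a denormalized, aggregated table built for reading.
conn.execute(
    """
    CREATE TABLE sales_by_region AS
    SELECT c.region AS region,
           COUNT(*) AS n_orders,
           SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.status = 'paid'
    GROUP BY c.region
    """
)
conn.commit()

report = conn.execute(
    "SELECT region, n_orders, revenue FROM sales_by_region ORDER BY region"
).fetchall()
print(report)
```

In a real platform the two sides usually live in separate systems; a single in-memory database is used here only to keep the sketch self-contained.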
Current hot topics: Large Language Models (LLMs) / Generative AI
Top-down perspective: Data economy
More data → More insights → Better decisions → Business success
Bottom-up perspective: Optimization
Understanding → Optimizing → Saving cost / Creating value
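Key Principle: With more data, the same algorithm often performs much better, and a simple model trained on far more data can outperform a more elaborate model trained on less.
Source: Halevy, Norvig, and Pereira (2009), “The Unreasonable Effectiveness of Data”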
This principle shows that:
From “Platform Revolution” (Parker, Van Alstyne, and Choudary, 2016):
“A platform is a business based on enabling value-creating interactions between external producers and consumers. The platform provides an open, participative infrastructure for these interactions and sets governance conditions for them.”
Big data platforms are large-scale service platforms that:
Not just: A database or data marketplace (even if big!)
Concept: Data is valuable and must be managed and exploited
Concept: Product thinking for data processing and delivery
Modern approach: Combine both perspectives for effective data management
┌─────────────────────────────────────────┐
│  Big Data Services & Applications       │
│  (Analytics, ML, BI, Dashboards)        │
├─────────────────────────────────────────┤
│  Middleware Platforms                   │
│  (Building, deploying, operating        │
│   reliable big data services)           │
├─────────────────────────────────────────┤
│  Data-Centric Virtualized               │
│  Infrastructures                        │
│  (Compute, Storage, Network)            │
├─────────────────────────────────────────┤
│  Consumers & Producers                  │
│  (Sensors, Things, People, Processes)   │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│  Data Sources                           │
│  (IoT, B2B/B2C, Transport, etc.)        │
└────────────┬────────────────────────────┘
             │
┌────────────▼────────────────────────────┐
│  Core Services                          │
│  ┌──────────┐    ┌──────────┐           │
│  │   Data   │    │   Data   │           │
│  │  Ingest  │    │  Store   │           │
│  └──────────┘    └──────────┘           │
│  ┌──────────┐    ┌──────────┐           │
│  │   Data   │    │   Data   │           │
│  │Processing│    │  Query   │           │
│  └──────────┘    └──────────┘           │
└────────────┬────────────────────────────┘
             │
┌────────────▼────────────────────────────┐
│  Analytics & Applications               │
│  - ML Algorithms & Pipelines            │
│  - Visualization                        │
│  - Business Applications                │
└─────────────────────────────────────────┘
Data Source → Ingestion → Raw Data Store
                                ↓
                          Data Processing
                                ↓
                          Analytical Store → Visualization
                                ↓
                          Machine Learning → Model Serving
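The stages in this flow can be sketched as plain Python functions. This is a toy illustration under stated assumptions: the function names (`ingest`, `process`, `visualize`, `train`), the JSON record shape with `sensor` and `value` fields, and the threshold "model" are all hypothetical and stand in for real ingestion, storage, processing, and ML services.

```python
import json
from collections import defaultdict
from statistics import mean


def ingest(source_lines):
    """Ingestion: parse JSON lines into the raw data store, skipping bad input."""
    raw_store = []
    for line in source_lines:
        try:
            raw_store.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a real platform would route this to a dead-letter store
    return raw_store


def process(raw_store):
    """Data processing: aggregate raw readings per sensor (the analytical store)."""
    by_sensor = defaultdict(list)
    for record in raw_store:
        by_sensor[record["sensor"]].append(record["value"])
    return {
        sensor: {"count": len(values), "mean": mean(values)}
        for sensor, values in by_sensor.items()
    }


def visualize(analytical_store):
    """Visualization: render one text bar per sensor."""
    return [
        f"{sensor}: {'#' * round(stats['mean'])}"
        for sensor, stats in sorted(analytical_store.items())
    ]


def train(analytical_store):
    """Machine learning placeholder: learn a threshold, return a servable model."""
    threshold = mean([stats["mean"] for stats in analytical_store.values()])
    return lambda value: value > threshold


source = [
    '{"sensor": "a", "value": 2}',
    '{"sensor": "a", "value": 4}',
    "not json",
    '{"sensor": "b", "value": 9}',
]
raw = ingest(source)
analytical = process(raw)
bars = visualize(analytical)
model = train(analytical)  # model serving: call model(new_value)
print(bars, model(7), model(5))
```

Each function maps to one box in the flow, so any stage can be replaced by a real service without changing the shape of the pipeline.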
┌─────────────────────────────────────────┐
│  You (Big Data Platform Expert)         │
│                                         │
│  ┌─────────────┐  ┌─────────────┐       │
│  │Programming  │  │ Data Mgmt   │       │
│  │Models &     │  │ Models &    │       │
│  │Frameworks   │  │ Tools       │       │
│  └─────────────┘  └─────────────┘       │
│                                         │
│  ┌─────────────────────────────┐        │
│  │  Large-Scale Computing      │        │
│  │  Platforms (Service-Based)  │        │
│  └─────────────────────────────┘        │
│                                         │
│  ┌─────────────────────────────┐        │
│  │  Provisioning, Automation   │        │
│  │  and Analytics Processes    │        │
│  └─────────────────────────────┘        │
└─────────────────────────────────────────┘
Data     →  Analysis →  Result
 ↓             ↓          ↓
Where?      Price?      Quality?
Privacy?    Ethics?     Compliance?
Characteristics:
Example:
Daily customer transactions → Batch ETL → Data warehouse
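A batch ETL job of this kind might look like the following sketch, which uses only the standard library. The CSV layout, the `fact_sales` table name, and the rule of dropping rows with a non-numeric amount are assumptions chosen for illustration; an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical daily export from the transactional system.
RAW_CSV = """date,customer,amount
2024-01-01,alice,10.00
2024-01-01,bob,5.50
2024-01-02,alice,not_a_number
2024-01-02,carol,7.25
"""


def extract(text):
    """Extract: read the whole batch of rows from the CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast amounts to float and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            clean.append((row["date"], row["customer"], float(row["amount"])))
        except ValueError:
            continue
    return clean


def load(conn, rows):
    """Load: append the cleaned batch to the warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(date TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(RAW_CSV)))

daily = warehouse.execute(
    "SELECT date, SUM(amount) FROM fact_sales GROUP BY date ORDER BY date"
).fetchall()
print(daily)
```

The job processes the full day's file at once, which is what distinguishes it from the record-at-a-time handling of a stream.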
Characteristics:
Example:
IoT sensor data → Stream processing → Real-time alerts
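As a rough sketch of this pattern, the generator below consumes readings one at a time and emits an alert whenever the average over a sliding window exceeds a threshold. The function name `alerts`, the window size, the threshold, and the sample readings are illustrative assumptions; a production system would typically use a dedicated stream processing framework instead.

```python
from collections import deque


def alerts(readings, window=3, threshold=30.0):
    """Yield (timestamp, window_average) whenever a full window exceeds threshold."""
    recent = deque(maxlen=window)  # oldest reading is dropped automatically
    for timestamp, value in readings:
        recent.append(value)
        average = sum(recent) / len(recent)
        if len(recent) == window and average > threshold:
            yield (timestamp, average)


# Hypothetical (timestamp, temperature) readings arriving in order.
stream = [(1, 20.0), (2, 25.0), (3, 30.0), (4, 40.0), (5, 50.0), (6, 10.0)]
fired = list(alerts(stream))
print([timestamp for timestamp, _ in fired])
```

Because `alerts` is a generator, each alert is produced as soon as the triggering reading arrives, without waiting for the rest of the input.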
Characteristics:
Example:
User clicks → Real-time recommendation engine
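One simple way to sketch such an engine is to update "clicked next" counts on every event and serve recommendations from those counts immediately. The class `ClickRecommender`, its method names, and the sample clicks are hypothetical; the co-occurrence counting shown here is a deliberately simple stand-in for whatever model a real system would use.

```python
from collections import Counter, defaultdict


class ClickRecommender:
    """Online recommender based on which item users click next."""

    def __init__(self):
        self.last_item = {}                      # user -> most recent item
        self.next_counts = defaultdict(Counter)  # item -> Counter of next items

    def observe(self, user, item):
        """Update the counts incrementally as each click event arrives."""
        previous = self.last_item.get(user)
        if previous is not None and previous != item:
            self.next_counts[previous][item] += 1
        self.last_item[user] = item

    def recommend(self, item, k=2):
        """Return up to k items most often clicked right after `item`."""
        counts = self.next_counts.get(item, Counter())
        return [next_item for next_item, _ in counts.most_common(k)]


engine = ClickRecommender()
clicks = [
    ("u1", "shoes"), ("u1", "socks"),
    ("u2", "shoes"), ("u2", "socks"),
    ("u3", "shoes"), ("u3", "laces"),
]
for user, item in clicks:
    engine.observe(user, item)

print(engine.recommend("shoes"))
```

The state is updated per event rather than recomputed in a batch, so a recommendation requested right after a click already reflects it.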
Big data platforms are essential infrastructure for modern data-driven applications:
Understanding big data platforms is fundamental for building scalable, data-intensive applications.