
Design Load Balancer

What is a Load Balancer?

A Load Balancer is a critical infrastructure component that distributes incoming network traffic across multiple servers to ensure no single server becomes overwhelmed.


The core idea is straightforward: instead of routing all traffic to one server (which would eventually crash under heavy load), a load balancer acts as a traffic cop, intelligently spreading requests across a pool of healthy servers. This improves application availability, responsiveness, and overall system reliability.

Popular Examples: AWS Elastic Load Balancer (ELB), NGINX, HAProxy

In this chapter, we will explore the high-level design of a Load Balancer.

Load balancers appear in almost every distributed system architecture. Understanding how to design one from scratch demonstrates deep knowledge of networking, high availability, and system scalability.

Let's start by clarifying the requirements.

1. Clarifying Requirements

Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.

Here is an example of how a discussion between the candidate and the interviewer might unfold:

Discussion

Candidate: "What type of traffic should the load balancer handle? Are we focusing on HTTP/HTTPS traffic, or should it support any TCP/UDP traffic?"

Interviewer: "Let's design a general-purpose load balancer that supports both Layer 4 (TCP/UDP) and Layer 7 (HTTP/HTTPS) load balancing."

Candidate: "What is the expected scale? How many requests per second should the system handle?"

Interviewer: "The load balancer should handle up to 1 million requests per second at peak traffic."

Candidate: "How should we handle server failures? Should the load balancer automatically detect and route around unhealthy servers?"

Interviewer: "Yes, health checking is critical. The system should detect failures within seconds and stop routing traffic to unhealthy servers."

Candidate: "Do we need to support session persistence, where requests from the same client go to the same backend server?"

Interviewer: "Yes, sticky sessions should be supported for stateful applications, but it should be configurable."

Candidate: "What about SSL/TLS termination? Should the load balancer handle encryption?"

Interviewer: "Yes, SSL termination at the load balancer is required to offload encryption work from backend servers."

Candidate: "What ailability target should we aim for?"

Interviewer: "The load balancer itself must be highly ailable with 99.99% uptime, since it's on the critical path for all traffic."

After gathering the details, we can summarize the key system requirements.

1.1 Functional Requirements

- Traffic Distribution: Distribute incoming requests across multiple backend servers using configurable algorithms.
- Health Checking: Continuously monitor backend servers and automatically remove unhealthy ones from the pool.
- Session Persistence: Support sticky sessions to route requests from the same client to the same server.
- SSL Termination: Handle SSL/TLS encryption and decryption to offload work from backend servers.
- Layer 4 and Layer 7 Support: Support both transport-level (TCP/UDP) and application-level (HTTP/HTTPS) load balancing.

1.2 Non-Functional Requirements

- High Availability: The load balancer must be highly available (99.99% uptime) with no single point of failure.
- Low Latency: Should add minimal latency to requests (< 1 ms overhead).
- High Throughput: Handle up to 1 million requests per second at peak.
- Scalability: Should scale horizontally to handle increasing traffic.
- Fault Tolerance: Continue operating even when individual components fail.

2. Back-of-the-Envelope Estimation

To understand the scale of our system, let's make some reasonable assumptions.

Traffic

- Peak requests: 1 million RPS (requests per second)
- Average requests: ~300,000 RPS (assuming peak is ~3x average)
- Concurrent connections: ~500,000 (assuming an average connection duration of 500 ms)

Bandwidth

- Average request size: ~2 KB (headers + small payload)
- Average response size: ~10 KB
- Ingress bandwidth: 1M RPS × 2 KB = 2 GB/s
- Egress bandwidth: 1M RPS × 10 KB = 10 GB/s

Health Checks

- Backend servers: 1,000 servers across multiple data centers
- Health check interval: 5 seconds
- Health check traffic: 1,000 servers × (1 check / 5 s) = 200 health checks/second

Connection Table

Each connection requires state tracking:

- Per-connection memory: ~500 bytes (source IP, port, destination, timestamps, etc.)
- Memory for all connections: 500,000 × 500 bytes = 250 MB

These numbers indicate we need a system capable of handling massive throughput with minimal memory overhead per connection.
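These figures are easy to sanity-check in code. The sketch below redoes the arithmetic, including the Little's-law relationship (concurrent connections ≈ arrival rate × average duration) that the 500,000-connection estimate implies:

```python
# Back-of-the-envelope sanity check for the estimates above.

peak_rps = 1_000_000          # peak requests per second
avg_duration_s = 0.5          # average connection duration (500 ms)
req_kb, resp_kb = 2, 10       # average request / response sizes
per_conn_bytes = 500          # tracked state per connection
backends, hc_interval_s = 1_000, 5

# Little's law: concurrent connections = arrival rate x average duration
concurrent = peak_rps * avg_duration_s               # 500,000

ingress_gb_s = peak_rps * req_kb / 1_000_000         # 2.0 GB/s
egress_gb_s = peak_rps * resp_kb / 1_000_000         # 10.0 GB/s
conn_table_mb = concurrent * per_conn_bytes / 1e6    # 250.0 MB
health_checks_s = backends / hc_interval_s           # 200.0 per second

print(concurrent, ingress_gb_s, egress_gb_s, conn_table_mb, health_checks_s)
```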

3. Core APIs

A load balancer exposes both a data plane (handling actual traffic) and a control plane (configuration and management). Below are the core APIs.

1. Register Backend Server

Endpoint: POST /api/v1/backends

Adds a new backend server to the load balancer pool.

Request Parameters:

- address (required): IP address or hostname of the backend server.
- port (required): Port number the backend is listening on.
- weight (optional): Weight for weighted load balancing (default: 1).
- health_check_path (optional): HTTP path for health checks (default: /health).

Sample Response:

- backend_id: Unique identifier for the registered backend.
- status: Current status (healthy/unhealthy/unknown).

Error Cases:

- 400 Bad Request: Invalid address or port.
- 409 Conflict: Backend already registered.

2. Remove Backend Server

Endpoint: DELETE /api/v1/backends/{backend_id}

Removes a backend server from the pool. Existing connections are gracefully drained.

Response:

- 200 OK: Backend removed successfully.
- 404 Not Found: Backend ID does not exist.

3. Get Backend Health Status

Endpoint: GET /api/v1/backends/{backend_id}/health

Returns the current health status and metrics for a specific backend.

Sample Response:

- status: Current health status.
- last_check: Timestamp of last health check.
- response_time_ms: Average response time.
- active_connections: Number of active connections.

4. Configure Load Balancing Algorithm

Endpoint: PUT /api/v1/config/algorithm

Sets the load balancing algorithm for traffic distribution.

Request Parameters:

- algorithm (required): One of round_robin, weighted_round_robin, least_connections, ip_hash, random.
- sticky_sessions (optional): Enable session persistence (default: false).
- sticky_ttl_seconds (optional): TTL for sticky session cookies.

Error Cases:

- 400 Bad Request: Invalid algorithm name.
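To make the control plane concrete, here is a hypothetical client session against these endpoints using the requests library; the base URL, port, and exact response fields are assumptions consistent with the API sketch above:

```python
import requests

BASE = "http://lb-admin.internal:8080/api/v1"  # hypothetical control-plane address

# Register a backend (POST /api/v1/backends)
resp = requests.post(f"{BASE}/backends", json={
    "address": "10.0.1.15",
    "port": 8080,
    "weight": 3,                      # gets 3x the traffic of a weight-1 server
    "health_check_path": "/health",
})
resp.raise_for_status()
backend_id = resp.json()["backend_id"]

# Switch the pool to weighted round robin with sticky sessions
# (PUT /api/v1/config/algorithm)
requests.put(f"{BASE}/config/algorithm", json={
    "algorithm": "weighted_round_robin",
    "sticky_sessions": True,
    "sticky_ttl_seconds": 3600,
}).raise_for_status()

# Later: drain and remove the backend (DELETE /api/v1/backends/{backend_id})
requests.delete(f"{BASE}/backends/{backend_id}").raise_for_status()
```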

4. High-Level Design

At a high level, our load balancer must satisfy these core requirements:

- Traffic Distribution: Route incoming requests to healthy backend servers.
- Health Monitoring: Detect and isolate unhealthy servers.
- High Availability: Remain operational even if load balancer nodes fail.

The architecture can be broken down into a data plane (handles actual traffic) and a control plane (manages configuration and health).

Note

Instead of presenting the full architecture at once, we'll build it incrementally by addressing one requirement at a time.

4.1 Requirement 1: Traffic Distribution

The primary function is accepting client connections and forwarding them to backend servers.

Components Needed

Frontend Listener

The entry point for all client traffic. It accepts incoming connections on configured ports (e.g., 80 for HTTP, 443 for HTTPS).

Responsibilities:

- Accept TCP connections from clients.
- Parse protocol headers (for Layer 7).
- Hand off connections to the routing engine.

Routing Engine

The brain of the load balancer. It decides which backend server should handle each request.

Responsibilities:

- Maintain a list of available backend servers.
- Apply the configured load balancing algorithm.
- Track connection counts per backend (for the least-connections algorithm).

Backend Pool

A logical group of backend servers that can handle the same type of requests.

Responsibilities:

- Store backend server metadata (address, port, weight).
- Track health status of each backend.
- Support multiple pools for different services.

Flow: Routing a Request

1. Client sends a request to the load balancer's public IP.
2. The Frontend Listener accepts the connection.
3. The Routing Engine selects a backend using the configured algorithm.
4. The request is forwarded to the selected Backend Server.
5. The backend processes the request and sends the response.
6. The load balancer forwards the response back to the client.

4.2 Requirement 2: Health Monitoring

Without health checks, the load balancer would continue sending traffic to crashed or overloaded servers, causing user-facing errors.

Additional Components Needed

Health Checker

A background service that continuously monitors the health of all backend servers.

Responsibilities:

- Send periodic health probes to each backend.
- Track success/failure history.
- Update backend status (healthy/unhealthy).
- Notify the routing engine of status changes.

Health Check Types

| Type | How It Works | Use Case |
|------|--------------|----------|
| TCP Check | Attempts TCP connection | Basic connectivity |
| HTTP Check | Sends HTTP request, expects 2xx | Web applications |
| Custom Script | Runs user-defined check | Complex health logic |

Flow: Health Check Process

1. Health Checker sends probes to each backend at configured intervals (e.g., every 5 seconds).
2. Backends 1 and 3 respond successfully (marked healthy).
3. Backend 2 fails to respond within the timeout (marked unhealthy).
4. Health Checker notifies the Routing Engine of the status change.
5. Routing Engine removes Backend 2 from active rotation.
6. Traffic is distributed only to healthy backends (1 and 3).

4.3 Requirement 3: High Availability

A single load balancer is a single point of failure. If it crashes, all traffic stops.

Additional Components Needed

Multiple LB Nodes

Deploy multiple load balancer instances that can handle traffic independently.

Virtual IP (VIP)

A floating IP address that can be moved between LB nodes. Clients connect to the VIP, not individual LB IPs.

Failover Manager

Coordinates which LB node is active and handles failover when the primary fails.

High Availability Patterns

Active-Passive (Failover)

- One LB handles all traffic (active).
- A standby LB monitors the active via heartbeats.
- If the active fails, the standby takes over the VIP within seconds.

Pros: Simple, no state synchronization needed. Cons: Standby resources are wasted during normal operation.

Active-Active

- Multiple LB nodes handle traffic simultaneously.
- DNS or an upstream router distributes traffic across LB nodes.
- If one LB fails, the others continue handling traffic.

Pros: Better resource utilization, higher throughput. Cons: Requires state synchronization for sticky sessions.

4.4 Putting It All Together

Here is the complete architecture combining all requirements:

Core Components Summary

| Component | Purpose |
|-----------|---------|
| Virtual IP / DNS | Single entry point for clients |
| LB Nodes | Accept and route traffic to backends |
| Session Store | Shared state for sticky sessions (Redis) |
| Health Checker | Monitor backend health |
| Config Manager | Manage LB configuration |
| Backend Pool | Group of application servers |

5. Database Design

A load balancer is primarily an in-memory, real-time system. It does not typically use a traditional database for the data plane. However, the control plane needs persistent storage for configuration.

5.1 Storage Considerations

| Data Type | Storage | Reason |
|-----------|---------|--------|
| Active connections | In-memory (LB node) | Ultra-low latency required |
| Backend server list | In-memory + config store | Fast lookups, persistent config |
| Health status | In-memory | Changes frequently, needs sub-second access |
| Session mappings | Redis/Memcached | Shared across LB nodes |
| Configuration | etcd/Consul/PostgreSQL | Persistent, versioned |
| Metrics/Logs | Time-series DB (InfluxDB, Prometheus) | Historical analysis |

5.2 Configuration Schema

Backend Servers Table

| Field | Type | Description |
|-------|------|-------------|
| backend_id | String (PK) | Unique identifier |
| pool_id | String (FK) | Backend pool this server belongs to |
| address | String | IP address or hostname |
| port | Integer | Port number |
| weight | Integer | Weight for weighted algorithms |
| max_connections | Integer | Connection limit |
| enabled | Boolean | Whether backend is enabled |
| created_at | Timestamp | Creation time |

Backend Pools Table

| Field | Type | Description |
|-------|------|-------------|
| pool_id | String (PK) | Unique identifier |
| name | String | Human-readable name |
| algorithm | Enum | Load balancing algorithm |
| health_check_path | String | HTTP path for health checks |
| health_check_interval | Integer | Seconds between checks |
| sticky_sessions | Boolean | Enable session persistence |
| sticky_ttl | Integer | Session cookie TTL |

Session Mappings (Redis)

Sticky-session mappings are simple key-value entries in Redis (for example, session:{session_id} → backend_id) with a TTL matching the pool's sticky_ttl.

6. Design Deep Dive

Now that we have the high-level architecture in place, let's dive deeper into some critical design choices.

6.1 Load Balancing Algorithms

The choice of load balancing algorithm significantly impacts traffic distribution, backend utilization, and overall system performance.

A good algorithm should:

- Distribute load evenly across healthy backends.
- Minimize response time by avoiding overloaded servers.
- Support various use cases (stateless, stateful, heterogeneous servers).

Let's explore the primary approaches.

Approach 1: Round Robin

The simplest algorithm. Requests are distributed sequentially across all available backends.

How It Works

Maintain a counter that increments with each request. Select the backend at index counter % number_of_backends.

Pros

- Simple to implement: Just a counter and a modulo operation.
- Fair distribution: Each backend gets equal traffic over time.
- No state required: Works independently across LB nodes.

Cons

- Ignores server capacity: A weak server gets the same load as a powerful one.
- Ignores current load: Doesn't consider existing connections or response times.
- Not ideal for variable request costs: A heavy request counts the same as a light one.

Best For: Homogeneous backends with similar capacity and stateless requests.
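A minimal sketch of this counter-and-modulo selection (class and names are illustrative; thread safety and pool changes are ignored):

```python
import itertools

class RoundRobin:
    """Sequentially cycle through a fixed list of backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._counter = itertools.count()   # monotonically increasing counter

    def select(self):
        return self.backends[next(self._counter) % len(self.backends)]

rr = RoundRobin(["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"])
print([rr.select() for _ in range(6)])
# ['10.0.1.1:8080', '10.0.1.2:8080', '10.0.1.3:8080', '10.0.1.1:8080', ...]
```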

Approach 2: Weighted Round Robin

An extension of round robin that accounts for different server capacities.

How It Works

Each backend is assigned a weight proportional to its capacity. Servers with higher weights receive more requests.

Implementation

Maintain a weighted list or use algorithms like Smooth Weighted Round Robin to avoid bursts to high-weight servers.

Pros

- Respects server capacity: Powerful servers handle more traffic.
- Simple configuration: Just assign weights based on server specs.

Cons

- Static weights: Doesn't adapt to runtime conditions.
- Manual tuning: Weights must be configured correctly.

Best For: Heterogeneous server pools with known capacity differences.
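The smooth variant (the scheme NGINX uses) can be sketched in a few lines; the class and names are my own. Each backend accumulates a running score, and the winner of each round is pushed back by the total weight, which interleaves heavy servers instead of bursting them:

```python
class SmoothWRR:
    """Smooth weighted round robin: spreads a high-weight server's
    turns across the cycle rather than scheduling them back to back."""

    def __init__(self, weights):              # weights: {backend: weight}
        self.weights = dict(weights)
        self.current = {b: 0 for b in weights}

    def select(self):
        total = sum(self.weights.values())
        for backend, weight in self.weights.items():
            self.current[backend] += weight   # everyone accumulates its weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total           # the winner pays the total back
        return best

wrr = SmoothWRR({"big": 5, "med": 1, "small": 1})
print([wrr.select() for _ in range(7)])
# ['big', 'big', 'med', 'big', 'small', 'big', 'big'] (no 5-in-a-row burst)
```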

Approach 3: Least Connections

Routes each new request to the backend with the fewest active connections.

How It Works

Track the number of active connections per backend. When a new request arrives, select the backend with the minimum count.

Pros

- Adapts to load: Naturally balances based on current state.
- Handles slow requests: Servers stuck on slow requests get fewer new ones.

Cons

- Requires connection tracking: Must maintain state across all LB nodes.
- Cold start issue: New backends may get overwhelmed initially.

Best For: Workloads with varying request processing times.

Approach 4: Weighted Least Connections

Combines least connections with server weights.

How It Works

Select the backend with the lowest ratio of active_connections / weight.

Pros

- Best of both worlds: Considers both capacity and current load.
- Optimal utilization: Keeps all servers proportionally loaded.

Cons

- More complex: Requires tracking connections and weights.

Best For: Production environments with heterogeneous servers.
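A sketch of the selection rule, assuming the LB increments a counter when it opens a backend connection and decrements it on close (names are illustrative):

```python
class WeightedLeastConnections:
    """Pick the backend with the lowest active_connections / weight ratio."""

    def __init__(self, weights):               # weights: {backend: weight}
        self.weights = dict(weights)
        self.active = {b: 0 for b in weights}  # live connection counts

    def acquire(self):
        best = min(self.weights, key=lambda b: self.active[b] / self.weights[b])
        self.active[best] += 1                 # connection opened
        return best

    def release(self, backend):
        self.active[backend] -= 1              # connection closed

lb = WeightedLeastConnections({"a": 3, "b": 1})
print([lb.acquire() for _ in range(4)])
# ['a', 'b', 'a', 'a'] (load converges toward the 3:1 weight ratio)
```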

Approach 5: IP Hash (Source Hashing)

Routes requests based on a hash of the client's IP address.

How It Works

The same client IP always routes to the same backend (assuming the backend pool doesn't change).

Pros

- Session affinity without cookies: Achieves stickiness at the network level.
- No shared state: Each LB can compute independently.

Cons

- Uneven distribution: Some IP ranges may cluster on one backend.
- Disruption on pool changes: Adding/removing backends reshuffles mappings.

Best For: Simple session persistence without application-level changes.
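A sketch of the mapping. It uses a digest-based hash because Python's built-in hash() is randomized per process, which would break stickiness across LB restarts:

```python
import hashlib

def ip_hash_select(client_ip, backends):
    """Deterministically map a client IP to one backend."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

backends = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]
print(ip_hash_select("203.0.113.7", backends))  # same IP, same backend, every time
```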

Approach 6: Consistent Hashing

An advanced form of hashing that minimizes disruption when backends are added or removed.

How It Works

- Backends are placed on a virtual hash ring based on their identifiers.
- Each request's key (e.g., client IP) is hashed to a point on the ring.
- The request routes to the first backend found clockwise on the ring.

Pros

- Minimal disruption: Adding/removing a backend only affects a small portion of requests.
- Better distribution: Virtual nodes spread load evenly.

Cons

- More complex: Requires maintaining the hash ring structure.
- Hotspot potential: Without virtual nodes, distribution can be uneven.

Best For: Systems where backends frequently scale up/down.
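A compact ring sketch with virtual nodes (the replica count and key scheme are illustrative). The last lines demonstrate the headline property: adding a fourth backend remaps only roughly a quarter of the keys:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hash ring with virtual nodes per backend."""

    def __init__(self, backends, replicas=100):
        self.replicas = replicas
        self._ring = []                      # sorted list of (point, backend)
        for backend in backends:
            self.add(backend)

    def add(self, backend):
        for i in range(self.replicas):       # place `replicas` virtual nodes
            bisect.insort(self._ring, (_hash(f"{backend}#{i}"), backend))

    def remove(self, backend):
        self._ring = [(p, b) for p, b in self._ring if b != backend]

    def select(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["b1", "b2", "b3"])
keys = [f"10.0.0.{i}" for i in range(100)]
before = {k: ring.select(k) for k in keys}
ring.add("b4")
moved = sum(before[k] != ring.select(k) for k in keys)
print(f"{moved}/100 keys remapped after adding b4")  # roughly 25, not 100
```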

Summary and Recommendation

| Algorithm | Pros | Cons | Best For |
|-----------|------|------|----------|
| Round Robin | Simple, stateless | Ignores capacity | Homogeneous backends |
| Weighted Round Robin | Respects capacity | Static weights | Known capacity differences |
| Least Connections | Adapts to load | Requires state | Variable request times |
| Weighted Least Conn | Optimal utilization | Complex | Production environments |
| IP Hash | Simple stickiness | Uneven distribution | Basic session persistence |
| Consistent Hash | Minimal disruption | Complex setup | Dynamic scaling |

Recommendation: For most production systems, Weighted Least Connections provides the best balance of adaptability and efficiency. Use Consistent Hashing when backends scale frequently or for cache-aware routing.

6.2 Health Checking Strategies

Health checks are the foundation of reliable load balancing. Without proper health monitoring, the load balancer would continue routing traffic to failed servers.

Health Check Parameters

| Parameter | Description | Typical Value |
|-----------|-------------|---------------|
| Interval | Time between checks | 5-10 seconds |
| Timeout | Max wait for response | 2-3 seconds |
| Healthy Threshold | Consecutive passes to mark healthy | 2-3 |
| Unhealthy Threshold | Consecutive failures to mark unhealthy | 2-3 |

Health Check Types

1. TCP Health Check

Simply attempts to establish a TCP connection.

Pros: Simple, low overhead, works for any TCP service. Cons: Doesn't verify application is actually working.

2. HTTP Health Check

Sends an HTTP request and validates the response.

Pros: Verifies application is responding correctly. Cons: Higher overhead, requires health endpoint.
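To make the interval/timeout/threshold parameters concrete, here is a minimal HTTP health-checker sketch using only the standard library (the URL and defaults are illustrative). The thresholds provide flap damping: a backend changes state only after several consecutive results agree:

```python
import time
import urllib.request

class HealthChecker:
    """Probe one backend's health endpoint and apply pass/fail thresholds."""

    def __init__(self, url, interval=5, timeout=2,
                 healthy_threshold=2, unhealthy_threshold=3):
        self.url = url                        # e.g. "http://10.0.1.5:8080/health"
        self.interval, self.timeout = interval, timeout
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._passes = self._failures = 0

    def probe_once(self):
        try:
            with urllib.request.urlopen(self.url, timeout=self.timeout) as resp:
                ok = 200 <= resp.status < 300           # any 2xx is a pass
        except Exception:
            ok = False                                  # timeout, refused, 5xx...
        if ok:
            self._passes, self._failures = self._passes + 1, 0
            if not self.healthy and self._passes >= self.healthy_threshold:
                self.healthy = True          # notify routing engine: add back
        else:
            self._failures, self._passes = self._failures + 1, 0
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False         # notify routing engine: remove

    def run(self):
        while True:                          # one loop per backend in practice
            self.probe_once()
            time.sleep(self.interval)
```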

3. Custom Script Check

Runs a user-defined script or command.

Pros: Maximum flexibility for complex checks. Cons: Higher complexity and security considerations.

Graceful Degradation

When a backend fails health checks:

1. Connection Draining: Allow existing connections to complete.
2. Remove from Pool: Stop routing new requests to the backend.
3. Alert Operations: Notify monitoring systems.
4. Auto-Recovery: Return the backend to the pool after it passes health checks.

6.3 Session Persistence (Sticky Sessions)

Some applications require that all requests from a client go to the same backend server, typically when session state is stored locally on the server.

The Problem

Without stickiness, consecutive requests from the same client can land on different backends; any session state stored locally on the first server (login status, a shopping cart) is invisible to the second, and the user is effectively logged out.

Approaches to Session Persistence

Approach 1: Cookie-Based Stickiness

The load balancer injects a cookie containing the backend identifier.

1. Client's first request → LB routes to Backend 1.
2. LB adds a cookie: Set-Cookie: SERVERID=backend1.
3. Client's next request includes: Cookie: SERVERID=backend1.
4. LB reads the cookie and routes to Backend 1.

Pros:

- Works across LB restarts.
- No shared state needed between LB nodes.

Cons:

- Requires HTTP-level inspection (Layer 7 only).
- Cookie must be secure and tamper-proof.
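A Layer 7 sketch of the cookie flow above (header handling is simplified, and a real deployment would sign or encrypt the SERVERID value):

```python
from http.cookies import SimpleCookie

BACKENDS = {"backend1": "10.0.1.1:8080", "backend2": "10.0.1.2:8080"}

def route(headers, fallback_select):
    """Honor a valid SERVERID cookie; otherwise pick normally and
    return a Set-Cookie header for the response."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if "SERVERID" in cookie and cookie["SERVERID"].value in BACKENDS:
        return cookie["SERVERID"].value, None       # sticky hit
    chosen = fallback_select()                      # e.g. round robin
    return chosen, f"SERVERID={chosen}; Path=/; HttpOnly"

backend, set_cookie = route({}, lambda: "backend2")
print(backend, "|", set_cookie)   # backend2 | SERVERID=backend2; Path=/; HttpOnly
backend, _ = route({"Cookie": "SERVERID=backend2"}, lambda: "backend1")
print(backend)                    # backend2 (the cookie wins over the algorithm)
```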

Approach 2: Source IP Persistence

Route based on client IP address (similar to IP Hash).

Pros:

- Works at Layer 4 (no HTTP parsing).
- Simple to implement.

Cons:

- Breaks with NAT (many clients share one IP).
- No persistence across backend changes.

Approach 3: Application-Level Session Store

Move session state out of the backend entirely.

Pros:

- Any backend can handle any request.
- Best for horizontal scaling.
- Load balancer remains simple.

Cons:

- Requires application changes.
- Adds a dependency on the session store.

Recommendation

Best Practice: Design applications to be stateless and store session data in a shared store (Redis, Memcached). This eliminates the need for sticky sessions and improves scalability.

Use cookie-based stickiness only for legacy applications that cannot be modified.
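A sketch of the stateless pattern with the redis-py client (the host and key naming are assumptions); any backend can read a session written by any other:

```python
import json
import redis

r = redis.Redis(host="session-store.internal", port=6379)
SESSION_TTL = 3600  # idle sessions expire after an hour

def save_session(session_id, data):
    # setex stores the value and its TTL in one call
    r.setex(f"session:{session_id}", SESSION_TTL, json.dumps(data))

def load_session(session_id):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

# Written on backend A...
save_session("abc123", {"user_id": 42, "cart": ["sku-1"]})
# ...readable on backend B, so the LB can route the user anywhere
print(load_session("abc123"))
```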

6.4 Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model, each with distinct capabilities and trade-offs.

Layer 4 (Transport Layer)

Routes based on IP address and TCP/UDP port only. Does not inspect packet contents.

Characteristics

| Aspect | Layer 4 |
|--------|---------|
| Speed | Very fast (hardware acceleration possible) |
| Intelligence | Limited (no content awareness) |
| Use Cases | Any TCP/UDP traffic |
| SSL Handling | Pass-through only |
| Sticky Sessions | IP-based only |

Layer 7 (Application Layer)

Inspects HTTP headers, URLs, cookies, and content to make routing decisions.

Characteristics

| Aspect | Layer 7 |
|--------|---------|
| Speed | Slower (content parsing) |
| Intelligence | High (content-based routing) |
| Use Cases | HTTP/HTTPS traffic |
| SSL Handling | Termination + inspection |
| Sticky Sessions | Cookie, header, or URL-based |

When to Use Each

| Scenario | Recommended |
|----------|-------------|
| Raw TCP traffic (databases, custom protocols) | Layer 4 |
| Maximum performance, simple routing | Layer 4 |
| Content-based routing | Layer 7 |
| SSL termination | Layer 7 |
| Cookie-based sticky sessions | Layer 7 |
| HTTP header inspection/modification | Layer 7 |

6.5 SSL/TLS Termination

SSL termination means the load balancer handles encryption/decryption, so backend servers communicate in plain HTTP.
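Conceptually, termination is just TLS toward the client and plain TCP toward the backend. A stripped-down, single-connection sketch using the standard library (certificate paths and the backend address are placeholders; a real proxy would loop, stream, and handle errors):

```python
import socket
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2     # TLS 1.2+ only
ctx.load_cert_chain("/etc/lb/fullchain.pem", "/etc/lb/privkey.pem")

listener = socket.create_server(("0.0.0.0", 443))
with ctx.wrap_socket(listener, server_side=True) as tls_listener:
    conn, addr = tls_listener.accept()           # TLS handshake happens here
    request = conn.recv(65536)                   # already-decrypted HTTP bytes

    # Forward in the clear to a backend on the private network
    with socket.create_connection(("10.0.1.5", 8080)) as upstream:
        upstream.sendall(request)
        response = upstream.recv(65536)

    conn.sendall(response)                       # re-encrypted on the way out
    conn.close()
```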

Benefits

- Offloads CPU: Encryption is CPU-intensive; centralizing it saves backend resources.
- Simplified Certificate Management: Certificates are managed in one place.
- Content Inspection: The LB can read HTTP headers for routing decisions.
- Performance Optimization: TLS session reuse across clients.

Security Considerations

Traffic between LB and backends is unencrypted. Options:

- Trust the network: If the LB and backends are in a private network, plain HTTP may be acceptable.
- End-to-end encryption: Use HTTPS to the backends (re-encryption).
- Mutual TLS: Both LB and backends authenticate each other.

SSL Configuration Best Practices

- Use TLS 1.2+ only (disable older protocols).
- Enable HTTP/2 for multiplexed connections.
- Implement OCSP stapling for faster certificate validation.
- Configure cipher suites to prioritize security and performance.

6.6 Handling Load Balancer Failures

Since the load balancer sits on the critical path, its failure means complete service outage. Designing for high availability is essential.

Failure Detection

Load balancer nodes monitor each other using:

- Heartbeat messages: Periodic pings between nodes.
- Health checks: Same mechanism used for backends.
- Shared storage: Write timestamps to detect node liveness.

Failover Mechanisms

VRRP (Virtual Router Redundancy Protocol)

Industry standard for IP failover.

Failover time: 1-3 seconds

DNS-Based Failover

Multiple LB IPs registered in DNS with health checks.

Failover time: DNS TTL (can be slow)

Anycast

Multiple LB nodes share the same IP address. BGP routing directs traffic to the nearest healthy node.

Failover time: Seconds (BGP convergence)

Stateless vs Stateful Failover

| Approach | Connection Handling | Complexity |
|----------|---------------------|------------|
| Stateless | Active connections dropped on failover | Simple |
| Stateful | Connections migrated to backup | Complex (requires state sync) |

Recommendation: Design for stateless failover. Modern applications handle connection drops gracefully with retries.
