Talking about architectural design

Accidents are the result of small problems accumulating, and nothing is as simple as it seems. As a running system gains users, a failure will occur sooner or later if high availability has not been considered. High availability should be designed in from the start rather than added after something breaks, and it is a very large body of knowledge.

What should you consider during architectural design when building a highly available system?

This article focuses on how to ensure high availability of stateless services at the architectural level

★ Stateless service: a service that never stores data itself (except for cache), can be destroyed and created at will without losing user data, and lets requests switch to any replica without affecting the user.

High availability for stateless services aims to ensure that data is never lost, that the service as a whole does not fail, and that when some instances do fail, the impact is minimal and recovery is fast

It can be considered from these aspects

Redundant deployment

In a single-node architecture, as the amount of data grows, the load on that one node becomes too heavy, which leads to service breakdown and unavailability. For stateless services, you can deploy the service on multiple nodes to spread the load

To decide how incoming requests are scheduled across those nodes, use load balancing so that server resources are fully utilized

Load balancing for stateless services

You can use the four algorithms commonly provided by load balancers: random, round robin, weighted round robin, and least connections.

The problem with the first two algorithms (random and round robin) is that when back-end servers are under different load or have different configurations, they cannot guarantee that lightly loaded servers receive more requests and heavily loaded servers receive fewer. This is why weighted round robin and least connections were introduced
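As a rough illustration of the difference, here is a minimal sketch of round robin, weighted round robin, and least connections. The server names, weights, and connection counts are hypothetical, and the weighted variant uses the "smooth" selection approach as one possible implementation, not necessarily what any particular load balancer does.

```python
import itertools


def round_robin(servers):
    """Yield servers in a fixed repeating order, ignoring load and capacity."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers so that a server with weight w is picked w times per cycle.

    `weights` maps a server name to a positive integer weight (assumed values).
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        # Every server gains its weight; the current leader is chosen
        # and then pushed back by the total weight.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


def least_connections(active):
    """Return the server with the fewest active connections."""
    return min(active, key=active.get)


if __name__ == "__main__":
    rr = round_robin(["s1", "s2", "s3"])
    print([next(rr) for _ in range(4)])

    wrr = weighted_round_robin({"big": 3, "small": 1})
    print([next(wrr) for _ in range(4)])

    print(least_connections({"s1": 12, "s2": 3, "s3": 7}))
```

Round robin treats every server the same, the weighted variant sends proportionally more requests to the larger server, and least connections reacts to the load actually observed at the moment of the decision.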

The algorithms above are meant for stateless applications. If communication state must be preserved, the source address hash algorithm is needed

How to select a load balancing algorithm?

First, discard the random algorithm and use basic round robin for the simplest configuration. Round robin suits scenarios where the server configurations are identical, for example when VMs are used and their configurations can be adjusted dynamically. Also make sure that no other applications are deployed on these dedicated VMs

In practice, however, several applications often share one server, so consider choosing between weighted round robin and least connections

Weighted round robin is suitable for short-connection scenarios such as HTTP services. In Kubernetes, because each Pod is independent, the default Service policy is unweighted (round robin when kube-proxy runs in IPVS mode)

If the architecture has to handle clients without cookie support, the source address hash algorithm can be used to always map a given source IP to the same real server (RS). In Kubernetes this is called session affinity, and requests from the same client are forwarded to the same Pod each time
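A minimal sketch of source address hashing follows. The function name, the choice of MD5 as the hash, and the server list are assumptions made for illustration; real load balancers use their own hash functions.

```python
import hashlib


def pick_by_source_ip(client_ip, servers):
    """Map a client IP to one server.

    The same IP always maps to the same server as long as the
    server list itself does not change.
    """
    digest = hashlib.md5(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    backends = ["rs-1", "rs-2", "rs-3"]  # hypothetical real servers
    first = pick_by_source_ip("203.0.113.7", backends)
    second = pick_by_source_ip("203.0.113.7", backends)
    print(first == second)  # the same client lands on the same backend
```

Note that with plain modulo hashing, adding or removing a server remaps most clients. Consistent hashing is a common way to reduce that effect.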


Identification of high concurrency applications

The primary metric is QPS (queries per second), the number of requests processed per second. Take a site with 100,000 page views (PV) per day as an example

The formula: peak QPS = (100000 * 80%) / (86400 * 20%) ≈ 4.63 QPS

How it works: 80% of the daily visits are concentrated in 20% of the day, which is called peak time.

For example, the system I built hosts up to 50,000 machines, and each machine generates one PV per minute, spread fairly evenly over time. That is

((60*24)*50000)/(86400)=833 QPS
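The two estimates above can be reproduced with a short sketch. The function and parameter names are made up for illustration, and the 80/20 split is the rule of thumb from this article rather than a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def peak_qps(daily_pv, traffic_share=0.8, time_share=0.2):
    """Estimate peak QPS assuming `traffic_share` of the daily visits
    arrive within `time_share` of the day."""
    return (daily_pv * traffic_share) / (SECONDS_PER_DAY * time_share)


def uniform_qps(machines, requests_per_minute=1):
    """Estimate QPS when every machine sends requests at a steady rate."""
    requests_per_day = machines * requests_per_minute * 60 * 24
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(round(peak_qps(100_000), 2))    # 4.63
    print(round(uniform_qps(50_000), 2))  # 833.33
```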

Generally, a QPS in the hundreds can already be called high concurrency. According to information found online, a system with more than 100 million PV per day typically runs at about 1,500 QPS, with a peak of about 5,000 QPS.

In addition to QPS, there are service response time and number of concurrent users for reference

When server load is high, typical symptoms are slow processing, dropped network connections, failed requests, and reported exceptions. Each specific problem needs to be analyzed case by case

You can obtain server performance status through monitoring, then adjust dynamically and retry failed requests to keep the service available and reduce maintenance costs. In general, vertical scaling can be considered when a single server is under heavy pressure
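The retry part could look roughly like the sketch below. It assumes the operation is safe to repeat, and the function name, attempt count, and delays are hypothetical values chosen for illustration.

```python
import random
import time


def call_with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`; on failure, wait with exponential backoff plus
    random jitter and try again, up to `attempts` times in total."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter keeps many clients from retrying at the same instant.
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Simulated overloaded backend: fails twice, then recovers.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("server busy")
        return "ok"

    print(call_with_retry(flaky), state["calls"])
```

Backing off between attempts matters here: retrying immediately against a server that is already overloaded only adds to its load.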

Vertical scaling

Vertical scaling means increasing the processing capacity of a single server, mainly in three ways

Enhancing the performance of a single machine is the fastest and easiest approach, but a single machine has performance limits. Moreover, in a single-machine deployment any failure is fatal to the application. We should ensure that the application is always available, that is, ensure the proverbial "five nines" (99.999%) of reliability
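To make the five nines (99.999% availability) target concrete, the sketch below converts a number of nines into the downtime allowed per year. The function name is made up for illustration, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years


def allowed_downtime_minutes(nines):
    """Return the yearly downtime budget for an availability of `nines` nines,
    e.g. nines=3 means 99.9% and nines=5 means 99.999%."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, round(allowed_downtime_minutes(n), 2))
```

At five nines the budget is a little over five minutes per year, which a single machine cannot realistically meet once restarts and hardware faults are counted.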

Horizontal automatic expansion

Knowing the limitations of a single machine, consider scaling horizontally

Horizontal scaling means adding new nodes to share the load when pressure increases. Simply deploying more nodes by hand is not enough, however. As the business keeps growing, the service will one day exceed its pressure limit, and during a sudden traffic surge a manual response will be caught off guard. What is needed is an automatic, elastic way of scaling
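The core decision of such an automatic scaler can be sketched as follows. The per-node capacity and the node limits are hypothetical numbers, and a real autoscaler would also smooth the metric over time and wait between scaling actions.

```python
import math


def desired_nodes(current_qps, qps_per_node, min_nodes=2, max_nodes=20):
    """Return how many nodes are needed to serve `current_qps`,
    assuming each node handles about `qps_per_node`.

    The result is clamped to [min_nodes, max_nodes] so the service
    keeps some redundancy and cannot grow without bound.
    """
    needed = math.ceil(current_qps / qps_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    print(desired_nodes(833, 300))      # 3
    print(desired_nodes(50, 300))       # 2 (kept at the minimum)
    print(desired_nodes(100_000, 300))  # 20 (capped at the maximum)
```

Keeping a minimum of two nodes preserves redundancy at low traffic, and the maximum protects the budget and the downstream database from unbounded growth.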

★ IaaS (Infrastructure as a Service) refers to the management of hardware resources such as servers, storage, and networks.

Note: Elastic scaling applies to stateless service scenarios

In addition, horizontal scaling is usually triggered when the stateless machines can no longer bear the request traffic, typically at a threshold of several thousand QPS. At that point the database will also come under pressure. It is therefore recommended not to deploy stateful services on the horizontally scaled servers

Spreading the load of stateful services will be covered in a later article


For a website, the page the user interacts with is a special kind of service. It contains many static resources, such as pictures, videos, and page files (HTML/CSS/JS). These resources have to be downloaded when the user requests them, so the download speed determines the loading speed

At this level, we can consider using a CDN (content delivery network) to cache the front-end static data on edge servers

★ An edge server (edge node) is either a server that interacts with the user or a server node that is close to the user. Proximity to the user reduces the time required for network transmission.

If you use a CDN for the website, you can bind the HTTPS certificate to the CDN, configure the origin-fetch timeout and the handling of 301/302 status codes in the origin-fetch policy, enable intelligent compression of web pages, and customize error pages

OSS (object storage service) is a special storage scheme that stores data as objects and can theoretically hold an unlimited number of files

Consider combining OSS with a CDN: store media resources in the object store, or compress and archive cold data to OSS

Most common video websites use OSS, and Weibo posts from years ago have presumably been archived into object storage as well


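This article introduces the common high availability architecture designs for stateless services: redundant deployment, load balancing, identifying high-concurrency applications, vertical scaling, automatic horizontal scaling, and CDN with OSS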

Note that stateless applications should not store sessions, nor should they store data

This article introduces several load balancing algorithms but not the specific implementation of each one, which is left for readers to study. These schemes all carry some difficulty in actual use, and each possible cause of service failure is a broad and deep subject of its own. A programmer's job is more than writing code

This is only part of the high availability picture for stateless services. What else do you know about stateless services, and about design at the code level?

Sometimes, in more demanding cases where no more server resources are available, how can you improve code performance on a limited number of servers?