Support Docker and K8s natively

Question

winlinvip opened this issue 4 years ago

To support Docker and K8s well, SRS needs some modifications, especially for the Origin Cluster and Edge Cluster modes. With these changes, users can quickly build a streaming media origin and distribution cluster with SRS, and it becomes easy to scale out, scale in, monitor, and operate.

An important direction for SRS4 is to be cloud native, that is, to build streaming media clusters in the cloud.

Please refer to the K8S Wiki at: https://github.com/ossrs/srs/wiki/v4_CN_K8s

TRANS_BY_GPT3

Winlin · Answer 1 · Sat Feb 08 2020 19:58:57 GMT+0800 (China Standard Time)

A daemon is a background service that forks twice so that its parent process becomes PID 1, which distinguishes it from an ordinary background process. In Docker, the container runtime already manages the process the way a daemon would, so SRS should default to daemon off and always run in the foreground. This is one of the changes SRS needs to make to become cloud native. For more details, refer to #1594.

Solution: Add the configuration disable_daemon_for_docker to automatically disable the daemon in Docker.
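
To illustrate, here is a Python sketch of the decision and of the double fork described above. It is not SRS's code: the rule "stay in the foreground when running in Docker and disable_daemon_for_docker is on", the `/.dockerenv` detection heuristic, and the function names are all assumptions.

```python
import os


def running_in_docker(marker: str = "/.dockerenv") -> bool:
    # Heuristic (assumption): Docker usually creates /.dockerenv in the container root.
    return os.path.exists(marker)


def should_daemonize(daemon: bool, disable_daemon_for_docker: bool, in_docker: bool) -> bool:
    # In Docker the container runtime supervises the process, so stay in the
    # foreground when disable_daemon_for_docker is on, even if daemon is on.
    if in_docker and disable_daemon_for_docker:
        return False
    return daemon


def daemonize() -> None:
    # Classic double fork: the first parent exits so the shell gets control back,
    # setsid() detaches from the controlling terminal, and the second fork makes
    # sure the daemon is not a session leader. The surviving grandchild is
    # re-parented to PID 1 (or to a subreaper).
    if os.fork() > 0:
        os._exit(0)
    os.setsid()
    if os.fork() > 0:
        os._exit(0)
```

Docker stops a container when its main process exits, so if the entrypoint process called daemonize(), the first parent would exit and the container would stop. That is why the foreground default matters in Docker.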

Winlin · Answer 2 · Sat Feb 08 2020 20:11:57 GMT+0800 (China Standard Time)

Containers in Docker only have internal addresses behind NAT and no longer have external IP addresses. Therefore, when an edge is redirected to the origin server in the origin cluster that has the stream, the redirect may return an internal IP address. Please refer to #1501 for more details.

Solution: For small origin clusters with fewer than 5k streams, the Origin Cluster can be used directly. Please refer to #1501 (comment) for more information.

Solution: For medium origin clusters with fewer than 100k streams, the Origin Cluster Master (OCM) is needed. Please refer to #1607 (comment) for more information.

For more than 100k streams, you need to design your own solution. Some suggested approaches are available in #1607 (comment).

Solution: In K8s, you can use a Headless Service with StatefulSets to give each origin server (Pod) a fixed, resolvable domain name. If you use Deployments, you can create one Service per Deployment. Alternatively, consider the OCM approach (its progress is tracked in Issue #1607).
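
With a StatefulSet governed by a headless Service, Kubernetes gives each Pod a stable DNS name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>`. The Python sketch below builds those names and renders them as an SRS `coworkers` line (the origin cluster setting discussed in Answer 7). The names `srs-origin` and `socs`, the helper functions, and the assumption that coworkers point at each origin's HTTP API port (1985 by default) are illustrative.

```python
from typing import List


def statefulset_pod_fqdns(statefulset: str, service: str, namespace: str,
                          replicas: int, cluster_domain: str = "cluster.local") -> List[str]:
    # StatefulSet Pods are named <statefulset>-<ordinal>; behind a headless
    # Service each Pod is resolvable as <pod>.<service>.<namespace>.svc.<domain>.
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


def coworkers_directive(hosts: List[str], api_port: int = 1985) -> str:
    # Render an SRS-style "coworkers" line listing every origin's HTTP API endpoint.
    return "coworkers " + " ".join(f"{h}:{api_port}" for h in hosts) + ";"
```

Because the names are stable across Pod restarts, the coworkers list can be written once into the configuration instead of being regenerated whenever a Pod IP changes.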

Winlin · Answer 3 · Sat Feb 08 2020 20:15:17 GMT+0800 (China Standard Time)

SRS3 already supports Docker. The official Docker images are based on CentOS 7, which is mature and stable for servers. There is an image for deploying SRS3 (see SRS3 Docker), development images (see Development), and special-purpose images such as SRT. For more details, please refer to srs-docker.

Solution: The official SRS Docker image project is provided at srs-docker. The image can be deployed on Docker and Aliyun.

Winlin · Answer 4 · Wed Feb 12 2020 19:17:21 GMT+0800 (China Standard Time)

In cloud services, an SLB is usually placed in front of SRS. It works like an nginx reverse proxy with high bandwidth and throughput, and more SRS instances can be added behind it without changing the IP address exposed to clients. In k8s, SRS runs in Pods, new Pods can be started at any time, and the SLB sits in front of the Pods to provide the external service.

An SLB usually performs health checks over TCP or HTTP, which SRS needs to support. Please refer to #1598 for more information.

Solution: SRS supports TCP health checks for SLB.
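
A TCP health check, as an SLB performs it, only verifies that a connection to the listening port can be established. A minimal Python sketch of such a probe (the function name and the 2 second timeout are assumptions, not actual SLB settings):

```python
import socket


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    # Healthy if the TCP handshake completes within the timeout; no application
    # data is exchanged, so SRS only needs to be accepting on the port.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```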

Winlin · Answer 5 · Thu Feb 13 2020 19:06:53 GMT+0800 (China Standard Time)

Suppose SRS shares a volume with another container such as Nginx, for example SRS produces HLS while Nginx distributes it. If the volume is a k8s emptyDir, the HTTP directory starts out empty, without crossdomain.xml or index.html, which is not very user-friendly.

SRS could write these default files itself, but this is turned off by default. For how to enable it in k8s, refer to #1603.

Solution: An extra container was added that runs a script to copy all of SRS's original files, including crossdomain.xml, index.html, and console, into the shared volume. Refer to #1603.
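
The Python sketch below illustrates what such a copy script might do; the actual script is not shown here. It assumes defaults should be copied only when missing, so files already in the shared volume are never overwritten. The function name and directory layout are hypothetical.

```python
import os
import shutil


def seed_shared_volume(src_dir: str, dst_dir: str) -> list:
    # Copy default static files (e.g. crossdomain.xml, index.html, console/)
    # from the image into the shared volume, skipping files that already exist.
    # Returns the sorted relative paths that were copied.
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        rel_root = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            dst = os.path.join(target_root, name)
            if not os.path.exists(dst):
                shutil.copy2(os.path.join(root, name), dst)
                copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return sorted(copied)
```

Running it a second time copies nothing, so it is safe to run on every Pod start.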

Winlin · Answer 6 · Sat Feb 15 2020 11:53:20 GMT+0800 (China Standard Time)

Storage in K8s falls into two categories: configuration, and stream output files such as DVR recordings or HLS segments.

K8s configuration can be provided through a ConfigMap, which is mounted as a volume into the container's file system, so to SRS it looks like an ordinary configuration file. For more details, please refer to 1, 2, 3. How to notify SRS to reload after the configuration file is updated still needs investigation.

As for stream storage, with a single origin server, SRS and Nginx can run in the same Pod and share a directory through an emptyDir volume: SRS writes HLS files into the directory, and Nginx reads them and distributes them externally. In an Origin Cluster, however, the origin servers are separate Deployments and Services and cannot be placed in the same Pod, so sharing storage across Pods requires a solution such as NAS.

Solution: It has been confirmed that SRS can support NAS as a Persistent Volume (PV).

Remark: OSS can also be used for cross-Pod shared storage, but it may require additional work in SRS, which is still to be confirmed.

Winlin · Answer 7 · Sat Feb 15 2020 16:29:02 GMT+0800 (China Standard Time)

In an origin cluster, each origin server serves the edge servers; in other words, every origin server must be reachable, so each one needs its own service address. Please refer to Origin Server Cluster K8s Deployment Method for more details.

This requires configuring coworkers on each origin server, listing the addresses of all origin servers. An origin server therefore needs to support configuring its own name, because it will access its own HTTP API /api/v1/cluster. Care should be taken to avoid circular calls.

Solution: It has been confirmed that this does not cause problems, although it needs optimization. Refer to #1608.
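
One way to avoid circular calls is sketched below in Python: the origin skips its own address when walking the coworkers list, and the remote query answers only from that origin's local state instead of asking its own coworkers. This is a hypothetical illustration, not SRS's implementation; `self_addr` stands for the configured name of the origin itself and must match how it appears in coworkers.

```python
from typing import Callable, Iterable, Optional


def find_stream_origin(stream: str,
                       coworkers: Iterable[str],
                       self_addr: str,
                       local_streams: set,
                       query_local_only: Callable[[str, str], bool]) -> Optional[str]:
    # Serve from this origin if it already has the stream.
    if stream in local_streams:
        return self_addr
    for addr in coworkers:
        # The coworkers list includes this origin too; skip it so we never send
        # an HTTP request back to ourselves.
        if addr == self_addr:
            continue
        # query_local_only must answer from the remote origin's own state and
        # must not fan out to its coworkers, otherwise A asks B and B asks A.
        if query_local_only(addr, stream):
            return addr
    return None
```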

Winlin · Answer 8 · Mon Feb 17 2020 19:19:52 GMT+0800 (China Standard Time)

K8s updates and rollbacks are done with Rolling Update. Revision information is recorded each time kubectl apply is executed, and the update is rolled out to the Pods in batches. The rollout can be paused, and rolled back if problems occur. Refer to K8S RollingUpdate Rolling Upgrade Mechanism Example for more information.

For a gray (canary) release in K8s, the usual approach is to build an image for the new version and deploy the new version of the application. The old and new applications share the same labels, so the SLB or Service spreads traffic evenly across the Pods of both versions. Then, by scaling down the old version and scaling up the new one, traffic is gradually shifted to the new version.
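
The traffic shift can be sketched in Python as a plan of replica counts. It assumes the Service spreads new connections roughly evenly across ready Pods, so the new version's share is about new / (old + new); the function name and the fixed total are assumptions for illustration.

```python
from typing import List, Tuple


def canary_plan(total: int, step: int = 1) -> List[Tuple[int, int, float]]:
    """Return (old_replicas, new_replicas, approx_share_of_new) for each stage."""
    if total <= 0 or step <= 0:
        raise ValueError("total and step must be positive")
    plan = []
    new = 0
    while True:
        old = total - new
        # Both versions match the same Service selector, so the new version's
        # share of new connections is roughly proportional to its Pod count.
        plan.append((old, new, new / total))
        if new == total:
            break
        new = min(total, new + step)
    return plan
```

For example, canary_plan(4, 2) gives three stages: (4, 0, 0.0), (2, 2, 0.5), (0, 4, 1.0). Only new connections follow this ratio; long-lived connections stay on the old Pods until they disconnect.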

There are several issues here:

1. Stateless applications are easy to handle. A stateful application, such as an origin cluster deployed as a StatefulSet, is updated by changing the image and forcing the Pods to be recreated, which cannot shift traffic gradually and is more complex.
2. When scaling down the old version, edge servers need to wait for clients to disconnect, which takes some time. How Kubernetes handles the long-lived connections of stateless services during an update needs further investigation.
3. For an origin cluster deployed as stateless applications, an upgrade requires creating a new batch of stateless applications, which is troublesome without OCM (Origin Cluster Master) and not suitable for more than 10 nodes.

Regarding the second point, how to upgrade smoothly, research shows that Kubernetes has relevant mechanisms:

1. Termination of Pods describes this process. The --grace-period flag or .spec.terminationGracePeriodSeconds sets how long to wait before the process is forcibly terminated. For example, with a 12-hour grace period, clients that have not disconnected within 12 hours are killed.
2. Kubernetes sends SIGTERM to the process, or a preStop hook can notify SRS to clean up. SRS should then stop accepting new connections and keep serving existing ones.
3. This mechanism only applies to the Edge. An Origin can be restarted and updated directly, since the streams are served by this origin; however, gradually disconnecting connections is still recommended for a better experience.
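
The draining step in point 2 can be sketched in Python as follows. It assumes the listeners are already closed and that something can report the number of active connections; the function and its parameters are hypothetical, and the clock and sleep are injected only so the logic can be tested without real waiting.

```python
import time
from typing import Callable


def drain(active_connections: Callable[[], int],
          grace_period: float,
          poll_interval: float = 1.0,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    # Keep serving existing connections until they all disconnect or the grace
    # period runs out. Returns True if every client left on its own, False if
    # time ran out (Kubernetes would then kill the process anyway).
    deadline = clock() + grace_period
    while active_connections() > 0:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(poll_interval, remaining))
    return True
```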

Remark: Nginx uses SIGUSR2 for graceful upgrade, while SIGTERM and SIGINT trigger fast shutdown. Since some processes in K8s may not receive SIGTERM, it might be better practice to send SIGUSR2 from preStop.

Solution: SRS supports upgrade, rollback, and gray release. Refer to Gracefully Upgrade for more details. SRS3 will support the core functionality, and SRS4 will provide more comprehensive support.

For the wiki, please refer to: https://github.com/ossrs/srs/wiki/v4_CN_K8s#srs-cluster-canary-release

Winlin · Answer 9 · Tue Feb 18 2020 21:59:26 GMT+0800 (China Standard Time)

SRS3 supports Gracefully Quit, which is triggered by sending the SIGQUIT signal to SRS.

However, in K8s, SIGTERM is also sent to SRS after preStop. By default, SIGTERM means Fast Quit, so if SRS receives SIGTERM during a Gracefully Quit, it exits immediately.

Therefore, in the Docker environment, SRS needs a configuration option that specifies how to treat SIGTERM (ignore it, or treat it as Gracefully Quit). Because this is only needed for smooth upgrades, it should not be the default behavior, so a configuration option is more appropriate. Reference: #1579 (comment)

For example, the configuration is as follows:

        lifecycle:
          preStop:
            exec:
              command:
                - /bin/sh
                - -c
                - >
                  /usr/local/srs/etc/init.d/srs grace;
                  sleep 60
      terminationGracePeriodSeconds: 30

So when we delete the Deployment, K8s first runs preStop, which sends SIGQUIT to SRS to start a Gracefully Quit, and then sleeps for 60s.

At 30 seconds, K8s finds that preStop has not finished yet, so it sends SIGTERM to SRS, waits 2 more seconds, and then sends SIGKILL to force SRS to exit.
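
This timeline can be modeled with a small Python function. It follows the Pod termination steps quoted in Answer 10 (the Kubernetes documentation of that time, including the 2 second extension); newer Kubernetes versions may behave differently, and the function is purely illustrative.

```python
from typing import Tuple


def termination_timeline(prestop_s: float, grace_s: float,
                         extension_s: float = 2.0) -> Tuple[float, float]:
    """Return (time SIGTERM is sent, latest time before SIGKILL), in seconds
    after Pod deletion starts."""
    if prestop_s < grace_s:
        # The hook returned in time: SIGTERM follows it, and the process may
        # keep running until the grace period ends.
        return prestop_s, grace_s
    # The hook overran the grace period: SIGTERM is sent when the period
    # expires, followed by SIGKILL after a short extension.
    return grace_s, grace_s + extension_s
```

With the configuration above, termination_timeline(60, 30) gives SIGTERM at 30s and SIGKILL at 32s, which matches the log below.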

The log is as follows:

[2020-02-18 14:54:50.881][Trace][1][217] sig=3, user start gracefully quit
[2020-02-18 14:54:51.405][Trace][1][212] cleanup for quit signal fast=0, grace=1
[2020-02-18 14:54:51.405][Warn][1][212][11] main cycle terminated, system quit normally.
[2020-02-18 14:54:52.405][Trace][1][212] wait for 1 conns to quit
[2020-02-18 14:54:54.405][Trace][1][212] wait for 1 conns to quit
[2020-02-18 14:54:58.409][Trace][1][212] wait for 1 conns to quit
[2020-02-18 14:55:06.410][Trace][1][212] wait for 1 conns to quit
[2020-02-18 14:55:20.321][Trace][1][217] force gracefully quit, signo=15
2 seconds later, forced exit: command terminated with exit code 137

Solution: Configure force_grace_quit to consider both SIGTERM and SIGQUIT as graceful exits.
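
The signal handling that force_grace_quit enables can be summarized as a small decision function. This Python sketch is an illustration under assumptions, not SRS's code: it models only SIGQUIT and SIGTERM and leaves the mode unchanged for every other signal.

```python
import signal
from enum import Enum


class QuitMode(Enum):
    NONE = "none"
    GRACEFUL = "graceful"
    FAST = "fast"


def on_quit_signal(signo: int, current: QuitMode, force_grace_quit: bool) -> QuitMode:
    # SIGQUIT always requests a graceful quit.
    if signo == signal.SIGQUIT:
        return QuitMode.GRACEFUL
    if signo == signal.SIGTERM:
        # K8s sends SIGTERM after preStop; with force_grace_quit on it is treated
        # like SIGQUIT, so an ongoing graceful quit is not cut short.
        return QuitMode.GRACEFUL if force_grace_quit else QuitMode.FAST
    # Other signals are out of scope for this sketch.
    return current
```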

Wiki reference: https://github.com/ossrs/srs/wiki/v4_CN_K8s#srs-cluster-canary-release

Winlin · Answer 10 · Tue Feb 18 2020 23:25:05 GMT+0800 (China Standard Time)

When a Pod is being deleted, it is also removed from the Service at the same time. However, because the two happen concurrently, the Pod may not yet have been removed from the Service when SRS receives the SIGQUIT signal. Please refer to Termination of Pods:

3. When listed in client commands, a Pod appears as "Terminating".

4. (simultaneous with 3) When the Kubelet sees that a Pod has been marked as terminating because the time in 2 has been set, it begins the Pod shutdown process.
   1. If one of the Pod's containers has defined a preStop hook, it will be executed within the container. If the preStop hook is still running after the grace period expires, step 2 is triggered with a small (2 second) extended grace period.
   2. The container is sent the TERM signal. Note that not all containers in the Pod will receive the TERM signal at the same time and may each require a preStop hook if the order in which they shut down matters.

5. (simultaneous with 3) Pod is removed from endpoints list for service, and are no longer considered part of the set of running Pods for replication controllers. Pods that shutdown slowly cannot continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.

Step 5 happens concurrently with step 3. This means that when SRS receives SIGQUIT, it cannot stop the listeners immediately. It needs to wait for a short period, for example about 2.3 seconds, to make sure the Service has removed this Pod, and only then can it safely stop listening. A configuration option is added for this:

# For gracefully quit, wait for a while then close listeners,
# because K8S notifies SRS with SIGQUIT and updates the Service simultaneously,
# so there may be some new connections incoming before the Service is updated.
# default: 2300
grace_start_wait 2300;

Note: This wait could also be done in preStop, but it is better to handle it uniformly in SRS.
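
The order of operations during a graceful quit can be sketched in Python as follows. The callbacks are hypothetical placeholders, and the last step assumes grace_final_wait (the 3200ms term in the calculation that follows) is a final wait for cleanup before the process exits.

```python
from typing import Callable


def graceful_quit(close_listeners: Callable[[], None],
                  wait_for_connections: Callable[[], None],
                  sleep: Callable[[float], None],
                  grace_start_wait_ms: int = 2300,
                  grace_final_wait_ms: int = 3200) -> None:
    # 1. Keep accepting for grace_start_wait: K8s removes the Pod from the
    #    Service endpoints concurrently with the signal, so a few new
    #    connections may still be routed here.
    sleep(grace_start_wait_ms / 1000.0)
    # 2. Stop accepting new connections.
    close_listeners()
    # 3. Let existing clients finish.
    wait_for_connections()
    # 4. Final wait for cleanup before the process exits.
    sleep(grace_final_wait_ms / 1000.0)
```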

By calculation, the minimum time SRS needs to perform a Gracefully Quit, and thus the minimum terminationGracePeriodSeconds, is 5.5 seconds:

terminationGracePeriodSeconds = grace_final_wait(3200ms) + grace_start_wait(2300ms) = 5.5s
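
Because terminationGracePeriodSeconds is an integer number of seconds, 5.5s has to be rounded up to 6 in practice, and any time allowed for clients to disconnect comes on top of it. A small Python helper makes this explicit; `drain_budget_s` is a hypothetical parameter for that extra client time.

```python
import math


def min_termination_grace_period(grace_start_wait_ms: int = 2300,
                                 grace_final_wait_ms: int = 3200,
                                 drain_budget_s: int = 0) -> int:
    # terminationGracePeriodSeconds is a whole number of seconds, so round up.
    total_ms = grace_start_wait_ms + grace_final_wait_ms + drain_budget_s * 1000
    return math.ceil(total_ms / 1000)
```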

Remark: Deleting a Pod does not remove any record from the SLB. A Service of type LoadBalancer listens on a port on each Node, and deleting a Pod only removes the Pod from the Service, so the change takes effect quickly. For example:

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                                        AGE
srs-origin-service   LoadBalancer   172.21.15.150   172.17.232.159   2935:32559/TCP,2080:32123/TCP,2985:32579/TCP   149m

Remark: Only when deleting a Service, the port record will be removed from the SLB.

SRS waits for the configured period and then continues the Gracefully Quit:

[2020-02-18 16:04:32.935][Trace][1][353] sig=3, user start gracefully quit
[2020-02-18 16:04:33.003][Trace][1][354] <- CPB time=20006443, okbps=1,0,0, ikbps=272,0,0, mr=0/350, p1stpt=20000, pnt=5000
[2020-02-18 16:04:33.693][Trace][1][348] cleanup for quit signal fast=0, grace=1
[2020-02-18 16:04:33.693][Warn][1][348][11] main cycle terminated, system quit normally.
[2020-02-18 16:04:35.993][Trace][1][348] start wait for 2300ms
[2020-02-18 16:04:36.993][Trace][1][348] wait for 1 conns to quit

If you don't need to wait that long, you can configure a shorter duration; a few hundred milliseconds may be enough.

Solution: Configure grace_start_wait to wait for a period of time when a graceful quit starts.

For the wiki, please refer to: https://github.com/ossrs/srs/wiki/v4_CN_K8s#srs-cluster-canary-release

Winlin · Answer 11 · Sun Feb 23 2020 10:18:15 GMT+0800 (China Standard Time)

The SRS configuration is stored in a ConfigMap. When the configuration changes, SRS needs to reload it, which raises the question of how K8s notifies the relevant SRS instances. For more details, please refer to #1635.

Solution: SRS automatically detects changes in the configuration file and reloads it. Two new configurations, inotify_auto_reload and auto_reload_for_docker, have been added.
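
A Python sketch of change detection for the reload, under assumptions: it polls and compares a content hash instead of using inotify, so it is a substitute technique, not SRS's inotify_auto_reload. Comparing content also sidesteps the fact that the kubelet may update a mounted ConfigMap by swapping a symlink rather than modifying the file in place, so an inotify watch on the file alone may miss the change. The class name is hypothetical.

```python
import hashlib
import os
from typing import Optional


class ConfigWatcher:
    """Detect config changes by content hash (a polling stand-in for inotify)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.last_digest = self._digest()

    def _digest(self) -> Optional[str]:
        try:
            with open(self.path, "rb") as f:  # open() follows symlinks
                return hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            return None

    def changed(self) -> bool:
        digest = self._digest()
        # Treat a missing file as "no change yet" rather than reloading nothing.
        if digest is None or digest == self.last_digest:
            return False
        self.last_digest = digest
        return True
```

A supervisor loop would call changed() every few seconds and trigger a reload when it returns True; SRS also reloads its configuration on SIGHUP.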

Winlin · Answer 12 · Thu Feb 27 2020 15:25:56 GMT+0800 (China Standard Time)

K8s resources include CPU and memory, and newer versions also support extended resources. Defining resources and measuring their consumption is the basis for accurately evaluating load levels and for scaling up or down. This may require more work in SRS; please refer to Resource.
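
As an example of how resource consumption drives scaling, the Kubernetes Horizontal Pod Autoscaler computes desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue) and skips scaling when the ratio is within a tolerance (0.1 by default). A Python sketch of that formula follows; which metric SRS should expose (CPU, or for instance streams or connections per Pod) is left open and is an assumption here.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1) -> int:
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, 4 Pods at 90% CPU against a 60% target scale to 6 Pods, while 63% stays at 4.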
