# My Problem-Solving Journey: Child Processes in Containers

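An issue that had supposedly been open for a long time was handed over to me, with nothing more than the following description: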

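> *Investigate: in Docker, the main process receives a kill SIGTERM signal and is force-killed after more than 10s, and child processes cannot receive SIGTERM, so they cannot exit gracefully (options: tini / build our own, needs research). This issue may be the cause of data loss when components restart.*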

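There was no reproduction scenario, so it looked like I would have to build one myself. The issue **may** cause data loss, which means that part was still only a guess. tini had been researched as a possible solution, but when I asked the colleagues involved, it apparently didn't work... Everything was half-clear at best, and once again a seemingly intractable problem had landed on my desk.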

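First, let's reproduce the scenario. The conditions are:

1. The container runs multiple processes; here a shell script starts the application process.
2. The container is force-killed after 10s.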

Let's build a sample program first. The directory structure is:

```
├── Dockerfile
├── go.mod
├── main.go
└── test.sh
```

main.go

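```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := http.Server{
		Addr: ":8000",
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("hello world"))
		}),
	}

	// ctx is cancelled as soon as the process receives SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	go func() {
		log.Println("Server is running")
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()

	// Block until SIGTERM arrives.
	<-ctx.Done()
	log.Println("received SIGTERM")

	// ctx is already cancelled here, so passing it to Shutdown would make
	// Shutdown return at once instead of waiting for in-flight requests.
	// Use a fresh deadline; 5s is an arbitrary value below Docker's default
	// 10s stop timeout.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}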
```

Dockerfile

```dockerfile
FROM golang:1.17 AS builder

WORKDIR /workspace

COPY test.sh .
RUN chmod +x test.sh

COPY go.mod go.mod

RUN go env -w GO111MODULE=on && \
    go env -w GOPROXY=https://goproxy.cn,direct && \
    go mod download

COPY main.go main.go

RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o app main.go

FROM alpine

COPY --from=builder /workspace/app /app
COPY --from=builder /workspace/test.sh /test.sh

ENTRYPOINT ["/bin/sh", "/test.sh"]
```

test.sh

```
#!/bin/sh
./app
```
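With `ENTRYPOINT ["/bin/sh", "/test.sh"]`, PID 1 inside the container is the shell, and `app` runs as its child. To verify this from inside the container, you can read `/proc/1/cmdline`, which holds PID 1's arguments separated by NUL bytes. The sketch below uses hypothetical helper names (`parseCmdline`, `pid1Cmdline`) that `app` could call at startup. It assumes Linux with `/proc` mounted, as in an Alpine container.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// parseCmdline splits the NUL-separated contents of /proc/<pid>/cmdline
// into individual arguments, dropping the trailing NUL terminator.
func parseCmdline(raw []byte) []string {
	raw = bytes.TrimRight(raw, "\x00")
	if len(raw) == 0 {
		return nil
	}
	parts := bytes.Split(raw, []byte{0})
	args := make([]string, len(parts))
	for i, p := range parts {
		args[i] = string(p)
	}
	return args
}

// pid1Cmdline returns the command line of PID 1 in the current PID namespace.
func pid1Cmdline() ([]string, error) {
	raw, err := os.ReadFile("/proc/1/cmdline")
	if err != nil {
		return nil, err
	}
	return parseCmdline(raw), nil
}

func main() {
	fmt.Printf("my pid=%d, parent pid=%d\n", os.Getpid(), os.Getppid())
	if args, err := pid1Cmdline(); err == nil {
		fmt.Printf("PID 1 is %q\n", args)
	}
}
```

Under the Dockerfile above, one would expect it to report PID 1 as `["/bin/sh" "/test.sh"]` and a parent PID of 1 for `app`.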

Start it with `docker run`, then stop it with `docker stop`:

```
❯ docker ps
CONTAINER ID   IMAGE     COMMAND              CREATED          STATUS          PORTS     NAMES
9398c6c38f55   bt        "/bin/sh /test.sh"   37 seconds ago   Up 36 seconds             mystifying_elbakyan

❯ docker stop 9398c6c38f55
9398c6c38f55

 took 10s
❯
```

Exactly 10s, no more and no less, which matches the description. Next, let's look at `docker events`:

```
❯ docker events
2023-06-07T22:54:12.597708119+08:00 container kill 9398c6c38f55a7f85285453b7743ba198919f66508c11c9951b38eeda530b9d4 (image=bt, name=mystifying_elbakyan, signal=15)
2023-06-07T22:54:22.620442554+08:00 container kill 9398c6c38f55a7f85285453b7743ba198919f66508c11c9951b38eeda530b9d4 (image=bt, name=mystifying_elbakyan, signal=9)
2023-06-07T22:54:22.900211893+08:00 network disconnect be46b9b50890526a33927cb6b14776ec1c6b34721e56bbe5cd7776f4698e17bd (container=9398c6c38f55a7f85285453b7743ba198919f66508c11c9951b38eeda530b9d4, name=bridge, type=bridge)
2023-06-07T22:54:22.910430519+08:00 container stop 9398c6c38f55a7f85285453b7743ba198919f66508c11c9951b38eeda530b9d4 (image=bt, name=mystifying_elbakyan)
2023-06-07T22:54:22.914132154+08:00 container die 9398c6c38f55a7f85285453b7743ba198919f66508c11c9951b38eeda530b9d4 (exitCode=137, image=bt, name=mystifying_elbakyan)
```
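These events show what `docker stop` does: it sends SIGTERM (`signal=15`), waits for a grace period (10 seconds by default, adjustable with `docker stop -t`), and then sends SIGKILL (`signal=9`). The SIGTERM had no visible effect because PID 1 here is `/bin/sh`. Per pid_namespaces(7), the init process of a PID namespace only receives signals for which it has installed a handler, with SIGKILL and SIGSTOP from an ancestor namespace as the exception. The shell apparently installs no SIGTERM handler, so it neither exits nor passes the signal on to `app`. The exit codes follow the usual 128 + signal-number convention: 137 = 128 + 9 (SIGKILL), and the 143 seen in the later experiments is 128 + 15 (SIGTERM). Below is a tiny decoder for that convention. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// signalFromExitCode interprets an exit code using the 128+N convention:
// codes above 128 mean "terminated by signal N". It returns ok=false for
// ordinary exit statuses and for values outside the Linux signal range.
func signalFromExitCode(code int) (sig syscall.Signal, ok bool) {
	if code <= 128 || code > 128+64 {
		return 0, false
	}
	return syscall.Signal(code - 128), true
}

func main() {
	for _, code := range []int{0, 1, 137, 143} {
		if sig, ok := signalFromExitCode(code); ok {
			fmt.Printf("exit %d: terminated by signal %d (%v)\n", code, int(sig), sig)
		} else {
			fmt.Printf("exit %d: ordinary exit status\n", code)
		}
	}
}
```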

---

## **Using tini**

### What is tini

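> [*tini, a container init*](https://github.com/krallin/tini) *is a minimal* `init` *system that runs inside the container. It spawns a single child process and waits for it to exit, reaping zombies and forwarding signals in the meantime.*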

The modified Dockerfile (the `builder` stage is unchanged from above):

```dockerfile
FROM alpine

COPY --from=builder /workspace/app /app
COPY --from=builder /workspace/test.sh /test.sh

RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]

CMD ["/test.sh"]
```

Run it:

```
❯ docker ps
CONTAINER ID   IMAGE     COMMAND                   CREATED          STATUS          PORTS     NAMES
a9d595a31a26   bt1       "/sbin/tini -- /test…"   12 seconds ago   Up 11 seconds             brave_lamarr

❯ docker stop a9d595a31a26
a9d595a31a26
```

The corresponding `docker events`:

```
2023-06-07T23:13:27.961297294+08:00 container kill a9d595a31a26e6273061167b47cf66c44bf7b8c872906a756d05ef0968abb3f8 (image=bt1, name=brave_lamarr, signal=15)
2023-06-07T23:13:28.231211949+08:00 network disconnect be46b9b50890526a33927cb6b14776ec1c6b34721e56bbe5cd7776f4698e17bd (container=a9d595a31a26e6273061167b47cf66c44bf7b8c872906a756d05ef0968abb3f8, name=bridge, type=bridge)
2023-06-07T23:13:28.241647255+08:00 container stop a9d595a31a26e6273061167b47cf66c44bf7b8c872906a756d05ef0968abb3f8 (image=bt1, name=brave_lamarr)
2023-06-07T23:13:28.245237103+08:00 container die a9d595a31a26e6273061167b47cf66c44bf7b8c872906a756d05ef0968abb3f8 (exitCode=143, image=bt1, name=brave_lamarr)
```

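This time the container stops in well under a second. However, standard output never shows the `received SIGTERM` line I was hoping for, which suggests that SIGTERM may never have reached `app`.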

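Here is one explanation consistent with these events. According to the tini README, tini by default sends signals only to its immediate child. Here that child is the `/bin/sh` running `test.sh`, not `app`. Because the shell is no longer PID 1, SIGTERM simply terminates it, and tini exits with the matching status (the `exitCode=143` above). Once tini, as PID 1, exits, the kernel kills every remaining process in the container's PID namespace, so `app` dies without ever seeing SIGTERM. The natural next thing to try would be tini's `-g` option, which forwards signals to the child's whole process group, e.g. `ENTRYPOINT ["/sbin/tini", "-g", "--"]`. To inspect parents and process groups inside the container, you can read `/proc/<pid>/stat`. The parser below is a sketch with hypothetical names (`procStat`, `parseStat`) that extracts only the pid, comm, ppid and pgrp fields described in proc(5).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// procStat holds a few fields of /proc/<pid>/stat (see proc(5)).
type procStat struct {
	PID, PPID, PGRP int
	Comm            string
}

// parseStat parses a /proc/<pid>/stat line. comm is wrapped in parentheses
// and may itself contain spaces or ')', so we split around the LAST ')'.
func parseStat(line string) (procStat, error) {
	open := strings.IndexByte(line, '(')
	closing := strings.LastIndexByte(line, ')')
	if open < 0 || closing < open {
		return procStat{}, fmt.Errorf("malformed stat line: %q", line)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(line[:open]))
	if err != nil {
		return procStat{}, err
	}
	// Fields after ")" are: state ppid pgrp session ...
	rest := strings.Fields(line[closing+1:])
	if len(rest) < 3 {
		return procStat{}, fmt.Errorf("too few fields: %q", line)
	}
	ppid, err := strconv.Atoi(rest[1])
	if err != nil {
		return procStat{}, err
	}
	pgrp, err := strconv.Atoi(rest[2])
	if err != nil {
		return procStat{}, err
	}
	return procStat{PID: pid, PPID: ppid, PGRP: pgrp, Comm: line[open+1 : closing]}, nil
}

func main() {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, e := range entries {
		if _, err := strconv.Atoi(e.Name()); err != nil {
			continue // not a PID directory
		}
		raw, err := os.ReadFile("/proc/" + e.Name() + "/stat")
		if err != nil {
			continue // the process may have exited in the meantime
		}
		if s, err := parseStat(string(raw)); err == nil {
			fmt.Printf("pid=%d ppid=%d pgrp=%d comm=%s\n", s.PID, s.PPID, s.PGRP, s.Comm)
		}
	}
}
```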
### dumb-init

> *A minimal init system for Linux containers*

It is clearly a competitor to tini, so let's just try it out.

Dockerfile

```dockerfile
FROM alpine

COPY --from=builder /workspace/app /app
COPY --from=builder /workspace/test.sh /test.sh

RUN wget -O /usr/local/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.5/dumb-init_1.2.5_x86_64 && \
    chmod +x /usr/local/bin/dumb-init

ENTRYPOINT ["/usr/local/bin/dumb-init", "--"]

CMD ["/test.sh"]
```

Run it:

```
❯ docker ps
CONTAINER ID   IMAGE     COMMAND                   CREATED          STATUS         PORTS     NAMES
a8013b4f5450   bt2       "/usr/local/bin/dumb…"   10 seconds ago   Up 9 seconds             agitated_tesla

❯ docker stop a8013b4f5450
a8013b4f5450
```

The `docker events` output:

```
2023-06-07T23:19:03.818896256+08:00 container kill a8013b4f5450b650bab01b37dce0f91da528bf051909de7a06edab03b4267b2f (image=bt2, name=agitated_tesla, signal=15)
2023-06-07T23:19:04.126668505+08:00 network disconnect be46b9b50890526a33927cb6b14776ec1c6b34721e56bbe5cd7776f4698e17bd (container=a8013b4f5450b650bab01b37dce0f91da528bf051909de7a06edab03b4267b2f, name=bridge, type=bridge)
2023-06-07T23:19:04.136470132+08:00 container stop a8013b4f5450b650bab01b37dce0f91da528bf051909de7a06edab03b4267b2f (image=bt2, name=agitated_tesla)
2023-06-07T23:19:04.138676021+08:00 container die a8013b4f5450b650bab01b37dce0f91da528bf051909de7a06edab03b4267b2f (exitCode=143, image=bt2, name=agitated_tesla)
```

Standard output:

```
2023/06/07 15:18:47 Server is running
2023/06/07 15:19:03 received SIGTERM
```

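Everything lines up. `app` logged `received SIGTERM`, and the container exited about 0.3s after the SIGTERM with no SIGKILL needed. Judging by the results, dumb-init meets my needs.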

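The dumb-init README describes its default mode as establishing a session rooted at the child and sending signals to the entire process group. That is why both the shell and `app` received SIGTERM here. The original task also listed a build-our-own option, so here is a rough Go sketch of the same idea, assuming Linux. `runForwarding` and the `/miniinit` binary name are hypothetical. The sketch uses a new process group (Setpgid) rather than a new session, and it leaves out the zombie reaping and TTY handling that a real init needs. It could be wired up as `ENTRYPOINT ["/miniinit", "/test.sh"]`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runForwarding starts name/args as a child in its own process group and
// forwards SIGTERM/SIGINT to that whole group (kill with a negative PID),
// so grandchildren such as an app launched by a shell script also receive it.
// It returns an exit code following the 128+signal convention.
func runForwarding(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	// Setpgid with Pgid 0: the child's process group ID becomes its own PID.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	defer signal.Stop(sigs)

	if err := cmd.Start(); err != nil {
		return 1, err
	}
	pgid := cmd.Process.Pid

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	for {
		select {
		case s := <-sigs:
			// A negative PID targets every process in process group pgid.
			_ = syscall.Kill(-pgid, s.(syscall.Signal))
		case err := <-done:
			var exitErr *exec.ExitError
			if errors.As(err, &exitErr) {
				if ws, ok := exitErr.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
					return 128 + int(ws.Signal()), nil
				}
				return exitErr.ExitCode(), nil
			}
			if err != nil {
				return 1, err
			}
			return 0, nil
		}
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: miniinit <command> [args...]")
		os.Exit(2)
	}
	code, err := runForwarding(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	os.Exit(code)
}
```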
## **Summary**

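Given what I currently know, I still have quite a few blind spots. For example: why is the target process in the container not PID 1? Why would a container need to manage multiple processes at all? Still, I understand these are real-world scenarios. Some of them amount to containerizing for its own sake, and they only seem odd to me because of my limited perspective.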

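On the other hand, why don't the container process-management tools on the market work quite as described? Could it be that I ran the experiment the wrong way? This deserves further thought. I consider it an unfinished topic, and it ought to get a follow-up.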

References:

* [Understanding through experiments how K8S rolling updates achieve zero downtime](https://mp.weixin.qq.com/s/CCQdYZWWxT8IPjbq8KaHxQ)
* [tini](https://github.com/krallin/tini)
* [dumb-init](https://github.com/Yelp/dumb-init)
* [More hot takes on PID 1 in containers](https://www.manjusaka.blog/posts/2021/02/27/damn-the-init-process/)
* [docker\_tini](https://cloud-atlas.readthedocs.io/zh_CN/latest/docker/init/docker_tini.html)
