Installing Docker and an AI Computing Environment on Huawei openEuler

Video: https://www.ixigua.com/7247868403061359165

ChatGPT has set off the AIGC wave, and many excellent AI projects have appeared recently. If you browse their project pages, you will notice that most of them officially recommend doing training, fine-tuning, and inference on Linux. This article walks through installing Docker and an AI computing environment on Huawei openEuler, to show that this home-grown operating system can run these mainstream projects perfectly well.

First, follow https://www.toutiao.com/article/7248211950994653731 to set up your openEuler server, then log in to it over SSH and run the commands below in the SSH session to complete the installation:

1. Blacklist nouveau, the open-source NVIDIA driver in the kernel

cat << EOF >> /etc/modprobe.d/blacklist-nouveau.conf

blacklist nouveau

options nouveau modeset=0

EOF
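Optional sanity check (not part of the original steps): the blacklist file should now contain the two lines written above.

cat /etc/modprobe.d/blacklist-nouveau.conf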


2. Rebuild the initramfs

Back up the current initramfs

mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

Rebuild the initramfs

dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
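Optional check before rebooting: assuming the two commands above succeeded, both the new image and the .bak backup should now be present in /boot.

ls -lh /boot/initramfs-$(uname -r).img*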

Reboot

reboot
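After the machine comes back up, you can confirm that nouveau is no longer loaded; this quick check (not part of the original steps) should print nothing:

lsmod | grep nouveau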


3. Install the required packages

yum install -y kernel-devel gcc make g++ libglvnd-devel


4. Install the NVIDIA GPU driver

Download the driver that matches your NVIDIA GPU model from:

https://www.nvidia.cn/Download/index.aspx?lang=cn

Make the installer executable

chmod 755 NVIDIA-Linux-x86_64-535.54.03.run

Run the installer

./NVIDIA-Linux-x86_64-535.54.03.run --kernel-source-path /usr/src/kernels/5.10.0-136.36.0.112.oe2203sp1.x86_64
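If your kernel version differs from the one shown above, you can let the shell fill in the path instead of hard-coding it (a sketch that assumes kernel-devel for the running kernel is installed, as done in step 3):

./NVIDIA-Linux-x86_64-535.54.03.run --kernel-source-path /usr/src/kernels/$(uname -r)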

Reboot

reboot
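After rebooting, a quick way to confirm the driver works is to run nvidia-smi on the host; it should list your GPU(s):

nvidia-smi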


5. Install Docker

Download the latest Docker static binary package

wget https://download.docker.com/linux/static/stable/x86_64/docker-24.0.2.tgz

Extract the package

tar -xvzf docker-24.0.2.tgz

Copy the binaries

cp docker/* /usr/bin/
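Optional check (not part of the original steps) that the binaries were copied and are executable:

dockerd --version
containerd --version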


Create the systemd unit files for Docker

(1) Create the docker.service file

cat << 'EOF' >> /lib/systemd/system/docker.service

[Unit]

Description=Docker Application Container Engine

Documentation=https://docs.docker.com

After=network-online.target docker.socket firewalld.service containerd.service time-set.target

Wants=network-online.target containerd.service

Requires=docker.socket

[Service]

Type=notify

# the default is not to use systemd for cgroups because the delegate issues still

# exists and systemd currently does not support the cgroup feature set required

# for containers run by docker

ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

ExecReload=/bin/kill -s HUP $MAINPID

TimeoutStartSec=0

RestartSec=2

Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.

# Both the old, and new location are accepted by systemd 229 and up, so using the old location

# to make them work for either version of systemd.

StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.

# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make

# this option work for either version of systemd.

StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead

# in the kernel. We recommend using cgroups to do container-local accounting.

LimitNOFILE=infinity

LimitNPROC=infinity

LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.

# Only systemd 226 and above support this option.

TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers

Delegate=yes

# kill only the docker process, not all processes in the cgroup

KillMode=process

OOMScoreAdjust=-500

[Install]

WantedBy=multi-user.target

EOF


(2) Create the docker.socket file

cat << EOF >> /lib/systemd/system/docker.socket

[Unit]

Description=Docker Socket for the API

[Socket]

# If /var/run is not implemented as a symlink to /run, you may need to

# specify ListenStream=/var/run/docker.sock instead.

ListenStream=/run/docker.sock

SocketMode=0660

SocketUser=root

SocketGroup=docker

[Install]

WantedBy=sockets.target

EOF


(3) Create the containerd.service file

cat << EOF >> /lib/systemd/system/containerd.service

# Copyright The containerd Authors.

#

# Licensed under the Apache License, Version 2.0 (the "License");

# you may not use this file except in compliance with the License.

# You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

[Unit]

Description=containerd container runtime

Documentation=https://containerd.io

After=network.target local-fs.target

[Service]

ExecStartPre=-/sbin/modprobe overlay

ExecStart=/usr/bin/containerd

Type=notify

Delegate=yes

KillMode=process

Restart=always

RestartSec=5

# Having non-zero Limit*s causes performance problems due to accounting overhead

# in the kernel. We recommend using cgroups to do container-local accounting.

LimitNPROC=infinity

LimitCORE=infinity

LimitNOFILE=infinity

# Comment TasksMax if your systemd version does not support it.

# Only systemd 226 and above support this option.

TasksMax=infinity

OOMScoreAdjust=-999

[Install]

WantedBy=multi-user.target

EOF


(4) Create the configuration directory

mkdir -p /etc/docker

(5) Create the Docker daemon configuration file

cat << EOF >> /etc/docker/daemon.json

{

"registry-mirrors": ["http://hub-mirror.c.163.com"]

}

EOF
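Optional: if python3 is installed (an assumption, not part of the original steps), you can verify that the JSON is well-formed:

python3 -m json.tool /etc/docker/daemon.json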


(6) Create the docker group

groupadd docker
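Optional: if you also want a non-root user to run docker commands without sudo, add that user to the group (replace the placeholder <username> with a real account):

usermod -aG docker <username>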

(7) Enable the services to start on boot

systemctl daemon-reload

systemctl enable --now containerd

systemctl enable --now docker
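You can confirm that both services are running; each line of output should read "active":

systemctl is-active containerd docker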

(8) Check the version information

docker version
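As an optional smoke test (assuming the server can reach Docker Hub), run a throwaway container:

docker run --rm hello-world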

6. Install the NVIDIA Container Toolkit

Add the repository

curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

Install the toolkit

yum install -y nvidia-container-toolkit

Configure the Docker runtime

nvidia-ctk runtime configure --runtime=docker
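This command updates /etc/docker/daemon.json; you can inspect the result, which should now contain an nvidia entry under "runtimes" in addition to the registry mirror configured earlier:

cat /etc/docker/daemon.json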

Restart Docker

systemctl restart docker

Verify the NVIDIA Container Toolkit

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

If the command prints the usual nvidia-smi table listing your GPU, Docker and the NVIDIA Container Toolkit have been integrated successfully. In the next article we will look at quickly deploying the currently popular Stable Diffusion on openEuler for AI image generation.
