JupyterHub

From the Arch Linux Chinese Wiki

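JupyterHub is a multi-user web server for Jupyter notebooks. It consists of four subsystems:

  1. The main hub process.
  2. An authenticator, which authenticates users.
  3. A spawner, which starts and monitors a single-user server for each connected user.
  4. An HTTP proxy, which receives incoming requests and routes them to either the hub or the appropriate single-user server.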
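You choose each of these components with a class setting in the configuration file. The fragment below is a sketch that spells out what we understand the default classes to be. You do not need to set them explicitly, and the defaults may differ between JupyterHub versions.

```python
# /etc/jupyterhub/jupyterhub_config.py (sketch: these are believed to be the defaults)
c = get_config()  # noqa: F821 (get_config is provided by JupyterHub when it loads this file)

# 1. The hub itself is configured through the c.JupyterHub.* options.
# 2. Authenticator: PAM-based login for local users.
c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'
# 3. Spawner: one local process per user, run under that user's account.
c.JupyterHub.spawner_class = 'jupyterhub.spawner.LocalProcessSpawner'
# 4. HTTP proxy: configurable-http-proxy, routing to the hub or user servers.
c.JupyterHub.proxy_class = 'jupyterhub.proxy.ConfigurableHTTPProxy'
```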

For more details, see the technical overview in the JupyterHub documentation.

Installation

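Install the jupyterhubAUR package. In most cases you will also need the jupyter-notebook package (some more advanced spawners may not need it). You can also install the jupyterlab package to make the JupyterLab interface available.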
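If JupyterLab is installed, you can make it the interface users land on after logging in. The following minimal sketch uses the Spawner.default_url option. Check the option against the configuration file shipped with your version.

```python
# /etc/jupyterhub/jupyterhub_config.py (sketch)
c = get_config()  # noqa: F821 (provided by JupyterHub when it loads this file)

# Open JupyterLab ('/lab') instead of the classic notebook file tree after login.
c.Spawner.default_url = '/lab'
```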

Running

Start/enable jupyterhub.service. With the default configuration, you can reach the hub by opening 127.0.0.1:8000 in a browser.
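To check from a script that the hub is up, you can query its REST API. To our knowledge, GET /hub/api returns the JupyterHub version and needs no token. The sketch below assumes the default address 127.0.0.1:8000. The helper names are hypothetical.

```python
import json
import urllib.request


def api_url(base_url: str) -> str:
    """Return the REST API root for a public hub URL such as http://127.0.0.1:8000."""
    return base_url.rstrip('/') + '/hub/api'


def hub_version(base_url: str = 'http://127.0.0.1:8000', timeout: float = 5.0) -> str:
    """Fetch the version string reported by GET /hub/api (unauthenticated endpoint)."""
    with urllib.request.urlopen(api_url(base_url), timeout=timeout) as resp:
        return json.load(resp)['version']
```

Calling hub_version() against a running hub returns its version string. A connection error (urllib.error.URLError) suggests the service is not running or is listening elsewhere.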

Configuration

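The JupyterHub configuration file is located at /etc/jupyterhub/jupyterhub_config.py. It is a Python script that modifies the configuration object c. The configuration file provided by the package lists the available configuration options along with their default values.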

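Any relative paths in the configuration are resolved against the working directory the hub is run from. The systemd service provided by the package uses /etc/jupyterhub as its working directory. This means, for example, that the default database URL c.JupyterHub.db_url = 'sqlite:///jupyterhub.sqlite' corresponds to the file /etc/jupyterhub/jupyterhub.sqlite.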
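The database URL follows the SQLAlchemy convention: sqlite:/// followed by a relative path, or sqlite://// followed by an absolute one. The hypothetical helper below is only an illustration of that rule, not JupyterHub code. It assumes the service's working directory of /etc/jupyterhub.

```python
import posixpath

SQLITE_PREFIX = 'sqlite:///'


def sqlite_path(db_url: str, workdir: str = '/etc/jupyterhub') -> str:
    """Return the file a sqlite:/// URL refers to, resolving relative paths against workdir."""
    if not db_url.startswith(SQLITE_PREFIX):
        raise ValueError(f'not a sqlite URL: {db_url!r}')
    path = db_url[len(SQLITE_PREFIX):]
    # 'sqlite:///file' leaves 'file' (relative); 'sqlite:////abs/file' leaves '/abs/file'.
    # posixpath.join discards workdir when path is already absolute.
    return posixpath.normpath(posixpath.join(workdir, path))
```

To keep the database elsewhere, use an absolute URL such as c.JupyterHub.db_url = 'sqlite:////var/lib/jupyterhub/jupyterhub.sqlite'. The directory must exist and be writable by the account the hub runs as.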

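All configuration options can be overridden on the command line. For example, the setting c.Application.show_config = True in the configuration file can be replaced by the command-line flag --Application.show_config=True. Note that the provided systemd service sets c.JupyterHub.pid_file and c.ConfigurableHTTPProxy.pid_file explicitly on the command line, pointing them at a suitable runtime directory. Any values for these options in the configuration file are therefore ignored.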
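The mapping from a configuration assignment to a flag is mechanical: drop the leading c. and prefix the option with --. The helper to_cli_flags below is hypothetical and not part of JupyterHub. It illustrates the mapping when you assemble a command line in a wrapper script. The -f option tells jupyterhub which configuration file to load.

```python
def to_cli_flags(settings: dict) -> list:
    """Turn {'Class.option': value} pairs into ['--Class.option=value', ...] flags."""
    return [f'--{name}={value}' for name, value in settings.items()]


# Equivalent of c.Application.show_config = True, given on the command line.
cmd = ['jupyterhub', '-f', '/etc/jupyterhub/jupyterhub_config.py']
cmd += to_cli_flags({'Application.show_config': True})
```

Passing cmd as a list to subprocess.run avoids shell quoting issues with values that contain spaces.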

Authenticators

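Authenticators control access to the hub and the single-user servers. The authenticators section of the documentation explains how authenticators work and how to write a custom one. The authenticators wiki page lists available authenticators; some of these are packaged and are described below.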

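Note that user state is stored in cookies, which are encrypted with the cookie secret. You should change the cookie secret if you switch to a different authenticator, or if you change the settings of the current authenticator in a way that may change the list of allowed users. Changing the secret logs out all current users and forces them to re-authenticate under the new settings. To do this, delete the cookie secret file and restart the hub, which automatically generates a new secret. In the default configuration, the cookie secret is stored in /etc/jupyterhub/jupyterhub_cookie_secret.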
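Deleting the file and restarting is the simplest approach. If you would rather replace the secret in place, the sketch below writes a new one in the hex format that the JupyterHub documentation suggests generating with openssl rand -hex 32. The function name is hypothetical. Restart the hub afterwards, and make sure the file is owned by the account the hub runs as.

```python
import os
import secrets


def write_cookie_secret(path: str = '/etc/jupyterhub/jupyterhub_cookie_secret') -> str:
    """Overwrite the cookie secret with 32 random bytes, hex-encoded, readable only by the owner."""
    secret = secrets.token_hex(32)  # 64 hex characters, same shape as `openssl rand -hex 32`
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'w') as f:
        f.write(secret)
    # The mode passed to os.open only applies when the file is created, so enforce it here too.
    os.chmod(path, 0o600)
    return secret
```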

PAM authenticator

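The PAM authenticator uses PAM to let local users log in to the hub. It is included with JupyterHub and is the default authenticator. To authenticate users, the hub needs read access to /etc/shadow, which contains the hashed user passwords. By default, /etc/shadow is owned by root with permissions -rw-------, so running the hub as root satisfies this requirement. Some sources recommend removing all permissions from /etc/shadow so that a compromised daemon cannot read it, and granting the CAP_DAC_OVERRIDE capability to the processes that need access. If your /etc/shadow is set up this way, create a drop-in file for the service that grants this capability to JupyterHub: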

/etc/systemd/system/jupyterhub.service.d/override.conf
[Service]
CapabilityBoundingSet=CAP_DAC_OVERRIDE
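To see how /etc/shadow is set up on your system, and whether a given account can read it, run a small check like the one below as that account: root, or the user the hub runs as. This is a sketch with a hypothetical function name. Note that os.access checks the process's real user and group IDs, so it may not reflect a capability granted only to the systemd service.

```python
import os
import stat


def shadow_access(path: str = '/etc/shadow') -> tuple:
    """Return (symbolic permissions such as '-rw-------', whether this process may read path)."""
    st = os.stat(path)  # stat() itself does not need read permission on the file
    return stat.filemode(st.st_mode), os.access(path, os.R_OK)
```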

The PAM authenticator relies on the Python package pamela. For basic troubleshooting, you can test it on the command line. To attempt authentication as the user testuser, run:

# python -m pamela -a testuser

If you run JupyterHub as a non-root user, run the command as that user instead of root. If authentication succeeds, nothing is printed; if it fails, an error message is printed.

PAM authentication as non-root user

If you run JupyterHub as a non-root user, you will need to give that user read permissions to the shadow file. The method recommended by the JupyterHub documentation is to create a shadow group, make the shadow file readable by this group, and add the JupyterHub user to this group.

Warning: This gives anybody running code as the JupyterHub user read-only access to the hashed passwords in /etc/shadow. Each single-user server runs under its own user's account, so code executed in those servers does not get this access. Also note that if JupyterHub were run as root, a security exploit in JupyterHub would grant the same access to the hashed passwords.

Creating the group, modifying the shadow file permissions and adding the user jupyterhub to the group can be accomplished with the following four commands:

# groupadd shadow
# chgrp shadow /etc/shadow
# chmod g+r /etc/shadow
# usermod -aG shadow jupyterhub
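To verify the result, the following sketch reports the group that owns /etc/shadow, whether that group may read it, and whether the given user belongs to it. The function name and the default user jupyterhub are assumptions matching the commands above. Group membership is only picked up when a process starts, so restart jupyterhub.service after running usermod.

```python
import grp
import os
import pwd
import stat


def shadow_group_status(user: str = 'jupyterhub', path: str = '/etc/shadow') -> dict:
    """Report the owning group of path, whether it is group-readable, and whether user is a member."""
    st = os.stat(path)
    status = {
        'group': None,
        'group_readable': bool(st.st_mode & stat.S_IRGRP),
        'member': False,
    }
    try:
        group = grp.getgrgid(st.st_gid)
    except KeyError:  # the file's gid has no entry in /etc/group
        return status
    status['group'] = group.gr_name
    try:
        primary_gid = pwd.getpwnam(user).pw_gid
    except KeyError:  # no such user
        primary_gid = None
    # Supplementary members (added with usermod -aG) appear in gr_mem; the primary group does not.
    status['member'] = user in group.gr_mem or primary_gid == st.st_gid
    return status
```

After the four commands above, the expected result is group 'shadow' with both group_readable and member set to True.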

Spawners

Spawners are responsible for starting and monitoring each user's notebook server. The spawners section of the documentation contains more details about how they work and how to write a custom spawner. The spawners wiki page has a list of spawners; some of these are packaged and are described below.

LocalProcessSpawner

This is the default spawner included with JupyterHub. It runs each single-user server in a separate local process under their user account (this means each JupyterHub user must correspond to a local user account). It also requires JupyterHub to be run as root so it can spawn the processes under the different user accounts. The jupyter-notebook package must be installed for this spawner to work.
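Because each hub user must map to a local account, a quick way to check a username in advance is to look it up in the system user database. This is roughly the lookup a local-process spawner needs to find the user's UID and home directory. The helper name is hypothetical.

```python
import pwd


def has_local_account(username: str) -> bool:
    """True if username exists in the system user database (/etc/passwd or another NSS source)."""
    try:
        pwd.getpwnam(username)
    except KeyError:
        return False
    return True
```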

SudoSpawner

The SudoSpawner uses an intermediate process, started with sudo, to spawn the single-user servers. This allows the JupyterHub process to run as a non-root user. To use it, install the jupyterhub-sudospawnerAUR package.

Next, create a system user account for the hub (the following assumes it is named jupyterhub) and a group whose membership determines which users can access the hub (here assumed to be called jupyterhub-users). Then configure sudo to allow the jupyterhub user to spawn servers without a password by creating a drop-in sudo configuration file with visudo:

# visudo -f /etc/sudoers.d/jupyterhub-sudospawner
# The command the hub is allowed to run.
Cmnd_Alias SUDOSPAWNER_CMD = /usr/bin/sudospawner

# Allow the jupyterhub user to run this command on behalf of anybody
# in the jupyterhub-users group.
jupyterhub ALL=(%jupyterhub-users) NOPASSWD:SUDOSPAWNER_CMD

The default service file runs the hub as root and applies a number of hardening options that restrict its capabilities. This hardening prevents sudo from working. To allow sudo, the NoNewPrivileges service option must be disabled, along with any other options that implicitly enable it (see systemd.exec(5) for the list). Create a drop-in file that disables these options and runs the hub as the jupyterhub user instead:

/etc/systemd/system/jupyterhub.service.d/override.conf
[Service]
User=jupyterhub
Group=jupyterhub

# Required for sudo.
NoNewPrivileges=false

# Setting the following would implicitly set NoNewPrivileges.
PrivateDevices=false
ProtectKernelTunables=false
ProtectKernelModules=false
LockPersonality=false
RestrictRealtime=false
RestrictSUIDSGID=false
SystemCallFilter=
SystemCallArchitectures=

If you have previously run the hub as the root user, you will need to change the ownership of the user database and cookie secret files:

# chown jupyterhub:jupyterhub /etc/jupyterhub/{jupyterhub_cookie_secret,jupyterhub.sqlite}

If you are using the PAMAuthenticator, you will need to configure your system to allow it to work as a non-root user.

Finally, edit the JupyterHub configuration and change the spawner class to SudoSpawner:

/etc/jupyterhub/jupyterhub_config.py
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'

To give a user access to the hub, add them to the jupyterhub-users group:

# usermod -aG jupyterhub-users <username>
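To list who currently has access, you can read the group's member list. This is a sketch with a hypothetical helper name. It only returns supplementary members, which is what usermod -aG adds. A user whose primary group is jupyterhub-users would not appear.

```python
import grp


def hub_users(group: str = 'jupyterhub-users') -> list:
    """Return the sorted supplementary members of group, or [] if the group does not exist."""
    try:
        return sorted(grp.getgrnam(group).gr_mem)
    except KeyError:  # no such group
        return []
```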

systemdspawner

The systemdspawner uses systemd to manage each user's notebook server. This allows resource limits, better process isolation and sandboxing, and dynamically allocated users. To use it, install the jupyterhub-systemdspawnerAUR package and set the spawner class in the configuration file:

/etc/jupyterhub/jupyterhub_config.py
c.JupyterHub.spawner_class = 'systemdspawner.SystemdSpawner'

Note that, according to the systemdspawner README, it currently requires JupyterHub to be run as root.
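The resource limits and dynamic users mentioned above are set through spawner options. The option names below are taken from the systemdspawner README as we recall it. The limit values are placeholders, so check both against the version you have installed.

```python
# /etc/jupyterhub/jupyterhub_config.py (sketch)
c = get_config()  # noqa: F821 (provided by JupyterHub when it loads this file)
c.JupyterHub.spawner_class = 'systemdspawner.SystemdSpawner'

# Per-user resource limits, applied to each user's systemd unit (placeholder values).
c.SystemdSpawner.mem_limit = '4G'
c.SystemdSpawner.cpu_limit = 2.0

# Run each server as a transient user allocated by systemd (DynamicUser=)
# rather than requiring a pre-existing local account.
c.SystemdSpawner.dynamic_users = True
```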

Services

A JupyterHub service is defined as a process which interacts with the Hub through its API. Services can either be run by the hub or as standalone processes.
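The Hub API is a REST API authenticated with a token sent in an Authorization header. For a service started by the hub, our understanding is that the hub passes the API URL and a token in the JUPYTERHUB_API_URL and JUPYTERHUB_API_TOKEN environment variables. The sketch below builds a request that lists users. The fallback values are placeholders, the function name is hypothetical, and the service needs sufficient permissions (admin, or a role with the list:users scope) for the call to succeed.

```python
import os
import urllib.request


def users_request(api_url: str, token: str) -> urllib.request.Request:
    """Build a GET /users request against the Hub REST API, authenticated with a token."""
    return urllib.request.Request(
        api_url.rstrip('/') + '/users',
        headers={'Authorization': f'token {token}'},
    )


# Values the hub provides to services it starts; the fallbacks here are placeholders.
api = os.environ.get('JUPYTERHUB_API_URL', 'http://127.0.0.1:8081/hub/api')
token = os.environ.get('JUPYTERHUB_API_TOKEN', 'replace-me')
req = users_request(api, token)
```

Passing req to urllib.request.urlopen sends the request and returns the JSON list of users.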

Idle culler

The idle culler service can be used to automatically shut down idle single-user servers. To use it, install the jupyterhub-idle-cullerAUR package. To run the service through the hub, add a service description to the c.JupyterHub.services configuration variable:

/etc/jupyterhub/jupyterhub_config.py
import sys
c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'admin': True,
        'command': [
            sys.executable,
            '-m', 'jupyterhub_idle_culler',
            '--timeout=3600'
        ],
    }
]

See the service documentation or the output of python -m jupyterhub_idle_culler --help for a description of command-line options and details of how to run the service as a standalone process.
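JupyterHub 2.0 introduced role-based access control. With it, the idle culler's README recommends granting the service a role with specific scopes instead of 'admin': True. The sketch below follows that recommendation. The scope list is as we recall it from the README, so check it against your installed version. The role name must be referenced from the service name it applies to.

```python
# /etc/jupyterhub/jupyterhub_config.py (sketch, JupyterHub 2.0 or later)
import sys

c = get_config()  # noqa: F821 (provided by JupyterHub when it loads this file)

# Give the culler only the permissions it needs rather than full admin rights.
c.JupyterHub.load_roles = [
    {
        'name': 'jupyterhub-idle-culler-role',
        'scopes': [
            'list:users',
            'read:users:activity',
            'read:servers',
            'delete:servers',
        ],
        'services': ['idle-culler'],  # must match the service name below
    }
]

c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'command': [sys.executable, '-m', 'jupyterhub_idle_culler', '--timeout=3600'],
    }
]
```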

Tips and Tricks

Running as non-root user

By default, the main hub process is run as the root user (the individual user servers are run under the corresponding local user as set by the spawner). To run as a non-root user, you need to use the SudoSpawner (the other spawners listed above require running as root). If you are using the PAM authenticator, you will also need to configure it for a non-root user.

Using a reverse proxy

A reverse proxy can be used to redirect external requests to the JupyterHub instance. This can be useful if you want to serve multiple sites from one machine, or use an existing server to handle SSL. The using a reverse proxy section of the JupyterHub documentation has example configuration for using either nginx or Apache as a reverse proxy.

Note: This does not replace JupyterHub's own proxy component, which routes requests to either the main hub or the single-user servers. Instead, the reverse proxy passes external requests on to the JupyterHub proxy.
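Behind a reverse proxy, it is common to bind JupyterHub to localhost only, and optionally to serve it under a sub-path. To our understanding, the c.JupyterHub.bind_url option sets the listening address and the base URL together. The path /jupyter/ below is an example. The reverse proxy must forward that prefix unchanged and pass WebSocket upgrade requests through, since notebook kernels communicate over WebSockets.

```python
# /etc/jupyterhub/jupyterhub_config.py (sketch)
c = get_config()  # noqa: F821 (provided by JupyterHub when it loads this file)

# Listen only on localhost (the reverse proxy is the public entry point)
# and serve the hub under /jupyter/ instead of the site root.
c.JupyterHub.bind_url = 'http://127.0.0.1:8000/jupyter/'
```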

Proxy other web services

The Jupyter Server Proxy extension allows you to run other web services such as Code Server or RStudio alongside JupyterHub and provide authenticated web access to them. To use it, install python-jupyter-server-proxyAUR and configure it with the /etc/jupyter/jupyter_notebook_config.py file. For instance, to proxy code-serverAUR:

/etc/jupyter/jupyter_notebook_config.py
c.ServerProxy.servers = {
    'code-server': {
        'command': [
            'code-server',
            '--auth=none',
            '--disable-telemetry',
            '--disable-update-check',
            '--bind-addr=localhost:{port}',
            '--user-data-dir=.config/Code - OSS/',
            '--extensions-dir=.vscode-oss/extensions/'
        ],
        'timeout': 20,
        'launcher_entry': {
            'title': 'VS Code'
        }
    }
}
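The {port} in the command is a placeholder. As we understand it, Jupyter Server Proxy picks a free port and substitutes it, using Python string formatting, before starting the process. The hypothetical helper below only illustrates that substitution. One consequence is that any literal braces in an argument would need to be doubled ({{ and }}).

```python
def expand_command(command: list, port: int) -> list:
    """Substitute the {port} placeholder in every argument of command."""
    return [arg.format(port=port) for arg in command]


code_server = [
    'code-server',
    '--auth=none',
    '--bind-addr=localhost:{port}',
]
```

For example, expand_command(code_server, 41234) yields an argument list ending in '--bind-addr=localhost:41234'.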

See the documentation for more details about configuring the Jupyter Server Proxy.

Access to GPUs

If you get errors when accessing GPUs (for instance, if nvidia-smi reports that it cannot communicate with the NVIDIA driver), check the hardening shipped with the JupyterHub systemd unit file. To allow broad access to GPUs (and other hardware), add the following to a drop-in file:

/etc/systemd/system/jupyterhub.service.d/override.conf
[Service]
PrivateDevices=false
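With PrivateDevices enabled, systemd gives the service a minimal private /dev that contains no GPU device nodes, and locally spawned servers inherit it. To check whether the device nodes are visible, you can run the following sketch in a notebook or terminal inside a single-user server. The helper names are hypothetical, and the /dev/nvidia* pattern applies to the NVIDIA driver only. An empty list suggests the devices are still hidden.

```python
import glob
import os


def nvidia_device_nodes() -> list:
    """Return the /dev/nvidia* device nodes visible to this process."""
    return sorted(glob.glob('/dev/nvidia*'))


def usable_nodes(nodes: list) -> list:
    """Filter nodes down to those this process may open for reading and writing."""
    return [n for n in nodes if os.access(n, os.R_OK | os.W_OK)]
```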