Valohai offers remote access to a live execution over SSH (Secure Shell). SSH is a low-level, general-purpose protocol, which makes it usable for a wide range of tasks.
Common use cases:
- Inspecting the execution using an interactive terminal
- Connecting your favorite IDE debugger, such as VS Code or PyCharm
- Low-latency integration with third-party tools such as TensorBoard
Prerequisites:
- Remote access is available for enterprise users running on-premises, AWS, or GCP environments.
- Your organization admin needs to enable SSH connections to the workers and edit the firewall rules in your cloud provider.
Configure SSH access for your organization
Your organization admin will need to configure the organization-wide settings and firewall rules before you can use remote SSH connections.
Define the default SSH port
Define a default port for SSH connections in your organization:
- Navigate to Hi, <name> (the top right menu) > Manage <organization>
- Go to Settings under the organization controls
- Set a Default Debug Port for your organization. Note that the value must be above 1023.
Allow connections on the selected port
You’ll need to edit the firewall rules in your cloud provider to allow users to connect to the workers on the defined port.
AWS
In AWS, open the security group valohai-sg-workers and click Edit inbound rules to add a new inbound Custom TCP rule:
- Type: Custom TCP
- Port range: The port number you specified in your Valohai organization’s settings.
- Source: Depending on your organization’s settings, you can either set Source to 0.0.0.0/0 to allow connections from anywhere, or whitelist specific IP ranges/source tags.
- Description: Allows connecting to Valohai jobs over SSH
Setting the source to 0.0.0.0/0 means that inbound connections are allowed from all addresses. However, you’ll still need the SSH private key (generated below) to authenticate and connect.
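If you prefer the command line, the same inbound rule can be added with the AWS CLI. This is a sketch, assuming the AWS CLI is installed and configured for the account that hosts your Valohai workers; the helper name allow_valohai_ssh_aws is hypothetical, and the port (2222) and source range (203.0.113.0/24, a documentation range) are placeholders for your own values.

```shell
# Sketch (hypothetical helper): add the inbound Custom TCP rule described above
# to the valohai-sg-workers security group using the AWS CLI.
# Assumes the AWS CLI is configured for the account running your Valohai workers.
allow_valohai_ssh_aws() {
  local port="$1"   # the Default Debug Port from your Valohai organization settings
  local cidr="$2"   # allowed source range; 0.0.0.0/0 allows any address
  local sg_id

  # Look up the ID of the worker security group by its name.
  sg_id=$(aws ec2 describe-security-groups \
    --filters Name=group-name,Values=valohai-sg-workers \
    --query 'SecurityGroups[0].GroupId' --output text) || return 1
  if [ -z "$sg_id" ] || [ "$sg_id" = "None" ]; then
    echo "Security group valohai-sg-workers not found" >&2
    return 1
  fi

  # Open the debug port for TCP traffic from the given source range.
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --ip-permissions "IpProtocol=tcp,FromPort=${port},ToPort=${port},IpRanges=[{CidrIp=${cidr},Description=Allows connecting to Valohai jobs over SSH}]"
}

# Example with placeholder values:
# allow_valohai_ssh_aws 2222 203.0.113.0/24
```

Looking the group up by name keeps the sketch independent of the security group ID, which differs between installations.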
GCP
In GCP, create a new firewall rule:
- Name: valohai-fr-worker-ssh
- Description: Allows connecting to Valohai jobs over SSH
- Network: The network where your Valohai resources are created (e.g. valohai-vpc)
- Direction: Ingress
- Targets: Specified target tags: valohai-worker
- Source: Depending on your organization’s settings, you can either set Source to 0.0.0.0/0 to allow connections from anywhere, or whitelist specific IP ranges/source tags.
- Specified protocols and ports: TCP, with the port number you specified in your Valohai organization’s settings.
Setting the source to 0.0.0.0/0 means that inbound connections are allowed from all addresses. However, you’ll still need the SSH private key (generated below) to authenticate and connect.
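The same rule can be created with the gcloud CLI. This is a sketch, assuming gcloud is authenticated and set to the project where Valohai runs; the helper name allow_valohai_ssh_gcp is hypothetical, and the network defaults to the example valohai-vpc from the list above.

```shell
# Sketch (hypothetical helper): create the valohai-fr-worker-ssh firewall rule
# described above with the gcloud CLI.
# Assumes gcloud is authenticated and set to the project where Valohai runs.
allow_valohai_ssh_gcp() {
  local port="$1"                    # the Default Debug Port from your Valohai organization settings
  local source_range="$2"            # e.g. 203.0.113.0/24; 0.0.0.0/0 allows any address
  local network="${3:-valohai-vpc}"  # network where your Valohai resources are created

  gcloud compute firewall-rules create valohai-fr-worker-ssh \
    --description="Allows connecting to Valohai jobs over SSH" \
    --network="$network" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules="tcp:${port}" \
    --target-tags=valohai-worker \
    --source-ranges="$source_range"
}

# Example with placeholder values:
# allow_valohai_ssh_gcp 2222 203.0.113.0/24
```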
Create an SSH keypair (optional)
An SSH key pair is required to secure the connection. You may reuse an existing key pair, but remember to regenerate it periodically according to your security standards. Note that this debug connection goes to the Docker container only, and its key pair should not be the same one you might use to access the server where the container runs.
Valohai can create the key pair for you when you start the execution from the UI. If you want to create the keys yourself, follow the instructions below.
Use ssh-keygen to create a new SSH key pair:
ssh-keygen -t rsa -b 4096 -N '' -f my-debug-key
This generates two files:
- my-debug-key.pub is the public key you paste into the UI before starting an execution.
- my-debug-key is the private key you need to connect to the execution.
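A small wrapper around the same ssh-keygen command can guard against one common slip: if the target file already exists, ssh-keygen stops to ask whether to overwrite it. This is a sketch, assuming you pass the directory to store the keys in (for example ~/.ssh); the helper name create_debug_key is hypothetical.

```shell
# Sketch (hypothetical helper): create a dedicated debug key pair with the same
# ssh-keygen options as above and print the public key for the Valohai UI or CLI.
create_debug_key() {
  local dir="$1"                    # where to store the keys, e.g. "$HOME/.ssh"
  local name="${2:-my-debug-key}"   # file name of the private key
  local key="$dir/$name"

  mkdir -p "$dir" || return 1

  # Never overwrite an existing key; ssh-keygen would otherwise prompt.
  if [ -e "$key" ] || [ -e "$key.pub" ]; then
    echo "Key already exists: $key" >&2
    return 1
  fi

  # 4096-bit RSA key with an empty passphrase (-N ''), quiet output (-q).
  ssh-keygen -t rsa -b 4096 -N '' -f "$key" -q || return 1

  # The public key is what you paste into the UI or pass with --debug-key-file.
  cat "$key.pub"
}

# Example:
# create_debug_key "$HOME/.ssh"
```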
Do not include these keys in version control. Anybody who gains access to the contents of the my-debug-key file will have access to your execution, so use appropriate caution.
Start an execution
You can start a job either from the command line or from the web application.
Valohai CLI
You need valohai-cli version 0.17.0 or higher to run executions with debugging enabled from the CLI. Update your version by running the following on your own machine:
pip install --upgrade valohai-cli
Start a Valohai execution with the extra parameters --debug-key-file for your public key file and --debug-port for the port you have opened for debug connections.
vh exec run --adhoc --debug-key-file=/tmp/remote-debug-key.pub --debug-port 2222 train
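To avoid starting an execution with a missing key, or accidentally passing the private key instead of the public one, the command above can be wrapped in a small check. This is a sketch, assuming valohai-cli 0.17.0 or newer and a linked project with a step named train, as in the example; the helper name run_with_debug is hypothetical.

```shell
# Sketch (hypothetical helper): start an ad-hoc execution with SSH debugging,
# after checking that a public key file is being passed.
# Assumes valohai-cli >= 0.17.0 and a step named "train" as in the example above.
run_with_debug() {
  local pubkey="$1"          # e.g. /tmp/remote-debug-key.pub
  local port="${2:-2222}"    # must match the port opened in your firewall
  local step="${3:-train}"

  if [ ! -f "$pubkey" ]; then
    echo "Public key not found: $pubkey" >&2
    return 1
  fi
  # Only the public key (.pub) should ever leave your machine.
  case "$pubkey" in
    *.pub) ;;
    *) echo "Expected a .pub file, got: $pubkey" >&2; return 1 ;;
  esac

  vh exec run --adhoc --debug-key-file="$pubkey" --debug-port "$port" "$step"
}

# Example:
# run_with_debug /tmp/remote-debug-key.pub 2222 train
```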
Valohai GUI
Start a Valohai execution with the “Run with SSH” option enabled.
If you created the key pair yourself, copy-paste the entire contents of the my-debug-key.pub file into the text field. Alternatively, click the Generate new SSH key button and use the generated keys. Make sure to download and store the private key in a secure location. Never include the keys in your version control! Finally, change the TCP/IP port if your network setup requires it.
Wait for an IP address
You need to start the Valohai execution before you can connect to it. Valohai will either run the execution on an existing virtual machine or create a new instance. Each machine has its own IP address, allocated by the cloud provider (e.g. AWS, GCP, Azure). You’ll need this IP address in order to SSH into the execution.
Wait for the execution to start and watch the first log events. Look for a message like this one, which shows the IP address and port to connect to:
You can now add the path to your private key and connect:
ssh -i 52.214.159.193 -p 2222 -t /bin/bash
Open an SSH connection
Depending on your use case, you may want to do one of the following:
- Run a single remote command
- Open an interactive shell
- Open an SSH tunnel
Run a single command
This will execute the command and return the results to your terminal.
# template
ssh -i <private-key-path> <ip-address> -p <port> -t <command>
# example
ssh -i /home/johndoe/.ssh/my-debug-key 52.214.159.193 -p 2222 -t ps aux
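When you run several commands against the same execution, it can help to store the key path, IP address, and port once. This is a sketch; the variable names and the vh_ssh function are hypothetical, and the defaults are the example values from this page, to be replaced with your own key and the IP address from the execution log.

```shell
# Sketch (hypothetical helper): reuse the connection details for remote commands.
# Defaults are the example values above; override them for your execution.
VH_KEY="${VH_KEY:-$HOME/.ssh/my-debug-key}"
VH_IP="${VH_IP:-52.214.159.193}"
VH_PORT="${VH_PORT:-2222}"

vh_ssh() {
  # All arguments are joined and run as a command inside the execution's container.
  ssh -i "$VH_KEY" -p "$VH_PORT" -t "$VH_IP" "$@"
}

# Examples:
# vh_ssh ps aux
# vh_ssh df -h
```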
Open an interactive shell
Allows you to connect to the execution and run commands directly inside the Docker container that’s running your execution.
# template
ssh -i <private-key-path> <ip-address> -p <port> -t /bin/bash
# example
ssh -i /home/johndoe/.ssh/my-debug-key 52.214.159.193 -p 2222 -t /bin/bash
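If you open shells into the same execution repeatedly, an entry in your OpenSSH client configuration saves retyping the options. This is a sketch; the add_debug_host helper and the valohai-debug host alias are hypothetical, and since each execution may run on a machine with a different IP address, the entry needs updating for new executions.

```shell
# Sketch (hypothetical helper): append a Host alias for a debug execution to an
# OpenSSH client config file, so that `ssh -t valohai-debug /bin/bash` opens a shell.
add_debug_host() {
  local config="$1" ip="$2" port="$3" key="$4"
  local host_alias="${5:-valohai-debug}"

  # ssh ignores private keys that other users can read, which is common for
  # keys downloaded through a browser, so restrict the permissions first.
  chmod 600 "$key" || return 1

  mkdir -p "$(dirname "$config")" || return 1
  cat >> "$config" <<EOF

Host $host_alias
    HostName $ip
    Port $port
    IdentityFile $key
EOF
}

# Example:
# add_debug_host "$HOME/.ssh/config" 52.214.159.193 2222 "$HOME/.ssh/my-debug-key"
# ssh -t valohai-debug /bin/bash
```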
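Open an SSH tunnel
Forwards a port on your machine to a port inside the Docker container, for example to connect an IDE debugger or a web-based tool running in the execution.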
# template
ssh -i <private-key-path> <ip-address> -p <port> -t -L<local-port>:<remote-host>:<remote-port>
# example
ssh -i ~/.ssh/remote-debug-key 34.245.207.101 -p 2222 -t -L5678:127.0.0.1:5678
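A tunnel is also one way to approach the TensorBoard use case mentioned at the top. This is a sketch, assuming TensorBoard (or another service) is already running inside the container and listening on port 6006, TensorBoard's default; the vh_tunnel helper is hypothetical. The -N flag keeps the forwarding open without starting a remote shell.

```shell
# Sketch (hypothetical helper): forward a port from the execution's container
# to the same or a different port on your machine, without opening a shell.
vh_tunnel() {
  local key="$1" ip="$2" ssh_port="$3"
  local remote_port="${4:-6006}"          # port of the service in the container
  local local_port="${5:-$remote_port}"   # port to use on your machine

  # -N: no remote command, only port forwarding.
  # -L: 127.0.0.1 is resolved on the SSH server side, i.e. inside the container.
  ssh -i "$key" -p "$ssh_port" -N -L "${local_port}:127.0.0.1:${remote_port}" "$ip"
}

# Example: forward TensorBoard, then open http://localhost:6006 (Ctrl+C closes the tunnel).
# vh_tunnel ~/.ssh/my-debug-key 52.214.159.193 2222
```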
How to keep the execution running?
Your execution is designed to start, compute, and shut down, including when it runs into an error. When debugging, you usually want to keep the execution running even if it fails.
The safest way is to add a sleep command at the end of the execution.
python train.py {parameters}
sleep 1h
This way, the execution will wait for an hour and then shut down. It is better to set a reasonable time limit instead of an infinite uptime to avoid costly mistakes.
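A variation keeps the container alive only when the main command fails, and then exits with the original status so the execution is still reported as failed. This is a sketch, assuming you keep the function in a shell script in your repository that the step's commands source; the run_or_linger function and the LINGER variable are hypothetical, and the one-hour default mirrors the example above.

```shell
# Sketch (hypothetical helper): run a command, and if it fails, keep the
# container alive for a while before exiting with the command's own status.
run_or_linger() {
  local linger="${LINGER:-1h}"   # how long to stay up after a failure
  local status=0

  # `||` keeps a failure from stopping the shell even under `set -e`.
  "$@" || status=$?

  if [ "$status" -ne 0 ]; then
    echo "Command failed with status $status; keeping the container up for $linger" >&2
    sleep "$linger"
  fi
  return "$status"
}

# Example step command:
# run_or_linger python train.py {parameters}
```

Compared with an unconditional sleep, this avoids paying for idle time after successful runs, at the cost of no longer being able to inspect a run that succeeded.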
Limitations
It is essential to understand that the SSH connection does not go directly to the worker’s operating system.
It opens remote access to the Docker container running on that host operating system. This means that the Valohai platform internals and the rest of the host operating system are not available for inspection.