Jupyter notes

Jupyter notebook (and lab)

Remote instance and login

Sometimes it is useful to run a Jupyter instance on the computer cluster but open it from the local machine. This saves downloading the data onto the local machine and eating up space there, and it opens up the possibility of parallelised data processing (through Dask, for example). The steps below are general instructions; details will differ depending on firewall configuration and machine setup.

  1. With anaconda, miniconda or similar, make a virtual environment and install jupyter and other packages accordingly.

  2. SSH or similar onto the remote machine, and submit a job that activates that virtual environment, starts Jupyter on some specified port (normally 8888, but you can choose), and keeps it open. See below for a sample submit script (in my case the remote machine is called hpc3 and the environment is called py311).

  3. The job should be running on some node(s). Still on the remote machine, query the IP address of the master node via nslookup hhnode-ib-21 or dig +short hhnode-ib-21.local (replace hhnode-ib-21 with whatever the node name is). Also query the security token that gets generated for use later; it should be in the output (in my case it’s in stdouterr_${job_number}).

  4. Go back to the local machine, and open an SSH tunnel to that IP address with

ssh -N -f -L ${LOCAL_PORT}:${IP_ADDRESS}:${REMOTE_PORT} ${USER}@${CLUSTER}

So as an example, ssh -N -f -L 4167:10.1.2.126:4167 jclmak@hpc3.ust.hk means I bind the remote port 4167 at the IP address 10.1.2.126 to my local machine’s port 4167 using my appropriate credentials on the cluster.

Note

I use port 4167 because a local instance of Jupyter usually opens at 8888 by default, which would lead to a clash. I could have done ssh -N -f -L 4167:10.1.2.126:8888 or similar to avoid the clash, I suppose.
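The two port choices in the note above can be compared side by side with a small helper that only prints the tunnel command rather than running it. This is a minimal sketch: `tunnel_cmd` is a hypothetical name, and the IP, user and cluster are just the example values from above.

```shell
# Print (do not run) the SSH tunnel command so it can be checked first.
# Arguments: local port, node IP, remote port, user, cluster login host.
tunnel_cmd() {
    local local_port=$1 node_ip=$2 remote_port=$3 user=$4 cluster=$5
    echo "ssh -N -f -L ${local_port}:${node_ip}:${remote_port} ${user}@${cluster}"
}

# Same port on both ends, as in the example above
tunnel_cmd 4167 10.1.2.126 4167 jclmak hpc3.ust.hk

# Local 4167 forwarded to a remote Jupyter left on the default 8888
tunnel_cmd 4167 10.1.2.126 8888 jclmak hpc3.ust.hk
```

In the second form the browser still goes to http://localhost:4167, since only the local end of the tunnel matters on the laptop.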

  5. Use lsof -i :${LOCAL_PORT} to see if the port is already in use. If there are already processes using it, you might want to kill those to avoid clashing.

  6. Open a browser and enter http://localhost:4167 (or whatever you decided to substitute for ${LOCAL_PORT}). If all goes well then a Jupyter instance will open but ask you for a token; enter the token from above and you should have a Jupyter instance running on the remote machine but controlled from your local machine.
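The port check and the token lookup can be partly scripted. The sketch below assumes the job output contains a line with `token=<hex string>`, which is what Jupyter normally prints at startup; `jupyter_token`, `port_in_use` and `local_url` are hypothetical helper names and not part of Jupyter.

```shell
# Pull the first token out of a Jupyter log file
# (assumes lines of the form ...?token=<hex string>).
jupyter_token() {
    grep -oE 'token=[0-9a-f]+' "$1" | head -n 1 | cut -d= -f2
}

# Succeeds if something is already listening on the given local TCP port.
port_in_use() {
    lsof -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Build the browser URL with the token attached as a query parameter.
local_url() {
    echo "http://localhost:$1/?token=$2"
}
```

On the cluster, `jupyter_token stdouterr_${job_number}` prints the token; on the local machine, `port_in_use 4167 && echo "port 4167 is taken"` flags a clash before the tunnel is opened. Whether pasting the output of `local_url` into the browser skips the token prompt may depend on the setup.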

Note

The above works ok for me whether my laptop firewall is on or not. However, I seem to need to enter the token manually, while for my post-doc she could do something like http://hhnode-ib-21:8888/?token=WHATEVER and get on directly. Not sure what the deal is.

Note

Could probably do something similar to open a port for VS Code or similar. In that case, probably just open the port into the node, but don’t run the Jupyter notebook commands through the SLURM script.

Sample submit script for a SLURM system:

#!/bin/bash
#SBATCH -o stdouterr_%j # output and error file name
#SBATCH -n 1            # total number of mpi tasks requested
#SBATCH -N 1            # total number of nodes requested
#SBATCH -p hpc3oces-cpu # queue (partition) -- standard, development, etc.
#SBATCH -t 12:00:00     # maximum runtime

# Setup runtime environment if necessary
# module load anaconda3  # commented here because I use a separate miniconda manager

# make sure any active environment really is off first (by activating base),
# otherwise loading the target environment seems to fail here
python --version
source /home/jclmak/miniconda3/bin/activate
source /home/jclmak/miniconda3/bin/activate py311
python --version

### Set Tunneling information
# 1. Get the specific hostname
node=$(hostname)

# 2. Get the IP address directly from the node
# 'hostname -I' lists all IPs; awk takes the first one (usually the primary LAN IP)
node_ip=$(hostname -I | awk '{print $1}')

# 3. then get cluster stuff
user=$(whoami)
cluster="hpc3.ust.hk"
port=4167   # provide a specific port to avoid possible clashing

# Print tunneling instructions
echo -e "

# Command to create SSH tunnel:
ssh -N -f -L ${port}:${node_ip}:${port} ${user}@${cluster}

# Use a browser on your local machine to go to:
http://localhost:${port}/

"

jupyter-notebook --no-browser --ip=${node} --port=${port}

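# jupyter-notebook runs in the foreground, so it already keeps the job alive;
# this sleep is only reached if it exits, and holds the node a while longer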
sleep 36000
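Submitting the script and finding the node's IP address (steps 2 and 3) could be wrapped up roughly as follows. This is a sketch under assumptions: `jupyter_job.sh` is a hypothetical filename for the submit script above, `first_ipv4` and `locate_jupyter_node` are made-up helper names, and the node name is assumed to resolve with `dig +short` as described earlier.

```shell
# Pick the first IPv4 address out of resolver output read from stdin.
first_ipv4() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# Submit the job, wait for it to start, then print the node name and its IP.
# Needs SLURM (sbatch, squeue) and dig, so only run this on the cluster.
locate_jupyter_node() {
    local script=$1 jobid node
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}   # --parsable may append ";cluster_name"
    # %N prints the allocated node list; it is empty while the job is pending
    while node=$(squeue -j "$jobid" -h -o %N) && [ -z "$node" ]; do
        sleep 10
    done
    [ -n "$node" ] || return 1
    echo "job ${jobid} on ${node}: $(dig +short "$node" | first_ipv4)"
}
```

Calling `locate_jupyter_node jupyter_job.sh` on the login node then gives the IP address to use in the tunnel command. It is worth checking that address against nslookup, since a node can have several network interfaces.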