Python / Anaconda notes

At some point I ran into a problem with plotting data in MATLAB (the tripolar grid means the co-ordinate files are not monotonic, which MATLAB hated), and I went over to Python because the Cartopy and Iris packages let me do data projection and plotting in different projections fairly easily. Here are some notes for Python and Anaconda which may be useful (the latter might be useful for getting the libraries that NEMO and XIOS need).

Anaconda

Most of these are taken from the official conda manual. The installation for conda (or the lighter version, miniconda) is somewhat dependent on the OS, and the instructions are given in that manual. You end up downloading a bash file that you run in the terminal, and from there you can accept or change the settings as needed. No administrator rights should be required, though it does mean the installed packages may not be shareable with other users. The installation will ask if you want to add conda to your $PATH variable, which I accepted (it means that some of the Anaconda based binaries take precedence over the system ones).

Once conda is installed, I would recommend creating an environment, so that if any damage occurs it is confined to the environment, which may be deleted easily without touching anything else. Creating, entering and leaving the environment is done by:

>> julian@psyduck:~/$ conda create -n nemo python=3.6
...
>> julian@psyduck:~/$
>> julian@psyduck:~/$ source activate nemo
>> (nemo) julian@psyduck:~/$
>> (nemo) julian@psyduck:~/$ source deactivate
>> julian@psyduck:~/$

The first command creates an environment called nemo that uses Python 3.6, and the other commands enter and leave it (newer versions of conda, 4.4 onwards, use conda activate nemo and conda deactivate instead). An environment may be removed by issuing the command

conda remove --name nemo --all

Packages are installed through (make sure you are in an environment first)

conda install netcdf
conda install -c conda-forge netcdf-fortran

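Some packages are not in the default channel and need to be fetched from the conda-forge channel with the -c conda-forge flag, as in the second command above.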

Note that while the environment is active, binaries inside the environment take precedence over the system ones, so a bit of care is needed to make sure the command you intend to call really is the one being called (e.g. my mercurial command hg seems to be overridden on my machine when I am in my environment). Check with something like which python, which shows the binary that the command python actually calls.
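The same check can be done from inside Python, which is handy in a notebook. This is a minimal sketch using only the standard library; the helper name report_binaries and the list of commands are just for illustration.

```python
import shutil
import sys


def report_binaries(commands):
    """Map each command name to the first matching path on $PATH (or None)."""
    return {cmd: shutil.which(cmd) for cmd in commands}


# The interpreter actually running this code
print("interpreter:", sys.executable)

# The binaries a terminal would pick up for these commands
for cmd, path in report_binaries(["python", "hg", "conda"]).items():
    print(f"{cmd:8s} -> {path}")
```

If the environment is active, the paths printed should point inside the environment directory; a path of None means the command was not found at all.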

Python

I mostly develop code in a notebook because I am too heavily influenced by MATLAB. Notebooks (Jupyter in particular) let you write code in cells that you run and see the output then and there, which is what I am used to. Later on I do write code in a text editor when I have more specific things I need to do.

I normally do the following to get what I need. Within the environment:

conda install scipy
conda install numpy
conda install matplotlib
conda install jupyter
conda install -c conda-forge cartopy
conda install -c conda-forge iris
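Once these are installed, a quick sanity check is to ask Python whether it can find each package without actually importing it. This is a rough sketch; the helper name check_packages is made up for illustration, and the names in the list are the import names, which I am assuming match the conda package names here.

```python
from importlib.util import find_spec

# Import names of the packages installed above (assumed to match
# the conda package names)
packages = ["numpy", "scipy", "matplotlib", "cartopy", "iris"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


for name, found in check_packages(packages).items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Anything reported as MISSING probably means the install went into a different environment from the one that is currently active.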

I normally install NetCDF as well. Numpy and scipy give the number crunching stuff I normally need. Matplotlib gives most of the plotting capabilities. Cartopy and iris are the map and projection packages, and jupyter is the notebook stuff. To start the notebook, I normally do the following from a terminal

jupyter notebook 2>/dev/null &

just to suppress the terminal output. The notebook opens in a browser and you do the coding in there (I think there is other software that lets you open and edit notebooks elsewhere, though I have never used it); it is basically ipython but in a browser. Note that just closing the tab does not necessarily close the notebook; you need to do File > Close and Halt. Also, just because the relevant pages are closed in the browser does not mean the notebook server is shut down either; you need to click logout in the top right corner (assuming you are not using a custom theme which hides it). To kill it from the terminal, either find the job through jobs and use kill %n, or do

jupyter notebook list
>> Currently running servers:
>> http://localhost:8888/?token=7774a1ace4c2a0a1e098a5900f30c67310074a7250bd6c0d :: /home/julian/GitRepo/pydra/wrapper
>> http://localhost:8889/?token=00b793728b03e2536b5a07a793bbd2a9fc1342469f3cf28d :: /home/julian/Documents/NEMO

jupyter notebook stop 8888
jupyter notebook list
>> Currently running servers:
>> http://localhost:8889/?token=00b793728b03e2536b5a07a793bbd2a9fc1342469f3cf28d :: /home/julian/Documents/NEMO

Some Python banana skins

The big banana skin with Python to watch out for is that indexing starts at 0 (rather than 1 as in MATLAB), and a slice a:b excludes the end index b, so it returns b - a entries, e.g.

x_vec = [1, 2, 3, 4, 5, 6]
x_vec[0:-1]
>> [1, 2, 3, 4, 5]
x_vec[1:4]
>> [2, 3, 4]
x_vec[0::]
>> [1, 2, 3, 4, 5, 6]
x_vec[-1]
>> 6
x_vec[-2]
>> 5

Contrast this to MATLAB which would be

x_vec = [1, 2, 3, 4, 5, 6]
x_vec(1:end-1)
>> 1, 2, 3, 4, 5
x_vec(2:4)
>> 2, 3, 4
x_vec(:)
>> 1, 2, 3, 4, 5, 6
x_vec(end)
>> 6
x_vec(end - 1)
>> 5
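One way to keep the two conventions straight when porting code: a MATLAB range a:b (1-based, end included) becomes the Python slice a-1:b (0-based, end excluded). A small sketch of that rule follows; the helper matlab_range_to_slice is a made-up name for illustration, not something from a library.

```python
def matlab_range_to_slice(start, stop):
    """Convert a MATLAB-style 1-based inclusive range start:stop to a slice."""
    # Shift the start down by one (1-based -> 0-based); the stop is unchanged
    # because Python excludes the end index while MATLAB includes it.
    return slice(start - 1, stop)


x_vec = [1, 2, 3, 4, 5, 6]

print(x_vec[matlab_range_to_slice(2, 4)])  # MATLAB x_vec(2:4)     -> [2, 3, 4]
print(x_vec[matlab_range_to_slice(1, 5)])  # MATLAB x_vec(1:end-1) -> [1, 2, 3, 4, 5]
print(len(x_vec[1:4]) == 4 - 1)            # a slice a:b has b - a entries -> True
```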

Another banana skin with Python is that data is not necessarily copied when defining new variables; an assignment like y = x just gives a second name to the same object. For example:

x_vec = [1, 2, 3, 4, 5, 6]
y_vec = x_vec
y_vec[0] = 2
y_vec
>> [2, 2, 3, 4, 5, 6]
x_vec
>> [2, 2, 3, 4, 5, 6]

This is especially dangerous if you, like me, do the following in MATLAB (where each assignment gives an independent copy) to initialise several arrays at once:

x_vec = zeros(1, 6)
y_vec = x_vec
z_vec = x_vec

If you really mean to do a copy, do the following:

from copy import deepcopy
x_vec = [1, 2, 3, 4, 5, 6]
y_vec = x_vec
z_vec = deepcopy(x_vec)
y_vec[0] = 2
y_vec
>> [2, 2, 3, 4, 5, 6]
x_vec
>> [2, 2, 3, 4, 5, 6]
z_vec
>> [1, 2, 3, 4, 5, 6]
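For a flat list like the one above, a shallow copy such as list(x_vec) would also have been enough; deepcopy starts to matter once the list contains other mutable objects. A minimal sketch of the difference, using a list of lists (the variable names are just for illustration):

```python
from copy import deepcopy

grid = [[1, 2, 3], [4, 5, 6]]  # a list of lists
alias = grid                   # same object, no copy at all
shallow = list(grid)           # new outer list, but the rows are shared
deep = deepcopy(grid)          # the rows are copied as well

grid[0][0] = 99                # change an entry inside a row
grid.append([7, 8, 9])         # change the outer list

print(alias)    # [[99, 2, 3], [4, 5, 6], [7, 8, 9]]
print(shallow)  # [[99, 2, 3], [4, 5, 6]]
print(deep)     # [[1, 2, 3], [4, 5, 6]]
```

The alias sees every change, the shallow copy sees changes inside the shared rows but not the appended row, and the deep copy sees nothing.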

Python is really slow with loops, so the more vectorised commands you can use, the better! If you have routines that genuinely need loops (e.g. transforming data from Cartesian co-ordinates to density co-ordinates by binning into density bins), then consider using Cython (Python-like code with C type declarations that is compiled to C and called from Python), f2py (wraps Fortran routines so they can be called from Python), or numba/JIT (compiles the loops at run time, usually a speed up on the order of 200 times; restricted to fairly low level commands).
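Before reaching for those tools, it is worth checking whether the loop can be replaced by NumPy routines. As a toy illustration only (not the actual density binning code; the function names and the numbers are made up), here is a sum of values into density bins written once with explicit loops and once with np.digitize and np.bincount:

```python
import numpy as np


def bin_sum_loop(density, values, bin_edges):
    """Sum values into density bins with explicit Python loops (slow)."""
    n_bins = len(bin_edges) - 1
    out = np.zeros(n_bins)
    for rho, val in zip(density, values):
        for k in range(n_bins):
            # Bin k covers bin_edges[k] <= rho < bin_edges[k + 1]
            if bin_edges[k] <= rho < bin_edges[k + 1]:
                out[k] += val
                break
    return out


def bin_sum_vectorised(density, values, bin_edges):
    """Same binning using NumPy routines instead of loops."""
    n_bins = len(bin_edges) - 1
    # np.digitize returns i such that bin_edges[i-1] <= rho < bin_edges[i],
    # so subtracting 1 gives the 0-based bin index
    idx = np.digitize(density, bin_edges) - 1
    # Discard points that fall outside the range of the bins
    inside = (idx >= 0) & (idx < n_bins)
    return np.bincount(idx[inside], weights=values[inside], minlength=n_bins)


# Made-up sample data: five points, two bins, last point outside the bins
density = np.array([1026.1, 1026.4, 1027.2, 1027.9, 1028.5])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
bin_edges = np.array([1026.0, 1027.0, 1028.0])

print(bin_sum_loop(density, values, bin_edges))
print(bin_sum_vectorised(density, values, bin_edges))
```

Both versions should give the same answer (3 in the first bin and 7 in the second for this sample data); the vectorised one avoids the Python level loop entirely, which is where the time goes for large arrays.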