External server infrastructure

An AETROS server is a training job scheduler that you can run on any machine you want. You create a new server in AETROS Trainer and use the "server" command of aetros-cli to connect your machine to AETROS.

If you have your own server infrastructure, you can connect all your servers to AETROS Trainer using aetros-cli. You can also use this command to start a job scheduler on your local machine, so you can run hyperparameter optimizations without starting jobs manually in your console.

Your benefits

  1. Train on remote hardware
  2. Monitor server utilization (CPU, RAM, bandwidth, disk, processes, running jobs)
  3. Start jobs through the interface
  4. Start fully automated hyperparameter optimizations
  5. Run builds (continuous integration)

The server always checks out the newest commit of your Git repository and executes your defined Python script for each training job.

Step 1: Create server

Open AETROS Trainer, click "Servers" on the left-hand side, and click "CREATE SERVER" at the bottom left. Enter a unique name and click "CREATE". The newly created server is then visible, and the middle of the screen shows the command you need to run on your actual server to connect it to AETROS.

Make sure that aetros-cli and all libraries your script requires are installed on this server. See "How to install aetros-cli".

Step 2: Connect server

Make sure you have installed aetros-cli first.
It's important to keep the server key passed in --secure-key=SERVER_KEY private. After creating the server, execute the following command on your server.

aetros server --secure-key=SERVER_KEY
Connected to aetros.com as username/servername

Replace SERVER_KEY with the actual server key shown when you open the server in AETROS Trainer. Your server should now appear as online in AETROS Trainer.

Note that all training jobs run under the Python binary with which you started the server. You can change the Python interpreter by using /my/python2.7 -m aetros server --secure-key=SERVER_KEY.
You need to install on this server all software that your Python script requires to run correctly (e.g. Theano, TensorFlow, NumPy).
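Because jobs inherit the interpreter that started the server, it can help to confirm that this interpreter can import everything your script needs before connecting. The following is a minimal sketch only: check_modules is a hypothetical helper and not part of aetros-cli, and the module names you pass to it depend on your own script.

```shell
#!/bin/sh
# Sketch: verify that an interpreter can import the modules a training script needs.
# check_modules is a hypothetical helper, not part of aetros-cli.
check_modules() {
    python_bin="$1"
    shift
    missing=""
    for module in "$@"; do
        # Try the import; any failure (including a missing interpreter) counts as missing.
        if ! "$python_bin" -c "import $module" >/dev/null 2>&1; then
            missing="$missing $module"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

For example, check_modules /my/python2.7 numpy tensorflow prints "ok" only if both imports succeed, so you could chain it with && in front of the server command.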

Step 3: Daemonize (optional)

To connect your server to AETROS automatically on boot and to make sure the script restarts automatically when it crashes, you can use supervisord.
Here is a short introduction to installing and configuring supervisord with aetros-cli.

3.1 Installation

sudo apt-get install supervisor

3.2 Add aetros-cli to supervisor

First, find the full path of aetros-cli by running the following command:

which aetros

In this example the command prints /usr/local/bin/aetros. Use the full path printed on your machine in the supervisor configuration file.

[program:long_script]
command=/usr/local/bin/aetros server --secure-key=SERVER_KEY
autostart=true
autorestart=true
stderr_logfile=/var/log/aetros-server.err.log
stdout_logfile=/var/log/aetros-server.out.log

Replace SERVER_KEY with the actual server key shown when you open the server in AETROS Trainer. Make sure this key stays private.
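If you prefer to generate the configuration instead of typing it, the section can be produced by a small shell function. This is a sketch under stated assumptions: render_supervisor_conf is a hypothetical helper, and the section name and log paths are simply copied from the configuration above.

```shell
#!/bin/sh
# Sketch: print a supervisor program section for "aetros server".
# render_supervisor_conf is a hypothetical helper; the section name and
# log paths are copied from the configuration shown in this guide.
render_supervisor_conf() {
    aetros_path="$1"   # full path, e.g. the output of "which aetros"
    server_key="$2"    # server key from AETROS Trainer; keep it private
    cat <<EOF
[program:long_script]
command=$aetros_path server --secure-key=$server_key
autostart=true
autorestart=true
stderr_logfile=/var/log/aetros-server.err.log
stdout_logfile=/var/log/aetros-server.out.log
EOF
}
```

On a Debian or Ubuntu installation from apt, program sections are usually placed in a .conf file under /etc/supervisor/conf.d/ (an assumption about your setup; the file name is up to you). Since the file contains the server key, consider restricting its permissions so that only root can read it.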

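3.3 Refresh supervisor

Tell supervisor to reread all of its configuration files and then apply the changes. The reread command only reloads the files; update starts the newly added program.

supervisorctl reread
supervisorctl update

aetros server should now be started immediately. You can find more information in the article "How To Install and Manage Supervisor on Ubuntu and Debian VPS".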
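To confirm that the program actually came up, you can inspect the output of supervisorctl status. The helper below is a hypothetical sketch, not part of supervisor or aetros-cli; it assumes the usual status line layout of program name, state, and details.

```shell
#!/bin/sh
# Sketch: decide from one line of "supervisorctl status" output whether a program runs.
# is_running is a hypothetical helper; it assumes a "NAME  STATE  details" line layout.
is_running() {
    status_line="$1"
    # The second whitespace-separated field is the process state (e.g. RUNNING, FATAL).
    state=$(printf '%s\n' "$status_line" | awk '{print $2}')
    [ "$state" = "RUNNING" ]
}
```

For example, you could pass it the output of supervisorctl status long_script and, if it fails, look at /var/log/aetros-server.err.log for the reason.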