Deep learning experiment management platform
AETROS helps you to manage your experiments and
computing infrastructure with ease in one place.
Powered by Git and Docker.



Major features at a glance


Experiment

  • Monitor
    Monitor all running jobs (experiments) in real time: watch metrics such as loss and accuracy, hyperparameters, and any additional information you attach.
  • Track & Compare
    Every job is tracked in Git as its own commit history, which you can view, sort and export in our interface. You can compare all information from multiple jobs side by side.
  • Reproduce
    Because every job (experiment) is tracked in its own Git commit history, you get full transparency and reproducibility.

Model

  • Debug
    Each experiment gives you valuable insight into your model. Our interactive model debugger shows detailed information about every layer in real time (work in progress).
  • Optimise
    Our hyperparameter optimization searches for better hyperparameters fully automatically, distributed across multiple servers.
  • Design
    A deep neural network designer helps you build prototypes quickly and understand new architectures.

Cluster

  • Server Cluster
    Add your servers and build a cluster to monitor their hardware utilization and information in one interface. The cluster as a whole provides resources to jobs and assigns them automatically.
  • Job scheduler
    With your connected servers, you can quickly create new jobs (experiments) on different servers directly through our interface, without juggling dozens of open SSH terminals.
  • Docker container
    Every job can run in a Docker container, which makes the environment fully reproducible.

Collaboration

  • Organisation support
    If you work in an organisation with multiple team members, you can create an organisation account and collaborate on models and experiments more easily.
  • Real-time web application
    Because our main application, AETROS Trainer, is primarily an HTML5 web application, you see all jobs (experiments) and models in real time across multiple devices.
  • Email/Slack notification
    Get notified by email or Slack when an experiment fails or finishes.

Extra

  • Framework independent
    Our Python SDK works with any kind of Python script you want to monitor and analyse. It is not tied to a particular machine learning framework.
  • On-Premises installation
    The whole AETROS Trainer application, including its web interface, API and Git server, is available as Docker images, so all your data stays in your own infrastructure.
  • Full Git integration
    We store all experiment data in Git, along with all of its outputs. We also support external Git repositories, as you may be used to from CI/build tools.

Monitor your experiment

Supervise key performance indicators and custom metrics
in real time, whether your model was created in the
model designer or is your own custom model
(framework independent).

You also see all the information you need to replicate
the experiment: the hyperparameters used, the environment,
your own additional information, files, and so on.
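
As a rough illustration of what framework-independent metric tracking can look like, the sketch below records per-epoch values into named channels and serialises them together with the hyperparameters. The `MetricChannel` and `ExperimentLog` classes are hypothetical and are not part of the AETROS Python SDK; they only show the idea using the Python standard library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MetricChannel:
    """Hypothetical named series of (step, value) points, e.g. 'loss'."""
    name: str
    points: list = field(default_factory=list)

    def send(self, step, value):
        self.points.append((step, float(value)))


@dataclass
class ExperimentLog:
    """Hypothetical record of what is needed to replicate a run."""
    hyperparameters: dict
    channels: dict = field(default_factory=dict)

    def channel(self, name):
        # Create the channel on first use, then reuse it.
        return self.channels.setdefault(name, MetricChannel(name))

    def to_json(self):
        return json.dumps({
            "hyperparameters": self.hyperparameters,
            "channels": {n: c.points for n, c in self.channels.items()},
        }, sort_keys=True)


log = ExperimentLog(hyperparameters={"lr": 0.01, "batch_size": 32})
for epoch, loss in enumerate([0.9, 0.6, 0.4], start=1):
    # In a real training loop the loss would come from your framework.
    log.channel("loss").send(epoch, loss)
```

Because the log only depends on plain Python values, the same pattern applies regardless of which machine learning framework produces the numbers.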

Track all experiments

See all currently running training jobs (experiments), or browse the history to find out which
model and hyperparameters performed best. With channels, you can display as many metrics as you need as graphs.

Compare experiments

Compare every aspect of multiple experiments in great detail.
You can attach custom additional information to each job (e.g. which data source and split ratio you used)
and compare it side by side. Metrics as graphs, hyperparameters and uploaded files can be compared as well.
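
To make the idea of a side-by-side comparison concrete, here is a minimal sketch that lines up the hyperparameters and custom information of several jobs and keeps only the rows that differ. The function `compare_jobs`, the job names and the keys are assumptions made for illustration; this is not how the AETROS interface is implemented.

```python
def compare_jobs(jobs):
    """Return (job names, {key: [value per job]}) for keys whose values differ.

    `jobs` maps a job name to its dict of hyperparameters or custom
    information. Keys missing from a job show up as None.
    """
    names = list(jobs)
    keys = sorted({k for info in jobs.values() for k in info})
    table = {}
    for key in keys:
        row = [jobs[name].get(key) for name in names]
        if len({repr(v) for v in row}) > 1:  # keep only differing rows
            table[key] = row
    return names, table


# Hypothetical jobs with made-up values.
names, diff = compare_jobs({
    "job-1": {"lr": 0.01, "split_ratio": 0.8, "optimizer": "adam"},
    "job-2": {"lr": 0.001, "split_ratio": 0.8, "optimizer": "adam"},
})
for key, row in diff.items():
    print(key, dict(zip(names, row)))
```

Filtering out identical rows keeps the comparison focused on what actually changed between runs.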

Server Cluster

You can easily connect your servers and create a cluster that provides resources such as CPU, memory and GPUs. Each job defines the resources it needs, and AETROS finds the best free server automatically. Because all jobs are encapsulated in Docker containers, all assigned resources are guaranteed and GPUs are assigned exclusively. More information is available in the Server Cluster section of our docs.

# Connect any server with AETROS Trainer.
$ aetros server marcj/alpha
Server connected to aetros.com as marcj/alpha

# Run commands from your working machine on a
# connected server and monitor them in AETROS Trainer.
$ aetros run --server=marcj/alpha 'python my-script.py'
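
The text above says that each job declares its resource requirements and a free server is chosen automatically. How AETROS ranks servers is not documented here, so the following is only a sketch under one possible assumption: a best-fit rule that picks the server with the least spare capacity that still satisfies the job. The `Server` class, the `pick_server` function and the server name `marcj/beta` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    free_cpu: int
    free_memory_gb: int
    free_gpus: int


def pick_server(servers, cpu, memory_gb, gpus):
    """Return the fitting server with the least spare capacity, or None."""
    fitting = [
        s for s in servers
        if s.free_cpu >= cpu
        and s.free_memory_gb >= memory_gb
        and s.free_gpus >= gpus
    ]
    if not fitting:
        return None
    # Best fit (an assumption): prefer the tightest match so that larger
    # servers stay available for larger jobs.
    return min(
        fitting,
        key=lambda s: (
            s.free_gpus - gpus,
            s.free_cpu - cpu,
            s.free_memory_gb - memory_gb,
        ),
    )


servers = [
    Server("marcj/alpha", free_cpu=8, free_memory_gb=32, free_gpus=2),
    Server("marcj/beta", free_cpu=4, free_memory_gb=16, free_gpus=1),
]
chosen = pick_server(servers, cpu=2, memory_gb=8, gpus=1)
```

Other policies, such as choosing the least loaded server, would only change the `key` function.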

Automatic hyperparameter optimization

With our fully automated and scalable hyperparameter optimization based on Hyperopt, you can find better hyperparameters for your model more easily and quickly. More information is available in our documentation.
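
For readers new to the idea, the sketch below shows the basic loop behind automated hyperparameter search: sample a configuration, evaluate it, keep the best. It deliberately uses plain random search with the standard library rather than the algorithm used by the platform, and the `random_search` and `toy_objective` names, the search space and the objective are all made up for illustration.

```python
import random


def random_search(objective, space, trials=50, seed=0):
    """Sample `trials` configurations and return the best (lowest loss) one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(trials):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in space.items()
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(params):
    # Stand-in for "train a model and return its validation loss".
    return (params["lr"] - 0.1) ** 2 + (params["dropout"] - 0.5) ** 2


best, loss = random_search(
    toy_objective,
    {"lr": (0.0, 1.0), "dropout": (0.0, 1.0)},
)
```

Each trial is independent of the others, which is why this kind of search is straightforward to distribute across several servers.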

Are you ready?

Improve your current machine learning workflow.

Register now for free