Version 2017.4 released

2. September 2017 - Releases

Today we are happy to announce another very big release of AETROS Trainer and our platform, with a completely overhauled storage engine and an improved collaboration user interface.

Changes at a glance

Over the last months we received a lot of feedback. One of the most frequent requests came from companies and private organisations that want to run their own AETROS Trainer on their own server infrastructure. Teams that work together on machine learning models often asked for a feature called "organisations": a shared space for models, datasets and jobs, as you probably know it from software like GitHub. We also now have power users who create large numbers of models and jobs each week. We found a solution for all of these wishes and are proud to present our work.

Another big topic for us is making all your data more accessible to you. We therefore moved our whole model and job storage engine to Git, using an open file structure instead of our internal MySQL database. This means all your model code and job information (metrics, used code, results, images, logs, etc.) now lives in a Git repository that you can easily clone and modify. This not only helps our application scale, it also keeps your experiment data and results in your hands at all times, available offline and usable with the Git client of your choice.

Read more ...

Road to version 2017.4

13. June 2017 - Releases

Right after our latest release, 2017.3, we want to tell you about the features coming in the next release.

On-Premise solution

We received a lot of feedback on AETROS Trainer. One of the most frequent requests was to offer the software on-premise, which is especially important for bigger companies that want to keep everything in their own network. This was already on our list, but we are now moving it to top priority and scheduling the on-premise version for July 2017.

Community Datasets

As you know, you can already browse and use community models on our platform. We want to extend that by providing datasets and leaderboards for publicly available datasets as well. Knowing how well your algorithm compares to others is always interesting, and it indicates whether you are onto something big or failing hard. For beginners it is also very interesting to see which algorithm currently performs best on a certain dataset. We want to make the datasets you can already manage in AETROS Trainer available through our website, each with a leaderboard where everyone can submit a tagged job of a model for this dataset to show their results publicly.

The next release is scheduled for end of July 2017.

Version 2017.3 released

12. June 2017 - Releases

Today we are happy to announce one of our biggest releases of AETROS Trainer and our platform, with tons of cool new features for all data scientists and machine learning engineers out there!

Changes at a glance

With this update we made tracking and organizing experiments easier than ever. You can now give an experiment a description that explains what makes it unique. This gives you a better overview and helps you remember faster what was special about an experiment. If you need to reproduce results on different machines, the environment feature gives you a complete overview of the system variables and library versions that were used. Additionally, you can upload your source files to AETROS and attach them to the experiment, so you can track source code changes and directly see the changes between versions in our new compare view.

Read more ...

Version 2017.3: Experiment comparison

12. June 2017 - Feature

Have you ever wondered what you changed in your source code, dataset or hyperparameters to get a particular result? Or why your new experiment gets much worse results one week later? Well, you could use Git to version everything and use a Git interface like GitHub to track changes (or use Excel). However, since this is cumbersome and we want to have everything in one place, we built an experiment comparison view that shows all aspects of your experiments side by side: progress, hyperparameters, additional information, metrics and even unified diffs of your source code. With this new feature you can compare multiple experiments side by side and instantly see the differences, which helps you, for example, to find the cause of a change in your model's performance much faster.
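To make the idea of such a comparison concrete, here is a minimal sketch using only Python's standard library. It is not how the AETROS compare view is implemented; the function names, the example hyperparameters and the file name train.py are assumptions made for illustration.

```python
import difflib


def diff_hyperparameters(old, new):
    """Return {name: (old_value, new_value)} for every hyperparameter that differs."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


def diff_source(old_source, new_source, name="train.py"):
    """Return a unified diff of two versions of a source file as one string."""
    lines = difflib.unified_diff(
        old_source.splitlines(),
        new_source.splitlines(),
        fromfile="a/" + name,
        tofile="b/" + name,
        lineterm="",
    )
    return "\n".join(lines)


# Two hypothetical experiments (made-up values, for illustration only).
old_params = {"learning_rate": 0.01, "batch_size": 32}
new_params = {"learning_rate": 0.001, "batch_size": 32, "dropout": 0.5}
old_code = "model.add(Dense(64))\nmodel.compile('sgd')\n"
new_code = "model.add(Dense(128))\nmodel.compile('sgd')\n"

changed = diff_hyperparameters(old_params, new_params)
source_diff = diff_source(old_code, new_code)

print(changed)
print(source_diff)
```

The hyperparameter diff lists only the values that changed between the two runs, and the unified diff marks removed source lines with "-" and added lines with "+".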

Read more ...

Version 2017.3: Experiment notifications

12. June 2017 - Feature

We often spend our time watching the progress of a specific job, and yes, sometimes this is super exciting. On other days we just want to know when a specific job is done so we can get the results. And sometimes we start a long-running experiment, 20 minutes later the RAM is full and the experiment crashes, but we only notice it the next morning.

Therefore, we implemented notifications. You can now decide which models should trigger a notification by watching them. We send an email or Slack message when an experiment has finished or crashed.

Read more ...

Better job tracking

15. May 2017 - Feature

When working with long-running experiments, it is very important to get a feeling for the overall computation performance, so you can plan with an ETA. Also, every time you change your architecture or training data you may run into performance penalties. Since computation is expensive and you usually don't want to wait weeks for a training job, you need to keep an eye on those stats. We have now improved the progress tracking of training jobs even further. You can see not only epochs (using job.progress(current, total)) but also batch progress (using the new method job.batch(current, total, [size])), which acts as a sub-progress of the regular epoch/iteration tracking. As before, we automatically calculate an ETA for you, and also samples/s if you pass size to job.batch.
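The arithmetic behind an ETA and a samples/s figure can be sketched as follows. This is a local stand-in class, not the AETROS SDK: only the method names progress and batch and their parameters are taken from the text above. The class name ProgressTracker, the helper methods, and the assumptions that current counts completed epochs, that size is the number of samples in one batch, and that batch is called once per batch are ours.

```python
import time


class ProgressTracker:
    """Hypothetical stand-in that mimics job.progress / job.batch locally."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, handy for testing
        self._start = clock()
        self._epoch = (0, 0)         # (completed epochs, total epochs)
        self._batch = (0, 0)         # (completed batches, batches per epoch)
        self._samples = 0

    def progress(self, current, total):
        """Record epoch progress and reset the batch sub-progress."""
        self._epoch = (current, total)
        self._batch = (0, 0)

    def batch(self, current, total, size=None):
        """Record batch progress; size is the sample count of this batch."""
        self._batch = (current, total)
        if size is not None:
            self._samples += size

    def fraction_done(self):
        epoch, epochs = self._epoch
        batch, batches = self._batch
        if epochs == 0:
            return 0.0
        sub = batch / batches if batches else 0.0
        return min((epoch + sub) / epochs, 1.0)

    def eta_seconds(self):
        """Remaining time, extrapolated linearly from the elapsed time."""
        done = self.fraction_done()
        if done <= 0:
            return None
        elapsed = self._clock() - self._start
        return elapsed * (1 - done) / done

    def samples_per_second(self):
        elapsed = self._clock() - self._start
        return self._samples / elapsed if elapsed > 0 else None


# Simulated run with a fake clock: 2 epochs of 10 batches, 2 s per batch.
now = [0.0]
tracker = ProgressTracker(clock=lambda: now[0])
tracker.progress(0, 2)
for batch_index in range(1, 6):
    now[0] += 2.0
    tracker.batch(batch_index, 10, size=32)

print(tracker.fraction_done(), tracker.eta_seconds(), tracker.samples_per_second())
```

After 5 of 10 batches in the first of 2 epochs, a quarter of the work is done, so the remaining time is estimated as three times the elapsed time.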

You can now also get a better overview of the hyperparameters used, and of additional information that you can freely set using job.set_info(key, value) to enrich the job. We also added a loss tracking graph that indicates whether your model overfits or underfits.

Automatic hyperparameter optimization: Easier than ever

24. February 2017 - Feature

We just published one of our newest and biggest features: model optimizations. This allows you to automatically find hyperparameters based on the KPI of any Python model, no matter which framework you use. The basic idea:

  1. You define hyperparameters (for example learning_rate)
  2. You define their spaces (for example learning_rate from 0.005 to 0.5)
  3. We start the training script of your model several times with hyperparameters within the given space
  4. In your model you send us your KPI (accuracy, for example)
  5. We determine which hyperparameters performed well and which didn't
  6. We calculate further hyperparameters and automatically start as many training runs as you want (optionally in parallel, distributed across multiple servers)
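The steps above can be sketched with plain random search, the simplest of the three algorithms listed under Features. This is an illustration only, not the AETROS optimizer: the function names are hypothetical, and train is a toy placeholder that stands in for a real training script reporting its KPI. Note that random search does not use earlier results to pick the next candidates, so step 6 is reduced here to drawing more independent samples.

```python
import random


def sample_hyperparameters(space, rng):
    """Draw one value per hyperparameter, uniformly within its (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


def train(params):
    """Toy placeholder for a training script; returns the KPI (higher is better).

    The made-up objective peaks at learning_rate = 0.05.
    """
    return 1.0 - abs(params["learning_rate"] - 0.05)


def random_search(space, objective, runs, seed=0):
    """Run `objective` on `runs` random samples and return results, best first."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        params = sample_hyperparameters(space, rng)
        results.append((objective(params), params))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Steps 1 and 2: one hyperparameter and its space, as in the example above.
space = {"learning_rate": (0.005, 0.5)}

# Steps 3 to 5: run the training several times and rank the runs by KPI.
results = random_search(space, train, runs=20)
best_kpi, best_params = results[0]

print(best_kpi, best_params)
```

TPE and annealing differ from this sketch in that they use the KPIs of finished runs to decide where in the space to sample next.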

In our user interface, AETROS Trainer, you can start as many optimizations as you want, watch their progress, adjust hyperparameter spaces, and compare or export the results.

Features

  • Automatic search of better hyperparameters
  • Three different optimization algorithms (Random, TPE, Annealing)
  • Automatic training job distribution across multiple servers
  • Very detailed and convenient hyperparameter space definition through interface
  • Runtime constraints like max epochs and max time
  • Watch the process, results and metrics in real-time in AETROS Trainer
  • Completely based on Git
  • All results can be exported as CSV

Full documentation

See our main documentation: Automatic hyperparameter optimization.

Improved hyperparameters

23. February 2017 - Feature

You can now define hyperparameters in an even more detailed way. You can choose between seven types: String, Number, Boolean, Group (dict), Choice: String (selectbox), Choice: Number (selectbox), Choice: Group (select group). And of course, you can overwrite those hyperparameters per job.
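As a rough sketch of what typed hyperparameters with per-job overrides can look like in code, consider the following. The dictionary layout, the type labels and the resolve function are assumptions for illustration and do not reflect the actual AETROS configuration format.

```python
import copy

# One hypothetical definition per type named above, each with a default value.
MODEL_HYPERPARAMETERS = {
    "optimizer_name": {"type": "string", "default": "sgd"},
    "learning_rate": {"type": "number", "default": 0.01},
    "use_dropout": {"type": "boolean", "default": True},
    "augmentation": {"type": "group", "default": {"flip": True, "rotate": 15}},
    "activation": {"type": "choice_string", "choices": ["relu", "tanh"], "default": "relu"},
    "batch_size": {"type": "choice_number", "choices": [16, 32, 64], "default": 32},
    "optimizer": {
        "type": "choice_group",
        "choices": {"sgd": {"momentum": 0.9}, "adam": {"beta_1": 0.9}},
        "default": "sgd",
    },
}


def resolve(definitions, overrides=None):
    """Merge per-job overrides into the model defaults and validate choices."""
    overrides = overrides or {}
    unknown = set(overrides) - set(definitions)
    if unknown:
        raise KeyError("unknown hyperparameters: %s" % sorted(unknown))
    resolved = {}
    for name, spec in definitions.items():
        value = overrides.get(name, spec["default"])
        # For choice types the value must be one of the allowed options.
        if "choices" in spec and value not in spec["choices"]:
            raise ValueError("%r is not a valid choice for %s" % (value, name))
        resolved[name] = copy.deepcopy(value)
    return resolved


# A job that overrides two values and keeps the model defaults for the rest.
job_params = resolve(MODEL_HYPERPARAMETERS, {"learning_rate": 0.001, "batch_size": 64})
print(job_params)
```

Keeping the defaults on the model and only the differences on the job makes it easy to see what was special about a particular run.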

Read more ...

Better jobs browser

20. February 2017 - Feature

With our improved jobs browser you can now create your own job categories, export jobs as CSV and see continuous integration builds. If you hover over a particular job, you now additionally see all hyperparameters used and any custom information. Thanks to pagination, you can now browse hundreds or thousands of jobs without performance issues.

Read more ...

New feature: External servers / job scheduler

25. January 2017 - Feature

You can now connect external servers to AETROS using aetros-cli, so you can start and monitor training jobs across multiple external servers with just one command. This makes it super easy to distribute your training jobs across multiple servers without using ssh and starting each job manually.

More documentation about this feature can be found at Server Management.

New Website, new version and won NVIDIA contest

25. January 2017 - Company

We are excited to announce that we won the NVIDIA Inception "cool demo" contest! The prize is a brand new NVIDIA Pascal Titan X, which we will definitely use to train a lot of hot new deep learning models. Thank you very much, NVIDIA!

Read more ...