Today we are happy to announce another very big release of AETROS Trainer and our platform, with a completely overhauled storage engine and an improved collaboration user interface.
Over the last months we received tons of feedback. One of the biggest requests was that companies and private organisations want to run their own AETROS Trainer in their own server infrastructure. Teams that work together on machine learning models frequently ask for a feature called "organisations", where you have a shared space of models, datasets and jobs, like you probably already know from software like GitHub.
Also, we now have power users creating tons of models and jobs each week. We found a solution for all of your wishes and are proud to present our work.
Another big topic for us is making all your data more accessible to you. We therefore moved our whole model and job storage engine to Git, using an open file structure instead of our internal MySQL database. This means all your model code and job information (metrics, used code, results, images, logs, etc.) now lives in a Git repository you can easily clone and modify. This is not only better for scaling our application, but also keeps your experiment data and results in your hands at all times, making them accessible offline and usable with your Git client of choice.
Directly after our latest release, 2017.3, we want to inform you about the features coming in the next release.
We received tons of feedback on AETROS Trainer. One of the biggest requests was to provide the software on-premise, which is obviously important for bigger companies that want to keep everything in their own network. We already had this on our list, but are now moving it to top priority and scheduling the on-premise version for July 2017.
As you know, you can already browse and use community models on our platform. We want to extend that by providing datasets and leaderboards for publicly available datasets as well. Knowing how well your algorithm compares to others is always interesting, and it indicates whether you are onto something big or failing hard. Also, for beginners it's super interesting to see which algorithm currently performs best on a certain dataset. We want to make the datasets you can already manage in AETROS Trainer available through our website, each with a leaderboard where everyone can submit a tagged job of a model for this dataset to show their results publicly.
The next release is scheduled for the end of July 2017.
Today we are happy to announce one of our biggest releases of AETROS Trainer and our platform, with tons of cool new features for all data scientists and machine learning engineers out there!
With this update we made tracking and organizing experiments easier than ever. You can now give an experiment a description that explains what is unique about it. This gives a better overview and helps you remember faster what was special about an experiment. If you need to reproduce results on different machines, the environment feature gives you a complete overview of the system variables and library versions that were used. Additionally, you can upload your source files to AETROS and attach them to the experiment, so you can track source code changes and directly see changes between versions using our new compare view.
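To illustrate the kind of information an environment record can hold, here is a minimal sketch in plain Python. It is not the AETROS implementation; the function name `environment_snapshot` and the chosen fields are assumptions made for this example, and it only uses the standard library.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Collect interpreter, OS and package versions for the given package names.

    Hypothetical helper for illustration only, not part of AETROS.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names here are only examples.
    print(json.dumps(environment_snapshot(["numpy", "pip"]), indent=2))
```

Storing such a snapshot next to each experiment makes it possible to spot a changed library version when results differ between machines.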
Have you ever wondered what you changed in your source code, dataset or hyperparameters to get a particular result? Or why, one week later, your new experiment gets way worse results? Well, you could use Git to version everything and use a Git interface like GitHub to track changes (or use Excel). However, since this is cumbersome and we want to have everything in one place, we built an experiment comparison view that shows all aspects of your experiments side by side: progress, hyperparameters, additional information, metrics and even unified diffs of your source code. With this new feature you can compare multiple experiments and instantly see the differences, which helps you, for example, find the cause of a change in your models' performance much faster.
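For readers unfamiliar with unified diffs, the following sketch shows how one can be produced for two versions of a source file with Python's standard `difflib` module. It is only an illustration of the diff format; the helper `source_diff`, the file names and the sample sources are made up for this example and say nothing about how the comparison view is implemented.

```python
import difflib


def source_diff(old_source, new_source,
                old_name="experiment_a/train.py",
                new_name="experiment_b/train.py"):
    """Return a unified diff between two versions of a source file as one string."""
    old_lines = old_source.splitlines(keepends=True)
    new_lines = new_source.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_name, tofile=new_name)
    )


# Two hypothetical versions of the same training script.
old = "learning_rate = 0.01\nepochs = 10\n"
new = "learning_rate = 0.05\nepochs = 10\n"

diff_text = source_diff(old, new)
print(diff_text)
```

Lines prefixed with `-` exist only in the first experiment, lines prefixed with `+` only in the second, which is what makes a changed hyperparameter easy to spot.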
We often spend our time watching the progress of a specific job, and yes, sometimes this is super exciting, but on other days we just want to know when a specific job is done so we can get the results. Sometimes we start a long-running experiment and 20 minutes later the RAM is full and the experiment has crashed, but we only see this the next morning.
Therefore, we implemented the notification function. You can now decide which model should trigger a notification by watching the model. We send an email or Slack message when an experiment has finished or crashed.
When working with long-running experiments, it's very important to get a feeling for the overall computation performance, so you can estimate an ETA. Also, every time you change your architecture or training data you may run into performance penalties. Since computation is expensive and you usually don't want to wait weeks for a training job, you need to keep an eye on those stats. We have improved the overall progress tracking of a training job even further. You can not only see epochs (using job.progress(current, total)) but also the batch progress (using the new method job.batch(current, total, [size])), which acts as a sub-progress of the regular epoch/iteration tracking. As before, we automatically calculate an ETA for you, and samples/s if you pass
You can now also get a better overview of the used hyperparameters and of additional information, which you can freely set using job.set_info(key, value) to enrich the job. Also, we added a loss tracking graph that indicates whether your model overfits or underfits.
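How an ETA and a samples/s figure can be derived from epoch and batch progress is sketched below. This is a self-contained stand-in, not the AETROS SDK: the class `ProgressTracker` and its internals are hypothetical, and only the method names `progress` and `batch` mirror the calls mentioned above. The sketch assumes that `progress(current, total)` reports the number of completed epochs and that `batch(current, total, size)` reports the number of completed batches in the current epoch.

```python
import time


class ProgressTracker:
    """Hypothetical stand-in (not the AETROS SDK) deriving ETA and samples/s."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, useful for testing
        self._start = clock()
        self.epoch = 0               # completed epochs
        self.total_epochs = 0
        self.batch_index = 0         # completed batches in the current epoch
        self.total_batches = 0
        self.samples_seen = 0

    def progress(self, current, total):
        """Record epoch progress."""
        self.epoch, self.total_epochs = current, total

    def batch(self, current, total, size=None):
        """Record batch progress; size is the number of samples in the batch."""
        self.batch_index, self.total_batches = current, total
        if size is not None:
            self.samples_seen += size

    def samples_per_second(self):
        elapsed = self._clock() - self._start
        return self.samples_seen / elapsed if elapsed > 0 else 0.0

    def eta_seconds(self):
        """Estimate remaining seconds from the average time per finished batch."""
        total_steps = self.total_epochs * self.total_batches
        done_steps = self.epoch * self.total_batches + self.batch_index
        if done_steps == 0 or total_steps == 0:
            return None  # nothing finished yet, no estimate possible
        elapsed = self._clock() - self._start
        return elapsed / done_steps * (total_steps - done_steps)


# Usage with a fake clock so the numbers are reproducible.
now = [0.0]
tracker = ProgressTracker(clock=lambda: now[0])
tracker.progress(0, 2)                 # 0 of 2 epochs finished
for i in range(1, 6):
    tracker.batch(i, 10, size=32)      # 5 of 10 batches, 32 samples each
now[0] = 5.0                           # pretend 5 seconds have passed
print(tracker.samples_per_second(), tracker.eta_seconds())
```

The estimate is a simple linear extrapolation, so it only stabilises once a few batches have completed at a steady rate.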
We just published one of our newest and biggest features: model optimizations. This allows you to automatically find hyperparameters based on the KPI of any Python model, no matter which framework you use. Basic idea:
learning_rate from 0.005 to 0.5)
In our user interface AETROS Trainer you can start as many optimizations as you want, watch their progress, adjust hyperparameter spaces, and compare or export the results.
See our main documentation: Automatic hyperparameter optimization.
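The general idea of searching a hyperparameter space for the best KPI can be sketched with a plain random search. The source does not say which search algorithm AETROS uses, so random search is a swapped-in technique here, and `random_search` and `toy_kpi` are hypothetical names. The toy KPI stands in for a real training run, and only the learning_rate range of 0.005 to 0.5 is taken from the text above.

```python
import math
import random


def random_search(objective, low, high, trials=20, seed=0):
    """Sample values log-uniformly from [low, high] and keep the best KPI."""
    rng = random.Random(seed)
    best_value, best_kpi = None, -math.inf
    for _ in range(trials):
        # Log-uniform sampling spreads trials evenly across orders of magnitude.
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
        kpi = objective(value)
        if kpi > best_kpi:
            best_value, best_kpi = value, kpi
    return best_value, best_kpi


def toy_kpi(learning_rate):
    """Stand-in for a real training run: peaks at a learning rate of 0.05."""
    return 1.0 - abs(math.log10(learning_rate) - math.log10(0.05))


best_lr, best_kpi = random_search(toy_kpi, 0.005, 0.5)
print(f"best learning_rate={best_lr:.4f}, KPI={best_kpi:.3f}")
```

In a real setup the objective would train the model with the sampled value and return the KPI reported by the job.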
You can now define hyperparameters in an even more detailed way. You can choose between seven types: String, Number, Boolean, Group (dict), Choice: String (selectbox), Choice: Number (selectbox), Choice: Group (select group). And of course, you can overwrite those hyperparameters per job.
With our improved jobs browser you can now create your own job categories, export jobs as CSV and see continuous integration builds. If you hover with your mouse over a particular job, you now additionally see all used hyperparameters and custom information. Thanks to the new pagination you can browse hundreds or thousands of jobs without performance issues.
You can now connect external servers with AETROS using aetros-cli, so you can start and monitor training jobs across multiple external servers with just one command.
This makes it super easy to distribute your training jobs across multiple servers without using ssh and starting each job manually.
More documentation about this feature can be found at External server infrastructure.
We are excited to announce that we won the NVIDIA Inception "cool demo" contest! The prize is a brand new NVIDIA Pascal Titan X, which we will definitely use to train a lot of hot new deep learning models. Thank you very much, NVIDIA!