Manage multiple NVIDIA GPU compute tasks
ML/DL research often requires running many compute tasks under different configurations: hyperparameter searches, ablation studies, multiple seeds for deep reinforcement learning, and more. Various frameworks and software packages (e.g. Ray, Hadoop, Spark) address these needs, but their complexity comes with a learning curve.
nvidia-gpu-scheduler is a simple Python utility, built on Python's multiprocessing package, that enforces per-GPU compute limits (number of processes, utilization rate, memory usage) on a per-(UNIX)user or per-worker basis and works across multiple nodes (machines) over the network.
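To illustrate the underlying idea (this is a minimal sketch, not the package's actual API), the snippet below checks per-GPU limits with the NVML Python bindings (pynvml) and dispatches jobs with multiprocessing. The function names and limit values here are illustrative assumptions.

```python
# Sketch only: pick a GPU whose process count, utilization, and memory usage
# are all below user-defined limits, then launch a job on it in a subprocess.
# Assumes the pynvml bindings are installed (e.g. `pip install nvidia-ml-py`).
import os
import multiprocessing as mp

import pynvml


def gpu_is_available(index, max_procs=2, max_util=80, max_mem_frac=0.8):
    """Return True if the GPU at `index` is below all configured limits."""
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    return (len(procs) < max_procs
            and util < max_util
            and mem.used / mem.total < max_mem_frac)


def run_job(gpu_index, job_fn, *args):
    """Pin the child process to one GPU, then run the job."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    job_fn(*args)


def dispatch(job_fn, job_args):
    """Launch each job on the first GPU that satisfies the limits."""
    pynvml.nvmlInit()
    try:
        n_gpus = pynvml.nvmlDeviceGetCount()
        workers = []
        for args in job_args:
            gpu = next((i for i in range(n_gpus) if gpu_is_available(i)), None)
            if gpu is None:
                print(f"No eligible GPU for job {args}; skipping")
                continue
            p = mp.Process(target=run_job, args=(gpu, job_fn) + tuple(args))
            p.start()
            workers.append(p)
        for p in workers:
            p.join()
    finally:
        pynvml.nvmlShutdown()
```

The actual package adds scheduling, queuing, and multi-node coordination on top of this basic pattern; see the project documentation for its real interface.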
For more information, follow these instructions.