Abstract
Time sharing, which allows multiple users to use a shared resource, is a fundamental aspect of modern computing systems. However, accelerators such as GPUs, which come without a native operating system, do not support time sharing. This inability limits their applicability, especially as they are increasingly deployed in Platform-as-a-Service and Resource-as-a-Service environments. In the former, elastic demands may require preemption, whereas in the latter, fine-grained economic models of service cost can be supported with time sharing. In this paper, we extend the concept of time sharing to the GPGPU computational space using a cooperative multitasking approach. Our technique is applicable to any GPGPU program written using the Compute Unified Device Architecture (CUDA) API for the C/C++ programming languages. With minimal support from the programmer, our framework incorporates process scheduling, lightweight memory management, and multi-GPU support. Our framework provides an abstraction in which, in round-robin fashion, every workload gains exclusive use of one or more GPUs for a time quantum. We demonstrate the applicability of our scheduling framework by running many workloads concurrently in a time-sharing manner.