Deep learning is a rapidly growing field of machine learning and has proven successful in many domains, including computer vision, language translation, and speech recognition. The training of deep neural networks is resource intensive, requiring compute accelerators such as GPUs as well as large amounts of storage, memory, and network bandwidth. Additionally, preparing the training data requires extensive tooling for data cleansing, data merging, ambiguity resolution, etc. Sophisticated middleware abstractions are needed to schedule resources, manage the distributed training job, and visualize how well the training is progressing. Likewise, serving large neural network models under low-latency constraints can require middleware to manage model caching, selection, and refinement.
All the major cloud providers, including Amazon, Google, IBM, and Microsoft, have started in the last year or so to offer cloud services to train and/or serve deep neural network models. In addition, there is a lot of activity in open source middleware for deep learning, including TensorFlow, Theano, Caffe2, PyTorch, and MXNet. There are also efforts to extend existing platforms such as Spark for deep learning workloads.
This workshop focuses on the tools, frameworks, and algorithms to support executing deep learning algorithms in a distributed environment. As new hardware and accelerators become available, the middleware and systems need to be able to exploit their capabilities and ensure they are utilized efficiently.
Papers on topics including, but not limited to, the following are welcome: