We have a portfolio of ambitious research projects with high impact potential, both academically and in practice. To advance the boundaries of scientific knowledge, practicality, and usability in a harmonious way, we pursue close collaborations with people from academia and industry.
Zico: Enhancing GPU utilization in DNN clusters
Gangmuk Lim(UNIST), Myeongjae Jeon(UNIST), Wencong Xiao(Alibaba), Youngjin Kwon(KAIST), Jeongseob Ahn(Ajou U)
The GPU is the most widely used accelerator for deep learning training. Recent high-accuracy deep learning models are highly resource intensive and require many expensive GPUs. Yet these costly GPUs are still not used efficiently and often report low utilization. We are improving GPU utilization through system software support, both within a single server and across a GPU cluster.
Accelerating DNN data loading using programmable hardware
Chanho Park(UNIST), Taeyoon Kim(UNIST), Myeongjae Jeon(UNIST), Kyuho Lee(UNIST), Jaeyoung Do(Microsoft Research), Changdae Kim(ETRI)
We envision that the DNN data loading phase will soon require a breakthrough in performance. To build robust models, data loading relies on a complex pre-processing pipeline and a number of data augmentation steps whose processing costs have grown significantly. CPUs alone cannot keep data loading efficient because the CPU-to-GPU/NPU ratio in AI servers continues to fall. To overcome this, we are creating an accelerator specialized for DNN data loading and customizing it for important use cases in shared AI clusters.
On-device intelligence! Live long and prosper
Soobee Lee(UNIST), Taeyoon Kim(UNIST), Myeongjae Jeon(UNIST), Atul Sandur(UIUC), Di Wang(Microsoft Research), Minjia Zhang(Microsoft Research)
“The computation power has reached another milestone where computing capability is going to be everywhere. And we are going to have intelligence everywhere around us.” – Lidong Zhou, Assistant Managing Director of Microsoft Research Asia.
For “machine learning everywhere”, we will need mechanisms and policies for running both inference and continuous model training efficiently on extremely resource-constrained, energy-limited edge devices. In this project, we build a runtime system that takes into account the limited resources of edge devices and schedules these tasks on them very differently from how server systems schedule them. In particular, we are refactoring existing execution pipelines for inference and training computational graphs to avoid wasteful resource usage.
Jarvis: Runtime for large-scale datacenter telemetry systems
Chanho Park(UNIST), Myeongjae Jeon(UNIST), Atul Sandur(UIUC), Gul Agha(UIUC), Stavros Volos(Microsoft Research)
Datacenter telemetry systems collect monitoring data from datacenter nodes into a central cluster and process it for real-time troubleshooting, performance fingerprinting, resource scheduling, etc. This centralized approach does not scale, since it entails transporting monitoring data from more than 100K nodes over a network shared with customer applications. To meet such scaling requirements, we leverage spare compute resources available on the monitored nodes to pre-process data and reduce network traffic. We are currently evaluating new query rewriting and resource scheduling techniques that promptly harness spare resources at each individual node while automatically minimizing interference with the different types of workloads co-located on that node.
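The idea of pre-processing on the monitored nodes can be illustrated with a minimal sketch. It is not the Jarvis runtime: the `Summary` class and the `summarize_on_node` and `merge_at_center` functions are hypothetical names introduced here, and the sketch assumes the telemetry query only needs count, mean, minimum, and maximum per metric. Each node collapses its raw samples into one small mergeable summary per metric, so only the summaries cross the network.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Mergeable per-metric summary (hypothetical structure for illustration)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        # Fold one raw sample into the summary.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Summary") -> None:
        # Combine two summaries without access to the raw samples.
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)


def summarize_on_node(samples):
    """Runs on a monitored node: collapse raw (metric, value) samples."""
    summaries = {}
    for metric, value in samples:
        summaries.setdefault(metric, Summary()).add(value)
    return summaries


def merge_at_center(per_node_summaries):
    """Runs on the central cluster: merge the summaries sent by each node."""
    merged = {}
    for summaries in per_node_summaries:
        for metric, summary in summaries.items():
            merged.setdefault(metric, Summary()).merge(summary)
    return merged


# Made-up samples from two nodes, for illustration only.
node_a = [("cpu", 10), ("cpu", 30), ("disk", 5)]
node_b = [("cpu", 20), ("disk", 15), ("disk", 25)]

merged = merge_at_center([summarize_on_node(node_a), summarize_on_node(node_b)])
cpu = merged["cpu"]
print(cpu.count, cpu.total / cpu.count, cpu.minimum, cpu.maximum)
```

Because the summaries merge exactly, the central cluster obtains the same aggregates it would have computed from the raw samples, while each node ships one record per metric instead of one per sample. Deciding how much of a query to run on a node, given its spare resources, is the harder problem the project targets and is not modeled here.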
AOMG: Approximate quantiles for datacenter telemetry monitoring
Gangmuk Lim(UNIST), Myeongjae Jeon(UNIST), Stavros Volos(Microsoft Research), Mohamed Hassan(Oracle)
Quantile computation on real-time numerical data streams in datacenters is challenging. It must achieve high throughput and low latency on massive data streams, and it must also produce quantiles with small relative error in value, since these quantile estimates summarize the typical and abnormal behavior of the monitored system. We address these challenges by designing a new quantile approximation algorithm that improves performance while keeping value errors small across a wide range of quantiles by taking into account the underlying distribution density of the data.
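To make "small relative error in value" concrete, here is a minimal sketch of a different, well-known technique: a logarithmic-bucket quantile sketch. It is not the AOMG algorithm and does not use distribution density. The class name `LogBucketSketch`, the parameter `alpha`, and the synthetic log-normal data are assumptions made for illustration. Each positive value falls into a bucket whose bounds differ by a fixed ratio, so any value reported for a bucket lies within a relative error of about `alpha` of every value stored in it.

```python
import math
import random
from collections import Counter


class LogBucketSketch:
    """Quantile sketch with relative value error of about alpha (positive values only)."""

    def __init__(self, alpha: float = 0.01):
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)  # ratio between bucket bounds
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()                # bucket index -> number of values
        self.count = 0

    def add(self, x: float) -> None:
        if x <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(x) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q: float) -> float:
        # Walk buckets in order until the cumulative count passes the target rank.
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value that balances the error at both bucket bounds.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise ValueError("sketch is empty")


# Synthetic latency-like data, for illustration only.
rng = random.Random(0)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

sketch = LogBucketSketch(alpha=0.01)
for x in data:
    sketch.add(x)

ordered = sorted(data)
for q in (0.5, 0.99, 0.999):
    exact = ordered[math.floor(q * (len(ordered) - 1))]
    approx = sketch.quantile(q)
    print(q, exact, approx, abs(approx - exact) / exact)
```

The sketch stores one counter per occupied bucket rather than every sample, which is what makes streaming computation cheap. Measuring error relative to the value, rather than to the rank, matters most in the tail, where neighboring ranks can correspond to very different values.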
We have been doing “Systems for new HW” research in the following areas, some of which are now carried out under “Systems + AI” or “Big data analytics”. Whenever we find promising HW that can benefit an important workload, we build a system for it:
– Big data processing on high bandwidth hybrid memory
– Big data processing on manycore systems
– Compilation and resource manager for AI HW
– Cluster manager for large-scale GPU cluster