We have a portfolio of ambitious research projects with high impact potential, both in academia and in practice. To advance the boundaries of scientific knowledge, practicality, and usability together, we pursue close collaborations with people from academia and industry.
Zico: Enhancing GPU utilization in DNN clusters
UNIST & Alibaba & KAIST & Ajou University
The GPU is the most widely used accelerator for deep learning training. Recent high-accuracy deep learning models are highly resource-intensive, requiring many expensive GPUs. Yet these costly GPUs are often used inefficiently, reporting low utilization. We are improving GPU utilization through system software support, both within a single server and across a GPU cluster.
Accelerating DNN data loading using programmable hardware
UNIST & Microsoft Research & ETRI
We envision that the DNN data-loading phase will soon require a performance breakthrough. To build robust models, training relies on a complex data pre-processing pipeline and numerous data augmentation steps whose processing costs have grown significantly. Since the CPU-to-GPU/NPU ratio in AI servers continues to fall, CPU resources must be used effectively to keep data loading efficient. To this end, we are speeding up DNN data loading on the CPU and customizing it for important use cases in multi-GPU AI clusters.
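The pipeline structure described above can be sketched as follows. This is an illustrative toy (not the project's actual code): `decode`, `normalize`, and `random_flip` are hypothetical stand-ins for real decode and augmentation operators, and a thread pool models the scarce CPU cores that must feed the accelerators.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def decode(raw):
    # stand-in for an expensive decode step (e.g. image decoding)
    return [float(b) for b in raw]

def normalize(sample):
    # scale each sample into [0, 1]
    m = max(sample) or 1.0
    return [v / m for v in sample]

def random_flip(sample, rng):
    # cheap augmentation: reverse the sample with probability 0.5
    return sample[::-1] if rng.random() < 0.5 else sample

def preprocess(raw, rng):
    # the per-sample pre-processing + augmentation chain
    return random_flip(normalize(decode(raw)), rng)

def load_batch(raw_samples, workers=4, seed=0):
    # CPU-side parallelism: with few CPU cores per GPU/NPU, this pool
    # is exactly the resource the text argues must be used efficiently
    rng = random.Random(seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda raw: preprocess(raw, rng), raw_samples))

batch = load_batch([bytes([1, 2, 3]), bytes([4, 5, 6])])
```

As the augmentation chain grows, the per-sample cost of `preprocess` grows with it, which is why the shrinking CPU share per accelerator becomes the bottleneck.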
We are building a novel continual learning system that redesigns the existing episodic memory architecture. Existing continual learning methods try to overcome catastrophic forgetting under resource constraints by using an episodic memory of limited size, but they end up suffering performance degradation due to this bounded training setup. Based on this insight, we leverage the memory hierarchy to build a continual learning system that optimizes both system efficiency and model accuracy.
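One way to picture "leveraging the memory hierarchy" is a two-tier episodic memory. The sketch below is a hypothetical design for illustration, not the project's actual system: a small fast tier models scarce device memory, and a larger slow tier models host memory or SSD; evictions from the fast tier fall into a reservoir-sampled slow tier, so old tasks stay represented beyond the fast tier's capacity.

```python
import random
from collections import deque

class TieredEpisodicMemory:
    def __init__(self, fast_cap, slow_cap, seed=0):
        self.fast = deque(maxlen=fast_cap)  # recent samples, FIFO
        self.slow = []                      # uniform reservoir over evictions
        self.slow_cap = slow_cap
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        if len(self.fast) == self.fast.maxlen:
            self._spill(self.fast[0])       # oldest sample is about to be evicted
        self.fast.append(sample)

    def _spill(self, sample):
        # classic reservoir sampling keeps a uniform subset of evicted samples
        self.seen += 1
        if len(self.slow) < self.slow_cap:
            self.slow.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.slow_cap:
                self.slow[j] = sample

    def sample_batch(self, k):
        # replay batches draw from both tiers
        pool = list(self.fast) + self.slow
        return self.rng.sample(pool, min(k, len(pool)))

mem = TieredEpisodicMemory(fast_cap=4, slow_cap=8)
for i in range(100):
    mem.add(i)
```

The point of the sketch is the trade-off: a single bounded buffer forgets old tasks, while a second, cheaper tier retains a uniform trace of history at some extra access cost.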
Deep learning applications require ever more computing resources. Many studies and products have been released to address this problem, but such approaches are limited to high-performance computing clusters housed in data centers. We research platform-based deep learning training that spans both data centers and commodity workstations, allowing anybody to participate in and benefit from the platform's resources.
Jarvis: Runtime for large-scale datacenter telemetry systems
UNIST & UIUC & Microsoft Research
Datacenter telemetry systems collect monitoring data from datacenter nodes to a central cluster, which processes it for real-time troubleshooting, performance fingerprinting, resource scheduling, etc. This centralized approach is not scalable, since it entails transporting monitoring data from more than 100K nodes over a network shared with customer applications. To meet such scaling requirements, we leverage spare compute resources available on the monitored nodes to pre-process data and reduce network traffic. We are currently evaluating new query re-writing and resource scheduling techniques that promptly harness spare resources on each individual node while automatically minimizing interference with the different types of workloads colocated on that node.
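The query re-writing idea can be illustrated with a minimal sketch (hypothetical operators, not Jarvis's actual implementation): an aggregation such as a mean is split into a per-node partial (sum, count) computed with spare local CPU and a tiny central merge, cutting network traffic from one message per sample to one small tuple per node.

```python
def node_partial(samples):
    # runs on the monitored node, using spare local CPU cycles
    return (sum(samples), len(samples))

def central_merge(partials):
    # runs on the central cluster; merges one small tuple per node
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# hypothetical per-node metric streams
node_data = {
    "node-a": [1.0, 2.0, 3.0],
    "node-b": [4.0, 5.0],
}
partials = [node_partial(v) for v in node_data.values()]
mean = central_merge(partials)  # same answer as a centralized mean, far fewer bytes
```

Re-writing is only safe for aggregations that decompose this way (sum, count, min, max, and combinations of them); the scheduling side of the problem is deciding when a node has enough spare capacity to run `node_partial` without disturbing colocated customer workloads.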
AOMG: Approximate quantiles for datacenter telemetry monitoring
UNIST & Microsoft Research & Oracle & Facebook
Quantile computation on real-time numerical data streams in a datacenter is challenging: it must sustain high throughput and low latency on massive data streams while producing quantiles with small relative error in value, since these estimates summarize the typical and abnormal behavior of the monitored system. We address these challenges with a new quantile approximation algorithm that improves performance while offering small value errors across a wide range of quantiles, by taking the underlying distribution density of the data into account.
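To make "small relative error in value" concrete, here is a generic log-bucketed histogram sketch (an illustration of the problem setting, not the AOMG algorithm itself): values are counted in geometrically sized buckets, so any reported quantile lands in the right bucket and is off by at most a fixed relative factor set by the bucket growth rate.

```python
import math

class LogHistogram:
    def __init__(self, rel_err=0.01):
        # bucket boundaries grow geometrically by gamma, which bounds
        # the relative value error of any reported quantile by rel_err
        self.gamma = (1 + rel_err) / (1 - rel_err)
        self.buckets = {}  # bucket index -> count
        self.count = 0

    def insert(self, value):
        # positive values only, for brevity
        idx = math.ceil(math.log(value, self.gamma))
        self.buckets[idx] = self.buckets.get(idx, 0) + 1
        self.count += 1

    def quantile(self, q):
        # walk buckets in value order until the target rank is covered
        rank = q * (self.count - 1)
        seen = 0
        for idx in sorted(self.buckets):
            seen += self.buckets[idx]
            if seen > rank:
                # representative value of bucket (gamma^(idx-1), gamma^idx]
                return 2 * self.gamma ** idx / (self.gamma + 1)
        raise ValueError("empty histogram")

h = LogHistogram(rel_err=0.01)
for v in range(1, 10001):
    h.insert(float(v))
p99 = h.quantile(0.99)  # within ~1% of the exact value, 9900
```

Per-value relative error is what matters for telemetry: a 1% error on a 10 ms latency and on a 10 s latency are both acceptable, whereas a rank-error guarantee can misreport tail values badly when the distribution is skewed.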
We have been doing "Systems for new HW" research in the following areas, and some of this work is now carried out under "Systems + AI" or "Big data analytics". Whenever we find promising HW that can benefit an important workload, we build systems for it:
– Big data processing on high bandwidth hybrid memory
– Big data processing on manycore systems
– Compilation and resource manager for AI HW
– Cluster manager for large-scale GPU cluster