Key research themes
1. How can communication-efficient strategies mitigate bandwidth and latency constraints in distributed and federated learning systems?
This research area develops algorithms and frameworks that reduce communication overhead in distributed learning settings such as federated learning, where limited bandwidth and communication latency are critical bottlenecks. Communication-efficient protocols compress, sparsify, or coordinate the information exchanged so that model performance is maintained while the amount and frequency of data transmitted between clients and servers or peers is minimized. Reducing communication in this way is crucial for scalability, privacy preservation, and operational feasibility in heterogeneous and resource-constrained environments.
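As one illustration of the sparsification idea, the following minimal sketch keeps only the k largest-magnitude entries of a client update and carries the dropped remainder forward as a residual (often called error feedback). The function names `top_k_sparsify`, `compress_with_error_feedback`, and `densify` are hypothetical and chosen for this example; this is one common compression scheme, not a method attributed to any specific work in this area.

```python
from typing import Dict, List, Tuple


def top_k_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep the k largest-magnitude entries of an update as {index: value}."""
    if k <= 0:
        return {}
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in ranked[:k]}


def compress_with_error_feedback(
    update: List[float], residual: List[float], k: int
) -> Tuple[Dict[int, float], List[float]]:
    """Add the residual from earlier rounds, sparsify, and return the new residual.

    Entries that are not transmitted this round are not discarded; they are
    accumulated locally and get another chance to be sent in a later round.
    """
    corrected = [u + r for u, r in zip(update, residual)]
    sparse = top_k_sparsify(corrected, k)
    new_residual = [0.0 if i in sparse else c for i, c in enumerate(corrected)]
    return sparse, new_residual


def densify(sparse: Dict[int, float], dim: int) -> List[float]:
    """Server side: expand a sparse message back into a dense vector."""
    dense = [0.0] * dim
    for i, v in sparse.items():
        dense[i] = v
    return dense
```

A client would send only the `sparse` dictionary (k index-value pairs instead of the full vector) and keep `new_residual` locally for the next round.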
2. What algorithmic frameworks and optimization techniques enable scalable, provably convergent distributed training of nonconvex models such as neural networks?
Training neural networks in distributed environments inherently involves solving nonconvex optimization problems with data partitioned across agents connected by possibly dynamic network topologies. Designing algorithms that guarantee convergence to stationary points, handle nonconvexity, and efficiently utilize computation and communication resources is nontrivial. Research in this area develops frameworks based on successive convex approximation, primal convexification, dynamic consensus, and gradient coding to provide scalable, general solutions with convergence guarantees and support for parallelism and robustness to stragglers.
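To make the role of dynamic consensus concrete, here is a minimal sketch of gradient tracking, a decentralized scheme in which each agent mixes its iterate with its neighbors' iterates and maintains a running estimate of the network-average gradient. Several assumptions are made for illustration: the network is a fixed ring, each agent holds a scalar variable, and the local objectives in the usage note are simple quadratics rather than neural-network losses. The names `ring_mixing_matrix`, `mix`, and `gradient_tracking` are hypothetical, and the sketch is a simpler stand-in for, not an implementation of, the successive convex approximation frameworks mentioned above.

```python
from typing import Callable, List


def ring_mixing_matrix(n: int) -> List[List[float]]:
    """Doubly stochastic weights on a ring: 1/2 on self, 1/4 on each neighbor."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def mix(W: List[List[float]], v: List[float]) -> List[float]:
    """One consensus step: each agent takes a weighted average over neighbors."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def gradient_tracking(
    grads: List[Callable[[float], float]],
    x0: List[float],
    W: List[List[float]],
    step: float,
    iters: int,
) -> List[float]:
    """Run gradient tracking; grads[i] is the gradient of agent i's local loss."""
    n = len(x0)
    x = list(x0)
    g = [grads[i](x[i]) for i in range(n)]
    y = list(g)  # y[i] tracks the average gradient across the network
    for _ in range(iters):
        # Consensus on the iterates, then a step along the tracked gradient.
        x_new = [m - step * yi for m, yi in zip(mix(W, x), y)]
        g_new = [grads[i](x_new[i]) for i in range(n)]
        # Dynamic consensus: mix the trackers and add the local gradient change.
        y = [m + gn - go for m, gn, go in zip(mix(W, y), g_new, g)]
        x, g = x_new, g_new
    return x
```

For example, with four agents whose local losses are 0.5 * (x - a_i)^2 for a = [1.0, 2.0, 3.0, 6.0], calling `gradient_tracking([lambda x, a=a: x - a for a in [1.0, 2.0, 3.0, 6.0]], [0.0] * 4, ring_mixing_matrix(4), 0.1, 500)` drives every agent toward the minimizer of the summed loss, which is the average of the a_i. On nonconvex losses the same recursion can at best be expected to reach a stationary point, and the step size would need to be chosen accordingly.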
3. How can distributed learning effectively handle data heterogeneity, non-IID data, and decentralized data partitions in federated and collaborative learning?
Data heterogeneity, where clients' local datasets are not independent and identically distributed (non-IID), poses significant challenges to federated and collaborative learning. Approaches addressing vertical (feature-partitioned) and horizontal (sample-partitioned) data distributions focus on enabling collaborative training without sharing raw data or model parameters, preserving privacy, and ensuring effective convergence. Research targets include clustering clients by data similarity, methods that perform multiple local updates between communication rounds, multi-agent systems for dynamic node management, and ensemble architectures for continual learning on nonstationary data streams. The shared goal is personalized and robust global models in realistic distributed environments.
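The multiple-local-update idea can be sketched with federated averaging on a toy horizontally partitioned problem. The assumptions here are deliberate simplifications: a one-parameter linear model y = w * x, noise-free labels, and clients whose feature ranges differ (a mild form of non-IID data). Note that this particular scheme does exchange model parameters with a server; only the raw data stays local. The names `local_update` and `fedavg` are hypothetical.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature x, label y) pairs held by one client


def local_update(w: float, data: Dataset, lr: float, local_steps: int) -> float:
    """Run several full-batch gradient steps on one client's squared error."""
    for _ in range(local_steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(
    clients: List[Dataset], w0: float, lr: float, local_steps: int, rounds: int
) -> float:
    """Each round: broadcast w, train locally, average weighted by dataset size."""
    w = w0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_models = [local_update(w, d, lr, local_steps) for d in clients]
        w = sum(len(d) * wk for d, wk in zip(clients, local_models)) / total
    return w
```

With two clients holding `[(1.0, 2.0), (2.0, 4.0)]` and `[(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]`, a call such as `fedavg(clients, 0.0, 0.01, 5, 50)` approaches the shared slope of 2. When clients have different local optima, as with label skew, more local steps tend to pull each client toward its own optimum, which is the drift that clustering and personalization methods aim to counter.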