

Institute of Communications and Connected Systems


Optics for Distributed Learning


1 October 2020



Developing large-scale distributed learning computing systems using new topologies, partitioning and scheduling strategies over optical networks.


Funder: Microsoft
Amount: £73,000


Research topics: Optics for distributed learning | Reconfigurable topologies | Neural network computational graphs | Distributed learning partitioning algorithm | Gradient reduce strategies


Description

Data centres have historically been built around a server-centric approach, with fixed amounts of processor and directly attached memory resources within the boundary of a mainboard tray. The mismatch between these fixed proportions and a diverse set of workloads can leave resources substantially under-utilized (in some cases below 40%), while those same resources account for 85% of the total data centre cost.
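To make the stranding effect concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than data from the project: the server shape (32 cores, 128 GB), the two job shapes, and the first-fit placement policy are all assumptions. Because no two of the chosen jobs fit inside one fixed-shape server together, each job occupies a whole server and leaves the rest of its cores or memory idle.

```python
from dataclasses import dataclass

# Hypothetical server shape (assumption): every mainboard tray has the
# same fixed 32 cores and 128 GB of directly attached memory.
SERVER_CORES = 32
SERVER_MEM_GB = 128


@dataclass(frozen=True)
class Job:
    cores: int
    mem_gb: int


def pack_server_centric(jobs):
    """First-fit placement: a job must fit entirely inside one server.

    Returns a list of [free_cores, free_mem_gb] entries, one per server used.
    """
    servers = []
    for job in jobs:
        for free in servers:
            if free[0] >= job.cores and free[1] >= job.mem_gb:
                free[0] -= job.cores
                free[1] -= job.mem_gb
                break
        else:  # no existing server has room, so start a new one
            servers.append([SERVER_CORES - job.cores, SERVER_MEM_GB - job.mem_gb])
    return servers


def utilization(jobs, n_servers):
    """Fraction of provisioned cores and memory actually used by the jobs."""
    used_cores = sum(j.cores for j in jobs)
    used_mem = sum(j.mem_gb for j in jobs)
    return (used_cores / (n_servers * SERVER_CORES),
            used_mem / (n_servers * SERVER_MEM_GB))


# Hypothetical mix: compute-heavy and memory-heavy jobs whose shapes do not
# complement each other, so no two jobs can share a server.
jobs = [Job(cores=24, mem_gb=32)] * 4 + [Job(cores=4, mem_gb=112)] * 4
servers = pack_server_centric(jobs)
cpu, mem = utilization(jobs, len(servers))
print(len(servers), f"{cpu:.2%}", f"{mem:.2%}")  # 8 43.75% 56.25%
```

In this toy case the eight jobs need only 112 cores and 576 GB in aggregate, yet the server-centric layout provisions 256 cores and 1,024 GB. Drawing on disaggregated pools would in principle close that gap, though this sketch ignores the network capacity, distance and memory-bandwidth penalties that disaggregation introduces.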

The project aims to explore resource (xPU, memory, storage) disaggregation at rack and cluster level and to identify its scalability limits, both in terms of the number of end-points, network capacity per CPU/memory and physical distance, and in terms of the associated penalties to processing power, sustained memory bandwidth, etc. In particular, the project will focus on optical network technologies, topologies, strategies and control to support large-scale distributed learning using heterogeneous resources (xPUs), in order to minimize the training time of diverse distributed learning models.

Outputs

View Principal Investigator's Publications