Title: Random Neural Networks at Finite Width and Large Depth
Speaker: Boris Hanin (Princeton)
Abstract: Deep neural networks are often considered to be complicated “black boxes,” for which a full systematic analysis is not merely out of reach in practice but impossible in principle. In this talk, which is based in part on joint work with Sho Yaida and Daniel Adam Roberts, I will make the opposite claim: namely, that deep neural networks with random weights and biases (i.e. networks at the start of training) are solvable models. Our approach applies to networks at finite width n and large depth L, the regime in which they are used in practice. A key point will be the emergence of a notion of “criticality,” which involves a fine-tuning of model parameters (the weight and bias variances). At criticality, neural networks are particularly well-behaved but still exhibit a tension between large values of n and L: large values of n tend to make neural networks more like Gaussian processes, while large values of L amplify higher cumulants. Our analysis at initialization has a number of consequences for networks during and after training, which I will discuss if time permits.
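As a numerical illustration of this tension (a minimal sketch of my own, not code from the talk), the Python snippet below draws finite-width ReLU networks at initialization, using the standard ReLU critical tuning (weight variance 2/fan-in, zero biases; this specific tuning is an assumption here), and estimates the excess kurtosis of a final-layer preactivation across independent network draws. This fourth-cumulant diagnostic vanishes for a Gaussian, so in the picture described above it should shrink toward 0 as the width n grows at fixed depth and grow as the depth L grows at fixed width.

import numpy as np

def final_preactivations(n, L, num_nets=4000, seed=0):
    """Sample one final-layer preactivation from num_nets independent
    random ReLU networks of width n and depth L at criticality
    (weights ~ N(0, 2/n), zero biases -- an assumed tuning)."""
    rng = np.random.default_rng(seed)
    h = np.full((num_nets, n), 1.0 / np.sqrt(n))  # fixed unit-norm input
    for _ in range(L):
        W = rng.normal(0.0, np.sqrt(2.0 / n), size=(num_nets, n, n))
        z = np.einsum("bij,bj->bi", W, h)  # batched preactivations
        h = np.maximum(z, 0.0)             # ReLU
    return z[:, 0]  # one component of the last layer's preactivation

def excess_kurtosis(z):
    """Zero for a Gaussian; a proxy for the fourth cumulant."""
    z = z - z.mean()
    return (z**4).mean() / (z**2).mean() ** 2 - 3.0

# Larger L at fixed n should amplify non-Gaussianity; larger n suppresses it.
for n in (16, 64):
    for L in (2, 8, 32):
        k = excess_kurtosis(final_preactivations(n, L))
        print(f"n={n:3d}, L={L:3d}: excess kurtosis ~ {k:.2f}")

The estimate is noisy at 4000 draws per setting (increase num_nets for smoother numbers), but the qualitative dependence on the depth-to-width ratio L/n should be visible in the printout.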
Seminar URL: https://u.osu.edu/probability/autumn-2021/