ML seminars: Why does standard gradient descent work so well in practice?
Welcome to the seminar series on machine learning organized by IfI, UiO.
Title: Why does standard gradient descent work so well in practice?
Speaker: Tuyen Trung Truong, Math, UiO
Abstract: Standard gradient descent is a popular optimisation method
used in many fields, including Deep Learning. Since Cauchy introduced it in
1847, many of its properties have been discovered, but many more remain to
be found. In particular, it has long been debated why this simple method
works so efficiently (most of the time it converges to a minimum point).
In this talk, I will present a brief overview of the current status of
gradient descent methods, in theory and also in practice in Deep
Learning, in particular my recent joint work (available at: arXiv:
1808.05160) with Tuan Hang Nguyen (AXON AI Research). In this work, we
prove that for most functions (including all Morse functions), the
backtracking variant of gradient descent either converges to a single
critical point or diverges to infinity, and we also illustrate how it
can be used very efficiently in Deep Learning (in particular, it helps to
avoid the practice of manual fine-tuning of learning rates). This result
can be used to provide a heuristic answer to the question in the title.
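
For readers unfamiliar with the backtracking variant mentioned in the abstract, the following is a minimal sketch of gradient descent with a textbook Armijo backtracking rule. It is an illustration only and is not taken from the paper: the function name backtracking_gd and the parameters delta0, alpha, beta, tol and max_iter are assumptions chosen for this example. At each step the learning rate starts at delta0 and is shrunk by the factor beta until the Armijo condition f(x - delta*g) <= f(x) - alpha*delta*||g||^2 holds, so no learning rate has to be tuned by hand.

```python
from typing import Callable, List

Vector = List[float]


def backtracking_gd(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    delta0: float = 1.0,   # initial trial learning rate (assumed value)
    alpha: float = 0.5,    # Armijo sufficient-decrease constant (assumed value)
    beta: float = 0.5,     # shrink factor for the learning rate (assumed value)
    tol: float = 1e-8,     # stop when the gradient norm falls below this
    max_iter: int = 10_000,
) -> Vector:
    """Gradient descent with Armijo backtracking (illustrative sketch)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq < tol * tol:
            break  # gradient is (numerically) zero: x is near a critical point

        fx = f(x)
        delta = delta0
        x_new = [xi - delta * gi for xi, gi in zip(x, g)]
        # Shrink delta until the Armijo condition
        #   f(x - delta*g) <= f(x) - alpha * delta * ||g||^2
        # is satisfied.
        while f(x_new) > fx - alpha * delta * g_norm_sq:
            delta *= beta
            x_new = [xi - delta * gi for xi, gi in zip(x, g)]
        x = x_new
    return x


if __name__ == "__main__":
    # Hypothetical test problem: a poorly scaled quadratic with minimum at (1, -2).
    def f(x: Vector) -> float:
        return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

    def grad(x: Vector) -> Vector:
        return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

    print(backtracking_gd(f, grad, [0.0, 0.0]))
```

On this example a fixed learning rate of 1.0 would diverge along the second coordinate, whereas the backtracking rule reduces the step automatically at each iteration.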