Statistics, Optimization, and Machine Learning Seminar - Gongguo Tang (Tue, 10/19/2021)

Gongguo Tang; Department of Electrical, Computer, and Energy Engineering; University of Colorado Boulder

Geometry and algorithm for some nonconvex optimizations

Great progress has been made in the past few years in our understanding of nonconvex optimization. In this talk, I will share with you three of our works in this direction. In the first, we study low-rank matrix optimization with a general objective function, solved by factoring the matrix variable as the product of two smaller matrices. We characterize the global optimization geometry of the nonconvex factored problem and show that the corresponding objective function satisfies the robust strict saddle property as long as the original objective function satisfies restricted strong convexity and smoothness properties. In the second, we recognize that many machine learning problems involve minimizing empirical risk functions with well-behaved population risks. Instead of analyzing the nonconvex empirical risk directly, we first study the landscape of the corresponding population risk, which is usually easier to characterize, and then build a connection between the landscape of the empirical risk and that of its population risk. Lastly, we study the convergence behavior of alternating minimization, a class of algorithms widely used to solve the aforementioned nonconvex problems. We show that under mild assumptions on the (nonconvex) objective functions, these algorithms avoid strict saddles and converge to second-order optimal solutions almost surely from random initialization.
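To make the factored formulation and alternating minimization concrete, here is a minimal NumPy sketch for the simplest instance: a fully observed low-rank matrix and a squared-error objective. The sizes, rank, and initialization are illustrative assumptions, not taken from the talk, whose results concern far more general objectives.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 50, 40, 3

# Ground-truth low-rank matrix M = U* V*^T
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Factored variables: instead of optimizing over an n x m matrix X with rank(X) <= r,
# optimize over the small factors U (n x r) and V (m x r); the problem is nonconvex in (U, V).
U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))

for _ in range(50):
    # Alternating minimization: each subproblem is a linear least-squares solve.
    U = np.linalg.lstsq(V, M.T, rcond=None)[0].T   # argmin_U ||M - U V^T||_F
    V = np.linalg.lstsq(U, M, rcond=None)[0].T     # argmin_V ||M - U V^T||_F

print("relative error:", np.linalg.norm(M - U @ V.T) / np.linalg.norm(M))
```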

Statistics, Optimization, and Machine Learning Seminar - Samy Wu Fung (Tue, 10/12/2021)

Samy Wu Fung, Department of Applied Mathematics and Statistics, Colorado School of Mines

Efficient Training of Infinite-depth Neural Networks via Jacobian-free Backpropagation

A promising trend in deep learning replaces fixed-depth models with approximations of the limit as network depth approaches infinity. This approach uses a portion of the network weights to prescribe behavior by defining a limit condition, which makes network depth implicit: it varies based on the provided data and an error tolerance. Moreover, existing implicit models can be implemented and trained with fixed memory costs in exchange for additional computational costs. In particular, backpropagation through implicit-depth models requires solving a Jacobian-based equation arising from the implicit function theorem. We propose a new Jacobian-free backpropagation (JFB) scheme that circumvents the need to solve Jacobian-based equations while maintaining fixed memory costs. This makes implicit-depth models much cheaper to train and easier to implement. Numerical experiments show JFB is more computationally efficient while maintaining competitive accuracy for classification tasks.
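A minimal PyTorch sketch of the idea, not the authors' code: the layer sizes and the tanh update rule are illustrative assumptions, and the update is assumed to be a contraction so the fixed-point iteration converges.

```python
import torch
import torch.nn as nn

class ImplicitLayer(nn.Module):
    """Finds z* with z* = f(z*, x) by fixed-point iteration; gradients are taken only
    through the final application of f (the Jacobian-free backpropagation idea)."""
    def __init__(self, dim, max_iter=50, tol=1e-4):
        super().__init__()
        self.lin_z = nn.Linear(dim, dim)
        self.lin_x = nn.Linear(dim, dim)
        self.max_iter, self.tol = max_iter, tol

    def f(self, z, x):
        return torch.tanh(self.lin_z(z) + self.lin_x(x))

    def forward(self, x):
        z = torch.zeros_like(x)
        with torch.no_grad():                      # iterate to the fixed point without storing a graph
            for _ in range(self.max_iter):
                z_new = self.f(z, x)
                if torch.norm(z_new - z) < self.tol:
                    z = z_new
                    break
                z = z_new
        # one extra application with gradients enabled; standard implicit differentiation would
        # additionally solve a Jacobian-based linear system here, which JFB skips
        return self.f(z, x)

layer = ImplicitLayer(16)
x = torch.randn(8, 16)
layer(x).pow(2).mean().backward()                  # memory cost is independent of the iteration count
```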

Statistics, Optimization, and Machine Learning Seminar - Pratyush Tiwary (Virtual) (Tue, 04/28/2020)

Pratyush Tiwary; Department of Chemistry & Biochemistry and Institute for Physical Science and Technology; University of Maryland

From atoms to emergent dynamics (with help from statistical physics and artificial intelligence)

 

ABSTRACT:

The ability to rapidly learn from high-dimensional data to make reliable predictions about the future of a given system is crucial in many contexts. This could be a fly avoiding predators, or the retina processing terabytes of data almost instantaneously to guide complex human actions. In this work we draw parallels between such tasks and the efficient sampling of complex molecules with hundreds of thousands of atoms. Such sampling is critical for predictive computer simulations in condensed matter physics and biophysics. Specifically, we use ideas from statistical physics, artificial intelligence (AI) and information theory, including the Maximum Caliber approach [1], the Predictive Information Bottleneck (PIB) [2], Shannon's rate-distortion theory [3] and long short-term memory (LSTM) networks, re-formulating them for the sampling of molecular structure and dynamics, especially when plagued with rare events. We demonstrate our methods on different test-pieces primarily in biophysics. We also discuss some open problems in the application of AI approaches to molecular simulations, for instance dealing with spurious solutions related to non-convex objective functions in AI.

1. Tiwary and Berne, 

2. Wang, Ribeiro and Tiwary, 

3. Ravindra, Smith and Tiwary, 
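For orientation, a schematic form of the information-bottleneck trade-off underlying predictive-information-bottleneck approaches such as [2]; the notation and the single trade-off parameter are illustrative assumptions, not the exact objective used in the paper:

$$
\min_{p(z \mid x_{\text{past}})} \; I\big(X_{\text{past}}; Z\big) \;-\; \beta \, I\big(Z; X_{\text{future}}\big),
$$

where Z is the learned low-dimensional representation and the parameter beta trades off compression of the past trajectory against prediction of the future.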

Bio: Pratyush Tiwary is an Assistant Professor at the University of Maryland, College Park, holding joint positions in the Department of Chemistry and Biochemistry and the Institute for Physical Science and Technology. He received his PhD and MS in Materials Science from Caltech, working with Axel van de Walle, and finished his undergraduate degree in Metallurgical Engineering at the Indian Institute of Technology, Banaras Hindu University, Varanasi. Prior to starting at Maryland, Prof. Tiwary was a postdoc in the Department of Chemistry at Columbia University working with Bruce Berne, and in the Department of Chemistry & Applied Biosciences at ETH Zurich working with Michele Parrinello.

 

Stats, Optimization, and Machine Learning Seminar - Anindya De (Tue, 02/25/2020)

Anindya De, Department of Computer and Information Science, University of Pennsylvania

Testing noisy linear functions for sparsity

Consider the following basic problem in sparse linear regression: an algorithm gets labeled samples of the form (x, <w, x> + \eps), where w is an unknown n-dimensional vector, x is drawn from a background n-dimensional distribution D, and \eps is some independent noise. Given the promise that w is k-sparse, the breakthrough work of Candes, Romberg and Tao shows that w can be recovered with samples and time which scale as O(k log n). This should be contrasted with general linear regression, where O(n) samples are information-theoretically necessary.

We look at this problem from the vantage point of property testing, and study the complexity of determining whether the unknown vector w is k-sparse versus "far" from k-sparse. We show that this decision problem can be solved with a number of samples which is independent of n as long as the background distribution D is i.i.d. and its components are not Gaussian. We further show that weakening any of the conditions in this result necessarily makes the complexity scale logarithmically in n.
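A small NumPy/scikit-learn sketch of the sample model from the first paragraph, using an l1-regularized fit as the compressed-sensing-style recovery baseline; the dimensions, noise level, and regularization weight are illustrative assumptions, and this is not the testing algorithm from the talk, whose sample complexity is independent of n.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, k, num_samples = 200, 5, 400            # roughly O(k log n) samples suffice for recovery

# k-sparse ground truth w
w = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
w[support] = rng.standard_normal(k)

# labeled samples (x, <w, x> + eps) with i.i.d. non-Gaussian (here uniform) covariates
X = rng.uniform(-1.0, 1.0, size=(num_samples, n))
y = X @ w + 0.05 * rng.standard_normal(num_samples)

# recover w with an l1-regularized fit, then inspect how many coordinates are (numerically) nonzero
w_hat = Lasso(alpha=0.01, fit_intercept=False).fit(X, y).coef_
print("estimated number of nonzero coordinates:", int(np.sum(np.abs(w_hat) > 1e-3)))
```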

Joint work with Xue Chen (Northwestern) and Rocco Servedio (Columbia).

Stats, Optimization, and Machine Learning Seminar - Stephen Becker (Tue, 01/28/2020)

Stephen Becker, Department of Applied Mathematics, University of Colorado Boulder

Stochastic Subspace Descent: Stochastic gradient-free optimization, with applications to PDE-constrained optimization

We describe and analyze a family of algorithms that generalize block-coordinate descent, where we assume one can take directional derivatives (for low-precision optimization, this can be approximated with finite differences, hence this is similar to a 0th order method). The method generalizes randomized block coordinate descent. We prove almost-sure convergence of the algorithm at a linear rate (under strong convexity) and convergence (with convexity). Furthermore, we analyze a variant similar to SVRG but that does not require the finite-sum structure in the objective, and for isotropic random sampling, we use Johnson-Lindenstrauss style arguments to provide non-asymptotic, probabilistic convergence results. Numerical examples are provided for selecting landmark points in Gaussian process regression, and in PDE-constrained optimization (shape optimization). This is joint work with Luis Tenorio and David Kozak from the Colorado School of Mines Applied Math & Stat Dept, and Alireza Doostan from CU Boulder's Smead Aerospace Engineering Dept.
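A minimal NumPy sketch of the basic idea, assuming the simplest variant: directional derivatives along a few random orthonormal directions are approximated by forward finite differences, and the iterate moves within that random subspace. The step size, subspace dimension, and toy objective are illustrative assumptions, not the parameters or test problems from the talk.

```python
import numpy as np

def stochastic_subspace_descent(f, x0, step=0.1, ell=5, fd_eps=1e-6, iters=300, seed=0):
    """Gradient-free descent: estimate directional derivatives of f along ell random
    orthonormal directions by finite differences, then step within that subspace."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    n = x.size
    for _ in range(iters):
        Q, _ = np.linalg.qr(rng.standard_normal((n, ell)))     # n x ell orthonormal basis
        fx = f(x)
        # finite-difference estimates of the directional derivatives Q^T grad f(x)
        g = np.array([(f(x + fd_eps * Q[:, j]) - fx) / fd_eps for j in range(ell)])
        x = x - step * (Q @ g)     # move along the projected approximate gradient Q Q^T grad f(x)
    return x

# toy usage: a strongly convex quadratic in 50 dimensions
A = np.diag(np.linspace(1.0, 5.0, 50))
x_final = stochastic_subspace_descent(lambda x: 0.5 * x @ A @ x, x0=np.ones(50))
print("final objective:", 0.5 * x_final @ A @ x_final)
```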

 

Stats, Optimization, and Machine Learning Seminar - Zhihui Zhu (Tue, 01/21/2020)

Zhihui Zhu, Department of Electrical and Computer Engineering, University of Denver

Provable Nonsmooth Nonconvex Approaches for Low-Dimensional Models

As technological advances in fields such as the Internet, medicine, finance, and remote sensing have produced larger and more complex data sets, we are faced with the challenge of efficiently and effectively extracting meaningful information from large-scale and high-dimensional signals and data. Many modern approaches to addressing this challenge naturally involve nonconvex optimization formulations. Although in theory finding a local minimizer for a general nonconvex problem could be computationally hard, recent progress has shown that many practical (smooth) nonconvex problems obey benign geometric properties and can be efficiently solved to global solutions.

In this talk, I will extend this powerful geometric analysis to robust low-dimensional models where the data or measurements are corrupted by outliers taking arbitrary values.  We consider nonsmooth nonconvex formulations of the problems, in which we employ an L1-loss function to robustify the solution against outliers. We characterize a sufficiently large basin of attraction around the global minima, enabling us to develop subgradient-based optimization algorithms that can rapidly converge to a global minimum with a data-driven initialization. I will also talk about our very recent work for general nonsmooth optimization on the Stiefel manifold which appears widely in engineering. I will discuss the efficiency of this approach in the context of robust subspace recovery, robust low-rank matrix recovery, and orthogonal dictionary learning.
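A toy NumPy sketch of the robust low-rank recovery setup: linear measurements of a rank-r PSD matrix are corrupted by outliers, and a subgradient method with geometrically decaying steps is run on the nonsmooth factored objective with an L1 loss. The problem sizes, outlier fraction, step-size schedule, and near-truth initialization are illustrative assumptions, not the speaker's algorithmic choices or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 30, 2, 600                      # matrix size, rank, number of linear measurements

# ground truth X* = U* U*^T and random Gaussian measurement matrices A_k
U_star = rng.standard_normal((n, r))
A = rng.standard_normal((m, n, n))
y = np.einsum('kij,ij->k', A, U_star @ U_star.T)

# corrupt a fraction of the measurements with arbitrary outliers
outliers = rng.choice(m, size=m // 10, replace=False)
y[outliers] += 10.0 * rng.standard_normal(len(outliers))

# subgradient method on the nonsmooth, nonconvex objective f(U) = (1/m) ||A(U U^T) - y||_1
U = U_star + 0.1 * rng.standard_normal((n, r))      # initialization near the truth
step = 0.002
for t in range(500):
    residual = np.einsum('kij,ij->k', A, U @ U.T) - y
    G = np.einsum('k,kij->ij', np.sign(residual), A) / m     # subgradient w.r.t. X = U U^T
    U -= step * (0.995 ** t) * (G + G.T) @ U                 # chain rule; geometrically decaying step

rel_err = np.linalg.norm(U @ U.T - U_star @ U_star.T) / np.linalg.norm(U_star @ U_star.T)
print("relative error:", rel_err)
```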

Bio:
Zhihui Zhu received a Ph.D. degree in electrical engineering from Colorado School of Mines, Golden, CO, in 2017, and was a Postdoctoral Fellow in the Mathematical Institute for Data Science at the Johns Hopkins University, Baltimore, MD, in 2018-2019. He is an Assistant Professor with the Department of Electrical and Computer Engineering, University of Denver, CO. His research interests include the areas of data science, machine learning, signal processing, and optimization. His current research largely focuses on the theory and applications of nonconvex optimization and low-dimensional models in large-scale machine learning and signal processing problems.

Stats, Optimization, and Machine Learning Seminar - Amir Ajalloeian, Maddela Avinash, Ayoub Ghriss (Tue, 12/10/2019)

Amir Ajalloeian; Department of Electrical, Computer, and Energy Engineering; University of Colorado Boulder

Inexact Online Proximal-gradient Method for Time-varying Convex Optimization

This paper considers an online proximal-gradient method to track the minimizers of a composite convex function that may continuously evolve over time. The online proximal-gradient method is "inexact," in the sense that: (i) it relies on approximate first-order information of the smooth component of the cost; and (ii) the proximal operator (with respect to the non-smooth term) may be computed only up to a certain precision. Under suitable assumptions, convergence of the error iterates is established for strongly convex cost functions. On the other hand, the dynamic regret is investigated when the cost is not strongly convex, under the additional assumption that the problem includes feasibility sets that are compact. Bounds are expressed in terms of the cumulative error and the path length of the optimal solutions. This suggests how to allocate resources to strike a balance between performance and precision in the gradient computation and in the proximal operator.
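A toy NumPy sketch of an inexact online proximal-gradient iteration for a time-varying l1-regularized least-squares cost; the drifting target, noise level on the gradient, and step size are illustrative assumptions, and here the proximal operator is computed exactly (the paper also allows it to be inexact).

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (computed exactly in this sketch)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
n, T = 20, 300
lam, alpha = 0.1, 0.05                   # l1 weight and step size
x = np.zeros(n)

for t in range(T):
    # time-varying smooth component f_t(x) = 0.5 * ||x - b_t||^2 with a slowly drifting target b_t
    b_t = np.sin(0.01 * t) * np.ones(n)
    grad = x - b_t
    grad_error = 1e-2 * rng.standard_normal(n)       # only approximate first-order information
    # inexact online proximal-gradient step
    x = soft_threshold(x - alpha * (grad + grad_error), alpha * lam)

print("final iterate norm:", np.linalg.norm(x))
```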

Maddela Avinash; Department of Electrical, Computer, and Energy Engineering; University of Colorado Boulder

Semidefinite Relaxation Technique to Solve the Optimal Power Flow Problem

I will discuss using a convex relaxation technique to find the optimal solution for the cost function of a power distribution system. The conventional optimal power flow problem is nonconvex, and the traditional Newton-Raphson method has convergence issues when the system reaches its limits. Semidefinite programming approximates the power flow constraints by enlarging the feasible set so that the problem becomes convex. This convex problem can then be solved to minimize the total cost of generation, transmission, and distribution of electric power.
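A toy CVXPY sketch of the core relaxation step only: the nonconvex OPF works with the rank-one matrix v v^T of bus voltages, and the SDP relaxation replaces it with a PSD matrix W and drops the rank constraint. The cost matrix and voltage limits below are placeholders, and a real OPF model would add power-balance and line-flow constraints built from the network admittance matrix.

```python
import numpy as np
import cvxpy as cp

n = 3                                    # toy 3-bus network
C = np.eye(n)                            # stand-in cost matrix (placeholder, not real network data)
vmin, vmax = 0.95, 1.05                  # per-unit voltage magnitude limits

# relaxation: replace the rank-one matrix v v^T by a PSD matrix W
W = cp.Variable((n, n), symmetric=True)
constraints = [
    W >> 0,                              # positive semidefiniteness (the relaxed feasible set)
    cp.diag(W) >= vmin ** 2,             # |v_i|^2 lower bounds
    cp.diag(W) <= vmax ** 2,             # |v_i|^2 upper bounds
]
prob = cp.Problem(cp.Minimize(cp.trace(C @ W)), constraints)
prob.solve()
print("relaxation value:", prob.value)
```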

Ayoub Ghriss, Department of Computer Science, University of Colorado Boulder

Hierarchical Deep Reinforcement Learning through Mutual Information Maximization

As is the case with human learning, biological organisms can master tasks from extremely small samples. This suggests that acquiring new skills is done in a hierarchical fashion, starting with simpler tasks that allow the abstraction of newly seen samples. While reinforcement learning is rooted in neuroscience and psychology, Hierarchical Reinforcement Learning (HRL) was developed in the machine learning field by adding abstraction of either the states or the actions. Temporally abstract actions, our main focus, consist of a top-level (manager) policy and a set of temporally extended policies (low-level workers). At each step, a policy from this set is picked by the manager and continues to run until a specified set of termination states is reached. Decision making in this hierarchy of policies starts with the top-level policy, which assigns low-level policies to different domains of the state space. These low-level policies operate as any other monolithic policy on their assigned domains. In this talk, we introduce HRL and present our method to learn the hierarchy: we use mutual information maximization to learn the top-level policies, with an on-policy method (Trust Region Policy Optimization) to learn the low-level policies.
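A small PyTorch sketch of one generic way to turn mutual information between options and visited states into a training signal: a discriminator q(option | state) provides a variational lower bound on that mutual information, and its log-likelihood is used as an intrinsic reward. This is a standard skill-discovery construction offered only as an illustration; it is an assumption that it resembles the speakers' method, which additionally involves a manager policy and TRPO updates not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptionDiscriminator(nn.Module):
    """q(option | state): its log-likelihood lower-bounds I(option; state)."""
    def __init__(self, state_dim, num_options, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_options))

    def forward(self, state):
        return self.net(state)            # logits over options

def intrinsic_reward(disc, state, option, num_options):
    """r_int = log q(option | state) - log p(option), with a uniform prior p(option) = 1/K."""
    log_q = F.log_softmax(disc(state), dim=-1).gather(-1, option.unsqueeze(-1)).squeeze(-1)
    return log_q + torch.log(torch.tensor(float(num_options)))

# toy usage: reward the low-level (worker) policies for visiting states that identify their option
disc = OptionDiscriminator(state_dim=4, num_options=3)
states = torch.randn(8, 4)
options = torch.randint(0, 3, (8,))
r = intrinsic_reward(disc, states, options, 3)
# the discriminator itself is trained to predict the active option from visited states
disc_loss = F.cross_entropy(disc(states), options)
```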

Stats, Optimization, and Machine Learning Seminar - Sriram Sankaranarayanan (Tue, 11/12/2019)

Sriram Sankaranarayanan, Department of Computer Science, University of Colorado Boulder

Reasoning about Neural Feedback Systems

Data-driven components such as feedforward neural networks are increasingly being used in safety-critical systems such as autonomous vehicles and closed-loop medical devices. Neural networks compute nonlinear functions, and even relatively tiny networks present enormous challenges for existing reasoning techniques used in formal verification. In this work, we will present first steps toward verifying properties of neural networks in isolation, and toward reasoning about properties of dynamical systems with neural networks in the feedback loop.
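To give a flavor of what "verifying a property of a network" can mean, here is a minimal NumPy sketch of interval bound propagation for a ReLU network: it produces sound (though loose) bounds on the outputs over a box of inputs, so any threshold property implied by those bounds is verified. This is a generic textbook construction offered for illustration; it is an assumption that it resembles the reachability techniques discussed in the talk.

```python
import numpy as np

def interval_bound_propagation(weights, biases, x_lower, x_upper):
    """Propagate an input box [x_lower, x_upper] through a ReLU network; returns output bounds.
    A property like 'output_i <= c for every input in the box' is verified if the upper bound
    on output_i is below c (the bounds are sound but not tight)."""
    l, u = x_lower, x_upper
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        l, u = W_pos @ l + W_neg @ u + b, W_pos @ u + W_neg @ l + b
        if i < len(weights) - 1:                     # ReLU on hidden layers only
            l, u = np.maximum(l, 0.0), np.maximum(u, 0.0)
    return l, u

# toy 2-2-1 network with random weights
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((2, 2)), rng.standard_normal((1, 2))]
bs = [np.zeros(2), np.zeros(1)]
lo, hi = interval_bound_propagation(Ws, bs, np.array([-0.1, -0.1]), np.array([0.1, 0.1]))
print("output range:", lo, hi)
```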

Joint work with Souradeep Dutta (CU Boulder), Ashish Tiwari (Microsoft) and Susmit Jha (SRI).

Bio: Sriram Sankaranarayanan is an associate professor of Computer Science at the University of Colorado, Boulder. His research interests include automatic techniques for reasoning about the behavior of computer and cyber-physical systems. Sriram obtained a Ph.D. in 2005 from Stanford University where he was advised by Zohar Manna and Henny Sipma. Subsequently, he worked as a research staff member at NEC research labs in Princeton, NJ. He has been on the faculty at CU Boulder since 2009. Sriram has been the recipient of awards including the President's Gold Medal from IIT Kharagpur (2000), Siebel Scholarship (2005), the CAREER award from NSF (2009), Dean's award for outstanding junior faculty (2012), outstanding teaching (2014), and the Provost's faculty achievement award (2014).

Stats, Optimization, and Machine Learning Seminar - Mohsen Imani (Tue, 11/05/2019)

Mohsen Imani; Department of Computer Science and Engineering; University of California, San Diego

Towards Learning with Brain Efficiency

Modern computing systems are plagued with significant issues in efficiently performing learning tasks. In this talk, I will present a new brain-inspired computing architecture. It supports a wide range of learning tasks while offering higher system efficiency than other existing platforms. I will first focus on HyperDimensional (HD) computing, an alternative method of computation which exploits key principles of brain functionality: (i) robustness to noise/error and (ii) intertwined memory and logic. To this end, we design a new learning algorithm resilient to hardware failure. We then build the architecture exploiting emerging technologies to enable processing in memory. I will also show how we use the new architecture to accelerate other brain-like computations such as deep learning and other big data processing.

Bio: Mohsen Imani is a Ph.D. candidate in the Department of Computer Science and Engineering at UC San Diego. His research interests are in brain-inspired computing and computer architecture. He is an author of several publications at top-tier conferences and journals. His contributions resulted in over $40M in grants funded from... https://calendar.colorado.edu/event/stats_optimization_and_machine_learning_seminar_-_mohsen_imani
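To make the hyperdimensional-computing idea from the abstract concrete, here is a tiny NumPy sketch of a generic HD classifier: random bipolar hypervectors represent raw features, bundling (summation) builds class prototypes, and classification is nearest-prototype by cosine similarity. The dimensionality, feature encoding, and data are illustrative assumptions, not the architecture presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                 # hypervector dimensionality

def random_hv():
    return rng.choice([-1, 1], size=D)     # random bipolar hypervector

item_memory = {i: random_hv() for i in range(20)}   # one hypervector per raw feature

def encode(feature_ids):
    """Bundle (sum) the hypervectors of the active features into one representation."""
    return np.sum([item_memory[i] for i in feature_ids], axis=0)

# "training": a class prototype is the bundle of its encoded examples
class_prototypes = {
    0: encode([0, 1, 2]) + encode([0, 1, 3]),
    1: encode([10, 11, 12]) + encode([10, 13, 14]),
}

def classify(feature_ids):
    q = encode(feature_ids)
    sims = {c: q @ p / (np.linalg.norm(q) * np.linalg.norm(p)) for c, p in class_prototypes.items()}
    return max(sims, key=sims.get)          # nearest prototype by cosine similarity

print(classify([0, 1, 4]))                  # expected: class 0
```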

Stats, Optimization, and Machine Learning Seminar - Alec Dunton (Tue, 10/22/2019)

Alec Dunton, Department of Applied Mathematics, University of Colorado Boulder

Learning a kernel matrix for nonlinear dimensionality reduction (Weinberger et al. 2004)

We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. Noting that the kernel matrix implicitly maps the data into a nonlinear feature space, we show how to discover a mapping that unfolds the underlying manifold from which the data was sampled. The kernel matrix is constructed by maximizing the variance in feature space subject to local constraints that preserve the angles and distances between nearest neighbors. The main optimization involves an instance of semidefinite programming---a fundamentally different computation than previous algorithms for manifold learning, such as Isomap and locally linear embedding. The optimized kernels perform better than polynomial and Gaussian kernels for problems in manifold learning, but worse for problems in large margin classification. We explain these results in terms of the geometric properties of different kernels and comment on various interpretations of other manifold learning algorithms as kernel methods.

https://calendar.colorado.edu/event/stats_optimization_and_machine_learning_seminar_-_alec_dunton
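A compact CVXPY sketch of the semidefinite program described in the abstract: learn a centered PSD kernel matrix of maximal trace (variance) while preserving distances between nearest neighbors, then embed with its top eigenvectors as in kernel PCA. The toy data, neighborhood size, and the simplification of constraining only neighbor distances (rather than all distances and angles within each neighborhood) are assumptions of this sketch.

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
# toy data near a one-dimensional curve embedded in 3-D
t = np.sort(rng.uniform(0, 3, 30))
X = np.column_stack([np.cos(t), np.sin(t), 0.05 * rng.standard_normal(30)])
n = X.shape[0]

# learn the kernel (Gram) matrix K: maximize variance while preserving local distances
neighbors = kneighbors_graph(X, n_neighbors=4).tocoo()
K = cp.Variable((n, n), PSD=True)
constraints = [cp.sum(K) == 0]                      # centering constraint
for i, j in zip(neighbors.row, neighbors.col):
    d2 = float(np.sum((X[i] - X[j]) ** 2))
    constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == d2)   # preserve neighbor distances
prob = cp.Problem(cp.Maximize(cp.trace(K)), constraints)
prob.solve()

# embed with the top eigenvectors of the learned kernel (as in kernel PCA)
eigvals, eigvecs = np.linalg.eigh(K.value)
embedding = eigvecs[:, -2:] * np.sqrt(np.maximum(eigvals[-2:], 0))
print("embedding shape:", embedding.shape)
```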
