
Events

Seminar @ Cornell Tech: Will Ma
Model-based vs. Model-free Online Decision-making for Optimal Stopping, Pricing, and Inventory
Buoyed by advances in computation and AI, organizations have a growing interest in adopting black-box algorithms for decision-making. Motivated particularly by Reinforcement Learning (RL) at an industry partner, Will Ma compares model-based and model-free approaches on several stochastic optimization problems over time. Here, “model-based” means constructing independent distributions over time and solving a dynamic program, while “model-free” means optimizing over historical trajectories in a black-box fashion.
Ma provides a rigorous theoretical comparison in a finite-horizon time-inhomogeneous RL setting with offline trajectories that allow perfect hindsight evaluation. He derives surprising horizon-independence results for PAC-learning, which exploit the problem-specific structure of the Inventory Replenishment and Optimal Stopping problems. Perhaps more surprisingly, this requires different approaches for different problems: model-free for Inventory, and model-based for Stopping. Meanwhile, Ma shows that a horizon-independent learning guarantee is impossible for the problem of Dynamic Pricing of a single item.
These theoretical results are consistent with simulation findings and help explain the successes and failures of deploying model-free RL at the industry partner. The takeaway is that organizations should not adopt a one-size-fits-all verdict on whether to deploy black-box algorithms, as model-free learning is far easier for some problems (Inventory) than others (Stopping).
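The distinction between the two approaches can be illustrated with a minimal Python sketch on a toy optimal stopping problem. This is not the speaker's method or code. It assumes nonnegative rewards, one reward observed per period, and offline trajectories of equal length. The function names (`evaluate`, `model_based`, `model_free`) are hypothetical, and the model-free policy class (a single static threshold, chosen by direct search) is a simplification picked only to keep the example short.

```python
import random


def evaluate(thresholds, trajectories):
    """Average reward of the policy: stop at the first t with x_t >= thresholds[t]."""
    total = 0.0
    for traj in trajectories:
        reward = 0.0  # never stopping yields nothing
        for x, thr in zip(traj, thresholds):
            if x >= thr:
                reward = x
                break
        total += reward
    return total / len(trajectories)


def model_based(trajectories):
    """Fit an independent empirical distribution per period, then solve the
    dynamic program by backward induction (assumption: rewards are >= 0)."""
    horizon = len(trajectories[0])
    thresholds = [0.0] * horizon
    value_next = 0.0  # value of continuing past the final period
    for t in reversed(range(horizon)):
        thresholds[t] = value_next  # stop iff the reward beats continuing
        samples = [traj[t] for traj in trajectories]
        value_next = sum(max(x, value_next) for x in samples) / len(samples)
    return thresholds


def model_free(trajectories):
    """Search a policy class directly on historical trajectories, with no
    distributional model. Policy class (an assumption): one static threshold,
    accepting anything in the final period."""
    horizon = len(trajectories[0])
    candidates = sorted({x for traj in trajectories for x in traj})

    def policy(c):
        return [c] * (horizon - 1) + [0.0]

    best = max(candidates, key=lambda c: evaluate(policy(c), trajectories))
    return policy(best)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data for illustration only: 200 trajectories over 5 periods.
    data = [[random.random() for _ in range(5)] for _ in range(200)]
    for name, fit in [("model-based", model_based), ("model-free", model_free)]:
        policy = fit(data)
        print(name, [round(p, 3) for p in policy], round(evaluate(policy, data), 3))
```

Both functions return a list of per-period thresholds, so the same `evaluate` routine scores either policy in hindsight on the stored trajectories. The sketch only shows how the two pipelines differ; it says nothing about which one needs fewer samples.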
The talk contains results from two papers:
- Sample Complexity of Posted Pricing for a Single Item, with Billy Jin, Thomas Kesselheim, Sahil Singla
https://proceedings.neurips.cc/paper_files/paper/2024/file/95dd2f8d32badf8959844ef0c2528d31-Paper-Conference.pdf
- VC Theory for Inventory Policies, with Yaqi Xie, Linwei Xin
https://arxiv.org/pdf/2404.11509
Speaker Bio
Will Ma is the Roderick H. Cushman Associate Professor in the Decision, Risk, and Operations (DRO) division of Columbia Business School. Before joining DRO, he received his PhD in Operations Research from MIT in 2018, and spent 2018-2019 at Google Research. His research centers on online algorithms in e-commerce systems, both for supply-side problems like inventory and fulfillment, and revenue management problems like dynamic assortment optimization. He specializes in designing simple online algorithms with performance guarantees that can be tuned to historical data. Will also has miscellaneous experience as a professional poker player, video-game startup founder, and karaoke bar pianist.