# RGA-PDE: Relaxed Greedy Algorithm for PDEs

This repository implements the Relaxed Greedy Algorithm (RGA) for the numerical solution of partial differential equations (PDEs). The implementation is inspired by the approach of J. W. Siegel et al. (2023).
The current focus is the 1D Poisson problem

$$
u'' = f \quad \text{on } [0, \pi]
$$

with Dirichlet boundary conditions. The boundary conditions are not enforced strongly, but via a boundary penalty term in the energy functional ("weakly imposed Dirichlet BC"); a sketch of such an energy is given below.
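For orientation, a typical penalized energy of this kind (a sketch only, assuming the sign convention $u'' = f$ from above and a penalty parameter $\delta > 0$, which reappears in the convergence discussion below; the exact functional used in the code may differ):

$$
\mathcal{R}(u) = \frac{1}{2} \int_0^\pi u'(x)^2 \, dx + \int_0^\pi f(x)\, u(x)\, dx + \frac{1}{2\delta} \left( u(0)^2 + u(\pi)^2 \right).
$$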
Typical right-hand sides are, e.g., $f \equiv 1$ and $f(x) = \sin(x)$.
## Goals of this project
- Empirical study of the convergence rate of RGA for PDEs.
- Influence of the dictionary (set of admissible atoms) on the constants in the theoretical convergence bounds.
- Understanding and explaining the plateaus observed in the error curves.
## Installation & setup

To run the experiments locally, create a Python environment and install the dependencies from `requirements.txt`.
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd RGA-PDE
   ```

2. Create and activate a virtual environment:

   ```bash
   # create the venv
   python -m venv .venv

   # activate it (macOS/Linux)
   source .venv/bin/activate

   # OR: activate it (Windows)
   .venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Theoretical background (short version)
The Relaxed Greedy Algorithm (RGA) is an iterative method for minimizing a smooth, convex energy functional $\mathcal{R} : H \to \mathbb{R}$ over the convex hull $\operatorname{conv}(\mathbb{D})$ of a symmetric dictionary $\mathbb{D} \subset H$.
The algorithm starts with $u_0 = 0$. For each iteration $n \ge 1$, the update is

$$
u_n = (1 - \alpha_n)\, u_{n-1} + M \alpha_n\, g_n.
$$
Here:

**Budget ($M$).** $M$ is a fixed upper bound on the variation norm; in particular,

$$
\|u_n\|_{\mathcal{K}_1} \le M \quad \text{for all } n.
$$
**Relaxation / learning rate ($\alpha_n$).** A typical choice is

$$
\alpha_n = \min(1, 2/n).
$$

This enforces a relaxed step rather than a full greedy jump to the new atom.
**Greedy atom ($g_n$).** At each step, pick a single atom from the dictionary that best decreases the energy. In practice, an approximate version of

$$
g_n \approx \underset{g \in \mathbb{D}}{\arg\max}\, \bigl| \langle g, \nabla \mathcal{R}(u_{n-1}) \rangle_H \bigr|
$$

is solved numerically.
Intuition:

- Measure how the energy would change if a given atom were added.
- Choose the atom with the largest directional derivative (in absolute value).
- Update in that direction, but only partially (via $\alpha_n$).

A minimal sketch of the resulting loop is shown below.
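The following is a minimal NumPy sketch of the iteration, not the repository code: `select_atom` is a hypothetical callback standing in for the argmax subproblem described above, and the grid `x` and defaults are illustrative.

```python
import numpy as np

def rga(select_atom, x, n_iters=1000, M=10.0):
    """Relaxed Greedy Algorithm: u_n = (1 - a_n) * u_{n-1} + M * a_n * g_n."""
    u = np.zeros_like(x)                        # u_0 = 0
    for n in range(1, n_iters + 1):
        alpha = min(1.0, 2.0 / n)               # relaxation a_n = min(1, 2/n)
        g = select_atom(u)                      # (signed) atom from the symmetric dictionary
        u = (1.0 - alpha) * u + M * alpha * g   # relaxed update; keeps the K_1 budget
    return u
```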
## Convergence rates

Let $u_n$ be the RGA iterate after $n$ steps, and let $u^*$ be an exact PDE solution. Under standard assumptions (smoothness, convexity, symmetric dictionary, bounded variation norm), one obtains:
**Energy / optimization error:**

$$
\mathcal{R}(u_n) - \inf_{\|v\|_{\mathcal{K}_1} \le M} \mathcal{R}(v) \le \mathcal{O}\!\left( \frac{1}{n} \right).
$$
**$L^2$ error:** Combining this with a Poincaré-type estimate (and standard elliptic theory for the Poisson problem),

$$
\|u_n - u^*\|_{L^2} \le \mathcal{O}\!\left(n^{-1/2}\right).
$$
So asymptotically, the approximation improves at rate $n^{-1/2}$ in $L^2$.
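Where the square root comes from, in one line (assuming $u^*$ lies in the model class $\{\|v\|_{\mathcal{K}_1} \le M\}$ and that the penalized energy is strongly convex with respect to the $L^2$ norm with some constant $c > 0$, which is the role the Poincaré-type estimate plays here):

$$
c\, \|u_n - u^*\|_{L^2}^2 \;\le\; \mathcal{R}(u_n) - \mathcal{R}(u^*) \;\le\; \mathcal{O}\!\left(\frac{1}{n}\right) \quad\Longrightarrow\quad \|u_n - u^*\|_{L^2} \le \mathcal{O}\!\left(n^{-1/2}\right).
$$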
## Key implementation ideas
### 1. Dictionaries ($\mathbb{D}$)

Unlike FEM, the algorithm does not use a fixed basis on a mesh. Instead, it constructs the solution as a convex combination of "atoms" drawn from a dictionary. Two dictionary types are currently implemented (a code sketch of both follows the definitions below):
**(a) Quadratic ReLU atoms**

$$
g(x) = \operatorname{ReLU}^2(w x + b) = (w x + b)_+^2
$$

- Implemented in `RGA_1d_exact_argmax.py` and `RGA_debug.py`.
- Smooth except at one kink.
- Strongly localized, depending on $(w, b)$.
**(b) Hyperbolic tangent atoms**

$$
g(x) = \tanh(w x + b)
$$

- Implemented in `rga_tanh1d.py`.
- Globally smooth.
- Much better behavior near the domain boundaries in practice.
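For concreteness, the two atom families as plain NumPy functions (a sketch; the repository files define their own versions):

```python
import numpy as np

def relu2_atom(x, w, b):
    """Quadratic ReLU atom: g(x) = (w x + b)_+^2."""
    return np.maximum(w * x + b, 0.0) ** 2

def tanh_atom(x, w, b):
    """Hyperbolic tangent atom: g(x) = tanh(w x + b)."""
    return np.tanh(w * x + b)
```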
Why this matters:
- The dictionary determines which shapes the method can "mix" to approximate the PDE solution.
- It directly affects stability, convergence speed, and how quickly the method escapes error plateaus.
### 2. Argmax subproblem (atom selection)

Selecting $g_n$ is numerically the most delicate step. Two solver strategies are implemented:
**(a) Closed-form / exact argmax for quadratic ReLU**

For the quadratic ReLU dictionary, define

$$
c(b) = \langle g_b, \nabla \mathcal{R} \rangle, \qquad g_b(x) = (w x + b)_+^2.
$$

It can be shown that $c(b)$ is piecewise quadratic in the offset parameter $b$. The exact solver in `RGA_1d_exact_argmax.py`:

- builds all breakpoint intervals in $b$, which come from the grid points $x_i$ via $b_i = -w x_i$;
- evaluates all candidate parabola maxima and interval boundary values;
- picks the global maximizer.

Result: a (numerically) exact argmax for ReLU$^2$ atoms. A sketch of this breakpoint logic follows below.
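A sketch of the idea for a fixed weight $w$ (hypothetical discretization: `r` holds the gradient $\nabla \mathcal{R}$ sampled on a uniform grid `x`, and the inner product is a simple quadrature; the repository's exact routine may differ in these details):

```python
import numpy as np

def exact_argmax_b(x, r, w, b_min, b_max):
    """Maximize |c(b)| for c(b) = sum_i r_i (w x_i + b)_+^2 h, exploiting that
    c is piecewise quadratic in b with breakpoints b_i = -w x_i."""
    h = x[1] - x[0]                                   # uniform grid spacing

    def c(b):
        return np.sum(r * np.maximum(w * x + b, 0.0) ** 2) * h

    # Knots: the parameter bounds plus all breakpoints that fall between them.
    breaks = np.sort(-w * x)
    knots = np.concatenate(([b_min], breaks[(breaks > b_min) & (breaks < b_max)], [b_max]))

    # Candidates: every knot, plus each interval's parabola vertex (if interior).
    candidates = list(knots)
    for lo, hi in zip(knots[:-1], knots[1:]):
        active = w * x + 0.5 * (lo + hi) > 0          # active set is constant on (lo, hi)
        A = np.sum(r[active]) * h                     # c(b) = A b^2 + B b + const here
        B = 2.0 * w * np.sum(r[active] * x[active]) * h
        if A != 0.0 and lo < -B / (2.0 * A) < hi:
            candidates.append(-B / (2.0 * A))         # vertex of this parabola piece

    return max(candidates, key=lambda b: abs(c(b)))   # global maximizer of |c|
```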
**(b) Numerical / fallback solver (grid search + local refinement)**

For $\tanh$ atoms (and as a fallback for ReLU$^2$):

- coarse grid search over the parameters $(w, b)$;
- optional L-BFGS-B refinement starting from the best coarse candidate.

This is implemented in `RGA_debug.py` and `rga_tanh1d.py`; a sketch follows below.
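A sketch of this strategy using SciPy (hypothetical interface: `score(w, b)` returns $\bigl| \langle g_{w,b}, \nabla \mathcal{R} \rangle \bigr|$; the repository code may be organized differently):

```python
import numpy as np
from scipy.optimize import minimize

def numerical_argmax(score, w_grid, b_grid, refine=True):
    """Coarse grid search over (w, b), optionally refined with L-BFGS-B."""
    # Coarse stage: evaluate the score on the full parameter grid.
    W, B = np.meshgrid(w_grid, b_grid, indexing="ij")
    scores = np.vectorize(score)(W, B)
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    w0, b0 = w_grid[i], b_grid[j]

    # Refinement stage: local maximization starting from the best coarse candidate.
    if refine:
        res = minimize(lambda p: -score(p[0], p[1]), x0=[w0, b0], method="L-BFGS-B")
        if res.success and -res.fun > scores[i, j]:   # keep only if it actually improved
            return res.x[0], res.x[1]
    return w0, b0
```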
### 3. Convergence constants, boundary penalty & plateaus

The theoretical rate $\mathcal{O}(1/n)$ hides constants, and those constants are everything in practice.
A generic upper bound for the optimization error after $n$ steps is of the form

$$
\text{error}(n) \lesssim \frac{32\, (C M)^2 K}{n},
$$

where

- $K$ is the smoothness constant of $\mathcal{R}$; for the penalized Dirichlet energy used here, $K = 1$;
- $C = \sup_{g \in \mathbb{D}} \|g\|_H$ is the maximal dictionary atom norm;
- $M$ is the budget / variation-norm bound.
Key observations from experiments:

- **Boundary penalty and $C(\delta)$.** The weak enforcement of the Dirichlet boundary conditions uses a penalty parameter $\delta > 0$. For the ReLU$^2$ dictionary, the effective constant $C$ can blow up as $C(\delta) \sim \delta^{-1/2}$. Intuition: a small $\delta$ means a "very stiff" boundary penalty; atoms that violate the boundary even slightly get a huge energy weight near the boundary, which makes their norm in the penalized inner product extremely large.
- **Huge prefactor.** Because the prefactor $32\, (C M)^2$ can become enormous, the asymptotic $1/n$ regime may only become visible after millions of iterations. Before that, the measured error curve looks almost flat. $\Rightarrow$ This explains the long error plateaus observed in experiments.
- **$\tanh$ improves the constants.** The $\tanh$ dictionary behaves much more tamely near the boundary. Empirically this leads to a smaller effective $C$, more reasonable constants in the bound, and noticeably faster visible decay (less plateauing).

The back-of-the-envelope computation below illustrates the scale of the problem.
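All numbers here are hypothetical, chosen only to show the mechanism $C(\delta) \sim \delta^{-1/2}$:

```python
# Illustrative only: how a stiff boundary penalty inflates the O(1/n) prefactor.
delta = 1e-4            # boundary penalty parameter (hypothetical value)
C = delta ** -0.5       # effective atom-norm constant, C(delta) ~ delta^{-1/2} -> 100
M, K = 10.0, 1.0        # budget (hypothetical) and smoothness constant (K = 1 here)

prefactor = 32 * (C * M) ** 2 * K          # = 3.2e7
n_needed = prefactor / 1e-2                # iterations until the bound drops below 1e-2
print(f"{prefactor:.1e}, {n_needed:.1e}")  # 3.2e+07, 3.2e+09
```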
## File overview (selected)
- `RGA_1d_exact_argmax.py`
  - RGA loop with an exact argmax routine for ReLU$^2$ atoms.
  - Breakpoint/interval logic for computing the maximizing shift $b$.
- `RGA_debug.py`
  - Debug / playground version.
  - Can switch between the exact and the numerical argmax.
  - Contains the coarse grid search + L-BFGS-B refinement.
- `rga_tanh1d.py`
  - RGA specialized to $\tanh(w x + b)$ atoms.
  - Uses the numerical argmax.
- `requirements.txt`
  - Python dependencies for reproducing the experiments.
## TL;DR / Why this matters
- RGA builds the PDE solution as a greedy convex combination of simple nonlinear atoms instead of using a fixed mesh basis.
- Theory promises energy decay like $1/n$ and an $L^2$ error like $n^{-1/2}$.
- In practice those rates can be hidden behind gigantic constants, especially when the boundary penalty is too stiff.
- Different atom dictionaries (ReLU$^2$ vs. $\tanh$) dramatically change those constants and therefore the visible convergence.
## Next steps / open questions
- Quantify the observed plateaus: how long until the asymptotic regime actually kicks in for realistic $\delta$?
- Better parameter search / argmax heuristics to escape local plateaus faster.
- Adaptive or data-driven choice of dictionary atoms.
- Extension from 1D Poisson to higher dimensions and to more general PDEs.