Solution to the Paulsen problem(via operator scaling)

今年STOC上的一篇会议论文，这是参照作者Lap Chi Lau在IAS上演讲所做的记录。

Part I:Paulsen problem

Motivation from frame theory

Frame

Frame: a collection of vectors \(u_1,u_2,...,u_n \in R^d\) that spans \(R^d.\)

Equal norm: if \(\|u_i\|_2 = \|u_j\|_2\)for all \(i, j\) which is means same length.

Parseval: if \(\sum\limits_{i=1}^nu_iu_i^T = I_d\).

从另一个角度来解释: \[ C\cdot\|x\|^2\le\sum_{j\in\rm II}|\langle x,f_j\rangle|^2\le D\cdot\|x\|^2 \]

存在一组基，对于任意一个向量x，这组基向x做投影的长度的平方和，介于x的模长的平方的常数倍（大于零）之间。

An equal norm Parseval frame is an overcomplete basis: \[ \sum\limits_{i=1}^n\langle x,u_i\rangle u_i = x\quad\forall x \in R^d \]

任何向量x可以由一frame（n个d维向量）来表示。

当n=d时，这是一组标准基，当n远大于d时，称为“overcomplete basis”

思考:

overcomplete basis：这组基是过完备的，当你拿掉一个vector时，仍然可以通过改变系数来表示出R^d中的任意vector。

The application of equal norm Parseval frame : 比如用在信号处理上，信息压缩；frame中的每个vector的系数记录了某些信息，当某个系数丢失了，我们仍然可以得到原本的信息。

It has applications in signal processing, communication theory, and quantum information theory.

Motivation

Equal norm Parseval frames are difficult to construct with only a few known algebraic constructions.

examples:[Holmes-Paulsen 04] Grassmaniann frames: equal norm Parseval frames with minimal \(\max\limits_{i,j}{\langle u_i,u_j\rangle}^2\) which are even more difficult to construct.

这里是指，在所有equal norm parseval frames中,其最大内积是所有frames中最小的，又被称作:Grassmaniann frams.

It is easier to construct "approximate" equal norm Parseval frames(e.g. random unit vectors, optimal packing of lines).

Question:Can we turn an "approximate" frame into an equal norm Parseval frame by just moving the vectors "slightly"?

总结：

很难构建equal norm Parseval frame，但很容易构建近似equal norm Parseval frame.

目的：

我们通过近似equal norm frame稍微地移动其中的vectors 使其变成equal norm Parseval frame。

The Paulsen Problem

what is the best function\(f(n,d,\varepsilon)\) such that for any \(u_1,...,u_n\in R^d\) with

\((1-\varepsilon)\frac{d}{n}\le\|u_i\|_2^2\le(1+\varepsilon)\frac{d}{n}\quad\forall1\le i\le n\) (\(\varepsilon\) —nearly equal norm)

\((1-\varepsilon)I_d\preceq\sum\limits_{i=1}^nu_iu_i^T\preceq(1+\varepsilon)I_d\) (\(\varepsilon\)—nearly Parseval)

右边减左边是正定的

there exist \(v_1,..v_n\in R^d\) with

\(||v_i||_2^2=\frac{d}{n}\quad\forall1\le i\le n\)¹ and \(\sum_{i=1}^nv_iv_i^T=I_d\)

such that

\[ dist^2(u_i,v_i)=\sum\limits_{i=1}^n\|u_i-v_i\|_2^2\le f(n,d,\varepsilon)? \]

个人理解：

从“approximate”到equal norm Parseval frame，我们使这个变化过程中向量的改变一定不超过f，所以我们要想办法确定f的上界。

Previous Work

[Bodmann-Casazza,10] \(f(d,n,\varepsilon)= O(d^{42}n^{18}\varepsilon^2)\) when \(gcd(d,n)=1.\)²

从数学的角度讲：

令f和g为从整数集或实数集到实数集的函数。如果存在常数C和k使得只要当x>k时就有 \[ |f(x)|\le C|g(x)| \] 我们就说f(x)是O(g(x))的。

直觉上，当x无限增长时f(x)的增长率慢于g(x)的某个固定倍数

说白了，O只是一个对于函数的上界的估计，最坏情况的估计。

dynamical system improves on equal norm while keeping Parseval.

[Casazza-Fickus-Mixon,12] \(f(d,n,\varepsilon)= O(d^{20/7}n^{2/7}\varepsilon^{2/7})\)

gradient descent improves on Parseval while keeping equal norm.

There are examples showing that \(f(d,n,\varepsilon)\ge d\varepsilon.\)

Open question:Can the bound be independent of n?

函数f的界限是否可以独立与n？

Main Result

Theorem. \(f(d,n,\varepsilon) = O(d^{13/2}\varepsilon)\)

The proof has two parts.

First, we define a dynamical system based on operator scaling,

and show that \(f(d,n,\varepsilon) = O(d^2n\varepsilon)\)

Then,we do a smoothed analysis to remove the dependency on \(n.\)

Part II:Continuous operator scaling

Operator scaling, alternating algorithm, reduction
Analysis of dynamical system

Alternating Algorithm

How to move an approximate to satisfy the two conditions exactly?

The problem is difficult with two conditions. It is easy with one condition.

To satisfy the equal norm condition, we just rescale the vectors.³
To satisfy the Parseval condition, we can set

\[u_i\leftarrow(\sum\limits_{i=1}^nu_iu_i^T)^{-\frac{1}{2}}u_i\] so that \[\sum\limits_{i=1}^nu_iu_i^T=I_d\]

求矩阵的-1/2次方：前提矩阵非负定

\(u_iu_i^T\) 实对称可相似对角化 \((u_iu_i^T)^{-\frac12}=(P^{-1}\Lambda^{\frac12} P)^{-1}\)

A natural algorithm is to alternate between these two steps and hope that it will converge to a solution satisfying both conditions.

总结：

这里提出一种交替算法，进行一步equal norm变换后；再进行一次Parseval变换；交替进行，直到收敛，得到的结果便可以满足equal norm和Parseval两个条件。

First Idea

This is a special case of the alternating algorithm for operator scaling, which was analyzed in [Gurvits 04, Garg-Gurvits-Oliveira-Wigderson 16].

Our starting point is to bound the distance by the total movement in the alternating algorithm (assuming it converges):

研究出发点：

如果交替算法得到的结果收敛的话，研究变换过程。

向量的“移动”过程如下图

Operator Scaling

An operator is a collection of matrices \(U_1,...,U_k\in R^{m\times n}\) [Gurvits 04]

Given \(U_1,...,U_k\in R^{m\times n}\),we would like to find \(L \in R^{m\times m}\),and \(R \in R^{n\times n}\quad\)such that if we define \(V_i = LU_iR\quad for\quad 1\leq k\leq n\quad\)then \[\sum\limits_{i=1}^kV_iV_i^T=cnI_m\quad and\quad\sum\limits_{i=1}^kV_i^TV_i=cmI_n\]

for some constant c.

We say an operator satisfying the two conditions doubly balanced.

when c = 1/n called doubly stochastic

总结：

operator scaling 需要找到满足上面条件的L 和R.

那么，该如何找呢？

Applications

Matrix Scaling; Frame Scaling; PSD Scaling; Operator Scaling

Alternating Algorithm

Repeat the following tow steps [Gurivits 04]

To satisfy the condition \(\sum\limits_{i=1}^kU_iU_i^T=I_m\) , we set

\[ U_i\leftarrow(\sum\limits_{j=1}^kU_jU_j^T)^{-\frac12}U_i \]

To satisfy the condition \(\sum\limits_{i=1}^kU_i^TU_i=I_n\) , we set

\[ U_i\leftarrow U_i(\sum\limits_{j=1}^kU_j^TU_j)^{-\frac12} \]

A natural algorithm is to alternate between these two steps and hope that it will converge to a solution satisfying both conditions.

总结：

通过交替算法，我们最后可以使得一个组矩阵 U_i最后收敛，满足上面两种conditions，这样，我们就得到了operator scaling 中的V_i.

这将operator scaling 和Paseval problem 联系了起来。

Reduction

A simple reduction from frame scaling to operator scaling:

\[u_i\in R^d\to U_i\equiv\begin{pmatrix} |&|&|\\0&u_i&0\\|&|&| \end{pmatrix}\in R^{d\times n}\] one column being import, other columns are zero.

The condition \(\sum\limits_{i=1}^nU_iU_i^T=I_d\) is the Parseval condition \(\sum\limits_{i=1}^nv_iv_i^T=I_d\)
The condition \(\sum\limits_{i=1}^nU_i^TU_i=\frac mnI_n\) is the equal norm condition

\[ \begin{pmatrix}\|u_1\|_2^2&...&0\\ \vdots&\ddots&\vdots\\0&...&\|u_n\|_2^2\end{pmatrix}=\frac mnI_n. \]

So we focus on this more general setting in this part of the talk.

总结：

问题归约技术 Paulsen problem --> operator scaling --> matrix sxaling

通过把向量构造成矩阵之后，我们发现，寻找equal norm Paseval frame 和 operator scaling 是同一个问题。

The Operator Paulsen Problem

What is the best function \(g(m,n,k,\varepsilon)\) s.t. for any \(U_1,...,U_k\in R^{m\times n}\) with

\[ (1-\varepsilon)I_m\preceq\sum\limits_{i=1}^kU_iU_i^T\preceq(1+\varepsilon)I_m,(1-\varepsilon)\frac mnI_n\preceq\sum\limits_{i=1}^kU_i^TU_i\preceq(1+\varepsilon)\frac mnI_n \]

there exits\(V_1,...,V_k\in R^{m\times n}\) with

\[\sum\limits_{i=1}^kV_iV_i^T=I_m\] and \(\sum\limits_{i=1}^kV_i^TV_i=\frac mnI_n\)

doubly stochastic

such that

\[ \sum\limits_{i=1}^k\|U_i-V_i\|_F^2\le g(m,n,k,\varepsilon)? \] show \(\leq m^2n\varepsilon\) \[ U^{(t+2)}_i=U^{(t+1)}_i(R^{(t+1)})^{-\frac12}=(L^{(t)})^{-\frac12}U^{(t)}_i(R^{(t+1)})^{-\frac12} \]

总结：

把doubly balanced operator和equal norm Parseval frame 联系起来之后，我们可以把Parseval problem换一种表述方式。

Issues in First Idea

There are examples which do not converge:

\[ \begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}0\\1\end{pmatrix}\Leftrightarrow\begin{pmatrix}\sqrt2/2\\0\end{pmatrix},\begin{pmatrix}\sqrt2/2\\0\end{pmatrix},\begin{pmatrix}0\\1\end{pmatrix} \] Even if it converges, the path could zig-zag a lot and the total movement is much larger than the distance.

思考：

上述例子称为：not scalable

zig-zag和larger distance意味着vectors的变化肯定不会是slightly.

Continuous Operator Scaling

Dynamical System:Do both steps simultaneously and continuously. \[ \frac {d}{dt}U_i=(sI_m-m\sum\limits_{j=1}^kU_iU_j^T)U_i+U_i(sI_n-n\sum\limits_{j=1}^{k}U_j^TU_j) \] where \(s=\sum_{i=1}^{k}\|U_i\|_F^2\) ⁴ is the size of the operator.

\(s=tr(\sum\limits_{i=1}^kU_iU_i^T)=tr(\sum\limits_{i=1}^kU_i^TU_i)\)

Discrete: \(U_i\leftarrow(\sum\limits_{j=1}^kU_jU_j^T)^{-1/2}U_i\)

\(\frac ms\sum\limits_{j=1}^kU_jU_j^T\approx I\)

Continuous: \(U_i\leftarrow(\frac ms\sum\limits_{j=1}^kU_jU_j^T)^{-\frac12dt}U_i\)

利用\((I-X)^{-dt}\approx I+Xdt\) ⁵得到 \[ U_i\leftarrow(\sum\limits_{j=1}^kU_jU_j^T)^{-dt}U_i\implies\frac {d}{dt}U_i=(I_m-\sum\limits_{j=1}^kU_jU_j^T)U_i \]

加上上标t来代表U_i随时间t的变化。

Total Movement

We again bound the final distance by the total movement (the path length).

\[ (\sum\limits_{i=1}^k\|U_i^{(\infty)}-U_i^{(0)}\|_F^2)^{\frac12}=(\sum\limits_{i=1}^k\|\int_0^{\infty}\frac d{dt}U_i^{(t)}dt\|_F^2)^{\frac12}\le\sum\limits_{i=1}^k\int_0^{\infty}(\|\frac d{dt}U_i^{(t)}\|_F^2)^{\frac12}dt \] 通过 triangle inequality⁶ 得到不等式

先求积分再求范数小于等于先求范数再求积分

Error Measure

[Gurvits 04] \[ \Delta=\frac1m\|sI_m-m\sum\limits_{j=1}^kU_jU_j^T\|_F^2+\frac1n\|sI_n-n\sum\limits_{j=1}^kU_j^TU_j\|_F^2 \]

Intuition: In the Paulsen problem, \[ (1-\varepsilon)I_m\preceq\sum\limits_{i=1}^kU_iU_i^T\preceq(1+\varepsilon)I_m\quad\Leftrightarrow\quad \max\limits_i|\lambda_i-1|\le\varepsilon \]

\(\varepsilon\quad is\quad L_\infty\) bound on eigenvalues

从矩阵特征值的角度来看：

\(\lambda\)介于\(1-\varepsilon\quad and\quad 1+\varepsilon\) 之间

\(\varepsilon\geq \max\) 属于 L_infinity norm

\[ \|I_m-\sum\limits_{j=1}^kU_jU_j^T\|_F^2\le\Delta\quad \Leftrightarrow\quad\sum\limits_{i=1}^m(\lambda_i-1)^2\le\Delta \]

\(\Delta\quad is\quad L_2\) bound on eigenvalues

在衡量误差上，我不使用Paulson problem的\(\varepsilon\) 而使用\(\Delta\) :

\(\Delta\) 属于L₂ norm.

\(\|A\|_F^2=tr(A^*A)\) 其中A^*^ 是A的共轭转置矩阵。

\(tr(A) = \lambda_1+\lambda_2+...\lambda_n=\sum\limits_{i=1}^n\lambda_i\)

Easier to work with 2-norm.

\(\Delta\) is zero if and only if the operator is doubly balanced.
Can show that \(\Delta\le m^2\varepsilon^2\).
Focus on proving the total movement is \(\le mn\sqrt{\Delta}\le m^2n\varepsilon.\)

The dynamical system is moving in the direcrtion that minimizes \(\Delta\).

Gradient flow.

dynamical system ：

目的是为了得到doubly balanced，我们要使得Delta为0，所以

类似一个gradient decent 的过程（个人理解），每次移动的方向是使得Delta变小的方向。

Convergence

We find some nice identities to analyze the convergence.

Lemma 1. \(\frac d{dt}s^{(t)}=-2\Delta^{(t)}.\)

s随时间t的变化率是负的，说明s随时间增长而不断减少。

Lemma 2. \(\frac d{dt}\Delta^{(t)}=-4\sum\limits_{i=1}^k\|\frac d{dt}U_i^{(t)}\|_F^2\)

Delta随时间的变化率也是负的，说明Delta也随时间增长而降低。

并且local movement more，Delta decreasing more.

Claim. The dynamical system converges to a doubly balanced operator.

Remark. For an operator that is not scalable, its size will shrink to zero.

当Delta降低到0时，s不会再减少，即the size of operator不会再变化，这时我们得到的operator满足 doubly balanced.

lemma 3. \(cap(u^{t})\) is unchanged over time

lemma 4. \(cap(u^{t})\) lower bound upper bound: \[ s^{(T)}\geq cap(u^{T})=cap(u^{(t)})\ge s^{(t)}-mn\sqrt{\Delta^{(t)}} \]

Local Movement

\[ dist(u^{\infin},u^{0})=(\sum\limits_{i=1}^k\|U_i^{(\infty)}-U_i^{(0)}\|_F^2)^{\frac12}\le\int_0^{\infty}(\sum\limits_{i=1}^k\|\frac d{dt}U_i^{(t)}\|_F^2)^{\frac12}dt\\=\int_0^\infty\sqrt{-\frac d{dt}\Delta^{(t)}}dt \]

Half time. The first time T so that \(\Delta^{(T)}=\Delta^{(0)}/2.\)Then use geometric sum. \[ \sum\limits_{i=1}^k\|U_i^{(\infty)}-U_i^{(0)}\|_F^2\le(\int_0^\infty\sqrt{-\frac d{dt}\Delta^{(t)}}dt)^2\le4(\int_0^T\sqrt{-\frac d{dt}\Delta^{(t)}}dt)^2 \]

use Cauchy Schwarz inequality

\[ (\int_0^T\sqrt{-\frac d{dt}\Delta^{(t)}}dt)^2\le(\int_0^T-\frac d{dt}\Delta^{(t)}dt)(\int_0^T1dt)\le T\Delta^{(0)} \]

So it remains to bound the half time. ### Part III: Smoothed analysis

Proof outline,capacity lower bound

Capacity and Total Movement

Remark: Smoothed analysis only works in the frame setting, not (yet) in the operator setting.

In Part II, we proved \(f(d,n)\le dn\sqrt{\Delta}.\)

In Part III, we proved that \(f(d,n,\Delta)\le d^c\sqrt{\Delta}\) in "perturbed" instances

Intuition: operators with small capacity are rare.

Idea: perturb an operator,and apply the dynamical system

Plan

movement in dynamical system\(\le f(d,n,\Delta\tilde{(U)} )\)

1.Upper bound the perturbation movement

2.Error won't increase too much

3.Improved capacity in perturbed instances

Perturbation Process

Reduction to Matrix Capacity

New Method in Capacity Lower Bound

capacity lower bound ——> convergence of \(\Delta\)

\[ -\frac d{dt}\Delta\ge \mu\Delta \Rightarrow cap\ge s-\frac{\Delta}{\mu} \]

Pseudorandom Propert

\[ -\frac d{dt}\Delta\ge{\sigma}^2n\Delta\Rightarrow cap\geq s-\frac{\Delta}{\sigma^2n} \]

The proof is based on a combinatorial argument

把sigma设置成n分之XXX就可以把n去掉了

Big Picture

matrix capacity lower bound

frame capacity lower bound ---> total movement bound

matrix convergence of Delta ---> capacity lower bound

frame capacity lower bound ---> total movement bound

通过equal nrom 和Parseval 的条件证明，即可得↩︎
d与n的最大公约数为1↩︎
rescale: 向量单位化：\(\frac{u_i}{|u_i|}\)↩︎
矩阵Frobenius范数²: 对应元素平方和↩︎
泰勒展开 \((1+x)^{\alpha}=1+\alpha x+...+\frac{\complement_{\alpha}^n}{n!}x^n\)↩︎
三角不等式↩︎