Compare commits
6 Commits

Author | SHA1 | Date
---|---|---
 | 9e8887eb4f |
 | 6caed7bac6 |
 | fc147b6ef9 |
 | 48cac1c9d2 |
 | 8627ece41a |
 | 5502e7d20d |
algo.sty (new file, 11 lines)
@@ -0,0 +1,11 @@
\def\begin@lg{\begin{minipage}{1in}\begin{tabbing}
\quad\=\qquad\=\qquad\=\qquad\=\qquad\=\qquad\=\qquad\=\kill}
\def\end@lg{\end{tabbing}\end{minipage}}

\newenvironment{algorithm}
{\begin{tabular}{|l|}\hline\begin@lg}
{\end@lg\\\hline\end{tabular}}

\newenvironment{algo}
{\begin{center}\begin{algorithm}}
{\end{algorithm}\end{center}}
@@ -1,5 +1,7 @@
\documentclass[12pt]{article}
\usepackage{chao}
\usepackage{algo}
\usepackage[normalem]{ulem}

\title{Outlier Embedding Notes}

@@ -16,20 +18,43 @@ For any metric space $(X,d)$ on $n$ points, one has
For $\ell_2$ the lower bound is still $\Omega(\log n)$
\footnote{\url{https://web.stanford.edu/class/cs369m/cs369mlecture1.pdf}}.

Recall that we want to find an $(O(k),(1+\e)c)$-outlier embedding into $\ell_2$ for any metric space $(X,d)$ which admits a $(k,c)$-outlier embedding into $\ell_2$. If we can do this deterministically, we actually find an embedding of the outlier points into $\ell_2$ with distortion $O(k)$, which contradicts the lower bound. However, maybe we can do $O(k)$ via embedding into some distribution of $\ell_2$ metrics.
Recall that we want to find an $(O(k),(1+\e)c)$-outlier embedding into $\ell_2$ for any metric space $(X,d)$ which admits a $(k,c)$-outlier embedding into $\ell_2$. If we can do this deterministically, we actually find an embedding of the outlier points into $\ell_2$ \sout{with distortion $O(k)$, which contradicts the lower bound}. This is not true! The $\log k$ factor is required by the SDP, and only an expansion bound is needed. We do not have to bound the contraction part. However, maybe we can do $O(k)$ via embedding into some distribution of $\ell_2$ metrics.

Let $(X,d)$ be a finite metric space and let $\mathcal Y=\{ (Y_1,d_1),\ldots (Y_h,d_h) \}$ be a set of metric spaces. Let $\pi$ be a distribution of embeddings into $\mathcal Y$. The original metric space $(X,d)$ embeds into $\pi$ with distortion $D$ if there is an $r>0$ such that for all $x,y\in X$,
\begin{definition}[Expected distortion]
Let $(X,d)$ be the original metric space and let $\mathcal Y=\{ (Y_1,d_1),\ldots (Y_h,d_h) \}$ be a set of target spaces. Let $\pi$ be a distribution of embeddings into $\mathcal Y$. To be more precise, for each target space $(Y_i,d_i)$ we define an embedding $\alpha_i:X\to Y_i$ and define the probability of choosing this embedding to be $p_i$. The original metric space $(X,d)$ embeds into $\pi$ with distortion $D$ if there is an $r>0$ such that for all $x,y\in X$,
\[r\leq \frac{\E_{i\from \pi} [d_i(\alpha_i(x),\alpha_i(y))]}{d(x,y)}\leq Dr.\]
\end{definition}
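
To make the definition concrete, here is a small sketch (an illustrative assumption, not taken from any of the papers) that computes the best $D$ for a finite distribution of embeddings into $\ell_2$: each $\alpha_i$ is a Python dict from points to coordinate tuples, $d$ is a callable, and $r$ is taken to be the smallest ratio over all pairs.
\begin{verbatim}
import itertools
import math

def expected_distortion(points, d, embeddings):
    """embeddings: list of (p_i, alpha_i) pairs, where alpha_i maps each
    point to a coordinate tuple and the p_i sum to 1."""
    ratios = []
    for x, y in itertools.combinations(points, 2):
        # E_{i ~ pi}[ d_i(alpha_i(x), alpha_i(y)) ] for Euclidean targets
        expected = sum(p * math.dist(a[x], a[y]) for p, a in embeddings)
        ratios.append(expected / d(x, y))
    # r <= ratio <= D*r for every pair, so the best D is max(ratios)/min(ratios).
    return max(ratios) / min(ratios)
\end{verbatim}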

The SODA23 paper also embeds $(X,d)$ into a distribution. We call this kind of embedding a stochastic embedding.
Note that if we compute the minimum $D$ for each pair $x,y$ and take the average, the resulting value is called the average distortion.\footnote{\url{https://www.cs.huji.ac.il/w~ittaia/papers/ABN-STOC06.pdf}} There is an embedding into $\ell_p$ with constant average distortion for arbitrary metric spaces, while maintaining the same worst case bound provided by Bourgain's theorem.

The outlier paper (SODA23) also embeds $(X,d)$ into a distribution. We call this kind of embedding a stochastic embedding.

\begin{lemma}
Let $\pi$ be a stochastic embedding into $\ell_p$ with expected expansion bound $\E_{i\from \pi}\|\alpha_i(x)-\alpha_i(y)\|_p\leq c_{\E}d(x,y)$. Then there is a deterministic embedding into $\ell_p$ with the same expansion bound.
\end{lemma}
\begin{proof}
We define a new averaged embedding $\alpha^*(x)=\sum_{i\from \pi} \alpha_i(x) p_i$. Consider the expansion bound for $\alpha^*$.
\begin{equation*}
\begin{aligned}
\| \alpha^*(x)- \alpha^*(y) \|_p & = \left\| \sum_{i\from \pi} p_i ( \alpha_i(x) - \alpha_i(y) ) \right\|_p\\
&\leq \sum_{i\from \pi} \| p_i ( \alpha_i(x) - \alpha_i(y) ) \|_p && \text{(triangle inequality)}\\
&= \sum_{i\from \pi}p_i \| ( \alpha_i(x) - \alpha_i(y) ) \|_p\\
&\leq c_{\E} d(x,y) && \text{(expected expansion bound)}
\end{aligned}
\end{equation*}
\end{proof}
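
In code, the averaging step of the proof is just a convex combination of coordinate vectors. The sketch below (same dict-based representation as assumed earlier, not part of the notes) builds $\alpha^*$ and measures its worst-case expansion, which by the lemma is at most $c_{\E}$.
\begin{verbatim}
import itertools
import math

def averaged_embedding(points, embeddings):
    """embeddings: list of (p_i, alpha_i); returns alpha*(x) = sum_i p_i*alpha_i(x)."""
    alpha_star = {}
    for x in points:
        dim = len(embeddings[0][1][x])
        alpha_star[x] = tuple(
            sum(p * a[x][k] for p, a in embeddings) for k in range(dim)
        )
    return alpha_star

def expansion(points, d, f):
    """Worst-case expansion max_{x != y} ||f(x)-f(y)||_2 / d(x,y)."""
    return max(
        math.dist(f[x], f[y]) / d(x, y)
        for x, y in itertools.combinations(points, 2)
    )
\end{verbatim}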

Note that one cannot derive a contraction bound for $\alpha^*$ from the stochastic embedding, so the distortion may not be preserved.

\paragraph{Example: Random Trees}
Consider the problem of embedding some finite metric into a tree metric. We can get an $\Omega(n)$ lower bound via the unit edge length cycle $C_n$. However, if embedding into a distribution of tree metrics is allowed, we can do $O(\log n)$.

\begin{theorem}[Bartal]
Let $(X,d)$ be a metric space on $n$ points with diameter $\Delta$ and let $\mathcal D T$ be the set of tree metrics that dominate $d$. There is a distribution $\pi$ on $\mathcal D T$ such that $(X,d)$ embeds into $\pi$ with distortion $O(\log n)$.
Let $(X,d)$ be a metric space on $n$ points and let $\mathcal D T$ be the set of tree metrics that dominate $d$. There is a distribution $\pi$ on $\mathcal D T$ such that $(X,d)$ embeds into $\pi$ with distortion $O(\log n)$.
\end{theorem}

Is there any other known result on expected distortion of embeddings besides Bartal's theorem?

% A kind of embedding problem closely related to outlier embeddings is Ramsey-type embedding. Let $(X,d_X)$ be the original metric space and let $(Y,d_Y)$ be the target space. Given a fixed distortion $c$, a Ramsey-type embedding asks for the largest subset $Z$ of $X$ such that $(Z,d_X)$ embeds into $(Y,d_Y)$ with distortion at most $c$. This is the same as computing the smallest outlier set.

\section{Stochastic Embedding into \texorpdfstring{$\ell_2$}{l2}}
@@ -39,11 +64,13 @@ We first ignore the outlier condition and see if stochastic embeddings break the
For any metric space $(X,d)$ and for any $p$, there is an embedding of $(X,d)$ into $\ell_p^{O(\log^2 n)}$ with distortion $O(\log n)$.
\end{theorem}

Bourgain develops an algorithm that finds a desired embedding with probability at least $1/2$.\footnote{\url{https://home.ttic.edu/~harry/teaching/pdf/lecture3.pdf}} For the $\ell_2$ case, the embedding has the following bounds:
\begin{itemize}
\item[Expansion] $\|f(x)-f(y)\|_2\leq O(\log n) d(x,y)$
\item[Contraction] $\|f(x)-f(y)\|_2 \geq \frac{d(x,y)}{O(1)}$
\end{itemize}
Bourgain develops a randomized algorithm that finds a desired embedding.\footnote{The expansion bound always holds. The contraction bound holds with probability at least $1/2$. See \url{https://home.ttic.edu/~harry/teaching/pdf/lecture3.pdf}}
Can we get better expected distortion by repeating the algorithm and uniformly selecting an embedding?
For the $\ell_2$ case, the embedding has the following bounds:
\begin{enumerate}
\item Expansion. $\|f(x)-f(y)\|_2\leq O(\log n) d(x,y)$
\item Contraction. $\|f(x)-f(y)\|_2 \geq \frac{d(x,y)}{O(1)}$
\end{enumerate}
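
With the same dict-based representation assumed earlier, the two bounds above can be checked for a concrete map $f$; this is only a sketch for experiments, not a claim about the construction itself.
\begin{verbatim}
import itertools
import math

def contraction(points, d, f):
    """Worst-case contraction max_{x != y} d(x,y) / ||f(x)-f(y)||_2."""
    return max(
        d(x, y) / math.dist(f[x], f[y])
        for x, y in itertools.combinations(points, 2)
    )

def distortion(points, d, f):
    """Distortion is worst-case expansion times worst-case contraction."""
    exp = max(
        math.dist(f[x], f[y]) / d(x, y)
        for x, y in itertools.combinations(points, 2)
    )
    return exp * contraction(points, d, f)
\end{verbatim}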

The contraction bound is almost tight. Let $K$ be the dimension of the target space. For the expansion bound, we have

@@ -56,10 +83,39 @@ The contraction bound is almost tight. Let $K$ be the dimension of the target sp
\end{aligned}
\end{equation*}

One thing we can try is to tighten the second line.
Recall that for each dimension $i$ a random subset $S_i\subset X$ is selected and the value of $f_i(x)$ is $\min_{s\in S_i} d(x,s)$.
We want to show that for any fixed $x,y\in X$ and any dimension $i$, the event that the distance $|f_i(x)-f_i(y)|^2$ is much smaller than $d(x,y)^2$ happens with high probability.
One thing we can try is to tighten the second line.

Now construct a subset $S_i$ by sampling each node in $X$ independently with probability $2^{-i}$.
\begin{algo}
\underline{Bourgain's construction}:\\
$m=576\log n$\\
for $j=1$ to $\log n$:\\
\quad for $i=1$ to $m$:\\
\quad \quad choose set $S_{ij}$ by sampling each node in $X$ independently with probability $2^{-j}$\\
\quad \quad $f_{ij}(x)=\min_{s\in S_{ij}} d(x,s)$\\
$f(x)=\bigoplus_{j=1}^{\log n} \bigoplus_{i=1}^m f_{ij}(x)$ for all $x\in X$.
\end{algo}
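
A direct transcription of this pseudocode into Python might look like the sketch below. Assumptions not stated in the pseudocode: the metric is a dict-of-dicts \texttt{d[x][y]}, logarithms are base $2$, and an empty $S_{ij}$ contributes coordinate $0$.
\begin{verbatim}
import math
import random

def bourgain_embedding(points, d, seed=0):
    """Concatenate the coordinates f_ij(x) = d(x, S_ij) as in the algo block."""
    rng = random.Random(seed)
    n = len(points)
    m = math.ceil(576 * math.log2(n))          # m = 576 log n
    coords = {x: [] for x in points}
    for j in range(1, math.ceil(math.log2(n)) + 1):
        for _ in range(m):                      # i = 1 .. m
            # S_ij: keep each node independently with probability 2^{-j}
            S = [v for v in points if rng.random() < 2 ** (-j)]
            for x in points:
                coords[x].append(min((d[x][v] for v in S), default=0.0))
    return coords
\end{verbatim}
Running this with several seeds and putting uniform probability on the resulting maps gives exactly the kind of stochastic embedding asked about above.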

% Recall that for each dimension $i$ a random subset $S_i\subset X$ is selected and the value of $f_i(x)$ is $\min_{s\in S_i} d(x,s)$.
We want to show that for any fixed $x,y\in X$ and $j$,
\[
\Pr[|f_{ij}(x)-f_{ij}(y)|\leq \frac{d(x,y)}{\polylog n}]\geq ???
\]

One can see that our desired event does not happen with high probability for every pair $x,y$. Let the original metric space be a line metric with $n$ points: $x$ and $y$ lie at the two endpoints of an interval and the remaining $n-2$ points lie at the midpoint of $xy$. Then the distance in the target space, $|f_{ij}(x)-f_{ij}(y)|$, is a $\polylog n$ factor smaller than $d(x,y)$ if and only if both $x$ and $y$ are selected in $S_{ij}$, which happens with probability $4^{-j}$. This example shows that Bourgain's construction is tight up to a constant factor for some metric spaces.

\section{Grid}

Recall that we need an algorithm that outputs an embedding which extends a $(k,c)$-outlier embedding into $\ell_2$, and we want the extended embedding to have a good (expected) expansion bound.

\begin{conjecture}\label{conj:expansion}
Let $(X,d)$ be a metric space such that $|X|=n$ and let $\alpha: X\setminus K \to \R^d$ be a $(k,c)$-outlier embedding of $(X,d)$ into $\ell_2^{d}$, where $K\subset X$ is the outlier set. Then there exists an embedding $\beta: X\to \R^d$ such that $\beta$ completes $\alpha$ and has expansion bound
\[
\max_{x,y\in X} \frac{\norm{\beta(x)-\beta(y)}_2}{d(x,y)}\leq O(c\sqrt{\log k}).
\]
\end{conjecture}

In their bi-criteria approximation the dimension $d$ is not important and is therefore treated as a fixed parameter. \autoref{conj:expansion} provides more tools than Theorem 2.6: we know the coordinates of the non-outlier points in the embedding $\beta$, and we can use coordinates in $\R^d$ instead of simply mapping non-outlier points to outliers.

A common and powerful method is to use a grid: divide $\R^d$ into identical hypercubes of some side length $s$ and work with grid cells instead of points. However, this method often involves the dimension $d$, which is not desirable...
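
For concreteness, the grid can be represented by hashing each embedded point to the integer vector of its cell; the sketch below is only an assumed illustration (function names and the flooring convention are not from any paper). Two points in the same cell are within $s\sqrt{d}$ of each other, which is exactly where the dependence on $d$ enters.
\begin{verbatim}
import math
from collections import defaultdict

def cell(point, s):
    """Index of the grid cell (hypercube of side length s) containing point."""
    return tuple(math.floor(c / s) for c in point)

def bucket_by_cell(embedded_points, s):
    """Group images beta(x) in R^d by grid cell; embedded_points: {x: coords}."""
    buckets = defaultdict(list)
    for x, coords in embedded_points.items():
        buckets[cell(coords, s)].append(x)
    return buckets
\end{verbatim}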
\end{document}