Notation for Machine Learning - BAAI


The field of machine learning has been evolving rapidly in recent years, and communication between researchers and research groups has become increasingly important. A key obstacle to that communication is inconsistent notation across papers. This proposal suggests a standard for commonly used mathematical notation in machine learning. This first version covers only part of the notation in common use; more will be added over time, and the proposal will be updated regularly as the field progresses.

You can use this notation by downloading the LaTeX macro package MLMath.sty, which is maintained alongside this proposal; see GitHub for more information.
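As a quick illustration (a sketch only: the macro names \vx, \fX, \fY, \fH, and \vtheta come from the "simplified" column of the table below, and the surrounding usage is assumed rather than quoted from the package documentation), a document using MLMath.sty might look like this:

```latex
\documentclass{article}
\usepackage{MLMath}  % assumed to provide \vx, \vy, \fX, \fY, \fH, \vtheta, ...

\begin{document}
% Hypothesis function and training sample written with the simplified macros
Let $f_{\vtheta}\colon \fX \to \fY$ be a hypothesis in $\fH$, trained on the
sample $S=\{(\vx_i,\vy_i)\}_{i=1}^{n}$ by minimizing the empirical risk
$L_S(\vtheta)$.
\end{document}
```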

Notation Table

See the full Guide for more

| Symbol | Meaning | LaTeX | Simplified |
| --- | --- | --- | --- |
| x | input | `\bm{x}` | `\vx` |
| y | output, label | `\bm{y}` | `\vy` |
| d | input dimension | `d` | |
| d_o | output dimension | `d_{\rm o}` | `d_{\rm o}` |
| n | number of samples | `n` | |
| X | instances domain (a set) | `\mathcal{X}` | `\fX` |
| Y | labels domain (a set) | `\mathcal{Y}` | `\fY` |
| Z = X × Y | example domain | `\mathcal{Z}` | `\fZ` |
| H | hypothesis space (a set) | `\mathcal{H}` | `\fH` |
| θ | a set of parameters | `\bm{\theta}` | `\vtheta` |
| f_θ : X → Y | hypothesis function | `f_{\bm{\theta}}` | `f_{\vtheta}` |
| f or f* : X → Y | target function | `f`, `f^*` | |
| ℓ : H × Z → R_+ | loss function | `\ell` | |
| D | distribution of Z | `\mathcal{D}` | `\fD` |
| S = {z_i}_{i=1}^n = {(x_i, y_i)}_{i=1}^n | sample set | | |
| L_S(θ), L_n(θ), R_n(θ), R_S(θ) | empirical risk or training loss | | |
| L_D(θ), R_D(θ) | population risk or expected loss | | |
| σ : R → R_+ | activation function | `\sigma` | |
| w_j | input weight | `\bm{w}_j` | `\vw_j` |
| a_j | output weight | `a_j` | |
| b_j | bias term | `b_j` | |
| f_θ(x) or f(x; θ) | neural network | `f_{\bm{\theta}}` | `f_{\vtheta}` |
| ∑_{j=1}^{m} a_j σ(w_j · x + b_j) | two-layer neural network | | |
| VCdim(H) | VC-dimension of H | | |
| Rad(H ∘ S), Rad_S(H) | Rademacher complexity of H on S | | |
| Rad_n(H) | Rademacher complexity over samples of size n | | |
| GD | gradient descent | | |
| SGD | stochastic gradient descent | | |
| B | a batch set | `B` | |
| \|B\| | batch size | `b` | |
| η | learning rate | `\eta` | |
| k | discretized frequency | `\bm{k}` | `\vk` |
| ξ | continuous frequency | `\bm{\xi}` | `\vxi` |
| ∗ | convolution operation | `*` | |
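To show how these entries fit together, here is a brief sketch using the standard definitions these symbols conventionally denote (the formulas below are illustrative and not quoted from the guide):

```latex
% Empirical risk (training loss) on the sample S, and population risk over D
L_S(\bm{\theta}) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f_{\bm{\theta}},\bm{z}_i\bigr),
\qquad
L_{\mathcal{D}}(\bm{\theta}) = \mathbb{E}_{\bm{z}\sim\mathcal{D}}\,\ell\bigl(f_{\bm{\theta}},\bm{z}\bigr)

% Two-layer neural network with activation \sigma
f(\bm{x};\bm{\theta}) = \sum_{j=1}^{m} a_j\,\sigma(\bm{w}_j\cdot\bm{x}+b_j)

% One SGD step on a batch B with learning rate \eta
\bm{\theta} \leftarrow \bm{\theta}
  - \eta\,\frac{1}{|B|}\sum_{i\in B}\nabla_{\bm{\theta}}\,\ell\bigl(f_{\bm{\theta}},\bm{z}_i\bigr)
```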
