Errata

Please email us if you find any errors in the book. We will list known errata on this page.

Chapter 1: Data Mining and Analysis

• p4, Section 1.3, line 13: "as linear combination" should be "as a linear combination"
• p9, Example 1.3, 3rd line from end: "$$(153)^{1/3}$$" should be "$$(152)^{1/3}$$"
• p9, Example 1.3, last line: "$$(4^3 + (-1)^3)^{1/3} = (63)^{1/3} = 3.98$$" should be "$$(4^3 + |-1|^3)^{1/3} = (65)^{1/3} = 4.02$$"
• p24, Section 1.4.3, last line of subsection Univariate Sample: "where $$f_\mathbf{X}$$ is the probability mass or density function for $$\mathbf{X}$$" should be "where $$f_X$$ is the probability mass or density function for $$X$$"
• p30, Section 1.7, Q1: "in (1.5)" should be "in Eq. (1.5)"
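The corrected value in Example 1.3 can be checked numerically. The sketch below assumes the example computes the Minkowski $$L_3$$ distance for a difference vector with components $$4$$ and $$-1$$, as the corrected formula indicates. The helper name `minkowski` is our own and does not come from the book.

```python
def minkowski(diff, p):
    """L_p norm of a difference vector: (sum_i |d_i|^p)^(1/p)."""
    # The absolute value matters for odd p: (-1)**3 == -1, but abs(-1)**3 == 1.
    return sum(abs(d) ** p for d in diff) ** (1.0 / p)


diff = [4, -1]
print(round(minkowski(diff, 3), 2))               # 4.02, i.e. 65**(1/3)
print(round((4 ** 3 + (-1) ** 3) ** (1 / 3), 2))  # 3.98, the value printed in error
```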

Chapter 2: Numeric Attributes

• p34, Equation (2.2): "$$\hat{F}(x) \ge q$$" should be "$$F(x) \ge q$$"
• p34, Line after Equation (2.2):
"That is, the inverse CDF gives the least value of $$X$$, for which $$q$$ fraction of the values are higher, and $$1 − q$$ fraction of the values are lower."
should be
"That is, the inverse CDF gives the least value of $$X$$, for which $$q$$ fraction of the values are lower, and $$1 − q$$ fraction of the values are higher."
• p53, Example 2.6, line 1: "... range for $${\tt Income}$$ is $$2700-300=2400$$" should be "... range for $${\tt Income}$$ is $$6000-300=5700$$"
• p55, In Eq (2.32): "$$P(-k \le z \le k) = P\bigl(0 \le t \le k/\sqrt{2}\bigr)$$" should be "$$P(-k \le z \le k) = 2 \cdot P\bigl(0 \le t \le k/\sqrt{2}\bigr)$$"
• p58, Total and Generalized Variance, Line 2: "...product of its eigenvectors" should be "...product of its eigenvalues"
• p58, two lines above Example 2.8: "$$tr(\Lambda)$$" should be "$$tr(\mathbf{\Lambda})$$"
• p61, Q3: "$$mu$$" should be "$$\mu$$" so that it reads

$$\sum_{i=1}^n (x_i - \mu)^2 = n(\hat{\mu} - \mu)^2 + \sum_{i=1}^n (x_i - \hat{\mu})^2$$
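Several of the Chapter 2 corrections can be spot-checked numerically. The following is a minimal sketch that uses only the Python standard library. It assumes that $$t$$ in Eq. (2.32) is the rescaled variable $$z/\sqrt{2}$$ with density $$e^{-t^2}/\sqrt{\pi}$$, which is the usual route to the error function. The helper names and sample numbers are illustrative.

```python
import math
from statistics import NormalDist


def inverse_cdf(data, q):
    """Eq. (2.2): least x such that the empirical CDF F(x) >= q, for 0 < q <= 1."""
    xs = sorted(data)
    n = len(xs)
    for i, x in enumerate(xs, start=1):
        if i / n >= q:  # at least a fraction q of the values are <= x
            return x
    return xs[-1]


def half_prob(a, steps=10_000):
    """P(0 <= t <= a) for the density exp(-t^2)/sqrt(pi), by the trapezoid rule."""
    def f(t):
        return math.exp(-t * t) / math.sqrt(math.pi)
    h = a / steps
    inner = sum(f(i * h) for i in range(1, steps))
    return h * (0.5 * (f(0.0) + f(a)) + inner)


def eig2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b * b)
    return mid + rad, mid - rad


# Eq. (2.32): the two sides agree only with the factor 2.
k = 2.0
p_z = NormalDist().cdf(k) - NormalDist().cdf(-k)
print(round(p_z, 4), round(2 * half_prob(k / math.sqrt(2)), 4))  # 0.9545 0.9545

# p58: total variance = sum of eigenvalues, generalized variance = product of eigenvalues.
a, b, c = 4.0, 1.5, 2.0  # an illustrative 2x2 covariance matrix
l1, l2 = eig2(a, b, c)
print(math.isclose(l1 + l2, a + c), math.isclose(l1 * l2, a * c - b * b))  # True True

# p61, Q3: the decomposition holds for any fixed mu.
xs, mu = [2.0, 3.5, 5.0, 7.5], 4.0
mu_hat = sum(xs) / len(xs)
lhs = sum((x - mu) ** 2 for x in xs)
rhs = len(xs) * (mu_hat - mu) ** 2 + sum((x - mu_hat) ** 2 for x in xs)
print(math.isclose(lhs, rhs))  # True
```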

Chapter 3: Categorical Attributes

• p81, Table 3.6, Attribute value for $$X_2$$: "$${\tt Short} ( a_{23})$$" should be "$${\tt Long} ( a_{23})$$"

Chapter 4: Graph Data

• p103, 2 lines above Eq (4.3): "$$\gamma_{jk} = 0$$" should be "$$\gamma_{jk}(v_i) = 0$$"
• p103, Eq (4.3): "$$\gamma_{jk}$$" should be "$$\gamma_{jk}(v_i)$$"
• p103, Example 4.5, last line: "$$\gamma_{jk} > 0$$" should be "$$\gamma_{jk}(v_5) > 0$$"
• p104, Example 4.5:
$$c(v_5) = \gamma_{18} + \gamma_{24} + \gamma_{27} + \gamma_{28} + \gamma_{38} + \gamma_{46} + \gamma_{48} + \gamma_{67} + \gamma_{68}$$
should be
$$c(v_5) = \gamma_{18}(v_5) + \gamma_{24}(v_5) + \gamma_{27}(v_5) + \gamma_{28}(v_5) + \gamma_{38}(v_5) + \gamma_{46}(v_5) + \gamma_{48}(v_5) + \gamma_{67}(v_5) + \gamma_{68}(v_5)$$
• p107: $$\mathbf{p}_1 = \frac{1}{2} \pmatrix{1\\ 1\\ 2\\ 1\\ 2}$$ should be $$\mathbf{p}_1 = \frac{1}{2} \pmatrix{1\\ 2\\ 2\\ 1\\ 2}$$
• p127, 4th Line after Eq (4.22): "initial $$n_0$$ edges" should be "initial $$n_0$$ nodes"
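The corrected notation $$\gamma_{jk}(v_i)$$ makes clear that the fraction of shortest paths between $$v_j$$ and $$v_k$$ depends on the intermediate vertex $$v_i$$. The following is a minimal sketch of betweenness centrality for unweighted, undirected graphs. It assumes the usual definition $$\gamma_{jk}(v_i) = \eta_{jk}(v_i)/\eta_{jk}$$, that is, the number of shortest paths through $$v_i$$ divided by the total number of shortest paths. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, v):
    """c(v) = sum over pairs j < k (both != v) of gamma_jk(v)."""
    info = {u: bfs_counts(adj, u) for u in adj}
    total = 0.0
    for j, k in combinations([u for u in adj if u != v], 2):
        dj, sj = info[j]
        dk, sk = info[k]
        if k not in dj or v not in dj:
            continue  # j, k disconnected, or v unreachable from them
        if dj[v] + dk[v] == dj[k]:  # v lies on some shortest j-k path
            total += sj[v] * sk[v] / sj[k]  # gamma_jk(v)
    return total


# Hypothetical example: a 4-cycle v0 - v1 - v2 - v3 - v0.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness(cycle, 1))  # 0.5: v1 lies on one of the two shortest v0-v2 paths
```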

Chapter 5: Kernel Methods

• p138, Example 5.4:

$$\mathbf{\mu}_\phi = \sum_{i=1}^5 \phi(\mathbf{x}_i) = \sum_{i=1}^5 \mathbf{x}_i$$

should be

$$\mathbf{\mu}_\phi = \frac{1}{5}\sum_{i=1}^5 \phi(\mathbf{x}_i) = \frac{1}{5} \sum_{i=1}^5 \mathbf{x}_i$$

• p140, 7th Line after Eq (5.3): "$$\sum_{i=1}^{m_a} \sum_{j=1}^{m_a} \alpha_i \alpha_{\!j} K(\mathbf{x}_i, \mathbf{x})$$" should be "$$\sum_{i=1}^{m_a} \sum_{j=1}^{m_a} \alpha_i \alpha_{\!j} K(\mathbf{x}_i, \mathbf{x}_j)$$"
• p141, 3rd line and 10th Line before Sec 5.1.2: There is an extra left bracket in definition of $$\phi(\mathbf{x})$$, that is,
"$$\big( ( K(\mathbf{x}_1, \mathbf{x}), ...$$" should be "$$\big( K(\mathbf{x}_1, \mathbf{x}), ...$$"
• p144, 2nd line: "$$\int a(\mathbf{x})^2\; d\mathbf{x} < 0$$" should be "$$\int a(\mathbf{x})^2\; d\mathbf{x} < \infty$$"
• p144, last line: "$$\sum_{k=1}^q$$" should be "$$\sum_{k=0}^q$$"
• p156, Section 5.4.2: all occurrences of "path/paths" should be "walk/walks"
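Two of the Chapter 5 corrections are easy to check with a linear kernel $$\phi(\mathbf{x}) = \mathbf{x}$$ and a small adjacency matrix. This sketch assumes that Section 5.4.2 builds its graph kernels from powers of an adjacency-type matrix. The $$k$$-th power of such a matrix counts walks, which may revisit nodes, rather than paths. The data points and the three-node graph are made up for illustration.

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def mean_sq_norm_via_kernel(points, kernel):
    """||mu_phi||^2 = (1/n^2) sum_i sum_j K(x_i, x_j); the 1/n in mu_phi gives 1/n^2."""
    n = len(points)
    return sum(kernel(xi, xj) for xi in points for xj in points) / n ** 2


def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


X = [(1.0, 2.0), (3.0, 0.0), (2.0, 2.0), (0.0, 1.0), (4.0, 5.0)]
mu = [sum(col) / len(X) for col in zip(*X)]           # (1/5) * sum of the x_i
print(mu, mean_sq_norm_via_kernel(X, linear_kernel))  # [2.0, 2.0] 8.0

A = [[0, 1, 0],  # path graph v0 - v1 - v2
     [1, 0, 1],
     [0, 1, 0]]
A2 = mat_mul(A, A)
print(A2)  # [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
# A2[0][0] == 1 counts v0 -> v1 -> v0, a walk of length 2 that is not a path.
```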

Chapter 6: High-dimensional Data

• p164: In the definitions of the hyperball and hypersphere
"$$\mathbf{x} = (x_1, x_2, \ldots, x_d)$$" should be "$$\mathbf{x} = (x_1, x_2, \ldots, x_d)^T$$"
• p171: "$$\mathbf{0}_d = (0_1,0_2,\ldots,0_d)$$" should be "$$\mathbf{0}_d = (0_1,0_2,\ldots,0_d)^T$$"
• p172, Section 6.6, 1st Line after Eq. (6.11):
$$\mu$$ in equation "$$\mu=\mathbf{0}_d$$" should be in bold.
• p178, section "Volume in d dimensions":
"$$x_1 = r \cos\theta_1\cos\theta_2 \cos\theta_3 = r c_2 c_2 c_3$$" should be "$$x_1 = r \cos\theta_1\cos\theta_2 \cos\theta_3 = r c_1 c_2 c_3$$"
"$$x_3 = r \cos\theta_1\sin\theta_2 = r c_1 s_1$$" should be "$$x_3 = r \cos\theta_1\sin\theta_2 = r c_1 s_2$$"
• p178, Equation for $$J(\theta_1, \theta_2, \theta_3)$$, Entry in first row, fourth column: "$$r c_1 c_2 s_3$$" should be "$$-r c_1 c_2 s_3$$"
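The sign of the corrected Jacobian entry can be checked by computing $$|\det J|$$, which in four dimensions should equal $$r^3\cos^2\theta_1\cos\theta_2$$. The sketch assumes the coordinates $$x_2 = r c_1 c_2 s_3$$ and $$x_4 = r s_1$$. These are not quoted above, but they complete the pattern of the corrected $$x_1$$ and $$x_3$$. The helper names are hypothetical.

```python
import math


def jacobian_4d(r, t1, t2, t3):
    """Jacobian of (r, t1, t2, t3) -> (x1, x2, x3, x4), assuming
    x1 = r c1 c2 c3, x2 = r c1 c2 s3, x3 = r c1 s2, x4 = r s1."""
    c1, s1 = math.cos(t1), math.sin(t1)
    c2, s2 = math.cos(t2), math.sin(t2)
    c3, s3 = math.cos(t3), math.sin(t3)
    return [
        # d/dr         d/dt1              d/dt2              d/dt3
        [c1 * c2 * c3, -r * s1 * c2 * c3, -r * c1 * s2 * c3, -r * c1 * c2 * s3],  # x1 (corrected sign)
        [c1 * c2 * s3, -r * s1 * c2 * s3, -r * c1 * s2 * s3, r * c1 * c2 * c3],   # x2
        [c1 * s2, -r * s1 * s2, r * c1 * c2, 0.0],                                 # x3
        [s1, r * c1, 0.0, 0.0],                                                    # x4
    ]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(a[k][i]))
        if a[p][i] == 0.0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            result = -result  # a row swap flips the sign
        result *= a[i][i]
        for k in range(i + 1, n):
            f = a[k][i] / a[i][i]
            for j in range(i, n):
                a[k][j] -= f * a[i][j]
    return result


r, t1, t2, t3 = 2.0, 0.3, -0.7, 1.1
print(math.isclose(abs(det(jacobian_4d(r, t1, t2, t3))),
                   r ** 3 * math.cos(t1) ** 2 * math.cos(t2)))  # True
```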

Chapter 7: Dimensionality Reduction

• p186, line 1: "$$\mathbf{a}_r$$ is vector" should be "$$\mathbf{a}_r$$ is a vector"
• p207, line 3, Alg 7.2: "$$\eta_1, \eta_2, ..., \eta_d$$" should be "$$\eta_1, \eta_2, ..., \eta_n$$"

Chapter 8: Itemset Mining

• p235, Example 8.13, 2nd last line: "$$...,AB(3), AD(4),...$$" should be "$$..., AB(4), AD(3), ...$$"
• p236, 5th line: "$$...,AD(4),...$$" should be "$$..., AD(3),...$$"

Chapter 9: Summarizing Itemsets

• p250, 2nd line under Generalized Itemsets: "$$k$$-tidsets" should be "$$k$$ tidsets"
• p250, 4th line from bottom: "$$Z = Y \setminus X$$" should be "$$Z = X \setminus Y$$"
• p252, Eq. (9.3) and Eq. (9.4): "$$\bigl|X\setminus Y\bigr|$$" should be "$$\bigl|X\setminus W\bigr|$$" on the right-hand side in both equations, so that they read

$$\textbf{Upper Bounds} \bigl(\bigl|X\setminus Y\bigr| \text{ is odd} \bigr): sup(X) \leq \sum_{Y \subseteq W \subset X} -1^{\bigl(\bigl|X\setminus W\bigr|+1\bigr)} sup(W)$$

$$\textbf{Lower Bounds} \bigl(\bigl|X\setminus Y\bigr| \text{ is even}\bigr): sup(X) \geq \sum_{Y \subseteq W \subset X} -1^{\bigl(\bigl|X\setminus W\bigr|+1\bigr)} sup(W)$$

• p254, Section Nonderivable Itemsets, 1st Equation after line 1: "$$\bigl|X\setminus Y\bigr|$$" should be "$$\bigl|X\setminus W\bigr|$$" , so that it reads

$$\mathit{IE}(Y) = \sum_{Y \subseteq W \subset X}\, -1^{\bigl(\bigl|X\setminus W\bigr|+1\bigr)} \cdot sup(W)$$
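The corrected bounds can be verified on a small transaction database. This sketch reads $$-1^{(\cdot)}$$ as $$(-1)^{(\cdot)}$$. For every proper subset $$Y$$ of $$X$$, it checks that $$sup(X)$$ respects the upper bound when $$|X\setminus Y|$$ is odd and the lower bound when it is even. The database and helper names are made up for illustration.

```python
from itertools import combinations


def support(itemset, db):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def between(Y, X):
    """All W with Y <= W < X, i.e. Y a subset of W and W a proper subset of X."""
    extra = sorted(X - Y)
    for size in range(len(extra)):  # size < |X \ Y| keeps W proper
        for add in combinations(extra, size):
            yield Y | frozenset(add)


def ie_bound(X, Y, db):
    """IE(Y) = sum over Y <= W < X of (-1)^(|X \\ W| + 1) * sup(W)."""
    return sum((-1) ** (len(X - W) + 1) * support(W, db) for W in between(Y, X))


def check_bounds(X, db):
    """Assert Eqs. (9.3)/(9.4) for every proper subset Y of X; return sup(X)."""
    sx = support(X, db)
    for size in range(len(X)):
        for Y in map(frozenset, combinations(sorted(X), size)):
            bound = ie_bound(X, Y, db)
            if len(X - Y) % 2 == 1:
                assert sx <= bound  # upper bound
            else:
                assert sx >= bound  # lower bound
    return sx


db = [frozenset(t) for t in ["ABCD", "ACD", "AB", "BCD", "ABD", "CD"]]
X = frozenset("ABC")
print(check_bounds(X, db), ie_bound(X, frozenset(), db))  # 1 1
```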

Chapter 10: Sequence Mining

• p264, alg 10.2, line 9: "$$\mathbf{P}$$" should be "$$P_a$$"

Chapter 11: Graph Pattern Mining

• p288, sec 11.3, 2nd paragraph, line 6: "$$sup(C) = sup(t)$$" should be "$$sup(C') = sup(t)$$"
• p290, Figure 11.8: The last tuple in the DFS-code for graph $$C_{19}$$ should be "$$\langle 2, 0, a, a \rangle$$" and not "$$\langle 2, 0, a, b\rangle$$"
• p292, Algorithm 11.2, Line 14: "$$b=\langle u_r, v, L(u_r), L(v), L(u_r, v)\rangle$$" should be "$$b=\langle u_r, v, L(\phi(u_r)), L(\phi(v)), L(\phi(u_r),\phi(v))\rangle$$"
• p293, Figure 11.9 (c): There should be one more extension for $$\phi_5$$, namely $$\langle 0, 3, a, b\rangle$$
• p294, Algorithm 11.3, Line 12: "$$N_{G_j}$$" should be "$$N_{G}$$"
• p295, Algorithm 11.4, Line 0: "$$C$$" should be "$$C = \{t_1, t_2, ..., t_k\}$$"

Chapter 12: Pattern and Rule Assessment

• p322 (Alg 12.1) and p326 (Alg 12.2): replace "=" with "$$\gets$$"

Chapter 13: Representative-based Clustering

• p343, in 3rd equation: "$$P(C_i)$$" should be "$$P(C_1)$$"
• p335, Algorithm 13.1, line 7: "$$\mathbf{\mu}^t_i$$" should be "$$\mathbf{\mu}^{t-1}_i$$"
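The superscript fix in Algorithm 13.1 is easiest to see with explicit iteration indices. This K-means sketch assumes that line 7 is the cluster-assignment step. Because $$t$$ has already been incremented at the top of the loop, distances must be measured to the previous centroids $$\mathbf{\mu}^{t-1}_i$$, since $$\mathbf{\mu}^t_i$$ has not been computed yet. The function, its parameters, and the handling of empty clusters are illustrative choices and do not reproduce the book's pseudocode.

```python
import math


def kmeans(points, init, eps=1e-6, max_iter=100):
    """K-means with an explicit iteration index: history[t] holds mu^t."""
    history = [list(init)]  # mu^0: initial centroids
    t = 0
    while t < max_iter:
        t += 1
        prev = history[t - 1]  # mu^{t-1}; mu^t is not computed yet
        clusters = [[] for _ in prev]
        # Cluster assignment: nearest previous centroid (the corrected superscript).
        for x in points:
            j = min(range(len(prev)), key=lambda i: math.dist(x, prev[i]))
            clusters[j].append(x)
        # Centroid update: mu^t_i is the mean of cluster i.
        new = []
        for i, members in enumerate(clusters):
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(prev[i])  # keep the old centroid for an empty cluster
        history.append(new)
        if sum(math.dist(a, b) ** 2 for a, b in zip(new, prev)) <= eps:
            break
    return history[-1], clusters


centers, groups = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```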

Chapter 14: Hierarchical Clustering

• p366, Fig 14.2: "(a) $$m=1$$", "(b) $$m=2$$", and "(c) $$m=3$$" should be "(a) $$n=1$$", "(b) $$n=2$$", and "(c) $$n=3$$", respectively.
• p373, sec 14.4: "EXERCISES AND PROJECTS" should be "EXERCISES"
• p373, Q1, "$$SMC(X_i, X_j)$$, $$JC(X_i, X_j)$$, $$RC(X_i, X_j)$$" should be "$$SMC(\mathbf{x}_i, \mathbf{x}_j)$$, $$JC(\mathbf{x}_i, \mathbf{x}_j)$$, $$RC(\mathbf{x}_i, \mathbf{x}_j)$$", respectively.
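With the corrected arguments, the three coefficients compare two binary data points $$\mathbf{x}_i$$ and $$\mathbf{x}_j$$. The sketch below uses the standard definitions of the simple matching, Jaccard, and Rao's coefficients, which we assume match those in the exercise. The helper names and the two sample vectors are hypothetical.

```python
def match_counts(x, y):
    """Counts (n11, n10, n01, n00) of attribute positions for two 0/1 vectors."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    return n11, n10, n01, n00


def smc(x, y):
    """Simple matching coefficient: (n11 + n00) / d."""
    n11, _, _, n00 = match_counts(x, y)
    return (n11 + n00) / len(x)


def jc(x, y):
    """Jaccard coefficient: n11 / (n11 + n10 + n01); undefined if both are all zeros."""
    n11, n10, n01, _ = match_counts(x, y)
    return n11 / (n11 + n10 + n01)


def rc(x, y):
    """Rao's coefficient: n11 / d."""
    n11, _, _, _ = match_counts(x, y)
    return n11 / len(x)


x_i, x_j = (1, 0, 1, 1, 0), (1, 1, 0, 1, 0)
print(smc(x_i, x_j), jc(x_i, x_j), rc(x_i, x_j))  # 0.6 0.5 0.4
```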

Chapter 15: Density-based Clustering

• p385, line after Eq. (15.6): "... having two parts. A vector ... " should be "... having two parts: a vector ..."

Chapter 16: Spectral and Graph Clustering

• p411, 2nd last equation: "$$\frac{1}{2}p_{rs}$$" should be "$$p_{rs}$$" so that it reads

$$p_{rs} = \frac{d_r}{2m}\frac{d_s}{2m} = \frac{d_r d_s}{4m^2}$$

• p413, Line 5: "$$\sum_{j=1}^n \mathbf{d}^T \mathbf{c}_i$$" should be "$$\mathbf{d}^T \mathbf{c}_i$$"
• p413, Line 10: "$$(\mathbf{d}_i^T\mathbf{c}_i)^2$$" should be "$$(\mathbf{d}^T\mathbf{c}_i)^2$$"
• p424, Q5: "$$\mathbf{c}_n = \frac{1}{\sqrt{n}} \mathbf{1}$$" should be "$$\mathbf{c}_n = \frac{1}{\sqrt{\sum_{i=1}^n d_i}} \mathbf{\Delta}^{1/2}\mathbf{1}$$"
• p424, Q6 (b): "$$\mathbf{K} = \mathbf{M}$$" should be "$$\mathbf{K} = \mathbf{M} + \mathbf{I}$$"
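With the corrected $$p_{rs}$$, modularity can be written as a sum of $$a_{rs}/2m - p_{rs}$$ over same-cluster pairs $$(r, s)$$, which is one standard form. The sketch below computes modularity this way. It also checks that the corrected $$\mathbf{c}_n$$ is a unit vector with $$\mathbf{L}^s\mathbf{c}_n = \mathbf{0}$$, assuming that the matrix in Q5 is the normalized symmetric Laplacian $$\mathbf{L}^s = \mathbf{I} - \mathbf{\Delta}^{-1/2}\mathbf{A}\mathbf{\Delta}^{-1/2}$$. The two-triangle graph is a made-up example, and the code assumes the graph has no isolated nodes.

```python
import math


def modularity(adj, labels):
    """Sum over same-cluster ordered pairs (r, s) of a_rs / 2m - p_rs."""
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # 2m: every edge is counted from both endpoints
    q = 0.0
    for r, row in enumerate(adj):
        for s, a_rs in enumerate(row):
            if labels[r] == labels[s]:
                p_rs = deg[r] * deg[s] / two_m ** 2  # d_r d_s / 4m^2
                q += a_rs / two_m - p_rs
    return q


def normalized_laplacian_apply(adj, v):
    """Compute (I - D^{-1/2} A D^{-1/2}) v without forming the matrix (all degrees > 0)."""
    deg = [sum(row) for row in adj]
    return [v[i] - sum(a * v[j] / math.sqrt(deg[i] * deg[j]) for j, a in enumerate(row))
            for i, row in enumerate(adj)]


# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, w in edges:
    adj[u][w] = adj[w][u] = 1

print(round(modularity(adj, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571, i.e. 5/14

deg = [sum(row) for row in adj]
c_n = [math.sqrt(d / sum(deg)) for d in deg]  # Delta^{1/2} 1 / sqrt(sum_i d_i)
print(max(abs(x) for x in normalized_laplacian_apply(adj, c_n)) < 1e-12)  # True
```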

Chapter 17: Clustering Validation

• p428, Example 17.1, Table below 2nd para: "$$n=100$$" should be "$$n=150$$" for the total count
• p463, Q10: Add the sentence "Assume that the clusters are: $$C_1 = \{a,b, c,d, e\}, C_2 = \{g, i\}, C_3 = \{f,h, j \}, C_4 = \{k\}$$."

Chapter 18: Probabilistic Classification

• p472, Table 18.2: "13/50" should be "11/50"
• p472, Example 18.2, 2nd Para, lines 6 and 7: "$$P(c_1|\mathbf{x})$$" and "$$P(c_2|\mathbf{x})$$" should be "$$\hat{P}(c_1|\mathbf{x})$$" and "$$\hat{P}(c_2|\mathbf{x})$$", respectively.

Chapter 20: Linear Discriminant Analysis

• p503: Example 20.2: There should be no transpose operator "$$T$$" on the mean vectors, i.e.,

$$\mathbf{\mu}_1 = \pmatrix{5.01\\3.42}^T \qquad \mathbf{\mu}_2 = \pmatrix{6.26\\2.87}^T \qquad \mathbf{\mu}_1 - \mathbf{\mu}_2= \pmatrix{-1.256\\0.546}^T$$

should be

$$\mathbf{\mu}_1 = \pmatrix{5.01\\3.42} \qquad \mathbf{\mu}_2 = \pmatrix{6.26\\2.87} \qquad \mathbf{\mu}_1 - \mathbf{\mu}_2 = \pmatrix{-1.256\\0.546}$$

• p509, Example 20.4, line 4: "iris-virginica" should be "$${\tt Iris\text{-}versicolor}$$"
• p512, Q1: In part (a) "$$\mathbf{S}_B$$" should be "$$\mathbf{B}$$", and in (b) "$$\mathbf{S}_W$$" should be "$$\mathbf{S}$$"

Chapter 21: Support Vector Machines

• p526, 7th line, in $$L_{dual}$$: "$$(C - \alpha_i + \beta_i)$$" should be "$$(C - \alpha_i - \beta_i)$$"
• p536, Algorithm 21.1, line 15: "$$\mathbf{\alpha}_{t+1} = \alpha$$" should be "$$\alpha_{t+1} \gets \alpha$$"
• p538, Example 21.8, line 5: "homogeneous quadratic kernel $$K(\mathbf{x}_i,\mathbf{x}_j) = ( \mathbf{x}^T_i \mathbf{x}_j)^2$$" should be "inhomogeneous quadratic kernel $$K(\mathbf{x}_i,\mathbf{x}_j) = (1+ \mathbf{x}^T_i \mathbf{x}_j)^2$$"
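The kernel corrected in Example 21.8 has an explicit feature map. For 2-dimensional inputs, $$(1+\mathbf{x}^T\mathbf{y})^2 = \phi(\mathbf{x})^T\phi(\mathbf{y})$$ with $$\phi(\mathbf{x}) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1x_2)^T$$. The sketch below verifies this identity numerically. The constant and linear features are what distinguish the inhomogeneous kernel from the homogeneous kernel $$(\mathbf{x}^T\mathbf{y})^2$$. The function names are illustrative.

```python
import math


def quad_kernel(x, y):
    """Inhomogeneous quadratic kernel K(x, y) = (1 + x^T y)^2."""
    return (1 + sum(a * b for a, b in zip(x, y))) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs with phi(x)^T phi(y) = (1 + x^T y)^2."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)
lhs = quad_kernel(x, y)
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))
print(lhs, round(rhs, 10))  # 4.0 4.0; the homogeneous kernel would give (x^T y)^2 = 1.0
```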