Errata

Please email us if you find any errors in the book. We will list known errata on this page.

Chapter 1: Data Mining and Analysis

  • p4, Section 1.3, line 13: "as linear combination" should be "as a linear combination"
  • p9, Example 1.3, 3rd line from end: "\((153)^{1/3}\)" should be "\((152)^{1/3}\)"
  • p9, Example 1.3, last line: "\( (4^3 + (-1)^3)^{1/3} = (63)^{1/3} = 3.98\)" should be "\( (4^3 + |-1|^3)^{1/3} = (65)^{1/3} = 4.02\)"
  • p24, Section 1.4.3, last line of subsection Univariate Sample: "where \(f_\mathbf{X}\) is the probability mass or density function for \(\mathbf{X}\)" should be "where \(f_X\) is the probability mass or density function for \(X\)"
  • p30, Section 1.7, Q1: "in (1.5)" should be "in Eq. (1.5)"
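
The corrected arithmetic for the last line of Example 1.3 can be checked numerically. The sketch below assumes only the two values 4 and -1 that appear in the erratum; the helper name `lp_norm` is ours and does not come from the book.

```python
def lp_norm(values, p):
    """Return the L_p norm (sum of |v|^p)^(1/p) of a sequence of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

diff = [4, -1]

# Printed version: the sign of -1 is kept, so the sum is 64 - 1 = 63.
uncorrected = (4 ** 3 + (-1) ** 3) ** (1.0 / 3)

# Corrected version: absolute values are taken first, so the sum is 64 + 1 = 65.
corrected = lp_norm(diff, 3)

print(round(uncorrected, 2))  # 3.98
print(round(corrected, 2))    # 4.02
```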

Chapter 2: Numeric Attributes

  • p34, Equation (2.2): "\( \hat{F}(x) \ge q \)" should be "\( F(x) \ge q \)"
  • p34, Line after Equation (2.2):
    "That is, the inverse CDF gives the least value of \( X\), for which \(q\) fraction of the values are higher, and \(1 − q\) fraction of the values are lower."
    should be
    "That is, the inverse CDF gives the least value of \( X\), for which \(q\) fraction of the values are lower, and \(1 − q\) fraction of the values are higher."
  • p53, Example 2.6, line 1: "... range for \({\tt Income}\) is \(2700-300=2400\)" should be "... range for \({\tt Income}\) is \(6000-300=5700\)"
  • p55, In Eq (2.32): "\(P(-k \le z \le k) = P\bigl(0 \le t \le k/\sqrt{2}\bigr) \)" should be "\(P(-k \le z \le k) = 2 \cdot P\bigl(0 \le t \le k/\sqrt{2}\bigr) \)"
  • p58, Total and Generalized Variance, Line 2: "...product of its eigenvectors" should be "...product of its eigenvalues"
  • p58, two lines above Example 2.8: "\(tr(\Lambda)\)" should be "\( tr(\mathbf{\Lambda}) \)"
  • p61, Q3: "\(mu\)" should be "\(\mu\)" so that it reads

$$\sum_{i=1}^n (x_i - \mu)^2 = n(\hat{\mu} - \mu)^2 + \sum_{i=1}^n (x_i - \hat{\mu})^2$$
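
As a quick sanity check of the identity in Q3, both sides can be evaluated on a small sample. The data values and the choice of \(\mu\) below are arbitrary illustrations, and the function name `decomposition_sides` is hypothetical.

```python
def decomposition_sides(xs, mu):
    """Return (lhs, rhs) of: sum (x_i - mu)^2 = n (mu_hat - mu)^2 + sum (x_i - mu_hat)^2."""
    n = len(xs)
    mu_hat = sum(xs) / n  # sample mean
    lhs = sum((x - mu) ** 2 for x in xs)
    rhs = n * (mu_hat - mu) ** 2 + sum((x - mu_hat) ** 2 for x in xs)
    return lhs, rhs

lhs, rhs = decomposition_sides([2.0, 4.0, 9.0], mu=3.0)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```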

Chapter 3: Categorical Attributes

  • p81, Table 3.6, Attribute value for \(X_2\): "\( {\tt Short} ( a_{23}) \)" should be "\( {\tt Long} ( a_{23}) \)"

Chapter 4: Graph Data

  • p103, 2 lines above Eq (4.3): "\(\gamma_{jk} = 0 \)" should be "\(\gamma_{jk}(v_i) = 0 \)"
  • p103, Eq (4.3): "\(\gamma_{jk} \)" should be "\(\gamma_{jk}(v_i) \)"
  • p103, Example 4.5, last line: "\(\gamma_{jk} > 0 \)" should be "\(\gamma_{jk}(v_5) > 0 \)"
  • p104, Example 4.5:
    \(c(v_5) = \gamma_{18} + \gamma_{24} + \gamma_{27} + \gamma_{28} + \gamma_{38} + \gamma_{46} + \gamma_{48} + \gamma_{67} + \gamma_{68}\)
    should be
    \(c(v_5) = \gamma_{18}(v_5) + \gamma_{24}(v_5) + \gamma_{27}(v_5) + \gamma_{28}(v_5) + \gamma_{38}(v_5) + \gamma_{46}(v_5) + \gamma_{48}(v_5) + \gamma_{67}(v_5) + \gamma_{68}(v_5)\)
  • p107: \(\mathbf{p}_1 = \frac{1}{2} \pmatrix{1\\ 1\\ 2\\ 1\\ 2}\) should be \(\mathbf{p}_1 = \frac{1}{2} \pmatrix{1\\ 2\\ 2\\ 1\\ 2}\)
  • p127, 4th Line after Eq (4.22): "initial \(n_0\) edges" should be "initial \(n_0\) nodes"

Chapter 5: Kernel Methods

  • p138, Example 5.4:

$$\mathbf{\mu}_\phi = \sum_{i=1}^5 \phi(\mathbf{x}_i) = \sum_{i=1}^5 \mathbf{x}_i$$

should be

$$\mathbf{\mu}_\phi = \frac{1}{5}\sum_{i=1}^5 \phi(\mathbf{x}_i) = \frac{1}{5} \sum_{i=1}^5 \mathbf{x}_i$$

  • p140, 7th Line after Eq (5.3): "\(\sum_{i=1}^{m_a} \sum_{j=1}^{m_a} \alpha_i \alpha_{\!j} K(\mathbf{x}_i, \mathbf{x})\)" should be "\(\sum_{i=1}^{m_a} \sum_{j=1}^{m_a} \alpha_i \alpha_{\!j} K(\mathbf{x}_i, \mathbf{x}_j)\)"
  • p141, 3rd line and 10th Line before Sec 5.1.2: There is an extra left bracket in the definition of \(\phi(\mathbf{x})\), that is,
    "\(\big( ( K(\mathbf{x}_1, \mathbf{x}), ... \)" should be "\( \big( K(\mathbf{x}_1, \mathbf{x}), ... \)"
  • p144, 2nd line: "\(\int a(\mathbf{x})^2\; d\mathbf{x} < 0\)" should be "\(\int a(\mathbf{x})^2\; d\mathbf{x} < \infty\)"
  • p144, last line: "\(\sum_{k=1}^q\)" should be "\(\sum_{k=0}^q\)"
  • p156, Section 5.4.2: all occurrences of "path/paths" should be "walk/walks"

Chapter 6: High-dimensional Data

  • p164: In the definitions of the hyperball and hypersphere
    "\(\mathbf{x} = (x_1, x_2, \ldots, x_d)\)" should be "\(\mathbf{x} = (x_1, x_2, \ldots, x_d)^T\)"
  • p171: "\( \mathbf{0}_d = (0_1,0_2,\ldots,0_d) \)" should be "\(\mathbf{0}_d = (0_1,0_2,\ldots,0_d)^T\)"
  • p172, Section 6.6, 1st Line after Eq. (6.11):
    \(\mu\) in equation "\(\mu=\mathbf{0}_d\)" should be in bold.
  • p178, section "Volume in d dimensions":
    "\(x_1 = r \cos\theta_1\cos\theta_2 \cos\theta_3 = r c_2 c_2 c_3\)" should be "\(x_1 = r \cos\theta_1\cos\theta_2 \cos\theta_3 = r c_1 c_2 c_3\)"
    "\(x_3 = r \cos\theta_1\sin\theta_2 = r c_1 s_1\)" should be "\(x_3 = r \cos\theta_1\sin\theta_2 = r c_1 s_2\)"
  • p178, Equation for \(J(\theta_1, \theta_2, \theta_3) \), Entry in first row, fourth column: "\( r c_1 c_2 s_3 \)" should be "\(-r c_1 c_2 s_3 \)"

Chapter 7: Dimensionality Reduction

  • p186, line 1: "\( \mathbf{a}_r \) is vector" should be "\( \mathbf{a}_r \) is a vector"

Chapter 8: Itemset Mining

  • p235, Example 8.13, 2nd last line: "\(...,AB(3), AD(4),...\)" should be "\(..., AB(4), AD(3), ...\)"
  • p236, 5th line: "\(...,AD(4),...\)" should be "\(..., AD(3),...\)"

Chapter 9: Summarizing Itemsets

  • p250, 2nd line under Generalized Itemsets: "\(k\)-tidsets" should be "\(k\) tidsets"
  • p250, 4th line from bottom: "\(Z = Y \setminus X\)" should be "\(Z = X \setminus Y\)"
  • p252, Eq. (9.3) and Eq. (9.4): "\( \bigl|X\setminus Y\bigr| \)" should be "\( \bigl|X\setminus W\bigr| \)" on the right-hand side in both equations, so that they read

$$ \textbf{Upper Bounds} \bigl(\bigl|X\setminus Y\bigr| \text{ is odd} \bigr): sup(X) \leq\sum_{Y \subseteq W \subset X} -1^{\bigl(\bigl|X\setminus W\bigr|+1\bigr)} sup(W) $$

$$ \textbf{Lower Bounds} \bigl(\bigl|X\setminus Y\bigr| \text{ is even}\bigr): sup(X) \geq\sum_{Y \subseteq W \subset X} -1^{\bigl(\bigl|X\setminus W\bigr|+1\bigr)} sup(W) $$

  • p254, Section Nonderivable Itemsets, 1st Equation after line 1: "\( \bigl|X\setminus Y\bigr| \)" should be "\( \bigl|X\setminus W\bigr| \)", so that it reads

$$\mathit{IE}(Y) = \sum_{Y \subseteq W \subset X}\, -1^{\bigl(\bigl|X\setminus W\bigr|+1\bigr)} \cdot sup(W)$$
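
The corrected inclusion-exclusion sum can be illustrated with a minimal sketch. It assumes the sign term is read as \((-1)^{|X\setminus W|+1}\), uses a made-up toy transaction list (not data from the book), and introduces the hypothetical helper names `support` and `ie_bounds`.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def ie_bounds(X, transactions):
    """Return (lower_bounds, upper_bounds) on sup(X), one bound per subset Y of X."""
    X = frozenset(X)
    lower, upper = [], []
    for k in range(len(X) + 1):
        for Y in map(frozenset, combinations(sorted(X), k)):
            rest = sorted(X - Y)
            total = 0
            # W ranges over Y <= W < X; adding fewer than len(rest) items keeps W != X.
            for j in range(len(rest)):
                for extra in combinations(rest, j):
                    W = Y | frozenset(extra)
                    # Assumed reading of the sign: (-1) raised to |X \ W| + 1.
                    sign = (-1) ** (len(X - W) + 1)
                    total += sign * support(W, transactions)
            # |X \ Y| odd yields an upper bound, even yields a lower bound.
            (upper if len(rest) % 2 == 1 else lower).append(total)
    return lower, upper

# Toy data for illustration only.
transactions = [frozenset(t) for t in ("ABC", "AB", "AC", "BC", "ABC", "A")]
lower, upper = ie_bounds("ABC", transactions)
print(max(lower), support(frozenset("ABC"), transactions), min(upper))
```

The tightest lower bound is the largest value in `lower` and the tightest upper bound is the smallest value in `upper`; the true support must lie between them.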

Chapter 10: Sequence Mining

  • p264, alg 10.2, line 9: "\(\mathbf{P}\)" should be "\( P_a \)"

Chapter 11: Graph Pattern Mining

  • p288, sec 11.3, 2nd paragraph, line 6: "\( sup(C) = sup(t) \)" should be "\( sup(C') = sup(t) \)"
  • p290, Figure 11.8: The last tuple in the DFS-code for graph \(C_{19}\) should be "\( \langle 2, 0, a, a \rangle\)" and not "\( \langle 2, 0, a, b\rangle \)"
  • p292, Algorithm 11.2, Line 14: "\( b=\langle u_r, v, L(u_r), L(v), L(u_r, v)\rangle \)" should be "\( b=\langle u_r, v, L(\phi(u_r)), L(\phi(v)), L(\phi(u_r),\phi(v))\rangle \)"
  • p293, Figure 11.9 (c): There should be one more extension for \(\phi_5\), namely \( \langle 0, 3, a, b\rangle \)
  • p294, Algorithm 11.3, Line 12: "\( N_{G_j} \)" should be "\( N_{G} \)"
  • p295, Algorithm 11.4, Line 0: "\(C\)" should be "\(C = \{t_1, t_2, ..., t_k\}\)"

Chapter 12: Pattern and Rule Assessment

  • p322 (Alg 12.1) and p326 (Alg 12.2): replace "=" with "\(\gets\)"

Chapter 13: Representative-based Clustering

  • p335, Algorithm 13.1, line 7: "\(\mathbf{\mu}^t_i\)" should be "\(\mathbf{\mu}^{t-1}_i\)"
  • p343, in 3rd equation: "\(P(C_i)\)" should be "\(P(C_1)\)"

Chapter 14: Hierarchical Clustering

  • p366, Fig 14.2: "(a) \(m=1\)", "(b) \(m=2\)", and "(c) \(m=3\)" should be "(a) \(n=1\)", "(b) \(n=2\)", and "(c) \(n=3\)", respectively.
  • p373, sec 14.4: "EXERCISES AND PROJECTS" should be "EXERCISES"
  • p373, Q1, "\(SMC(X_i, X_j)\), \(JC(X_i, X_j)\), \(RC(X_i, X_j)\)" should be "\(SMC(\mathbf{x}_i, \mathbf{x}_j)\), \(JC(\mathbf{x}_i, \mathbf{x}_j)\), \(RC(\mathbf{x}_i, \mathbf{x}_j)\)", respectively.

Chapter 15: Density-based Clustering

  • p385, line after Eq. (15.6): "... having two parts. A vector ... " should be "... having two parts: a vector ..."

Chapter 16: Spectral and Graph Clustering

  • p411, 2nd last equation: "\( \frac{1}{2}p_{rs} \)" should be "\( p_{rs} \)" so that it reads

$$ p_{rs} = \frac{d_r}{2m}\frac{d_s}{2m} = \frac{d_r d_s}{4m^2} $$

  • p413, Line 5: "\(\sum_{j=1}^n \mathbf{d}^T \mathbf{c}_i\)" should be "\(\mathbf{d}^T \mathbf{c}_i\)"
  • p413, Line 10: "\((\mathbf{d}_i^T\mathbf{c}_i)^2\)" should be "\((\mathbf{d}^T\mathbf{c}_i)^2\)"
  • p424, Q5: "\( \mathbf{c}_n = \frac{1}{\sqrt{n}} \mathbf{1}\)" should be "\( \mathbf{c}_n = \frac{1}{\sqrt{\sum_{i=1}^n d_i}} \mathbf{\Delta}^{1/2}\mathbf{1}\)"
  • p424, Q6 (b): "\( \mathbf{K} = \mathbf{M} \)" should be "\( \mathbf{K} = \mathbf{M} + \mathbf{I}\)"
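
One way to see that the corrected p411 expression \(p_{rs} = \frac{d_r d_s}{4m^2}\) has no stray factor of \(\frac{1}{2}\) is that it sums to 1 over all ordered node pairs, because the degrees sum to \(2m\). The sketch below checks this on a small made-up undirected graph without self-loops; the function name `edge_probabilities` and the edge list are illustrative assumptions.

```python
from fractions import Fraction

def edge_probabilities(edges):
    """Return p_rs = d_r * d_s / (4 m^2) for every ordered pair of nodes (r, s)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)  # number of undirected edges
    return {(r, s): Fraction(degree[r] * degree[s], 4 * m * m)
            for r in degree for s in degree}

# Toy graph for illustration only: a triangle 0-1-2 plus a pendant node 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
p = edge_probabilities(edges)
print(p[(2, 3)], sum(p.values()))
```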

Chapter 17: Clustering Validation

  • p428, Example 17.1, Table below 2nd para: "\(n=100\)" should be "\(n=150\)" for the total count
  • p463, Q10: Add the sentence "Assume that the clusters are: \(C_1 = \{a,b, c,d, e\}, C_2 = \{g, i\}, C_3 = \{f,h, j \}, C_4 = \{k\}\)."

Chapter 18: Probabilistic Classification

  • p472, Example 18.2, 2nd Para, lines 6 and 7: "\(P(c_1|\mathbf{x})\)" and "\(P(c_2|\mathbf{x})\)" should be "\(\hat{P}(c_1|\mathbf{x})\)" and "\(\hat{P}(c_2|\mathbf{x})\)", respectively.

Chapter 20: Linear Discriminant Analysis

  • p503, Example 20.2: There should be no transpose operator "\(T\)" on the mean vectors, i.e.,

$$\mathbf{\mu}_1 = \pmatrix{5.01\\3.42}^T \qquad \mathbf{\mu}_2 = \pmatrix{6.26\\2.87}^T \qquad \mathbf{\mu}_1 - \mathbf{\mu}_2= \pmatrix{-1.256\\0.546}^T$$

should be

$$\mathbf{\mu}_1 = \pmatrix{5.01\\3.42} \qquad \mathbf{\mu}_2 = \pmatrix{6.26\\2.87} \qquad \mathbf{\mu}_1 - \mathbf{\mu}_2 = \pmatrix{-1.256\\0.546}$$

  • p509, Example 20.4, line 4: "iris-virginica" should be "\({\tt Iris\text{-}versicolor}\)"
  • p512, Q1: In part (a) "\(\mathbf{S}_B\)" should be "\(\mathbf{B}\)", and in (b) "\(\mathbf{S}_W\)" should be "\(\mathbf{S}\)"

Chapter 21: Support Vector Machines

  • p526, 7th line, in \(L_{dual}\): "\((C - \alpha_i + \beta_i)\)" should be "\((C - \alpha_i - \beta_i)\)"
  • p536, Algorithm 21.1, line 15: "\( \mathbf{\alpha}_{t+1} = \alpha \)" should be "\( \alpha_{t+1} \gets \alpha \)"
  • p538, Example 21.8, line 5: "homogeneous quadratic kernel \(K(\mathbf{x}_i,\mathbf{x}_j) = ( \mathbf{x}^T_i \mathbf{x}_j)^2\)" should be "inhomogeneous quadratic kernel \(K(\mathbf{x}_i,\mathbf{x}_j) = (1+ \mathbf{x}^T_i \mathbf{x}_j)^2\)"
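
To make the difference between the two kernels named in the p538 erratum concrete, the sketch below evaluates both on a pair of arbitrary vectors. The vectors are chosen for illustration and are not taken from Example 21.8; the function names are ours.

```python
def dot(x, y):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def homogeneous_quadratic(x, y):
    """K(x, y) = (x^T y)^2"""
    return dot(x, y) ** 2

def inhomogeneous_quadratic(x, y):
    """K(x, y) = (1 + x^T y)^2"""
    return (1 + dot(x, y)) ** 2

x, y = (1, 2), (3, 4)                 # x^T y = 11
print(homogeneous_quadratic(x, y))    # 121
print(inhomogeneous_quadratic(x, y))  # 144
```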