Computing Tucker via the HOSVD

Copyright 2022 National Technology & Engineering Solutions of Sandia,
LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the
U.S. Government retains certain rights in this software.

Higher-order Singular Value Decomposition (HOSVD) and Sequentially-truncated HOSVD (ST-HOSVD)

The HOSVD computes a Tucker decomposition of a tensor via a simple process. For each mode \(k\), it computes the \(R_k\) leading left singular vectors of the mode-\(k\) unfolding of the tensor and stores them as the factor matrix \(U_k\). It then computes the ttm of the original tensor with all the factor matrices to yield the core of shape \((R_1, R_2, \ldots, R_d)\). The core and factor matrices are used to form the ttensor. The values of \(R_k\) that lead to a good approximation can be computed automatically to meet a specified error tolerance; this is recommended and is the default in our code.

The ST-HOSVD is an improvement on the HOSVD that performs the TTM in each mode before moving on to the next mode. This has the advantage of shrinking the tensor at each step, reducing the cost of subsequent computations. ST-HOSVD is the default in the hosvd code.

  • L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31:279-311, 1966, http://dx.doi.org/10.1007/BF02289464

  • L. D. Lathauwer, B. D. Moor and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Analysis and Applications, 21(4):1253-1278, 2000, http://dx.doi.org/10.1137/S0895479896305696

  • N. Vannieuwenhoven, R. Vandebril and K. Meerbergen, A New Truncation Strategy for the Higher-Order Singular Value Decomposition, SIAM J. Scientific Computing, 34(2):A1027-A1052, 2012, http://dx.doi.org/10.1137/110836067

import pyttb as ttb
import numpy as np

eps_machine = np.finfo(float).eps  # gets machine epsilon for floats, will be used later
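
To make the procedure described above concrete, here is a minimal sketch of the classic truncated HOSVD with prescribed ranks, using only numpy operations on the tensor's data plus the pyttb constructors that appear elsewhere in this notebook. The helper name hosvd_sketch is ours, purely for illustration; the library routine ttb.hosvd (used throughout below) additionally selects the ranks automatically from an error tolerance and applies the ST-HOSVD variant by default.

def hosvd_sketch(X, ranks):
    # Illustrative truncated HOSVD with prescribed ranks (not the library routine).
    factors = []
    for k in range(X.ndims):
        # Mode-k unfolding of the original tensor: shape (n_k, prod of other dims)
        Xk = np.moveaxis(X.data, k, 0).reshape(X.shape[k], -1)
        # The R_k leading left singular vectors become the factor matrix U_k
        Uk, _, _ = np.linalg.svd(Xk, full_matrices=False)
        factors.append(Uk[:, : ranks[k]])
    # Core = X times U_k^T in every mode, yielding shape (R_1, ..., R_d)
    core = X.data.copy()
    for k in range(X.ndims):
        core = np.moveaxis(
            np.tensordot(factors[k].T, np.moveaxis(core, k, 0), axes=1), 0, k
        )
    return ttb.ttensor(ttb.tensor(core), factors)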

Simple example of usage

# Create a random tensor with shape (10, 20, 30) built from a core with shape (2, 3, 4)
def generate_sample_tensor():
    np.random.seed(0)
    noise = 0.01
    core = ttb.tensor(np.random.rand(2, 3, 4), shape=(2, 3, 4))  # The core tensor.
    U = [
        np.random.rand(10, 2),
        np.random.rand(20, 3),
        np.random.rand(30, 4),
    ]  # The factor matrices.
    Soln = ttb.ttensor(core, U)  # Create the solution ttensor.
    Z = Soln.full()
    Rdm = ttb.tenrand((10, 20, 30))

    Data = Z + noise * Z.norm() * Rdm / Rdm.norm()

    return Soln, Data
S, X = generate_sample_tensor()
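
As a quick sanity check (this snippet is our addition, not part of the original example), the data tensor differs from the noise-free solution by exactly the 1% relative noise injected in generate_sample_tensor, which is why the 0.01 tolerance used next is achievable.

# Relative difference between the data and the noise-free solution; should equal
# the `noise` level (0.01) used in generate_sample_tensor above.
print(f"Noise level: {(X - S.full()).norm() / S.full().norm():.4f}")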

# Compute HOSVD with desired relative error = 0.01
T = ttb.hosvd(input_tensor=X, tol=0.01)

# Check shape of core
coreshape = T.core.shape
print(f"Shape of core: {coreshape}")

# Check relative error
relerr = (X - T.double()).norm() / X.norm()
print(f"Relative error: {relerr}")
WARNING:root:Selected no copy, but input data isn't F ordered so must copy.
Computing HOSVD...

Shape of core: (2, 3, 4)
||X-T||/||X|| =  0.00508066 <= 0.010000 (tol)
Shape of core: (2, 3, 4)
Relative error: 0.005080663859987105

Generate a core with different accuracies for different shapes

We will create a core tensor that is nearly block diagonal. The blocks decrease exponentially in norm, with the idea that we can pick off one block at a time as we increase the prescribed accuracy of the HOSVD. To do this, we define and use a function tenrandblk().

def tenrandblk(verbose=True):
    np.random.seed(0)
    # Block shapes (need not be cubic). Number of rows is the number
    # of levels and number of columns is the order of the tensor.
    bsz = np.array([[3, 2, 1], [2, 2, 2], [2, 3, 4]])

    # Squared norm of each block. Must be length L and sum to <= 1
    bns = np.array([0.9, 0.09, 0.009])

    # Find the end of each block
    bend = np.cumsum(bsz, 0)
    # Extract shape: L = # levels (rows), D = # dimensions (columns)
    L, D = np.shape(bsz)
    # Final shape
    gsz = bend[-1, :]

    ## Create tensor
    # Figure out norm of off-block-diagonal
    dltnrmsqr = 1 - np.sum(bns)
    # Create pattern for off-block-diagonal to be modified as we go
    dltpattern = np.ones(tuple(gsz))
    # Create tensor to fill in
    G = ttb.tenzeros(tuple(gsz))

    # Create random entries to use
    A = np.sign(np.random.randn(*gsz))
    B = 0.1 * np.random.rand(*gsz) + 0.9
    Grnd = ttb.tensor(
        np.sign(np.random.randn(*gsz)) * (0.1 * np.random.rand(*gsz) + 0.9)
    )

    # Loop through and create blocks
    for i in range(L):
        # Figure out ith block pattern
        blkrange = []
        for k in range(D):
            if i == 0:
                blkrange.append(np.arange(bend[i, k]))
            else:
                blkrange.append(np.arange(bend[i - 1, k], bend[i, k]))

        # Create pattern that has ones for the block
        pattern = np.zeros(tuple(gsz))
        ix = np.ix_(blkrange[2], blkrange[0], blkrange[1])
        pattern[ix] = 1

        # Zero out block in the off-diagonal pattern
        dltpattern[ix] = 0

        # Extract the random entries for this block and rescale to the prescribed norm
        block = Grnd * pattern
        sse = (block**2).collapse()
        block *= np.sqrt(bns[i] / sse)

        # Add to main tensor
        G += block

        # Verbose output
        if verbose:
            print(
                f"Created block with shape {tuple(bsz[i,:])} with norm ({block.norm()})^2 = {block.norm()**2}"
            )

    if dltnrmsqr > 0:
        # final pattern
        block = Grnd * dltpattern
        sse = (block**2).collapse()
        block *= np.sqrt(dltnrmsqr / sse)
        G += block
        if verbose:
            print(
                f"Created tensor with shape {tuple(gsz)} with off-block-diagonal norm ({block.norm()})^2 = {block.norm()**2}"
            )

    return G, bsz, bns
# Create core tensor with given block structure and norm 1
G, bsz, bns = tenrandblk()
Created block with shape (np.int64(3), np.int64(2), np.int64(1)) with norm (0.9486832980505139)^2 = 0.9000000000000001
Created block with shape (np.int64(2), np.int64(2), np.int64(2)) with norm (0.3)^2 = 0.09
Created block with shape (np.int64(2), np.int64(3), np.int64(4)) with norm (0.09486832980505137)^2 = 0.009
Created tensor with shape (np.int64(7), np.int64(7), np.int64(7)) with off-block-diagonal norm (0.031622776601683805)^2 = 0.0010000000000000007
print(f"Shape of G: {G.shape}")
Shape of G: (7, 7, 7)

Generate data tensor with core described above

We take the core G and embed it into a larger tensor X by using orthogonal transformations. The true (multilinear) rank of this tensor is equal to the shape of G.

from scipy.stats import ortho_group


def mat_rand_orth(N, seed=0):
    # Generate a random N x N real orthogonal matrix with a fixed seed.
    return ortho_group.rvs(N, random_state=seed)


# shape of X
xsz = np.array([20, 20, 20])

# Create orthogonal matrices
U = []
for k in np.arange(3):
    V = mat_rand_orth(xsz[k])[:, : G.shape[k]]
    U.append(V)

# Create X
X = ttb.ttensor(G, U).full()

# The norm should be unchanged
print(f"||X|| = {X.norm()}")
||X|| = 1.0
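
Each factor matrix has orthonormal columns, which is why the multilinear transform preserves the norm of G. A quick check (our addition, not from the original notebook):

# Each U[k] has orthonormal columns, so ||ttensor(G, U)|| = ||G|| = 1.
for k in range(3):
    print(np.allclose(U[k].T @ U[k], np.eye(G.shape[k])))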

Compute (full) HOSVD

We compute the ST-HOSVD using the hosvd method. We specify a tolerance close to machine precision. Ideally, it finds a core that is the same shape as G.

print("ST-HOSVD...")
T = ttb.hosvd(X, 2 * np.sqrt(eps_machine))
ST-HOSVD...
Computing HOSVD...

Shape of core: (7, 7, 7)
||X-T||/||X|| =  8.61086e-15 <= 0.000000 (tol)
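
Since the recovered core here has the same shape as G, the claim above can be captured with a simple assertion (an assumed follow-up check, not in the original notebook):

# At a tolerance near machine precision, ST-HOSVD recovers the true core shape.
assert T.core.shape == G.shape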

Compute low-rank HOSVD approximation

The norm squared of the first two blocks of G is 0.99, so specifying an error of 1e-2 should yield a core with shape \((4, 4, 3)\) (mileage may vary). However, the algorithm is conservative and may pick something larger. We can compensate by specifying a larger tolerance.
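
To see where the tolerances below come from (a small worked check of the reasoning above, our addition): keeping only the first two blocks discards roughly 1e-2 of the squared norm, so the corresponding relative error is its square root, which is why the tolerance is passed as np.sqrt(1e-2).

# Squared norm remaining after keeping the first two blocks (third block plus
# the off-block-diagonal part), and the matching relative error.
discarded = 1 - np.sum(bns[:2])
print(f"Discarded squared norm: {discarded:.3f} -> relative error: {np.sqrt(discarded):.3f}")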

# Using 1e-2 exactly is potentially too conservative...
print("Result with tol = sqrt(1e-2):")
T = ttb.hosvd(X, np.sqrt(1e-2), verbosity=1)

# But scaling by a small multiple (here ndims(X) = 3) usually works...
print("\nResult with tol = sqrt(3e-2):")
T = ttb.hosvd(X, np.sqrt(3e-2), verbosity=1)
Result with tol = sqrt(1e-2):
Computing HOSVD...

Shape of core: (5, 5, 5)
||X-T||/||X|| =  0.0753888 <= 0.100000 (tol)

Result with tol = sqrt(3e-2):
Computing HOSVD...

Shape of core: (3, 4, 4)
||X-T||/||X|| =  0.0987914 <= 0.173205 (tol)

Similarly, the norm squared of the first block of G is 0.9, so specifying an error of 1e-1 should result in a core with shape \((3, 2, 1)\).

# Using 1e-1 exactly is potentially too conservative...
print("Result with tol = sqrt(1e-1):")
T = ttb.hosvd(X, np.sqrt(1e-1), verbosity=1)

# But scaling by a small multiple (here ndims(X) = 3) usually works...
print("\nResult with tol = sqrt(3e-1):")
T = ttb.hosvd(X, np.sqrt(3e-1), verbosity=1)
Result with tol = sqrt(1e-1):
Computing HOSVD...

Shape of core: (2, 3, 3)
||X-T||/||X|| =  0.202337 <= 0.316228 (tol)

Result with tol = sqrt(3e-1):
Computing HOSVD...

Shape of core: (1, 2, 2)
||X-T||/||X|| =  0.316125 <= 0.547723 (tol)

Verbosity - Getting more or less information.

Setting the verbosity to zero suppresses all output. Cranking up the verbosity gives some insight into the decision-making process…
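
For reference (our own computation, mirroring the "eigenvalue sum threshold" line printed in the verbose output below), the per-mode cutoff is tol^2 ||X||^2 / d:

# With tol = sqrt(3e-1), ||X|| = 1, and d = 3 modes, the threshold is 0.3 / 3 = 0.1.
tol = np.sqrt(3e-1)
print(f"threshold = {tol**2 * X.norm()**2 / X.ndims:.4f}")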

Example 1

T = ttb.hosvd(X, tol=np.sqrt(3e-1), verbosity=10)
Computing HOSVD...

||X||^2 =  1
tol =  0.547723
eigenvalue sum threshold = tol^2 ||X||^2 / d =  0.1
Reverse cumulative sum of evals of Gram matrix:
 0:  1.0000 <-- Cutoff
 1:  0.0999
 2:  0.0322
 3:  0.0095
 4:  0.0055
 5:  0.0023
 6:  0.0004
 7: -0.0000
 8: -0.0000
 9: -0.0000
 10: -0.0000
 11: -0.0000
 12: -0.0000
 13: -0.0000
 14: -0.0000
 15: -0.0000
 16: -0.0000
 17: -0.0000
 18: -0.0000
 19: -0.0000
Reverse cumulative sum of evals of Gram matrix:
 0:  0.9001
 1:  0.3140 <-- Cutoff
 2:  0.0001
 3:  0.0000
 4:  0.0000
 5:  0.0000
 6:  0.0000
 7:  0.0000
 8:  0.0000
 9: -0.0000
 10: -0.0000
 11: -0.0000
 12: -0.0000
 13: -0.0000
 14: -0.0000
 15: -0.0000
 16: -0.0000
 17: -0.0000
 18: -0.0000
 19: -0.0000
Reverse cumulative sum of evals of Gram matrix:
 0:  0.9001
 1:  0.3139 <-- Cutoff
 2: -0.0000
 3: -0.0000
 4: -0.0000
 5: -0.0000
 6: -0.0000
 7: -0.0000
 8: -0.0000
 9: -0.0000
 10: -0.0000
 11: -0.0000
 12: -0.0000
 13: -0.0000
 14: -0.0000
 15: -0.0000
 16: -0.0000
 17: -0.0000
 18: -0.0000
 19: -0.0000
Shape of core: (1, 2, 2)
||X-T||/||X|| =  0.316125 <= 0.547723 (tol)

Example 2

T = ttb.hosvd(X, tol=3 * np.sqrt(eps_machine), verbosity=10)
Computing HOSVD...

||X||^2 =  1
tol =  4.47035e-08
eigenvalue sum threshold = tol^2 ||X||^2 / d =  6.66134e-16
Reverse cumulative sum of evals of Gram matrix:
 0:  1.0000
 1:  0.0999
 2:  0.0322
 3:  0.0095
 4:  0.0055
 5:  0.0023
 6:  0.0004 <-- Cutoff
 7: -0.0000
 8: -0.0000
 9: -0.0000
 10: -0.0000
 11: -0.0000
 12: -0.0000
 13: -0.0000
 14: -0.0000
 15: -0.0000
 16: -0.0000
 17: -0.0000
 18: -0.0000
 19: -0.0000
Reverse cumulative sum of evals of Gram matrix:
 0:  1.0000
 1:  0.4137
 2:  0.0996
 3:  0.0317
 4:  0.0093
 5:  0.0040
 6:  0.0001 <-- Cutoff
 7: -0.0000
 8: -0.0000
 9: -0.0000
 10: -0.0000
 11: -0.0000
 12: -0.0000
 13: -0.0000
 14: -0.0000
 15: -0.0000
 16: -0.0000
 17: -0.0000
 18: -0.0000
 19: -0.0000
Reverse cumulative sum of evals of Gram matrix:
 0:  1.0000
 1:  0.4137
 2:  0.0997
 3:  0.0322
 4:  0.0094
 5:  0.0052
 6:  0.0021 <-- Cutoff
 7:  0.0000
 8: -0.0000
 9: -0.0000
 10: -0.0000
 11: -0.0000
 12: -0.0000
 13: -0.0000
 14: -0.0000
 15: -0.0000
 16: -0.0000
 17: -0.0000
 18: -0.0000
 19: -0.0000
Shape of core: (7, 7, 7)
||X-T||/||X|| =  8.61086e-15 <= 0.000000 (tol)

Specify the ranks

If you know the ranks you want, you can specify them. But there is no guarantee that they will satisfy the specified tolerance; in that case, the method will issue a warning.

# Rank is okay
T = ttb.hosvd(X, tol=np.sqrt(3e-1), ranks=bsz[0, :].tolist())

# Rank is too small for the specified error
T = ttb.hosvd(X, tol=np.sqrt(3e-1), ranks=[1, 1, 1])

# But you can set the error to the tensor norm to make the warning go away
T = ttb.hosvd(X, tol=X.norm(), ranks=[1, 1, 1])
Computing HOSVD...

Shape of core: (4, 3, 2)
||X-T||/||X|| =  0.316018 <= 0.547723 (tol)
Computing HOSVD...

Shape of core: (2, 2, 2)
||X-T||/||X|| =  0.316111 <= 0.547723 (tol)
Computing HOSVD...

Shape of core: (2, 2, 2)
||X-T||/||X|| =  0.316111 <= 1.000000 (tol)

Specify the mode order

It’s also possible to specify the order of the modes. The default is np.arange(X.ndims).

reverse_dimorder = np.arange(X.ndims).tolist()[::-1]
T = ttb.hosvd(X, tol=np.sqrt(3e-1), dimorder=reverse_dimorder)
Computing HOSVD...

Shape of core: (1, 2, 2)
||X-T||/||X|| =  0.316129 <= 0.547723 (tol)

Generate bigger data tensor with core described above

Uses the same procedure as before, but now the shape is bigger.

# shape of X
xsz = np.array([100, 100, 100])

# Create orthogonal matrices
U = []
for k in np.arange(3):
    V = mat_rand_orth(xsz[k])[:, : G.shape[k]]
    U.append(V)

# Create X
Y = ttb.ttensor(G, U).full()

# The norm should be unchanged
print(f"||Y|| = {Y.norm()}")
||Y|| = 0.9999999999999998

ST-HOSVD compared to HOSVD

The answers are essentially the same for the sequentially-truncated HOSVD and the HOSVD…

print("ST-HOSVD...")
T = ttb.hosvd(Y, tol=2 * np.sqrt(eps_machine))
print("\nHOSVD...")
T = ttb.hosvd(Y, tol=2 * np.sqrt(eps_machine), sequential=False)
ST-HOSVD...
Computing HOSVD...
Shape of core: (7, 7, 7)
||X-T||/||X|| =  5.49618e-15 <= 0.000000 (tol)

HOSVD...
Computing HOSVD...
Shape of core: (7, 7, 7)
||X-T||/||X|| =  5.62518e-15 <= 0.000000 (tol)

But ST-HOSVD may be slightly faster than HOSVD for larger tensors.

print("Time for 5 runs of ST-HOSVD:")
%timeit -n5 T = ttb.hosvd(Y, tol=2 * np.sqrt(eps_machine), verbosity=0)
Time for 5 runs of ST-HOSVD:
114 ms ± 24.5 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
print("\nTime for 5 runs of HOSVD:")
%timeit -n5 T = ttb.hosvd(Y, tol=2 * np.sqrt(eps_machine), sequential=False, verbosity=0)
Time for 5 runs of HOSVD:
187 ms ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)