Dense vs. Sparse Matrix

In the previous series, I covered the dense matrices in Mendz.Graphs, which implement them as two-dimensional arrays T[,]. They work good, really... that is, until you start loading graphs with a very large collection of vertices and edges.

In C#, a two-dimensional array requires memory relatively as big as the array's length m x n. Thus, if you have 2500 rows and 1250 columns, that's a memory allotment for 3,125,000 elements. For 2500 x 2500, that's 6,250,000 elements. Depending on your PC or device's memory size, creating a large two-dimensional array can throw an out of memory exception. At a certain point, even .Net itself will prevent you from creating an instance that gets bigger than 2GB (here's a possible workaround).

It doesn't take much to hit the limit. If you are working with thousands and thousands of graph data, Mendz.Graphs.Representations.Matrices.Dense may not be your friend. Fortunately, we have Mendz.Graphs.Representations.Matrices.Sparse to the rescue.

In Mendz.Graphs, a sparse matrix is a (int row, int column) tuple keyed dictionary of T values (ConcurrentDictionary<(int row, int column), T>). All sparse matrix implementations in the Mendz.Graphs.Representations.Matrices.Sparse namespace inherit and extend the SparseGraphMatrixBase abstract class.

As a dictionary containing only the "matrix" elements that are set/valued, it is easy to imagine how much memory can be saved. Memory is allotted only for what's needed by the dictionary. There is no real matrix. The matrix is virtual. This is easy to see in how SparseGraphMatrixBase implements its indexer property.

        public virtual T this[int row, int column]
        {
            get
            {
                Matrix<T>.CheckCoordinates(Size, row, column);
                if (Matrix.ContainsKey((row, column)))
                {
                    return Matrix[(row, column)];
                }
                else
                {
                    return default(T);
                }
            }
        }

The SparseGraphMatrixBase indexer simulates accessing a matrix entry by row i and by column j. In code, it looks like this: matrix[i, j]. If that code looks familiar, that's because it's exactly how you access elements in a two-dimensional array. If the indexes i and j are between the virtual matrix's size of m rows x n columns, an element/entry value can be retrieved. Otherwise, the caller gets an argument out of range exception.

In summary, Mendz.Graphs.Representations.Matrices.Sparse solves the memory hogging behaviors of the dense matrix. If your graph's matrix is literally "sparse", consider using Mendz.Graphs sparse matrices instead.

But first, we have to build them... ;-)

Note that Mendz.Library's Matrix and SquareMatrix static classes have been enhanced with CheckCoordinates() methods. Visit the Mendz.Library Matrix and SquareMatrix article to learn more. In order to promote modularity, SparseGraphMatrixBase has also been updated to use the new Matrix.CheckCoordinates() feature.

Comments