The next step towards an efficient Hybrid Solver is to optimize the computation of the Schur complement on each subdomain. In fact, this is the slowest part of the algorithm, and it can be very slow.
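For reference, with the interior unknowns numbered first, the subdomain matrix splits into blocks A_II, A_IB, A_BI, A_BB (I = interior, B = interface), and the Schur complement is S = A_BB - A_BI A_II^{-1} A_IB. The sketch below is a dense, pure-Python illustration only: it assumes A is symmetric positive definite, factors A_II = L L^T once, and forms S = A_BB - W^T W with W = L^{-1} A_IB. The function names are hypothetical, and a real subdomain code would use a sparse factorization instead.

```python
import math


def cholesky(a):
    """Dense Cholesky factorization a = L L^T; returns the lower-triangular L."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def forward_solve(l, b):
    """Solve L x = b for x, with L lower triangular."""
    x = [0.0] * len(l)
    for i in range(len(l)):
        x[i] = (b[i] - sum(l[i][k] * x[k] for k in range(i))) / l[i][i]
    return x


def schur_complement(a, n_interior):
    """S = A_BB - A_BI A_II^{-1} A_IB, assuming the interior dofs come first
    and A is symmetric (so A_BI = A_IB^T)."""
    n, ni = len(a), n_interior
    nb = n - ni
    l = cholesky([row[:ni] for row in a[:ni]])  # A_II = L L^T
    # W = L^{-1} A_IB, computed one interface column at a time
    w = [forward_solve(l, [a[i][j] for i in range(ni)]) for j in range(ni, n)]
    # A_BI A_II^{-1} A_IB = W^T W
    return [[a[ni + p][ni + q] - sum(w[p][k] * w[q][k] for k in range(ni))
             for q in range(nb)] for p in range(nb)]
```

For the 3x3 1D Laplacian tridiag(-1, 2, -1) with two interior unknowns, `schur_complement(A, 2)` returns a 1x1 matrix whose entry is 2 - 2/3 = 4/3 up to rounding.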
First of all, the reordering is special, since the interior degrees of freedom must be numbered first. I use the CAMD (Constrained Approximate Minimum Degree) ordering by Tim Davis et al., which provides satisfactory orderings.
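To illustrate what the constraint means, here is a toy constrained minimum degree ordering: every node in constraint set 0 (interior) is eliminated before any node of set 1 (interface), and within a set the node of smallest degree in the current elimination graph goes first. This is the plain, exact-degree greedy heuristic, not CAMD's approximate-degree algorithm, and `constrained_min_degree` is a hypothetical name. As far as I know, CAMD receives the same information as an integer constraint array holding one set number per node.

```python
def constrained_min_degree(adj, constraint):
    """Greedy minimum degree ordering in which all nodes of constraint set 0
    are eliminated before any node of set 1, and so on.

    adj: dict node -> set of neighbour nodes (symmetric, no self loops)
    constraint: dict node -> int (e.g. 0 = interior, 1 = interface)
    Returns the elimination order as a list of nodes.
    """
    # Work on a copy: the elimination graph gains fill edges as we go.
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    remaining = set(graph)
    order = []
    while remaining:
        # Only nodes of the lowest constraint set still present are eligible.
        current = min(constraint[v] for v in remaining)
        candidates = [v for v in remaining if constraint[v] == current]
        # Smallest current degree wins; ties are broken by the node label.
        pivot = min(candidates, key=lambda v: (len(graph[v]), v))
        nbrs = graph[pivot]
        # Eliminating the pivot turns its neighbourhood into a clique (fill-in).
        for u in nbrs:
            graph[u].discard(pivot)
            graph[u].update(nbrs - {u})
        del graph[pivot]
        remaining.remove(pivot)
        order.append(pivot)
    return order
```

On the chain 0-1-2-3-4 with node 2 marked as interface, the ordering is [0, 1, 4, 3, 2]: the interior nodes come first and the interface node last, which is exactly the block structure the Schur complement needs.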
After that, I am concentrating on the code that computes the Schur complement, and there is a lot of work to do here. At the beginning I was thinking about using MUMPS for this, since it has a subroutine for Schur complement computation and it is multithreaded. By googling around, I found there may be a (slightly) better solution: implementing a sparse Cholesky solver based on a Directed Acyclic Graph (DAG) of tasks. The computational tasks and their dependencies are expressed as an acyclic graph, which is used to schedule the threads that compute the Cholesky factorization. This approach seems to be faster than MUMPS on multicore processors.
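The DAG idea can be sketched on a deliberately tiny case: a dense, right-looking Cholesky in which every "tile" is a single matrix entry. The tasks (POTRF, TRSM, SYRK, GEMM, named after the usual tiled-Cholesky kernels) are listed in sequential order, their dependencies are inferred from read/write hazards on the entries they touch, and a thread pool runs each task as soon as all its predecessors have finished. This is an illustration of the scheduling structure under those simplifying assumptions, not a sparse solver; the function names are hypothetical, and because of Python's GIL it shows the mechanism without any real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def build_tasks(n):
    """Right-looking Cholesky tasks in sequential order, as (name, reads, writes)."""
    tasks = []
    for k in range(n):
        tasks.append((("POTRF", k), set(), {(k, k)}))
        for i in range(k + 1, n):
            tasks.append((("TRSM", i, k), {(k, k)}, {(i, k)}))
        for i in range(k + 1, n):
            tasks.append((("SYRK", i, k), {(i, k)}, {(i, i)}))
            for j in range(k + 1, i):
                tasks.append((("GEMM", i, j, k), {(i, k), (j, k)}, {(i, j)}))
    return tasks


def build_dag(tasks):
    """Predecessor sets from read-after-write, write-after-write and
    write-after-read hazards, following the sequential task order."""
    last_writer, readers = {}, {}
    preds = [set() for _ in tasks]
    for t, (_, reads, writes) in enumerate(tasks):
        for e in reads | writes:
            if e in last_writer:
                preds[t].add(last_writer[e])      # RAW / WAW
        for e in writes:
            preds[t].update(readers.get(e, ()))   # WAR
        for e in reads:
            readers.setdefault(e, set()).add(t)
        for e in writes:
            last_writer[e], readers[e] = t, set()
    return preds


def run_task(a, name):
    """Execute one scalar kernel in place on the matrix a."""
    kind, idx = name[0], name[1:]
    if kind == "POTRF":
        (k,) = idx
        a[k][k] = math.sqrt(a[k][k])  # raises ValueError if not SPD
    elif kind == "TRSM":
        i, k = idx
        a[i][k] /= a[k][k]
    elif kind == "SYRK":
        i, k = idx
        a[i][i] -= a[i][k] ** 2
    else:  # GEMM
        i, j, k = idx
        a[i][j] -= a[i][k] * a[j][k]


def dag_cholesky(a, workers=4):
    """Factor the SPD matrix a (list of lists) in place; returns L."""
    n = len(a)
    tasks = build_tasks(n)
    preds = build_dag(tasks)
    succs = [[] for _ in tasks]
    for t, ps in enumerate(preds):
        for p in ps:
            succs[p].append(t)
    missing = [len(ps) for ps in preds]  # unfinished predecessors per task
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {pool.submit(run_task, a, tasks[t][0]): t
                   for t in range(len(tasks)) if missing[t] == 0}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                t = running.pop(fut)
                fut.result()  # re-raise any exception from the task
                for s in succs[t]:
                    missing[s] -= 1
                    if missing[s] == 0:
                        running[pool.submit(run_task, a, tasks[s][0])] = s
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0  # the upper triangle is never used; clear it
    return a
```

For example, `dag_cholesky([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]])` yields L = [[2, 0, 0], [1, 2, 0], [1, 1, 2]]. Restricting the outer loop of `build_tasks` to the interior columns would leave the Schur complement in the trailing block, which is one way a factorization-based code could obtain it.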