Abstract
The method of kinematic synthesis requires finding the solution set of a system of polynomials. Parameter homotopy continuation is used to solve these systems and requires repeatedly solving systems of linear equations. For kinematic synthesis, the associated linear systems become ill-conditioned, resulting in a marked decrease in the number of solutions found due to path tracking failures. This unavoidable ill-conditioning places a premium on accurate function and matrix evaluations. Traditionally, variables are eliminated to reduce the dimension of the problem. However, this greatly increases the computational cost of evaluating the resulting functions and matrices and introduces numerical instability. We propose avoiding the elimination of variables to reduce required computations, increasing the dimension of the linear systems, but resulting in matrices that are quite sparse. We then solve these systems with sparse solvers to save memory and increase speed. We found that this combination resulted in a speedup of up to 250 × over traditional methods while maintaining the same accuracy.
1 Introduction
Many problems arising in science and engineering require solving a system of polynomial equations. The systems relevant to this paper are complex and have many roots. There are many possible methods for finding these roots: Newton’s method [1], interval analysis [2], and Gröbner bases [3], to name a few. However, the method of homotopy continuation [4] is often chosen for large polynomial systems due to its ability to be split into a series of independent subproblems. This makes homotopy continuation an attractive choice for parallel computing.
Homotopy continuation solves a polynomial system by first solving a similar polynomial system with easy-to-find roots, then continuously deforming these start points into the roots of the target system. The deformation of each root can occur independently of every other root, leading to the method being termed “embarrassingly parallel” and lending itself well to multi-threaded computing.
Kinematic synthesis of robotic devices and mechanisms requires finding the values of physical dimensions that satisfy a number of constraints specified on position, velocity, acceleration, or some combination of the three. Finding the dimension values (i.e., roots of polynomial systems that form constraints) is typically achieved through either optimization or solving the algebraic equations directly. Homotopy continuation is often applied in the exact synthesis of mechanisms. The synthesis equations associated with such problems grow exponentially as more sophisticated mechanical systems are designed [5].
Traditionally in kinematic synthesis, the rotation operators are algebraically eliminated to reduce the dimensionality of the problem. The result is fewer equations and variables, but with the tradeoff of increasing the complexity of the resulting equations. In this paper, “uneliminated” refers to forgoing algebraic elimination of the original kinematic synthesis equations, while “eliminated” refers to having performed the algebraic elimination needed to reduce the number of equations and variables of the polynomial systems to be solved.
When computing all roots to a system of polynomials for the first time, the prior art has devised several sophisticated methods of constructing start systems and start points. Some of these include multi-homogeneous homotopy [6], polyhedral homotopies [7], regeneration [8], statistics-based accumulations of roots [9], and monodromy methods [10]. However, once an initial minimal set of roots has been found, the method of parameter homotopy continuation is known to be the most efficient [11]. This method first selects a parameterization of the polynomial variety of interest (which may be the monomial coefficients themselves, but generally is not), then forms a homotopy directly on those parameters. This is opposed to forming a homotopy on the constituent polynomials directly (which would be identical to selecting the monomial coefficients as parameters). The efficiency of such an approach stems from the fact that it tracks only the homotopy paths corresponding to the minimal number of finite start points that, for numerically general [12] parameters, maintain their finiteness and multiplicity over the entirety of the homotopy. To start with such a minimal number of paths, parameter homotopy leverages solutions found from other forms of homotopy to simplify subsequent computations, constituting what is termed a numerical reduction of the polynomial system. Systems having the same structure with different parameters occur often in science and engineering, making parameter homotopy continuation suitable for many applications.
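For illustration, a minimal parameter homotopy between numerically general start parameters q0 (with known roots) and target parameters q1 can be written as follows; this is a generic sketch and not necessarily the exact homotopy defined later as Eq. (2).

```latex
h(z, t) = f\bigl(z;\, q(t)\bigr) = 0, \qquad
q(t) = (1 - t)\,q_0 + t\,q_1, \qquad t \in [0, 1],
```

so that each known root of f(z; q0) at t = 0 is tracked along its own path to a root of f(z; q1) at t = 1.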
A common practical usage of homotopy continuation involves two phases of computation. In the first phase, an ab initio computation is performed using a generic set of parameters. This initial computation might be a multi-homogeneous homotopy, a polyhedral homotopy, regeneration homotopy, or a monodromy-based accumulation. This step has been generally understood as a resource-intense, one-time computation. In the second phase, the end points from an ab initio computation are repurposed as start points for repeated parameter homotopy computations. These parameter homotopies are repeated for different target parameters as necessarily required by some science or engineering application. These ensuing parameter homotopies are expected to compute quickly.
This expectation is true for systems that track from numerically general start parameters to numerically general final parameters. Numerically general refers to randomly generated complex numbers that have no special relation to one another. But science and engineering applications often require tracking to less general final parameters, stemming from physical problems that impart some pattern on these parameters. For these systems, using final parameters that have physical meaning leads to severe ill-conditioning that overwhelms the accuracy of path tracking when using native precision [13]. Higher precision may overcome this, but at the cost of increased computation time.
In homotopy continuation, ill-conditioning can occur when paths become sufficiently close in the tracking space as two or more points merge to create a solution of higher multiplicity. Special techniques such as the gamma trick [14] have been developed to avoid ill-conditioning at midpoints during path tracking. Additionally, it can be shown with the Theorem of Sard [15] that careful choice of start parameters yields paths that have a very low chance of being ill-conditioned at a midpoint. Most of the ill-conditioning, however, occurs towards the end of the path, near the target system. Traditionally, special endgame methods [16] are used to successfully approximate these singular solutions and have been implemented in software packages such as hompack90 [17], phcpack [18], polsys_plp [19], and polsys_glp [20]. The methodology presented in this paper applies to finding ill-conditioned, but nonetheless nonsingular and isolated, roots relevant to kinematic synthesis with 64-bit precision. This differs from the endgame techniques [16], which were developed to compute singular roots of high multiplicity.
Endgame methods function in an “operating zone” between a ball defined by the nearest branch point and a zone of ill-conditioning [21,22]. When the ill-conditioning zone expands beyond the nearest branch point, the operating zone is nonexistent and the endgame will fail. Increasing working precision effectively decreases the size of the ill-conditioning zone. However, using non-native precision (>64 bits on modern CPUs) leads to a large decrease in floating point throughput, roughly 10 × slower. One of the contributions of this paper is to compute these endpoints in the absence of higher precision libraries.
1.1 The Use of Polynomial Homotopy Continuation in Kinematic Synthesis.
For kinematic synthesis, the number of paths a problem requires tracking correlates with the amount of ill-conditioning: for problems with fewer paths, the degree of ill-conditioning tends to be much less. Problems in kinematics such as those studied in Ref. [23] have been able to successfully track all paths. However, for problems such as those found in this paper, both the number of paths required to track and the nonlinearity of the equations increase, and the ill-conditioning reaches the point that practical usage for engineering design becomes intractable. The problem studied by Newkirk et al. required tracking 128 paths for a system of seven quadratic equations, whereas one of the problems investigated in this paper requires tracking 1,122,022 paths for a system of seven quartic and 14 quintic equations. High failure rates have additionally been observed previously in similar kinematic synthesis problems with a large number of paths [10,24,25]. The correlation between the number of paths required to track and the condition number is particular to kinematic synthesis; in general, the two are independent properties. It is entirely possible to conceive of small, ill-conditioned problems as well as large, well-conditioned problems outside the realm of kinematic synthesis.
We make this point in the conceptual diagrams of Fig. 1: parameter homotopy continuation works well on problems with fewer paths, but in the exercises taken from Sec. 4 of this paper, numerical failures increased by 26% and compute time increased by 63% when parameter homotopies were executed with physical parameters rather than numerically generic parameters. This effect greatly hinders the efficiency that parameter homotopy is expected to have.
To further prove this point, we performed a perturbation analysis on the same problem. We started with two sets of parameters: one generic set selected at random and one set of physical parameters. We found the solutions to both sets and then generated 10 sets of randomly perturbed parameters, with each parameter individually perturbed by noise of ± 0.001 drawn from a uniform distribution. As tracking all 1,122,022 solutions of the original sets would be infeasible, a representative set of 1000 solutions was chosen at random and used for all runs in this analysis. Solutions were found with Bertini [22] in quad precision using the power-series endgame method. The solutions of the perturbed parameters were then identified and compared element-wise against the unperturbed solutions, and the average norm of the deviation between the two was calculated. This average deviation was taken over all complex solutions for both generic and physical parameters.
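The sketch below illustrates how such a perturbed parameter set can be generated. The complex-valued parameter container and the separate perturbation of real and imaginary parts are assumptions made for illustration; the actual data layout used in the study may differ.

```cpp
#include <complex>
#include <random>
#include <vector>

// Sketch: perturb each parameter independently with uniform noise of +/-0.001,
// as described above. The amplitude and the complex parameter vector are
// illustrative assumptions.
std::vector<std::complex<double>> perturb(const std::vector<std::complex<double>>& params,
                                          double amplitude = 0.001,
                                          unsigned seed = 0) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> noise(-amplitude, amplitude);
    std::vector<std::complex<double>> out;
    out.reserve(params.size());
    for (const auto& p : params) {
        // Perturb real and imaginary parts separately.
        out.emplace_back(p.real() + noise(rng), p.imag() + noise(rng));
    }
    return out;
}
```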
The result was that, for the generic set of parameters, an average deviation of 2.6 was calculated between the perturbed and unperturbed sets. For the set of physical parameters, an average deviation of 1.612 × 10^11 was observed. It is clear that, for the same problem, changing from generic to physical parameters greatly increased the sensitivity even though the fundamental polynomial structure of the problem was unchanged.
The effect is pronounced enough that many would-be practical implementations of parameter homotopy are pushed out of reach by the need to integrate non-native higher precision floating point operations (FLOPS). In our experience, the transition from 64-bit (double) to 128-bit (quad) arithmetic accounts for at least a 10 × increase in compute times. The contribution of this paper is to show how to surmount such barriers through the combination of a specific algebraic setup and sparse linear algebra routines. Our resulting algorithms compute with the speed of doubles, but the success of quads. Kinematic synthesis is one class of engineering problems that stands to benefit from this work.
2 Background
2.1 Parameter Homotopy Continuation.
2.2 Path Tracking.
The process of path tracking involves iteratively predicting the next point zk+1 as tk increments from 0 to 1. Path tracking is performed with a predictor–corrector technique. Nominally, at tk, a Runge–Kutta iteration computes an estimate of zk+1 at tk+1. This estimate is corrected by cycling it through a few iterations of Newton’s method applied to (2).
Good prediction points are usually favored over increasing the number of correction steps due to path jumping. Path jumping occurs when a prediction point is placed in the zone of convergence of a different yet nearby path. The correction steps then cause the current path to converge to the solution curve of the nearby path, resulting in a repeated solution rather than two distinct solutions. Placing emphasis on good predictions helps mitigate this risk.
Path tracking is therefore the process of executing prediction and correction steps iteratively as t is perturbed from t = 0 to t = 1. At t = 1, there are three possible outcomes for a path:
The path becomes a distinct root f(z*; q(1)) = 0.
Two or more paths converge to the same root to become a root of multiplicity > 1.
Variable values tend toward infinity, termed points at infinity.
Most of the computation in path tracking is spent solving linear systems of equations. Depending on the choice of integrator, (5) must be solved multiple times for each prediction step: six times for Runge–Kutta–Fehlberg and eight times for Dormand–Prince. Additionally, the correction equation (7) must be solved once for every iteration. To give an estimate, a path tracker taking 200 prediction–correction steps using Dormand–Prince for the prediction with two iterations of Newton’s method to correct must solve 200 × (8 + 2) = 2000 linear systems to track a path from t = 0 to t = 1. Solving these linear systems therefore is the heart of path tracking.
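The sketch below illustrates the structure of a single prediction–correction step using dense linear algebra. For brevity, a single Euler prediction solve stands in for the six or eight solves of an embedded Runge–Kutta method, and evalF, evalJv, and evalJp are hypothetical placeholders for the problem-specific evaluation routines.

```cpp
#include <Eigen/Dense>
#include <complex>

using Vec = Eigen::VectorXcd;
using Mat = Eigen::MatrixXcd;

// Hypothetical problem-specific evaluation routines (placeholders only):
Vec evalF (const Vec& z, const Vec& q);   // f(z; q)
Mat evalJv(const Vec& z, const Vec& q);   // Jacobian of f with respect to z
Mat evalJp(const Vec& z, const Vec& q);   // Jacobian of f with respect to q

// One predictor-corrector step along the straight-line homotopy
// q(t) = (1 - t) q0 + t q1 (an assumed form for this sketch).
void predictorCorrectorStep(Vec& z, const Vec& q0, const Vec& q1,
                            double& t, double dt) {
    const Vec dq = q1 - q0;                 // dq/dt for a straight-line homotopy
    Vec q = (1.0 - t) * q0 + t * q1;

    // Prediction: solve Jv * (dz/dt) = -Jp * (dq/dt), then take an Euler step.
    // An embedded Runge-Kutta predictor would repeat this solve 6-8 times.
    Vec dz = evalJv(z, q).partialPivLu().solve(-evalJp(z, q) * dq);
    z += dt * dz;
    t += dt;
    q  = (1.0 - t) * q0 + t * q1;

    // Correction: a few Newton iterations, each requiring one more linear solve.
    for (int i = 0; i < 2; ++i) {
        Vec delta = evalJv(z, q).partialPivLu().solve(-evalF(z, q));
        z += delta;
    }
}
```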
2.3 Path Failures and Numerical Precision.
Paths whose outcomes do not fall under the conditions listed in the previous section are considered failures. Path failures can happen for a variety of reasons, but the most common and relevant reason is ill-conditioning of the matrices involved in solving the linear systems. In homotopy continuation, ill-conditioning occurs at t* when f(z; q(t*)) approaches a near-singular condition, even if an actual singularity does not occur. It is important to note that in most homotopy continuation settings, the start parameters q0 are specially chosen such that there is a very low probability that the polynomial system truly has a root of multiplicity higher than one [12].
In the correction step, when solving a linear system whose matrix has condition number C, it is generally expected that log10(C) digits of precision are lost [13]. Numbers stored in double precision can expect 15–17 significant decimal digits [28]. When path tracking, ill-conditioned matrices often prevent a path from converging back to the curve (2) by restricting the number of digits that Newton’s method is able to correct. The path is then not a good estimate of a root for the next prediction step. This causes the path to rapidly diverge from (2), preventing it from tracking successfully.
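As an illustration, the expected loss of digits can be estimated directly from the Jacobian’s condition number; the sketch below forms the estimate from a singular value decomposition.

```cpp
#include <Eigen/Dense>
#include <cmath>

// Estimate the number of significant decimal digits lost when solving a
// linear system with matrix A, using log10 of the 2-norm condition number
// (ratio of largest to smallest singular value).
double digitsLost(const Eigen::MatrixXcd& A) {
    Eigen::JacobiSVD<Eigen::MatrixXcd> svd(A);   // singular values only
    const auto& s = svd.singularValues();        // sorted in decreasing order
    double cond = s(0) / s(s.size() - 1);
    return std::log10(cond);  // e.g., cond = 1e12 implies ~12 digits lost,
                              // leaving only 3-5 of the 15-17 double digits
}
```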
A natural remedy for this issue is to simply increase the working precision. Quad precision numbers can expect 33–36 significant decimal digits [28], with the possibility of extending that number further using multiprecision libraries. However, on current hardware, special routines are required in order to perform arithmetic in precision higher than that of a double. As a general rule, using quad math results in a runtime approximately five times longer than double math [29].
As solving linear systems is a major computational cost of path tracking, it is important to observe the structure of the linear systems to be solved. Solving a generic dense linear system has a complexity of O(n^3), where n is the number of variables. Therefore, the number of variables of the system has a great impact on the time it takes to solve it. For the problem of kinematic synthesis, it is possible to influence the number of equations and variables in the resulting system.
2.4 Application: Kinematic Synthesis of Mechanisms.
Kinematics is the geometry of motion. The goal of kinematic synthesis of linkages is to discover the dimensions of a mechanism, that is, its lengths and angles, that produce a desired motion designated by a set of tasks. These tasks are defined as constraints on position, velocity, acceleration, or some combination of the three.
The dimensions of the mechanism are found by first defining the loop equations. These are a set of vector equations that define the geometry that must be conserved throughout the motion of the linkage. This geometry consists of a sequence of rotation and translation operations with sets of moving and fixed points. There are multiple ways to define these rotations and translations. One way is to use the addition of vectors to define translations, and elements of SO(2) for rotations. Alternatively, Wampler [30] used isotropic coordinates which reduce the problem to the addition and multiplication of complex numbers, leading to a more direct formulation of polynomials. This paper uses isotropic coordinates to define all of the loop equations.
Rotation operators can be eliminated by solving the loop equations for a rotation operator and its conjugate, then substituting the result into the normalization conditions. Rotation operators are usually eliminated whenever possible to reduce the number of variables, and thus the time it takes to solve the corresponding linear systems.
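As a generic sketch of this elimination (the actual loop equations appear in Secs. 4.1.1 and 4.2.1, and their exact expressions differ), suppose a loop equation and its conjugate are linear in a rotation operator Q and its conjugate Q̄, with coefficients a and b that are polynomial in the remaining unknowns:

```latex
a\,Q + b = 0, \qquad \bar{a}\,\bar{Q} + \bar{b} = 0, \qquad Q\,\bar{Q} = 1
\;\;\Longrightarrow\;\;
Q = -\frac{b}{a}, \quad \bar{Q} = -\frac{\bar{b}}{\bar{a}}
\;\;\Longrightarrow\;\;
b\,\bar{b} - a\,\bar{a} = 0 .
```

Two equations and the unknown pair (Q, Q̄) are removed, but the products b b̄ and a ā multiply out into many more monomials.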
However, the resulting equations grow in complexity as operators are eliminated, rapidly increasing the cost of evaluating f(z; q(t)), Jv, and Jp. Additionally, the exponential increase in operations introduces instability due to the intrinsic propagating error of floating point operations. Leaving the equations uneliminated reduces the number of floating point operations needed to evaluate the Jacobians but increases the dimension of the linear systems that need to be solved (computation time scales with the cube of the dimension for dense linear solving). For the problem of kinematic synthesis in particular, we know that leaving the equations in their uneliminated form means that Jv and Jp are particularly sparse matrices. Much work has been done in creating special routines for solving sparse linear systems that save memory and operations by working only with the nonzero elements of the matrix [31].
In this paper, we demonstrate how to overcome the numerical failures and slow compute times of parameter homotopies aimed at physical parameters. This is done by opting for equivalent formulations of polynomial systems that comprise more equations and variables, and by writing predictor–corrector tracking routines that use sparse linear algebra libraries. Such a setup carries more information from step to step across a homotopy through more variables rather than by representing fewer variables with more bits. Our technique appreciably reduces numerical failures without needing to resort to higher precision arithmetic. The latter point is important because the throughput of floating point operations on non-native (>64-bit) numbers is dramatically slower than that of native precision. The usage of more equations and variables naturally leads to sparse Jacobians, which we handle inside our path tracking algorithm with appropriate sparse routines. Otherwise, the proposed benefit would be more than canceled by the cubic scaling of compute times for solving linear systems of increasing dimension.
2.5 Sparse Linear Solvers.
Matrices with a large percentage of zeros occur in many areas of science and engineering. These include structural analysis, analysis of network and power distribution systems, and numerically solving differential equations [32]. Typically these matrices are large enough to justify special methods that deal only with nonzero elements.
Sparse matrices are typically described in triplet form, where the row, column, and value of each nonzero entry are specified. Any entry not specified is assumed to be zero. This data structure is easily converted into other data structures, such as compressed-column or compressed-row forms, to allow for memory savings. Special algorithms are tuned for these data structures to facilitate matrix–matrix or matrix–vector computations [31].
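For instance, with Eigen (the linear algebra library used for the implementation in Sec. 3), a matrix given in triplet form can be assembled directly into compressed-column storage; the entries below are arbitrary illustrative values.

```cpp
#include <Eigen/Sparse>
#include <vector>

// Assemble a sparse matrix from triplet form (row, column, value); entries
// not listed are implicitly zero. setFromTriplets converts the triplets into
// Eigen's default compressed-column storage.
Eigen::SparseMatrix<double> fromTriplets(int rows, int cols) {
    std::vector<Eigen::Triplet<double>> entries;
    entries.emplace_back(0, 0, 4.0);   // a(0,0) = 4
    entries.emplace_back(1, 2, -1.5);  // a(1,2) = -1.5
    entries.emplace_back(2, 1, 3.0);   // a(2,1) = 3

    Eigen::SparseMatrix<double> A(rows, cols);
    A.setFromTriplets(entries.begin(), entries.end());
    A.makeCompressed();  // ensure fully compressed storage
    return A;
}
```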
Factoring a sparse matrix A into L and U components must be done carefully to avoid fill-ins. A fill-in is the generation of a new nonzero element in a position in either L or U that is zero in the corresponding position of A. Sparse matrices gain their speedup in computation in part by avoiding superfluous fill-in operations. Rose and Tarjan [33] were the first to describe a process for factoring A to reduce fill-ins. To do this, they created a directed graph from A where edge (i, j) exists if aij ≠ 0. This adjacency graph can then be re-ordered, which has the effect of permuting the rows and columns of A. This re-ordering is done in a way that reduces the number of fill-ins, though not necessarily to a minimum, as finding an ordering that produces a minimal number of fill-ins is an NP-complete problem [33,34].
There exist many methods of re-ordering the adjacency graph to reduce fill-ins. Some popular orderings include Cuthill–McKee [35], minimum degree [36,37], minimum deficiency [38], and nested dissection [39]. This re-ordering is also known as symbolic factorization, after which numerical factorization is performed to actually compute the L and U components.
Numerical factorization routines include supernodal [40], frontal [41], and multifrontal [42] methods. Numerical factorization must be performed for each decomposition; however, if multiple A matrices have the same sparsity pattern but different coefficients, the symbolic factorization only needs to be performed once.
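This reuse maps directly onto homotopy path tracking, where every Jacobian along a path shares one sparsity pattern. A minimal sketch with Eigen’s built-in SparseLU and a COLAMD ordering is shown below; the implementation described in Sec. 3 instead uses SuperLU with a minimum degree ordering, and evalSparseJv is a hypothetical placeholder for the problem-specific Jacobian evaluation.

```cpp
#include <Eigen/Sparse>
#include <complex>
#include <vector>

using Cplx  = std::complex<double>;
using SpMat = Eigen::SparseMatrix<Cplx>;
using Vec   = Eigen::VectorXcd;

// Hypothetical placeholder: evaluates the sparse Jacobian Jv at step k of a
// path; every Jv along the path shares the same sparsity pattern.
SpMat evalSparseJv(int k);

// Symbolic factorization (ordering) is performed once on the shared pattern;
// only the numerical factorization is repeated as the coefficients change.
void factorAlongPath(int numSteps, std::vector<Vec>& rhs) {
    Eigen::SparseLU<SpMat, Eigen::COLAMDOrdering<int>> lu;

    SpMat J0 = evalSparseJv(0);
    J0.makeCompressed();
    lu.analyzePattern(J0);             // once per path: symbolic factorization

    for (int k = 0; k < numSteps; ++k) {
        SpMat Jv = evalSparseJv(k);
        Jv.makeCompressed();
        lu.factorize(Jv);              // every step: numerical factorization only
        Vec x = lu.solve(rhs[k]);      // e.g., a Newton correction solve
        rhs[k] = x;
    }
}
```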
3 Implementation
In this section, we detail the implementation of our proposed method. The code was written in C++, using Eigen 3 [43] as the linear algebra library for the matrices and solvers, both sparse and dense. We chose the SuperLU [44] package for our sparse solver using a minimum degree ordering for symbolic factorization. This ordering is performed once, and then used in all subsequent numerical factorizations. Microsoft’s Visual Studio was used as the development environment. The code was compiled using the Microsoft Visual C++ 14.2 compiler to run on Windows x64 architecture.
The elements of Jv were calculated using Mathematica 12, with the resulting expressions being converted to C++. From these expressions, we were able to evaluate the number of FLOPS required to evaluate Jv. This was done by counting the number of mathematical operations (addition, subtraction, multiplication, division, and power) in each expression. Then, knowing the number of FLOPS required for each operation when using complex numbers (two for addition/subtraction and six for multiplication/division), we calculated the total number of FLOPS over all the operations. The results and sparsity pattern for the four-bar problem can be found in Fig. 2. For the first example problem below (four-bar synthesis), the eliminated Jv required 64 × more FLOPS to evaluate than its uneliminated counterpart. For the second example problem below (six-bar synthesis), the eliminated Jv required 3 × the FLOPS to evaluate (see Fig. 3).
Equation (15) is solved to calculate the next predicted point in the sequence using a chosen integration method for a given step size Δs. A row of n + 1 random coefficients is appended to the matrix to square it up, and an extra 1 is appended to the vector of zeros on the right-hand side.
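A sketch of this squaring-up operation is given below; since Eq. (15) is not reproduced here, the rectangular matrix M and its dimensions are placeholders used only for illustration.

```cpp
#include <Eigen/Dense>
#include <complex>
#include <random>

// Square up an n x (n + 1) system M * v = 0 by appending a row of n + 1
// random coefficients and appending a 1 to the right-hand side, as described
// above, then solve for v. M stands in for the matrix of Eq. (15).
Eigen::VectorXcd squareUpAndSolve(const Eigen::MatrixXcd& M, std::mt19937_64& rng) {
    const int n = static_cast<int>(M.rows());           // M is n x (n + 1)
    std::uniform_real_distribution<double> u(-1.0, 1.0);

    Eigen::MatrixXcd A(n + 1, n + 1);
    A.topRows(n) = M;
    for (int j = 0; j <= n; ++j)                        // random last row
        A(n, j) = std::complex<double>(u(rng), u(rng));

    Eigen::VectorXcd b = Eigen::VectorXcd::Zero(n + 1);
    b(n) = 1.0;                                         // extra 1 on the RHS

    return A.partialPivLu().solve(b);
}
```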
Finally, the number of prediction–correction steps is logged and, if a path exceeds a prescribed number of steps, tracking halts and the path is declared a failure. This keeps very poorly behaving paths from using excessive computation time.
4 Demonstration
In this section, we demonstrate our method on two kinematic synthesis problems. The first is the four-bar synthesis problem for nine path points. This is a classical problem in kinematic synthesis, making it a nice benchmark. The first ab initio solution to this problem found 8652 roots and was computed in 1992 by Wampler et al. [14]. Despite this landmark result, subsequent parameter homotopies directed at physical parameters still present a computational bog, as we illustrate below, albeit at a smaller scale. The second problem is the Stephenson IIB timed curve synthesis, which seeks to design a planar six-bar mechanism. This problem is larger (estimated to have 1,122,022 roots) and was solved for the first time in 2020 [10].
For both these problems, the number of solutions found and the runtime are compared for dense and sparse implementations, eliminated (small) and uneliminated (large) equation sets. Additionally, we will compare our method to that of BERTINI [22], a sophisticated homotopy continuation solver that represents the state of the art. All computations referenced in this section were executed on an Intel i7-8700K 3.7GHz processor running on a single thread.
4.1 Four-Bar Synthesis for Nine Path Points.
The goal is to design a four-bar linkage that guides a trace point attached to its coupler link through a set of N points Pj, j = 0, …, N − 1. In a complex plane, we seek to find the two points of fixed pivots A = Ax + iAy and B = Bx + iBy as well as the locations of the moving pivots C = Cx + iCy and D = Dx + iDy that define the mechanism. The points C and D connect to the first task position P0 to define how the trace point connects to the coupler link.
4.1.1 Loop Equations.
4.1.2 Algebraic Elimination.
4.1.3 Ab Initio.
In a previous work, Plecnik and Fearing [45] were able to generate the complete starting set of 8652 roots using their finite root generation method. This set was used as the {} starting values for the eliminated problem, rather than re-computing the set again in an ab initio run. Additionally, we note that although it is well known that solutions may be organized into sets of six that reduce the required number of startpoints to 1442, we chose to benchmark on the full set of 8652.
The uneliminated problem used the same set as its starting values. However, it was necessary to find the {}, j = 1, …, 8, for each root as well. To do this, we solved the inverse kinematics problem by substituting the known starting values into (35) as well as the starting parameters {}, j = 1, …, 8. We then solved the substituted system and (34) for the corresponding set of two {} values. Similar substitutions were performed on (36) and solved with (34) to find a second set of {} pairs. The values that matched between the two sets were chosen as the correct value for the jth parameter, j = 1, …, 8.
Then, for each value, the corresponding values were found by substituting the known values into (28) and (30) and solving the resulting equations for . Similar substitutions into (29) and (31) yielded . It was in this way that the 8652 roots of the starting uneliminated system were generated from the eliminated set.
4.1.4 Path Tracking Results.
Both the eliminated and uneliminated polynomial systems were solved using the same set of target parameters, taken from problem 1 in Ref. [14]. Each of the 8652 paths was tracked single-threaded on the same hardware. A path at z* was declared a success at t = 1 if ||f(z*; q(1))|| < σ with a cutoff value of σ = 1 × 10^−12. The results can be found in Table 1.
Equation type | Linear solver type | Precision | Success rate (%) | Time (s) |
---|---|---|---|---|
Eliminated | Dense | Double | 89.4% | 2413 |
Eliminated | Dense | Quad | 98.4% | 751,305 |
Eliminated | Sparse | Double | 82.8% | 9536 |
Uneliminated | Dense | Double | 95.1% | 16,001 |
Uneliminated | Sparse | Double | 96.4% | 3002 |
Notes: The baseline performance of using the eliminated equations with a dense linear solver in double precision yielded a modest success rate. This success rate was improved by increasing precision from double to quad, but doing so greatly increased the runtime. Using the eliminated equations with a sparse solver did not improve performance or accuracy, likely because the associated linear systems are fully dense. In contrast, using the uneliminated formulation with a dense linear solver had a good success rate, but a longer runtime due to the increase in dimensionality over the eliminated formulation. Finally, our proposed methodology of using the uneliminated formulation with a sparse linear solver (highlighted) yielded a success rate comparable to using the eliminated formulation with quad math, but with a greatly improved runtime.
Solving the eliminated system with a dense linear solver in double precision achieved an 89.4% success rate in 2413 s ( ≈ 40.2 min). In contrast, solving the uneliminated system with a sparse solver took longer, 3002 s ( ≈ 50 min), but achieved a success rate of 96.4%.
4.2 Stephenson IIB Timed Curve Generator.
The ab initio solutions of all six-bar timed curve path generators were first computed by Baskar and Plecnik [10]. We select one six-bar timed curve generator, the Stephenson IIB, to further benchmark our technique contributed in this work. The synthesis of timed curve path generation refers to finding linkage dimensions that move a trace through desired points as coordinated with the angle of an input link. Such a synthesis approach enables a designer to prescribe the transmission ratio of forces as well as the motion of the mechanism.
4.2.1 Loop Equations.
4.2.2 Algebraic Elimination.
4.2.3 Ab Initio.
The set of starting roots for the uneliminated problem has been previously computed in Ref. [10]. This set was found using a monodromy method to find 1,122,022 roots. For the eliminated problem, the same set can be used directly by omitting {Qj, Tj, Vj} from each root of the uneliminated start set. This subset satisfies the eliminated equations (56)–(58).
4.2.4 Path Tracking Results.
To quantify the advantage of our approach, it is sufficient to select a representative subset from the 1,122,022 roots and extrapolate. A random selection of 10,000 roots was made to serve as start points for parameter homotopies for benchmarking. This randomized set was used for both the uneliminated and the eliminated problems. All 10,000 paths were tracked single-threaded on the same hardware. The task positions were taken from problem T-1 in Ref. [10]. As in the previous problem, a path at z* was declared a success at t = 1 if ||f(z*; q(1))|| < σ with a cutoff value of σ = 1 × 10^−12. The results can be found in Table 2.
Equation type | Linear solver type | Precision | Success rate (%) | Runtime (s) |
---|---|---|---|---|
Eliminated | Dense | Double | 75.8% | 2654 |
Eliminated | Dense | Quad | 97.8% | 184,569 |
Eliminated | Sparse | Double | 72.2% | 1923 |
Uneliminated | Dense | Double | 88.1% | 12,033 |
Uneliminated | Sparse | Double | 97.2% | 3394 |
Notes: The baseline performance of using the eliminated equations with a dense linear solver in double precision yielded a poor success rate. This success rate was improved by increasing precision from double to quad, but doing so greatly increased the runtime. Using the eliminated equations with a sparse solver did not improve accuracy but gave a slight improvement in runtime, likely due to the semi-sparse structure of the linear systems for the eliminated problem (see Fig. 3(a)). Using the uneliminated formulation with a dense linear solver had an improved success rate, but a longer runtime due to the increase in dimensionality over the eliminated formulation. Finally, our proposed methodology of using the uneliminated formulation with a sparse linear solver (highlighted) again yielded a success rate comparable to using the eliminated formulation with quad math, but with a greatly improved runtime.
The smaller eliminated system running a dense linear solver in double precision took 2654 s ( ≈ 44.2 min) to successfully track 75.8% of the 10,000 paths. Running the uneliminated system with a sparse solver took 3394 s ( ≈ 56.6 min) but achieved a success rate of 97.2%. Quad precision was required to achieve a similar success rate for the eliminated system, achieving a 97.8% success rate in 184,569 s ( ≈ 2.1 days) of computation time. Our approach combining uneliminated algebraic systems with sparse linear solvers created a 54 × speedup over quad precision while maintaining a comparable accuracy level.
5 Discussion
For both problems, the eliminated version ran faster than the uneliminated version when both used dense linear solvers. This is expected as the linear systems in the uneliminated problems are larger than those of the eliminated. However, path tracking the uneliminated problem with sparse linear solvers was more accurate than tracking the eliminated problem with dense methods, with the difference becoming more pronounced in the more complex Stephenson IIB timed curve problem. Achieving a similar level of accuracy in the eliminated problem required moving to quad precision, but doing so paid a heavy penalty in runtime.
To highlight the necessity of using both the uneliminated equations and sparse routines, the success rates and times for eliminated/sparse and uneliminated/dense implementations have also been reported. For both problems, simply using a sparse solver did not by itself improve accuracy: the eliminated/sparse implementation saw a decrease in accuracy over its eliminated/dense counterpart. For the four-bar problem, there was an increase in runtime, while for the six-bar problem there was a decrease in runtime. This is likely because the eliminated Jv in the six-bar problem is still relatively sparse, while in the four-bar problem the eliminated matrix is fully dense. Additionally, for the uneliminated implementation of both problems, switching from a sparse to a dense linear solver resulted in a decrease in success rate and an increase in runtime.
When comparing the performance of our implementation with that of BERTINI, we found that for the four-bar path synthesis problem, BERTINI was able to successfully track 83% of paths in 218 s ( ≈ 3.6 min) using double precision. Using quad precision, BERTINI successfully tracked 96% of paths in 19,237 s ( ≈ 5.3 h).
For the Stephenson IIB timing problem, BERTINI was able to find 8523 solutions in 338 s ( ≈ 5.6 min) for a success rate of 85% using double precision, and 9670 solutions in 55,652 s ( ≈ 15.5 h) for a success rate of 97% using quad precision. At this higher success rate, our uneliminated/sparse implementation had a speedup over BERTINI of 6.4 × for the four-bar problem and 16.4 × for the Stephenson IIB problem.
Extrapolating the results for 10,000 paths to the full 1,122,022 paths required for the Stephenson IIB problem, we estimate that our method of sparse solvers and uneliminated systems distributed over a high-performance computing cluster of 192 processors would take 1983 s ( ≈ 33 min). Similarly, we estimate that BERTINI would take 6,244,277 s of total computation ( ≈ 9 h on the same 192-processor cluster) to solve the same set to a comparable accuracy using quad precision.
It is important to note that Bertini implements additional sophisticated features that boost speed and accuracy which are not included in our basic sparse/uneliminated implementation. These features include automatic differentiation and endgame algorithms [22]. In light of this, the comparisons made in Tables 1 and 2 are experimentally more controlled and meaningful. BERTINI exclusively uses dense linear solvers, as it is intended to solve any general system of polynomials with no guarantee of any special structure.
6 Conclusion
This paper presents a novel methodology for quickly and accurately solving kinematics equations without requiring non-native precision, by both solving the original, uneliminated equations and using sparse linear solvers on the systems associated with path tracking. This provides a method for calculating the roots of a polynomial system using parameters associated with physical problems. These parameters are known to cause the associated matrices to become ill-conditioned enough that practical implementations of parameter homotopy on physical problems are greatly compromised by speed and precision setbacks. The increase in numerical failures can be alleviated by using 128-bit precision, but this greatly increases runtime, as current hardware is not designed to natively run quad precision. For the problem of kinematic synthesis, the rotation operators of the loop equations are traditionally algebraically eliminated to decrease the size of the linear system to be solved. This has the effect of increasing the complexity, and thus the number of floating point operations, of evaluating the function and its corresponding Jacobian matrices. Leaving the rotation operators uneliminated decreases the cost of evaluating the functions and Jacobians but increases the size of the corresponding linear system. These uneliminated linear systems, while larger in dimension, are considerably sparser than their eliminated counterparts. By coupling the uneliminated kinematic synthesis equations with a sparse linear solver, we obtained a path tracking success rate comparable to running the eliminated system in quad math while keeping the speed of performing all calculations in double precision. This resulted in a maximum speedup of 54.4 × over conventional methods.
Acknowledgment
This paper is based upon work supported by the National Science Foundation under Grant No. CMMI-2144732.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.