Keywords: algorithms, data structures, dynamic trees, randomized search trees, self-adjusting search trees, biased search trees, experimental evaluation.
Dynamic tree problem focuses on the maintenance of a collection of trees under several operations, including adding an edge, removing an existing edge, designating tree roots (for rooted forest) and querying on the unique path between two nodes. Data can be stored on either nodes or edges. For simplicity, a new node can represent an edge in the original trees in order to put all data only on nodes. Dynamic trees have numerous applications especially in graph theory, such as network flows132123 and dynamic graphs810111418.25
Dynamic tree problem has been well studied in literature, and there are many data structures that solve this problem, such as link-cut trees (LCT)2122, topology trees101112, Euler-tour trees (ETT)1423 and top trees2326. They all use search trees as underlying data structures but with different techniques: LCT uses path decomposition; ETT uses linearization; both topology trees and top trees use tree contraction.25 LCT seems to be the first complete solution to the original dynamic tree problem and has been applied to many combinatorial algorithms. Although all the data structures guarantee logarithmic time bounds for most dynamic tree operations, their efficiency differs in many aspects, due to the overheads of non-sequential accesses, heavy memory allocations and huge constant factors hidden in the asymptotic complexities. Meanwhile, the choice of underlying data structures may have significant impacts on the performance of dynamic trees due to different methods of implementations. There have been some reports on the performance issues between various dynamic trees25, but there seem to be few investigations in underlying data structures. In our research, we attempted to conduct empirical studies on the discrepancies of underlying search trees in LCT. The reason for our choice of LCT is that LCT is one of the lightest data structures among all variants of dynamic trees, since the overheads are primarily from the
Split of search trees rather than LCT itself. In previous studies, researchers usually use biased search trees21 and splay trees22 to maintain solid paths. The maintenance of solid paths relies heavily on biased searches, which implies alternatives such as weighted treaps207. In this paper, we provide an analysis for weighted treaps as implementation of underlying search trees in LCT, which should have amortized logarithmic time bounds.
The remainder of this paper is organized as follows. Section 2 introduces dynamic tree problem and outlines the technique of path decomposition and LCT. Section 3 reviews biased search trees, splay trees and randomized search trees, i.e. treaps, in the setting of dynamic trees and we will sketch the proof for treaps in this section. Section 4 describes the platform setup used for comparisons and presents the experiments with discussions of results. The last section contains the overall conclusions and implications for further researches. Appendix A contains graph terminologies used in this paper.
Dynamic Tree Problem. The original dynamic tree problem21 is aimed to maintain the structures of a collection of vertex-disjoint rooted trees under three kinds of operations:
Evert. Initially there are
In addition to operations that may alter tree structures, data can be associated with both nodes and edges, which enables queries on the chain between any two nodes. For simplicity, we can create new nodes to supersede the edges in original trees. Therefore, we can stored the data associated with edges on the corresponding nodes. Such simplification only leads to at most
Path Decomposition. In this paper, we focus on link-cut trees, or LCT as the abbreviation, which was introduced by Sleator and Tarjan21 and uses a technique called path decomposition. In path decomposition, each edge is marked as either solid or dashed and we guarantee that no more than one solid edge entering any node (solid edge invariant). Consecutive solid edges form a solid path. It is trivial to see that any solid path is within an access path. Intuitively, solid paths do not bend in the tree that the depths of nodes along the path are monotone. In this way, a tree is decomposed into a set of solid paths connected by dashed edges. In the remainder of this paper, we use the notion
In order to implement basic dynamic tree operations, we need two fundamental operations:
Expose, other dynamic tree operations can be viewed as combinations of two operations.21
Since the depth of node in a solid path ascends from top to bottom, nodes can be kept in search trees such as biased search trees21 and self-adjusting search trees22. For
Evert, a reversal bit is supposed to be attached to each internal node in search trees. Because
Splice requires splitting and concatenating of search trees, we describe
Join formally as follows:
Split); 2) two parts
Split). In both cases,
Join: Given two search trees
Remark. Biased trees only support 2-way splits and 2-way joins while self-adjusting trees allow both variants of splits and joins.
One of the most prominent properties of LCT is illustrated by the following theorem.
Theorem 1. If the overall number of
Corollary 1. If the time complexity of
Proof. Directly derived from Theorem 1.
Among three schemes of LCT, the concept of ranks is used in analyses. The rank of
Splice and other operations that may alter the tree structure. LCT keeps the invariant that total weight of an underlying search tree equals to the size of corresponding subtree.
Biased Search Trees. Bent et al.4 proposed biased
Split at node
Expose, Sleator et al.21 proved that the amortized running time over a sequence of
Splay Trees. One of the most well-known self-adjusting search trees is the splay tree, introduced by Sleator and Tarjan22. Compared to conventional search trees such as AVL tree and red-black tree, both of which rely on extra information maintained in search tree nodes to preserve low tree height, splay tree adjusts itself by tree rotations whenever accessed and manages to gather all frequently accessed nodes in the vicinity of search tree root, optimizing for future accesses. Splay trees utilize a variation of move-to-root heuristic5, i.e. the so-called
Splay, that unexpectedly improves the performance. The original move-to-root heuristic simply moves the node to root by rotating once at each step, while
Splay may rotate at most twice at each step. Provided that all splay tree operations are based on
Splay, the only non-trivial work is to devise the upper bound of rotations in
Splay, which is depicted by the Access Lemma22.
Lemma 1. (Access Lemma) The amortized time to splay a tree with root
Splay actually predicts the weight of each node based on previous accesses instead of keeping them explicitly. The wonderful property of implicit weights enables us to obtain heterogeneous time bounds under different circumstances without the great cost of “re-weighting”. Splay trees have been integrated into LCT to allow simple and efficient implementations and Sleator et al.22 showed that such LCT achieved amortized logarithmic time bounds. However, splay trees show a significant increase in changes to tree structures, which contributes to the huge overheads of node updating and slows down splay trees when accesses are the primary operations, since other search trees do not alter the tree structure while accessing.
Randomized Search Trees. Seidel and Aragon20 thoroughly explored randomizations in balanced search trees and coined the term “treap” for search trees with random priorities on nodes. In addition to obeying symmetric order of search trees, treaps use tree rotations to stipulate that nodes with larger priorities are not descendants of those with smaller priorities. Seidel20 claimed that an access to
Join, implicit priorities are equivalent to ordinary priorities in analysis.
Applying weighted treaps to LCT is feasible, as we will show in the remainder of this section. In our analysis, any node can represent the treap containing it, and
Since every special splice induces a new solid edge, it is suggested that there seem to be few special splices. Sleator et al.21 showed that there are at most
Expose on average. It indicates that special splices are not the major overheads in
Expose. To carry out the amortized analysis, we store
Split. Proofs for the next two lemmas are deferred to Appendix B and Appendix C.
Lemma 2. (Join Lemma) If treap
Lemma 3. (Split Lemma) Treap
Finally we investigate credit changes in
Lemma 4. (Expose Lemma)
Proof. We interpret
(1) Setup: Turn the solid edge from
Split. It takes
(2) Split Phase: Traverse from
Split. Note that
Split is obviously cast to
(4) Join Phase: Perform 3-way
Join only takes
The sum of costs of all steps telescopes and gives a bound of
Our result is stated by the following theorem, which is derived immediately from Expose Lemma.
Theorem 2. If weighted treaps are used as underlying data structures in LCT, then all dynamic tree operations have amortized
Finally we emphasize that the bounds are not averaged on worst-case operation sequences due to the randomness of treaps.
Experimental Setup. In this section we present an empirical study of impacts of different underlying data structures on LCT. Three kinds of search trees are used: biased 2-3 trees (biased), self-adjusting search trees (splay) and randomized search trees (treap). We tested all algorithms for minimum spanning forest problem (MSF) along with Kruskal’s
http://github.com/riteme/toys/tree/master/dynamic-trees. Source code was compiled by Clang 6.0.0 with
-O3 -Ofast -DNDEBUG options for full optimizations. Experiments were carried out on Intel™ i5 8300H running Ubuntu 18.04 (64bit operating system, Linux Kernel 4.20.13, Xorg not started) at 4.0GHz in a single core, with 128KB L1 data cache, 1MB L2 cache, 8MB L3 cache and 16GB RAM. CPU time was measured with
high_resolution_clock from Standard Template Library (STL) in C++ with 1ns precision. Each individual execution was repeated until the number of processed edges exceeded
rand in GNU C library) and we randomly chose seeds obtained from
random_device in STL to generate five different inputs for each set of parameters and report the average results. Time for initialization, finalization and reading files was not accounted.
In LCT-based solutions,
Cut and the new edge will be incorporated into dynamic trees via
Link. The new edge will be ignored if
Three solutions based on LCT used the same interface exported to the main program to guarantee uniformity. Treaps utilized Xorshift algorithm16 for extremely fast pseudo-random generations. Biased trees required extra memory for internal nodes and we implemented a stack-based memory pool for biased trees. All solutions were implemented in non-recursive fashion.
Random Data. We first tested on randomly generated graphs. Random graphs were obtained by generating
Cache Tests. Since the time complexity is logarithmic in theory, figures are expected to be linear. However, we noticed that figures approximately comprise two linear functions. Such peculiar phenomena was observed by Tarjan et al.25 as well and they deduced that cache-missing contributes to the increased slopes in right parts of figures. We attempted to examine the impacts of cache by a new data generator. We fixed
In this paper, we reviewed and briefly analyzed three schemes to implement link-cut trees in amortized logarithmic time. They share an identical interface to users and only differ in the underlying data structures. It is mainly an interest of theoretical research to devise alternatives in amortized LCT because empirical study has demonstrated that splay-based version is the most efficient and other ordinary search trees take no advantage in an ever-changing data structure and splay trees are more memory-friendly and cache-friendly in the comparisons. Werneck26 mentioned that Alstrup et al. attempted to implement LCT with weighted treaps and compared its performance to splay-based LCT, but their experimental part is unpublished. However, if we need to keep historical versions of dynamic trees, LCT must guarantee worst-case time complexities instead of amortized ones. Sleator et al.21 presented a modified version of biased trees, i.e. globally-biased
A. Graph Terminology. In graph theory, a graph
B. Proof for Join Lemma. Consider the right spine of treap
Join will end up node
Join. The time to travel on both spines is
C. Proof for Split Lemma.
Split is the inverse of
Join (figure omitted). Let
Join. (a) Left spine and right spine. (b) Treap $z$ after
Join. Dashed edges are removed during
U. A. Acar, G. E. Blelloch, R. Harper, J. L. Vittes, and S. L. M. Woo, “Dynamizing Static Algorithms, with Applications to Dynamic Trees and History Independence,” in Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 2004, pp. 531–540. ↩
S. Alstrup, J. Holm, K. de Lichtenberg, and M. Thorup, “Minimizing Diameters of Dynamic Trees,” in Automata, Languages and Programming, 1997, pp. 270–280. ↩
S. Alstrup, J. Holm, K. D. Lichtenberg, and M. Thorup, “Maintaining Information in Fully Dynamic Trees with Top Trees,” ACM Trans. Algorithms, vol. 1, no. 2, pp. 243–264, Oct. 2005. ↩
J. Bitner, “Heuristics That Dynamically Organize Data Structures,” SIAM J. Comput., vol. 8, no. 1, pp. 82–110, Feb. 1979. ↩
G. E. Blelloch, D. Ferizovic, and Y. Sun, “Just Join for Parallel Ordered Sets,” in Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, New York, NY, USA, 2016, pp. 253–264. ↩
G. E. Blelloch and M. Reid-Miller, “Fast Set Operations Using Treaps,” in Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, New York, NY, USA, 1998, pp. 16–26. ↩
G. Cattaneo, P. Faruolo, U. F. Petrillo, and G. F. Italiano, “Maintaining Dynamic Minimum Spanning Trees: An Experimental Study,” in Algorithm Engineering and Experiments, 2002, pp. 111–125. ↩
G. N. Frederickson, “A Data Structure for Dynamically Maintaining Rooted Trees,” Journal of Algorithms, vol. 24, no. 1, pp. 37–65, Jul. 1997. ↩
A. V. Goldberg, M. D. Grigoriadis, and R. E. Tarjan, “Use of Dynamic Trees in a Network Simplex Algorithm for the Maximum Flow Problem,” Mathematical Programming, vol. 50, no. 1, pp. 277–290, Mar. 1991. ↩
J. B. Kruskal, “On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem,” Proc. Amer. Math. Soc., vol. 7, no. 1, pp. 48–50, 1956. ↩
G. Marsaglia, “Xorshift RNGs,” Journal of Statistical Software, vol. 8, no. 1, pp. 1–6, Jul. 2003. ↩
B. Pfaff, “Performance Analysis of BSTs in System Software,” in Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, New York, NY, USA, 2004, pp. 410–411. ↩
T. Radzik, “Implementation of Dynamic Trees with In-subtree Operations,” J. Exp. Algorithmics, vol. 3, Sep. 1998. ↩
G. Ramalingam and T. Reps, “On the Computational Complexity of Dynamic Graph Problems,” Theoretical Computer Science, vol. 158, no. 1, pp. 233–277, May 1996. ↩
R. E. Tarjan and R. F. Werneck, “Self-Adjusting Top Trees,” in Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 2005, pp. 813–822. ↩