Dynamic Trees with Alternative Search Trees
This article is actually my term paper for this semester's academic English course. It writes a little about LCT: I tried to adapt Sleator and Tarjan's proof of amortized LCT built on biased trees, hoping to show that a treap can also achieve amortized logarithmic complexity. It may not be entirely rigorous, but I believe the overall direction is sound; if you spot any problems, please point them out QAQ. Code for the three LCT implementations in this article (biased tree, splay, treap): http://github.com/riteme/toys/tree/master/dynamic-trees
Original PDF: eap19-v3.pdf
Abstract. We present an analysis of weighted treaps as underlying search trees in link-cut trees (LCT), giving amortized logarithmic time bounds, and we conduct an empirical study that investigates the discrepancies between different underlying data structures in LCT. We implemented three variants of LCT, based on biased search trees, self-adjusting search trees and weighted treaps, and ran several experiments on them. Splay trees are the most efficient choice for amortized LCT in our experiments while there is a
Keywords: algorithms, data structures, dynamic trees, randomized search trees, self-adjusting search trees, biased search trees, experimental evaluation.
1 Introduction
The dynamic tree problem focuses on the maintenance of a collection of trees under several operations, including adding an edge, removing an existing edge, designating tree roots (for rooted forests) and querying the unique path between two nodes. Data can be stored on either nodes or edges. For simplicity, a new node can represent an edge of the original trees so that all data reside only on nodes. Dynamic trees have numerous applications, especially in graph theory, such as network flows [13, 21, 23] and dynamic graphs [8, 10, 11, 14, 18, 25].
The dynamic tree problem has been well studied in the literature, and many data structures solve it, such as link-cut trees (LCT) [21, 22], topology trees [10, 11, 12], Euler-tour trees (ETT) [14, 23] and top trees [2, 3, 26]. They all use search trees as underlying data structures but with different techniques: LCT uses path decomposition; ETT uses linearization; both topology trees and top trees use tree contraction [25]. LCT appears to be the first complete solution to the original dynamic tree problem and has been applied to many combinatorial algorithms. Although all of these data structures guarantee logarithmic time bounds for most dynamic tree operations, their efficiency differs in many respects, due to the overheads of non-sequential memory accesses, heavy memory allocation and large constant factors hidden in the asymptotic complexities. Meanwhile, the choice of underlying data structures may have a significant impact on the performance of dynamic trees, since it determines how they are implemented. There have been some reports on the performance differences between various dynamic trees [25], but there seem to be few investigations into the underlying data structures themselves. In our research, we conducted an empirical study of the discrepancies between underlying search trees in LCT. We chose LCT because it is one of the lightest data structures among all variants of dynamic trees: its overheads come primarily from the Join and Split of the underlying search trees rather than from LCT itself. In previous studies, researchers usually use biased search trees [21] or splay trees [22] to maintain solid paths. The maintenance of solid paths relies heavily on biased searches, which suggests alternatives such as weighted treaps [20, 7]. In this paper, we provide an analysis of weighted treaps as underlying search trees in LCT, which should give amortized logarithmic time bounds.
The remainder of this paper is organized as follows. Section 2 introduces the dynamic tree problem and outlines the technique of path decomposition and LCT. Section 3 reviews biased search trees, splay trees and randomized search trees, i.e. treaps, in the setting of dynamic trees, and sketches our proof for treaps. Section 4 describes the platform setup used for the comparisons and presents the experiments with a discussion of the results. The last section contains the overall conclusions and implications for further research. Appendix A contains the graph terminology used in this paper.
2 Dynamic Trees
Dynamic Tree Problem. The original dynamic tree problem [21] aims to maintain the structure of a collection of vertex-disjoint rooted trees under three kinds of operations: Link, Cut and Evert. Initially every tree consists of a single node.

- Link$(x,\ y)$: Add a new edge $(x,\ y)$, thus combining two trees into one. For a rooted forest, we designate the root of the original tree that contains node $x$ as the root of the new tree.
- Cut$(x)$: Assuming that node $x$ is not the tree root, remove the edge $(x,\ p(x))$, thus dividing the tree into two parts: one part contains $p(x)$ and the other contains $x$, with $x$ as its tree root.
- Evert$(x)$: Make node $x$ the new tree root. If $x$ is already the tree root, the operation is ignored. This operation does not alter the structure of the tree but changes the ancestor-descendant relations, making all other nodes descendants of $x$.
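The three implementations compared later "share an identical interface" (Section 5). A hypothetical C++ rendering of that interface, with names chosen by us and a path query shown only as one example of data attached to nodes, might look as follows.

```cpp
#include <cstdint>

// A minimal, hypothetical interface for the dynamic tree operations above.
// Nodes are identified by integers; the method names are illustrative only.
class DynamicForest {
public:
    virtual ~DynamicForest() = default;

    virtual void link(int x, int y) = 0;   // add edge (x, y); root(x) becomes the root of the new tree
    virtual void cut(int x) = 0;           // remove edge (x, p(x)); x must not be a tree root
    virtual void evert(int x) = 0;         // make x the root of its tree

    virtual int64_t query_path(int x, int y) = 0;  // aggregate the data on the unique x-y path
};
```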
In addition to the operations that may alter tree structures, data can be associated with both nodes and edges, which enables queries on the chain between any two nodes. For simplicity, we can create new nodes to supersede the edges of the original trees and therefore store the data associated with edges on the corresponding nodes. Such a simplification at most doubles the number of nodes.
Path Decomposition. In this paper, we focus on link-cut trees, abbreviated LCT, which were introduced by Sleator and Tarjan [21] and use a technique called path decomposition. In path decomposition, each edge is marked as either solid or dashed, and we guarantee that no more than one solid edge enters any node (the solid edge invariant). Consecutive solid edges form a solid path. It is trivial to see that any solid path lies within an access path. Intuitively, solid paths do not bend in the tree: the depths of the nodes along a path are monotone. In this way, a tree is decomposed into a set of solid paths connected by dashed edges. In the remainder of this paper, we use the notation $c(x)$ for the solid path containing node $x$, and the access path of $x$ for the path from $x$ to the tree root, which is the path made solid by Expose$(x)$. In order to implement the basic dynamic tree operations, we need two fundamental operations:

- Splice$(x)$: Provided that $x$ is the topmost node of $c(x)$ and not the tree root, if there is a solid edge entering $p(x)$ from another child, we make this edge dashed. Thereby we are able to mark the edge $(x,\ p(x))$ as solid to connect $c(x)$ and $c(p(x))$ without violating the solid edge invariant.
- Expose$(x)$: Make the access path of $x$ a solid path and all edges incident to this path dashed. This operation typically utilizes Splice.
With Splice and Expose, the other dynamic tree operations can be viewed as combinations of these two operations [21].
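To make this reduction concrete, here is a compact sketch of the splay-based variant (one of the three variants studied in this paper). It is a simplified illustration rather than the code in the linked repository: it stores no path aggregates, performs no error checking, and the `rev` flag is the reversal bit discussed in the next paragraph.

```cpp
#include <utility>
#include <vector>

// Sketch of a splay-based link-cut tree. Each solid path is a splay tree ordered by
// depth; the parent pointer of a splay-tree root acts as the dashed (path-parent) edge.
struct LinkCutTree {
    struct Node { int ch[2] = {0, 0}, fa = 0; bool rev = false; };
    std::vector<Node> t;

    explicit LinkCutTree(int n) : t(n + 1) {}  // nodes are numbered 1..n; 0 is a sentinel

    bool is_root(int x) const {                // is x the root of its splay tree?
        return t[t[x].fa].ch[0] != x && t[t[x].fa].ch[1] != x;
    }
    void push(int x) {                         // push down the reversal bit (used by Evert)
        if (!t[x].rev) return;
        std::swap(t[x].ch[0], t[x].ch[1]);
        if (t[x].ch[0]) t[t[x].ch[0]].rev ^= 1;
        if (t[x].ch[1]) t[t[x].ch[1]].rev ^= 1;
        t[x].rev = false;
    }
    void rotate(int x) {
        int y = t[x].fa, z = t[y].fa, k = (t[y].ch[1] == x);
        if (!is_root(y)) t[z].ch[t[z].ch[1] == y] = x;
        t[x].fa = z;
        t[y].ch[k] = t[x].ch[k ^ 1];
        if (t[x].ch[k ^ 1]) t[t[x].ch[k ^ 1]].fa = y;
        t[x].ch[k ^ 1] = y;
        t[y].fa = x;
    }
    void splay(int x) {                        // the Splay primitive of [22]
        std::vector<int> path{x};
        for (int i = x; !is_root(i); i = t[i].fa) path.push_back(t[i].fa);
        while (!path.empty()) { push(path.back()); path.pop_back(); }
        while (!is_root(x)) {
            int y = t[x].fa, z = t[y].fa;
            if (!is_root(y)) rotate((t[z].ch[0] == y) == (t[y].ch[0] == x) ? y : x);
            rotate(x);
        }
    }
    void expose(int x) {                       // make the access path of x solid
        for (int i = x, last = 0; i; last = i, i = t[i].fa) {
            splay(i);
            t[i].ch[1] = last;                 // a Splice: the old solid child becomes dashed
        }
        splay(x);
    }
    void evert(int x) { expose(x); t[x].rev ^= 1; }     // Evert(x)
    void link(int x, int y) { evert(y); t[y].fa = x; }  // Link(x, y): root(x) stays the root
    void cut(int x) {                                   // Cut(x): x must not be a tree root
        expose(x);
        t[t[x].ch[0]].fa = 0;
        t[x].ch[0] = 0;
    }
};
```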
Since the depth of a node in a solid path ascends from top to bottom, the nodes can be kept in search trees such as biased search trees [21] and self-adjusting search trees [22]. For Evert, a reversal bit needs to be attached to each internal node of the search trees. Because Splice requires splitting and concatenating search trees, we describe Split and Join formally as follows:
- Split$(r,\ x)$: $r$ is a search tree and $x$ is the target node at which to divide. We divide the tree into 1) at most three parts $(L,\ I,\ R)$, where $L$ denotes all nodes prior to $x$, $R$ denotes all nodes after $x$, and $I = \{x\}$ if $x$ exists in the search tree and $I = \varnothing$ otherwise (3-way Split); or 2) two parts $(L,\ R)$, where $L$ denotes all nodes prior to $x$ together with $x$ itself if it exists, and $R$ denotes all nodes after $x$ (2-way Split). In both cases, $L$ and $R$ are still search trees.
- Join: Given two search trees $x$ and $y$ where all elements in $x$ precede those in $y$ (2-way Join$(x,\ y)$), or two search trees $x$ and $y$ along with a new node $k$ placed between the last node of $x$ and the first node of $y$ (3-way Join$(x,\ y,\ k)$), merge them into one search tree with the correct connections. A 3-way Join is usually implemented as two 2-way Joins.
Remark. Biased trees only support 2-way splits and 2-way joins while self-adjusting trees allow both variants of splits and joins.
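As one concrete instantiation of this interface, and a preview of Section 3, the following sketch implements the 2-way variants for an ordinary treap with uniformly random priorities. The node layout and function names are ours, not those of the repository.

```cpp
#include <random>
#include <utility>

// 2-way Join and 2-way Split for an ordinary treap ordered by an integer key.
// Priorities are random; a node is never below a child with a larger priority.
struct TreapNode {
    int key;
    unsigned priority;
    TreapNode *left = nullptr, *right = nullptr;
    TreapNode(int k, unsigned p) : key(k), priority(p) {}
};

// Join(x, y): every key in x precedes every key in y.
TreapNode *join(TreapNode *x, TreapNode *y) {
    if (!x) return y;
    if (!y) return x;
    if (x->priority > y->priority) {   // x stays on top, keep merging along its right spine
        x->right = join(x->right, y);
        return x;
    } else {                           // y stays on top, keep merging along its left spine
        y->left = join(x, y->left);
        return y;
    }
}

// Split(r, k): 2-way split; keys <= k go to the first tree, keys > k to the second.
std::pair<TreapNode *, TreapNode *> split(TreapNode *r, int k) {
    if (!r) return {nullptr, nullptr};
    if (r->key <= k) {
        auto [mid, right] = split(r->right, k);
        r->right = mid;
        return {r, right};
    } else {
        auto [left, mid] = split(r->left, k);
        r->left = mid;
        return {left, r};
    }
}
```

In LCT the treaps along a solid path are ordered by depth rather than by key, so Split is performed at a given node instead of at a key value, but the mechanics are the same; a 3-way Join$(x,\ y,\ k)$ is obtained by treating $k$ as a single-node treap and applying two 2-way Joins.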
One of the most prominent properties of LCT is illustrated by the following theorem.
Theorem 1. If the overall number of Link, Cut and Expose operations is $m$, then at most $O(m \log n)$ Splice operations are performed in total [21].
Corollary 1. If the time complexity of Splice is $O(\log n)$ amortized, then Link, Cut, Evert and Expose take $O(\log^2 n)$ amortized time.
Proof. Directly derived from Theorem 1.
3 Reviews and Analyses
Among the three schemes of LCT, the concept of ranks is used in the analyses; the rank of a node is, roughly speaking, the logarithm of its weight. We first bound Join and Split, and then Splice and the other operations that may alter the tree structure. LCT keeps the invariant that the total weight of an underlying search tree equals the size of the corresponding subtree.
Biased Search Trees. Bent et al. [4] proposed biased search trees, in which a node of weight $w$ in a tree of total weight $W$ can be accessed in $O(\log(W/w))$ time, and Join and Split obey similar logarithmic bounds. Combining these bounds over the Splices performed by each Expose, Sleator et al. [21] proved that the amortized running time over a sequence of dynamic tree operations is $O(\log n)$ per operation.
Splay Trees. One of the most well-known self-adjusting search trees is the splay tree, introduced by Sleator and Tarjan [22]. Compared to conventional search trees such as AVL trees and red-black trees, both of which rely on extra information maintained in the nodes to preserve a low tree height, a splay tree adjusts itself by tree rotations whenever it is accessed and manages to gather frequently accessed nodes in the vicinity of the root, optimizing for future accesses. Splay trees utilize a variation of the move-to-root heuristic [5], the so-called Splay, that unexpectedly improves the performance: the original move-to-root heuristic simply moves the node to the root by rotating once at each step, while Splay may rotate at most twice at each step. Provided that all splay tree operations are based on Splay, the only non-trivial work is to devise an upper bound on the rotations in Splay, which is given by the Access Lemma [22].
Lemma 1. (Access Lemma) The amortized time to splay a tree with root $t$ at a node $x$ is at most $3(r(t) - r(x)) + 1 = O(\log(s(t)/s(x)))$, where $s(x)$ denotes the total weight of the subtree rooted at $x$ and $r(x) = \log s(x)$ is the rank of $x$.
The magic of Splay is that it effectively predicts the weight of each node from previous accesses instead of keeping weights explicitly. This wonderful property of implicit weights enables us to obtain heterogeneous time bounds under different circumstances without the great cost of "re-weighting". Splay trees have been integrated into LCT to allow simple and efficient implementations, and Sleator et al. [22] showed that such an LCT achieves amortized logarithmic time bounds. However, splay trees show a significant increase in changes to the tree structure, which contributes to huge overheads of node updating and slows splay trees down when accesses are the primary operations, since other search trees do not alter the tree structure while accessing.
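The "heterogeneous time bounds" can be made concrete. Two standard specializations of the Access Lemma from [22], obtained purely by choosing the analysis-only weights (here $q(x)$ is the number of accesses to $x$ in a sequence of $m$ accesses), are:

$$
w(x) = 1 \ \text{for all}\ x \;\Longrightarrow\; \text{amortized access cost } O(\log n) \quad \text{(Balance Theorem)}
$$

$$
w(x) = \frac{q(x)}{m} \;\Longrightarrow\; \text{amortized access cost } O\!\left(\log\frac{m}{q(x)}\right) \quad \text{(Static Optimality Theorem)}
$$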
Randomized Search Trees. Seidel and Aragon [20] thoroughly explored randomization in balanced search trees and coined the term "treap" for search trees with random priorities on nodes. In addition to obeying the symmetric order of search trees, treaps use tree rotations to stipulate that nodes with larger priorities are never descendants of nodes with smaller priorities. Seidel and Aragon [20] showed that in a weighted treap of total weight $W$, an access to a node of weight $w$ takes expected $O(\log(W/w))$ time, and that Join and Split admit similar expected bounds. Since only the relative order of priorities matters, implicit priorities are equivalent to ordinary priorities in the analysis.
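In a weighted treap, an item of weight $w$ conceptually receives the maximum of $w$ independent uniform priorities [20]; by inverse-transform sampling this maximum can be drawn directly from a single uniform variate. A small sketch (the function name and generator choice are ours):

```cpp
#include <cmath>
#include <random>

// Priority of a node of weight w in a weighted treap: the maximum of w i.i.d.
// Uniform(0,1) variables has CDF u^w, so it can be sampled as u^(1/w).
double weighted_priority(double w, std::mt19937_64 &rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double u = uniform(rng);
    return std::pow(u, 1.0 / w);   // larger weight => stochastically larger priority
}

// One simple way to re-weight a node (e.g. when its attached dashed subtrees change)
// is to draw a fresh priority for the new weight and rotate the node up or down
// until the heap order is restored.
```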
Applying weighted treaps to LCT is feasible, as we will show in the remainder of this section. In our analysis, any node can represent the treap containing it.
Since every special splice induces a new solid edge, it is suggested that there are few special splices: Sleator et al. [21] showed that there are at most $O(\log n)$ splices per Expose on average, which indicates that special splices are not the major overhead of Expose. To carry out the amortized analysis, we store credits on the treaps, which pay for Join and Split. Proofs for the next two lemmas are deferred to Appendix B and Appendix C.
Lemma 2. (Join Lemma) If treaps $x$ and $y$ are merged into a treap $z$ by Join, the expected cost of the Join is logarithmic in the total weight of $z$.

Lemma 3. (Split Lemma) Treap Split takes expected time logarithmic in the total weight of the treap being split.
Finally, we investigate the credit changes in Expose.

Lemma 4. (Expose Lemma) Expose takes amortized $O(\log n)$ expected time.
Proof. We interpret Expose$(x)$ as four phases.

(1) Setup: Turn the solid edge entering $x$, if any, into a dashed edge, which amounts to a Split at $x$ in the treap of $c(x)$. It takes expected time logarithmic in the weight of that treap.

(2) Split Phase: Traverse from $x$ towards the tree root. Whenever the access path enters a node $y$ along a dashed edge, perform a Split at $y$ in the treap of $c(y)$ so that the solid edge below $y$, if any, becomes dashed. Note that a 3-way Split is obviously cast to 2-way Splits.

(3) Re-weight: Update the weights of the nodes whose solid children changed, since the subtrees hanging from dashed edges contribute to the weights of their parents; the affected treap nodes redraw their priorities accordingly.

(4) Join Phase: Perform 3-way Joins to concatenate the solid paths produced above into the single solid path from the tree root to $x$. By the Join Lemma, each Join only takes expected time logarithmic in the weight of the resulting treap, since the stored credits pay for travelling along the spines.

The sum of the costs of all phases telescopes along the access path and gives a bound of $O(\log n)$ amortized expected time.
Our result is stated by the following theorem, which is derived immediately from the Expose Lemma.

Theorem 2. If weighted treaps are used as the underlying data structures in LCT, then all dynamic tree operations have amortized $O(\log n)$ expected time bounds.

Finally, we emphasize that, due to the randomness of treaps, these bounds hold in expectation rather than as deterministic amortized bounds over worst-case operation sequences.
4 Experiments
Experimental Setup. In this section we present an empirical study of the impact of different underlying data structures on LCT. Three kinds of search trees are used: biased 2-3 trees (biased), self-adjusting search trees (splay) and randomized search trees (treap). We tested all algorithms on the minimum spanning forest problem (MSF), along with Kruskal's algorithm [15]; all implementations are available at http://github.com/riteme/toys/tree/master/dynamic-trees.
Source code was compiled with Clang 6.0.0 using the -O3 -Ofast -DNDEBUG options for full optimization. Experiments were carried out on an Intel i5-8300H running Ubuntu 18.04 (64-bit, Linux kernel 4.20.13, Xorg not started) at 4.0GHz on a single core, with 128KB L1 data cache, 1MB L2 cache, 8MB L3 cache and 16GB RAM. CPU time was measured with high_resolution_clock from the C++ Standard Template Library (STL) with 1ns precision. Each individual execution was repeated until the number of processed edges exceeded a fixed threshold. Input data were generated pseudo-randomly (rand in the GNU C library), and we randomly chose seeds obtained from random_device in the STL to generate five different inputs for each set of parameters, reporting the average results. Time for initialization, finalization and reading files was not counted.
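A minimal sketch of the measurement pattern described above; the generator and solver hooks are placeholders of our own, not the actual harness:

```cpp
#include <chrono>
#include <random>

// generate_input(seed) and run_solver(input) stand in for the real data generator
// and the LCT/Kruskal solvers; only the seeding and timing pattern matters here.
template <class GenFn, class RunFn>
double measure_seconds(GenFn generate_input, RunFn run_solver) {
    std::random_device rd;                        // seed source, as described in the text
    auto input = generate_input(rd());
    auto begin = std::chrono::high_resolution_clock::now();
    run_solver(input);                            // only the solving phase is timed
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double>(end - begin).count();
}
```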
In LCT-based solutions, a new edge whose endpoints are not yet connected is simply added via Link. Otherwise, the maximum-weight edge on the path between the endpoints is removed via Cut and the new edge will be incorporated into the dynamic trees via Link. The new edge will be ignored if its weight is not smaller than the current maximum on the path.
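The per-edge decision rule can be sketched as follows; the member functions used here (connected, path_max_edge, link, cut) belong to a hypothetical LCT interface assumed by us, not to the repository's API.

```cpp
#include <cstdint>

struct Edge { int u, v; int64_t w; };

// One step of the online minimum spanning forest maintenance described above.
// LCT is any link-cut tree type offering the four (assumed) member functions used below.
template <class LCT>
void process_edge(LCT &forest, const Edge &e) {
    if (!forest.connected(e.u, e.v)) {               // no cycle: always keep the edge
        forest.link(e.u, e.v, e.w);
        return;
    }
    auto heaviest = forest.path_max_edge(e.u, e.v);  // heaviest edge on the tree path
    if (heaviest.w > e.w) {                          // swap it for the lighter new edge
        forest.cut(heaviest.u, heaviest.v);
        forest.link(e.u, e.v, e.w);
    }                                                // otherwise the new edge is ignored
}
```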
The three LCT-based solutions used the same interface exported to the main program to guarantee uniformity. Treaps utilized the Xorshift algorithm [16] for extremely fast pseudo-random generation. Biased trees required extra memory for internal nodes, so we implemented a stack-based memory pool for them. All solutions were implemented in a non-recursive fashion.
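For reference, a 64-bit xorshift generator in the style of [16] is only a few shifts and xors; the shift triple 13/7/17 is one of the full-period triples listed by Marsaglia, while the wrapper struct is ours:

```cpp
#include <cstdint>

// Xorshift64: a full-period (2^64 - 1) pseudo-random generator from [16].
// The state must be seeded with a non-zero value.
struct Xorshift64 {
    uint64_t state;
    explicit Xorshift64(uint64_t seed) : state(seed) {}  // seed must be non-zero
    uint64_t next() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }
};
```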
Random Data. We first tested on randomly generated graphs. Random graphs were obtained by generating edges whose endpoints and weights were chosen uniformly at random.
Cache Tests. Since the time complexity is logarithmic in theory, the figures are expected to be linear. However, we noticed that the curves approximately comprise two linear pieces. Such a peculiar phenomenon was observed by Tarjan et al. [25] as well, and they deduced that cache misses contribute to the increased slopes in the right parts of the figures. We attempted to examine the impact of the cache with a new data generator. We fixed
5 Conclusions and Discussions
In this paper, we reviewed and briefly analyzed three schemes that implement link-cut trees in amortized logarithmic time. They share an identical interface and differ only in the underlying data structures. Devising alternatives for amortized LCT is mainly of theoretical interest: our empirical study demonstrated that the splay-based version is the most efficient, that ordinary search trees gain no advantage in such an ever-changing data structure, and that splay trees are more memory- and cache-friendly in our comparisons. Werneck [26] mentioned that Alstrup et al. attempted to implement LCT with weighted treaps and compared its performance to splay-based LCT, but the experimental part was not published. However, if we need to keep historical versions of dynamic trees, LCT must guarantee worst-case time complexities instead of amortized ones. Sleator et al. [21] presented a modified version of biased trees, i.e. globally biased search trees, to achieve worst-case logarithmic bounds.
Appendixes
A. Graph Terminology. In graph theory, a graph $G = (V,\ E)$ consists of a set of vertices $V$ and a set of edges $E$. A tree is a connected acyclic graph. In a rooted tree, $p(x)$ denotes the parent of node $x$; the subtree of $x$ consists of $x$ and all of its descendants, and its size is the number of nodes it contains.
B. Proof for Join Lemma. Consider the right spine of treap $x$ and the left spine of treap $y$ (see the figure below). Join interleaves the nodes of the two spines by priority and ends at the bottom node of the merged spine of the resulting treap $z$. The time to travel on both spines is proportional to their total length, which is expected to be logarithmic in the total weights of $x$ and $y$, and hence in the total weight of $z$.
C. Proof for Split Lemma. Split is the inverse of Join (figure omitted): splitting treap $z$ at a node separates it along the path from the root to that node, and the two resulting spines are exactly the spines that the corresponding Join would merge, so the same bound applies.

Figure: Join. (a) Left spine and right spine. (b) Treap $z$ after Join; dashed edges are removed during Join.
References

[1] U. A. Acar, G. E. Blelloch, R. Harper, J. L. Vittes, and S. L. M. Woo, "Dynamizing Static Algorithms, with Applications to Dynamic Trees and History Independence," in Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 2004, pp. 531–540.
[2] S. Alstrup, J. Holm, K. de Lichtenberg, and M. Thorup, "Minimizing Diameters of Dynamic Trees," in Automata, Languages and Programming, 1997, pp. 270–280.
[3] S. Alstrup, J. Holm, K. D. Lichtenberg, and M. Thorup, "Maintaining Information in Fully Dynamic Trees with Top Trees," ACM Trans. Algorithms, vol. 1, no. 2, pp. 243–264, Oct. 2005.
[4] S. Bent, D. Sleator, and R. Tarjan, "Biased Search Trees," SIAM J. Comput., vol. 14, no. 3, pp. 545–568, Aug. 1985.
[5] J. Bitner, "Heuristics That Dynamically Organize Data Structures," SIAM J. Comput., vol. 8, no. 1, pp. 82–110, Feb. 1979.
[6] G. E. Blelloch, D. Ferizovic, and Y. Sun, "Just Join for Parallel Ordered Sets," in Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, New York, NY, USA, 2016, pp. 253–264.
[7] G. E. Blelloch and M. Reid-Miller, "Fast Set Operations Using Treaps," in Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, New York, NY, USA, 1998, pp. 16–26.
[8] G. Cattaneo, P. Faruolo, U. F. Petrillo, and G. F. Italiano, "Maintaining Dynamic Minimum Spanning Trees: An Experimental Study," in Algorithm Engineering and Experiments, 2002, pp. 111–125.
[9] J. Feigenbaum and R. E. Tarjan, "Two New Kinds of Biased Search Trees," The Bell System Technical Journal, vol. 62, no. 10, pp. 3139–3158, Dec. 1983.
[10] G. Frederickson, "Data Structures for On-Line Updating of Minimum Spanning Trees, with Applications," SIAM J. Comput., vol. 14, no. 4, pp. 781–798, Nov. 1985.
[11] G. Frederickson, "Ambivalent Data Structures for Dynamic 2-Edge-Connectivity and $k$ Smallest Spanning Trees," SIAM J. Comput., vol. 26, no. 2, pp. 484–538, Mar. 1997.
[12] G. N. Frederickson, "A Data Structure for Dynamically Maintaining Rooted Trees," Journal of Algorithms, vol. 24, no. 1, pp. 37–65, Jul. 1997.
[13] A. V. Goldberg, M. D. Grigoriadis, and R. E. Tarjan, "Use of Dynamic Trees in a Network Simplex Algorithm for the Maximum Flow Problem," Mathematical Programming, vol. 50, no. 1, pp. 277–290, Mar. 1991.
[14] M. R. Henzinger and V. King, "Randomized Fully Dynamic Graph Algorithms with Polylogarithmic Time Per Operation," J. ACM, vol. 46, no. 4, pp. 502–516, Jul. 1999.
[15] J. B. Kruskal, "On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem," Proc. Amer. Math. Soc., vol. 7, no. 1, pp. 48–50, 1956.
[16] G. Marsaglia, "Xorshift RNGs," Journal of Statistical Software, vol. 8, no. 1, pp. 1–6, Jul. 2003.
[17] B. Pfaff, "Performance Analysis of BSTs in System Software," in Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, New York, NY, USA, 2004, pp. 410–411.
[18] T. Radzik, "Implementation of Dynamic Trees with In-subtree Operations," J. Exp. Algorithmics, vol. 3, Sep. 1998.
[19] G. Ramalingam and T. Reps, "On the Computational Complexity of Dynamic Graph Problems," Theoretical Computer Science, vol. 158, no. 1, pp. 233–277, May 1996.
[20] R. Seidel and C. R. Aragon, "Randomized Search Trees," Algorithmica, vol. 16, no. 4, pp. 464–497, Oct. 1996.
[21] D. D. Sleator and R. E. Tarjan, "A Data Structure for Dynamic Trees," in Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 1981, pp. 114–122.
[22] D. D. Sleator and R. E. Tarjan, "Self-adjusting Binary Search Trees," J. ACM, vol. 32, no. 3, pp. 652–686, Jul. 1985.
[23] R. E. Tarjan, "Dynamic Trees as Search Trees via Euler Tours, Applied to the Network Simplex Algorithm," Mathematical Programming, vol. 78, no. 2, pp. 169–177, Aug. 1997.
[24] R. E. Tarjan and R. F. Werneck, "Self-Adjusting Top Trees," in Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 2005, pp. 813–822.
[25] R. E. Tarjan and R. F. Werneck, "Dynamic Trees in Practice," J. Exp. Algorithmics, vol. 14, pp. 5:4.5–5:4.23, Jan. 2010.
[26] R. Werneck, "Design and Analysis of Data Structures for Dynamic Trees," Ph.D. dissertation, Princeton University, 2006.