Dynamic Distance Maps of the Internet

Wolfgang Theilmann, Kurt Rothermel
Institute of Parallel and Distributed High-Performance Systems (IPVR), University of Stuttgart, D-70565 Stuttgart, Germany
email: [theilmann|rothermel]@informatik.uni-stuttgart.de

Fakultätsbericht (Technical Report) 1999/08
Department of Computer Science, University of Stuttgart, Germany
July 1999

Abstract

A growing number of Internet applications attempt to optimize their network communication by taking into account the network distance across which data is transferred. Such applications range from replication management to mobile agent systems. A major problem for these applications is to acquire distance information for large computer networks efficiently. This paper presents an approach for creating a global view of the Internet, a so-called network distance map, which realizes a hierarchical decomposition of the network into regions and makes it possible to estimate the network distance between any two hosts. This view is not a single snapshot but is adapted dynamically to the continuously changing network conditions. The main idea is to use a certain set of hosts to perform distance measurements and to use the information gained in this way to estimate the distance between arbitrary hosts. A hierarchical clustering provides the notion of regions and allows the measurements to be coordinated in such a way that the resulting network load is minimized. An experimental evaluation based on 119 globally distributed measurement servers shows that even a small number of measurement servers suffices to construct fairly accurate distance maps at low cost.

1 Introduction

The Internet, today's most important and largest computer network, suffers from serious performance problems. Apart from improvements to the protocols on the network layer, various approaches have been undertaken to speed up communication at the application level by minimizing the distance across which data is transferred. For example, clients of replicated services or replicated data want to locate the nearest instance of a replicated entity [3, 11]. Replication servers want to disseminate popular data items towards large client groups [12, 2, 16]. Hierarchical caches and distributed object repositories aim to reflect a decomposition of the network into a hierarchy of regions [4, 20]. Routing of user queries in distributed Digital Libraries is optimized at the application level [14]. Finally, in mobile agent systems the distance between client and server is needed to decide whether to ship the client or to load the data over the network [19].

A major problem for all these applications is to learn about network distances without probing the whole network. For many of them it would even suffice to learn the coarse assignment of hosts to regions (e.g. [2, 12, 14, 20]). However, most current applications are based on ad hoc solutions, which do not really solve the problem. In addition, most solutions are appropriate only for a particular application scenario and cannot be shared by different applications.

This paper presents so-called network distance maps, which offer a global view of a computer network from which the distance between any two network hosts can be derived. In addition, distance maps provide a decomposition of the network into a hierarchy of regions. They can be constructed for any distance metric. Moreover, distance maps are dynamic in that they adapt continuously to changing network conditions.
The main idea of this approach is to use a subset of hosts to perform distance measurements and to cluster these hosts hierarchically into regions of closely connected hosts. It is important to stress that the approach applies to any interconnected network that supports some form of distance measurement.

The remainder of this paper proceeds as follows. After a discussion of related work in Section 2, Section 3 introduces some general assumptions, goals and restrictions of our approach. Section 4 presents the approach of network distance maps in detail, and Section 5 reports on an experimental evaluation. Finally, Section 6 summarizes our conclusions.

2 Related Work

The estimation of real network distances through geographic distances has been proposed by Gwertzman and Seltzer [12]. However, they found that the correlation between geographic distances and network distances is rather poor, especially between different backbones. In addition, there is, so far, no way to automatically determine the geographic location of all Internet hosts.

Various approaches propose to perform local measurements, e.g. from clients to replication servers ([3, 11]) or vice versa [2]. In this way, a host is able to learn the distance between itself and some remote hosts, but it cannot derive the distance between two remote hosts. The latter is required in particular by replication servers, which need to discover the distance between possible replication locations and clients [2]. A second deficiency of these approaches is the high overhead that results if every single host is responsible for performing measurements. Consider, for example, two closely connected clients of a replicated service. Since they do not know of each other, they have to perform almost the same measurements.

Rabinovich et al. exploit the information available in the routing tables of Internet routers [16]. Compared to probing approaches, this is a very efficient way of learning network distances. On the other hand, such an approach is bound to the distance metrics available in the routing tables. This can be an important restriction since, for example, the distance metric used for routing between autonomous systems (ASs) is simply the number of traversed ASs [17]. In addition, access to such tables is not public. So, in general, such an approach can only be followed by the operator of an autonomous system and is restricted to distances between hosts within this system.

Two protocols for the dissemination of distance information, namely the SONAR and HOPS services, have been proposed in [15] and [5], respectively. A SONAR server offers distance information between itself and arbitrary remote hosts. The HOPS service offers distances between arbitrary hosts by distributing the information in a DNS-like hierarchy. However, the problem of acquiring distance information is not addressed.

To our knowledge, the first and only approach for a global, measurement-based view of the Internet has been presented by Francis et al. [6]. They propose to use a set of servers that measure the distances among themselves and to other end systems. Shortest-path algorithms are then used to prune the resulting data structure. Two different models are presented. The first one tries to discover the real Internet topology, i.e. it determines autonomous systems (ASs) as well as inter-AS and intra-AS links. The second model is simply based on the measurement of end-to-end distances.
To determine the closest measurement server for every end system, they propose a randomized approach in which each server repeatedly measures its distance to a randomly selected end system and checks whether it is closer than the closest server known so far. Unfortunately, the whole discussion is driven by the goal of minimizing the amount of data needed to store the network distance information. The network load caused by the measurements is not considered. No concrete algorithms are presented, and the problem of updating the acquired data structures is discussed only superficially. We differ from the proposed methods in that we try to achieve scalability (both in terms of network load and storage requirements) by clustering the set of measurement servers hierarchically.

3 Assumptions, Goals and Restrictions

This section introduces the major assumptions and goals of our approach and discusses the inherent restrictions we have to face.

3.1 System Model

Network. Our model of a network consists of a set of hosts H together with a function δ(x, y) that assigns a distance to each pair of hosts x, y ∈ H. We assume distances to be non-negative and symmetric. We will discuss the necessity of this symmetry assumption in Section 4 and will show the extent of its validity for our experimental validation in Section 5.

We neither require any special distance metric nor care about the method used to perform a single distance measurement. Instead, our approach can be used with any distance metric, for example, the number of hops (i.e. the number of network routers on a path between two hosts), the round trip time (the time needed to transmit a simple datagram packet to a remote host and back), packet loss rates, bandwidth, or anything else (a round-trip-time probe is sketched at the end of this subsection).

Because of space limitations, we do not discuss how our algorithms deal with host or network failures. Instead, we assume for our presentation that every pair of network hosts is connected. Section 4.5 presents some basic principles for how fault tolerance can be incorporated into our approach.

A possible optimization, which we do not discuss in this paper but which could easily be integrated into our approach, is to reduce the granularity of the considered Internet to address prefixes, i.e. to group together all hosts with the same address prefix. A good discussion of this optimization idea can be found in [6].

Network View. Of course, we cannot assume access to every host in the network for performing distance measurements. However, we assume the availability of a set M ⊆ H of measurement servers (called mServers for the remainder of this article) that can perform distance measurements to arbitrary hosts. Note that this applies only to the "public" part of the Internet, since we cannot measure distances to hosts behind firewalls or to hosts within private networks. But even the distance to a firewall should be quite useful for an application on the outside. In the following, we will call a host a simple host if we want to emphasize that it is not an mServer.
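To make the system model concrete, the following is a minimal sketch of one way to obtain such a distance value: a round-trip-time probe that times a TCP handshake. The function names, the default port, and the use of the probing machine itself as the measurement origin are illustrative assumptions, not part of the report.

```python
import socket
import time

def rtt_probe(host: str, port: int = 80, timeout: float = 2.0) -> float:
    """Measure one round trip to `host` (in seconds) by timing a TCP handshake.

    Returns float('inf') if the host is unreachable within `timeout`.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return float("inf")
    return time.monotonic() - start

def delta(x: str, y: str, samples: int = 3) -> float:
    """The distance function delta(x, y) of the model, instantiated here as
    the minimum of a few RTT probes; taking the minimum filters transient
    queueing delay. Caveat: this probes y from the local machine, so it is
    only a stand-in for a measurement issued at mServer x."""
    return min(rtt_probe(y) for _ in range(samples))
```

Any other metric (hop counts via traceroute, loss rates, bandwidth estimates) could be substituted behind the same interface.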
3.2 Accuracy and Timeliness of Distance Information

An important aspect of network distance maps is the quality (i.e. the accuracy and timeliness) of the distance information they provide. However, this aspect is influenced by factors external to the design of network map algorithms. First of all, there is the extra communication overhead we are willing to spend. This tolerable overhead in turn depends on the number of applications that make use of the distance map.

Given the tolerable overhead, accuracy and timeliness depend on two additional factors. One is the effort needed to perform a single distance measurement. For example, measuring the currently available bandwidth causes a much higher overhead than measuring the current round trip time [3]. The other is the variation scale of the respective metric. For example, hop-count distances tend to be relatively static, while the current round trip time is highly dynamic. To give the operator of a distance map maximum flexibility, we developed algorithms that can be tuned to arbitrary degrees of accuracy and timeliness.

3.3 Scalability

There are three scalability requirements that algorithms for distance maps must satisfy. Firstly, the network load caused by the process of constructing and maintaining a map must be limited. Because large computer networks consist of millions of hosts, it is out of the question to perform measurements from every host to every other host. Secondly, distance maps should have small storage requirements. This makes it possible to replicate them to almost any ordinary server system. Finally, the derivation of distance values should be fast, so that a distance map server can satisfy many requests almost simultaneously.

4 Network Distance Maps

This section introduces the approach of network distance maps in detail. After an overview of the basic ideas of our approach, we proceed with the main data structure. Then we present the algorithms for constructing and updating a map, and finally we discuss extensions for the treatment of failures.

4.1 Overview

The basic ideas of our approach are (1) to rely on a set of measurement servers (mServers), (2) to measure the distances between these mServers, (3) to assign simple hosts to their most closely connected mServer, and (4) to estimate the distance between two hosts by the distance between their two assigned closest mServers. The accuracy of this approach is limited by the number and distribution of the mServers: the more mServers we have, the smaller the average distance between a host and its closest mServer becomes, and the more accurate the estimation by the distance between mServers.

Figure 1 sketches this scenario.

[Figure 1: Distance estimation with measurement servers. The figure shows mServers A, B, C and simple hosts 1, 2, 3, distinguishing the true distance, the inter-server distance, and the distance to the closest server.]

The distances between hosts 1 and 2 and between hosts 2 and 3 are both estimated by the distance between mServers A and B. While the first estimation can be expected to be relatively precise, the second one cannot, due to the large distance between host 3 and its closest mServer A.

Two problems in terms of the scalability of the network load arise with this approach. Firstly, computing the closest assignment for a simple host requires measuring the distance to this host from every mServer. Secondly, the number of distance measurements between mServers grows quadratically with the number of mServers, and many of these measurements are redundant. For example, in the scenario sketched in Figure 1, the distance between mServers A and C need not be measured if we know that C is close to B and B is far from A.

To solve these problems we cluster the mServers in a hierarchical manner, thus achieving a decomposition into regions in which each region is further refined into subregions. For each region/cluster we select a representative mServer. Then the closest assignment for simple hosts can be done hierarchically by measuring the distance of the respective host to each representative of a top-level cluster. The cluster with the closest representative is selected, and the process is continued for its subclusters (a sketch of this descent is given at the end of this subsection). Since we do not want to measure all distances between mServers for the initial clustering, we propose a mixed algorithm that first computes a pre-clustering on a subset of the available mServers and then assigns the additional mServers to their most appropriate cluster.

It is important to remark that our clustering algorithms require symmetric distances. Otherwise they cannot decide whether or not to group two entities together, especially if the two distances differ significantly from each other.
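The following is a minimal sketch of this hierarchical descent. The Cluster type, the measure callback, and all names are our own illustrative assumptions; the report itself prescribes only the strategy of probing representatives level by level.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Cluster:
    """A node of the cluster tree: `representative` is the address of the
    cluster's representative mServer; `children` is empty for a leaf,
    i.e. for a single mServer."""
    representative: str
    children: List["Cluster"] = field(default_factory=list)

def assign_closest(host: str,
                   top_level: List[Cluster],
                   measure: Callable[[str, str], float]) -> Tuple[str, float]:
    """Hierarchical closest assignment: at each level, probe only the
    cluster representatives, descend into the cluster whose representative
    is closest to `host`, and stop at a leaf mServer.

    With at most k clusters per level and tree depth d, this costs
    O(k * d) measurements instead of one measurement per mServer.
    """
    clusters = top_level
    choice, dist = None, float("inf")
    while clusters:
        choice, dist = min(
            ((c, measure(host, c.representative)) for c in clusters),
            key=lambda pair: pair[1],
        )
        clusters = choice.children  # refine within the selected region
    return choice.representative, dist
```

The returned pair (closest mServer, measured distance) is exactly what the data structure of the next subsection stores per simple host.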
4.2 Data Structure

The data structure of a network distance map is presented in Figure 2. It consists of a cluster tree together with the assignment of simple hosts to their closest mServer. Inner nodes of this tree represent clusters of mServers; leaf nodes correspond to single mServers.

[Figure 2: Representation of a network distance map. The figure shows a tree with Cluster 1 and Cluster 2, subclusters 2.1 and 2.2, and mServers as leaves, annotated with distances between sibling clusters/servers and distances between each host and its closest measurement server.]

The tree is extended by distance values between sibling nodes. The distance between two sibling clusters is an estimate of the average distance between arbitrary hosts belonging to these clusters. The distance between sibling mServers is derived directly from the corresponding network measurement. In addition, the assignment between a simple host and its closest mServer is annotated with the associated measured network distance.

Distances (denoted by ‖·, ·‖) can be derived from this tree representation as follows. The distance ‖m1, m2‖ between two mServers m1 and m2 is estimated as the distance between the children of the least common ancestor of m1 and m2 (the ancestor relation is the transitive extension of the parent relation). Based on this, the distance ‖h1, h2‖ between two arbitrary hosts h1 and h2 is estimated as

  max{ ‖cl(h1), cl(h2)‖, δ(h1, cl(h1)), δ(h2, cl(h2)) },

where cl denotes the closest assignment and δ the associated measured distances. Note that the first argument of the max function is estimated from the cluster tree, while the other two are obtained by network measurements. The max function guarantees accurate distance estimates also for hosts that are connected to the same mServer, and for hosts that are not closely connected to any mServer, i.e. that are not well covered by the cluster tree. The derivation of any distance is feasible in linear time, i.e. linear in the tree's depth (see the sketch below).

The storage requirements of a cluster tree can be estimated as follows. Let k be the maximum number of subclusters or mServers within one cluster and d the depth of the cluster tree (the root's depth is defined as 0). Then a complete cluster tree consists of (k^(d+1) − 1)/(k − 1) nodes. Since each node must contain distance values for its siblings (at most k − 1 of them), and assuming a well-balanced tree with |M| = k^d, we can bound the amount of data required for this representation of a network by

  O( |H| + ((k^(d+1) − 1)/(k − 1)) · (k − 1) ) = O( |H| + |M| · k ).

The parameter k determines the tradeoff between the accuracy of the network's representation and the amount of required data. Setting k to |M| would result in an exact representation of a network view, except for the fact that the simple hosts are only linked to their closest mServer. Taking an example network with 10,000 mServers and 1 million hosts and assuming k = 10, we need only 1.1 × 10^6 data items, which is better by orders of magnitude than the 10^10 items required for a complete representation of a network view.
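As an illustration of the two derivation rules, the following sketch assumes that each mServer is identified by the root-to-leaf path of its cluster tree nodes and that the sibling distance table stores both orientations of each pair; these representation details are our assumptions, not the report's.

```python
from typing import Dict, List, Tuple

def inter_server_distance(path1: List[str], path2: List[str],
                          sibling_dist: Dict[Tuple[str, str], float]) -> float:
    """||m1, m2||: walk the two root-to-leaf paths in parallel until they
    diverge; the stored distance between the two children of the least
    common ancestor is the estimate. Runs in time linear in the tree depth.
    `sibling_dist` is assumed to hold both orientations of each sibling pair."""
    for a, b in zip(path1, path2):
        if a != b:
            return sibling_dist[(a, b)]
    return 0.0  # same mServer

def host_distance(h1: str, h2: str,
                  closest: Dict[str, Tuple[List[str], float]],
                  sibling_dist: Dict[Tuple[str, str], float]) -> float:
    """||h1, h2|| = max{ ||cl(h1), cl(h2)||, delta(h1, cl(h1)), delta(h2, cl(h2)) }.

    `closest[h]` holds the root-to-leaf path of the assigned mServer cl(h)
    together with the measured distance delta(h, cl(h)).
    """
    path1, d1 = closest[h1]
    path2, d2 = closest[h2]
    return max(inter_server_distance(path1, path2, sibling_dist), d1, d2)
```

Note how the max over the two measured host-to-mServer distances realizes the guarantee described above for hosts sharing an mServer or poorly covered by the tree.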
Cluster Representative. A cluster representative should satisfy two conflicting goals. On the one hand, it should be "representative" in the sense that distances from hosts outside the cluster to the cluster representative should be similar to the distances to the cluster's other hosts. On the other hand, the election of a representative should not induce a large additional network load. This is especially important for the updating process.

We decided to take the cluster centre as the representative, i.e. the host for which the maximum distance to the other hosts of the cluster is minimal. This choice has several advantages. It does not require any additional network measurements, neither for the initial computation nor for updating: if we compute the centres hierarchically, that is, if the centre of each cluster is computed as the centre of its child clusters' centres, we can use the distances derived from the cluster tree. In addition, cluster centres, which are characterized by their good internal connectivity, can be expected not to have an unrepresentatively bad connectivity to the outside. They are therefore quite appropriate for the insertion and updating of simple hosts (compare the algorithms in the following paragraphs).

4.3 Initial Computation of a Distance Map

Cluster Criteria. The first question is how to compute an effective clustering. Francis et al. discuss the value of clustering according to autonomous systems (ASs) in the Internet [6]. However, they abandon this idea since ASs may be very widespread and hosts from different ASs may be close to each other (in terms of latency). Given our way of deriving distances from the cluster tree, the optimal cluster criterion would be the Min k-Avg-Error criterion, which minimizes the average error of our distance estimation, i.e. which minimizes the expression

  Σ_{i,j ∈ [1..k], i<j} …
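Since the expression is cut off in this copy, the following sketch should be read with care: it scores a candidate partition by the mean estimation error described in the prose, under the assumption (ours, not the report's) that the summand is the absolute difference between the tree estimate and the measured distance for each inter-cluster pair.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

def avg_estimation_error(clusters: List[List[str]],
                         measured: Dict[Tuple[str, str], float],
                         estimate: Callable[[int, int], float]) -> float:
    """Score a candidate partition of mServers into k clusters by the mean
    error of the tree-based estimation over all inter-cluster pairs.

    `estimate(i, j)` is the single value the tree would report for every
    pair spanning clusters i and j (e.g. the distance between their
    representatives); the absolute-error summand is an assumption."""
    errors = []
    for i, j in combinations(range(len(clusters)), 2):
        est = estimate(i, j)
        for x in clusters[i]:
            for y in clusters[j]:
                errors.append(abs(est - measured[(x, y)]))
    return sum(errors) / len(errors) if errors else 0.0
```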