Internet Traffic Flow Size Analysis
This document contains the results of an informal investigation into the distribution of flow sizes across Fall 1999, Spring 2000, and Fall 2000, ostensibly the period during which "file sharing" applications such as Napster, Gnutella, Scour Exchange, and the like became more popular and generally available.

This research is based upon internet traffic flow data collected at the University of Wisconsin - Madison.

This document is available here: http://net.doit.wisc.edu/data/flow/size/

Dave Plonka PLONKA at DOIT dot WISC dot EDU

Table of Contents

Internet Traffic Flow Size Analysis
    Introduction
    Background
    About the Flow Sampling Periods
    About the Graphs
    A Look at Average Flow Sizes
            Figure 2. Average Napster Flow Sizes
            Figure 3. Average ftp-data Flow Sizes
    General IP Flow Size Distributions
            Figure 4. Distribution of Flow Sizes, Fall 1999
            Figure 5. Distribution of Flow Sizes, Spring 2000
    Napster Flow Sizes
        NapUser Flow Size Distributions
            Figure 6. Cumulative Napster Flows % vs. Flow Size
            Figure 7. Napster Content % vs. Flow Size
        NapUser Flow Size Distribution Summary
            Figure 8. Fall 1999, Spring 2000, Fall 2000: Content % vs. Flow Size
    Summary
    Thanks
    Analysis Tools
    References

Introduction

This analysis was initiated as the result of some questions, which are paraphrased here:

  • Is there a peak in the distribution of flow sizes which approximates the size of a typical MP3 file?
  • Does Napster traffic exhibit a characteristic signature in terms of the sizes of its IP flows?
  • While the use of "sharing" applications such as Napster is increasing the bandwidth used, is it also changing the typical size of IP flows?
Apart from those questions, we wondered if flow size analysis might provide a useful indication of general trends in Internet workload.

Background

A flow is a unidirectional series of IP packets of a given protocol, between a source and destination port, within a certain duration.

This analysis used flows as defined by Cisco's NetFlow V5 flow export format. The tunable "timeout active" value was set to one minute. This means that active flows were expired/exported in as little as one minute.
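As a rough illustration of the flow definition above, the following sketch groups packets into unidirectional flows and expires a flow once it has been active for the timeout. It is a simplification, not Cisco's implementation: the `Packet` tuple, the `aggregate_flows` function, the simplified flow key, and the sample addresses and ports are all hypothetical, and inactivity timeouts and TCP connection teardown are ignored.

```python
from collections import namedtuple

# Hypothetical packet record; sizes are in bytes, times in seconds.
Packet = namedtuple("Packet", "time src dst proto sport dport size")


def aggregate_flows(packets, active_timeout=60.0):
    """Group packets into unidirectional flows (simplified sketch).

    A flow is keyed on source, destination, protocol, and both ports.
    Once a flow has been active for `active_timeout` seconds it is
    exported, and later packets with the same key start a new flow.
    """
    active = {}    # flow key -> flow record still being accumulated
    exported = []  # flow records that have been expired
    for p in sorted(packets, key=lambda pkt: pkt.time):
        key = (p.src, p.dst, p.proto, p.sport, p.dport)
        rec = active.get(key)
        if rec is not None and p.time - rec["start"] >= active_timeout:
            exported.append(active.pop(key))
            rec = None
        if rec is None:
            rec = {"key": key, "start": p.time, "packets": 0, "bytes": 0}
            active[key] = rec
        rec["packets"] += 1
        rec["bytes"] += p.size
    exported.extend(active.values())  # flush whatever is still active
    return exported


# Arbitrary illustrative traffic: one transfer and a single reverse packet.
packets = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", "tcp", 40000, 50000, 1500),
    Packet(30.0, "10.0.0.1", "10.0.0.2", "tcp", 40000, 50000, 1500),
    Packet(31.0, "10.0.0.2", "10.0.0.1", "tcp", 50000, 40000, 40),
    Packet(70.0, "10.0.0.1", "10.0.0.2", "tcp", 40000, 50000, 1500),
]
flows = aggregate_flows(packets)
```

In this sketch the reverse-direction packet forms its own flow, and the forward transfer is reported as two flow records because it outlasts the one-minute timeout.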

A portion of this analysis examines the distribution of flow sizes. This is similar in some ways to past investigations by others which examined the distribution of packet sizes.

About the Flow Sampling Periods

                               Total                Napster
Semester     Sample "Day"      Inbound   Outbound   Inbound   Outbound
Fall 1999    September 15-16   26 Mb/s   45 Mb/s    ?         ?
Spring 2000  May 12-13         45 Mb/s   73 Mb/s    7 Mb/s    21 Mb/s
Fall 2000    November 16       60 Mb/s   110 Mb/s   13 Mb/s   31 Mb/s

At the time of the Spring 2000 sample, Napster traffic represented a significant portion of the campus traffic as a whole. Specifically, Napster is thought to have represented 29% of our campus outbound traffic, and 15% of the inbound traffic during this period. Also, during this Spring 2000 sample, Napster flows represented 13% of the outbound flows and 7% of the inbound flows.
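As a rough cross-check, the bandwidth shares quoted above can be recomputed from the rates in the sampling-period table. This is only a sketch: the dictionary layout and the `share` helper are illustrative names, and because the table rates are rounded to whole Mb/s the results agree with the quoted percentages only approximately. Fall 1999 is omitted because its Napster rates are unknown.

```python
# Rates in Mb/s as (inbound, outbound), copied from the table above.
samples = {
    "Spring 2000": {"total": (45, 73), "napster": (7, 21)},
    "Fall 2000": {"total": (60, 110), "napster": (13, 31)},
}


def share(napster_rate, total_rate):
    """Napster's share of the total rate, as a percentage."""
    return 100.0 * napster_rate / total_rate


shares = {}
for semester, rates in samples.items():
    total_in, total_out = rates["total"]
    nap_in, nap_out = rates["napster"]
    shares[semester] = {
        "inbound": share(nap_in, total_in),
        "outbound": share(nap_out, total_out),
    }

for semester, pct in shares.items():
    print("%s: inbound %.1f%%, outbound %.1f%%"
          % (semester, pct["inbound"], pct["outbound"]))
```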

The increase in the number of flows between the Fall and Spring is indicative of the increased internet usage that our campus observed throughout the 1999/2000 school year. This continuous increase in data traffic has recently been investigated and reported on by others.

There was no rhyme or reason to our selection of those sample periods. The first two sample periods were the only ones for which data was available since detailed logs of campus traffic are not systematically retained over long periods of time. We retrieved these samples from backup tapes from infrequent manual backups of our real-time analysis machine. These backups were performed only for disaster recovery. As such, it was a stroke of luck that they coincided with interesting points in time regarding the use of "file sharing" applications.

About the Graphs

We use two methods to visualize flow sizes. In the first, we plot average flow sizes over time.

In the second, we employ a step graph histogram of a distribution across 32 intervals, incremented by consecutive powers-of-two. That is, the first column represents flows of sizes 1 or 2 (units of packets or bytes, depending on line color), the second column represents 2 through 4, then 4 through 8, 8 through 16, and so on, up to the maximum size representable (2^31 through 2^32): approximately those between 2 billion and 4 billion.
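The power-of-two binning described above can be sketched as follows. The function names are hypothetical, and the treatment of boundary values is an assumption: the text lists overlapping endpoints ("1 or 2", then "2 through 4"), so this sketch treats each interval as including its upper bound, i.e. sizes 1-2, 3-4, 5-8, and so on.

```python
from collections import Counter

NUM_INTERVALS = 32


def interval_index(size):
    """Index of the power-of-two interval that holds `size`.

    Assumption: intervals include their upper bound, so index 0 holds
    sizes 1-2, index 1 holds 3-4, index 2 holds 5-8, and index 31 holds
    sizes up to 2**32. Anything larger is clamped into the last interval.
    """
    return min(NUM_INTERVALS - 1, max(0, (size - 1).bit_length() - 1))


def histogram(sizes):
    """Count the flows that fall into each of the 32 intervals."""
    counts = Counter(interval_index(s) for s in sizes)
    return [counts.get(i, 0) for i in range(NUM_INTERVALS)]


def cumulative_percent(counts):
    """Running percentage of flows at or below each interval."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    running, result = 0, []
    for c in counts:
        running += c
        result.append(100.0 * running / total)
    return result


# Arbitrary illustrative flow sizes, in bytes.
counts = histogram([1, 2, 3, 4, 28])
cumulative = cumulative_percent(counts)
```

The same `interval_index` works whether the sizes are packet counts or byte counts, which is how a single histogram layout can serve both line colors.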

A Look at Average Flow Sizes

Over the 1999-2000 academic year, the average flow size appears to have increased, as evidenced by plots of average flow size and average TCP flow size over time. Furthermore, the average size of a Napster flow is significantly larger than that of IP flows in general. (Note that measurement of Napster traffic at the University of Wisconsin - Madison began in March of 2000, so no previous Napster flow data is available.)

Although Napster flows are larger than average, there are other popular well-known applications, such as ftp-data transfers, with average flow sizes that far exceed those of Napster. This is evidenced by plots of average ftp-data flow size over time.

General IP Flow Size Distributions

Considering the size distribution in general, i.e. amongst IP flows of all types, the percentages of flows of particular sizes are similar between Fall 1999 and Spring 2000. For instance, in either sample, about half of the flows are smaller than 512 bytes.

One curiosity visible when plotting the Cumulative Percentage of Flows vs. Flow Size is that, in Spring 2000, 5.7% of the flows were less than 32 bytes, in contrast with 0.1% from the previous fall. Remembering that a full 27% of the traffic during the Spring sample was Napster traffic, it is likely that the small flows represent the 28-byte "ping" packets generated by Napster, discussed below.

Considering the distribution of flow sizes from Fall 1999 and Spring 2000, there was an increase in the percentage of byte content delivered in flows between 4MB and 16MB in size. Specifically, the Spring 2000 sample shows a marked spike in those two intervals. In both of these earlier samples the median is within the 2-4MB interval.

Considering the Fall 2000 sample, there is not quite as high a percentage of content being delivered in flows within just those two intervals (namely 4MB to 8MB and 8MB to 16MB); however, the median has shifted up to the 4MB to 8MB interval.
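The "percentage of content" and "median" discussed above can be sketched in code under one reading of those terms. This is an assumption, not a description of the tools actually used: content share is taken to be the percentage of all bytes carried by flows in each power-of-two interval, and the median is taken to be the interval in which the cumulative byte share first reaches 50%. All function names and sample sizes are hypothetical.

```python
NUM_INTERVALS = 32


def interval_index(size):
    """Power-of-two interval for `size`, assuming upper-inclusive bounds."""
    return min(NUM_INTERVALS - 1, max(0, (size - 1).bit_length() - 1))


def content_share(flow_bytes):
    """Percentage of total bytes carried by the flows in each interval."""
    totals = [0] * NUM_INTERVALS
    for size in flow_bytes:
        totals[interval_index(size)] += size
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * NUM_INTERVALS
    return [100.0 * t / grand_total for t in totals]


def median_interval(shares):
    """First interval at which the cumulative content share reaches 50%."""
    running = 0.0
    for index, pct in enumerate(shares):
        running += pct
        if running >= 50.0:
            return index
    return len(shares) - 1


# Arbitrary illustrative sample: six 28-byte flows and three large flows.
flow_bytes = [28] * 6 + [5000000, 6000000, 12000000]
shares = content_share(flow_bytes)
median = median_interval(shares)
low, high = 2 ** median, 2 ** (median + 1)  # byte bounds of that interval
```

In this made-up sample two thirds of the flows are tiny, yet nearly all of the content falls in the intervals between 2^22 and 2^24 bytes (roughly 4MB to 16MB), which is why a distribution weighted by bytes can look very different from one weighted by flow count.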

Napster Flow Sizes

Internet flows produced by the Napster application, and work-alike clones, are of at least these types:
  1. TCP initial connections from client user to "redirect" server
  2. TCP responses from "redirect" server to client user (specifying address of an "index" server)
  3. TCP commands/requests from client user to "index" server
  4. TCP responses from "index" server to client user
  5. ICMP ECHO from client user to candidate "server" user (28 byte packets)
  6. ICMP ECHOREPLY from candidate "server" user to client user (28 byte packets)
  7. TCP request from client user to "server" user (request and subsequent ACKs)
  8. TCP responses from "server" user to client user (possibly containing MP3 content)
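The 28-byte figure for the ICMP packets in the list above is what one would expect for an echo message carrying no payload: a minimum 20-byte IPv4 header plus the 8-byte ICMP echo header. The sketch below builds such a header to show the arithmetic. It assumes an IPv4 header without options and an empty payload, which the source does not state explicitly, and the helper names are hypothetical.

```python
import struct

IPV4_HEADER_BYTES = 20  # minimum IPv4 header; assumes no IP options
ICMP_ECHO_REQUEST = 8   # ICMP type number for ECHO


def internet_checksum(data):
    """16-bit one's-complement checksum as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_echo_header(ident, seq):
    """Build an 8-byte ICMP ECHO header with no payload."""
    unchecked = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, ident, seq)


header = icmp_echo_header(ident=1, seq=1)
ip_packet_bytes = IPV4_HEADER_BYTES + len(header)  # 20 + 8
```

A flow consisting of a single such packet falls below 32 bytes, which is consistent with the small-flow population noted in the general distribution.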

The term "NapUser" is used below to label traffic believed to be generated by the Napster application, or one of its clones. For this analysis NapUser traffic was identified by a method implemented in FlowScan.

NapUser Flow Size Distributions

The following graphs represent the sizes of Napster application flows in terms of packets and bytes, both for ICMP and TCP combined and for TCP alone. The "NapUser TCP bytes" and "NapUser TCP packets" plots therein represent just the Napster TCP flows, and therefore emphasize the content-carrying flows: those representing the interaction with Napster index servers and those representing the bidirectional TCP data streams which carry the MP3 data.

NapUser Flow Size Distribution Summary

Most Napster-related flows are actually the small ICMP flows from Napster clients to candidate servers. As such, well over half the flows produced by the application carry a trivial amount of content as measured in bytes or packets. However, the average Napster-produced TCP flow is larger than the average flow amongst all types and therefore Napster does appear to have increased the size of the average Internet IP flow.

Summary

Napster flows are clearly larger than average, even when the numerous but often overlooked "ping" flows participate in the calculation. Furthermore, Napster flows appear to follow a characteristic pattern, namely that most Napster content is delivered in flows of sizes between 4 megabytes and 16 megabytes. This is perhaps not surprising since it is a reasonable estimate of the size range of the MP3 files typically exchanged.

Considering our results, it seems likely that use of popular file-sharing applications such as Napster will not only continue to increase bandwidth usage solely by virtue of their popularity, but will also likely shift the distribution of flow sizes higher. The operational implications of such a shift in Internet workload and usage characteristics warrant further study.

Thanks

Thank you to these folks for their assistance in the form of answers, and questions:

k claffy KC at CAIDA dot ORG
Michael Hare MHARE at DOIT dot WISC dot EDU

Analysis Tools

The following tools were used during this analysis:

References

  1. Sample Packet Size Distributions
    http://www.caida.org/outreach/presentations/Soa9911/mgp00025.html
  2. CAIDA's CoralReef Demos
    https://anala.caida.org/CoralReef/Demos/
  3. Internet growth: Is there a ``Moore's Law'' for data traffic?
    http://www.research.att.com/~amo/doc/networks.html
  4. UW-Madison's Napster Traffic Measurement
    http://net.doit.wisc.edu/data/Napster/
  5. NetStats - UW-Madison's NetStats web page
    http://wwwstats.net.wisc.edu/
  6. FlowScan - UW-Madison's Network Measurement Tool
    http://net.doit.wisc.edu/~plonka/FlowScan/
  7. CAIDA's Network Measurement Tool used by FlowScan
    http://www.caida.org/tools/measurement/cflowd/
  8. RRDtool, the Round Robin Database tool used by FlowScan
    http://ee-staff.ethz.ch/~oetiker/webtools/rrdtool/
  9. Cisco's IOS NetFlow feature
    http://www.cisco.com/warp/public/732/netflow/

$Id: index.wml,v 1.19 2002/10/16 20:36:13 egeiger Exp $