Internet Traffic Flow Size Analysis
This document contains the results of an informal investigation into
the distribution of flow sizes across Fall 1999, Spring 2000, and Fall
2000, ostensibly the period of time during which "file sharing"
applications such as Napster, Gnutella, Scour Exchange, and the like
became more popular and generally available. This research is
based upon internet traffic flow data collected at the University of
Wisconsin - Madison. This document is available here:
http://net.doit.wisc.edu/data/flow/size/
Dave Plonka
Table of Contents
Internet Traffic Flow Size Analysis
Introduction
Background
About the Flow Sampling Periods
About the Graphs
A Look at Average Flow Sizes
Figure 2. Average Napster Flow Sizes
Figure 3. Average ftp-data Flow Sizes
General IP Flow Size Distributions
Figure 4. Distribution of Flow Sizes, Fall 1999
Figure 5. Distribution of Flow Sizes, Spring 2000
Napster Flow Sizes
NapUser Flow Size Distributions
Figure 6. Cumulative Napster Flows % vs. Flow Size
Figure 7. Napster Content % vs. Flow Size
NapUser Flow Size Distribution Summary
Figure 8. Fall 1999, Spring 2000, Fall 2000: Content % vs. Flow Size
Summary
Thanks
Analysis Tools
References
This analysis was initiated as the result of some questions, which are
paraphrased here:
- Is there a peak in the distribution of flow sizes which approximates the size of a typical MP3 file?
- Do flows of Napster traffic exhibit a characteristic signature in terms of the sizes of their IP flows?
- While the use of "sharing" applications such as Napster is increasing the bandwidth used, is it also changing the typical size of IP flows?
Apart from those questions, we wondered if flow size analysis might
provide a useful indication of general trends in Internet workload.
A flow is a unidirectional series of IP packets of a given
protocol, between a source and destination port, within a certain
duration.
This analysis used flows as defined by Cisco's NetFlow V5 flow export
format. The tunable "timeout active" value was set to one minute. This
means that active flows were expired/exported in as little as one
minute.
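To make the flow definition concrete, here is a minimal sketch of how packets might be grouped into unidirectional flows with a one-minute active timeout. It is a simplification for illustration only and is not how a router implements NetFlow. The `Packet` record, the `aggregate_flows` function, and the five-field key are assumptions made for this example; a real NetFlow cache also keys on other fields and expires flows on inactivity and on TCP connection teardown.

```python
from collections import namedtuple

# Hypothetical packet record: timestamp in seconds plus the fields that key a flow.
Packet = namedtuple("Packet", "ts src dst proto sport dport length")

ACTIVE_TIMEOUT = 60  # seconds, mirroring the one-minute "timeout active" setting


def aggregate_flows(packets, active_timeout=ACTIVE_TIMEOUT):
    """Group time-ordered packets into unidirectional flow records."""
    active = {}    # flow key -> record still being accumulated
    exported = []  # records that have been expired and exported
    for p in packets:
        # Source and destination are not interchangeable: each direction
        # of a conversation is its own flow.
        key = (p.src, p.dst, p.proto, p.sport, p.dport)
        rec = active.get(key)
        # Expire a record that has been active for the full timeout.
        if rec is not None and p.ts - rec["first"] >= active_timeout:
            exported.append(active.pop(key))
            rec = None
        if rec is None:
            rec = {"key": key, "first": p.ts, "last": p.ts,
                   "packets": 0, "bytes": 0}
            active[key] = rec
        rec["last"] = p.ts
        rec["packets"] += 1
        rec["bytes"] += p.length
    # Flush whatever is still active at the end of the input.
    exported.extend(active.values())
    return exported
```

In this sketch, a transfer that keeps sending for longer than the timeout is reported as more than one flow record, and the reverse direction of the same conversation is always a separate flow.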
A portion of this analysis examines the distribution of flow sizes.
This is similar in some ways to past investigations by others which
examined the distribution of packet sizes.
| Semester | Sample "Day" | Total Inbound | Total Outbound | Napster Inbound | Napster Outbound |
| Fall 1999 | September 15-16 | 26 Mb/s | 45 Mb/s | ? | ? |
| Spring 2000 | May 12-13 | 45 Mb/s | 73 Mb/s | 7 Mb/s | 21 Mb/s |
| Fall 2000 | November 16 | 60 Mb/s | 110 Mb/s | 13 Mb/s | 31 Mb/s |
At the time of the Spring 2000 sample, Napster traffic represented a
significant portion of the campus traffic as a whole. Specifically,
Napster is thought to have represented 29% of our campus outbound
traffic, and 15% of the inbound traffic during this period. Also,
during this Spring 2000 sample, Napster flows represented 13% of the
outbound flows and 7% of the inbound flows.
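As a quick check, the traffic percentages can be approximately reproduced from the rounded rates in the Spring 2000 row of the table above. This is only a sketch of the arithmetic. The variable and function names are ours, and because the table values are rounded to whole Mb/s the results need not match the quoted figures exactly.

```python
# Average rates in Mb/s, copied from the Spring 2000 row of the sample table.
total = {"inbound": 45, "outbound": 73}
napster = {"inbound": 7, "outbound": 21}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


shares = {direction: share(napster[direction], total[direction])
          for direction in total}

for direction, pct in shares.items():
    print(f"{direction}: {pct:.1f}%")
```

With these inputs the outbound share comes to roughly 29% and the inbound share to between 15% and 16%, in line with the figures quoted for this period.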
The increase in the number of flows between the Fall and Spring is
indicative of the increased internet usage that our campus observed
throughout the 1999/2000 school year. This continuous increase in data
traffic has been recently investigated and reported on by others.
There was no rhyme or reason to our selection of those sample periods. The
first two sample periods were the only ones for which data was available
since detailed logs of campus traffic are not systematically retained over
long periods of time. We retrieved these samples from backup tapes from
infrequent manual backups of our real-time analysis machine.
These backups were performed only for disaster recovery.
As such, it was a stroke of luck that they coincided with interesting points
in time regarding the use of "file sharing" applications.
We use two methods to visualize flow sizes. In the first, we plot average
flow sizes over time.
In the second, we employ a step graph histogram of a distribution
across 32 intervals, incremented by consecutive powers-of-two. That
is, the first column represents flows of sizes 1 or 2 (units of packets
or bytes, depending on line color), the second column represents 2
through 4, then 4 through 8, 8 through 16, and so on, up to the maximum
size representable (2^31 through 2^32): approximately those between 2
billion and 4 billion.
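The binning described above can be sketched as follows. The text does not say which interval a size exactly equal to a power of two belongs to, so this example assumes the upper bound of each interval is inclusive (sizes 1 and 2 in the first column, 3 and 4 in the second, 5 through 8 in the third, and so on). The function names are hypothetical and this is not the code used to produce the graphs.

```python
def bin_index(size):
    """Return the power-of-two interval for a flow size of at least 1.

    Index 0 holds sizes 1 and 2. For k >= 1, index k holds sizes
    greater than 2**k and up to and including 2**(k + 1).
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # (size - 1).bit_length() is the smallest n with size <= 2**n.
    return max((size - 1).bit_length() - 1, 0)


def histogram(sizes, bins=32):
    """Count flow sizes (in packets or in bytes) per power-of-two interval."""
    counts = [0] * bins
    for size in sizes:
        # Anything beyond the last interval is clamped into it.
        counts[min(bin_index(size), bins - 1)] += 1
    return counts
```

Under this assumption a 28-byte flow lands in the interval above 16 and up to 32, and a size of 2^32 lands in the last of the 32 intervals.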
Over the 1999-2000 academic year, the average flow size appears
to have increased, as evidenced by plots of average flow size and
average TCP flow size over time. Furthermore, the average size of a
Napster flow is significantly larger than that of IP flows in general.
(Note that measurement of Napster traffic at the University of
Wisconsin - Madison began in March of 2000, so no previous Napster
flow data is available.)
Although Napster flows are larger than average, there are other popular
well-known applications, such as ftp-data transfers, with average flow
sizes that far exceed those of Napster. This is evidenced by plots of
average ftp-data flow size over time.
Considering the size distribution in general (i.e. amongst IP flows of
all types), the percentages of flows of particular sizes are similar
between Fall 1999 and Spring 2000. For instance, in either sample,
about half of the flows are smaller than 512 bytes.
One curiosity visible when plotting the Cumulative Percentage of Flows
vs. Flow Size is that, in Spring 2000, 5.7% of the flows were less than
32 bytes, in contrast with 0.1% from the previous fall.
Remembering that a full 27% of the traffic during the Spring
sample was Napster traffic, it is likely that these small flows represent
the 28-byte "ping" packets generated by Napster, discussed below.
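A cumulative percentage of flows can be derived from the per-interval flow counts of a power-of-two histogram like the one described earlier. The following is a minimal sketch. The function name is ours, and any counts passed to it in an example are illustrative rather than measured values.

```python
from itertools import accumulate


def cumulative_percent(counts):
    """Turn per-interval flow counts into a running percentage of all flows."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * running / total for running in accumulate(counts)]
```

Reading the result at the interval whose upper bound is 32 bytes gives the share of flows at or below that size, which is the kind of figure compared between the two semesters above.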
Considering the distribution of flow sizes from Fall 1999 and Spring
2000, there was an increase in the percentage of byte content delivered
in flows between 4MB and 16MB in size. Specifically, the Spring 2000
sample shows a marked spike in those two intervals. In both these
earlier samples the median is within the 2-4MB interval.
Considering the Fall 2000 sample, there is not quite as high a
percentage of content being delivered in flows within just those two
intervals (namely 4MB to 8MB and 8MB to 16MB). However, the median has
shifted up to the 4MB to 8MB interval.
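To illustrate how the content percentages and the median interval might be computed, the sketch below works from the total bytes carried by the flows in each power-of-two interval. It assumes that "median" here means the interval in which the running total of bytes first reaches half of all content. That is our reading of the graphs rather than a documented definition, and the function names are hypothetical.

```python
def content_percent(bytes_per_interval):
    """Percentage of all byte content delivered by flows in each interval."""
    total = sum(bytes_per_interval)
    if total == 0:
        return [0.0] * len(bytes_per_interval)
    return [100.0 * b / total for b in bytes_per_interval]


def median_interval(bytes_per_interval):
    """Index of the interval where cumulative bytes first reach half the total."""
    total = sum(bytes_per_interval)
    if total == 0:
        raise ValueError("no bytes recorded")
    running = 0
    for index, b in enumerate(bytes_per_interval):
        running += b
        if 2 * running >= total:
            return index
```

A shift of this index from one interval to the next between samples would correspond to the upward move of the median described above.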
Internet flows produced by the Napster application, and work-alike clones,
are of, at least, these types:
- TCP initial connections from client user to "redirect" server
- TCP responses from "redirect" server to client user (specifying address of an "index" server)
- TCP commands/requests from client user to "index" server
- TCP responses from "index" server to client user
- ICMP ECHO from client user to candidate "server" user (28 byte packets)
- ICMP ECHOREPLY from candidate "server" user to client user (28 byte packets)
- TCP request from client user to "server" user (request and subsequent ACKs)
- TCP responses from "server" user to client user (possibly containing MP3 content)
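The 28-byte size of the ICMP packets listed above is consistent with an echo request or reply that carries no payload: a minimum 20-byte IPv4 header plus the 8-byte ICMP echo header. The sketch below uses that arithmetic to flag flows made up entirely of such packets. It is a hypothetical heuristic for illustration, not the identification method implemented in FlowScan, and it assumes that a flow's byte count includes the IP header.

```python
IPV4_HEADER_BYTES = 20      # minimum IPv4 header, no options
ICMP_ECHO_HEADER_BYTES = 8  # type, code, checksum, identifier, sequence number
PING_PACKET_BYTES = IPV4_HEADER_BYTES + ICMP_ECHO_HEADER_BYTES  # 28 bytes

ICMP_PROTOCOL = 1  # IP protocol number assigned to ICMP


def looks_like_empty_ping(proto, packets, total_bytes):
    """True when every packet in an ICMP flow is a payload-free 28-byte echo."""
    return (proto == ICMP_PROTOCOL
            and packets > 0
            and total_bytes == packets * PING_PACKET_BYTES)
```

A test like this cannot tell Napster "pings" apart from any other payload-free ICMP echo, so by itself it only narrows down the candidates.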
The term "NapUser" is used below to label traffic believed to be generated
by the Napster application, or one of its clones. For this analysis
NapUser traffic was identified by a method implemented in FlowScan.
The following graphs represent the sizes of Napster application flows in terms
of packets and bytes, both for ICMP and TCP combined and for TCP alone.
The "NapUser TCP bytes" and "NapUser TCP packets" plots therein represent
just the Napster TCP flows, and therefore emphasize the content-carrying
flows: those representing the interaction with Napster index servers and
the bidirectional TCP data streams which carry the MP3 data.
Most Napster-related flows are actually
the small ICMP flows from Napster clients to candidate servers.
As such, well over half the flows produced by the application carry a
trivial amount of content as measured in bytes or packets.
However, the average Napster-produced TCP flow is larger than the
average flow amongst all types and therefore Napster does
appear to have increased the size of the average Internet IP flow.
Napster flows are clearly larger than average, even when their numerous
but often overlooked "ping" flows participate in the calculation.
Furthermore, Napster flows appear to follow a characteristic pattern -
namely that most Napster content is delivered in flows of sizes between
4 megabytes and 16 megabytes. This is perhaps not surprising since it
is a reasonable estimate of the range of sizes of the MP3 files typically
exchanged.
Considering our results, it seems likely that use of the popular
file-sharing applications such as Napster will not only continue to
increase bandwidth usage solely by virtue of their popularity, but will
also likely shift the distribution of flow sizes higher. The operational
implications of such a shift in Internet workload and usage
characteristics warrant further study.
Thank you to these folks for their assistance in the form of answers,
and questions:
k claffy
Michael Hare
The following tools were used during this analysis:
- Sample Packet Size Distributions
  http://www.caida.org/outreach/presentations/Soa9911/mgp00025.html
- CAIDA's CoralReef Demos
  https://anala.caida.org/CoralReef/Demos/
- Internet growth: Is there a ``Moore's Law'' for data traffic?
  http://www.research.att.com/~amo/doc/networks.html
- UW-Madison's Napster Traffic Measurement
  http://net.doit.wisc.edu/data/Napster/
- NetStats - UW-Madison's NetStats web page
  http://wwwstats.net.wisc.edu/
- FlowScan - UW-Madison's Network Measurement Tool
  http://net.doit.wisc.edu/~plonka/FlowScan/
- cflowd - CAIDA's Network Measurement Tool used by FlowScan
  http://www.caida.org/tools/measurement/cflowd/
- RRDtool, the Round Robin Database tool used by FlowScan
  http://ee-staff.ethz.ch/~oetiker/webtools/rrdtool/
- Cisco's IOS NetFlow feature
  http://www.cisco.com/warp/public/732/netflow/
$Id: index.wml,v 1.19 2002/10/16 20:36:13 egeiger Exp $