Re: Speeding up flowscan


Subject: Re: Speeding up flowscan
From: Dave Plonka (plonka@doit.wisc.edu)
Date: Fri Nov 09 2001 - 09:31:07 CST

On Wed, Nov 07, 2001 at 05:23:22PM -0500, Maarten van Gelder wrote:
> 
> I am running FlowScan 1.006 on a dual pentium III (700MHz) box with FreeBSD 
> 4.2, and am running into a performance issue.  Two routers are sending 
> flows to this box.  A five minute flows file can be over 100 MB in 
> size.

Your figures (from the log message below) come out to over 6000
flows-per-second across the network's border.  (Assuming NetFlow v5
with timeout set to one minute) Why so high?  Is this just during DoS
floods from forged sources with lots of flows of only a single packet?
Do you have a web site where we can see the graphs?
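As a rough check of the 6000 flows-per-second figure, the arithmetic can be sketched in Python. The flow count comes from the quoted log line; the 300-second interval is an assumption that one flow file covers exactly five minutes, and the function name is just for illustration.

```python
# Flow count from the quoted log line ("flow hit ratio: 1793462/1823581").
FLOWS_IN_FILE = 1823581
# Assumption: each flow file covers exactly five minutes of traffic.
FILE_INTERVAL_SECS = 5 * 60

def arrival_rate(flows, interval_secs):
    """Average number of flows arriving per second over one file interval."""
    return flows / interval_secs

print(round(arrival_rate(FLOWS_IN_FILE, FILE_INTERVAL_SECS)))  # 6079
```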

It sounds too high, since it exceeds the average inbound+outbound flows for
large campuses like UW-Madison and UF-Gainesville which each have
40,000-50,000 people, and average hundreds of megabits-per-second of
bandwidth to the outside world.

> These files take a lot more than 5 minutes to process, as shown in 
> this excerpt from the flowscan log file
> 
> 2001/11/07 16:55:21 flowscan-1.020 CampusIO: Cflow::find took 629 wallclock 
> secs (617.81 usr +  2.91 sys = 620.72 CPU)  for 100296955 flow file bytes, 
> flow hit ratio: 1793462/1823581

OK, that shows you processing 2937 flows-per-second.  (1823581 divided
by 620.72)
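The same division as a small Python sketch, using only the numbers in the quoted log line. The wallclock variant is added for comparison and is not a figure from the original message.

```python
# Numbers from the quoted Cflow::find log line.
FLOWS_PROCESSED = 1823581
CPU_SECS = 620.72        # usr + sys
WALLCLOCK_SECS = 629

def processing_rate(flows, seconds):
    """Flows handled per second of the given time base."""
    return flows / seconds

print(int(processing_rate(FLOWS_PROCESSED, CPU_SECS)))        # 2937
print(int(processing_rate(FLOWS_PROCESSED, WALLCLOCK_SECS)))  # 2899
```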

Hmm, your hit ratio is very high, which is good, meaning nearly all the
flows you process are being identified as either inbound or outbound
(not intranet), so you're unlikely to be processing flows unnecessarily.
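To put a number on "very high", here is a short Python sketch that turns the two values from the log's "flow hit ratio" field into a fraction. The variable names are illustrative only.

```python
# "flow hit ratio: 1793462/1823581" from the quoted log line.
HITS = 1793462
TOTAL = 1823581

hit_ratio = HITS / TOTAL
missed = TOTAL - HITS

print(f"hit ratio: {hit_ratio:.1%}")   # hit ratio: 98.3%
print(f"flows not matched: {missed}")  # flows not matched: 30119
```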

> Suggestions on how to speed up flowscan are highly appreciated.  Is it 
> feasible to process this many flows on my hardware?

The performance varies based on the configuration of CampusIO/SubNetIO,
but I've managed to process about 5000 flows-per-second on a 600MHz
dual PIII running Linux 2.2 with the "pset" kernel patch to reserve one
processor just to FlowScan.  (The patch just assures that FlowScan can
always get the CPU when it wants - it looks like you're fine in that
respect since you reported getting over 620 seconds of CPU in 629
seconds of real time.)
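The CPU-availability observation can be checked with a one-line ratio. This Python sketch uses the CPU and wallclock seconds from the quoted log line; a value near 1.0 suggests the process was rarely waiting for a processor.

```python
# From the quoted log line: 620.72 CPU seconds in 629 wallclock seconds.
CPU_SECS = 620.72
WALLCLOCK_SECS = 629

cpu_share = CPU_SECS / WALLCLOCK_SECS
print(f"CPU share of wallclock time: {cpu_share:.1%}")  # 98.7%
```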

More recently when experimenting on a newer machine, I've managed more
than 9000 flows-per-second on a 1.4GHz Athlon running Linux 2.4, and
have heard similar results from other FlowScan users.

With the load you're observing, you really need to get in the ballpark
of 9000-10000 flows-per-second (which will take about 200 seconds to
process one five minute file).
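A small Python sketch of that estimate: how long one file of this size takes at several processing rates, and whether that fits inside the interval between files. The flow count is from the quoted log line; the 300-second interval assumes one file every five minutes, and the function names are illustrative.

```python
FLOWS_IN_FILE = 1823581      # from the quoted log line
FILE_INTERVAL_SECS = 5 * 60  # assumption: one flow file every five minutes

def seconds_to_process(flows, flows_per_sec):
    """Time needed to work through one file at a given processing rate."""
    return flows / flows_per_sec

def keeps_up(flows, flows_per_sec, interval_secs=FILE_INTERVAL_SECS):
    """True if a file is finished before the next one is due."""
    return seconds_to_process(flows, flows_per_sec) <= interval_secs

for rate in (2937, 5000, 9000, 10000):
    secs = seconds_to_process(FLOWS_IN_FILE, rate)
    print(f"{rate:>6} flows/sec: {secs:6.1f} s, keeps up: {keeps_up(FLOWS_IN_FILE, rate)}")
```

With these inputs, only the 9000 and 10000 flows-per-second rates finish inside the five-minute window.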

So, the gist is that you're dabbling in the same territory as those who
have seen the highest performance on current Intel hardware.

Are you using cflowd or flow-tools?  (If flow-tools - be sure to
use "flow-capture -z0" to disable compression.)

What kind of routers are you using?

FlowScan sites such as UW-Madison and WiscNet (which currently exports
flows from 3 border routers, 2 Junipers and a Cisco GSR, to one
instance of FlowScan) have managed to reduce the load by doing
flow-export based on packet-sampling (on the order of 1 in 100 or 1 in
200).  However, if you enable packet sampling with the vanilla
FlowScan-1.006 release, all your RRD and graph values will be
scaled down, roughly by the sampling rate, which is probably
unacceptable unless you write a lot of customizations to the code and
graphs yourself.

I'm working on configuration options for the next FlowScan release to
scale the values appropriately to make things more compatible with
flows from packet sampling.
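The idea behind such scaling can be sketched in Python as below. This is only an illustration of multiplying a sampled counter by the sampling interval; it is not FlowScan code and not a description of how the next release does it. The function name and the example counter value are made up.

```python
def estimate_from_sample(sampled_value, sampling_interval):
    """Estimate the unsampled total from a 1-in-N packet-sampled counter.

    sampling_interval is N in "1 in N" (e.g. 100 or 200).  The result is
    an estimate, not an exact count.
    """
    if sampling_interval < 1:
        raise ValueError("sampling_interval must be at least 1")
    return sampled_value * sampling_interval

sampled_bytes = 1_500_000  # hypothetical counter from 1-in-100 sampled flows
print(estimate_from_sample(sampled_bytes, 100))  # 150000000
```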

... Sorry there's no silver bullet.  Rewriting more of FlowScan in C(++)
would help (currently the Cflow and Net::Patricia perl modules are in
C, which helped speed things up), but I'm not planning to do that since
router vendors such as Cisco and Juniper are requiring us to use packet
sampling on high-capacity links.  I may change my mind if I find myself
having to process lots more flows from Catalyst 6000 gear or from other
vendors, but not today.  My current stance is that it makes more sense
for users to get faster, cheap hardware than for me to write the code
to solve problems that I don't experience.

Dave

-- 
plonka@doit.wisc.edu  http://net.doit.wisc.edu/~plonka  ARS:N9HZF  Madison, WI

--
Help        mailto:majordomo@net.doit.wisc.edu and say "help" in message body
Unsubscribe mailto:majordomo@net.doit.wisc.edu and say
"unsubscribe flowscan" in message body
Archive     http://net.doit.wisc.edu/~plonka/list/flowscan/archive/


This archive was generated by hypermail 2b25 : Fri Nov 09 2001 - 09:33:50 CST