SETI Re: High bandwidth digitizer, FFT's, and analyzer

David Woolley ( david@djwhome.demon.co.uk )
Fri, 2 Oct 1998 08:24:07 +0100 (BST)

> So, let me turn this question around, to answer the question more directly.
> You let me know if I have this right. You are saying that approximately 300
> Gips is necessary to FFT a 100 mhz bandwidth. Let's say a 300 Mhz Pentium II
> processor has only about 10 percent of that capacity. Therefore, the upper

I think 0.15% would be a more realistic estimate, although I don't have
cycle count information for anything more recent than the 486. With sufficient
information about caching logic, pipelining, etc., it would be possible to
get a very accurate cycle count for the specgrm code, though.

> limit of FFT capacity of a 300 Mhz pentium II should be about 10 mhz. This

10 millihertz (mHz) is not a useful bandwidth :-(.

There is a log(N) factor as well, although the drop from 100 MHz to 10 MHz
would only change it by about 10% (that is, reduce it by 10%, not to 10%).
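To see why the log(N) factor moves so little, here is a rough Python check. It assumes, which the post does not state, that the frequency resolution is held fixed (a hypothetical 1 Hz bin width), so the FFT length N grows in proportion to the bandwidth and the per-sample cost grows with log2(N).

```python
import math

def log_factor(bandwidth_hz, bin_width_hz=1.0):
    """log2(N) for an FFT whose N bins of width bin_width_hz span bandwidth_hz.

    Assumption (hypothetical): the bin width is fixed, so N scales with
    the bandwidth being analysed.
    """
    n = bandwidth_hz / bin_width_hz
    return math.log2(n)

wide = log_factor(100e6)    # about 26.6 for 100 MHz at 1 Hz bins
narrow = log_factor(10e6)   # about 23.3 for 10 MHz at 1 Hz bins
print(f"log2 N: {wide:.1f} -> {narrow:.1f}, a drop of {1 - narrow / wide:.1%}")
```

Under that assumption the drop is about 12%, in line with the "about 10%" above; a different bin width shifts the exact figure but not the conclusion that the log term changes slowly.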

> would leave no extra cpu cycles for other things. So, maybe 5 mhz of bandwidth
> would use about half of the CPU cycles of a pentium II. Do I have this
> correct? It seems plausible to me.

A first cut gives about 75 kHz, but by then the log(N) factor would be
about half as large, so I would say that 150 kHz is reasonable. I think that
a 100 MHz machine would just be able to keep up with the maximum sampling
rate of an SB Pro card at a 50% target CPU utilisation.
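The arithmetic behind these figures can be reproduced with a small Python sketch. The inputs are the numbers from this thread: the 100 MHz reference bandwidth, the 0.15% estimate of a Pentium II's share of the required capacity, and the 50% CPU target. The "about half" log ratio is the rough figure above, not a measured value, and the function names are hypothetical.

```python
def first_cut_bandwidth(ref_bandwidth_hz, cpu_fraction, target_utilisation):
    # Linear scaling only: bandwidth is taken as proportional to the
    # available instruction rate, ignoring the log2(N) term.
    return ref_bandwidth_hz * cpu_fraction * target_utilisation

def log_corrected(bandwidth_hz, log_ratio):
    # log_ratio = log2(N_small) / log2(N_ref). A smaller log term means a
    # proportionally lower cost per sample, so more bandwidth fits in the
    # same CPU budget.
    return bandwidth_hz / log_ratio

first = first_cut_bandwidth(100e6, 0.0015, 0.5)   # about 75 kHz
better = log_corrected(first, 0.5)                # about 150 kHz
print(f"first cut {first / 1e3:.0f} kHz, log-corrected {better / 1e3:.0f} kHz")
```

This is only a one-step correction; strictly, raising the bandwidth raises N and hence the log term again, which is the "creeping up" effect mentioned below.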

However, as I warned might be the case earlier, there is actually a
factor-of-two error in my initial figures: the correct formula is
B * 0.5 * N * log2(N), where N is the number of samples and B is the cost
of each elemental step. This is for an elemental step (a radix-2
butterfly) that generates two outputs from two inputs. One can reduce the
multiplies a little, at the cost of extra adds/subtracts, by processing 4
inputs at once (a radix-4 butterfly); this is a constant factor and
doesn't depend on the number of points. So the final estimate would be
about 300 kHz, except that the log term is creeping up again.
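The 0.5 * N * log2(N) count can be checked directly. Below is a minimal textbook iterative radix-2 FFT in Python (not the specgrm code) that counts its two-input, two-output butterflies; each of the log2(N) stages performs N/2 of them. The helper `twiddle_multiplies` is a hypothetical addition that counts complex twiddle multiplies (trivial ones included) for radix-2 versus radix-4, showing that the saving from processing 4 inputs at once is a fixed ratio independent of N.

```python
import cmath
import math

def fft_radix2(x):
    """Iterative in-place radix-2 decimation-in-time FFT.

    Returns (spectrum, butterflies), where butterflies counts the elemental
    two-input, two-output steps. len(x) must be a power of two.
    """
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    butterflies = 0
    size = 2
    while size <= n:                  # log2(n) stages
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):     # n/2 butterflies per stage in total
                w = cmath.exp(-2j * cmath.pi * k / size)
                u = a[start + k]
                t = w * a[start + k + half]   # one complex multiply
                a[start + k] = u + t
                a[start + k + half] = u - t
                butterflies += 1
        size *= 2
    return a, butterflies

def twiddle_multiplies(n):
    """Complex twiddle multiplies, trivial ones included, radix-2 vs radix-4."""
    log2n = math.log2(n)
    radix2 = n / 2 * log2n        # one per radix-2 butterfly
    radix4 = 3 * n / 8 * log2n    # (n/4) * log4(n) butterflies, 3 twiddles each
    return radix2, radix4

spectrum, count = fft_radix2([1, 0, 0, 0, 0, 0, 0, 0])
print(count, 0.5 * 8 * math.log2(8))   # both 12
```

The radix-4 figure comes out at three quarters of the radix-2 one for every power-of-four N, which is the kind of constant-factor saving described above; the extra adds it costs are not counted here.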

One thing to note is that generating the waterfall display consumes a lot
of the processing power. There are also order-N setup and mop-up costs
on each run.
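Part of the waterfall's cost is per-bin work done on every frame before anything can be drawn. The Python sketch below is a hypothetical illustration, not taken from specgrm: it turns one FFT frame into log-magnitude palette indices. The dB range and 16-level palette are assumed display settings. The work is order N per frame, including a logarithm per bin, on top of the FFT itself.

```python
import math

def waterfall_row(spectrum, floor_db=-60.0, ceil_db=0.0, levels=16):
    """Map one FFT frame to palette indices for a single waterfall row.

    floor_db, ceil_db and levels are hypothetical display settings.
    """
    row = []
    for c in spectrum:
        power = c.real * c.real + c.imag * c.imag   # |X|^2: 2 multiplies, 1 add
        db = 10.0 * math.log10(power) if power > 0 else floor_db
        db = min(max(db, floor_db), ceil_db)        # clamp to display range
        # Scale linearly into 0 .. levels-1 for the colour palette.
        row.append(int((db - floor_db) / (ceil_db - floor_db) * (levels - 1)))
    return row

print(waterfall_row([1 + 0j, 0j, 0.5 + 0j]))
```

A real display would also have to scroll and blit the row, which adds further per-frame cost not modelled here.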

Regarding cycle count, I think MMX includes some array processing, and it
*might* be possible to hand-craft MMX assembler to do better. There would
also be some improvement from unrolling the inner loop. Either way, I think
you are still in the mid-100s of kHz. The average cycle count of two is a
finger-in-the-air guess, but the 64-bit multiplies needed for the 100 MHz
system are unlikely to be cheap. The specgrm code has twice as many
multiplies as the textbook says it should have; removing the extra ones
would only slightly decrease the instruction count, but the cycle count
would go down rather more.