I have run benchmarks on the following hardware:
- Xeon 3600, 1024k Cache, no SEP, no model #
- Xeon 2800, 512k Cache, no SEP, no model #
- Xeon 2333, 4095k Cache, no SEP, E5E45
- Opteron 1000, 1024k Cache, no SEP, 270
- Opteron 2000, 1024k Cache, SEP, 270
- Opteron 2600, 1024k Cache, no SEP, 285
- Opteron 2600, 1024k Cache, SEP, 285
These systems were under varying amounts of load, so the figures are averages and cannot be deemed 100% reliable. Each benchmark looped for at least 3 seconds, timed with gettimeofday(). The attached PDFs show the results in terms of "Tics per SemOp" and "SemOps per Second".
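For anyone who wants to reproduce this kind of measurement, here is a minimal sketch in C of a timing loop along the lines described above. It is not the benchmark code that produced the attached results. The names now_seconds and semops_per_second, the batch size of 1000, and the increment/decrement pairing are all assumptions made for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/time.h>

/* Hypothetical helper: wall-clock time in seconds from gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical benchmark: issue semop() calls in batches until at least
 * min_seconds have elapsed, then return the rate in semops per second.
 * Returns -1.0 if the semaphore cannot be created or a semop() fails.
 */
static double semops_per_second(double min_seconds)
{
    /* One private System V semaphore, removed again before returning. */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1.0;

    /* Increment first, then decrement, so neither call ever blocks. */
    struct sembuf up   = { .sem_num = 0, .sem_op = 1,  .sem_flg = 0 };
    struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };

    unsigned long ops = 0;
    int failed = 0;
    double start = now_seconds();
    double elapsed = 0.0;

    while (elapsed < min_seconds && !failed) {
        /* Batch the calls so gettimeofday() is not part of every iteration. */
        for (int i = 0; i < 1000; i++) {
            if (semop(semid, &up, 1) < 0 || semop(semid, &down, 1) < 0) {
                failed = 1;
                break;
            }
            ops += 2;
        }
        elapsed = now_seconds() - start;
    }

    semctl(semid, 0, IPC_RMID);

    if (failed || elapsed <= 0.0)
        return -1.0;
    return (double)ops / elapsed;
}
```

Calling semops_per_second(3.0) corresponds to the 3-second minimum mentioned above. On a loaded machine it is worth running it several times and averaging.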
Observations:
The Opteron 270 running at 1GHz and using int 0x80 for system calls was the fastest per clock tic. The fastest processors (the Xeons) were the slowest. This suggests the problem is memory access.
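The per-tic figure is simply the clock rate divided by the semop rate. Below is a small sketch of that arithmetic in C, assuming the clock speed is given in MHz as in the hardware list. The name tics_per_semop is hypothetical, and the numbers in the note after the code are made up for illustration, not taken from the measurements.

```c
/*
 * Hypothetical helper: clock tics spent per semop, given the CPU clock
 * in MHz and the measured rate in semops per second.
 * Returns -1.0 if the rate is not positive.
 */
static double tics_per_semop(double clock_mhz, double semops_per_sec)
{
    if (semops_per_sec <= 0.0)
        return -1.0;
    return clock_mhz * 1e6 / semops_per_sec;
}
```

For example, a 1000 MHz part doing 500,000 semops per second would come out at 2000 tics per semop. This is why a slow-clocked CPU can look best on this metric even when its absolute rate is lower.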
The raw numbers show that the sysent call on the Opteron makes the semop about 14% faster. Static compilation generally improved speeds, though because of system load I would not put much significance on those numbers. The only dynamic linking involved, really, is the library call that invokes the system call.
Analysis:
I'm going to leave it to others to explain this data.