Not really fair, as I should really have used SIMD on the CPU, or some such, to exploit the CPU’s (limited) parallel processing capability.
Let’s be benevolent and allow the CPU to use all 8 of its threads in its imaginary parallel processing (at about 4x the clock speed of the GPU). That still gives the GPU a 2.5x throughput advantage (and the real figure would be nowhere near that low).
so interesting, thanks a lot
GTX 780 and 23 GFLOPS? Did you compile for debug or release?
I get 202 GFLOPS from a K420, which has just 1 SMX, or 192 CUDA cores.
Compiling for release would also let the compiler use SIMD on the CPU for the applicable parts of the code.