Analysis of Sample Results for Dispatch_Compared

The following results were obtained by running the benchmark on Mac OS X v10.6 Snow Leopard:

$ Dispatch_Compared -t 60 -m 1000000 -f 16
Benchmark averaged over: 60 seconds
CPU speed: 2.66 GHz
Iterate maximum of: 1000000 times
Work function folded: 16 times

Note that actual results may vary greatly depending on the configuration of the machine the benchmark runs on.

There are several salient points to observe about these results:

1. The basic act of queuing is much faster than forking a new thread: over 100 times faster when initiating 8 or more work items at a time.

2. For this kind of looping, dispatch_apply is faster than manually creating blocks and queues at every batch size measured. In addition, if there is only a single iteration, dispatch_apply uses a fast path that runs on the current thread with virtually no overhead.

3. OpenMP has a large initial overhead, probably because it always spins up at least one new thread. GCD avoids that cost by using a system-wide thread pool; otherwise, the two perform very similarly for this type of problem.

4. Unlike explicit threads, which spend most of their time in the kernel (the SYS column), GCD usually spends the bulk of its time in user space, enabling more efficient scheduling.

5. On this machine, using concurrency (via dispatch_apply) becomes faster than a simple "for loop" once the total work takes around 40 microseconds. The crossover point could occur even sooner with appropriate "striding" of the computation, where each dispatch_apply iteration processes a chunk of several consecutive items instead of just one.

6. Creating lots of queues, though bad programming practice, is nonetheless quite cheap, and for small workloads it is actually faster than using a single concurrent queue (presumably because all the queues run on the parent thread). However, it is never the optimal solution: use a single dispatch_async for small workloads, and a concurrent queue for large ones.

7. Explicitly creating threads is quite expensive: around 20 microseconds each on this machine, and much more when you create many of them. In a real application you would need to make sure you create no more threads than absolutely necessary, and the "right" number would vary with the hardware involved and with whatever other applications are running.
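The loop shapes compared in points 2, 3 and 5 can be sketched side by side in C. This is not the sample's own code: `work_item`, `run_loop`, `run_openmp`, `run_strided`, `run_chunk` and `struct stride_ctx` are hypothetical names, and `work_item` is a trivial stand-in for the benchmark's folded work function. On macOS the strided version hands one chunk per iteration to `dispatch_apply_f` on the default-priority global concurrent queue; elsewhere it assumes libdispatch is unavailable and falls back to a serial loop. The OpenMP version compiles with or without `-fopenmp` (without it, the pragma is ignored).

```c
#include <assert.h>
#include <stddef.h>

#ifdef __APPLE__
#include <dispatch/dispatch.h>
#endif

/* Hypothetical stand-in for the sample's work function. */
static double work_item(size_t i)
{
    double x = (double)i;
    return x * x + 1.0;
}

/* Baseline: a plain serial "for loop". */
void run_loop(double *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = work_item(i);
}

/* OpenMP version. Compiled without -fopenmp, the pragma is ignored and
   the loop runs serially, producing the same results. */
void run_openmp(double *out, size_t count)
{
    long n = (long)count;
#pragma omp parallel for
    for (long i = 0; i < n; i++)
        out[i] = work_item((size_t)i);
}

/* Shared state for every chunk; chunks only read it. */
struct stride_ctx {
    double *out;
    size_t count;
    size_t stride;
};

/* Processes one chunk of up to `stride` consecutive items. Chunks write
   disjoint ranges of `out`, so no locking is needed. */
static void run_chunk(void *context, size_t chunk)
{
    const struct stride_ctx *ctx = context;
    size_t start = chunk * ctx->stride;
    size_t end = start + ctx->stride;
    if (end > ctx->count)
        end = ctx->count; /* the last chunk may be short */
    for (size_t i = start; i < end; i++)
        ctx->out[i] = work_item(i);
}

/* Strided version: one dispatch_apply_f iteration per chunk rather than
   per item, so the per-iteration dispatch cost is paid about
   count/stride times instead of count times. */
void run_strided(double *out, size_t count, size_t stride)
{
    assert(stride > 0);
    struct stride_ctx ctx = { out, count, stride };
    size_t chunks = (count + stride - 1) / stride; /* ceiling division */
#ifdef __APPLE__
    /* dispatch_apply_f returns only after every chunk has finished,
       so passing a pointer to the stack-allocated ctx is safe. */
    dispatch_apply_f(chunks,
                     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                     &ctx, run_chunk);
#else
    /* Serial fallback where libdispatch is assumed to be unavailable. */
    for (size_t c = 0; c < chunks; c++)
        run_chunk(&ctx, c);
#endif
}
```

For example, `run_strided(out, 1000, 64)` splits 1,000 items into 16 chunks (15 full chunks of 64 plus one of 40). The stride of 64 is arbitrary; a useful value depends on how much work each item does, and the benchmark tables suggest the goal is to make each chunk comfortably larger than the dispatch overhead.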

Note that if you increase the number of folds (and thus the computation time of the work function), the crossover points from serial to parallel will occur much sooner. Also, the specific crossover points may vary dramatically on different machines. In most cases, however, the time spent on short iterations is inconsequential, so you should focus on optimizing for cases with many iterations or a large amount of calculation per iteration.
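The "around 40 microseconds" crossover in point 5 can be reconstructed from the SYNCHRONOUS tables by comparing total wall time T (the WALL column) of the loop and apply rows at the two batch sizes that bracket it. Because T for apply barely changes between N = 8 and N = 16 while T for the loop doubles, the loop falls behind once its own total reaches roughly the apply figure:

```latex
\begin{aligned}
N = 8:&\quad T_{\text{loop}} = 23.5\ \mu\text{s} < T_{\text{apply}} = 39.5\ \mu\text{s} \\
N = 16:&\quad T_{\text{loop}} = 46\ \mu\text{s} > T_{\text{apply}} = 39.8\ \mu\text{s} \\
\Rightarrow&\quad 23.5\ \mu\text{s} < T^{*} < 46\ \mu\text{s}, \qquad T^{*} \approx 40\ \mu\text{s}
\end{aligned}
```

With about 2.88 µs of loop work per item at 16 folds, T* ≈ 40 µs corresponds to roughly 40 / 2.88 ≈ 14 items. Assuming dispatch_apply's fixed cost stays roughly constant, a heavier work function reaches T* with fewer items, which is consistent with the crossover moving sooner as the fold count increases.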


Sample Results for Dispatch_Compared

$ /Users/Shared/Build/Release/Dispatch_Compared -t 60 -m 1000000 -f 16
Benchmark averaged over: 60 seconds
CPU speed: 2.66 GHz
Iterate maximum of: 1000000 times
Work function folded: 16 times
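The columns of the tables below appear to relate as follows. This reading is inferred from the numbers themselves, not from documentation of the sample, so treat it as an interpretation: N is the batch size in each header row (for example `/16`), t is the per-item time in the first column, "base" means the first row of each table (alloc for ASYNCHRONOUS, loop for SYNCHRONOUS), and the `u` and `s` suffixes mark user and system CPU time in microseconds, not seconds. The worked values come from the array row of the ASYNCHRONOUS `/16` table:

```latex
\begin{aligned}
\text{WALL} &= N\,t && 16 \times 0.17 \approx 2.7\ \mu\text{s} \\
\text{rate} &= \frac{t_{\text{base}}}{t} - 1 && \frac{0.17}{0.80} - 1 \approx -79\% \\
\text{overhead} &= \frac{\text{USER} + \text{SYS}}{\text{USER}_{\text{base}} + \text{SYS}_{\text{base}}} - 1 && \frac{12.03 + 0.7144}{1.768 + 0.8645} - 1 \approx 384\%
\end{aligned}
```

A positive rate therefore means a row completes items faster than the baseline row, and the overhead column compares total CPU time rather than wall time.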

ASYNCHRONOUS: Microseconds to *initiate* execution (avg. over 60 seconds)

  µsecs±error/1        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   1.15± 0.43/alloc    =     1.15±0.43    [   +0%]      0.483u +     0.6628s [      0%]
   2.19± 0.18/array    =     2.19±0.18    [  -48%]      1.465u +     0.6663s [     86%]
   2.12± 0.22/dsptch_f =     2.12±0.22    [  -46%]      1.354u +      4.102s [    376%]
   2.44± 0.57/dispatch =     2.44±0.57    [  -53%]      1.623u +       4.47s [    432%]
  18.71± 4.52/fork     =     18.7±4.5     [  -94%]      3.183u +      18.29s [  1,774%]

  µsecs±error/2        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.62± 0.19/alloc    =     1.24±0.37    [   +0%]     0.5449u +      0.695s [      0%]
   1.57± 0.09/array    =     3.15±0.17    [  -61%]      2.364u +     0.7167s [    148%]
   1.45± 0.18/dsptch_f =     2.91±0.37    [  -57%]      2.089u +      5.282s [    494%]
   2.01± 0.26/dispatch =     4.02±0.53    [  -69%]      3.194u +      5.349s [    589%]
  20.27± 6.00/fork     =     40.5±12      [  -97%]      12.82u +      52.21s [  5,145%]

  µsecs±error/4        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.35± 0.15/alloc    =     1.42±0.59    [   +0%]        0.7u +     0.7147s [      0%]
   1.10± 0.16/array    =     4.41±0.65    [  -68%]      3.615u +     0.7172s [    206%]
   0.86± 0.37/dsptch_f =     3.44±1.5     [  -59%]      2.605u +      5.075s [    443%]
   1.28± 0.18/dispatch =      5.1±0.71    [  -72%]      4.238u +      5.362s [    579%]
  43.56±10.65/fork     =      174±43      [  -99%]      39.68u +      342.8s [ 26,937%]

  µsecs±error/8        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.22± 0.15/alloc    =     1.79±1.2     [   +0%]      1.028u +     0.7636s [      0%]
   0.92± 0.03/array    =     7.35±0.27    [  -76%]       6.53u +     0.7135s [    304%]
   0.58± 0.06/dsptch_f =     4.62±0.5     [  -61%]      3.798u +      5.202s [    402%]
   0.90± 0.18/dispatch =     7.18±1.4     [  -75%]      6.307u +      5.137s [    539%]
  98.19±11.87/fork     =      785±95      [ -100%]        104u +      1,463s [ 87,351%]

  µsecs±error/16       = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.17± 3.94/alloc    =     2.74±63      [   +0%]      1.768u +     0.8645s [      0%]
   0.80± 0.07/array    =     12.9±1.2     [  -79%]      12.03u +     0.7144s [    384%]
   0.42± 0.04/dsptch_f =     6.79±0.64    [  -60%]      5.975u +      5.219s [    325%]
   0.68± 0.12/dispatch =     10.9±2       [  -75%]      10.07u +      5.222s [    481%]
 162.99±12.96/fork     = 2.61e+03±2.1e+02 [ -100%]      234.7u +      4,240s [169,892%]

  µsecs±error/32       = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.16± 4.99/alloc    =     4.96±1.6e+02 [   +0%]      3.353u +      1.068s [      0%]
   0.71± 0.04/array    =     22.7±1.2     [  -78%]      21.84u +     0.7253s [    411%]
   0.33± 0.04/dsptch_f =     10.6±1.3     [  -53%]      13.48u +      5.648s [    333%]
   0.79± 0.23/dispatch =     25.2±7.2     [  -80%]      29.05u +       6.58s [    706%]
 236.52±13.09/fork     = 7.57e+03±4.2e+02 [ -100%]      503.1u +  1.111e+04s [262,671%]

  µsecs±error/64       = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.22±11.59/alloc    =     14.4±7.4e+02 [   +0%]      6.385u +      1.526s [      0%]
   0.66± 0.24/array    =     42.4±15      [  -66%]      41.55u +     0.7312s [    435%]
   0.34± 0.04/dsptch_f =     21.8±2.6     [  -34%]      31.65u +       6.08s [    377%]
   0.63± 0.15/dispatch =     40.1±9.7     [  -64%]       47.4u +      6.196s [    577%]
 295.36±15.24/fork     = 1.89e+04±9.8e+02 [ -100%]      1,039u +  2.622e+04s [344,415%]

  µsecs±error/128      = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.27±11.49/alloc    =     34.2±1.5e+03 [   +0%]      12.19u +      2.231s [      0%]
   0.64± 0.04/array    =     82.1±5.7     [  -58%]      81.22u +     0.7739s [    469%]
   0.45± 0.07/dsptch_f =     57.8±9.4     [  -41%]      80.36u +      6.816s [    504%]
   0.73± 0.28/dispatch =     93.2±35      [  -63%]      114.4u +      6.681s [    739%]
 311.40±14.49/fork     = 3.99e+04±1.9e+03 [ -100%]      2,084u +  5.462e+04s [393,081%]

  µsecs±error/256      = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.12± 1.75/alloc    =     30.9±4.5e+02 [   +0%]      23.92u +        3.6s [      0%]
   0.62± 0.49/array    =      158±1.3e+02 [  -80%]        157u +     0.8497s [    474%]
   0.46± 0.07/dsptch_f =      116±19      [  -73%]      157.5u +      11.05s [    512%]
   0.68± 0.58/dispatch =      173±1.5e+02 [  -82%]      208.5u +      9.335s [    692%]
 312.23±29.57/fork     = 7.99e+04±7.6e+03 [ -100%]      3,948u +  1.085e+05s [408,529%]

  µsecs±error/512      = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.14± 2.36/alloc    =       74±1.2e+03 [   +0%]      47.22u +      6.607s [      0%]
   0.62± 0.04/array    =      317±21      [  -77%]      315.3u +     0.9726s [    488%]
   0.43± 0.05/dsptch_f =      222±26      [  -67%]      306.1u +      14.11s [    495%]
   0.70± 0.86/dispatch =      359±4.4e+02 [  -79%]      434.4u +      12.66s [    731%]
 341.74±15.05/fork     = 1.75e+05±7.7e+03 [ -100%]      8,332u +  2.336e+05s [449,352%]

  µsecs±error/1,024    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.13± 1.37/alloc    =      132±1.4e+03 [   +0%]      94.01u +      11.95s [      0%]
   0.62± 0.30/array    =      630±3.1e+02 [  -79%]      627.5u +      1.068s [    493%]
   0.45± 0.06/dsptch_f =      463±64      [  -72%]      640.3u +      23.73s [    527%]
   0.69± 1.64/dispatch =      709±1.7e+03 [  -81%]      843.7u +      21.28s [    716%]
 360.52±15.05/fork     = 3.69e+05±1.5e+04 [ -100%]  1.658e+04u +  4.848e+05s [473,091%]

  µsecs±error/2,048    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.14± 1.10/alloc    =      284±2.3e+03 [   +0%]        189u +      23.51s [      0%]
   0.61± 0.06/array    = 1.25e+03±1.3e+02 [  -77%]      1,251u +      1.489s [    489%]
   0.40± 0.06/dsptch_f =      813±1.1e+02 [  -65%]      1,142u +       39.8s [    456%]
   0.63± 2.49/dispatch = 1.29e+03±5.1e+03 [  -78%]      1,536u +      34.76s [    639%]
 386.28±15.08/fork     = 7.91e+05±3.1e+04 [ -100%]  3.328e+04u +  1.016e+06s [493,535%]

  µsecs±error/4,096    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.14± 0.89/alloc    =      582±3.7e+03 [   +0%]      403.9u +      45.66s [      0%]
   0.61± 0.13/array    = 2.52e+03±5.2e+02 [  -77%]      2,507u +      3.119s [    458%]
   0.37± 0.06/dsptch_f = 1.51e+03±2.5e+02 [  -61%]      2,129u +      52.21s [    385%]
   0.53± 0.09/dispatch = 2.18e+03±3.6e+02 [  -73%]      2,759u +      52.36s [    526%]
 419.85± 3.02/fork     = 1.72e+06±1.2e+04 [ -100%]  6.283e+04u +  2.145e+06s [491,064%]

  µsecs±error/8,192    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.13± 0.51/alloc    = 1.03e+03±4.1e+03 [   +0%]      829.2u +      90.52s [      0%]
   0.62± 0.13/array    = 5.07e+03±1.1e+03 [  -80%]      5,044u +      6.089s [    449%]
   0.36± 0.06/dsptch_f = 2.92e+03±5.3e+02 [  -65%]      4,128u +      93.91s [    359%]
   0.51± 0.08/dispatch = 4.17e+03±6.5e+02 [  -75%]      5,355u +      92.06s [    492%]
 511.74±52.66/fork     = 4.19e+06±4.3e+05 [ -100%]   2.16e+05u +  4.833e+06s [548,833%]

  µsecs±error/16,384   = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.13± 0.25/alloc    = 2.16e+03±4.1e+03 [   +0%]      1,501u +      450.9s [      0%]
   0.64± 0.04/array    = 1.05e+04±7e+02   [  -79%]  1.028e+04u +      218.8s [    438%]
   0.31± 0.09/dsptch_f = 5.04e+03±1.5e+03 [  -57%]      7,015u +      510.6s [    286%]
   0.43± 0.23/dispatch =  7.1e+03±3.7e+03 [  -70%]      8,921u +        397s [    377%]

  µsecs±error/32,768   = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.14± 0.23/alloc    = 4.52e+03±7.4e+03 [   +0%]      2,944u +      922.4s [      0%]
   0.63± 0.01/array    = 2.05e+04±4.6e+02 [  -78%]  2.028e+04u +      231.7s [    431%]
   0.34± 0.07/dsptch_f = 1.12e+04±2.2e+03 [  -60%]  1.513e+04u +      1,910s [    341%]
   0.47± 0.08/dispatch = 1.55e+04±2.6e+03 [  -71%]  1.918e+04u +      2,033s [    449%]

  µsecs±error/65,536   = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.43± 1.57/alloc    =  2.8e+04±1e+05   [   +0%]      5,719u +      1,744s [      0%]
   0.66± 0.01/array    = 4.33e+04±7.9e+02 [  -35%]  4.251e+04u +      745.4s [    480%]
   0.33± 0.06/dsptch_f = 2.18e+04±3.8e+03 [  +28%]  2.879e+04u +      5,195s [    355%]
   0.46± 0.06/dispatch = 3.01e+04±3.9e+03 [   -7%]  3.691e+04u +      5,419s [    467%]

  µsecs±error/131,072  = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   1.42±31.87/alloc    = 1.86e+05±4.2e+06 [   +0%]  1.148e+04u +      5,315s [      0%]
   0.67± 0.01/array    =  8.8e+04±9.1e+02 [ +112%]  8.706e+04u +      935.2s [    424%]
   0.32± 0.05/dsptch_f = 4.18e+04±6.5e+03 [ +346%]  5.484e+04u +  1.144e+04s [    295%]
   0.45± 0.05/dispatch = 5.86e+04±7.1e+03 [ +218%]  7.077e+04u +  1.204e+04s [    393%]

  µsecs±error/262,144  = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.85±13.37/alloc    = 2.24e+05±3.5e+06 [   +0%]  2.274e+04u +      7,169s [      0%]
   0.72± 0.17/array    = 1.89e+05±4.5e+04 [  +18%]  1.811e+05u +      2,961s [    515%]
   0.31± 0.04/dsptch_f = 8.21e+04±1.1e+04 [ +172%]  1.053e+05u +  2.606e+04s [    339%]
   0.46± 0.05/dispatch = 1.21e+05±1.4e+04 [  +85%]    1.4e+05u +  3.168e+04s [    474%]

  µsecs±error/524,288  = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   0.22± 0.43/alloc    = 1.15e+05±2.3e+05 [   +0%]  4.556e+04u +      8,460s [      0%]
   0.90± 0.29/array    = 4.73e+05±1.5e+05 [  -76%]  4.315e+05u +      4,137s [    706%]
   0.31± 0.04/dsptch_f = 1.64e+05±1.9e+04 [  -30%]  2.068e+05u +   5.53e+04s [    385%]
   0.46± 0.09/dispatch = 2.42e+05±4.8e+04 [  -52%]  2.638e+05u +  7.415e+04s [    526%]
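Point 1's "over 100X" comparison can be read from the per-item column of the ASYNCHRONOUS tables above by dividing the fork time by the dispatch time at each batch size:

```latex
\begin{aligned}
N = 4:&\quad 43.56 / 1.28 \approx 34 \\
N = 8:&\quad 98.19 / 0.90 \approx 109 \\
N = 16:&\quad 162.99 / 0.68 \approx 240
\end{aligned}
```

The ratio first passes 100 at N = 8 and keeps growing with batch size, because the per-item fork cost rises while the per-item dispatch cost falls.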

SYNCHRONOUS: Microseconds to *complete* execution (avg. over 60 seconds)

  µsecs±error/1        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   3.59± 0.28/loop     =     3.59±0.28    [   +0%]      2.873u +     0.7078s [      0%]
   3.64± 0.34/apply    =     3.64±0.34    [   -1%]       2.92u +     0.6984s [      1%]
  26.03± 5.81/serial   =       26±5.8     [  -86%]      13.34u +      19.47s [    816%]
  31.70± 8.08/parallel =     31.7±8.1     [  -89%]       13.6u +       28.5s [  1,076%]
  28.22± 6.09/queues   =     28.2±6.1     [  -87%]      15.53u +      19.61s [    881%]
  58.37± 9.58/openmp   =     58.4±9.6     [  -94%]      31.38u +      73.56s [  2,831%]
 199.05±200.63/thread   =      199±2e+02   [  -98%]      26.08u +      114.1s [  3,814%]

  µsecs±error/2        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   3.30± 0.18/loop     =     6.59±0.37    [   +0%]      5.655u +     0.9082s [      0%]
   6.81± 1.24/apply    =     13.6±2.5     [  -52%]      10.99u +      9.674s [    215%]
  14.98± 3.24/serial   =       30±6.5     [  -78%]      18.54u +      19.26s [    476%]
  18.86± 5.16/parallel =     37.7±10      [  -83%]      21.42u +      38.79s [    817%]
  18.69± 3.98/queues   =     37.4±8       [  -82%]      27.87u +      28.77s [    763%]
  29.27± 4.82/openmp   =     58.5±9.6     [  -89%]      35.22u +      74.46s [  1,571%]
 236.94±222.33/thread   =      474±4.4e+02 [  -99%]      49.51u +      329.9s [  5,681%]

  µsecs±error/4        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   3.06± 0.12/loop     =     12.2±0.47    [   +0%]      11.28u +     0.9113s [      0%]
   6.38± 1.38/apply    =     25.5±5.5     [  -52%]      21.63u +      27.37s [    302%]
   9.60± 2.14/serial   =     38.4±8.6     [  -68%]      27.75u +      20.03s [    292%]
  14.47± 2.62/parallel =     57.9±10      [  -79%]      42.32u +      58.05s [    723%]
  13.34± 2.43/queues   =     53.4±9.7     [  -77%]      50.32u +      54.64s [    761%]
  14.48± 2.44/openmp   =     57.9±9.7     [  -79%]      40.88u +      72.98s [    834%]
 765.02±146.39/thread   = 3.06e+03±5.9e+02 [ -100%]      133.5u +      3,056s [ 26,057%]

  µsecs±error/8        = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.94± 0.16/loop     =     23.5±1.2     [   +0%]      22.54u +     0.9127s [      0%]
   4.93± 0.90/apply    =     39.5±7.2     [  -40%]      38.35u +      39.39s [    232%]
   6.92± 1.23/serial   =     55.4±9.8     [  -58%]      46.93u +      21.54s [    192%]
  10.35± 1.59/parallel =     82.8±13      [  -72%]      75.51u +      80.27s [    564%]
  10.17± 1.63/queues   =     81.3±13      [  -71%]      103.9u +      84.11s [    702%]
   7.60± 1.40/openmp   =     60.8±11      [  -61%]      52.75u +      71.71s [    431%]
 964.98±127.91/thread   = 7.72e+03±1e+03   [ -100%]      244.8u +      7,726s [ 33,892%]

  µsecs±error/16       = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.88± 0.04/loop     =       46±0.67    [   +0%]      45.04u +     0.9285s [      0%]
   2.49± 0.51/apply    =     39.8±8.1     [  +16%]      58.48u +       36.6s [    107%]
   6.14± 1.21/serial   =     98.3±19      [  -53%]      93.05u +       23.1s [    153%]
   6.06± 0.95/parallel =     96.9±15      [  -53%]      131.6u +      86.59s [    375%]
   6.04± 3.83/queues   =     96.6±61      [  -52%]      172.5u +      85.43s [    461%]
   4.12± 0.67/openmp   =       66±11      [  -30%]      75.33u +      71.15s [    219%]
1024.88±131.54/thread   = 1.64e+04±2.1e+03 [ -100%]      432.5u +  1.629e+04s [ 36,286%]

  µsecs±error/32       = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.86± 0.22/loop     =     91.5±6.9     [   +0%]      90.26u +      1.194s [      0%]
   1.80± 0.53/apply    =     57.6±17      [  +59%]      113.2u +      38.64s [     66%]
   4.49± 0.64/serial   =      144±21      [  -36%]      152.9u +      19.62s [     89%]
   3.90± 0.49/parallel =      125±16      [  -27%]      250.7u +      88.44s [    271%]
   3.66± 1.51/queues   =      117±48      [  -22%]      293.5u +      58.75s [    285%]
   2.46± 0.37/openmp   =     78.7±12      [  +16%]        121u +         72s [    111%]
1141.46±106.05/thread   = 3.65e+04±3.4e+03 [ -100%]        860u +  3.564e+04s [ 39,814%]

  µsecs±error/64       = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.83± 0.02/loop     =      181±1.1     [   +0%]      180.1u +      0.977s [      0%]
   1.26± 0.18/apply    =     80.8±12      [ +124%]      208.4u +      37.49s [     36%]
   3.91± 0.28/serial   =      250±18      [  -28%]      286.6u +      22.55s [     71%]
   2.65± 0.32/parallel =      170±20      [   +7%]      465.5u +      84.52s [    204%]
   3.38± 0.61/queues   =      217±39      [  -16%]      619.9u +       81.5s [    287%]
   1.62± 0.25/openmp   =      104±16      [  +75%]      213.2u +      73.02s [     58%]
1169.36±103.51/thread   = 7.48e+04±6.6e+03 [ -100%]      1,584u +  7.271e+04s [ 40,936%]

  µsecs±error/128      = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.82± 0.01/loop     =      361±1.2     [   +0%]        360u +     0.9864s [      0%]
   1.04± 0.12/apply    =      133±16      [ +171%]      410.1u +      38.65s [     24%]
   3.69± 0.21/serial   =      473±26      [  -24%]      551.4u +      24.43s [     60%]
   2.02± 0.22/parallel =      258±28      [  +40%]        822u +      90.94s [    153%]
   3.35± 0.82/queues   =      428±1.1e+02 [  -16%]      1,239u +      146.7s [    284%]
   1.26± 0.22/openmp   =      162±29      [ +123%]      417.3u +       77.4s [     37%]
1180.73±111.26/thread   = 1.51e+05±1.4e+04 [ -100%]      3,014u +  1.471e+05s [ 41,467%]

  µsecs±error/256      = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.82± 0.05/loop     =      723±12      [   +0%]      720.5u +      1.633s [      0%]
   0.92± 0.10/apply    =      235±24      [ +208%]      782.5u +      39.39s [     14%]
   3.58± 0.19/serial   =      916±50      [  -21%]      1,095u +      28.95s [     56%]
   1.78± 0.18/parallel =      455±47      [  +59%]      1,585u +        103s [    134%]
   3.28± 0.21/queues   =      841±55      [  -14%]      2,455u +      270.1s [    277%]
   1.10± 0.20/openmp   =      282±51      [ +156%]      844.7u +      84.23s [     29%]
1183.97±125.56/thread   = 3.03e+05±3.2e+04 [ -100%]      6,007u +  2.973e+05s [ 41,909%]

  µsecs±error/512      = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.82± 0.01/loop     = 1.44e+03±6.2     [   +0%]      1,440u +      1.854s [      0%]
   0.85± 0.08/apply    =      435±42      [ +232%]      1,563u +      40.43s [     11%]
   3.53± 0.16/serial   = 1.81e+03±80      [  -20%]      2,182u +      37.31s [     54%]
   1.65± 0.15/parallel =      845±77      [  +71%]      3,121u +      108.9s [    124%]
   3.23± 0.19/queues   = 1.66e+03±99      [  -13%]      4,773u +      557.7s [    270%]
   0.94± 0.18/openmp   =      479±91      [ +201%]      1,567u +      81.07s [     14%]
1184.66±129.03/thread   = 6.07e+05±6.6e+04 [ -100%]  1.179e+04u +  5.956e+05s [ 42,016%]

  µsecs±error/1,024    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.82± 0.01/loop     = 2.88e+03±6.5     [   +0%]      2,880u +      2.214s [      0%]
   0.82± 0.09/apply    =      837±88      [ +245%]      3,089u +      40.06s [      9%]
   3.51± 0.14/serial   = 3.59e+03±1.4e+02 [  -20%]      4,361u +       59.2s [     53%]
   1.59± 0.13/parallel = 1.63e+03±1.3e+02 [  +77%]      6,165u +      136.8s [    119%]
   3.25± 0.74/queues   = 3.33e+03±7.6e+02 [  -13%]      9,458u +      1,197s [    270%]
   0.89± 0.14/openmp   =      911±1.5e+02 [ +216%]      3,134u +      93.56s [     12%]
1211.52±112.92/thread   = 1.24e+06±1.2e+05 [ -100%]  2.385e+04u +  1.207e+06s [ 42,593%]

  µsecs±error/2,048    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.81± 0.01/loop     = 5.77e+03±11      [   +0%]      5,761u +      3.552s [      0%]
   0.80± 0.07/apply    = 1.63e+03±1.4e+02 [ +254%]      6,050u +      43.06s [      6%]
   3.44± 0.11/serial   = 7.05e+03±2.2e+02 [  -18%]      8,587u +      98.01s [     51%]
   1.58± 0.12/parallel = 3.23e+03±2.5e+02 [  +79%]  1.229e+04u +      201.4s [    117%]
   3.20± 0.23/queues   = 6.56e+03±4.6e+02 [  -12%]  1.869e+04u +      2,287s [    264%]
   0.82± 0.14/openmp   = 1.67e+03±2.9e+02 [ +244%]      6,030u +      91.75s [      6%]
1240.29±49.39/thread   = 2.54e+06±1e+05   [ -100%]  4.873e+04u +  2.489e+06s [ 43,926%]

  µsecs±error/4,096    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.81± 0.00/loop     = 1.15e+04±14      [   +0%]  1.152e+04u +      4.097s [      0%]
   0.78± 0.06/apply    = 3.21e+03±2.6e+02 [ +259%]  1.216e+04u +      45.48s [      6%]
   3.36± 0.10/serial   = 1.38e+04±4.1e+02 [  -16%]  1.653e+04u +      167.1s [     45%]
   1.57± 0.11/parallel = 6.41e+03±4.5e+02 [  +80%]  2.448e+04u +      341.7s [    115%]
   3.28± 0.25/queues   = 1.34e+04±1e+03   [  -14%]  3.785e+04u +      5,071s [    272%]
   0.78± 0.14/openmp   =  3.2e+03±5.7e+02 [ +260%]  1.184e+04u +       96.2s [      4%]
1232.05±107.67/thread   = 5.05e+06±4.4e+05 [ -100%]  9.843e+04u +  4.875e+06s [ 43,054%]

  µsecs±error/8,192    = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.79± 0.00/loop     = 2.29e+04±30      [   +0%]  2.288e+04u +      10.48s [      0%]
   0.77± 0.06/apply    = 6.34e+03±5e+02   [ +261%]  2.444e+04u +       55.8s [      7%]
   3.29± 0.11/serial   = 2.69e+04±9e+02   [  -15%]   3.23e+04u +        296s [     42%]
   1.59± 0.13/parallel =  1.3e+04±1.1e+03 [  +76%]  4.841e+04u +      703.4s [    115%]
   3.09± 0.29/queues   = 2.53e+04±2.4e+03 [   -9%]  7.057e+04u +      6,884s [    238%]
   0.77± 0.15/openmp   = 6.29e+03±1.2e+03 [ +264%]  2.318e+04u +      106.6s [      2%]
1237.49±93.46/thread   = 1.01e+07±7.7e+05 [ -100%]  1.768e+05u +  9.877e+06s [ 43,821%]

  µsecs±error/16,384   = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.50± 0.00/loop     =  4.1e+04±53      [   +0%]  4.097e+04u +      22.67s [      0%]
   0.70± 0.05/apply    = 1.14e+04±8.7e+02 [ +259%]   4.43e+04u +      76.68s [      8%]
   2.95± 0.12/serial   = 4.84e+04±2e+03   [  -15%]  5.843e+04u +      1,495s [     46%]
   1.49± 0.10/parallel = 2.45e+04±1.7e+03 [  +68%]  9.294e+04u +      1,490s [    130%]
   3.20± 0.23/queues   = 5.24e+04±3.7e+03 [  -22%]  1.443e+05u +  2.185e+04s [    305%]
   0.74± 0.10/openmp   = 1.22e+04±1.6e+03 [ +236%]  4.133e+04u +      111.1s [      1%]

  µsecs±error/32,768   = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   2.35± 0.00/loop     =  7.7e+04±79      [   +0%]  7.691e+04u +       42.3s [      0%]
   0.66± 0.05/apply    = 2.15e+04±1.6e+03 [ +258%]  8.354e+04u +      108.5s [      9%]
   2.71± 0.09/serial   = 8.88e+04±2.9e+03 [  -13%]  1.064e+05u +      3,037s [     42%]
   1.47± 0.08/parallel = 4.82e+04±2.8e+03 [  +60%]  1.824e+05u +      4,690s [    143%]
   3.16± 0.22/queues   = 1.03e+05±7.2e+03 [  -26%]  2.824e+05u +  4.546e+04s [    326%]
   0.73± 0.07/openmp   = 2.39e+04±2.2e+03 [ +221%]  7.746e+04u +      137.8s [      1%]

  µsecs±error/65,536   = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   1.75± 0.00/loop     = 1.15e+05±86      [   +0%]  1.145e+05u +      75.38s [      0%]
   0.51± 0.05/apply    = 3.35e+04±3e+03   [ +242%]  1.269e+05u +      205.3s [     11%]
   2.07± 0.07/serial   = 1.35e+05±4.9e+03 [  -15%]  1.691e+05u +      6,997s [     54%]
   1.37± 0.05/parallel = 9.01e+04±3.5e+03 [  +27%]  3.374e+05u +  1.177e+04s [    205%]
   3.19± 0.19/queues   = 2.09e+05±1.2e+04 [  -45%]  5.326e+05u +  1.111e+05s [    462%]
   0.65± 0.02/openmp   = 4.26e+04±1.3e+03 [ +169%]   1.16e+05u +      175.5s [      1%]

  µsecs±error/131,072  = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   1.56± 0.00/loop     = 2.04e+05±3.4e+02 [   +0%]  2.041e+05u +      150.1s [      0%]
   0.45± 0.03/apply    = 5.91e+04±4e+03   [ +246%]  2.291e+05u +      226.8s [     12%]
   1.82± 0.05/serial   = 2.39e+05±6.5e+03 [  -14%]  3.044e+05u +  1.394e+04s [     56%]
   1.35± 0.04/parallel = 1.76e+05±5.5e+03 [  +16%]  6.554e+05u +  2.749e+04s [    234%]
   3.33± 0.17/queues   = 4.36e+05±2.2e+04 [  -53%]  1.064e+06u +  2.546e+05s [    546%]
   0.62± 0.03/openmp   = 8.11e+04±4.2e+03 [ +152%]  2.072e+05u +      242.9s [      2%]

  µsecs±error/262,144  = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   1.44± 0.00/loop     = 3.78e+05±2.6e+02 [   +0%]  3.776e+05u +      182.9s [      0%]
   0.42± 0.03/apply    =  1.1e+05±6.8e+03 [ +243%]  4.292e+05u +      380.2s [     14%]
   1.68± 0.05/serial   = 4.41e+05±1.2e+04 [  -14%]   5.69e+05u +  3.531e+04s [     60%]
   1.32± 0.03/parallel = 3.46e+05±7.6e+03 [   +9%]  1.273e+06u +  6.527e+04s [    254%]
   3.23± 0.05/queues   = 8.47e+05±1.3e+04 [  -55%]  2.071e+06u +  5.011e+05s [    581%]
   0.46± 0.03/openmp   = 1.21e+05±6.9e+03 [ +212%]  3.823e+05u +      344.1s [      1%]

  µsecs±error/524,288  = WALL(µs)±error   [+-rate]   USER (µs) +    SYS (µs) [overhead]
   1.38± 0.00/loop     = 7.23e+05±4.2e+02 [   +0%]  7.221e+05u +      585.3s [      0%]
   0.40± 0.02/apply    = 2.12e+05±1e+04   [ +241%]  8.258e+05u +      687.3s [     14%]
   1.62± 0.03/serial   = 8.47e+05±1.5e+04 [  -15%]  1.107e+06u +  8.113e+04s [     64%]
   1.31± 0.03/parallel = 6.88e+05±1.3e+04 [   +5%]  2.498e+06u +  1.528e+05s [    267%]
   3.26± 0.03/queues   = 1.71e+06±1.6e+04 [  -58%]  4.125e+06u +  1.041e+06s [    615%]
   0.41± 0.03/openmp   = 2.17e+05±1.3e+04 [ +233%]  7.304e+05u +      594.9s [      1%]
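Point 7's figure of roughly 20 microseconds per thread can be checked on another machine with a rough pthread timing loop. The sketch below is not the sample's measurement code: `thread_create_join_usec` and `noop_thread` are hypothetical names, it times sequential create/join pairs rather than whatever the sample's `fork` and `thread` rows do internally, and it assumes a POSIX system with `clock_gettime(CLOCK_MONOTONIC, ...)` (on macOS that call requires 10.12 or later, so it would not build on the 10.6 system used above).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Empty thread body: only creation and teardown are being timed. */
static void *noop_thread(void *arg)
{
    return arg;
}

/* Average wall-clock microseconds to create and join one thread,
   measured over `count` sequential create/join pairs.
   Returns -1.0 if any pthread or clock call fails. */
double thread_create_join_usec(size_t count)
{
    assert(count > 0);
    struct timespec start, end;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (size_t i = 0; i < count; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, noop_thread, NULL) != 0)
            return -1.0;
        if (pthread_join(tid, NULL) != 0)
            return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    double elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6
                      + (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_us / (double)count;
}
```

Calling `thread_create_join_usec(1000)` gives a per-thread figure that can be compared loosely with the per-item `fork` column; expect noisy results, as the large ± errors in the tables suggest, and note that joining each thread before creating the next understates the cost of having many threads alive at once.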

© 2011-2024 All Rights Reserved. Joya Systems. 4425 South Mopac Building II Suite 101 Austin, TX 78735 Tel: 800-DEV-KERNEL
