DTrace saving the planet again....

...maybe not the planet, but DTrace was again a big help in finding another performance problem.

During peak-hours and when we had a spam-wave coming in, we saw a spike on our T5120 systems.

Where we usually see a very low load of ~ 1-2, the load jumped up to 400 within a few minutes.

Something did'nt seem right.

Like with the previous DTrace Problem, we saw that our userland processes had a high system time, so most of the things it was doing (or maybe not) was in the kernel.

It was time to use one of the secret weapons that Solars Admins have: lockstat (uses DTrace).

Sorry for the lengthy output...

# lockstat -i 133 -I sleep 5

Profiling interrupt: 43328 events in 5.088 seconds (8516 events/sec)

Count indv cuml rcnt     nsec CPU+PIL                Caller                 
-------------------------------------------------------------------------------
  155   0%   0% 0.00     2347 cpu[42]                rdccr_delay+0x8        
  152   0%   1% 0.00     2184 cpu[53]                rdccr_delay+0x4        
  151   0%   1% 0.00     2135 cpu[58]                rdccr_delay+0x8        
  150   0%   1% 0.00     2201 cpu[10]                rdccr_delay+0x4        
  149   0%   2% 0.00     2486 cpu[24]                rdccr_delay+0x8        
  147   0%   2% 0.00     2296 cpu[57]                default_lock_delay+0x74
  147   0%   2% 0.00     2476 cpu[56]                default_lock_delay+0x74
  147   0%   3% 0.00     2138 cpu[46]                default_lock_delay+0x74
  146   0%   3% 0.00     2148 cpu[55]                default_lock_delay+0x74
  145   0%   3% 0.00     2157 cpu[61]                rdccr_delay+0x8        
  143   0%   4% 0.00     2098 cpu[49]                rdccr_delay+0x8        
  143   0%   4% 0.00     2231 cpu[45]                rdccr_delay+0x4        
  143   0%   4% 0.00     2025 cpu[28]                rdccr_delay+0x8        
  142   0%   5% 0.00     2000 cpu[63]                rdccr_delay+0x4        
  141   0%   5% 0.00     2106 cpu[55]                rdccr_delay+0x4        
  141   0%   5% 0.00     2173 cpu[52]                rdccr_delay+0x4        
  141   0%   6% 0.00     2464 cpu[48]                default_lock_delay+0x74
  141   0%   6% 0.00     2097 cpu[4]                 default_lock_delay+0x74
  140   0%   6% 0.00     2195 cpu[54]                rdccr_delay+0x4        
  140   0%   7% 0.00     2084 cpu[14]                rdccr_delay+0x4        
  140   0%   7% 0.00     2475 cpu[8]                 default_lock_delay+0x74
  139   0%   7% 0.00     2196 cpu[53]                rdccr_delay+0x8        
  139   0%   8% 0.00     2123 cpu[44]                rdccr_delay+0x8        
  139   0%   8% 0.00     2167 cpu[41]                default_lock_delay+0x74
  139   0%   8% 0.00     2386 cpu[31]                rdccr_delay+0x8        
  139   0%   9% 0.00     2551 cpu[16]                default_lock_delay+0x74
  139   0%   9% 0.00     2598 cpu[16]                rdccr_delay+0x8        
  139   0%   9% 0.00     2204 cpu[11]                default_lock_delay+0x74
  138   0%  10% 0.00     2238 cpu[62]                rdccr_delay+0x4        
  138   0%  10% 0.00     2163 cpu[51]                rdccr_delay+0x8        
  138   0%  10% 0.00     2164 cpu[47]                rdccr_delay+0x4        
  138   0%  11% 0.00     2221 cpu[22]                default_lock_delay+0x74
  138   0%  11% 0.00     2158 cpu[22]                rdccr_delay+0x4        
  137   0%  11% 0.00     2082 cpu[59]                rdccr_delay+0x4        
  137   0%  11% 0.00     2190 cpu[54]                rdccr_delay+0x8        
  137   0%  12% 0.00     2139 cpu[47]                default_lock_delay+0x74
  137   0%  12% 0.00     2169 cpu[46]                rdccr_delay+0x4        
  137   0%  12% 0.00     2234 cpu[36]                default_lock_delay+0x74
  137   0%  13% 0.00     2491 cpu[32]                rdccr_delay+0x8        
  137   0%  13% 0.00     2074 cpu[23]                default_lock_delay+0x74
  137   0%  13% 0.00     2198 cpu[19]                default_lock_delay+0x74
  137   0%  14% 0.00     2185 cpu[5]                 default_lock_delay+0x74
  136   0%  14% 0.00     2076 cpu[60]                rdccr_delay+0x8        
  136   0%  14% 0.00     2160 cpu[50]                default_lock_delay+0x74
  136   0%  15% 0.00     2225 cpu[43]                rdccr_delay+0x4        
  136   0%  15% 0.00     2250 cpu[39]                rdccr_delay+0x8        
  136   0%  15% 0.00     2088 cpu[20]                rdccr_delay+0x4        
  136   0%  16% 0.00     2209 cpu[7]                 rdccr_delay+0x4        
  135   0%  16% 0.00     2253 cpu[62]                default_lock_delay+0x74
  135   0%  16% 0.00     2191 cpu[61]                rdccr_delay+0x4        
  135   0%  17% 0.00     2539 cpu[40]                rdccr_delay+0x4        
  135   0%  17% 0.00     2233 cpu[36]                rdccr_delay+0x4        
  135   0%  17% 0.00     2047 cpu[28]                rdccr_delay+0x4        
  135   0%  17% 0.00     2192 cpu[19]                rdccr_delay+0x8        
  135   0%  18% 0.00     2203 cpu[1]                 default_lock_delay+0x74
  134   0%  18% 0.00     2354 cpu[37]                default_lock_delay+0x74
  134   0%  18% 0.00     2411 cpu[37]                rdccr_delay+0x4        
  134   0%  19% 0.00     2062 cpu[23]                rdccr_delay+0x4        
  134   0%  19% 0.00     2317 cpu[15]                rdccr_delay+0x8        
  133   0%  19% 0.00     2161 cpu[41]                rdccr_delay+0x4        
  133   0%  20% 0.00     2211 cpu[19]                rdccr_delay+0x4        
  133   0%  20% 0.00     2282 cpu[9]                 rdccr_delay+0x4        
  133   0%  20% 0.00     2093 cpu[4]                 rdccr_delay+0x8        
  133   0%  21% 0.00     2753 cpu[0]                 default_lock_delay+0x74
  132   0%  21% 0.00     2444 cpu[48]                rdccr_delay+0x4        
  132   0%  21% 0.00     2223 cpu[45]                rdccr_delay+0x8        
  132   0%  21% 0.00     2238 cpu[39]                default_lock_delay+0x74
  132   0%  22% 0.00     2164 cpu[22]                rdccr_delay+0x8        
  131   0%  22% 0.00     2572 cpu[56]                rdccr_delay+0x8        
  131   0%  22% 0.00     2121 cpu[46]                rdccr_delay+0x8        
  131   0%  23% 0.00     2539 cpu[40]                rdccr_delay+0x8        
  131   0%  23% 0.00     2246 cpu[36]                rdccr_delay+0x8        
  131   0%  23% 0.00     2388 cpu[31]                rdccr_delay+0x4        
  131   0%  24% 0.00     1957 cpu[26]                default_lock_delay+0x74
  131   0%  24% 0.00     2252 cpu[18]                rdccr_delay+0x4        
  131   0%  24% 0.00     2374 cpu[17]                default_lock_delay+0x74
  131   0%  24% 0.00     2287 cpu[12]                rdccr_delay+0x4        
  131   0%  25% 0.00     2307 cpu[3]                 rdccr_delay+0x4        
  131   0%  25% 0.00     2221 cpu[1]                 rdccr_delay+0x8        
  130   0%  25% 0.00     2088 cpu[60]                rdccr_delay+0x4        
  130   0%  26% 0.00     2198 cpu[51]                default_lock_delay+0x74
  130   0%  26% 0.00     2141 cpu[44]                default_lock_delay+0x74
  130   0%  26% 0.00     2264 cpu[42]                default_lock_delay+0x74
  130   0%  27% 0.00     2203 cpu[33]                default_lock_delay+0x74
  130   0%  27% 0.00     2110 cpu[14]                default_lock_delay+0x74
  130   0%  27% 0.00     2170 cpu[12]                default_lock_delay+0x74
  130   0%  27% 0.00     2251 cpu[7]                 default_lock_delay+0x74
  130   0%  28% 0.00     2065 cpu[2]                 default_lock_delay+0x74
  129   0%  28% 0.00     2028 cpu[63]                rdccr_delay+0x8        
  129   0%  28% 0.00     2090 cpu[59]                default_lock_delay+0x74
  129   0%  29% 0.00     2418 cpu[56]                rdccr_delay+0x4        
  129   0%  29% 0.00     2099 cpu[49]                rdccr_delay+0x4        
  129   0%  29% 0.00     2290 cpu[38]                default_lock_delay+0x74
  129   0%  30% 0.00     2256 cpu[35]                default_lock_delay+0x74
  129   0%  30% 0.00     2324 cpu[17]                rdccr_delay+0x8        
  129   0%  30% 0.00     2688 cpu[0]                 rdccr_delay+0x4        
  128   0%  30% 0.00     2237 cpu[30]                default_lock_delay+0x74
  128   0%  31% 0.00     2330 cpu[30]                rdccr_delay+0x4        
  128   0%  31% 0.00     2291 cpu[29]                rdccr_delay+0x4        
  128   0%  31% 0.00     1969 cpu[26]                rdccr_delay+0x8        
  128   0%  32% 0.00     2224 cpu[18]                default_lock_delay+0x74
  128   0%  32% 0.00     2104 cpu[6]                 rdccr_delay+0x8        
  127   0%  32% 0.00     2148 cpu[60]                default_lock_delay+0x74
  127   0%  33% 0.00     2191 cpu[51]                rdccr_delay+0x4        
  127   0%  33% 0.00     2143 cpu[49]                default_lock_delay+0x74
  127   0%  33% 0.00     2247 cpu[38]                rdccr_delay+0x4        
  127   0%  33% 0.00     2261 cpu[30]                rdccr_delay+0x8        
  127   0%  34% 0.00     2288 cpu[29]                default_lock_delay+0x74
  127   0%  34% 0.00     2027 cpu[28]                default_lock_delay+0x74
  127   0%  34% 0.00     2189 cpu[13]                default_lock_delay+0x74
  127   0%  35% 0.00     2497 cpu[8]                 rdccr_delay+0x4        
  127   0%  35% 0.00     2079 cpu[6]                 rdccr_delay+0x4        
  126   0%  35% 0.00     2174 cpu[50]                rdccr_delay+0x4        
  126   0%  35% 0.00     2449 cpu[48]                rdccr_delay+0x8        
  126   0%  36% 0.00     2229 cpu[34]                rdccr_delay+0x4        
  126   0%  36% 0.00     2267 cpu[21]                rdccr_delay+0x4        
  126   0%  36% 0.00     2569 cpu[16]                rdccr_delay+0x4        
  126   0%  37% 0.00     2294 cpu[15]                rdccr_delay+0x4        
  126   0%  37% 0.00     2213 cpu[5]                 rdccr_delay+0x4        
  126   0%  37% 0.00     2265 cpu[3]                 default_lock_delay+0x74
  125   0%  37% 0.00     2523 cpu[32]                rdccr_delay+0x4        
  125   0%  38% 0.00     2371 cpu[17]                rdccr_delay+0x4        
  125   0%  38% 0.00     2242 cpu[9]                 default_lock_delay+0x74
  124   0%  38% 0.00     2185 cpu[50]                rdccr_delay+0x8        
  124   0%  39% 0.00     2201 cpu[43]                rdccr_delay+0x8        
  124   0%  39% 0.00     2172 cpu[41]                rdccr_delay+0x8        
  124   0%  39% 0.00     2162 cpu[33]                rdccr_delay+0x8        
  124   0%  39% 0.00     2237 cpu[21]                default_lock_delay+0x74
  123   0%  40% 0.00     2153 cpu[58]                default_lock_delay+0x74
  123   0%  40% 0.00     2240 cpu[39]                rdccr_delay+0x4        
  123   0%  40% 0.00     2121 cpu[33]                rdccr_delay+0x4        
  123   0%  41% 0.00     2256 cpu[21]                rdccr_delay+0x8        
  123   0%  41% 0.00     2360 cpu[15]                default_lock_delay+0x74
  123   0%  41% 0.00     2181 cpu[11]                rdccr_delay+0x8        
  122   0%  41% 0.00     2249 cpu[57]                rdccr_delay+0x8        
  122   0%  42% 0.00     2120 cpu[55]                rdccr_delay+0x8        
  122   0%  42% 0.00     2136 cpu[52]                rdccr_delay+0x8        
  122   0%  42% 0.00     2231 cpu[34]                rdccr_delay+0x8        
  122   0%  43% 0.00     2115 cpu[20]                default_lock_delay+0x74
  122   0%  43% 0.00     2211 cpu[5]                 rdccr_delay+0x8        
  121   0%  43% 0.00     2274 cpu[35]                rdccr_delay+0x8        
  121   0%  43% 0.00     2263 cpu[29]                rdccr_delay+0x8        
  121   0%  44% 0.00     2125 cpu[25]                rdccr_delay+0x8        
  121   0%  44% 0.00     2298 cpu[3]                 rdccr_delay+0x8        
  121   0%  44% 0.00     2711 cpu[0]                 rdccr_delay+0x8        
  120   0%  45% 0.00     2002 cpu[63]                default_lock_delay+0x74
  120   0%  45% 0.00     2124 cpu[43]                default_lock_delay+0x74
  120   0%  45% 0.00     2539 cpu[40]                default_lock_delay+0x74
  120   0%  45% 0.00     2018 cpu[27]                default_lock_delay+0x74
  120   0%  46% 0.00     2184 cpu[25]                rdccr_delay+0x4        
  120   0%  46% 0.00     2089 cpu[23]                rdccr_delay+0x8        
  120   0%  46% 0.00     2202 cpu[13]                rdccr_delay+0x8        
  120   0%  47% 0.00     2243 cpu[10]                default_lock_delay+0x74
  119   0%  47% 0.00     2158 cpu[58]                rdccr_delay+0x4        
  119   0%  47% 0.00     2489 cpu[24]                rdccr_delay+0x4        
  119   0%  47% 0.00     2085 cpu[14]                rdccr_delay+0x8        
  118   0%  48% 0.00     2064 cpu[59]                rdccr_delay+0x8        
  118   0%  48% 0.00     2187 cpu[47]                rdccr_delay+0x8        
  118   0%  48% 0.00     2277 cpu[35]                rdccr_delay+0x4        
  118   0%  48% 0.00     2063 cpu[27]                rdccr_delay+0x4        
  118   0%  49% 0.00     2259 cpu[10]                rdccr_delay+0x8        
  117   0%  49% 0.00     1937 cpu[26]                rdccr_delay+0x4        
  117   0%  49% 0.00     2174 cpu[11]                rdccr_delay+0x4        
  117   0%  49% 0.00     2516 cpu[8]                 rdccr_delay+0x8        
  117   0%  50% 0.00     2113 cpu[4]                 rdccr_delay+0x4        
  116   0%  50% 0.00     2228 cpu[61]                default_lock_delay+0x74
  116   0%  50% 0.00     2111 cpu[44]                rdccr_delay+0x4        
  116   0%  51% 0.00     2285 cpu[42]                rdccr_delay+0x4        
  116   0%  51% 0.00     2432 cpu[37]                rdccr_delay+0x8        
  116   0%  51% 0.00     1997 cpu[27]                rdccr_delay+0x8        
  116   0%  51% 0.00     2472 cpu[24]                default_lock_delay+0x74
  116   0%  52% 0.00     2113 cpu[6]                 default_lock_delay+0x74
  116   0%  52% 0.00     2040 cpu[2]                 rdccr_delay+0x4        
  115   0%  52% 0.00     2219 cpu[54]                default_lock_delay+0x74
  114   0%  52% 0.00     2221 cpu[38]                rdccr_delay+0x8        
  114   0%  53% 0.00     2466 cpu[32]                default_lock_delay+0x74
  114   0%  53% 0.00     2148 cpu[25]                default_lock_delay+0x74
  114   0%  53% 0.00     2103 cpu[20]                rdccr_delay+0x8        
  114   0%  53% 0.00     2259 cpu[9]                 rdccr_delay+0x8        
  113   0%  54% 0.00     2175 cpu[13]                rdccr_delay+0x4        
  112   0%  54% 0.00     2367 cpu[31]                default_lock_delay+0x74
  112   0%  54% 0.00     2191 cpu[12]                rdccr_delay+0x8        
  112   0%  55% 0.00     2060 cpu[2]                 rdccr_delay+0x8        
  112   0%  55% 0.00     2174 cpu[1]                 rdccr_delay+0x4        
  111   0%  55% 0.00     2240 cpu[57]                rdccr_delay+0x4        
  111   0%  55% 0.00     2196 cpu[45]                default_lock_delay+0x74
  110   0%  56% 0.00     2189 cpu[52]                default_lock_delay+0x74
  108   0%  56% 0.00     2257 cpu[7]                 rdccr_delay+0x8        
  107   0%  56% 0.00     2224 cpu[53]                default_lock_delay+0x74
  107   0%  56% 0.00     2236 cpu[34]                default_lock_delay+0x74
  106   0%  57% 0.00     2267 cpu[18]                rdccr_delay+0x8        
  105   0%  57% 0.00     2224 cpu[62]                rdccr_delay+0x8        
   55   0%  57% 0.00     2643 cpu[8]                 cpu_halt+0x10c         
   54   0%  57% 0.00     2251 cpu[41]                cpu_halt+0x10c         
   54   0%  57% 0.00     2469 cpu[12]                cpu_halt+0x10c         
   54   0%  57% 0.00     2326 cpu[11]                cpu_halt+0x10c         
   54   0%  57% 0.00     2373 cpu[7]                 cpu_halt+0x10c         
   54   0%  58% 0.00     2380 cpu[5]                 cpu_halt+0x10c         
   54   0%  58% 0.00     2535 cpu[3]                 cpu_halt+0x10c         
   53   0%  58% 0.00     2409 cpu[34]                cpu_halt+0x10c         
   52   0%  58% 0.00     2691 cpu[24]                cpu_halt+0x10c         
   52   0%  58% 0.00     2365 cpu[19]                cpu_halt+0x10c         
   52   0%  58% 0.00     2520 cpu[17]                cpu_halt+0x10c         
   52   0%  58% 0.00     2345 cpu[9]                 cpu_halt+0x10c         
   51   0%  58% 0.00     2315 cpu[53]                cpu_halt+0x10c         
   51   0%  58% 0.00     2311 cpu[52]                cpu_halt+0x10c         
   51   0%  59% 0.00     2471 cpu[42]                cpu_halt+0x10c         
   51   0%  59% 0.00     2358 cpu[38]                cpu_halt+0x10c         
   51   0%  59% 0.00     2367 cpu[21]                cpu_halt+0x10c         

What the h... are these rdccr_delay functions? Don't ask me :-) But looking at the code rcddr_delay appears somewhere in a mutex construct. Mutex? Sounds like we have a bad locking situation here....

Ok, let's check, if we have mutex spins:

# lockstat -C sleep 5
          
Adaptive mutex spin: 289321 events in 5.074 seconds (57021 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller                 
-------------------------------------------------------------------------------
174359  60%  60% 0.00   863315 0x6003fe15a38        nfs4_start_open_seqid_sync+0x68
87472  30%  90% 0.00   424820 0x6003fe15e98         nfs4_start_open_seqid_sync+0x68
2572   1%  91% 0.00    41206 door_knob              door_return+0x1e8      
2499   1%  92% 0.00    25247 door_knob              door_upcall_limited+0x27c
2469   1%  93% 0.00    36722 door_knob              door_return+0x23c      
1664   1%  94% 0.00   260935 0x6003fe15e98          open_owner_rele+0x8    
1506   1%  94% 0.00    53714 door_knob              door_return+0xbc       
1490   1%  95% 0.00   168535 0x6003fe15e98          find_open_owner_nolock+0x2c
1484   1%  95% 0.00    54843 door_knob              door_return+0x44       
1410   0%  96% 0.00    71905 door_knob              door_upcall_limited+0x94
 884   0%  96% 0.00   769573 0x6003fe15a38          open_owner_rele+0x8    
 762   0%  96% 0.00   894325 0x6003fe15a38          find_open_owner_nolock+0x2c
 730   0%  97% 0.00    75239 0x6003fe15e98          nfs4_start_open_seqid_sync+0x48
 386   0%  97% 0.00   766709 0x6003fe15a38          nfs4_start_open_seqid_sync+0x48
 359   0%  97% 0.00   196996 0x6003c393000          wait_for_recovery+0x4  
 357   0%  97% 0.00   558062 0x6003fe15a38          nfs4_end_open_seqid_sync+0xc
 350   0%  97% 0.00    17691 0x6003c393000          nfs4_rfscall+0x48      
 317   0%  97% 0.00   895414 0x6003c2a4000          wait_for_recovery+0x4  
 307   0%  97% 0.00  1561398 0x60033a86e80          sigunintr+0x38         
 275   0%  97% 0.00   123310 0x6003c2a4000          nfs4_rfscall+0x48      
 274   0%  97% 0.00    24396 0x6003c393000          nfs4_rfscall+0x17c     
 271   0%  98% 0.00    92475 0x6003fe15e98          nfs4_get_otw_cred+0x34 
 268   0%  98% 0.00    73946 0x6003fe15e98          nfs4_end_open_seqid_sync+0xc
 260   0%  98% 0.00   236558 0x60033a86e80          clock_tick_process+0xc8
 257   0%  98% 0.00   483864 0x6003c393000          nfs4_start_open_seqid_sync+0x9c
 219   0%  98% 0.00     4798 mod_lock               mod_hold_stub+0x18     
 211   0%  98% 0.00    64277 0x6003c2a4000          nfs4_rfscall+0x17c     
 197   0%  98% 0.00    18179 0x6003c393000          nfs4_start_open_seqid_sync+0x4
 181   0%  98% 0.00   950176 0x6003c2a4000          nfs4_attr_cache+0xc    
 175   0%  98% 0.00   823990 0x60033a86e80          sigintr+0x2c           
 163   0%  98% 0.00   628697 0x6003fe15a38          nfs4_get_otw_cred+0x34 
 161   0%  98% 0.00   685230 0x6003fe15a38          nfs4open_otw+0xfa8     
 160   0%  98% 0.00   547855 0x6003fe15a38          open_owner_hold+0xc    
 159   0%  98% 0.00   643441 0x6003fe15a38          nfs4_get_otw_cred_by_osp+0x178
 132   0%  98% 0.00   524536 0x60033a86e80          post_syscall+0x320     
 128   0%  98% 0.00    57596 0x6003c2a4000          nfs4_start_open_seqid_sync+0x4
 123   0%  99% 0.00     4786 mod_lock               mod_release_stub+0x10  
 120   0%  99% 0.00    15053 0x6003c392000          find_open_owner+0x4    
 115   0%  99% 0.00    13426 0x6003fe15e98          nfs4open_otw+0xfa8     
 115   0%  99% 0.00   222285 0x6003c393000          find_open_owner+0x4    
 113   0%  99% 0.00   173366 0x6003c393000          nfs4_attr_cache+0xc    
 109   0%  99% 0.00     8928 0x6003c392000          nfs4write+0x174        
 108   0%  99% 0.00    14343 0x6003fe15e98          open_owner_hold+0xc    
 105   0%  99% 0.00   924250 0x6003c2a4000          nfs4_start_open_seqid_sync+0x9c
 103   0%  99% 0.00    86073 0x6003fe15e98          nfs4_get_otw_cred_by_osp+0x178
  94   0%  99% 0.00     2909 0x60031e81380          hook_run+0xd8          
  93   0%  99% 0.00     9617 0x6003c392000          nfs4write+0x78         
  89   0%  99% 0.00     9063 0x6003c392000          nfs4_rfscall+0x48      
  89   0%  99% 0.00     5480 0x6003c24ef28          mir_wput+0xc4          
  88   0%  99% 0.00     8461 0x6003c392000          nfs4_rfscall+0x17c     
  87   0%  99% 0.00  1146262 0x60033a86e80          lookuppnat+0x2c        
  84   0%  99% 0.00   295989 0x6003c393000          nfs4_delegation_accept+0x2c8
  83   0%  99% 0.00     4963 0x6003c393ae8          nfs_rw_exit+0xc        
  79   0%  99% 0.00    10948 0x6003c392000          wait_for_recovery+0x4  
  75   0%  99% 0.00    13338 0x6003c393000          nfs4_start_fop+0x378   
  75   0%  99% 0.00   380086 0x6003c2a4000          find_open_owner+0x4    
  61   0%  99% 0.00     2234 zone0+0x18             zone_getspecific+0x8   
  59   0%  99% 0.00     3084 0x60031e81380          hook_run+0x10          
  57   0%  99% 0.00     6601 0x60034fb6a40          e1000g_rxfree_func+0x48
  54   0%  99% 0.00    12081 0x6003c393ae8          nfs_rw_enter_sig+0x8   
  49   0%  99% 0.00     1100 0x6003505a6a0          e1000g_send+0x1534     
  43   0%  99% 0.00     5959 0x6003c392000          nfs4_async_start+0x278 
  43   0%  99% 0.00   641531 0x6003c2a4000          nfs4_delegation_accept+0x2c8
  43   0%  99% 0.00   916554 0x6003c2a4000          nfs4open_otw+0xbc0     
  43   0%  99% 0.00     7897 0x60037b50a38          nfs_idmap_str_uid+0xb0 
  40   0%  99% 0.00    19531 0x6003c392000          nfs4write+0x4dc        
  38   0%  99% 0.00     1135 0x60037554900          tcp_wput+0xa4          
  37   0%  99% 0.00   101535 0x6003c2a4000          nfs4_start_fop+0x378   
  35   0%  99% 0.00     6144 0x60037b50a38          nfs_idmap_str_gid+0xb0 
  34   0%  99% 0.00    23000 0x6003c392000          nfs4write+0x234        
  30   0%  99% 0.00    14483 0x6003c393000          nfs4open_otw+0x5e4     
  28   0%  99% 0.00   452930 0x6003c2a4000          nfs4write+0x174        
  27   0%  99% 0.00  1138896 0x6003c2a4000          nfs4read+0x240
     

This looks like we have a congestion of nfs4 operations!

I was not able to find much more information about this issue, so I contacted my Solaris Guru of Choice. He was able to find a bug:

"NFSv4 can be much slower and more expensive than NFSv3 when creating files at high concurrency"

This sounded exactly what our application does. Putting spam-mails into folders, using parallel threads. :-) 

Now it was time to open a support-case. About 10 days after opening the support case, we received an "Interim Diagnostic Relief (IDR)" -Patch.

And this is how it helped us (Sunday/Monday on the right):










Again, it was relatively easy to find the root-cause for our problem, with the help of a few *stat commands.

The question remains, why did it happen.

First of all NFSv4 is still a relatively new protocol, with new bugs. We have also tried out NFSv3, but what we've seen, for our workload (small files) version 4 is much more efficient.

Second, while our workload is a perfect fit for these CoolThread CPUs (normally ;-), we're still on new grounds.

My guess is, the world will see much more concurrency problems like these in the future.

Comments

RNS said…
There was [is?] an additional problem in nfs4_end_open_seqid_sync, which uses cv_broadcast to wake up all the threads asleep inside nfs4_start_open_seqid_sync on the same cv. This is the classic Thundering Herds problem. It appears to me that it should haved used cv_signal instead. Hopefully the IDR has eliminated that too.

If you're creating files heavily in the same directory you may also run into 6886066 (not nfsv4-specific though).
Unknown said…
Hi RNS

The IDR contains the cv_signal change and there should soon be a patch available:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6996729

The IDR has solved the problem for us...

Popular posts from this blog

SLOG Latency

Heating up the Data Pipeline (Part 1)

Heating up the Data Pipeline (Part 2)