Luke, All,
On Jun 1, 2009, at 8:43 PM, Luke S Crawford wrote:

> Peter Booth <peter_booth@xxxxxxx> writes:
>
>> Here's more context. The VMs weren't page scanning. They did show
>> non-trivial %steal (where non-trivial is > 1%).
>>
>> These VMs are commercially hosted on five quad-core hosts, with
>> approximately 14 VMs per host and just under 1GB RAM per VM. That's
>> not a lot of memory, but then the workload of one nginx and three
>> mongrels per VM is comfortably under 512MB of RSS.
> I guess I don't know much about mongrel, but if someone was complaining
> to me about performance of a modern web application in an image with
> only 1GB RAM, CPU would not be the first thing I'd look at.
 
I look at everything. Yes, 1GB is a limitation; the mongrels were configured with that in mind.
> so steal was >1%?  what was idle?  what was iowait?  if steal was only
> 10% and iowait was 50%, I'd still add more RAM before I added more CPU.
There's no need to discuss hypotheticals. Let's look at real numbers at a busy time:
sar -W -f
00:00:01     pswpin/s pswpout/s
00:00:06         0.00      0.00
00:00:11         0.00      0.00
00:00:16         0.00      0.00
00:00:21         0.00      0.00
pswpin/s and pswpout/s are zero at every sample; in other words, no swapping is occurring, so disk isn't a factor here.
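(As a sanity check independent of sar, here's a minimal sketch that reads the kernel's cumulative pswpin/pswpout counters straight out of /proc/vmstat, which any 2.6 kernel exposes; the 5-second interval just matches the sar sampling above.)

    #!/usr/bin/env python3
    # Minimal cross-check on sar -W: sample the cumulative swap-in/swap-out
    # page counters from /proc/vmstat and report the per-second rates.
    import time

    def swap_counters():
        counters = {}
        with open("/proc/vmstat") as f:
            for line in f:
                name, value = line.split()
                if name in ("pswpin", "pswpout"):
                    counters[name] = int(value)
        return counters

    INTERVAL = 5  # seconds, matching the sar sampling interval above
    before = swap_counters()
    time.sleep(INTERVAL)
    after = swap_counters()
    for name in ("pswpin", "pswpout"):
        print(f"{name}/s: {(after[name] - before[name]) / INTERVAL:.2f}")

The CPU picture over the same window: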
00:00:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
00:00:06        all     84.42      0.00      6.92      3.08      0.96      4.62
00:00:11        all     92.46      0.00      6.15      0.00      1.19      0.20
00:00:16        all     90.24      0.00      6.37      0.40      2.00      1.00
00:00:21        all     88.42      0.00      8.98      0.00      1.80      0.80
We are clearly CPU-starved: %user sits near 90% and %idle is close to zero.
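(And if anyone wants to reproduce those percentages without sar, a minimal sketch that derives them from /proc/stat deltas; it assumes a 2.6.11 or later kernel, which is when the steal column was added.)

    #!/usr/bin/env python3
    # Derive %user/%nice/%system/%idle/%iowait/%irq/%softirq/%steal from
    # two samples of the aggregate "cpu" line in /proc/stat.
    import time

    FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

    def cpu_times():
        with open("/proc/stat") as f:
            parts = f.readline().split()  # first line is the aggregate "cpu" row
        return [int(v) for v in parts[1:len(FIELDS) + 1]]

    before = cpu_times()
    time.sleep(5)
    after = cpu_times()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas) or 1
    for name, delta in zip(FIELDS, deltas):
        print(f"%{name:<8}{100.0 * delta / total:7.2f}")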
> and his performance improved.  Disk is orders of magnitude slower than
> just about anything else (besides maybe network), so whenever you can
> exchange disk access for RAM access, you see dramatic performance
> improvements.
That is not the case. You will only see an improvement if disk access was the bottleneck in the first place, and the sar output above shows that here it is not.
 My point, however, is that Xen performance is not well understood in
 general, and there are situations where virtualization doesn't perform
 well.
These sar readings on the domU do not tell the whole picture, nor do the studies showing that Xen throughput is at worst only 8% lower than native Linux.
There are scenarios where the impact of virtualization on user response time is a factor of 3 or 4.
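(That inflation only shows up if you measure from the client side, which is why domU-side sar readings and throughput benchmarks miss it. A throwaway sketch of that kind of measurement; the URL and sample count are placeholders.)

    #!/usr/bin/env python3
    # Throwaway client-side latency probe: time a representative request
    # repeatedly and report median/p95/max, which is where virtualization
    # overhead on response time shows up.
    import time
    import urllib.request

    URL = "http://example.com/"  # placeholder: point at a representative request
    SAMPLES = 100                # placeholder sample count

    latencies = []
    for _ in range(SAMPLES):
        start = time.time()
        urllib.request.urlopen(URL).read()
        latencies.append(time.time() - start)

    latencies.sort()
    print(f"median: {latencies[len(latencies) // 2] * 1000:.1f} ms")
    print(f"p95:    {latencies[int(SAMPLES * 0.95)] * 1000:.1f} ms")
    print(f"max:    {latencies[-1] * 1000:.1f} ms")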
This issue is poorly understood; it has been observed and described in the research literature, and until we get a handle on it, it will continue to cause substantial problems.
With the increasing popularity of the cloud and of virtualized environments, where there is less transparency than in a physical environment, we should expect performance problems to become more common.