
Re: "The Kernel is the Problem, Not the Solution"



On 13/05/13 22:36, Dave Scott wrote:

http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html

Some quite interesting stuff in there.


There are some valid points here that we also ran into with the R2D2 work (e.g. raw sockets don't do more than ~1 Mpps), but also a bit of folklore that doesn't seem right to me:

"Don’t scribble data all over memory via pointers. Each time you follow a pointer it will be a cache miss: [hash pointer] -> [Task Control Block] -> [Socket] -> [App]. That’s four cache misses."

... that sounds like someone didn't understand caches, or they're talking about a particularly weird kind of pointer. Dereferencing a normal pointer doesn't require a lookup in the TCB, and it only misses if the target cache line isn't already cached.

"The paging table for 32gigs require 64MB of paging tables which doesn’t fit in cache. So you have two caches misses, one for the paging table and one for what it points to. [...]

Solutions: compress data; use cache efficient structures instead of binary search tree that has a lot of memory accesses"

This seems to ignore the fact that page tables are walked in HW (by the MMU, with the TLB caching translations). If they're talking about the kernel's data structures for maintaining mapped memory regions, this is more accurate. In that case, approaches like the one in RadixVM (EuroSys 2013; http://dl.acm.org/citation.cfm?id=2465373) seem promising.
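For what it's worth, the 64MB figure itself checks out as back-of-the-envelope arithmetic. Here is a minimal sketch. It assumes x86-64-style 8-byte page-table entries and counts only the last level of the page table, since the upper levels are comparatively small:

```python
# Rough size of the last-level page tables needed to map a region.
# Assumption: 8-byte entries as on x86-64; upper-level tables ignored.
PTE_BYTES = 8
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def leaf_page_table_bytes(mapped_bytes: int, page_bytes: int) -> int:
    """Bytes of last-level page-table entries needed to map `mapped_bytes`."""
    pages = -(-mapped_bytes // page_bytes)  # ceiling division
    return pages * PTE_BYTES


if __name__ == "__main__":
    for label, page in (("4 KiB pages", 4 * KiB), ("2 MiB pages", 2 * MiB)):
        size = leaf_page_table_bytes(32 * GiB, page)
        print(f"{label}: {size // KiB} KiB of leaf PTEs")
```

With 2 MiB pages, the same 32 GiB needs 512x fewer leaf entries (128 KiB instead of 64 MiB).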

And there's some misattribution of SDN concepts, I think:

"The control plane is left to Linux, for the data plane, nothing. The data plane runs in application code. It never interacts with the kernel. There’s no thread scheduling, no system calls, no interrupts, nothing."

Sounds nice, but it's largely untrue for Linux. Even user-space networking solutions like netmap still make occasional syscalls, though they amortise many operations over each one. What they are proposing sounds more like the ideas in Arrakis (HotOS 2013) and some supercomputing work (e.g. FusedOS at SC 2012). It also resembles some ideas we're looking at for DIOS, where cores are "unhooked" from Linux once it has booted and use NIC queues directly. They do so without ever telling the Linux kernel, but keep the option to interact with it for resource allocation.
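To illustrate what amortising many operations over one syscall means, here is a toy model. It is not netmap's actual API. Packets accumulate in a ring shared with the kernel, and the application makes one call to `sync()` per batch instead of one call per packet. `sync()` is a hypothetical stand-in for the real system call:

```python
from collections import deque


class ToyRxRing:
    """Toy receive ring shared between a 'driver' and an application.

    sync() is a hypothetical stand-in for the single system call that
    makes newly arrived packets visible; it is NOT the netmap API.
    """

    def __init__(self, arrivals):
        self._pending = deque(arrivals)  # packets the "NIC" has received
        self.slots = deque()             # packets visible to the app
        self.syscalls = 0                # number of sync() calls made

    def sync(self, max_batch):
        # One "syscall" publishes up to max_batch packets at once.
        self.syscalls += 1
        for _ in range(min(max_batch, len(self._pending))):
            self.slots.append(self._pending.popleft())


def drain(num_packets, batch):
    """Consume num_packets packets; return how many sync() calls it took."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    ring = ToyRxRing(range(num_packets))
    processed = 0
    while processed < num_packets:
        ring.sync(batch)
        while ring.slots:  # the whole batch is consumed in user space
            ring.slots.popleft()
            processed += 1
    return ring.syscalls
```

The per-packet work stays the same. Only the number of user/kernel crossings drops, from `num_packets` to roughly `ceil(num_packets / batch)`. Such designs therefore still make *some* syscalls.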

Cheers,
Malte
