Our aim is to implement memory deduplication in Xen with minimal overhead.
Our approach to deduplication is as follows.
In most cases, a DomU runs one of a small set of well-known operating systems
such as Linux, FreeBSD, or Microsoft Windows. In such an environment, many
domains share read-only filesystems that contain the operating system and
frequently used program files and libraries. Each domain has its own writable
filesystem for storing data and temporary files. In this configuration, many
pages scattered across different domains hold the contents of the same disk
block. So, in our approach
to perform deduplication, we intend to add a data structure in Dom0 that stores
the disk block number and the machine frame number (MFN) whenever a read
request for read-only code (and data) is served. When another DomU later
requests the same block and Dom0 receives the I/O (DMA) request, it will first
look the block up in the data structure. If it finds an entry, it will return
the MFN of the page that was already read and map it to the requesting
domain's PFN, so blocks that have already been read need no disk I/O at all.
This in turn deduplicates the read-only pages accessed by multiple domains
without the overhead of hashing page contents.
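
To make the lookup concrete, here is a minimal sketch in Python of the
block-number-to-MFN table described above. It is only a toy model, not Xen
code: the class SharedBlockTable, the method handle_read, the dict standing in
for a domain's PFN-to-MFN mapping, the reference count, and the counter used
as a frame allocator are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedBlockTable:
    """Hypothetical Dom0 table: disk block number -> MFN holding that block."""
    block_to_mfn: Dict[int, int] = field(default_factory=dict)
    refcount: Dict[int, int] = field(default_factory=dict)  # MFN -> mappings
    disk_reads: int = 0           # counts reads that really hit the disk
    next_free_mfn: int = 0x1000   # stand-in for a real frame allocator

    def handle_read(self, p2m: Dict[int, int], pfn: int, block: int) -> int:
        """Serve a read of `block` into guest frame `pfn` of one domain."""
        mfn = self.block_to_mfn.get(block)
        if mfn is None:
            # Miss: allocate a frame, do the disk read, remember the block.
            mfn = self.next_free_mfn
            self.next_free_mfn += 1
            self.disk_reads += 1
            self.block_to_mfn[block] = mfn
            self.refcount[mfn] = 0
        # Hit or miss, the requesting domain's PFN now points at the frame.
        p2m[pfn] = mfn
        self.refcount[mfn] += 1
        return mfn


table = SharedBlockTable()
dom1_p2m: Dict[int, int] = {}
dom2_p2m: Dict[int, int] = {}
table.handle_read(dom1_p2m, pfn=10, block=42)  # first reader goes to disk
table.handle_read(dom2_p2m, pfn=77, block=42)  # second reader shares the MFN
```

In this model both domains end up mapping the same MFN after a single disk
read. A real implementation would also have to map the shared page read-only
and drop or split the entry when the block or the page is written.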
Test case scenario:
Consider a Dom0 Linux kernel using a filesystem with deduplication enabled.
We install a DomU kernel with its virtual disk stored as an image file (.img)
on that filesystem. We then make multiple copies of the image to deploy
multiple DomUs running the same kernel. Because deduplication is enabled in
the filesystem, all the blocks of the copies initially point to the same disk
blocks. When the kernels are booted, the programs (code segments) loaded into
memory will consume memory only once across all of them. As these OSes start
to write to their own virtual filesystems, the written blocks of each image
will be COW'ed by the filesystem, resulting in different block numbers.
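
The scenario can be mimicked with a small toy model, sketched here in Python.
DedupFS and its methods are hypothetical and do not correspond to any real
filesystem API; the model only assumes that a deduplicated copy shares every
physical block with its source and that a write moves the written block to a
fresh physical block.

```python
class DedupFS:
    """Toy model of a deduplicating, copy-on-write filesystem (hypothetical)."""

    def __init__(self):
        self.next_block = 0
        self.images = {}  # image name -> list of physical block numbers

    def create_image(self, name, nblocks):
        # A new image gets nblocks fresh physical blocks.
        self.images[name] = list(range(self.next_block, self.next_block + nblocks))
        self.next_block += nblocks

    def copy_image(self, src, dst):
        # A deduplicated copy shares every physical block with its source.
        self.images[dst] = list(self.images[src])

    def write(self, name, vblock):
        # COW: the written virtual block moves to a fresh physical block.
        self.images[name][vblock] = self.next_block
        self.next_block += 1


fs = DedupFS()
fs.create_image("domU1.img", 4)
fs.copy_image("domU1.img", "domU2.img")
fs.write("domU2.img", 3)  # the second DomU writes to its last block

# True where both images still resolve to the same physical block.
shared = [a == b for a, b in zip(fs.images["domU1.img"], fs.images["domU2.img"])]
```

After the write, only the untouched blocks still share a physical block
number, so only those would remain candidates for sharing through a
block-number-to-MFN table in Dom0.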
Has such an approach already been implemented? We intend to implement this as
a project. What challenges should we expect?