[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Draft NVDIMM proposal


  • To: Dan Williams <dan.j.williams@xxxxxxxxx>
  • From: George Dunlap <george.dunlap@xxxxxxxxxx>
  • Date: Tue, 15 May 2018 15:19:08 +0100
  • Autocrypt: addr=george.dunlap@xxxxxxxxxx; prefer-encrypt=mutual; keydata= xsFNBFPqG+MBEACwPYTQpHepyshcufo0dVmqxDo917iWPslB8lauFxVf4WZtGvQSsKStHJSj 92Qkxp4CH2DwudI8qpVbnWCXsZxodDWac9c3PordLwz5/XL41LevEoM3NWRm5TNgJ3ckPA+J K5OfSK04QtmwSHFP3G/SXDJpGs+oDJgASta2AOl9vPV+t3xG6xyfa2NMGn9wmEvvVMD44Z7R W3RhZPn/NEZ5gaJhIUMgTChGwwWDOX0YPY19vcy5fT4bTIxvoZsLOkLSGoZb/jHIzkAAznug Q7PPeZJ1kXpbW9EHHaUHiCD9C87dMyty0N3TmWfp0VvBCaw32yFtM9jUgB7UVneoZUMUKeHA fgIXhJ7I7JFmw3J0PjGLxCLHf2Q5JOD8jeEXpdxugqF7B/fWYYmyIgwKutiGZeoPhl9c/7RE Bf6f9Qv4AtQoJwtLw6+5pDXsTD5q/GwhPjt7ohF7aQZTMMHhZuS52/izKhDzIufl6uiqUBge 0lqG+/ViLKwCkxHDREuSUTtfjRc9/AoAt2V2HOfgKORSCjFC1eI0+8UMxlfdq2z1AAchinU0 eSkRpX2An3CPEjgGFmu2Je4a/R/Kd6nGU8AFaE8ta0oq5BSFDRYdcKchw4TSxetkG6iUtqOO ZFS7VAdF00eqFJNQpi6IUQryhnrOByw+zSobqlOPUO7XC5fjnwARAQABzSRHZW9yZ2UgVy4g RHVubGFwIDxkdW5sYXBnQHVtaWNoLmVkdT7CwYAEEwEKACoCGwMFCwkIBwMFFQoJCAsFFgID AQACHgECF4ACGQEFAlpk2IEFCQo9I54ACgkQpjY8MQWQtG1A1BAAnc0oX3+M/jyv4j/ESJTO U2JhuWUWV6NFuzU10pUmMqpgQtiVEVU2QbCvTcZS1U/S6bqAUoiWQreDMSSgGH3a3BmRNi8n HKtarJqyK81aERM2HrjYkC1ZlRYG+jS8oWzzQrCQiTwn3eFLJrHjqowTbwahoiMw/nJ+OrZO /VXLfNeaxA5GF6emwgbpshwaUtESQ/MC5hFAFmUBZKAxp9CXG2ZhTP6ROV4fwhpnHaz8z+BT NQz8YwA4gkmFJbDUA9I0Cm9D/EZscrCGMeaVvcyldbMhWS+aH8nbqv6brhgbJEQS22eKCZDD J/ng5ea25QnS0fqu3bMrH39tDqeh7rVnt8Yu/YgOwc3XmgzmAhIDyzSinYEWJ1FkOVpIbGl9 uR6seRsfJmUK84KCScjkBhMKTOixWgNEQ/zTcLUsfTh6KQdLTn083Q5aFxWOIal2hiy9UyqR VQydowXy4Xx58rqvZjuYzdGDdAUlZ+D2O3Jp28ez5SikA/ZaaoGI9S1VWvQsQdzNfD2D+xfL qfd9yv7gko9eTJzv5zFr2MedtRb/nCrMTnvLkwNX4abB5+19JGneeRU4jy7yDYAhUXcI/waS /hHioT9MOjMh+DoLCgeZJYaOcgQdORY/IclLiLq4yFnG+4Ocft8igp79dbYYHkAkmC9te/2x Kq9nEd0Hg288EO/OwE0EVFq6vQEIAO2idItaUEplEemV2Q9mBA8YmtgckdLmaE0uzdDWL9To 1PL+qdNe7tBXKOfkKI7v32fe0nB4aecRlQJOZMWQRQ0+KLyXdJyHkq9221sHzcxsdcGs7X3c 17ep9zASq+wIYqAdZvr7pN9a3nVHZ4W7bzezuNDAvn4EpOf/o0RsWNyDlT6KECs1DuzOdRqD oOMJfYmtx9hMzqBoTdr6U20/KgnC/dmWWcJAUZXaAFp+3NYRCkk7k939VaUpoY519CeLrymd Vdke66KCiWBQXMkgtMGvGk5gLQLy4H3KXvpXoDrYKgysy7jeOccxI8owoiOdtbfM8TTDyWPR Ygjzb9LApA8AEQEAAcLBZQQYAQoADwUCVFq6vQIbDAUJAeEzgAAKCRCmNjwxBZC0bWknD/97 Tkh3PMAcvMZINmJefBdYYspmwTWZSR9USsy68oWzDsXKNDNTqBC781lR/7PSqhqaSOmSnty3 FNblaBYKfMV3OOWgrP0H8Voqp4IgH3yOOkQLVITIwulqbbxQtmCsJ3xkhZm6CA0EKbc9VM/j FX3aCAfOJf52vlY1gXjYOvVjrdrRrBXEjs8E5f6EsrQKDrWCKNx/9qRfmtsQeKHTsgpINkpZ s11ClX/sM/RCR9/BgB/K08QQZYsWD6lgZh1KxLXRzKRunba0L+jpcRsoQFUMj/ofrfnHAdl0 q2upzISM/wR8aer+kekMo+y00schmYJYu5JAAzbjQQuhCAg0UTBGPaNwteL2l3c9Ps8on1nl mq9TnbYwGLAxJzXSb3BATgz7dygpsBBNS5WhUNQgIJvcZJbLggEIqjZGs8o7/+dt4klwxCYL FVlsWYSwEjX0UYHVLMS/F7FcXbCMUeoN/4krmRyv7YICE/VDQSDPcSKedzWvQM8T+5uY5pFJ NiIaa6asFndP50GiKbFtD6xAM+rbnwT7Io+iPtvD/3ddMXQs58IVMzgNA/hcdOX/qlx6Jqk/ hYQQsl4HoQsx/GyrNiwiPErTx32QNeXxoGYm6kwxt7F5qK7AN5tyYNkEyoxYrv8bl9VjAve8 hpECyf4O1mOGC/dIuBCDk8gxL5Pbo3jl98LBZQQYAQoADwIbDAUCVlNqsQUJA9njdAAKCRCm NjwxBZC0bbJMEACigmtpL2lzS47DXydApr1X8SYCHIPc39OjvmErjP05lKUZjmesmhlM5eKO gPb/fzeJ0wXB4J8OyseIJ0D/XwyLLQeM8d/HUFFMBWr+HE7jIukAUXeQ6GRwR+MBYGK/KmR9 JHbMAUz8f3G087Ma12BfpNWayndlFwR3rvdV4lvlyx6cl0EaFhbzPu/N07HG5MTk0evtphgZ 7wuG1oAtO+DGA6orHEicor6nBAQNZzPyjqo40dBxTs+amx7UndMRPSL1dD57eJwbbvBeNa8I w8wT7oNy2/C21VWmSy5XzMzcUTgmjmQz6DSNJPz2dMK4Y/LtcVFTfSZTmlBIkfoc9Vay2EB9 3z2EmjZwGT7n/DRu9QDtLbXyeVTBuLTaP3D+q5AyR1/5Z4T0LhwNvxeND5yO+YNAwqocZwL+ OcctpSZUBpAuU4Ju/9JKMX57GlnbjB8YGahoBJsQZx4CZyw0MXlkCk5cR0EPjY9iI2CEA5lO QueOSbo0hf1ZJwCx724lx0WSwL8ngd8wZTYMNc8GngaU61kmzfcuCklhokTxQdK7Efme5ccv A1txzgGewx9mDhPgNcJweasBnyL0N3wya2RMAzm04gCio8y4FKQepwQpKCNKAYZIU4juAPxn nb6cbBGiMGO1NDuxG+qvl1cMElnq+cuhSUlZdr2sE9JRfa0gucLBZQQYAQoADwIbDAUCWHQN VAUJBfqGFwAKCRCmNjwxBZC0bbgCD/oC6mWUrxQKWPDvFE9+fzm8UKqKP7aciz+gvWUN3o4i 4sRFNyvAEOW/QY2zwM1pN07BFZ3Z+8AVxpgR6h7RQzDJYSPZ5k5WWCJzJEQs2sPI5rfYJGK8 um7mlsSvf2xcLK/1Aj07BmWDjR6glDDRY+iMmSSdHe6Te6tiQPPS6Woj8AE3qf5lBsdvcEln nrkSwzNeVKRQQROUOskVw4WmCsNJjZtKmrVpgId3df/5HWG7Bi4nPwA8IFOt6O72lJlkORFy DF5P7ML7Pc5LbEFimzETPBxTJzVu1UoOQb/THB+qxhKMXXudSf/5sdMhwvOwItIcc5pib/v6 7gWK48bAzoOTgNYzmDCVC/roeLLU2SpEQIlIR0eAaWImgt8VEtre3Gch33e41DtbUli54DX0 dRdhqQaDM1T1q77VyDoZcs+SpGX9Ic9mxl+BN+6vtGIUVgaOG5pF85aQlRfCD6IlFQgiZtiR XeRpeIYG27RUw5kIljW+VxPMdBUvZpUXEazqjoPvBKybg0oKFfMXrMj4vHo6J0FD3ZEToGnP dANspUCZRewRozjp7ZWIu7QfGasfJNQ8c1IDiAFl3rV+dAGXXdmrDcX6w2q5lqoFz+8npK2I ehKCA94U+J/RLywUiaLuHnXt40WvQ98kHm7uTsy36iWqqawPqzmn8m5ruynVHmmcXsLBZQQY AQoADwIbDAUCWmTXMwUJB+tP9gAKCRCmNjwxBZC0bb+2D/9hjn1k5WcRHlu19WGuH6q0Kgm1 LRT7PnnSz904igHNElMB5a7wRjw5kdNwU3sRm2nnmHeOJH8kYj2Hn1QgX5SqQsysWTHWOEse GeoXydx9zZZkt3oQJM+9NV1VjK0bOXwqhiQyEUWz5/9l467FS/k4FJ5CHNRumvhLa0l2HEEu 5pxq463HQZHDt4YE/9Y74eXOnYCB4nrYxQD/GSXEZvWryEWreDoaFqzq1TKtzHhFgQG7yFUE epxLRUUtYsEpT6Rks2l4LCqG3hVD0URFIiTyuxJx3VC2Ta4LH3hxQtiaIpuXqq2D4z63h6vC x2wxfZc/WRHGbr4NAlB81l35Q/UHyMocVuYLj0llF0rwU4AjiKZ5qWNSEdvEpL43fTvZYxQh DCjQTKbb38omu5P4kOf1HT7s+kmQKRtiLBlqHzK17D4K/180ADw7a3gnmr5RumcZP3NGSSZA 6jP5vNqQpNu4gqrPFWNQKQcW8HBiYFgq6SoLQQWbRxJDHvTRYJ2ms7oCe870gh4D1wFFqTLe yXiVqjddENGNaP8ZlCDw6EU82N8Bn5LXKjR1GWo2UK3CjrkHpTt3YYZvrhS2MO2EYEcWjyu6 LALF/lS6z6LKeQZ+t9AdQUcILlrx9IxqXv6GvAoBLJY1jjGBq+/kRPrWXpoaQn7FXWGfMqU+ NkY9enyrlw==
  • Cc: linux-nvdimm <linux-nvdimm@xxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Yi Zhang <yi.z.zhang@xxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Tue, 15 May 2018 14:19:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 05/11/2018 05:33 PM, Dan Williams wrote:
> [ adding linux-nvdimm ]
> 
> Great write up! Some comments below...

Thanks for the quick response!

It seems I still have some fundamental misconceptions about what's going
on, so I'd better start with that. :-)

Here's the part that I'm having a hard time getting.

If actual data on the NVDIMMs is a noun, and the act of writing is a
verb, then the SPA and interleave sets are adverbs: they define *how*
the write happens.  When the processor says, "Write to address X", the
memory controller converts address X into a <dimm number, dimm-physical
address> tuple to actually write the data.

So, who decides what this SPA range and interleave set is?  Can the
operating system change these interleave sets and mappings, or change
data from PMEM to BLK, and is so, how?

If you read through section 13.19 of the UEFI manual, it seems to imply
that this is determined by the label area -- that each DIMM has a
separate label area describing regions local to that DIMM; and that if
you have 4 DIMMs you'll have 4 label areas, and each label area will
have a label describing the DPA region on that DIMM which corresponds to
the interleave set.  And somehow someone sets up the interleave sets and
SPA based on what's written there.

Which would mean that an operating system could change how the
interleave sets work by rewriting the various labels on the DIMMs; for
instance, changing a single 4-way set spanning the entirety of 4 DIMMs,
to one 4-way set spanning half of 4 DIMMs, and 2 2-way sets spanning
half of 2 DIMMs each.

But then you say:

> Unlike NVMe an NVDIMM itself has no concept of namespaces. Some DIMMs
> provide a "label area" which is an out-of-band non-volatile memory
> area where the OS can store whatever it likes. The UEFI 2.7
> specification defines a data format for the definition of namespaces
> on top of persistent memory ranges advertised to the OS via the ACPI
> NFIT structure.

OK, so that sounds like no, that's that what happens.  So where do the
SPA range and interleave sets come from?

Random guess: The BIOS / firmware makes it up.  Either it's hard-coded,
or there's some menu in the BIOS you can use to change things around;
but once it hits the operating system, that's it -- the mapping of SPA
range onto interleave sets onto DIMMs is, from the operating system's
point of view, fixed.

And so (here's another guess) -- when you're talking about namespaces
and label areas, you're talking about namespaces stored *within a
pre-existing SPA range*.  You use the same format as described in the
UEFI spec, but ignore all the stuff about interleave sets and whatever,
and use system physical addresses relative to the SPA range rather than
DPAs.

Is that right?

But then there's things like this:

> There is no obligation for an NVDIMM to provide a label area, and as
> far as I know all NVDIMMs on the market today do not provide a label
> area.
[snip]
> Linux supports "label-less" mode where it exposes
> the raw capacity of a region in 1:1 mapped namespace without a label.
> This is how Linux supports "legacy" NVDIMMs that do not support
> labels.

So are "all NVDIMMs on the market today" then classed as "legacy"
NVDIMMs because they don't support labels?  And if labels are simply the
NVDIMM equivalent of a partition table, then what does it mena to
"support" or "not support" labels?

And then there's this:

> In any
> event we do the DIMM to SPA association first before reading labels.
> The OS calculates a so called "Interleave Set Cookie" from the NFIT
> information to compare against a similar value stored in the labels.
> This lets the OS determine that the Interleave Set composition has not
> changed from when the labels were initially written. An Interleave Set
> Cookie mismatch indicates the labels are stale, corrupted, or that the
> physical composition of the Interleave Set has changed.

So wait, the SPA and interleave sets can actually change?  And the
labels which the OS reads actually are per-DIMM, and do control somehow
how the DPA ranges of individual DIMMs are mapped into interleave sets
and exposed as SPAs?  (And perhaps, can be changed by the operating system?)

And:

> There are checksums in the Namespace definition to account label
> validity. Starting with ACPI 6.2 DSMs for labels are deprecated in
> favor of the new / named methods for label access _LSI, _LSR, and
> _LSW.

Does this mean the methods will use checksums to verify writes to the
label area, and refuse writes which create invalid labels?

If all of the above is true, then in what way can it be said that
"NVDIMM has no concept of namespaces", that an OS can "store whatever it
likes" in the label area, and that UEFI namespaces are "on top of
persistent memory ranges advertised to the OS via the ACPI NFIT structure"?

I'm sorry if this is obvious, but I am exactly as confused as I was
before I started writing this. :-)

This is all pretty foundational.  Xen can read static ACPI tables, but
it can't do AML.  So to do a proper design for Xen, we need to know:
1. If Xen can find out, without Linux's help, what namespaces exist and
if there is one it can use for its own purposes
2. If the SPA regions can change at runtime.

If SPA regions don't change after boot, and if Xen can find its own
Xen-specific namespace to use for the frame tables by reading the NFIT
table, then that significantly reduces the amount of interaction it
needs with Linux.

If SPA regions *can* change after boot, and if Xen must rely on Linux to
read labels and find out what it can safely use for frame tables, then
it makes things significantly more involved.  Not impossible by any
means, but a lot more complicated.

Hope all that makes sense -- thanks again for your help.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.