Xen project Mailing List

On Oct 11, 2019, at 07:11, Lars Kurth <lars.kurth@xxxxxxxxxx> wrote:

On 11/10/2019, 02:24, "Stefano Stabellini" <sstabellini@xxxxxxxxxx> wrote:

   On Thu, 10 Oct 2019, Lars Kurth wrote:
* Would we ever include API docs generated from GPLv2 code? E.g. for safety use-cases?
@Stefano, @Artem: I guess this one is for you.
I suppose we would have a similar issue for a safety manual.
I am also assuming we would want to use sphinx docs and rst to generate a future safety manual.

   Hi Lars,

   Thanks for putting this email together.

   In terms of formats, I don't have a preference between rst and pandoc,
   but if we are going to use rst going forward, I'd say to try to use rst
   for everything, including converting all the old stuff. The fewer
   different formats, the better.

I think the proposal that needs to follow on from this (which would at some
point need to be voted on) would then be to go for rst.

   As I mentioned during the FuSa call, I agree with you, Andrew, and
   others that it would be best to have the docs under a CC license. I do
   expect that we'll end up copy/pasting snippets of in-code comments into
   the docs, so I think it is important that we are allowed to do that from
   a license perspective. It is great that GPLv2 allows it (we need to be
   sure about this).

The GPL does *not* allow this, but copyright law and fair use clauses do. So typically:
* Referring to function names, signatures, etc. tends to be fine
* Copying large portions of in-line comments would not be fine, but
  if they are large, they would in most cases be rewritten in more suitable
  language anyway.

So, I think overall, we should be fine. It's a bit of a grey area though.

And as you point out below, most of the code in question is typically BSD

   Yes, I expect that some docs might be automatically generated, but from
   header files, not from source code. Especially public/ header files,
   which are typically BSD, not GPLv2. I cannot come up with examples of
   docs we need to generate from GPLv2-only code at the moment; hopefully
   there won't be any.

That makes things a lot easier.

   I wasn't planning on reusing any of the markup, and wasn't expecting to
   use much of the text either. I'm still considering the option of
   defining that xen/public/* isn't the canonical description of the ABI,
   because C is the wrong tool for the job.

   It's fine to provide a C set of headers implementing an ABI, but there is
   a very deliberate reason why the canonical migration v2 spec is in a
   text document.

@Stefano: as you and, I believe, Brian will be spending time on improving the
ABI docs, I think we need to build some agreement here on what/how
to do it. I was assuming that generally the consensus was to have
docs close to the code in source, but this does not seem to be the case.

But if we do have stuff separately, ideally we would have a tool that helps
point people editing headers to also look at the relevant docs. Otherwise it will
be hard to keep them in sync.
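
A minimal sketch of what such a helper could look like, assuming a hand-maintained
mapping from header path patterns to the documents that describe them. The mapping,
the docs/ file names and the function name are hypothetical and only illustrate the
idea; no such tool exists in the tree today.

```python
from fnmatch import fnmatch

# Hypothetical mapping from header path patterns to related documents.
# The docs/ paths are illustrative only, not real files in the tree.
HEADER_DOCS = {
    "xen/include/public/io/*.h": ["docs/abi/io.rst"],
    "xen/include/public/grant_table.h": ["docs/abi/grant-table.rst"],
}


def docs_to_review(changed_files, mapping=HEADER_DOCS):
    """Return a sorted list of docs related to the changed files."""
    hits = set()
    for path in changed_files:
        for pattern, docs in mapping.items():
            # fnmatch's "*" also matches "/", which is fine for this sketch.
            if fnmatch(path, pattern):
                hits.update(docs)
    return sorted(hits)


if __name__ == "__main__":
    changed = ["xen/include/public/io/netif.h", "xen/common/sched.c"]
    for doc in docs_to_review(changed):
        print(f"please also check {doc}")
```

Something along these lines could be fed the list of files touched by a patch and
print a reminder, without blocking the patch itself.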

   In general, it is a good idea to keep the docs close to the code to make
   it easier to keep them up to date. But there is no one-size-fits-all
   here. For public ABI descriptions, I agree with Andrew that ideally they
   should not be defined as C header files.

   But it is not an issue: any work that we do here won't be wasted. For
   instance, we could start by adding more comments to the current header
   files. Then, as a second step, take all the comments and turn them into
   a proper ABI description document without any C function declarations.
   It is easy to move English text around, as long as the license allows it
   -- that is the only potential blocker I can see.

This is likely to be problematic. First of all, we are talking about BSD-3-Clause
or BSD-2-Clause code (the latter is more dominant in headers I believe) in
all known cases.

The main properties of the BSD licenses are
1: Can be pretty much used anywhere for any purpose
2: Can be modified for any purpose
3: But the original license header must be retained in derivatives

This is equivalent to attribution of the copyright owner of the originally created file.

BSD does *not* have attribution requirements like those of CC-BY-4; however,
as we store everything in git, attribution is handled by us by default.

See above: the license header attributes copyright. BSD was created for "software", and people who work on "software" would typically be looking at source code, so the primary attribution takes place there, with secondary attribution in EULAs, "About" panels, etc.

CC-BY-4 also has properties 1-3
In addition, it requires that
4: Derived works give appropriate credit to authors
   We could clarify in a COPYING how we prefer to do this
   4.1: We could say that "referring to the Xen Project community"
           is sufficient to comply with the attribution clause

One motivation for CC-BY (with attribution) is to create an incentive (credit) for the creation of documentation, which is not commonly a favorite pastime of developers. Credit typically goes at least to the original author of a section of documentation, with varying ways of crediting subsequent contributors. The documentation can be structured to make crediting easier. The mechanism for crediting can be designed to encourage specific outcomes, along our projected doc lifecycle for safety certification, contributors, evaluators and commercial investors.

   4.2: We could require individual authors to be credited: in that
           case we probably ought to lead by example and list the authors
           in a credit/license section and extract the information from
           git logs when we generate it (at some point in the future)
5: You give an indication whether you made changes ... in practice
this means you have to state significant changes made to the works

This is also helpful for provenance of changes, which is relevant in safety-oriented documentation. It can be used to clearly delineate CC-licensed content (which may be reused by many companies) from "All Rights Reserved" commercial content that may be added for a specific commercial audience or purpose.
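
For the author list mentioned in 4.2, extracting the information from git logs could
be sketched roughly as follows. This is only an illustration under assumptions: the
function names are made up, and it credits by commit count per path, which is just
one of several possible crediting policies. It relies on `git log --format=%aN`,
which prints one author name per commit (respecting .mailmap).

```python
import subprocess
from collections import Counter


def count_authors(log_output):
    """Count commits per author from `git log --format=%aN` output."""
    names = [line.strip() for line in log_output.splitlines() if line.strip()]
    return Counter(names)


def credits_for(path, repo="."):
    """Return the authors who touched `path`, most commits first.

    Hypothetical helper: runs git in `repo` and parses its output.
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%aN", "--", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [name for name, _ in count_authors(out).most_common()]
```

The resulting list could be written into a generated credit/license section at doc
build time, so that the git history remains the single source of attribution data.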

There is a difference between "software" which "runs on machines" and "documentation" which "runs on humans". Combined software (e.g. BSD code from two origins) is executed identically, despite origin. Humans make value judgements based on the author/origin of content, hence the focus on attribution. Yes, there is a provenance graph in git (software/data), but that's not typically visible to human readers, except as a generated report, i.e. documentation.

As such, BSD-2/3-Clause in our context works similarly to CC-BY-4
from a downstream's perspective. In fact CC-BY-4 is somewhat stricter

If we don't want the incentives and provenance properties of CC-BY, there is the option of CC0, which is the equivalent of public domain. This would delegate the task of separating commercial vs CC content to each reader, without any license-required attribution or separation.

Some background on licenses designed for documentation, which has different legal requirements than software:

https://www.dreamsongs.com/IHE/IHE-50.html

https://creativecommons.org/faq/#what-are-creative-commons-licenses (not for s/w)

Rich

Re: [Xen-devel] [RFC] Documentation formats, licenses and file system structure