[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script


  • To: Anthony Perard <anthony.perard@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Tue, 17 Jan 2023 17:21:32 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=hxV2T10Qrv9JpvFmxil+/4aisuRrhyEXIyRaKhunBxs=; b=WebNrPU592njWIJwPKt9Ly19/26OjR0wPmKemYtN10eF31/ChPqO+VsXFnnl9749ddYLMeYnh7tX6094gF7Lo/1GPkbW+96ZK+by5JjxRnvbP4tZUjh3r5Ud6CSIUuudLE+Dt9iBvDDq8w5mJlp0fOBtMwdY8pB8K1xP0Gi/bemX+lZ2/BqgSDwH0Kfcsf+5Pr83AMLgb9x1PPC6cjmZ8ko4j5vUkvap5i0XgjYyaJoY5yElrfkqeMO6KmXy8qV/MR5CRg36uJ6M0Vljk2PHRc/r/RWmRHfugoaSCnS03IVD97slxZgngpqTJRQF2aXkdjVh/Qn+/SKTetum+e2FPw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=B8qO9RT6GRO6FkFea/3ju57yMTwrNPjnYK2keajLxbIYj1MTM3N+r0lLAz9+WU6bizx7WhlUEt6hkYQFgMDewa3ANKuysgYcYygX5DcO6I4a81Rh+1GSrwaO/z7XmxApJFapWrdxq56xm+ycp7DUKTgslXRyx2xRXFpuwMmKgd0GulBnvAJrzfdAvc6SRomKPUURjK9Q2KjfJU9FHKJKKL12Eg705tgHBAcb8iVarErja3cmU9mKRliwjckPHYgd3vUc7kQouU+YasOLflP0pukV9EYscYnvNYp+mPcM8yXCtAfMu59oh3GAoFr1i2E/aDDWxmaY1PDewdYTlLf8dg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 17 Jan 2023 17:22:04 +0000
  • Ironport-data: A9a23:b2mXhK1TmGIwOXQD+/bD5Rlwkn2cJEfYwER7XKvMYLTBsI5bpzIGz 2BNXTyPaf/eYDekfdB2b43n9R5VupKEnIBhQVY+pC1hF35El5HIVI+TRqvS04F+DeWYFR46s J9OAjXkBJppJpMJjk71atANlVEliefTAOK5ULSfUsxIbVcMYD87jh5+kPIOjIdtgNyoayuAo tq3qMDEULOf82cc3lk8tuTS93uDgNyo4GlD5gVnOqgR1LPjvyJ94Kw3dPnZw0TQGuG4LsbiL 87fwbew+H/u/htFIrtJRZ6iLyXm6paLVeS/oiI+t5qK23CulQRrukoPD9IOaF8/ttm8t4sZJ OOhF3CHYVxB0qXkwIzxWvTDes10FfUuFLTveRBTvSEPpqFvnrSFL/hGVSkL0YMkFulfXV1z5 8A1JT00Nz+Ki763w72ra7lwv5F2RCXrFNt3VnBI6xj8VKxjZK+ZBqLA6JlfwSs6gd1IEbDGf c0FZDFzbRPGJRpSJlMQD5F4l+Ct7pX9W2QA9BTJ+uxpvS6PkWSd05C0WDbRUvWMSd9YgQCzo WXe8n6iKhobKMae2XyO9XfEaurnzHijBtxJTufQGvhC31uOw2kULkMtbnjnpcGLkxbiXcgPN BlBksYphe1onKCxdfHtUhv9rHOasxo0X9tLD/Z8+AyL0rDT4QuSGi4DVDEpQN4sudIyRDcq/ kSUhN6vDjtq2JWXVHac+7G8vT60fy8PIgcqfjQYRAEI593ipoAbjR/VSNtnVqmvgbXdBjXY0 z2M6i8kiN0uYdUj0qy6+RXCnGiqr52QFAotvF2LAySi8x9zY5Oja8qw81/H4P1cLYGfCF6co HwDnMvY5+cLZX2QqBGwrCw2NOnBz5643Pf03QEH80UJn9h1x0OeQA==
  • Ironport-hdrordr: A9a23:ltxUOq6MYCaK3sjNYAPXwPXXdLJyesId70hD6mlbQxY9SL3+qy nIppgmPH7P5wr5PUtKpTnuAse9qB/nlKKdg7NhXotKLTOHhILAFugLh+bfKlbbak/DH4BmpM NdWpk7JNrsDUVryebWiTPIderIGeP3lZyVuQ==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHZKdXmSfLi4OljAEm5WNlNB2w+ZK6i3JyA
  • Thread-topic: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script

On 16/01/2023 6:10 pm, Anthony PERARD wrote:
> The get-fields.sh which generate all the include/compat/.xlat/*.h
> headers is quite slow. It takes for example nearly 3 seconds to
> generate platform.h on a recent machine, or 2.3 seconds for memory.h.
>
> Rewriting the mix of shell/sed/python into a single python script make
> the generation of those file a lot faster.
>
> No functional change, the headers generated are identical.
>
> Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>

I'll happily trust the testing you've done, and that the outputs are
equivalent.  However, I have some general python suggestions.

> diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py
> new file mode 100644
> index 0000000000..c1b361ac56
> --- /dev/null
> +++ b/xen/tools/compat-xlat-header.py
> @@ -0,0 +1,468 @@
> +#!/usr/bin/env python
> +
> +from __future__ import print_function
> +import re
> +import sys
> +
> +typedefs = []
> +
> +def removeprefix(s, prefix):
> +    if s.startswith(prefix):
> +        return s[len(prefix):]
> +    return s
> +
> +def removesuffix(s, suffix):
> +    if s.endswith(suffix):
> +        return s[:-len(suffix)]
> +    return s
> +
> +def get_fields(looking_for, header_tokens):
> +    level = 1
> +    aggr = 0
> +    fields = []
> +    name = ''
> +
> +    for token in header_tokens:
> +        if token in ['struct', 'union']:

('struct', 'union')

A tuple here rather than a list is marginally more efficient.

> +            if level == 1:
> +                aggr = 1
> +                fields = []
> +                name = ''
> +        elif token == '{':
> +            level += 1
> +        elif token == '}':
> +            level -= 1
> +            if level == 1 and name == looking_for:
> +                fields.append(token)
> +                return fields
> +        elif re.match(r'^[a-zA-Z_]', token):

Here and many other places, you've got re.{search,match} with repeated
regexes.   This can end up being quite expensive, and we already had one
build system slowdown caused by it.

Up near typedefs at the top, you want something like:

token_re = re.compile('[a-zA-Z_]')

to prepare the regex once at the start of day, and and use 'elif
token_re.match(token)' here.

Another thing to note, the difference between re.search and re.match is
that match has an implicit '^' anchor.  You need to be aware of this
when compiling one regex to be used with both.


All of this said, where is 0-9 in the token regex?  Have we just got
extremely lucky with having no embedded digits in identifiers thus far?

> +            if not (aggr == 0 or name != ''):
> +                name = token
> +
> +        if aggr != 0:
> +            fields.append(token)
> +
> +    return []
> +
> +def get_typedefs(tokens):
> +    level = 1
> +    state = 0
> +    typedefs = []

This shadows the global typedefs list, but the result of calling this
function is simply assigned to it.  Looking at the code, the global list
is used in several places

It would be better to "global typedefs" here, and make this function
void, unless you want to modify handle_field() and build_body() to take
it in as a regular parameter.

I'm pretty sure typedefs actually wants to be a dict rather than a list
(will have better "id in typedefs" performance lower down), but that
wants matching with code changes elsewhere, and probably wants doing
separately.

> +    for token in tokens:
> +        if token == 'typedef':
> +            if level == 1:
> +                state = 1
> +        elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
> +            if not (level != 1 or state != 1):
> +                state = 2
> +        elif token in ['[', '{']:
> +            level += 1
> +        elif token in [']', '}']:
> +            level -= 1
> +        elif token == ';':
> +            if level == 1:
> +                state = 0
> +        elif re.match(r'^[a-zA-Z_]', token):
> +            if not (level != 1 or state != 2):
> +                typedefs.append(token)
> +    return typedefs
> +
> +def build_enums(name, tokens):
> +    level = 1
> +    kind = ''
> +    named = ''
> +    fields = []
> +    members = []
> +    id = ''
> +
> +    for token in tokens:
> +        if token in ['struct', 'union']:
> +            if not level != 2:
> +                fields = ['']
> +            kind = "%s;%s" % (token, kind)
> +        elif token == '{':
> +            level += 1
> +        elif token == '}':
> +            level -= 1
> +            if level == 1:
> +                subkind = kind
> +                (subkind, _, _) = subkind.partition(';')
> +                if subkind == 'union':
> +                    print("\nenum XLAT_%s {" % (name,))
> +                    for m in members:
> +                        print("    XLAT_%s_%s," % (name, m))
> +                    print("};")
> +                return
> +            elif level == 2:
> +                named = '?'
> +        elif re.match(r'^[a-zA-Z_]', token):
> +            id = token
> +            k = kind
> +            (_, _, k) = k.partition(';')
> +            if named != '' and k != '':
> +                if len(fields) > 0 and fields[0] == '':
> +                    fields.pop(0)
> +                build_enums("%s_%s" % (name, token), fields)
> +                named = '!'
> +        elif token == ',':
> +            if level == 2:
> +                members.append(id)
> +        elif token == ';':
> +            if level == 2:
> +                members.append(id)
> +            if named != '':
> +                (_, _, kind) = kind.partition(';')
> +            named = ''
> +        if len(fields) != 0:
> +            fields.append(token)
> +
> +def handle_field(prefix, name, id, type, fields):
> +    if len(fields) == 0:
> +        print(" \\")
> +        if type == '':
> +            print("%s(_d_)->%s = (_s_)->%s;" % (prefix, id, id), end='')
> +        else:
> +            k = id.replace('.', '_')
> +            print("%sXLAT_%s_HNDL_%s(_d_, _s_);" % (prefix, name, k), end='')
> +    elif not re.search(r'[{}]', ' '.join(fields)):
> +        tag = ' '.join(fields)
> +        tag = re.sub(r'\s*(struct|union)\s+(compat_)?(\w+)\s.*', '\\3', tag)
> +        print(" \\")
> +        print("%sXLAT_%s(&(_d_)->%s, &(_s_)->%s);" % (prefix, tag, id, id), 
> end='')
> +    else:
> +        func_id = id
> +        func_tokens = fields
> +        kind = ''
> +        array = ""
> +        level = 1
> +        arrlvl = 1
> +        array_type = ''
> +        id = ''
> +        type = ''
> +        fields = []
> +        for token in func_tokens:
> +            if token in ['struct', 'union']:
> +                if level == 2:
> +                    fields = ['']
> +                if level == 1:
> +                    kind = token
> +                    if kind == 'union':
> +                        tmp = func_id.replace('.', '_')
> +                        print(" \\")
> +                        print("%sswitch (%s) {" % (prefix, tmp), end='')
> +            elif token == '{':
> +                level += 1
> +                id = ''
> +            elif token == '}':
> +                level -= 1
> +                id = ''
> +                if level == 1 and kind == 'union':
> +                    print(" \\")
> +                    print("%s}" % (prefix,), end='')
> +            elif token == '[':
> +                if level != 2 or arrlvl != 1:
> +                    pass
> +                elif array == '':
> +                    array = ' '
> +                else:
> +                    array = "%s;" % (array,)
> +                arrlvl += 1
> +            elif token == ']':
> +                arrlvl -= 1
> +            elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
> +                if level == 2 and id == '':
> +                    m = re.match(r'^COMPAT_HANDLE\((.*)\)$', token)
> +                    type = m.groups()[0]
> +                    type = removeprefix(type, 'compat_')

So this pattern is what lead to the introduction of the := operator in
Python 3.8, but we obviously can't use it yet.

At least pre-compiling the regexes will reduce the hit from the double
evaluation here.

~Andrew

P.S. I probably don't want to know why we have to special case evtchn
port, argo port and domain handle.  I think it says more about the this
bodge of a parser than anything else...

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.