Xen project Mailing List

Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script

To: Anthony Perard <anthony.perard@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>

Date: Tue, 17 Jan 2023 17:21:32 +0000

Accept-language: en-GB, en-US

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=hxV2T10Qrv9JpvFmxil+/4aisuRrhyEXIyRaKhunBxs=; b=WebNrPU592njWIJwPKt9Ly19/26OjR0wPmKemYtN10eF31/ChPqO+VsXFnnl9749ddYLMeYnh7tX6094gF7Lo/1GPkbW+96ZK+by5JjxRnvbP4tZUjh3r5Ud6CSIUuudLE+Dt9iBvDDq8w5mJlp0fOBtMwdY8pB8K1xP0Gi/bemX+lZ2/BqgSDwH0Kfcsf+5Pr83AMLgb9x1PPC6cjmZ8ko4j5vUkvap5i0XgjYyaJoY5yElrfkqeMO6KmXy8qV/MR5CRg36uJ6M0Vljk2PHRc/r/RWmRHfugoaSCnS03IVD97slxZgngpqTJRQF2aXkdjVh/Qn+/SKTetum+e2FPw==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=B8qO9RT6GRO6FkFea/3ju57yMTwrNPjnYK2keajLxbIYj1MTM3N+r0lLAz9+WU6bizx7WhlUEt6hkYQFgMDewa3ANKuysgYcYygX5DcO6I4a81Rh+1GSrwaO/z7XmxApJFapWrdxq56xm+ycp7DUKTgslXRyx2xRXFpuwMmKgd0GulBnvAJrzfdAvc6SRomKPUURjK9Q2KjfJU9FHKJKKL12Eg705tgHBAcb8iVarErja3cmU9mKRliwjckPHYgd3vUc7kQouU+YasOLflP0pukV9EYscYnvNYp+mPcM8yXCtAfMu59oh3GAoFr1i2E/aDDWxmaY1PDewdYTlLf8dg==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: George Dunlap <George.Dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>

Delivery-date: Tue, 17 Jan 2023 17:22:04 +0000

Ironport-data: A9a23:b2mXhK1TmGIwOXQD+/bD5Rlwkn2cJEfYwER7XKvMYLTBsI5bpzIGz 2BNXTyPaf/eYDekfdB2b43n9R5VupKEnIBhQVY+pC1hF35El5HIVI+TRqvS04F+DeWYFR46s J9OAjXkBJppJpMJjk71atANlVEliefTAOK5ULSfUsxIbVcMYD87jh5+kPIOjIdtgNyoayuAo tq3qMDEULOf82cc3lk8tuTS93uDgNyo4GlD5gVnOqgR1LPjvyJ94Kw3dPnZw0TQGuG4LsbiL 87fwbew+H/u/htFIrtJRZ6iLyXm6paLVeS/oiI+t5qK23CulQRrukoPD9IOaF8/ttm8t4sZJ OOhF3CHYVxB0qXkwIzxWvTDes10FfUuFLTveRBTvSEPpqFvnrSFL/hGVSkL0YMkFulfXV1z5 8A1JT00Nz+Ki763w72ra7lwv5F2RCXrFNt3VnBI6xj8VKxjZK+ZBqLA6JlfwSs6gd1IEbDGf c0FZDFzbRPGJRpSJlMQD5F4l+Ct7pX9W2QA9BTJ+uxpvS6PkWSd05C0WDbRUvWMSd9YgQCzo WXe8n6iKhobKMae2XyO9XfEaurnzHijBtxJTufQGvhC31uOw2kULkMtbnjnpcGLkxbiXcgPN BlBksYphe1onKCxdfHtUhv9rHOasxo0X9tLD/Z8+AyL0rDT4QuSGi4DVDEpQN4sudIyRDcq/ kSUhN6vDjtq2JWXVHac+7G8vT60fy8PIgcqfjQYRAEI593ipoAbjR/VSNtnVqmvgbXdBjXY0 z2M6i8kiN0uYdUj0qy6+RXCnGiqr52QFAotvF2LAySi8x9zY5Oja8qw81/H4P1cLYGfCF6co HwDnMvY5+cLZX2QqBGwrCw2NOnBz5643Pf03QEH80UJn9h1x0OeQA==

Ironport-hdrordr: A9a23:ltxUOq6MYCaK3sjNYAPXwPXXdLJyesId70hD6mlbQxY9SL3+qy nIppgmPH7P5wr5PUtKpTnuAse9qB/nlKKdg7NhXotKLTOHhILAFugLh+bfKlbbak/DH4BmpM NdWpk7JNrsDUVryebWiTPIderIGeP3lZyVuQ==

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-index: AQHZKdXmSfLi4OljAEm5WNlNB2w+ZK6i3JyA

Thread-topic: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script

On 16/01/2023 6:10 pm, Anthony PERARD wrote: > The get-fields.sh which generate all the include/compat/.xlat/*.h > headers is quite slow. It takes for example nearly 3 seconds to > generate platform.h on a recent machine, or 2.3 seconds for memory.h. > > Rewriting the mix of shell/sed/python into a single python script make > the generation of those file a lot faster. > > No functional change, the headers generated are identical. > > Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx> I'll happily trust the testing you've done, and that the outputs are equivalent. However, I have some general python suggestions. > diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py > new file mode 100644 > index 0000000000..c1b361ac56 > --- /dev/null > +++ b/xen/tools/compat-xlat-header.py > @@ -0,0 +1,468 @@ > +#!/usr/bin/env python > + > +from __future__ import print_function > +import re > +import sys > + > +typedefs = [] > + > +def removeprefix(s, prefix): > + if s.startswith(prefix): > + return s[len(prefix):] > + return s > + > +def removesuffix(s, suffix): > + if s.endswith(suffix): > + return s[:-len(suffix)] > + return s > + > +def get_fields(looking_for, header_tokens): > + level = 1 > + aggr = 0 > + fields = [] > + name = '' > + > + for token in header_tokens: > + if token in ['struct', 'union']: ('struct', 'union') A tuple here rather than a list is marginally more efficient. > + if level == 1: > + aggr = 1 > + fields = [] > + name = '' > + elif token == '{': > + level += 1 > + elif token == '}': > + level -= 1 > + if level == 1 and name == looking_for: > + fields.append(token) > + return fields > + elif re.match(r'^[a-zA-Z_]', token): Here and many other places, you've got re.{search,match} with repeated regexes. This can end up being quite expensive, and we already had one build system slowdown caused by it. Up near typedefs at the top, you want something like: token_re = re.compile('[a-zA-Z_]') to prepare the regex once at the start of day, and and use 'elif token_re.match(token)' here. Another thing to note, the difference between re.search and re.match is that match has an implicit '^' anchor. You need to be aware of this when compiling one regex to be used with both. All of this said, where is 0-9 in the token regex? Have we just got extremely lucky with having no embedded digits in identifiers thus far? > + if not (aggr == 0 or name != ''): > + name = token > + > + if aggr != 0: > + fields.append(token) > + > + return [] > + > +def get_typedefs(tokens): > + level = 1 > + state = 0 > + typedefs = [] This shadows the global typedefs list, but the result of calling this function is simply assigned to it. Looking at the code, the global list is used in several places It would be better to "global typedefs" here, and make this function void, unless you want to modify handle_field() and build_body() to take it in as a regular parameter. I'm pretty sure typedefs actually wants to be a dict rather than a list (will have better "id in typedefs" performance lower down), but that wants matching with code changes elsewhere, and probably wants doing separately. > + for token in tokens: > + if token == 'typedef': > + if level == 1: > + state = 1 > + elif re.match(r'^COMPAT_HANDLE$(.*)$$', token): > + if not (level != 1 or state != 1): > + state = 2 > + elif token in ['[', '{']: > + level += 1 > + elif token in [']', '}']: > + level -= 1 > + elif token == ';': > + if level == 1: > + state = 0 > + elif re.match(r'^[a-zA-Z_]', token): > + if not (level != 1 or state != 2): > + typedefs.append(token) > + return typedefs > + > +def build_enums(name, tokens): > + level = 1 > + kind = '' > + named = '' > + fields = [] > + members = [] > + id = '' > + > + for token in tokens: > + if token in ['struct', 'union']: > + if not level != 2: > + fields = [''] > + kind = "%s;%s" % (token, kind) > + elif token == '{': > + level += 1 > + elif token == '}': > + level -= 1 > + if level == 1: > + subkind = kind > + (subkind, _, _) = subkind.partition(';') > + if subkind == 'union': > + print("\nenum XLAT_%s {" % (name,)) > + for m in members: > + print(" XLAT_%s_%s," % (name, m)) > + print("};") > + return > + elif level == 2: > + named = '?' > + elif re.match(r'^[a-zA-Z_]', token): > + id = token > + k = kind > + (_, _, k) = k.partition(';') > + if named != '' and k != '': > + if len(fields) > 0 and fields[0] == '': > + fields.pop(0) > + build_enums("%s_%s" % (name, token), fields) > + named = '!' > + elif token == ',': > + if level == 2: > + members.append(id) > + elif token == ';': > + if level == 2: > + members.append(id) > + if named != '': > + (_, _, kind) = kind.partition(';') > + named = '' > + if len(fields) != 0: > + fields.append(token) > + > +def handle_field(prefix, name, id, type, fields): > + if len(fields) == 0: > + print(" \\") > + if type == '': > + print("%s(_d_)->%s = (_s_)->%s;" % (prefix, id, id), end='') > + else: > + k = id.replace('.', '_') > + print("%sXLAT_%s_HNDL_%s(_d_, _s_);" % (prefix, name, k), end='') > + elif not re.search(r'[{}]', ' '.join(fields)): > + tag = ' '.join(fields) > + tag = re.sub(r'\s*(struct|union)\s+(compat_)?(\w+)\s.*', '\\3', tag) > + print(" \\") > + print("%sXLAT_%s(&(_d_)->%s, &(_s_)->%s);" % (prefix, tag, id, id), > end='') > + else: > + func_id = id > + func_tokens = fields > + kind = '' > + array = "" > + level = 1 > + arrlvl = 1 > + array_type = '' > + id = '' > + type = '' > + fields = [] > + for token in func_tokens: > + if token in ['struct', 'union']: > + if level == 2: > + fields = [''] > + if level == 1: > + kind = token > + if kind == 'union': > + tmp = func_id.replace('.', '_') > + print(" \\") > + print("%sswitch (%s) {" % (prefix, tmp), end='') > + elif token == '{': > + level += 1 > + id = '' > + elif token == '}': > + level -= 1 > + id = '' > + if level == 1 and kind == 'union': > + print(" \\") > + print("%s}" % (prefix,), end='') > + elif token == '[': > + if level != 2 or arrlvl != 1: > + pass > + elif array == '': > + array = ' ' > + else: > + array = "%s;" % (array,) > + arrlvl += 1 > + elif token == ']': > + arrlvl -= 1 > + elif re.match(r'^COMPAT_HANDLE$(.*)$$', token): > + if level == 2 and id == '': > + m = re.match(r'^COMPAT_HANDLE$(.*)$$', token) > + type = m.groups()[0] > + type = removeprefix(type, 'compat_') So this pattern is what lead to the introduction of the := operator in Python 3.8, but we obviously can't use it yet. At least pre-compiling the regexes will reduce the hit from the double evaluation here. ~Andrew P.S. I probably don't want to know why we have to special case evtchn port, argo port and domain handle. I think it says more about the this bodge of a parser than anything else...

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.