[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
On 16/01/2023 6:10 pm, Anthony PERARD wrote: > The get-fields.sh which generate all the include/compat/.xlat/*.h > headers is quite slow. It takes for example nearly 3 seconds to > generate platform.h on a recent machine, or 2.3 seconds for memory.h. > > Rewriting the mix of shell/sed/python into a single python script make > the generation of those file a lot faster. > > No functional change, the headers generated are identical. > > Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx> I'll happily trust the testing you've done, and that the outputs are equivalent. However, I have some general python suggestions. > diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py > new file mode 100644 > index 0000000000..c1b361ac56 > --- /dev/null > +++ b/xen/tools/compat-xlat-header.py > @@ -0,0 +1,468 @@ > +#!/usr/bin/env python > + > +from __future__ import print_function > +import re > +import sys > + > +typedefs = [] > + > +def removeprefix(s, prefix): > + if s.startswith(prefix): > + return s[len(prefix):] > + return s > + > +def removesuffix(s, suffix): > + if s.endswith(suffix): > + return s[:-len(suffix)] > + return s > + > +def get_fields(looking_for, header_tokens): > + level = 1 > + aggr = 0 > + fields = [] > + name = '' > + > + for token in header_tokens: > + if token in ['struct', 'union']: ('struct', 'union') A tuple here rather than a list is marginally more efficient. > + if level == 1: > + aggr = 1 > + fields = [] > + name = '' > + elif token == '{': > + level += 1 > + elif token == '}': > + level -= 1 > + if level == 1 and name == looking_for: > + fields.append(token) > + return fields > + elif re.match(r'^[a-zA-Z_]', token): Here and many other places, you've got re.{search,match} with repeated regexes. This can end up being quite expensive, and we already had one build system slowdown caused by it. Up near typedefs at the top, you want something like: token_re = re.compile('[a-zA-Z_]') to prepare the regex once at the start of day, and and use 'elif token_re.match(token)' here. Another thing to note, the difference between re.search and re.match is that match has an implicit '^' anchor. You need to be aware of this when compiling one regex to be used with both. All of this said, where is 0-9 in the token regex? Have we just got extremely lucky with having no embedded digits in identifiers thus far? > + if not (aggr == 0 or name != ''): > + name = token > + > + if aggr != 0: > + fields.append(token) > + > + return [] > + > +def get_typedefs(tokens): > + level = 1 > + state = 0 > + typedefs = [] This shadows the global typedefs list, but the result of calling this function is simply assigned to it. Looking at the code, the global list is used in several places It would be better to "global typedefs" here, and make this function void, unless you want to modify handle_field() and build_body() to take it in as a regular parameter. I'm pretty sure typedefs actually wants to be a dict rather than a list (will have better "id in typedefs" performance lower down), but that wants matching with code changes elsewhere, and probably wants doing separately. > + for token in tokens: > + if token == 'typedef': > + if level == 1: > + state = 1 > + elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token): > + if not (level != 1 or state != 1): > + state = 2 > + elif token in ['[', '{']: > + level += 1 > + elif token in [']', '}']: > + level -= 1 > + elif token == ';': > + if level == 1: > + state = 0 > + elif re.match(r'^[a-zA-Z_]', token): > + if not (level != 1 or state != 2): > + typedefs.append(token) > + return typedefs > + > +def build_enums(name, tokens): > + level = 1 > + kind = '' > + named = '' > + fields = [] > + members = [] > + id = '' > + > + for token in tokens: > + if token in ['struct', 'union']: > + if not level != 2: > + fields = [''] > + kind = "%s;%s" % (token, kind) > + elif token == '{': > + level += 1 > + elif token == '}': > + level -= 1 > + if level == 1: > + subkind = kind > + (subkind, _, _) = subkind.partition(';') > + if subkind == 'union': > + print("\nenum XLAT_%s {" % (name,)) > + for m in members: > + print(" XLAT_%s_%s," % (name, m)) > + print("};") > + return > + elif level == 2: > + named = '?' > + elif re.match(r'^[a-zA-Z_]', token): > + id = token > + k = kind > + (_, _, k) = k.partition(';') > + if named != '' and k != '': > + if len(fields) > 0 and fields[0] == '': > + fields.pop(0) > + build_enums("%s_%s" % (name, token), fields) > + named = '!' > + elif token == ',': > + if level == 2: > + members.append(id) > + elif token == ';': > + if level == 2: > + members.append(id) > + if named != '': > + (_, _, kind) = kind.partition(';') > + named = '' > + if len(fields) != 0: > + fields.append(token) > + > +def handle_field(prefix, name, id, type, fields): > + if len(fields) == 0: > + print(" \\") > + if type == '': > + print("%s(_d_)->%s = (_s_)->%s;" % (prefix, id, id), end='') > + else: > + k = id.replace('.', '_') > + print("%sXLAT_%s_HNDL_%s(_d_, _s_);" % (prefix, name, k), end='') > + elif not re.search(r'[{}]', ' '.join(fields)): > + tag = ' '.join(fields) > + tag = re.sub(r'\s*(struct|union)\s+(compat_)?(\w+)\s.*', '\\3', tag) > + print(" \\") > + print("%sXLAT_%s(&(_d_)->%s, &(_s_)->%s);" % (prefix, tag, id, id), > end='') > + else: > + func_id = id > + func_tokens = fields > + kind = '' > + array = "" > + level = 1 > + arrlvl = 1 > + array_type = '' > + id = '' > + type = '' > + fields = [] > + for token in func_tokens: > + if token in ['struct', 'union']: > + if level == 2: > + fields = [''] > + if level == 1: > + kind = token > + if kind == 'union': > + tmp = func_id.replace('.', '_') > + print(" \\") > + print("%sswitch (%s) {" % (prefix, tmp), end='') > + elif token == '{': > + level += 1 > + id = '' > + elif token == '}': > + level -= 1 > + id = '' > + if level == 1 and kind == 'union': > + print(" \\") > + print("%s}" % (prefix,), end='') > + elif token == '[': > + if level != 2 or arrlvl != 1: > + pass > + elif array == '': > + array = ' ' > + else: > + array = "%s;" % (array,) > + arrlvl += 1 > + elif token == ']': > + arrlvl -= 1 > + elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token): > + if level == 2 and id == '': > + m = re.match(r'^COMPAT_HANDLE\((.*)\)$', token) > + type = m.groups()[0] > + type = removeprefix(type, 'compat_') So this pattern is what lead to the introduction of the := operator in Python 3.8, but we obviously can't use it yet. At least pre-compiling the regexes will reduce the hit from the double evaluation here. ~Andrew P.S. I probably don't want to know why we have to special case evtchn port, argo port and domain handle. I think it says more about the this bodge of a parser than anything else...
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |