[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[xen stable-4.17] Revert "tools/xenstore: simplify loop handling connection I/O"



commit 2f8851c37f88e4eb4858e16626fcb2379db71a4f
Author:     Jason Andryuk <jandryuk@xxxxxxxxx>
AuthorDate: Thu Jan 26 11:00:24 2023 +0100
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Thu Jan 26 11:00:24 2023 +0100

    Revert "tools/xenstore: simplify loop handling connection I/O"
    
    I'm observing guest kexec trigger xenstored to abort on a double free.
    
    gdb output:
    Program received signal SIGABRT, Aborted.
    __pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) 
at ./nptl/pthread_kill.c:44
    44    ./nptl/pthread_kill.c: No such file or directory.
    (gdb) bt
        at ./nptl/pthread_kill.c:44
        at ./nptl/pthread_kill.c:78
        at ./nptl/pthread_kill.c:89
        at ../sysdeps/posix/raise.c:26
        at talloc.c:119
        ptr=ptr@entry=0x559fae724290) at talloc.c:232
        at xenstored_core.c:2945
    (gdb) frame 5
        at talloc.c:119
    119            TALLOC_ABORT("Bad talloc magic value - double free");
    (gdb) frame 7
        at xenstored_core.c:2945
    2945                talloc_increase_ref_count(conn);
    (gdb) p conn
    $1 = (struct connection *) 0x559fae724290
    
    Looking at a xenstore trace, we have:
    IN 0x559fae71f250 20230120 17:40:53 READ 
(/local/domain/3/image/device-model-dom
    id )
    wrl: dom    0      1  msec      10000 credit     1000000 reserve        100 
disc
    ard
    wrl: dom    3      1  msec      10000 credit     1000000 reserve        100 
disc
    ard
    wrl: dom    0      0  msec      10000 credit     1000000 reserve          0 
disc
    ard
    wrl: dom    3      0  msec      10000 credit     1000000 reserve          0 
disc
    ard
    OUT 0x559fae71f250 20230120 17:40:53 ERROR (ENOENT )
    wrl: dom    0      1  msec      10000 credit     1000000 reserve        100 
disc
    ard
    wrl: dom    3      1  msec      10000 credit     1000000 reserve        100 
disc
    ard
    IN 0x559fae71f250 20230120 17:40:53 RELEASE (3 )
    DESTROY watch 0x559fae73f630
    DESTROY watch 0x559fae75ddf0
    DESTROY watch 0x559fae75ec30
    DESTROY watch 0x559fae75ea60
    DESTROY watch 0x559fae732c00
    DESTROY watch 0x559fae72cea0
    DESTROY watch 0x559fae728fc0
    DESTROY watch 0x559fae729570
    DESTROY connection 0x559fae724290
    orphaned node /local/domain/3/device/suspend/event-channel deleted
    orphaned node /local/domain/3/device/vbd/51712 deleted
    orphaned node /local/domain/3/device/vkbd/0 deleted
    orphaned node /local/domain/3/device/vif/0 deleted
    orphaned node /local/domain/3/control/shutdown deleted
    orphaned node /local/domain/3/control/feature-poweroff deleted
    orphaned node /local/domain/3/control/feature-reboot deleted
    orphaned node /local/domain/3/control/feature-suspend deleted
    orphaned node /local/domain/3/control/feature-s3 deleted
    orphaned node /local/domain/3/control/feature-s4 deleted
    orphaned node /local/domain/3/control/sysrq deleted
    orphaned node /local/domain/3/data deleted
    orphaned node /local/domain/3/drivers deleted
    orphaned node /local/domain/3/feature deleted
    orphaned node /local/domain/3/attr deleted
    orphaned node /local/domain/3/error deleted
    orphaned node /local/domain/3/console/backend-id deleted
    
    and no further output.
    
    The trace shows that DESTROY was called for connection 0x559fae724290,
    but that is the same pointer (conn) main() was looping through from
    connections.  So it wasn't actually removed from the connections list?
    
    Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling
    connection I/O" fixes the abort/double free.  I think the use of
    list_for_each_entry_safe is incorrect.  list_for_each_entry_safe makes
    traversal safe for deleting the current iterator, but RELEASE/do_release
    will delete some other entry in the connections list.  I think the
    observed abort is because list_for_each_entry has next pointing to the
    deleted connection, and it is used in the subsequent iteration.
    
    Add a comment explaining the unsuitability of list_for_each_entry_safe.
    Also notice that the old code takes a reference on next which would
    prevents a use-after-free.
    
    This reverts commit e8e6e42279a5723239c5c40ba4c7f579a979465d.
    
    This is XSA-425/CVE-2022-42330.
    
    Fixes: e8e6e42279a5 ("tools/xenstore: simplify loop handling connection 
I/O")
    Signed-off-by: Jason Andryuk <jandryuk@xxxxxxxxx>
    Reviewed-by: Juergen Gross <jgross@xxxxxxxx>
    Reviewed-by: Julien Grall <jgrall@xxxxxxxxxx>
---
 tools/xenstore/xenstored_core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 476d5c6d51..56dbdc2530 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2935,8 +2935,23 @@ int main(int argc, char *argv[])
                        }
                }
 
-               list_for_each_entry_safe(conn, next, &connections, list) {
-                       talloc_increase_ref_count(conn);
+               /*
+                * list_for_each_entry_safe is not suitable here because
+                * handle_input may delete entries besides the current one, but
+                * those may be in the temporary next which would trigger a
+                * use-after-free.  list_for_each_entry_safe is only safe for
+                * deleting the current entry.
+                */
+               next = list_entry(connections.next, typeof(*conn), list);
+               if (&next->list != &connections)
+                       talloc_increase_ref_count(next);
+               while (&next->list != &connections) {
+                       conn = next;
+
+                       next = list_entry(conn->list.next,
+                                         typeof(*conn), list);
+                       if (&next->list != &connections)
+                               talloc_increase_ref_count(next);
 
                        if (conn_can_read(conn))
                                handle_input(conn);
--
generated by git-patchbot for /home/xen/git/xen.git#stable-4.17



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.