[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Minios-devel] [UNIKRAFT PATCH 4/5] lib/cpio: Add CPIO extraction functionality



On 28.01.20 05:02, Robert Hrusecky wrote:
Modeled after the FreeBSD libarchive:
https://github.com/freebsd/freebsd/blob/master/contrib/libarchive/libarchive/archive_read_support_format_cpio.c

The implementation is mostly complete except that it does not yet
support symlinks

Signed-off-by: Robert Hrusecky <roberth@xxxxxxxxxxxxx>
Signed-off-by: Omar Jamil <omarj2898@xxxxxxxxx>
Signed-off-by: Sachin Beldona <sachinbeldona@xxxxxxxxxx>
---
  lib/cpio/cpio.c            | 239 +++++++++++++++++++++++++++++++++++++
  lib/cpio/exportsyms.uk     |   2 +-
  lib/cpio/include/uk/cpio.h |  57 +++++++++
  3 files changed, 297 insertions(+), 1 deletion(-)

diff --git a/lib/cpio/cpio.c b/lib/cpio/cpio.c
index e69de29..e27e99b 100644
--- a/lib/cpio/cpio.c
+++ b/lib/cpio/cpio.c
@@ -0,0 +1,239 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+/*
+ * Authors: Robert Hrusecky <roberth@xxxxxxxxxxxxx>
+ *          Omar Jamil <omarj2898@xxxxxxxxx>
+ *          Sachin Beldona <sachinbeldona@xxxxxxxxxx>
+ *
+ * Copyright (c) 2017, NEC Europe Ltd., NEC Corporation. All rights reserved.

You should put you or your university as copyright holder depending on what applies in your case, no NEC. Please make sure that you also include original copyright when you got parts of the code from somewhere else.

+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the copyright holder nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.

Please remove this last sentence because it is in conflict with the BSD license. You probably got this from somewhere else in Unikraft. We actually need to fix this for the whole repo. This is somehow a copy&paste mistake.

+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdbool.h>

I did not see stdbool.h used. You can remove the header.
You should include stdint.h or inttypes.h because you use uint32_t.

+
+#include <uk/cpio.h>
+#include <uk/essentials.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+

Could you add here a note that your code currently support only BSD new-style cpio archive format.

+#define CPIO_MAGIC_NEWC "070701"
+#define CPIO_MAGIC_CRC "070702"
+#define FILE_TYPE_MASK 0170000
+#define DIRECTORY_BITS 040000
+#define FILE_BITS 0100000
+
+#define ALIGN_4(ptr) ((void *)ALIGN_UP((uintptr_t)(ptr), 4))
+
+#define IS_FILE_OF_TYPE(mode, bits) (((mode) & (FILE_TYPE_MASK)) == (bits))
+#define IS_FILE(mode) IS_FILE_OF_TYPE((mode), (FILE_BITS))
+#define IS_DIR(mode) IS_FILE_OF_TYPE((mode), (DIRECTORY_BITS))
+
+#define GET_MODE(hdr) ((mode_t)strhex8_to_u32((hdr)->mode))
+
+struct cpio_header {
+       char magic[6];
+       char inode_num[8];
+       char mode[8];
+       char uid[8];
+       char gid[8];
+       char nlink[8];
+       char mtime[8];
+       char filesize[8];
+       char major[8];
+       char minor[8];
+       char ref_major[8];
+       char ref_minor[8];
+       char namesize[8];
+       char chksum[8];
+};
+
+static bool valid_magic(struct cpio_header *header)
+{
+       return memcmp(header->magic, CPIO_MAGIC_NEWC, 6) == 0
+              || memcmp(header->magic, CPIO_MAGIC_CRC, 6) == 0;
+}
+
+/* Function to convert len digits of hexadecimal string loc
+ * to an integer.
+ * Returns the converted unsigned integer value on success.
+ * Returns 0 on error.
+ */
+static unsigned int snhex_to_int(size_t len, char *loc)

I would change the signature to:
static unsinged int snhex_to_uint(const char *buf, size_t count)

By using `const char *`, the compiler starts warn yu when you start modifying the input string.

+{
+       int val = 0;

val should should match with the return type: unsigned int.

+       size_t i;
+

+UK_ASSERT(buf);
...to make sure there is no programming error.

+       for (i = 0; i < len; i++) {
+               val *= 16;
+               if (*(loc + i) >= '0' && *(loc + i) <= '9')
+                       val += (*(loc + i) - '0');
+               else if (*(loc + i) >= 'A' && *(loc + i) <= 'F')
+                       val += (*(loc + i) - 'A') + 10;
+               else if (*(loc + i) >= 'a' && *(loc + i) <= 'f')
+                       val += (*(loc + i) - 'a') + 10;
+               else
+                       return 0;
+       }
+       return val;
+}
+
+static uint32_t strhex8_to_u32(char *loc)
+{
+       return snhex_to_int(8, loc);
+}

You can define this as a macro instead. Then, you keep input data types:

#define s8hex_to_u32(buf)       ((uint32_t) snhex_to_uint((buf), 8))

+
+static inline char *filename(struct cpio_header *header)

const char *? Could also a macro instead (optional).

+{
+       return (char *)header + sizeof(struct cpio_header);
+}
+
+static char *absolute_path(char *path_to_prepend, char *path)

You should declare both input parameters also as const char *, you use them read-only. I agree that the output type is char *, you are allocating memory and do not care what the caller does with it. Your function will not use it ever again. "const char *prefix", or "const char *base" would be shorter than "const char *path_to_prepend".

+{
+       int dir_slash_included =
+           *(path_to_prepend + strlen(path_to_prepend) - 1) == '/' ? 1 : 2;
+       char *abs_path = (char *)malloc(strlen(path) + strlen(path_to_prepend)
+                                       + dir_slash_included);

I would separate variable declaration from computation, this makes it a bit more readable. You could add UK_ASSERTs to make sure prefix and path are valid. Since you use path_to_prepend multiple times, I would pre-compute the value once and then use it for the rest of the function lifetime. This reduces the number of the costly calls to strlen. dir_slash_included sounds like a boolean that tells you if a slash is included or not. For you if-cases and length computation you actually want to have the negated. For readability, I would call it add_slash instead.

        int add_slash;
        size_t prefix_len;
        size_t path_len;
        size_t abs_path_len;
        char *abs_path;

        prefix_len = strlen(prefix);
        path_len = strlen(path);

In order for readability, you can access the char buffers also like arrays:

        add_slash = prefix[prefix_len - 1] == '/' ? 0 : 1;
        abs_path_len = prefix_len + add_slash + path_len + 1);

        abs_path = malloc(abs_path_len);

> +  if (abs_path == NULL)
> +          return NULL;

The following block...

+       memcpy(abs_path, path_to_prepend, strlen(path_to_prepend) > +        if 
(dir_slash_included == 2)
+               *(abs_path + strlen(path_to_prepend)) = '/';
+       memcpy(abs_path + strlen(path_to_prepend) + dir_slash_included - 1,
+              path, strlen(path));
+       *(abs_path + strlen(path) + strlen(path_to_prepend) + dir_slash_included
+         - 1) = '\0';

...would look then:

        memcpy(abs_path, prefix, prefix_len);
        if (add_slash)
                abs_path[prefix_len] = '/';
        memcpy(&abs_path[prefix_len + add_slash], path, path_len);

        abs_path[abs_path_len - 1] = '\0';

+       return abs_path;
+}
+
+static enum cpio_error read_section(struct cpio_header **header_ptr,
+                                   char *mount_loc, uintptr_t last)

I would call it "const char *dest" instead of "char *mount_loc". mount_loc implies that we mounted something in this library.

+{
+       if (strcmp(filename(*header_ptr), "TRAILER!!!") == 0) {
+               *header_ptr = NULL;
+               return CPIO_SUCCESS;
+       }
+
+       if (!valid_magic(*header_ptr)) {
+               *header_ptr = NULL;
+               return -CPIO_INVALID_HEADER;
+       }
+
+       if (mount_loc == NULL) {
+               *header_ptr = NULL;
+               return -CPIO_NO_MOUNT_LOCATION;
+       }

You can also do an UK_ASSERT(dest) for dest instead of the if-case. You catched already the invalid input case in the caller function.

+
+       struct cpio_header *header = *header_ptr;
+       char *path_from_root = absolute_path(mount_loc, filename(header));

Please move variables declarations to the head of the function. You can assign the values still here.

+
+       if (path_from_root == NULL) {
+               *header_ptr = NULL;
+               return -CPIO_NOMEM;
+       }

From this point on you have to make sure that path_from_root is free'd when you leave this function. I usually use jump labels for the return instead of free'ing it in every error case. For this purpose, I would also declare an return variable (function header):
        enum cpio_error error = CPIO_SUCCESS;

+       mode_t header_mode = GET_MODE(header);
+       uint32_t header_filesize = strhex8_to_u32(header->filesize);
+       uint32_t header_namesize = strhex8_to_u32(header->namesize);

Please do variable declarations before the code block. Assign here just the values.

+
+       if ((uintptr_t)header + sizeof(struct cpio_header) > last) {
+               *header_ptr = NULL;
+               return -CPIO_MALFORMED_FILE;Instead of this return, use the 
label from now on:

                error = -CPIO_MALFORMED_FILE;
                goto out;

+       }
+       if (IS_FILE(header_mode) && header_filesize != 0) {
+               uk_pr_debug("Creating file %s...\n", path_from_root);

Suggestion to the message: "Extracting %s...\n". I would also use the info level.

+               int fd = open(path_from_root, O_CREAT | O_RDWR);
+
+               if (fd < 0) {
+                       *header_ptr = NULL;
+                       return -CPIO_FILE_CREATE_FAILED;
                        error = -CPIO_FILE_CREATE_FAILED;
                        goto out;
+               }
+               uk_pr_debug("File %s created\n", path_from_root);

I think this debug message is not needed.

+               char *data_location = (char *)ALIGN_4(
+                   (char *)(header) + sizeof(struct cpio_header)
+                   + header_namesize);
+
+               if ((uintptr_t)data_location + header_filesize > last) {
+                       *header_ptr = NULL;
+                       return -CPIO_MALFORMED_FILE;
                        error = -CPIO_MALFORMED_FAILED;
                        goto out;
+               }
+               uint32_t bytes_to_write = header_filesize;

Can you do the uint32_t declaration in the function header (this is just code style)? Alternatively, you can declare it also

+               int bytes_written = 0;
+
+               while (bytes_to_write > 0) {
+                       if ((bytes_written =
+                                write(fd, data_location + bytes_written,
+                                      bytes_to_write))
+                           < 0) {
+                               *header_ptr = NULL;
+                               return -CPIO_FILE_WRITE_FAILED;
                                error = -CPIO_FILE_WRITE_FAILED;
                                goto out;
+                       }
+                       bytes_to_write -= bytes_written;
+               }
+               if (chmod(path_from_root, header_mode & 0777) < 0)
+                       uk_pr_info("chmod on file %s failed\n", path_from_root);

Use error level insted of info. I would change the format of the message to "Failed to chmod %s\n".

+               if (close(fd) < 0) {
+                       *header_ptr = NULL;
+                       return -CPIO_FILE_CLOSE_FAILED;
                        error = -CPIO_FILE_CLOSE_FAILED;
                        goto out;
+               }
+       } else if (IS_DIR(header_mode)) {
+               if (strcmp(".", filename(header)) != 0
+                   && mkdir(path_from_root, header_mode & 0777) < 0) {

I would split this if-case so that you can also print a debug message for creating a directory (similarly as you did with the file).

+                       *header_ptr = NULL;
+                       return -CPIO_MKDIR_FAILED;
                        error = -CPIO_MKDIR_FAILED;
                        goto out;
+               }
+       }
+       free(path_from_root);

Do the free later... see above...

+       struct cpio_header *next_header = (struct cpio_header *)ALIGN_4(
+           (char *)header + sizeof(struct cpio_header) + header_namesize);

Declare next_header in the function header.

+
+       next_header = (struct cpio_header *)ALIGN_4((char *)next_header
+                                                   + header_filesize);
+       *header_ptr = next_header; > +       return CPIO_SUCCESS;

Add the jump label here instead of returning:

out:
        free(path_from_root);
        return error;

+}
+
+enum cpio_error cpio_extract(char *mount_loc, void *memory_region, size_t len)

I would call "memory_region" just "buf" and rename "mount_loc" to "dest". Maybe call "len" "buflen" instead. Use "const char *" for dest. I would name the function ukcpio_extract() to avoid possible name clashes later. See my comments above in the header file. Don't forget to update exportsyms.uk accordingly. ;-)

+{
+       enum cpio_error error = CPIO_SUCCESS;
+       struct cpio_header *header = (struct cpio_header *)(memory_region);
+       struct cpio_header **header_ptr = &header;
+       uintptr_t end = (uintptr_t)header;
+
+       if (mount_loc == NULL)
+               return -CPIO_NO_MOUNT_LOCATION;

I would rename this error to CPIO_NODEST.
+
+       while (error == CPIO_SUCCESS && header != NULL) {

A minor style thing:
        while (!error && header)
does the same and is quicker to read. ;-)

+               error = read_section(header_ptr, mount_loc, end + len);
+               header = *header_ptr;
+       }
+       return error;
+}
diff --git a/lib/cpio/exportsyms.uk b/lib/cpio/exportsyms.uk
index b0047fa..090dd8d 100644
--- a/lib/cpio/exportsyms.uk
+++ b/lib/cpio/exportsyms.uk
@@ -1 +1 @@
-None
+cpio_extract
diff --git a/lib/cpio/include/uk/cpio.h b/lib/cpio/include/uk/cpio.h
index e69de29..86bc002 100644
--- a/lib/cpio/include/uk/cpio.h
+++ b/lib/cpio/include/uk/cpio.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+/*
+ * Authors: Robert Hrusecky <roberth@xxxxxxxxxxxxx>
+ *          Omar Jamil <omarj2898@xxxxxxxxx>
+ *          Sachin Beldona <sachinbeldona@xxxxxxxxxx>
+ *
+ * Copyright (c) 2017, NEC Europe Ltd., NEC Corporation. All rights reserved.

Do the same copyright change in this header file.

+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the copyright holder nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.

Please remove this sentence, too.

+ */
+
+#ifndef __CPIO_H__

I would prefer to use __UK_CPIO_H__ as header guard. This is to avoid possible name clashes.

+#define __CPIO_H__
+#include <uk/plat/memory.h>
Is <uk/plat/memory.h> is not needed for this header, right? You can remove it.

Since you are adding the header also for C++ code, you should declare your definitions also as extern C:

+#ifdef __cplusplus
+extern "C" {
+#endif
+

In general, I would call cpio_error, ukcpio_error. The errors, I would prefix with UKCPIO_*. Then we should be safe of name clashes.

+enum cpio_error {
+       CPIO_SUCCESS = 0,
+       CPIO_INVALID_HEADER,

I would add a comment to CPIO_INVALID_HEADER that this includes also the case of unsupported headers.

+       CPIO_FILE_CREATE_FAILED,
+       CPIO_FILE_WRITE_FAILED,
+       CPIO_FILE_CHMOD_FAILED,
+       CPIO_FILE_CLOSE_FAILED,
+       CPIO_MKDIR_FAILED,
+       CPIO_NO_MEMREGION,
+       CPIO_MOUNT_FAILED,

I think CPIO_NO_MEMREGION, and CPIO_MOUNT_FAILED is not used, please remove these errors from here. I saw them in your changes of vfscore. However, we have there errno scope and errnos should be used there instead.

+       CPIO_MALFORMED_FILE,
+       CPIO_NOMEM,
+       CPIO_NO_MOUNT_LOCATION

CPIO_NO_MOUNT_LOCATION -> CPIO_NODEST

+};
+
+enum cpio_error cpio_extract(char *loc, void *mem, size_t len);

Please use the same funtion name and argument names as used in the function declaration. I would also call it ukcpio_extract, instead.

Btw, is there a reason why you returned negative error values? I think this is not needed since you have return type "enum cpio_error". Negative errors are often used to keep the positive number space to return something meaningful on success (e.g., number of written bytes on write()). It makes sense to have the success case as 0. This makes it easy to catch error with, for instance, if-cases:

        if (error) {}

or if there was no error:

        if (!error) {}

+#ifdef __cplusplus
+}
+#endif
+#endif /*__CPIO_H__*/


_______________________________________________
Minios-devel mailing list
Minios-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/minios-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.