Summary of command line parameter handling in C/C++
Some time ago, when I was going through the Linux kernel code, I came across the way the kernel handles module parameters (moduleparam), and I found it quite ingenious. It made me curious about how command-line arguments in C could be better handled. The code used in this article can be found here aparsing. The code can be compiled and run on Windows, Linux, and Mac OS X. Detailed compilation instructions can be found in the README.md file.
getenv
The standard library provides us with a function getenv
. Literally, this function is used to retrieve environment variables. So as long as we set the required environment variables in advance and retrieve them in the program, we indirectly pass the parameters to the program. Let's take a look at the following code:
The getenv
function is declared on line 4. It takes the name of the variable you want to retrieve as input and returns the value of that variable. If the variable is not found, it returns 0. Lines 10 and 15 are used to retrieve the values of two environment variables respectively. If the variables are valid, their values are printed. It is worth noting that getenv
always returns a string, so the user needs to manually convert it to the desired data type. Therefore, it may not be very convenient to use. Compile and run:
Under Windows:
Under Linux:
Output:
getopt
Linux provides us with a set of functions getopt, getopt_long, getopt_long_only
to handle command-line arguments. The declarations of these three functions are as follows:
getopt
can only handle short options (i.e., single character options), while getopt_long
and getopt_long_only
can handle long options. For a detailed explanation of the functions, you can refer to the manual in Linux. Now, let's illustrate the usage of getopt
and getopt_long
through examples.
It should be noted that these functions are not available on Windows, so I found a source code that can be compiled on Windows and made some minor modifications. The code can be found here.
Let's focus on analyzing the usage of getopt_long
. The first three parameters of getopt_long
are the same as getopt
: the number of command-line arguments argc
, the array of command-line arguments argv
, and the exact format of short parameters optstring
. The format of optstring
consists of individual short parameter characters, followed by a colon :
to indicate a parameter is required, and two colons ::
to indicate an optional parameter. For example, in line 19, the declaration of the short parameters is as follows: the b
parameter does not take any extra arguments, the a
parameter requires an extra argument, and the c
parameter takes an optional argument.
The last two arguments of getopt_long
are used to handle long options, and the structure of option
is:
struct option {
const char *name; // Long parameter name
int has_arg; // Whether it takes an additional argument
int *flag; // Specify how to return the function call result
int val; // The returned value
};
name
can still be set to a single character length.
has_arg
can be set to no_argument, required_argument, optional_argument
, which respectively indicate no argument, required argument, optional argument.
flag
and val
are used together. If flag = NULL
, getopt_long
will directly return val
. Otherwise, if flag
is a valid pointer, getopt_long
will perform an operation similar to *flag = val
, which sets the variable pointed to by flag
to the value of val
.
If getopt_long
finds a match for a short option, it will return the character value of that option. If it finds a match for a long option, it will return val
(flag = NULL
) or return 0
(flag != NULL; *flag = val;
); if it encounters a non-option character, it will return ?
; if all options have been processed, it will return -1
.
By utilizing the feature of return values, we can achieve the same effect of having the same meaning for long options and short options. For example, the first parameter of long_options
is set to add
with val
value as the character 'a'
. In this case, when checking the return value, both --add
and -a
will enter the same processing branch and be treated as having the same meaning.
The last piece of the jigsaw puzzle is the usage of optind
and optarg
. optind
represents the position of the next argument to be processed in argv
, while optarg
points to the additional argument string.
Compile and run the code:
$ .\getopt_test -a 1 -b -c4 --add 2 --verbose --verbose=3 -123 -e --e
option a with value '1'
option b
option c with value '4'
option a with value '2'
option verbose
option verbose with arg 3
option 1
option 2
option 3
.\getopt_test: invalid option -- e
.\getopt_test: unrecognized option `--e'
The meanings of -a
and --add
are the same. For short parameters, the optional parameter follows immediately after, for example -c4
. For long parameters, the optional parameter needs to be specified with an equal sign, for example --verbose=3
.
mobuleparam
Okay, finally we have come to the method that triggered this article in the first place. The Linux kernel uses a clever technique to pass parameters to kernel modules, and that technique is called moduleparam
. Here, I'll explain briefly how moduleparam
works in the Linux kernel, but for a more detailed explanation, you can refer to the code. Although I have borrowed some handling techniques from moduleparam
, there are some differences between my method, which I'll refer to as small moduleparam
, and the one used in the Linux kernel, which will still be called moduleparam
.
First, let's take a look at the usage of moduleparam
, which is declared inside the module:
Then enter parameters when loading the module:
The variable enable_debug
has been correctly set to 1
, making it convenient to use. There is also very little additional code that needs to be added, and the code can be written in a concise and elegant manner. There is no need to write multiple loop conditions like getenv
and getopt
, and it also includes built-in type conversion. So, when I see it, I think it would be even better if we can use this method to handle command-line arguments.
Now let's take a look at the core implementation of moduleparam
:
module_param
is a macro that actually establishes a structure called kernel_param
that reflects the incoming variable. This structure, which contains enough information to access and set the variable, is stored in a section called __param
, as shown in lines 20-24. Once the structure is saved, the kernel will locate the position and number of structures in the elf file's section __param
when loading the module. The parameters are then individually set based on their names and param_set_fn
. The method of locating a specific named section is platform-specific, and the Linux kernel implements it by processing the elf file. In Linux, the readelf
command can be used to view information about elf files. If interested, you can refer to the readelf
help information.
The above mentioned method of the Linux kernel is platform-specific, but I am looking for a platform-independent way to handle parameters. So we need to make some changes to the original moduleparam
approach by removing the __section__ ("__param")
declaration, as we don't want to read the section
of the elf file in a complicated way. Let's take a look at the modified usage below:
To preserve the structure of each reflection, I have added a macro init_module_param(num)
to declare the space for storing the structure. num
is the number of parameters, and if the actual number of parameters declared exceeds num
, the program will trigger an assertion error. The declaration of module_param
is slightly different from the original, removing the last parameter that represents access permissions, so no access control is done. In addition, a macro module_param_bool
has been added to handle variables that represent bool
. This is not necessary in the Linux version, as it uses the gcc built-in function __builtin_types_compatible_p
to determine the variable type. Unfortunately, MSVC does not have this function, so I had to remove this functionality and add a macro instead. module_param_array
and module_param_string
are used to handle arrays and strings, which were already present in the original version.
After declaring the parameters, it's time to process the incoming arguments. Use the macro parse_params
and pass in argc, argv
. The third parameter is a pointer to the callback function for handling unknown parameters. You can pass NULL
, which will interrupt the processing of positional arguments and return an error code.
Compile and run code:
.\moduleparam_test.exe error=0 test=101 btest=1 latest=1,2,3 strtest=\"Hello World!\"
Parsing ARGS: error=0 test=101 btest=1 latest=1,2,3 strtest="Hello World!"
find unknown param: error
test = 101
btest = Y
latest = 1,2,3
strtest = Hello World!
You can see that numerical values, arrays, and strings can all be read and converted correctly. If there are parameters that cannot be converted, an error code will be returned and relevant information will be printed. We can easily add a few lines of code to handle parameter reading and conversion, which makes it very elegant to use. For more detailed implementation, you can directly refer to the code here.
Summary
This time we summarized three methods for handling command line arguments in C/C++: getenv
, getopt
, and moduleparam
. Each method has its own characteristics, and you can choose the appropriate method based on the actual needs in the future.
getenv
is a native multi-platform function that can be used directly. However, it is too primitive and utilizes environment variables, which can cause some pollution to the environment. It is recommended to clear unnecessary environment variables before each use to avoid contamination from previous settings.
getopt
is natively supported on Linux platforms, but not on Windows. Therefore, it requires code implementation to achieve cross-platform usage. The parameter passing follows the standard command line argument format of Linux and supports optional parameters. However, it can be slightly cumbersome to use, often requiring loops and conditional statements to handle different parameters, and it is not particularly friendly to numerical types of parameters.
moduleparam
is a command line argument processing tool inspired by themoduleparam
implementation in the Linux kernel. It supports cross-platform usage and is easy to use. It can perform type conversion on different types of parameters. The drawback is that each parameter requires a corresponding variable for storage.
Original: https://wiki.disenone.site/en
This post is protected by CC BY-NC-SA 4.0 agreement, should be reproduced with attribution.
Visitors. Total Visits. Page Visits.
This post is translated using ChatGPT, please feedback if any omissions.