2.5. 数组 - republic

在之前的 1.5.1. 数组简介中，我们介绍了静态声明的一维数组，并讨论了将数组传递给函数的语义。在 2.4. 动态内存分配的章节中，我们介绍了动态分配的一维数组并讨论了将它们传递给函数的语义。

在这一章节中, 我们会更深入地了解 C 中的数组，更详细地描述静态和动态分配的数组，并讨论二维数组。

2.5.1. 一维数组

静态分配

在进入新内容之前，我们通过一个示例简要总结静态数组。有关静态声明的一维数组的更多详细信息，请参阅1.5.1. 数组简介。

静态声明的数组分配在栈上(对于局部变量)或内存的数据段(对于全局变量, 堆上)。程序员可以通过指定数组变量的类型(存储在每个索引处的类型)及其总容量(元素数量)来声明数组变量。

当将数组传递给函数时，C 会将首地址(基址)的值复制到函数参数。也就是说，形参和实参都引用相同的内存位置 ——形参指针指向内存中实参的数组元素。因此，通过数组形参修改存储在数组中的值会修改存储在实参数组中的值。

**形参(parameter)** vs **实参(argument)**

形参(Parameter)是在函数声明中的变量。
实参(Argument)是传递给函数的变量的实际值。
What's the difference between an argument and a parameter?

以下是静态数组声明和使用的一些示例：

// declare arrays specifying their type and total capacity
float averages[30];   // array of float, 30 elements
char  name[20];       // array of char, 20 elements
int i;

// access array elements
for (i = 0; i < 10; i++) {
    averages[i] = 0.0 + i;
    name[i] = 'a' + i;
}
name[10] = '\0';    // name is being used for storing a C-style string

// prints: 3 d abcdefghij
printf("%g %c %s\n", averages[3], name[3], name);

strcpy(name, "Hello");
printf("%s\n", name);  // prints: Hello

动态分配

在本章的动态内存分配部分，我们介绍了动态分配的一维数组，包括它们的访问语法以及将动态分配的数组传递给函数的语法和语义。在这里，我们通过一个示例对该信息进行简短回顾。

调用该malloc函数会在运行时在堆上动态分配一个数组。分配的堆空间的地址可以分配给全局或局部指针变量，然后该指针变量指向数组的第一个元素(首地址)。要动态分配空间，给malloc传递为数组分配的总字节数(使用sizeof运算符获取特定类型的大小)。一次malloc调用即可在堆上分配所请求大小的连续空间块。例如：

// declare a pointer variable to point to allocated heap space
int    *p_array;
double *d_array;

// call malloc to allocate the appropriate number of bytes for the array

p_array = malloc(sizeof(int) * 50);      // allocate 50 ints
d_array = malloc(sizeof(double) * 100);  // allocate 100 doubles

// always CHECK RETURN VALUE of functions and HANDLE ERROR return values
if ( (p_array == NULL) || (d_array == NULL) ) {
    printf("ERROR: malloc failed!\n");
    exit(1);
}

// use [] notation to access array elements
for (i = 0; i < 50; i++) {
    p_array[i] = 0;
    d_array[i] = 0.0;
}

// free heap space when done using it
free(p_array);
p_array = NULL;

free(d_array);
d_array = NULL;

数组内存布局

无论数组是静态声明的还是通过单次调用动态分配的malloc，数组元素都表示连续的内存位置(地址)：

 array [0]:  base address
 array [1]:  next address
 array [2]:  next address
   ...            ...
 array [99]: last address

元素的位置i位于距i数组基地址的偏移处。第 i 个元素的确切地址取决于数组中存储的类型的字节数。例如，考虑以下数组声明：

int  iarray[6];  // an array of six ints, each of which is four bytes
char carray[4];  // an array of four chars, each of which is one byte

它们各个数组元素的地址可能如下所示：

 addr   element
 ----   -------
 1230:  iarray[0]
 1234:  iarray[1]
 1238:  iarray[2]
 1242:  iarray[3]
 1246:  iarray[4]
 1250:  iarray[5]
     ...
 1280:  carray[0]
 1281:  carray[1]
 1282:  carray[2]
 1283:  carray[3]

在此示例中，1230是iarray的基地址和1280是carray的基地址。请注意，每个数组的各个元素都分配到连续的内存地址：的每个元素iarray存储一个 4 字节int值，因此其元素地址相差 4，而的每个元素carray存储一个 1 字节char值，因此其地址相差 1。无法保证局部变量集被分配到栈上的连续内存位置(因此，iarray的结尾和carray的开头之间的地址可能存在间隙，如此例所示。)

note

定义数组的总容量时通常使用常量，而不是使用文字数值。常量是 C 文字值的别名，用于代替文字以使代码更易于阅读并更容易更新。请参阅 C Constants(常量) 以了解有关定义和使用 C 常量的更多信息。
下面是定义和使用常量 (N) 作为数组维数的示例：
#define N 20 
int main(void) { 
int array[N]; // an array of 20 ints 
int *d_arr, i; // dynamically alloc array of 20 ints 
d_arr = malloc(sizeof(int)*N); 
if(d_arr == NULL) {
exit(1); 
} 
for(i=0; i < N; i++) {
array[i] = i; 
d_arr[i] = i*2; 
} 
... 
}

2.5.2. 二维数组

C 支持多维数组，但我们将多维数组的讨论限制为二维 (2D) 数组，因为一维和二维数组是 C 程序员最常用的。

静态分配的二维数组

要静态声明多维数组变量，请指定每个维度的大小。例如：

int   matrix[50][100];
short little[10][10];

这里，matrix是一个50 行100 列int 类型的二维数组，little是一个10 行 10 列short类型的二维数组。

要访问单个元素，请指定行索引和列索引：

int   val;
short num;

val = matrix[3][7];  // get int value in row 3, column 7 of matrix
num = little[8][4];  // get short value in row 8, column 4 of little

图 1将 2D 数组展示为整数值矩阵，其中 2D 数组中的特定元素通过行和列索引值进行索引。
Accessing matrix[2][3] is like indexing into a grid at row 2 and column 3.

图 1. 用矩阵表示的二维数组。访问矩阵 [2][3] 就像在第 2 行和第 3 列处对网格进行索引。

程序通常通过嵌套循环迭代来访问二维数组的元素。例如，以下嵌套循环将matrix所有元素初始化为 0：

int i, j;

for (i = 0; i < 50; i++) {  // for each row i
    for (j = 0; j < 100; j++) { // iterate over each column element in row i
        matrix[i][j] = 0;
    }
}

二维数组参数

将一维数组参数传递给函数的相同规则也适用于传递二维数组参数：形参(parameter)获取二维数组基地址的值 (&arr[0][0])。换句话说，形参(parameter)指向实参(argument)的数组元素，因此函数可以更改存储在传递的数组中的值。

对于多维数组参数，您必须指示该参数(parameter)是多维数组，但您可以不指定第一个维度的大小(为了良好的通用设计)。必须完全指定其他维度的大小，以便编译器可以生成数组中的正确偏移量。这是一个 2D 示例：

// a C constant definition: COLS is defined to be the value 100
#define COLS  (100)

/*
 * init_matrix: initializes the passed matrix elements to the
 *              product of their index values
 *   m: a 2D array (the column dimension must be 100)
 *   rows: the number of rows in the matrix
 *   return: does not return a value
 */
void init_matrix(int m[][COLS], int rows) {
    int i, j;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < COLS; j++) {
            m[i][j] = i*j;
        }
    }
}

int main(void) {
    int matrix[50][COLS];
    int bigger[90][COLS];

    init_matrix(matrix, 50);
    init_matrix(bigger, 90);
    ...

matrix和bigger数组都可以作为参数传递给 init_matrix函数，因为它们具有与参数定义相同的列维度。

[!NOTE] 必须在 2D 数组的参数定义中指定列维度，以便编译器可以计算从 2D 数组基地址到特定行元素开头的偏移量。偏移计算根据内存中二维数组的布局进行。

二维数组内存布局

静态分配的二维数组在内存中按行优先顺序排列，这意味着第 0 行的所有元素排在前面，然后是第 1 行的所有元素，依此类推。例如，给出以下二维整数数组的声明：

int arr[3][4];  // int array with 3 rows and 4 columns

它在内存中的布局可能如图2所示。

Declaring an array as "int arr[3][4]" yields three rows, each of which has four elements. Row 0 consists of arr[0][0], arr[0][1], arr[0][2], and arr[0][3]. Row 1 consists of arr[1][0], arr[1][1], etc.

图 2. 按行优先顺序排列的二维数组的布局。

请注意，所有数组元素都分配到连续的内存地址。也就是说，二维数组的基地址是[0][0] 元素 (&arr[0][0] )的内存地址，后续元素按行优先顺序连续存储(例如，整个第 1 行紧接着整个第 2 行，以此类推)。

动态分配的二维数组

动态分配的二维数组可以通过两种方式分配。对于 N x M 2D 阵列，可以：

每次malloc调用，会分配一大块堆空间来存储所有 N x M 数组元素。
多次调用malloc，分配数组的数组。首先，分配一个由 N 个 指向元素类型的指针组成的 1D 数组，并为 2D 数组中的每一行分配一个 1D 指针数组。然后，分配 N 个 大小为 M 的一维数组来存储二维数组中每行的列值集。将这 N 个数组(长度为M)中的第一个元素的地址(首地址)分配给第一个长度为N的数组。

变量声明、分配代码和数组元素访问语法根据程序员选择使用这两种方法中的哪一种而有所不同。

方法1:内存高效分配

在此方法中，一次调用allocate 即可分配存储_N_ x M_值数组malloc所需的字节总数。此方法的优点是内存效率更高，因为所有_N x M 元素的整个空间将立即分配在连续的内存位置中。

调用malloc返回分配空间的起始地址(数组的基地址)，该地址(如一维数组)应存储在指针变量中。事实上，使用此方法分配一维或二维数组之间没有语义差异：调用malloc在内存中返回堆上连续块分配的请求字节数的起始地址。因为使用此方法分配 2D 数组看起来就像分配 1D 数组一样，所以程序员必须在该单个堆内存空间块的顶部显式映射 2D 行和列索引(编译器没有行或列的隐式概念，因此无法将双索引语法解释到这个 malloc 分配的空间中）。

下面是使用方法 1 动态分配 2D 数组的 C 代码片段示例：

#define N 3
#define M 4

int main(void) {
    int *two_d_array;    // the type is a pointer to an int (the element type)

    // allocate in a single malloc of N x M int-sized elements:
    two_d_array = malloc(sizeof(int) * N * M);

    if (two_d_array == NULL) {
        printf("ERROR: malloc failed!\n");
        exit(1);
    }

    ...

图 3显示了使用此方法分配 2D 数组的示例，并说明了调用malloc.

We can allocate an array with malloc(sizeof(int) * (3*4)) and store the base address in a stack pointer variable. Because malloc returns a contiguous chunk of memory, we can treat the memory as a collection of rows and columns in row-major order like a statically allocated array.

图 3. 通过一次调用 malloc 分配 2D 数组的结果。

与一维动态分配数组一样，二维数组的指针变量也是在堆上分配的。然后为该指针分配调用malloc返回的值，该值表示堆内存中 N x M 个 int类型的连续块的基地址。

由于此方法为 2D 数组使用单个 malloc 空间，因此内存分配尽可能高效(malloc整个 2D 数组只需要一次调用)。这是访问内存的更有效方法，因为所有元素都位于连续内存中，每次访问仅需要来自指针变量的单级间接访问。

但是，C 编译器不知道使用此方法进行 2D 或 1D 数组分配之间的区别。因此，使用此方法分配 2D 数组时，不能使用静态声明的 2D 数组的双索引语法([i][j]) 。相反，程序员必须使用行和列索引值的函数([i*M + j] M其中是列维度)显式计算到堆内存的连续块中的偏移量。

下面是程序员如何构造代码来初始化 2D 数组的所有元素的示例：

// access using [] notation:
//   cannot use [i][j] syntax because the compiler has no idea where the
//   next row starts within this chunk of heap space, so the programmer
//   must explicitly add a function of row and column index values
//   (i*M+j) to map their 2D view of the space into the 1D chunk of memory
for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        two_d_array[i*M + j] = 0;
    }
}

单个malloc和函数参数

通过单个malloc分配的int类型数组的基址是一个指向int的指针，因此可以将其传递给带有(int *)参数的函数。此外，该函数必须传递行和列维度，以便它可以正确计算二维数组的偏移量。例如：

/*
 * initialize all elements in a 2D array to 0
 *  arr: the array
 *  rows: number of rows
 *  cols: number of columns
 */
void init2D(int *arr, int rows, int cols) {
    int i, j;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            arr[i*cols + j] = 0;
        }
    }
}

int main(void) {
    int *array;
    array = malloc(sizeof(int) * N * M);
    if (array != NULL) {
        init2D(array, N, M);
    }
    ...

方法 2:程序员友好的方法

动态分配 2D 数组的第二种方法将数组存储为 N 个 1D 数组的数组(每行一个 1D 数组)。它需要 N+1 次 调用 malloc：一次malloc用于行数组的数组，一次malloc用于 N 行列数组中的每一个。因此，行内的 元素位置是连续的，但元素在 2D 数组的行之间不连续。分配和元素访问不如方法 1 高效，并且变量的类型定义可能有点混乱。但是，使用此方法，程序员可以使用双索引语法来访问 2D 数组的各个元素(第一个索引是行数组的索引，第二个索引是该行内列元素数组的索引) 。

以下是使用方法 2 分配 2D 数组的示例(为了便于阅读，删除了错误检测和处理代码)：

// the 2D array variable is declared to be `int **` (a pointer to an int *)
// a dynamically allocated array of dynamically allocated int arrays
// (a pointer to pointers to ints)
int **two_d_array;
int i;

// allocate an array of N pointers to ints
// malloc returns the address of this array (a pointer to (int *)'s)
two_d_array = malloc(sizeof(int *) * N);

// for each row, malloc space for its column elements and add it to
// the array of arrays
for (i = 0; i < N; i++) {
// malloc space for row i's M column elements
    two_d_array[i] = malloc(sizeof(int) * M);
}

在此示例中，请注意传递给malloc的调用的变量类型和大小。为了引用动态分配的二维数组，程序员声明一个int **类型的变量(two_d_array)，该变量将存储动态分配的元素值int *数组的地址。two_d_array中的每个元素存储动态分配 int类型的数组的地址(two_d_array[i]的类型是int *)。

图 4显示了上述示例对进行 N+1 次 调用malloc后内存的情况。

two_d_array is a stack variable that points to a dynamically allocated array of pointers. Each of those pointers points to a 1D array of integers.

图 4. 使用 N+1 malloc 调用分配 2D 数组后的内存排列。

请注意，使用此方法时，只有作为单个调用的一部分分配的元素malloc在内存中是连续的。也就是说，每行内的元素是连续的，但不同行(甚至相邻行)的元素不是连续的。

分配后，可以使用双索引表示法访问二维数组的各个元素。第一个索引指定外部指针数组中的元素int *（哪一行），第二个索引指定内部数组中的元素int(行中的哪一列)。

int i, j;

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        two_d_array[i][j] = 0;
    }
}

要了解双索引的计算方式，请考虑表达式以下部分的类型和值：

   two_d_array: an array of int pointers, it stores the base address of an
             array of (int *) values. Its type is int** (a pointer to int *).

two_d_array[i]: the ith index into the array of arrays, it stores an (int *)
             value that represents the base address of an array of (int)
             values.  Its type is int*.

 two_d_array[i][j]: the jth element pointed to by the ith element of the array of
                 arrays, it stores an int value (the value in row i, column j
                 of the 2D array).  Its type is int.

数组的数组和函数参数

数组实参的类型是int **（指向 int 类型指针的指针)，并且函数形参与其参数(数组实参)的类型匹配。此外，行和列的大小应传递给函数。因为这是与方法 1 不同的类型，所以两种数组类型不能使用公共函数(它们不是相同的 C 类型)。

下面是一个示例函数，它采用方法 2(数组的数组)二维数组作为参数：

/*
 * initialize a 2D array
 * arr: the array
 * rows: number of rows
 * cols: number of columns
 */
void init2D_Method2(int **arr, int rows, int cols) {
    int i,j;

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            arr[i][j] = 0;
        }
    }
}

/*
 * main: example of calling init2D_Method2
 */
int main(void) {
    int **two_d_array;

    // some code to allocate the row array and multiple col arrays
    // ...

    init2D_Method2(two_d_array, N, M);
    ...

这里，函数实现可以使用双索引语法。与静态声明的二维数组不同，行和列的维度都需要作为参数传递：参数rows指定最外层数组(行数组的数组)的边界，参数cols指定内部数组(数组列中每一行的值)。