C/C++: A Segmentation Fault Caused by a Large Array on the Stack

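  I was recently writing unit tests for a feature. The tests for the first submodule ran fine, but once I added the tests for the second submodule, the program crashed immediately with a Segmentation fault. Debugging the core dump with gdb showed the crash at a print statement, which made me suspect an illegal memory access. I rechecked the code involved, but every variable was a local variable with a fixed, compile-time size (nothing was allocated dynamically), so a Segmentation fault seemed impossible. I then ran the program under valgrind, which pinpointed the problem right away: a large array defined inside main was overflowing the stack. Because the crash happened not on the line that defines the array but at a print statement far away from it, I did not spot the cause at first.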

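The figure shows the memory layout of a C/C++ program; here is a brief look at the stack. The stack stores the data associated with function calls and is allocated and managed by the compiler. On common platforms such as x86-64 it grows downward, toward lower addresses. It holds: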

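  • Local variables: non-static local variables defined inside a function.
  • Function arguments: the arguments passed to a function.
  • Call context: the return address, saved register state, and similar information for each function call.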

On Linux the default stack size is typically 8 MB; it can be viewed or changed with the ulimit -s command.

Below are three nearly identical programs. Will they crash when run?

// Example 1
#include <iostream>

int main()
{
    char buf[1024*1024*100];

    return 0;
}


// Example 2
#include <iostream>

int main()
{
    std::cout << "Hello World" << std::endl;
    char buf[1024*1024*100];

    return 0;
}


// Example 3
#include <iostream>

int main()
{
    char buf[1024*1024*100];
    std::cout << "Hello World" << std::endl;

    return 0;
}

With gcc 4.8.5, Example 1 exits normally, while Examples 2 and 3 both crash with a Segmentation fault.

Running valgrind's memory check on Example 2:

[user1@develop tmp]$ valgrind --tool=memcheck ./a.out 
==23183== Memcheck, a memory error detector
==23183== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==23183== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==23183== Command: ./a.out
==23183==
==23183== Warning: client switching stacks? SP change: 0x1fff000220 --> 0x1ff8c00218
==23183== to suppress, use: --max-stackframe=104857608 or greater
==23183== Invalid write of size 8
==23183== at 0x400885: main (test.cpp:5)
==23183== Address 0x1ff8c00218 is on thread 1's stack
==23183==
==23183==
==23183== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==23183== Access not within mapped region at address 0x1FF8C00218
==23183== at 0x400885: main (test.cpp:5)
==23183== If you believe this happened as a result of a stack
==23183== overflow in your program's main thread (unlikely but
==23183== possible), you can try to increase the size of the
==23183== main thread stack using the --main-stacksize= flag.
==23183== The main thread stack size used in this run was 8388608.
==23183== Invalid write of size 8
==23183== at 0x4A24710: _vgnU_freeres (vg_preloaded.c:59)
==23183== Address 0x1ff8c00210 is on thread 1's stack
==23183==
==23183==
==23183== Process terminating with default action of signal 11 (SIGSEGV)
==23183== Access not within mapped region at address 0x1FF8C00210
==23183== at 0x4A24710: _vgnU_freeres (vg_preloaded.c:59)
==23183== If you believe this happened as a result of a stack
==23183== overflow in your program's main thread (unlikely but
==23183== possible), you can try to increase the size of the
==23183== main thread stack using the --main-stacksize= flag.
==23183== The main thread stack size used in this run was 8388608.
==23183==
==23183== HEAP SUMMARY:
==23183== in use at exit: 0 bytes in 0 blocks
==23183== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==23183==
==23183== All heap blocks were freed -- no leaks are possible
==23183==
==23183== For lists of detected and suppressed errors, rerun with: -s
==23183== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault

Running valgrind's memory check on Example 3:

[user1@develop tmp]$ valgrind --tool=memcheck ./a.out 
==23334== Memcheck, a memory error detector
==23334== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==23334== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==23334== Command: ./a.out
==23334==
==23334== Warning: client switching stacks? SP change: 0x1fff000220 --> 0x1ff8c00218
==23334== to suppress, use: --max-stackframe=104857608 or greater
==23334== Invalid write of size 8
==23334== at 0x400885: main (test.cpp:6)
==23334== Address 0x1ff8c00218 is on thread 1's stack
==23334==
==23334==
==23334== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==23334== Access not within mapped region at address 0x1FF8C00218
==23334== at 0x400885: main (test.cpp:6)
==23334== If you believe this happened as a result of a stack
==23334== overflow in your program's main thread (unlikely but
==23334== possible), you can try to increase the size of the
==23334== main thread stack using the --main-stacksize= flag.
==23334== The main thread stack size used in this run was 8388608.
==23334== Invalid write of size 8
==23334== at 0x4A24710: _vgnU_freeres (vg_preloaded.c:59)
==23334== Address 0x1ff8c00210 is on thread 1's stack
==23334==
==23334==
==23334== Process terminating with default action of signal 11 (SIGSEGV)
==23334== Access not within mapped region at address 0x1FF8C00210
==23334== at 0x4A24710: _vgnU_freeres (vg_preloaded.c:59)
==23334== If you believe this happened as a result of a stack
==23334== overflow in your program's main thread (unlikely but
==23334== possible), you can try to increase the size of the
==23334== main thread stack using the --main-stacksize= flag.
==23334== The main thread stack size used in this run was 8388608.
==23334==
==23334== HEAP SUMMARY:
==23334== in use at exit: 0 bytes in 0 blocks
==23334== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==23334==
==23334== All heap blocks were freed -- no leaks are possible
==23334==
==23334== For lists of detected and suppressed errors, rerun with: -s
==23334== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault

Analysis

==23334== Warning: client switching stacks?  SP change: 0x1fff000220 --> 0x1ff8c00218
==23334== to suppress, use: --max-stackframe=104857608 or greater

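Here Valgrind has noticed a very large jump in SP (the stack pointer register). 0x1fff000220 is the stack pointer value on entry to main, that is, where *__libc_start_main* (the function that calls our main under GCC) left it. 0x1ff8c00218 is where the stack pointer ends up after the compiler reserves space for main's local variables. The difference is 104857608 bytes (the 100 MiB array plus 8 bytes), far more than the default 8 MB stack. Moving the stack pointer does not by itself touch memory, which is presumably why Example 1, which never uses the reserved space, exits normally. When execution reaches the print statement, however, the program writes to address 0x1ff8c00218, which is an invalid, unmapped address well beyond the 8 MB region the operating system set aside for the stack. That illegal access makes the operating system send SIGSEGV to the program, and it crashes.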

Note: when tested with clang 17.0.0, Example 1 also crashes with a segmentation fault.

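Conclusion: I used to watch only for overly deep recursion as a cause of stack overflow and overlooked large arrays allocated on the stack. When a large array is needed, allocating it dynamically or simply using std::vector is the more reliable choice.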