CISS-150 Project 3 (10 points)
(Updated February 23, 2024)
Overview
This project will introduce you to the many layers of memory management. It is for the layperson and is not intended to be exhaustive.
Memory is a commodity that needs to be managed efficiently. To understand why, we will investigate how the C programming environment organizes its memory. Ubuntu Linux will be the basis of this project.
Learning outcomes
- Planning and design.
- Enhancing existing virtualization/OS knowledge.
- Operating in a new programming language.
- Understanding basic Linux memory management principles.
Setting Up the Linux System
Begin by installing the C build environment. Run the following:
student@student-vm:~$ sudo apt update
student@student-vm:~$ sudo apt install build-essential
Once the install is complete, the C compiler is available. We will need it to compile a C program in a later step.
Several additional packages are installed during this step, but the only one we need is the compiler.
Memory Layout
The operating system executes a program by allocating an amount of memory and loading the program into that memory. Linux uses the ELF (Executable and Linkable Format) file layout to define the components of a program and how they should be loaded into memory. You can use the following command to read more about ELF.
student@student-vm:~$ man elf
About 400 lines into the man page, you will see details regarding the sections (sometimes called segments). We are specifically interested in the ones noted in the diagram below.
Highest Memory Location
           |-----------|
           |   argv    | Command line arguments and
           |   env     | system environment variables.
           |-----------|
           |   Stack   | Functions and automatic variables
           |...........|
           |     |     | The stack "grows" toward lower memory
           |     |     | as space is allocated by pushing values
           |     v     | and subtracting space from RSP for local
           |           | variables.
           |           |
           |     ^     |
           |     |     | The heap "grows" toward higher memory
           |     |     | as space is allocated by tools like malloc.
           |...........|
           |   Heap    | Memory allocated at runtime (malloc)
  _end --> |-----------| <-- program break
           |   .bss    | Uninitialized data - globals and static. (BSS)
_edata --> |-----------|
           |   .data   | Initialized data - globals and static. (DS)
_etext --> |-----------|
           |   .text   | Code (TEXT, Code Segment)
 _init --> |-----------|
Lowest Memory Location
As we will discover, programs can be executed anywhere in physical memory, so part of the design is to make sections of the program relocatable. When viewing the details of the memory of an ELF file, it is essential to realize that the addresses shown are offsets.
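To make the idea of offsets concrete: when an executable is built as position-independent (an assumption here, though it is the usual default on current Ubuntu), the runtime address of a symbol is the load base plus the offset shown by the tools. The sketch below is only an illustration and not part of the project code. The helper names parse_map_start, load_base, and offset_from_base are hypothetical, and it assumes the first line of /proc/self/maps describes the executable itself, which is usual on Linux but not guaranteed.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the start address from one line of
 * /proc/self/maps, e.g. "55d0c0a00000-55d0c0a01000 r--p ...". */
uintptr_t parse_map_start(const char *line) {
    /* strtoull stops at the '-' that separates start and end. */
    return (uintptr_t)strtoull(line, NULL, 16);
}

/* Hypothetical helper: ASSUMES the first mapping listed is the
 * executable and treats its start address as the load base. */
uintptr_t load_base(void) {
    char line[256];
    uintptr_t base = 0;
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps != NULL) {
        if (fgets(line, sizeof line, maps) != NULL)
            base = parse_map_start(line);
        fclose(maps);
    }
    return base;  /* 0 means the file could not be read */
}

/* Offset of a runtime address from the assumed load base. */
uintptr_t offset_from_base(const void *addr) {
    return (uintptr_t)addr - load_base();
}
```

Under those assumptions, passing the address of a global variable to offset_from_base() from main() should give a value comparable to the offset the tools report for that variable.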
Even though the executable format contains many possible segments, C generally uses only a few. The compiler can add in many other segments, but the above layout is enough.
Section Name | Purpose |
---|---|
.text (Text) | The text segment, text section, .text, code segment. It goes by many names, and this is where the bulk of the program code lives. The boundary is typically marked by _init and _etext. |
.data (Data) | The data segment, DS, .data, etc. This also goes by many names, and this is where pre-initialized global and static data live. The boundary is marked by _edata. |
.bss (BSS) | Rarely known by names other than BSS or .bss. This is where uninitialized global and static data live. These are initialized to zero at runtime. The boundary is marked by _end. |
Stack | This is where the runtime stack lives for function call return addresses and automatic (local in function) variables. As the stack’s memory is consumed, it grows to lower memory toward the heap. |
Heap | This is where runtime memory requests are satisfied. As the heap’s memory is consumed, it grows to higher memory toward the stack. This also affects the program break (see sbrk()), which identifies the current upper bound of the heap. The heap is located after the end of the last data segment, typically .bss. The value of |
The use of these sections is system-dependent. Although the documentation describes their intended use, a given environment may diverge from it.
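The program break mentioned in the Heap row can be observed directly. The following is a minimal sketch, assuming a Linux system with glibc where sbrk() is available; the helper name break_moved_by is hypothetical. It raises the break by a given number of bytes, measures how far it moved, and then lowers it again.

```c
#define _DEFAULT_SOURCE   /* ask glibc to declare sbrk() in strict -std modes */
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: raise the program break by `bytes`, report how
 * far it actually moved, then restore it. Returns -1 if sbrk() fails. */
long break_moved_by(intptr_t bytes) {
    char *before = sbrk(0);          /* current break */
    if (sbrk(bytes) == (void *)-1)   /* request more heap space */
        return -1;
    char *after = sbrk(0);           /* break after the request */
    sbrk(-bytes);                    /* give the space back */
    return (long)(after - before);
}
```

Calling break_moved_by(4096) is expected to return 4096 when the request succeeds. Real programs normally leave the break to malloc(); moving it by hand here is only for demonstration.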
The C Code
Using the gedit command, create the following file and save it as cmem.c. (Remember that you can open this page in Ubuntu and then copy and paste it from within your VM.)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern void *_etext, *etext, *edata, *end;  /* symbols provided by the linker */

int gd;        /* uninitialized global */
int igd = 42;  /* initialized global */

void mapstack(int iter) {
    int msvar;

    if ( iter < 6 ) {
        printf("iter = %d\n&iter = %p\n&msvar = %p\n", iter, &iter, &msvar);
        mapstack(iter + 1);  /* each call adds a new stack frame */
    }
}

int main(int argc, char **argv) {

    int d1;
    int d2 = 35;
    static int sd;
    static int isd = 15;
    char s[5];
    char *p1;
    char *p2;

    printf("The address of break is %p\n", sbrk(0));
    printf("The address of etext is %p\n", &etext);
    printf("The address of _etext is %p\n", &_etext);
    printf("The address of edata is %p\n", &edata);
    printf("The address of end is %p\n\n", &end);

    printf("The address of main is %p\n", main);
    printf("The address of gd is %p\n", &gd);
    printf("The address of igd is %p\n", &igd);
    printf("The address of sd is %p\n", &sd);
    printf("The address of isd is %p\n", &isd);
    printf("The address of d1 is %p\n", &d1);
    printf("The address of d2 is %p\n", &d2);
    printf("The address of s is %p\n", &s);

    p1 = malloc(20 * sizeof(char));
    printf("\n\nThe address of p1 is %p\n", &p1);
    printf("The address in p1 is %p\n", p1);
    p2 = malloc(20 * sizeof(char));
    printf("The address of p2 is %p\n", &p2);
    printf("The address in p2 is %p\n", p2);
    printf("The address of break is %p\n\n\n", sbrk(0));

    printf("The address of argv is %p\n", argv);
    if ( argc > 0 )
        printf("The address of argv[0] is %p\n", argv[0]);

    printf("\nCalling mapstack()...\n");
    mapstack(1);

    /* Dump the kernel's view of this process's memory regions. */
    printf("\n\nThis is a detailed memory map.\n");
    FILE *fd = fopen("/proc/self/maps", "r");
    if (fd) {
        char line[256];
        while (fgets(line, sizeof(line), fd)) {
            printf("%s", line);
        }
        fclose(fd);
    }

    return 0;
}
Here is a brief description of the details of the program.
- Lines 7-8: Creates two global vars, one initialized, one not.
- Lines 10-17: A recursive function to map the stack growing toward lower memory.
- Lines 21-27: Local variables in main(), some initialized, some not.
- Line 29: Show the current break.
- Lines 44-49: Allocates two blocks of memory with malloc() and shows the heap growing toward higher memory.
- Line 50: Show the current break after allocation – it should have moved.
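The stack behavior that mapstack() prints can also be checked with a single comparison. This is a rough sketch, assuming GCC or Clang (for the noinline attribute) on a platform such as x86-64 where the stack grows downward; the function names callee_is_lower and stack_grows_down are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: runs in a new stack frame and compares the
 * address of its own local with one from the caller's frame. */
__attribute__((noinline))
static int callee_is_lower(const char *caller_local) {
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if the callee's local sits at a lower address than the
 * caller's, i.e. the stack grows toward lower memory. */
int stack_grows_down(void) {
    char caller_local = 0;
    return callee_is_lower(&caller_local);
}
```

On the assumed platform, stack_grows_down() returns 1, which agrees with the decreasing &iter and &msvar values printed by each deeper call to mapstack().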
To compile the program, use the following:
student@student-vm:~$ cc -o cmem cmem.c
Then execute the program with:
student@student-vm:~$ ./cmem
Data Collection
You will compile and run the program listed above. When complete, capture the addresses shown in the output. Then run the following, noting the details of the first three columns.
student@student-vm:~$ size cmem
   text    data     bss     dec     hex filename
   2479     624      16    3119     c2f cmem
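The dec and hex columns are derived from the first three: dec is text + data + bss, and hex is that same total written in hexadecimal. As a small worked check using the numbers above (the helper names size_total and print_size_total are hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: total size as reported in the "dec" column. */
unsigned long size_total(unsigned long text, unsigned long data,
                         unsigned long bss) {
    return text + data + bss;
}

/* Print the total in decimal and hexadecimal, like the dec and hex
 * columns of the size command. */
void print_size_total(unsigned long text, unsigned long data,
                      unsigned long bss) {
    unsigned long total = size_total(text, data, bss);
    printf("%lu %lx\n", total, total);
}
```

Calling print_size_total(2479, 624, 16) prints 3119 c2f, matching the sample output. Your own numbers may differ depending on the compiler version.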
Now run the following to give details of the ELF file that is your executable, with the memory offsets sorted.
student@student-vm:~$ readelf -S cmem
student@student-vm:~$ nm cmem | sort
The readelf command lists all the sections and their offsets. The nm command provides details on the memory offsets of sections and named objects in the program itself. These values are offsets from a base address rather than final runtime addresses.
[Use man nm to understand the output.]
Capture that information as well. Perform an analysis of the variables and functions used in cmem.c. Identify variables in the BSS, DS, Heap, and Stack. Try to map out (as best you can) where you think the boundaries are for the memory sections based on the addresses provided by the tools.
The nm command shows offsets. To determine the boundaries, you will need to do a little hexadecimal math.
Put all of your output from the program, size, and nm into a Word document. Include an analysis of your findings and use the layout given earlier.
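For the hexadecimal math, the distance between two offsets printed by nm is simply their difference. The sketch below is one way to let C do the arithmetic; the helper name span_bytes and the sample offsets in the note that follows are hypothetical, not values taken from cmem.

```c
#include <stdlib.h>

/* Hypothetical helper: number of bytes between two hexadecimal
 * offsets given as text, e.g. copied from the first column of nm. */
unsigned long span_bytes(const char *start_hex, const char *end_hex) {
    unsigned long start = strtoul(start_hex, NULL, 16);
    unsigned long end = strtoul(end_hex, NULL, 16);
    return end - start;
}
```

For example, span_bytes("4010", "4018") returns 8, meaning a symbol at offset 0x4010 followed by one at 0x4018 leaves 8 bytes between them. The same subtraction can be done by hand or with a calculator in programmer mode.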
Submit the Word document to the Learning Management System when complete.