		Keeping data small

When many applets are compiled into busybox, the rw data and bss of
all applets are concatenated, including those from libc if a static
bbox is built. When bbox is started, _all_ of this data is allocated,
not just the part belonging to the selected applet.

What "allocated" means exactly depends on the arch.
On NOMMU it probably bites the most, since real RAM is actually
used for rwdata and bss. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep bbox friendly to NOMMU and small-memory systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures in libc.

A small experiment measures this "parasitic" bbox memory consumption.
Here we start 1000 "busybox sleep 10" processes in parallel.
The bbox binary is practically an allyesconfig static one,
built against uclibc:

bash-3.2# nmeter '%t %c %b %m %p %[pn]'
23:17:28 ..........    0    0 168M    0  147
23:17:29 ..........    0    0 168M    0  147
23:17:30 U.........    0    0 168M    1  147
23:17:31 SU........    0 188k 181M  244  391
23:17:32 SSSSUUU...    0    0 223M  757 1147
23:17:33 UUU.......    0    0 223M    0 1147
23:17:34 U.........    0    0 223M    1 1147
23:17:35 ..........    0    0 223M    0 1147
23:17:36 ..........    0    0 223M    0 1147
23:17:37 S.........    0    0 223M    0 1147
23:17:38 ..........    0    0 223M    1 1147
23:17:39 ..........    0    0 223M    0 1147
23:17:40 ..........    0    0 223M    0 1147
23:17:41 ..........    0    0 210M    0  906
23:17:42 ..........    0    0 168M    1  147
23:17:43 ..........    0    0 168M    0  147

This requires 55M of memory: the memory column rises from 168M
to 223M while the 1000 extra processes are running. Thus 1 trivial
busybox applet takes 55k of memory.


		Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

/* This is somewhat complex-looking arrangement, but it allows
 * to place decompressor state either in bss or in
 * malloc'ed space simply by changing #defines below.
 * Sizes on i386:
 * text    data     bss     dec     hex
 * 5256       0     108    5364    14f4 - bss
 * 4915       0       0    4915    1333 - malloc
 */
#define STATE_IN_BSS 0
#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in inflate_gunzip() [its main module]
and then passed down as a parameter to all subroutines which need
to access 'globals'.
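
The pattern described above can be sketched roughly as follows. This is
not the code from decompress_unzip.c: struct demo_state, demo_emit() and
demo_run() are hypothetical names invented for illustration, and plain
calloc() stands in for whatever allocator the real module uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical state; the fields are invented for illustration and
 * do not match the real decompressor state. */
struct demo_state {
	unsigned bytes_out;        /* running count of bytes produced */
	unsigned char window[64];  /* scratch buffer that would otherwise sit in bss */
};

/* A subroutine that needs the "globals" receives them as a parameter. */
static void demo_emit(struct demo_state *st, const char *data, unsigned len)
{
	assert(st != NULL);
	if (len > sizeof(st->window))
		len = sizeof(st->window);
	memcpy(st->window, data, len);
	st->bytes_out += len;
}

/* Entry point: allocates the state, passes it down, frees it.
 * Returns the number of bytes "produced", or 0 if allocation failed. */
unsigned demo_run(void)
{
	unsigned total;
	struct demo_state *st = calloc(1, sizeof(*st));

	if (!st)
		return 0;
	demo_emit(st, "hello", 5);
	demo_emit(st, "busybox", 7);
	total = st->bytes_out;
	free(st);
	return total;
}
```

Nothing here occupies bss or rw data: the state exists only between
the calloc() and the free() in demo_run().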


		Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but the struct globals is
NOT defined in libbb.h. You first define your own struct:

struct globals { int a; char buf[1000]; };

and then define a shorthand for the struct that ptr_to_globals points to:

#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the PTR_TO_GLOBALS macro:

	PTR_TO_GLOBALS = xzalloc(sizeof(G));

Typically it is done in <applet>_main().

Now you can reference "globals" by G.a, G.buf and so on, in any function.
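
Put together, the steps above might look like the following standalone
sketch. It does not use libbb: demo_ptr_to_globals is a hypothetical
stand-in for ptr_to_globals (a plain, non-const pointer, so the sketch
does not need the PTR_TO_GLOBALS macro), calloc() plus abort() stands in
for xzalloc(), and demo_bump() and demo_applet_main() are invented names.

```c
#include <assert.h>
#include <stdlib.h>

struct globals { int a; char buf[1000]; };

/* Stand-in for the shared pointer. In busybox it is a const pointer
 * assigned through PTR_TO_GLOBALS; here it is a plain pointer so the
 * sketch stays ordinary standalone C. */
static struct globals *demo_ptr_to_globals;
#define G (*demo_ptr_to_globals)

/* Any function can reach the "globals" through G. */
static void demo_bump(void)
{
	assert(demo_ptr_to_globals != NULL); /* must be set up first */
	G.a++;
	G.buf[0] = 'x';
}

/* Plays the role of <applet>_main(): allocate zeroed storage once,
 * then use G.a, G.buf and so on everywhere. */
int demo_applet_main(void)
{
	int result;

	demo_ptr_to_globals = calloc(1, sizeof(G)); /* stand-in for xzalloc() */
	if (!demo_ptr_to_globals)
		abort();
	demo_bump();
	demo_bump();
	result = G.a;
	free(demo_ptr_to_globals);
	demo_ptr_to_globals = NULL;
	return result;
}
```

The only object left in bss is the pointer itself; the 1000-byte buffer
is allocated only when the applet actually runs.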


		bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its own needs. Library functions are prohibited from using it.

The 'G.' trick can also be done using bb_common_bufsiz1 instead of
a malloced buffer:

#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if
sizeof(struct globals) <= sizeof(bb_common_bufsiz1).
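
One possible way to enforce that size condition is a compile-time check,
sketched below. Everything named demo_* is hypothetical:
demo_common_bufsiz1 is a stand-in for bb_common_bufsiz1, and its size
and alignment are assumptions. The sketch uses C11 (_Static_assert,
_Alignas) and, like the cast above, relies on the compiler tolerating
access to a char buffer through a struct pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the shared buffer; size and alignment are assumptions. */
enum { DEMO_BUFSIZ = 1024 };
static _Alignas(max_align_t) char demo_common_bufsiz1[DEMO_BUFSIZ];

struct globals { int a; char buf[1000]; };
#define G (*(struct globals*)&demo_common_bufsiz1)

/* Refuse to compile if the struct outgrows the buffer. */
_Static_assert(sizeof(struct globals) <= sizeof(demo_common_bufsiz1),
	"struct globals does not fit into the common buffer");

/* The buffer is shared, so clear it by hand before use. */
int demo_use_common_buf(void)
{
	memset(&G, 0, sizeof(G));
	G.a = 42;
	strcpy(G.buf, "ok");
	return G.a + (int)strlen(G.buf);
}
```

If a later change grows struct globals past the buffer size, the build
fails instead of the applet silently overrunning the buffer.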


		Drawbacks

You have to initialize the storage by hand. xzalloc() helps by clearing
the allocated storage to 0, but anything more must be done explicitly.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

#define dev_fd (G.dev_fd)
#define sector (G.sector)
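
Both drawbacks can be seen in the following sketch. It is an
illustration only: demo_ptr, demo_init_globals() and the other demo_*
names are hypothetical, and calloc() plus abort() stands in for
xzalloc(). Note that the struct has to be defined before the #defines,
otherwise the field names inside it would be macro-expanded too.

```c
#include <assert.h>
#include <stdlib.h>

/* Must come before the #defines below. */
struct globals {
	int dev_fd;
	unsigned sector;
};

static struct globals *demo_ptr; /* stand-in for ptr_to_globals */
#define G (*demo_ptr)
#define dev_fd (G.dev_fd)
#define sector (G.sector)

/* Zero fill covers 'sector', but "no descriptor yet" is -1,
 * so that field has to be set by hand. */
static void demo_init_globals(void)
{
	demo_ptr = calloc(1, sizeof(G));
	if (!demo_ptr)
		abort();
	dev_fd = -1;
}

/* Reads like ordinary global-variable code thanks to the #defines. */
static void demo_next_sector(void)
{
	assert(demo_ptr != NULL);
	sector++;
}

/* Returns 1 if the values are as expected after init and one step. */
int demo_drawbacks(void)
{
	int ok;

	demo_init_globals();
	demo_next_sector();
	ok = (dev_fd == -1 && sector == 1);
	free(demo_ptr);
	demo_ptr = NULL;
	return ok;
}
```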


		Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


		gcc's data alignment problem

The following attribute added in vi.c:

static int tabstop;
static struct termios term_orig __attribute__ ((aligned (4)));
static struct termios term_vi __attribute__ ((aligned (4)));

reduced bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. The asm output diff for the above example:

 tabstop:
        .zero   4
        .section        .bss.term_orig,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_orig, @object
        .size   term_orig, 60
 term_orig:
        .zero   60
        .section        .bss.term_vi,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_vi, @object
        .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.