Tuesday 25 August 2009

SQLite: A Lesson In Low-Defect Software

[EDIT: Minor edits to fix HTML formatting, style, corrected footnote credit, and code indentation the editor kills on every edit.]

While looking for some old papers and presentations on SQLite (remember my disk died a few months ago without proper backup), even though it seems the old ones are offline or lost, there's a new presentation on their experience producing excellent software.


It seems this presentation wasn't picked up properly so here's a shortened review with more detail on statement and branch testing with GCC. This is mind-blowing stuff, at least for me. Almost everything in this post comes from the paper. I'm trying to stay as close as possible to the original with only a few things added to make it a bit easier to understand at first sight and allow straight copy-paste to play with. I also skipped the less extraordinary SCM recommendations at the end.
  • Use comments on your source code to engage the other half of your brain
    • One side does "math" and the other "language".
  • Safety != Reliability. Safety: no harm; reliability: no failures.
  • Safe Software = Extremely Reliable Software.
  • What Programming Languages Does The World's Most Reliable Software Use? Avionics: ADA, C; Space Shuttle: HAL/S. Not Python, TCL/TK, .NET, LUA, Java (safe but not reliable.)
  • DO-178B and ED-12B "Software Considerations in Airborne Systems and Equipment Certification": Development process matters, not the programming language. Captures best practices.
  • [Remember from An Introduction to SQLite: What makes SQLite great is the extensive testing (60%).]
  • Use your whole brain.
  • Full coverage testing.
  • Good configuration management.
  • Don't just fix bugs, fix your process.
  • Comment: each function or procedure and major code blocks.
  • Comment: all variables and constant declarations.
  • SQLite has a code to comment ratio of 2:1.
  • Full coverage testing:
    • Automated tests that exercise all features of the program: all entry points, subroutines, all branches and conditions, all cases, all boundary values.
    • DO-178B places special emphasis on testing.
    • Statement Coverage: Tests cause every line of code to run at least once.
    • Branch Coverage: Tests cause every machine-language branch operation
      to evaluate to both TRUE and FALSE at least once.
    • (See section below on how to do test coverage with GCC.)
  • Fly what you test! (Measuring coverage validates your tests, not your product.)
  • Defensive Programming (See section below.)

Testing in SQLite

  • 99% Statement Coverage
  • 95% Branch Coverage
  • Goal: 100% branch coverage by Dec 2009
  • Striving for 100% test coverage has been [their] most effective method for finding bugs.
  • Testing in C/TCL (1M), C (2.3M), SQL logic (5.8M)
  • Crash testing.
  • I/O error and out-of-memory testing.
  • Fuzz testing.
  • Valgrind (memory: usage, profiling, leak tracking.)
  • Most bugs are found internally – before release.
  • External bugs are mostly build problems.
  • [SQLite team] doesn't ship “alpha” or “beta” releases, all SQLite releases are production-ready.
  • It is rare for users to find "wrong answers."

Test Coverage with GCC



Consider this snippet:
 1 int exampleFunction(int a, int b){
 2 
 3    int ans = 0;
 4 
 5    if ( a>b && a<2*b ){
 6       ans = a;
 7    } else {
 8       ans = b;
 9    }
10 
11    return ans;
12 }
There are 6 statements in lines 1, 3, 5, 6, 8, 11. To test this function you could run this test:
exampleFunction(1,1);

That would cover statements in lines 1, 3, 5, 8, 11, but not line 6 (because !(a>b)). The following test addresses this statement:
exampleFunction(3,2);

Now that would get all statements covered (line 6, ans = a.) But what about all possible branches? There are 4 possible branches:
  1. a>b: Taken on test 2.
  2. !(a>b): Taken on test 1.
  3. a<2*b: Taken on test 2.
  4. !(a<2*b): Never taken.
So to match branch 4 (a>b && !(a<2*b)):
exampleFunction(4,2);
VoilĂ ! All statements and branches tested.

It seems gcc has some great features to help. With -fprofile-arcs and -ftest-coverage it will generate branch coverage data, from gcc Debugging Options manual:
-fprofile-arcs
Add code so that program flow arcs are instrumented. During execution the program records how many times each branch and call is executed and how many times it is taken or returns. When the compiled program exits it saves this data to a file called aux-name.gcda for each source file. The data may be used for profile-directed optimizations (-fbranch-probabilities), or for test coverage analysis (-ftest-coverage).
[...]
-ftest-coverage
Produce a notes file that the gcov code-coverage utility can use to show program coverage. Each source file's note file is called aux-name.gcno. Refer to the -fprofile-arcs option above for a description of auxname and instructions on how to generate test coverage data. Coverage data will match the source files more closely, if you do not optimize.
So let's make this test with testme.c with only the first test:
int exampleFunction(int a, int b){

    int ans = 0;

    if ( a>b && a<2*b ){
        ans = a;
    } else {
        ans = b;
    }

    return ans;

}

int main() {
    exampleFunction(1,1);
    return 0;
}
Compiling and running:
$ gcc -g -fprofile-arcs -ftest-coverage testme.c -o testme && ./testme \
&& gcov -c testme.c && cat testme.c.gcov
File 'testme.c'
Lines executed:88.89% of 9
testme.c:creating 'testme.c.gcov'

-:    0:Source:testme.c
-:    0:Graph:testme.gcno
-:    0:Data:testme.gcda
-:    0:Runs:1
-:    0:Programs:1
1:    1:int exampleFunction(int a, int b){
-:    2:
1:    3:   int ans = 0;
-:    4:
1:    5:   if ( a>b && a<2*b ){
#####:    6:      ans = a;
-:    7:   } else {
1:    8:      ans = b;
-:    9:   }
-:   10:
1:   11:   return ans;
-:   12:}
-:   13:
1:   14:int main() {
1:   15:    exampleFunction(1,1);
1:   16:    return 0;
-:   17:}
The first column on the report shows how many times the source line was evaluated. The second column is the source line number. Note the ##### mark on the missed statement. Now if we add the second test:
$ gcc -g -fprofile-arcs -ftest-coverage testme.c -o testme && ./testme \
&& gcov -c testme.c && cat testme.c.gcov
File 'testme.c'
Lines executed:100.00% of 10
testme.c:creating 'testme.c.gcov'

-:    0:Source:testme.c
-:    0:Graph:testme.gcno
-:    0:Data:testme.gcda
-:    0:Runs:1
-:    0:Programs:1
2:    1:int exampleFunction(int a, int b){
-:    2:
2:    3:   int ans = 0;
-:    4:
3:    5:   if ( a>b && a<2*b ){
1:    6:      ans = a;
-:    7:   } else {
1:    8:      ans = b;
-:    9:   }
-:   10:
2:   11:   return ans;
-:   12:}
-:   13:
1:   14:int main() {
1:   15:    exampleFunction(1,1);
1:   16:    exampleFunction(3,2);
1:   17:    return 0;
-:   18:}
It now shows the statement in line 6 was taken.

Let's do the branch test with only the first test. Note the -b flag to gcov is the only command line change:

$ gcc -g -fprofile-arcs -ftest-coverage testme.c -o testme && ./testme \
&& gcov -b -c testme.c && cat testme.c.gcov
File 'testme.c'
Lines executed:88.89% of 9
Branches executed:50.00% of 4
Taken at least once:25.00% of 4
Calls executed:100.00% of 1
testme.c:creating 'testme.c.gcov'

-:    0:Source:testme.c
-:    0:Graph:testme.gcno
-:    0:Data:testme.gcda
-:    0:Runs:1
-:    0:Programs:1
function exampleFunction called 1 returned 100% blocks executed 60%
1:    1:int exampleFunction(int a, int b){
-:    2:
1:    3:   int ans = 0;
-:    4:
1:    5:   if ( a>b && a<2*b ){
branch  0 taken 0 (fallthrough)
branch  1 taken 1
branch  2 never executed
branch  3 never executed
#####:    6:      ans = a;
-:    7:   } else {
1:    8:      ans = b;
-:    9:   }
-:   10:
1:   11:   return ans;
-:   12:}
-:   13:
function main called 1 returned 100% blocks executed 100%
1:   14:int main() {
1:   15:    exampleFunction(1,1);
call    0 returned 1
1:   16:    return 0;
-:   17:}
The report says the branches taken at least once are 25%. Further in the report, after line 5: it lists the 4 branches from that line. Branch 0 (a>b) was evaluated but not taken (because a=b.) Branch 1 (!(a>b)) was taken. Since branches 2 and 3 depend on branch 0 those weren't even executed. Now let's run with the second test:
$ gcc -g -fprofile-arcs -ftest-coverage testme.c -o testme && ./testme \
&& gcov -b -c testme.c && cat testme.c.gcov
File 'testme.c'
Lines executed:100.00% of 10
Branches executed:100.00% of 4
Taken at least once:75.00% of 4
Calls executed:100.00% of 2
testme.c:creating 'testme.c.gcov'

-:    0:Source:testme.c
-:    0:Graph:testme.gcno
-:    0:Data:testme.gcda
-:    0:Runs:1
-:    0:Programs:1
function exampleFunction called 2 returned 100% blocks executed 100%
2:    1:int exampleFunction(int a, int b){
-:    2:
2:    3:   int ans = 0;
-:    4:
3:    5:   if ( a>b && a<2*b ){
branch  0 taken 1 (fallthrough)
branch  1 taken 1
branch  2 taken 1 (fallthrough)
branch  3 taken 0
1:    6:      ans = a;
-:    7:   } else {
1:    8:      ans = b;
-:    9:   }
-:   10:
2:   11:   return ans;
-:   12:}
-:   13:
function main called 1 returned 100% blocks executed 100%
1:   14:int main() {
1:   15:    exampleFunction(1,1);
call    0 returned 1
1:   16:    exampleFunction(3,2);
call    0 returned 1
1:   17:    return 0;
Here we see the only remaining branch not taken is 3 (but it was executed). Let's run with the third test:

$ gcc -g -fprofile-arcs -ftest-coverage testme.c -o testme && ./testme \
&& gcov -b -c testme.c && cat testme.c.gcov
File 'testme.c'
Lines executed:100.00% of 11
Branches executed:100.00% of 4
Taken at least once:100.00% of 4
Calls executed:100.00% of 3
testme.c:creating 'testme.c.gcov'

-:    0:Source:testme.c
-:    0:Graph:testme.gcno
-:    0:Data:testme.gcda
-:    0:Runs:1
-:    0:Programs:1
function exampleFunction called 3 returned 100% blocks executed 100%
3:    1:int exampleFunction(int a, int b){
-:    2:
3:    3:   int ans = 0;
-:    4:
4:    5:   if ( a>b && a<2*b ){
branch  0 taken 2 (fallthrough)
branch  1 taken 1
branch  2 taken 1 (fallthrough)
branch  3 taken 1
1:    6:      ans = a;
-:    7:   } else {
2:    8:      ans = b;
-:    9:   }
-:   10:
3:   11:   return ans;
-:   12:}
-:   13:
function main called 1 returned 100% blocks executed 100%
1:   14:int main() {
1:   15:    exampleFunction(1,1);
call    0 returned 1
1:   16:    exampleFunction(3,2);
call    0 returned 1
1:   17:    exampleFunction(4,2);
call    0 returned 1
1:   18:    return 0;
Great, 100% coverage now!


Defensive Programming


Input variable boundary checking would create many branches never taken on tests. Consider for example checking the value of input nBytes doesn't get over a limit 0x7fffff00:
void *sqlite3InternalMalloc(int nBytes){
    if( nBytes<=0 || nBytes>=0x7fffff00 ){
        return 0;
    }else{
        return sqlite3LowLevelMalloc(nBytes);
    }
}
That branch can't be tested. SQLite has some interesting macros NEVER() and ALWAYS():
#if defined(SQLITE_COVERAGE_TEST) 
#  define ALWAYS(X)   1 
#  define NEVER(X)    0 

#elif defined(SQLITE_DEBUG) 
#  define ALWAYS(X)   ((X)?1:sqlite3Panic()) 
#  define NEVER(X)    ((X)?sqlite3Panic():0) 

#else
#  define ALWAYS(X)   (X)   //  PASS THROUGH (What you fly)
#  define NEVER(X)    (X)   //  PASS THROUGH (What you fly)
#endif
Applied to the bounds check above:
void *sqlite3InternalMalloc(int nBytes){
    if( nBytes<=0 || NEVER(nBytes>=0x7fffff00) ){
        return 0;
    }else{
        return sqlite3LowLevelMalloc(nBytes);
    }
}

Footnote



AFAIK, testing can't get any better than this. Somebody could do a machine-checked proof, like the Cambridge seL4 team (UNSW/Sydney) did (also mind-blowing by itself, but probably overkill for now.)
 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.