mupuf.org // we are octopimupuf.org

A Return Into the World of Static Analysis With Frama-C

Frama-C is a static analysis tool that does not just match “dangerous” function names or code patterns like RATS, and that does more than Splint’s memory management, control flow checks and reachability analysis. Frama-C uses abstract interpretation to analyse the potential values of variables and detect a whole other bunch of bugs in programs. It also provides a specification language to write assertions or pre-conditions on functions and prove that these assumptions hold. Frama-C is designed for correctness: it will report false positives (for instance fail to validate an assertion on the return value of a function) but never true negatives. It focuses on showing the absence of bugs, by proving assertions respect pre-conditions. This has applications in evaluating the safety of critical systems.

What interests us here is the combination of value analysis and slicing, as the slicing lab with my language-based security students this year was a bit… light! In my defence, I didn’t expect them to actually do their homework! We’ll work through combining value analysis and slicing on code samples, starting up with more basic aspects of Frama-C. This post is in its vast majority inspired from the contents of the Frama-C documentation. In particular, many code samples are taken or derived from the Value analysis documentation.

Update: I’ve received interesting feedback on this article from Julien Signoles, one of the many talented people behind Frama-C. I’ve amended/clarified some of the things I discuss in the post, mostly changing ambiguous vocabulary I used to avoid confusions. Julien also explained in more details some aspects of Frama-C which I had forgotten, and so I’ll try to inject his own wisdom into the original article. Thanks Julien!

This post refers to Frama-C Oxygen on a Linux system. Your mileage may vary slightly from version to version.

Finding undefined functions

You’ll sometimes need to spot undefined functions, or will sometimes want to see how the temporary code generated by Frama-C looks like. When using the app in GUI mode, you’ll always have this information on the middle section of the interface. Otherwise, you may use the following command:

    $ frama-c -metrics <source files>

You can then include your headers by using the -cpp-extra-args CLI argument, although Frama-C recommends extracting the header signatures or even the source code you need to include and putting it into a separate file, to limit the size of the imported code and hence reduce the amount of unnecessary noise in the analyser’s output. Missing functions can also be replaced by instrumentations built in Frama-C, as will be explained later.

From now on, we’ll be referring to a tarball of source files that can be downloaded here. The code in these files is by no means meant to make any sense, it’s barely here for playing and demonstration purposes.

Small notes on the GUI

A few things to note before starting:

  • frama-c launches the CLI client and frama-c-gui launches the GUI one
  • Frama-C’s GUI is frama-c-gui, it contains many options but at times is a bit frustrating
  • With the GUI, make sure to close all files from previous projects before opening a new one, and to include all deps
  • The -main option must sometimes be set (for loop.c, see below), it’s at the top of the Analyzer section of the analysis options
  • Modifying code must be done outside of Frama-C, then you must refresh the analysis. Be careful, it will re-run value analysis which can be slow when the -slevel is high!
  • The console tab will give you most of the global information including the output of value analysis for all variables
  • The information tab will give you local information based on what’s highlighted by the mouse (keyboard navigation does not work, don’t bother): lvals return information on value analysis before and after assignments
  • Right-clicking on lvals and rvals will take you a very long way! This is how you select what to slice, what impact analyses to run, where a variable occurs, etc.

Here is an example of what you will see in the information tab when clicking on loop.c’s line 3 (n = read();):

    Function: loop
    Statement: 1 (line 3 in /home/steve/Teaching/GS10/1314/Labs/3 - Dependencies and Slicing/loop.c)
    Variable n has type "int".
    It is a local variable.
    It is referenced and its address is not taken.
    Before statement:
    n ∈ UNINITIALIZED
    After statement:
        n ∈ [--..--]

Splint and Frama-C: not quite the same thing

Unreachable code

In somecode.c, there are two pieces of unreachable code. First, the userAuthChallenge() function has been “disabled” and systematically returns 1 – the rest of the code in it is ignored. Splint catches this because when it builds the control flow graph (CFG later on), return 1; is equivalent to the EXIT state of the CFG, and so the other lines of code of this function are not included into any CFG block.

However Splint does not detect the second unreachable block. The for loop starting at line 59 in someDataProcessingFunc() never gets executed because the conditional check is set to false in that loop. Splint does not try to verify/compute when conditions are true, it analyses the whole source and will go through all branches of a conditional statement. Because Frama-C can do value analysis, it can figure out when a conditional check will return false (things get a bit harder when the conditional expression is in a loop, hence the -slevel argument that allows to unroll loops and reduce the amount of uncertainty in the analysis).

Out-Of-Bounds memory accesses

How are out-of-bounds reads or writes detected? Splint keeps a record of variables’ types, and it knows the size of statically allocated arrays. It is able to detect out-of-bounds accesses on such variables, but only when the value used to index an array is static. Indeed, Splint cannot process pointer arithmetics leading to invalid memory accesses, or array indexes that contain variables (neither in the case of the loop in oob.c or the variable v in oob-constant-index.c). Splint, in strict mode, will argue about possible out-of-bounds accesses but will never have certainty about them.

Value analysis can also solve such problems. Let’s now type:

    $ frama-c -val oob.c
    $ frama-c -val oob-constant-index.c
    $ frama-c -val oob-fixed.c

If you look at the Frama-C output on oob.c and oob-constant-index.c on one hand and oob-fixed.c on the other hand, you will see that the value analysis will return computed ranges only on the last case and that out-of-bounds writes are properly detected. When there is a bug (or what could be an infinite loop) in a function, the value analysis plugin will indicate that the function does not terminate and return infinite value ranges for all subsequent variables.

This problem also occurs in somecode.c with my simulated crash! This is the file I came up with to keep the lab going last week, when the students had nothing to do because they did their homework! The reason why I couldn’t slice this code was actually because I introduced this simulated crash (without proper annotations telling Frama-C to ignore it), and the out-of-bounds access prevented the value analysis from running (-val is a dependency of the slicing plugin of Frama-C) (Julien confirmed that the value analysis did work but I cannot slice unreachable code or expect it to appear in a slice – though my issue was that I couldn’t enable slicing). I assume at this point I was clicking in the wrong place! Note to self: I need to modify this somecode.c file.

Value analysis

It’s time to get get into the heart of what Frama-C does. The basic idea behind its value analysis plugin (called Value) is that in many simple cases, bounds for the values a variable can take can be easily calculated from the bounds of the other variables assigned to that variable. For instance, if a variable x is assigned 2 and another variable y is assigned a runtime-chosen value between 3 and 6, then the variable z = x+y; must take its values within the interval [5 .. 8]. Value can compute such value ranges easily and correctly (i.e., containing at least all the correct values within the proposed range), for simple forms of dynamically allocated and for statically allocated variables (for code that is sequential and contains no infinite loops). Loops can be managed either by returning over-estimations of the actual value ranges or by unrolling loop iterations to refine the computations.

In our language-based security lab, students are asked to perform backward slices of the two printf() statements in the loop.c file. We’ll see later that slicing does work on this file with Frama-C (as the exact value of n does not matter for the computation of the CFG and PDG), however value analysis does not yield any insights on the final values of x and y as it depends on the runtime value observed for n = read();.

Value analysis with the -slevel option

The -slevel option does two things, as quoted directly from the Frama-C documentation: “it makes the analyzer unroll loops, and it makes it propagate separately the states that come from the then and else branches of a conditional statement”. I forgot long ago the internals of how branch states are separated and then re-merged, and I recall another bunch of options are relevant when it comes to performance-precision trade-offs on assertion proofs, but for the purpose of this post, slevel is all one needs to be aware of.

Let’s now try a value analysis of the file loop-fixed-val.c, with slevel set to 0. Notice the values of x and y inside the while loops and on the printf functions: they’re not properly calculated! Value does not know the exact values and so it returns infinite ranges… Yet it gets i and j right… Strange! If Value makes the assumption (or rather, somewhat demonstrates that confirmed by Julien that it is an assumption) this piece of code is going to finish, then the values of i and j at the end of the loop must be strictly below 0, and it somehow guesses that 0 is the only possible value, most likely from the fact that these get decremented by 1 just before said loops are exited. I am not sure about the very precise reasoning that allows Value to conclude i and j contain 0, but I think it is along those lines.

As for our x and y variables, the loops must be unrolled enough times for them to be computed precisely. Intuitively, there are 24 main loop iterations in this program, and the second loop iterates i - 1 times per first loop iteration. I recall from my own learning of Frama-C that the relationship between slevel and which variables get calculated is somewhat tricky! If we re-run the analysis with any slevel below 298, neither value will be computed. Indeed, there will be precisely (23*24)/2 second loop iterations and 24 first loop iterations, which gives us a total of 300 iterations. An analysis run with slevel = 0 corresponds to one loop being unrolled once, hence slevel = 298 would give us exactly 299 loops being unrolled. Interestingly, this parameter gives me the following final values: y ∈ {276} and x ∈ [--..--]. Switching to slevel = 299 now gives me x ∈ {24} (since the last iteration of the first loop is unrolled).

Generally speaking, it’s ok to overshoot a little and it’s recommended by the Frama-C developers to increment slevel slowly until reaching the desired level of accuracy or until the analysis takes longer than acceptable, whichever comes first.

Value analysis on complicated lines of code

If you now look at ptr.c, you’ll see that it mixes assignments, increments and dereferencing of pointers. What’s going on with this code may be confusing for people who’re not used to C (or who haven’t done any in a while). When confronted with this code, Frama-C will separate the various operations and create intermediate variables for each of them, allowing you to understand the order in which they occur and their impact on the values of variables. You can either use the GUI or use the following command to see intermediate state variables:

    $ frama-c -val -print ptr.c

        [...]
        int main(void)
        {
            int __retres;
            int *ptr;
            int x;
            int z;
            int *tmp_0;
            int *tmp_1;
            x = 4;
            ptr = & x;
            { /*undefined sequence*/    tmp_0 = ptr; ptr ++; ; }
            z = *tmp_0;
            { /*undefined sequence*/ 
                tmp_1 = ptr;
                ptr ++;
                *tmp_1 = 12;
            }
            __retres = 0;
            return (__retres);
        }

Notice the values of tmp_0 and tmp_1. The ++ operator on the right hand side of the variable occurs after the rest of the lval or rval has been evaluated, so here a temporary variable will first be created that contains the same value as ptr. Then ptr will be incremented, and it is tmp_0’s pointed value that will be assigned to z. If the ++ operator had been written before ptr, it’d have been performed before the dereferencing leading to an invalid read when assigning the rval to z. Run a value analysis on ptr-pre.c to see this happen.

Note also that in any case, the ++ operator on a lval would be executed before the rval is assigned, hence the last line on ptr.c could lead to an invalid write – which is correctly detected by Frama-C. It turns out that sometimes such writes are not invalid, for instance we may be filling up an allocated array’s next element, and the code would be correct as long as we stop writing before the end of the array. A -slevel high enough to unroll the whole loop would be required to correctly differentiate buggy from valid programs in this case.

Value analysis with intervals rather than unknown values

We’re still annoyed that we cannot run the analysis for any value of read() because the value is obtained at runtime. Frama-C allows you to analyse values for ranges of input values, by using a built-in internal function that will assign an interval of values to a variable. We need to replace the read() call with a call to the appropriately named Frama_C_interval(int, int) function. In order to perform the value analysis, we now need to include the built-in Frama-C source code, which can be done by adding frama-c -print-share-path/builtin.c to their CLI call, or by importing that file in their GUI interface (usually located at /usr/share/frama-c/builtin.c on Linux systems).

What now changes when running the value analysis? Open loop-frama-c-interval.c to figure it out. The final ranges of possible values for x and y will be given in the console (not in the information tab), provided that the slevel is sufficiently high! To the best of my understanding, it needs to be set high enough to cover all iterations from all values within the interval! For instance, for the interval [-20..10], I need to set slevel = 514 and I have absolutely no idea why this specific number! It is quite obvious however that it increases quickly, and so there are practical limitations to the applications of interval-based value analysis: the time you’re willing to wait for the results!

Slicing with Frama-C

Now onto the actual topic of my students’ lab. I was interested in showing them Frama-C as I had no prior exposure to other slicing-capable open-source tools. Frama-C proposes various types of slicing, though I only wanted to demonstrate backward slices on the printf() statements as a means to validate the slices students had manually computed (the file they used for this exercise was loop.c). Somehow, the options differ whether running the GUI or CLI version of the analyser. In the GUI, one must first enable value analysis and then slicing from the analysis options dialog, and one can then right-click a statement to access a list of possible slicing options. In the CLI, one should instead use the following pattern (note the -then-on that allows printing the output of the slice):

    $ frama-c <source files> <desired slicing mode and appropriate options> -then-on 'Slicing export' -print

The following options are available, though not all of them are documented and there are naming inconsistencies between CLI and GUI:

  • slice calls to: backward slice on all the calls to a selected function, in the whole program, which would return the whole program except the useless j = n; assignment when run on print
  • slice calls into: only slicing the calls themselves this time
  • slice results or -slice-return: which I don’t really understand…
  • slice stmt: backward slice on the selected statement
  • slice lval or -slice-value: slice all the assignments containing a given variable in the lval (?), until the selected statement in the GUI or the end of the entry point function in the CLI
  • slice rd or -slice-rd: slices the read accesses to selected lvalues (and wr for the write accesses… probably
  • slice ctrl: it is quite likely that this option returns the part of a backward slice related to the control flow rather than data dependencies

Note that this tutorial’s been written with Frama-C Oxygen. On the current version, Fluorine, the CLI slicing plugin has had quite a bunch of options added and they are better explained than they used to. The command frama-c -slicing-help will return a full list of available CLI slicing options.

The Frama-C documentation will or built-in CLI help will tell you exactly what format is expected. A backward slice of the print(x); statement would look like:

    /* Generated by Frama-C */
    extern int ( /* missing proto */ read)();
    extern int ( /* missing proto */ print)(int x_0);
    void loop(void)
    {
        int x;
        int i;
        int n;
        n = read();
        x = 0;
        i = n;
        while (i > 0) {
            x ++;
            i --; }
        print(x);
        return;
    }

Program Dependency Graphs

I didn’t even know Frama-C can output the PDG of a program, but of course it needs to compute it prior to slicing so it makes perfect sense being able to output that! PDGs can be exported to the dot format and then generated into PostScript files as follows. Better warn you, the PDG generated by Frama-C looks pretty insane…

    $ frama-c -fct-pdg loop loop.c -main loop    -pdg-dot loop
    $ dot -Tps loop.loop.dot -o loop.ps

According to an anonymous contributor on stackoverflow, you can also use gcc’s -fdump-rtl-expand parameter to generate a PDG, and then use egypt to turn it into a visualisable graph. This might be of help if like me you’re confused by Frama-C’s PDGs.

Impact analysis and data scope

You can also use the impact analysis plugin (available by right-clicking statements in the GUI) to compute forward slices of a statement. Likewise, you can obtain the lines of code where a variable’s content is unchanged by right clicking a variable, clicking on Dependencies and then DataScope. Knowing these shorthands can be quite handy for analysis too, if you’re trying to understand when and how a specific variable is modified or what other code it affects. If using the CLI, the frama-c -impact-help command will tell you what’s available.

Combining slicing and value analysis

You can run analysis on an existing slice with the CLI, and you can even slice a slice if you want to! As you’ve seen from the CLI slicing command above, printing a slice requires appending -then-on 'Slicing export' -print. Basically, slicing causes a new project named ‘Slicing export’ to be generated, which you can later refer to to execute a second command. Performing multiple slices would cause the creation of ‘Slicing export 2’ and so on.

How does this come in handy? Remember that we needed slevel = 299 in our previous example to compute the final value range of x. if we first perform a backwards slice on reads to x, we can get rid of the y-related second loop and spare ourselves a lot of computation time! We can now drop to slevel = 24 to obtain the same value for x! And this is of course one among many applications of slicing…

    $ frama-c loop-fixed-val.c -slice-rd x -main loop -then-on 'Slicing export' -val -slevel 24

To conclude this post, here’s a little question (primarily directed at my students): if I told you that an app receives attacker-controlled input at a specific line of code in a specific variable, what analyses would you do to efficiently discover how this input may influence your app and cause it to be exploited?

Comments