Tools

C++ code coverage profiling with GCC/GCOV

Coverage analysis with GCC/GCOV consists of the following three steps:

  • instrumented application build — libraries, executable(s), and profiling artifacts (*.gcno files) are created
  • the application test run(s) — the runtime coverage statistics (*.gcda files) are collected
  • coverage statistics post-processing with GCOV/LCOV — text/HTML coverage reports are generated

Instrumented application build

To enable instrumented compilation, use GCC/G++ with the --coverage flag. The know-how here is to specify the full path to the source files during compilation, in order to be able to perform cross-profiling and to ease the use of LCOV (described below).

$ g++ -c -g -O0 --coverage -o $PWD/obj/myclass.o $PWD/myclass.cpp
$ g++ -c -g -O0 --coverage -o $PWD/obj/main.o $PWD/main.cpp
$ g++ -g -O0 --coverage -o $PWD/bin/myapp $PWD/obj/*.o

GNU Make has two useful functions to convert filenames to absolute ones: $(abspath ...) and $(realpath ...)

When compiling with the --coverage flag, a *.gcno file is created in the same location as the object file. This file contains the profiling arc information and is used by GCOV to post-process the application's statistics collected at runtime.

$ ls $PWD/obj
main.gcno main.o myclass.gcno myclass.o
$

Coverage statistics collection

An instrumented application collects coverage statistics at runtime and creates a set of *.gcda files (or updates existing ones) on exit, one for every *.gcno file created during the build. For the *.gcda files to be generated, the application must exit cleanly, either by returning from main() or by calling exit().
For client/server applications I typically install a SIGTERM handler to ensure a clean application termination, as sketched below.
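
A minimal sketch of the graceful stop from the test harness side (using the myapp binary from the build example above; the handler inside the application only needs to translate SIGTERM into exit()):

$ ./bin/myapp &           # start the instrumented server in the background
$ APP_PID=$!
$ kill -TERM $APP_PID     # the application's SIGTERM handler calls exit()
$ wait $APP_PID           # on exit the *.gcda files are written
$ ls obj/*.gcda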

The directory where the *.gcda files are to be created must exist and be writable by the application. By default, a *.gcda file is created in the directory where the corresponding *.gcno file was created during the build.
To find out the exact locations the following command can be used

$ strings $PWD/bin/myapp | egrep '\.gcda$'
/home/bobah/Work/coverage/obj/main.gcda
/home/bobah/Work/coverage/obj/myclass.gcda

In many cases the coverage statistics must be collected from an application running in an environment (host, user, etc.) other than the one where the application was built, so creating the *.gcda files in the build directory may be impossible or impractical. To override the location where the *.gcda files are stored, two environment variables can be used: GCOV_PREFIX and GCOV_PREFIX_STRIP. For example, to replace the four leading elements of the path ("/home/bobah/Work/coverage") where myapp stores its *.gcda files with "/home/bobah/Work/cov_rpt", define the variables as follows (bash syntax) in the application's environment:

export GCOV_PREFIX="/home/bobah/Work/cov_rpt"
export GCOV_PREFIX_STRIP=4

As a result of the override, the file myclass.gcda will be created as /home/bobah/Work/cov_rpt/obj/myclass.gcda instead of /home/bobah/Work/coverage/obj/myclass.gcda.
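
A quick way to verify that the override took effect (a hypothetical run of the instrumented binary):

$ GCOV_PREFIX=/home/bobah/Work/cov_rpt GCOV_PREFIX_STRIP=4 ./bin/myapp
$ find /home/bobah/Work/cov_rpt -name '*.gcda'
/home/bobah/Work/cov_rpt/obj/main.gcda
/home/bobah/Work/cov_rpt/obj/myclass.gcda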

Note that for the post-processing of the coverage data it is most convenient to release the source files and *.gcno artifacts under $GCOV_PREFIX, so that each *.gcda file is created in the same directory as its corresponding *.gcno.
I do it with rsync, like this:

rsync -acv --filter='+ */' --filter='+ *.cpp' --filter='+ *.h' --filter='+ *.gcno' --filter='- *' /home/bobah/Work/coverage/ /home/bobah/Work/cov_rpt

The coverage data accumulates across subsequent application runs. To reset the counters, either delete all the *.gcda files under the $GCOV_PREFIX directory

$ find $GCOV_PREFIX -type f -name '*.gcda' -print | xargs /bin/rm -f

or use the LCOV functionality

$ lcov --directory $GCOV_PREFIX --zerocounters

Post-processing runtime coverage statistics

I prefer using the LCOV wrapper for the coverage data processing because it generates nice-looking HTML reports.
The essential bit here is that the source code tree must be available in exactly the same place as it was during the build. This is required because the gcov application's behavior cannot be manipulated with GCOV_PREFIX/GCOV_PREFIX_STRIP, and it expects the files exactly where they are recorded in the *.gcno files.
The data post-processing:

$ lcov --directory $GCOV_PREFIX --capture --output-file $GCOV_PREFIX/app.info

The HTML reports generation:

$ genhtml --output-directory $PWD/cov_html $GCOV_PREFIX/app.info

Troubleshooting

"stamp mismatch with graph file" error message during gcov/lcov invocation

The *.gcno file contains a time stamp tag. The same tag is written to the runtime coverage report (the *.gcda file) by the application. If the *.gcno and *.gcda files come from different build runs, gcov will refuse to process them.
The tag can be extracted from each file and compared, *.gcda vs *.gcno:

$ hexdump -e '"%x\n"' -s8 -n4 myclass.gcda
7ef26ee7
$ hexdump -e '"%x\n"' -s8 -n4 myclass.gcno
7ef26ee7
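
To check all the pairs at once, a small loop along these lines can be used (a sketch, assuming the *.gcno and *.gcda files live side by side as recommended above):

$ for f in $GCOV_PREFIX/obj/*.gcda; do
>   g="${f%.gcda}.gcno"
>   [ "$(hexdump -e '"%x"' -s8 -n4 "$f")" = "$(hexdump -e '"%x"' -s8 -n4 "$g")" ] || echo "stamp mismatch: $f"
> done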

*.gcda files are not generated

There are two possibilities here: either the directory where the application wants to write the *.gcda files does not exist or is not writable, or the application does not exit properly by returning from main() or calling exit(), typically because SIGTERM is not properly handled.

C++ Heap Map

A small malloc() and free() interceptor library (heaptrace.so, 45 lines of C++) and a minimalistic TCL/TK GUI for visual C++ heap analysis — a powerful tool for settling disputes and satisfying curiosity.

The executable under investigation should be run the following way (the NEdit text editor in this example):

$ LD_PRELOAD=.../heaptrace.so .../nedit | awk '/^[+-]heap /' >.../heaptrace.log

The awk-based filter suppresses the process's own stdout output, keeping only the heap trace records.

Because of the preloaded interceptor library, a message is printed to stdout on each malloc() or free() invocation. The resulting dump looks like

. . .
+heap 0x20b5040 64
+heap 0x20b5090 8176
+heap 0x20b7090 4096
+heap 0x20b80a0 32
+heap 0x20b80d0 8
+heap 0x20b80f0 568
-heap 0x20b80f0 0
+heap 0x20b80f0 2784
-heap 0x20b80f0 0
+heap 0x20b80f0 336
. . .

At the moment of interest the GUI script can be invoked on the resulting dump as

$ .../heapmap.tk .../nedit.allocs.txt

The GUI script will print some diagnostic information:
reading nedit.allocs.txt
allocated memory blocks: 16278
input min address: 0x1dc1040
input max address: 0x2058aa0
aligned min address: 0x1dc1000
aligned max address: 0x2059000
page size: 0x1000 (4096)
number of pages: 0x298 (664)
max_pages_per_row=2.23606797749979
pages_per_row=2
allocations drawing done
vertical grid lines done
horizontal grid lines done
and will display the heap map [screenshot: heapmap.tk]. The map can be zoomed via the GUI buttons or via a left mouse button area selection, and scrolled either with the scroll bars or with a right mouse button drag. A double click on an allocated block highlights all the other blocks of the same size on the map [screenshot: heapmap.tk.zoom].

The C++ source code for the library (build instructions are in the comments) is below. It is also attached to the post along with the GUI TCL/TK script (license).

#include <cstdio>
#include <dlfcn.h>

#define likely(x)   __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)

// g++ -pthread -m64 -fPIC -std=c++0x -O3 -Wl,-zdefs -Wl,-znow -ldl -shared -Wl,-soname,heaptrace.so -o heaptrace.so heaptrace.cc

namespace
{

/**
 * malloc() direct call
 */
inline void * libc_malloc(size_t size)
{
  typedef void* (*malloc_func_t)(size_t);
  static malloc_func_t malloc_func = (malloc_func_t) dlsym(RTLD_NEXT, "malloc");

  return malloc_func(size);
}

/**
 * free() direct call
 */
inline void libc_free(void* ptr)
{
  typedef void (*free_func_t)(void*);
  static free_func_t free_func = (free_func_t) dlsym(RTLD_NEXT, "free");

  free_func(ptr);
}

/**
 * malloc() call recorder
 */
void record_malloc(size_t size, void* ptr)
{
  if (unlikely(ptr == 0)) return;

  char buf[64];
  size_t len = snprintf(buf, sizeof(buf) / sizeof(char), "+heap %p %lu\n", ptr, size);
  fwrite(buf, sizeof(char), len, stdout);
}

/**
 * free() call recorder
 */
void record_free(void* ptr)
{
  if (unlikely(ptr == 0)) return;

  char buf[64];
  size_t len = snprintf(buf, sizeof(buf) / sizeof(char), "-heap %p 0\n", ptr);
  fwrite(buf, sizeof(char), len, stdout);
}

} // anonymous namespace


/**
 * malloc() override
 */
extern "C" void* malloc(size_t size)
{
  void* ptr = libc_malloc(size);
  record_malloc(size, ptr);
  return ptr;
}

/**
 * free() override
 */
extern "C" void free(void *ptr)
{
  libc_free(ptr);
  record_free(ptr);
  return;
}

On Linux, where address space randomization may take place, one would need to run the test under setarch -R to switch the randomization off.
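
For example (a sketch; this setarch form needs the architecture name, which uname -m provides):

$ setarch $(uname -m) -R env LD_PRELOAD=.../heaptrace.so .../nedit | awk '/^[+-]heap /' >.../heaptrace.log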

Attachments: heapmap.tk (16.96 KB), heaptrace.cc (1.45 KB)

Executing Tasks in Parallel with Xargs

A neat one-liner to run tasks in parallel with the xargs command, available on every Linux/Unix server.
#!/bin/bash

...
echo PARALLEL_JOBS:${PARALLEL_JOBS:=1}

declare -a tests=($(.../find_all_tests))
echo "${tests[@]}" | \
  xargs -d' ' -n1 -P${PARALLEL_JOBS} -I {} bash -c ".../run_test {}" || { echo "FAILURE"; exit 1; }

echo "SUCCESS"

Incremental Text File Monitoring

In many situations it is desirable to monitor a log file for errors in an incremental fashion, as if tail -f were run on it, but in batches, say from a cron-scheduled task. The script below takes the filename as input and outputs what has changed since the last invocation, storing its state in the current working directory.
The script (whatsnew.awk)
#!/bin/awk -f
# Copyright (C) 2012 Vladimir Lysyy
# http://bobah.net/d4d/source-code/license

BEGIN {

    FS=" ";
    state_file="";
    last_line=0;
}

{
  if (NR == 1) {
    state_file = FILENAME ".lastline";
    sub(/^.*\//, "", state_file);
    getline last_line < state_file;
    print "state_file =", state_file, "last_line =", last_line
  }

  if (NR <= last_line) next;
}

{ print; }

END {
  print NR > state_file
}

Example invocation
$ whatsnew.awk /var/log/messages
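
A typical use is from cron (a hypothetical crontab entry; note the cd, since the script keeps its state file in the current working directory):

*/10 * * * * cd /var/tmp/logmon && /usr/local/bin/whatsnew.awk /var/log/messages | grep -i error | mail -s 'new log errors' root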

Jackrabbit on Apache Tomcat

# download files
wget 'http://mirror.switch.ch/mirror/apache/dist/tomcat/tomcat-6/v6.0.26/bin/apache-tomcat-6.0.26.tar.gz'
wget 'http://www.apache.org/dyn/closer.cgi/jackrabbit/2.1.0/jackrabbit-webapp-2.1.0.war'
wget 'http://repo1.maven.org/maven2/javax/jcr/jcr/2.0/jcr-2.0.jar'

# extract tomcat
tar xzf apache-tomcat-6.0.26.tar.gz

# put JCR jar on classpath
cd apache-tomcat-6.0.26
mkdir -p ./shared/lib
cp ../jcr-2.0.jar ./shared/lib
sed -i 's@shared.loader=@shared.loader=${catalina.home}/shared/lib/jcr-2.0.jar,@' ./conf/catalina.properties

# start Tomcat
bin/startup.sh

# deploy Jackrabbit
cp ../jackrabbit-webapp-2.1.0.war ./webapps
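
# check the deployment (a hypothetical check; the context path is derived
# from the WAR file name)
curl -s http://localhost:8080/jackrabbit-webapp-2.1.0/ | head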

Java Developer's Toolset

This small article is a reminder to myself for the next time I need to set up my workspace: a set of short, essential points which I don't want to ever re-discover again. I start with a description of the development, build, and testing environment and accepted practices, and continue with descriptions (or just mentions) of important software packages and frameworks (IO, logging, persistence, XML, etc.) which can really boost a developer's performance and are a must for your code to be "pluggable" like the rest of the Java code.

A Convention

A basic thing about Java development environments is a convention, as basic as the rule to put #ifndef...#define...#endif guards in C/C++ header files. Each class should belong to a package. A package name is the reversed corporate Internet domain name followed by a dot-separated hierarchical package name (like net.bobah.examples.logging). The package statement must be the first statement in the file, and the files must be located in a directory tree matching their package names: the file HelloWorld.java from the net.bobah.examples.logging package goes to .../net/bobah/examples/logging/HelloWorld.java. Once the convention is met, the source tree can be used by most Java development frameworks and build systems without any refactoring, as illustrated below.
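
For example, a conforming source tree looks like this from the shell (a minimal illustration, assuming a src/ source root):

$ find src -name '*.java'
src/net/bobah/examples/logging/HelloWorld.java
$ head -1 src/net/bobah/examples/logging/HelloWorld.java
package net.bobah.examples.logging;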

IDE

I like the Eclipse platform (the one for Java EE), http://www.eclipse.org/downloads/moreinfo/jee.php. A full-featured development framework written in Java and for Java. Very convenient for code browsing, refactoring, debugging, and building, although for builds I personally prefer Ant (see the next section, "Building"). It also provides excellent real-time compiler error reporting. The most essential thing to know about Eclipse is its terminology:

  • project — a set of files not making much sense without each other (usually implementing a well-determined piece of functionality)
  • workspace — a set of projects having something in common (interdependencies, a logical or runtime connection, common protocols, etc.), a kind of "working context"; workspaces can be switched from one to another
  • perspective — a configuration of the GUI's widget layout and of the functionality available via menus. There are, for instance, "Java code browse" and "Java debug" perspectives. Perspectives themselves are pluggable, so one can have, say, a Hibernate perspective with special tools for persistence-layer development.

The refactoring tools and automatic code generators for constructors and getters/setters are available via the context menu (and shortcuts). "SHIFT+CTRL+T" is the shortcut for a class search within the workspace.
I'd recommend downloading and installing the Eclipse IDE for Java EE Developers, as it is much more feature-rich than the one for Java SE (JSP/JSF, Web Services, EJB, XML, etc.).

Building

The build is typically done either directly from Eclipse or using either Maven or Ant.

Eclipse

http://eclipse.org/. The simplest push-button build variant, good for debugging the build or for a small prototype project. The CLASSPATH and the input and output directories are configured via the project properties menu. The project-specific settings are stored in the .project and .classpath files in the project's root directory. The workspace-specific settings (code style, compiler settings, etc.) are configured via Window->Preferences; I never had to look for either of these settings in the file system.

Ant

http://ant.apache.org/. Very similar to make, but the configuration files are in XML format and there is a set of Java-specific atomic tasks, so there is no need to memorize all the javac flags or to automate filesystem iteration for compiling a source tree. My preferred build system. Scalable, extensible, configurable, yet simple and very fast.

Maven

http://maven.apache.org/. A convention-enforcing, strict framework with a plugin-based architecture. Plugins exist for most common use cases. When using Maven, the Eclipse project can be generated with this command: "mvn -DdownloadSources=true -DdownloadJavadocs=true -DoutputDirectory=target/eclipse-classes eclipse:clean eclipse:eclipse".

Already written

Testing - JUnit

http://www.junit.org/, the de facto standard for Java project unit testing.

Logging — SLF4J & Logback

http://slf4j.org/ and http://logback.qos.ch/. The former is a logging abstraction layer, the latter is a feature-rich, configurable, and reliable logger.

Persistence — Hibernate

https://www.hibernate.org/. A JPA-certified persistence framework. Very powerful and extremely easy to set up and use once you understand the concepts. Offers excellent reverse engineering (DB-to-Java or DB-to-DDL) functionality. There is a Hibernate plugin for Eclipse offering visual configuration file creation with a development-time connection to the DB.
A good complementary tool, a DBMS client written in Java, is SquirrelSQL; it lets you test DB connection settings and SQL queries in a JDBC dialect, and much more. And it uses the same code and drivers which your application would use.

Web Services & Web GUI (JSP) — Apache Tomcat, Servlet, JAX-WS, SoapUI

Lots of other stuff

There are lots of projects in the Apache Software Foundation which can be combined, reused (as a whole or partially), or copy-pasted from.

Simple Test Runner

300 lines of Bash code. Take and use.

Born on a rainy Hertfordshire weekend, the Simple Test Runner, a balanced fuse of simplicity and power, is the essence of more than a decade of experience with software development process automation at the world's leading EDA companies and a top-tier investment bank.

The Runner is an ideal base for testing automation in any Bash-enabled Linux environment. The code base of just about 300 lines is meant to direct, not limit, the user. A single file, tests.txt, is both the list of tests and the configuration.

Software Quality Assurance (QA)

Quality Assurance is the process that testing is a part of. A test is a single QA task with well-defined success criteria (e.g. the software build is a QA step validating that the software builds, and is thus the first step in any QA process). A test is only worth conducting if it positively impacts the QA process (the cost to run the test is less than the penalty from the potential loss of not running it). Once automated, a test amortizes well across the lifetime of the software, provided that the maintenance of the automation is robust enough and is not an overhead on its own.

Tests are typically subdivided into categories (unit, functional, regression, performance, etc.), but that really does not matter for automation. What does matter is the cost of automation versus that of manual testing. In order for a test to be reliably automated, it should be possible to consistently isolate the entity being tested (in its input and environment) and to have a reliable success criterion capable of producing a binary yes/no answer for the test. Simple Test Runner is written to treat software QA as such.

Abstraction Levels

The Simple Test Runner has three levels of testing abstraction and includes just enough support for all of them. These levels are (from abstract to specific):

  1. Abstract Test - run the test command and check its exit code
  2. Material Test - run the test.sh script, which takes three parameters: a test setup command, a test execution command, and a test result analyzer command; all three must succeed (0 exit code) in order for the test to be counted as passed
  3. Specific Test - run the test.sh script the same way as above, but use analyzer.sh to analyze the output; analyzer.sh expects the test command to produce files and needs to be given a directory of "golden" files to compare the test output with
Start with the simplest level that solves your immediate task, and check the next one when you need more.

Distribution Archive Contents

suite
     \
      +-code     # Bash Code, ~319 lines
      +-example  # is worth a thousand words
      +-LICENSE  # to guard ourselves from each other

Minimum Setup

  1. Put the directory code somewhere and treat it read-only
  2. Create a file tests.txt in the directory where you run the tests or, preferably, create it somewhere else and create a symlink to it in the directory where you're going to run the tests
If you change to the directory containing tests.txt you should be able to run the script suite.sh. If you run it as suite.sh -n -g\* it will display all the tests in the file tests.txt, for example:
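
$ cd mytests       # a hypothetical directory containing (or symlinking) tests.txt
$ suite.sh -n -g\* # lists all the tests from tests.txt without running them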

First Level - Abstract Test

An abstract test is only supposed to be a shell executable, and thus able to return an exit status.

The file tests.txt can look as the one below:

# [group name] [test name]            [timeout, sec.] [command to run]
  run          trivial_pass_test      30              /bin/true
  run          trivial_fail_test      30              /bin/false
  run          trivial_timedout_test  3               /bin/sleep 5
The suite.sh script runs the test's command line and fetches the exit code: if the code is 0 the test is PASSED, otherwise FAILED.

bobah@europa> suite.sh -g run ;# -g: a group of tests to run, in this case "run"
suite.sh -I- run trivial_pass_test (pid=4813, pwd=run/trivial_pass_test/work)
suite.sh -I- trivial_pass_test - PASSED, 00:00:00 (0s)
suite.sh -I- run trivial_fail_test (pid=4821, pwd=run/trivial_fail_test/work)
suite.sh -I- trivial_fail_test - FAILED, rc=1, 00:00:01 (1s)
suite.sh -I- run trivial_timedout_test (pid=4831, pwd=run/trivial_timedout_test/work)
suite.sh -I- run trivial_timedout_test - TIMEOUT, terminated, 00:00:04 (4s)
suite.sh -I- total tests: 3, passed: 1, failed: 2

Second Level - Material Test

A material test is supposed to need pre-run preparation and to produce analyzable results. For instance, the preparation can be fetching a person's e-mail address from the address book, the test execution can be sending a mail to that person asking for a reply, and the test result analysis can be checking for the reply in your own mailbox.

In case your test fits the prepare-execute-analyze model, you can use Simple Test Runner's test.sh script, which needs you to provide commands for the three steps described above. The setup and analyzer commands default to /bin/true, so when configured as below it runs just as an Abstract Test, only testing the exit code of the test executable.

# [group name] [test name]                [timeout, sec.] [command to run]
  run          abstract_as_material_pass  30              ${BASE_DIR}/test.sh -x /bin/true

The tests.txt below demonstrates all the possibilities of the Material Test model implemented in test.sh

run  material_pass           30  ${BASE_DIR}/test.sh -x /bin/true
run  material_fail_setup     30  ${BASE_DIR}/test.sh -s /bin/false -x /bin/true  -a /bin/true
run  material_fail_running   30  ${BASE_DIR}/test.sh -s /bin/true  -x /bin/false -a /bin/true
run  material_fail_analyzis  30  ${BASE_DIR}/test.sh -s /bin/true  -x /bin/true  -a /bin/false

Third Level - Specific Test

A specific test is supposed to produce something in the directory where it runs. The canonical file-based test output analysis is implemented by the script analyzer.sh, which is also a part of the Simple Test Runner.

The analyzer.sh script expects three parameters: a diff command (defaults to "diff -q"), a filter command (defaults to /bin/cat), and a directory with the golden output to compare the current output with. The work done by the analyzer for each file in the output is schematically described below.

current/outputfile.ext | filter | outputfile.txt.current \
                                                          --> diff_cmd ? PASS/FAIL
golden/outputfile.ext | filter | outputfile.txt.gold     /
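
Expressed as plain shell, the per-file work is roughly the following (a conceptual sketch with the default diff and filter commands, not the actual analyzer.sh code):

$ diff_cmd="diff -q"; filter=/bin/cat
$ for f in current/*; do
>   base=$(basename "$f")
>   $filter < "current/$base" > "$base.current"   # filtered test output
>   $filter < "golden/$base"  > "$base.gold"      # filtered golden output
>   $diff_cmd "$base.current" "$base.gold" || echo "FAIL: $base"
> done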

Attachments: str_v0.2.2.tgz (5.08 KB)