Sunday, December 16, 2012

Отчёт о Dee Mon Winter Contest


По традиции участвовал в этом контесте в составе команды THIRTEEN. Отчёт(осторожно, там СПОЙЛЕРЫ-СПОЙЛЕРЫ-СПОЙЛЕРЫ), написанный san'ом и brox'ом с моими небольшими дополнениями можно прочитать здесь. Контест выдался замечательный, вполне в духе, автору браво и аплодисменты.

Thursday, October 11, 2012

Talk: What every programmer should know about merging branches in Mercurial

Using named branches in Mercurial can be rather difficult especially when it gets to merging these branches. There are several useful techniques and process improvements that simplify the development/release loop. You can find some of them in the talk slides I did on Oct 11, 2012 here at Quickoffice(aka DoctorMobile). It doesn't restrain any development flow and can be useful for any Mercurial (or even non-Mercurial) user. For those who are not familiar with Mercurial at all I recommend to read some guides before looking at the slides. What every programmer should know about merging branches in Mercurial Sources for the slides can be found here

Sunday, May 27, 2012

Counting bits

How would you count the number of bits set in a 32-bit integer? This question is amongst the favorite interviewing questions. The naïve approach is to loop through the bits aggregating the sum in a variable. It's already fast enough but you can make the interviewer happy with a much better "lookup table" solution: just use a static 256-entry table that contains pre-evaluated bits count for any 8-bit integer. With such table you need only 4 lookups and 4 summations to get the answer. "Lookup table" algorithm is even implemented as a GCC builtin function. Take a look at the code:

#include <stdint.h>

const uint32_t tbl[] = {
4181239173/*,lots of random data, I used 16384 integers for tests*/,2866015773 } ;

#define TEST_COUNT 100000
static const uint32_t ARRAY_SIZE = sizeof(tbl)/sizeof(tbl[0]);

int main()
{
 uint32_t i = 0;
 volatile register uint32_t c;

 for (; i< TEST_COUNT; ++i)
 {
  uint32_t j = 0;
  for (; j < ARRAY_SIZE; ++j)
  {
   c = __builtin_popcount(tbl[j]);
  }
 }
}

Here __builtin_popcount is a function that calculates the population count. If compiled with
gcc -O3 t.c -o bitcount
one can get such results:
$ time ./bitcount 
real 0m17.465s
user 0m17.425s
sys 0m0.000s
Not too fast, huh? Some of the time is wasted calling the function and inlining it can speed up things. But! There is even a better way. Compiling it with
gcc -O3 -mpopcnt t.c -o bitcount
results in:
$ time ./bitcount 
real 0m1.738s
user 0m1.728s
sys 0m0.004s
That's simply because modern processors have POPCNT instruction that can be used to count the number of bits and GCC is nice enough to support it. As you can see, in this case hardware optimization beats smart algorithm design with ease.

Resume

Algorithms are designed on paper but executed on a real hardware. Knowing what your hardware is capable of is a must for a software engineer. And please don't tell me about all these patterns and object-oriented stuff: unless you invent a machine that can run concepts instead of machine instructions, you must be aware of all the low-level stuff because it will save lives because I don't want to buy a new laptop capable of running new version of nano/Notepad.

I also highly recommend an awesome article "Benchmarking CRC32 and PopCnt instructions" for further reading.

Thursday, March 29, 2012

Ruby-like blocks and yield "keyword" in C

Today I want to show you some of the plain old C black magic. This is the magic of direct user context manipulating. You can use this low-level feature to implement some high-level language abstractions such as generators and ruby-like blocks. Let's start with an example(test1.c):

#include <stdio.h>
#include <string.h>
#include "gens.h"
#include "cblocks.h"


int main(int argc, char** argv)
{
EXECUTE_BLOCK(enumerate, (1, 10), int, x)
printf("Got %d\n", *x);
END_BLOCK
char* buf = "Hello, world!";
EXECUTE_BLOCK(enumerate, (1, 4), int, z)
printf("Saying it %d time\n", *z);
EXECUTE_BLOCK(for_each, (buf, buf + strlen(buf)), char, c)
printf("%c.", *c);
END_BLOCK
printf("\n");
END_BLOCK

return 0;
}

The code above is a valid C code. It produces such output:
Yielding 1
Got 1
Yielding 2
Got 2
Yielding 3
Got 3
Yielding 4
Got 4
Yielding 5
Got 5
Yielding 6
Got 6
Yielding 7
Got 7
Yielding 8
Got 8
Yielding 9
Got 9
Yielding 1
Saying it 1 time
H.H.e.e.l.l.l.l.o.o.,.,. . .w.w.o.o.r.r.l.l.d.d.!.!.
Yielding 2
Saying it 2 time
H.H.e.e.l.l.l.l.o.o.,.,. . .w.w.o.o.r.r.l.l.d.d.!.!.
Yielding 3
Saying it 3 time
H.H.e.e.l.l.l.l.o.o.,.,. . .w.w.o.o.r.r.l.l.d.d.!.!.

Let's take a look on the implementation of used generators (enumerate, for_each in gens.c):
#include "gens.h"
#include <stdio.h>

START_GENERATOR(int, enumerate, (int start, int end))
int i = start;
for(; i<end; ++i)
{
printf("Yielding %d\n", i);
YIELD(i);
}
END_GENERATOR

START_GENERATOR(char, for_each, (char* start, char* end))
while (start != end)
{
YIELD(*start);
YIELD(*start); // intentionally
++start;
}
END_GENERATOR

As you can see, the implementation of generators is pretty straightforward. All the magic is done in cblocks.h:
#include <stdlib.h>
#include <ucontext.h>

#define DECLARE_GENERATOR(type, name, params) extern ucontext_t name ## _context__; \
extern ucontext_t name ## _out_context__; \
extern int name ## _finished__; \
extern type name ## _out_var__; \
extern void name ## _fun__ params

#define START_GENERATOR(type, name, params) ucontext_t name ## _context__; \
ucontext_t name ## _out_context__; \
int name ## _finished__ = 0; \
type name ## _out_var__; \
void name ## _fun__ params \
{ \
name ## _finished__ = 0; \
int* current_finished = & name ## _finished__; \
ucontext_t* current_context = & name ## _context__; \
ucontext_t* current_out_context = & name ## _out_context__; \
type* current_var = & name ## _out_var__;

#define END_GENERATOR *current_finished = 1; \
}

#define YIELD(x) *current_var = x; swapcontext(current_context, current_out_context);
#define EXECUTE_BLOCK(gen, params, type, varname) do{ type* varname = & gen ## _out_var__; \
ucontext_t* current_context = & gen ## _context__; \
ucontext_t* current_out_context = & gen ## _out_context__; \
volatile int gen ## _temp_var__ = 0; \
getcontext(& gen ## _out_context__); \
if (gen ## _temp_var__ == 0) \
{ \
gen ## _temp_var__ = 1; \
gen ## _fun__ params; \
}\
else \
{ \
while (! gen ## _finished__) {

#define END_BLOCK swapcontext(current_out_context, current_context); } \
} \
} while(0);

Manual pages for swapcontext and getcontext can help in understanding what's going on here.

Resume

  • It's possible to implement nifty language features in C using its low-level features. The generators described above can be even used to build cooperative multitasking systems.
  • It doesn't mean that you should do it just because you can ;). Beware that context manipulating doesn't work on all platforms.
  • You can find the source code on github. The code should be considered just a playground so please forgive me its untidiness. If you want something mature(e.g., you're seeking for coroutines support in C), check out the libtask, libtorque or libcoro

As always I will be happy to accept any criticism on the code above. Feel free to contact me in any way you can imagine.