Monday 29 December 2008

And The Dog Ate My Homework

After relocating from London to Buenos Aires my HD broke. Then the pile of DVDs and CDs got lost, including the OS X rescue disc (finding Mac install DVDs to borrow at Christmas was quite a challenge), and my pre-flight backup DVDs went with them. Oh, and my hidden TrueCrypt volume got trashed when I stupidly added things to the outer volume (paranoia doesn't pay.)

Luckily the outlines of upcoming posts were stored on Blogger, and I remember most of the code's ideas. I'll try to move on from this.

Besides that hiccup, things are going very well and I even got a suntan.

Back up now: to a pendrive, to some DVDs, to an external HD, to Gmail, everywhere. Use TrueCrypt, but wisely: get yourself some subtle warning if you do the hidden volume trick! (And at least write-protect the drive.)

Sunday 9 November 2008

The Ephemeral Ports Problem (and solution)

Richard Jones ran into a problem doing load tests with many open sockets. It was quite an interesting thing to investigate. For reasons still unknown to me after many hours of reading kernel code, documentation, and mailing lists, Linux shares the assigned list of ephemeral ports across all local IPs for unconnected sockets. I hope this post will save someone some time, and there is a workaround suggestion for libevent. (By the way RJ, you were right on this problem! Though I differ now on the solution... Sorry, can't help it, it seems :)

Ephemeral Ports

Ephemeral ports are an assigned range of IP ports to be used by unprivileged processes. This range is usually a few thousand ports above 1024. The administrator can change it through the relevant sysctl variable. As you probably know, transport protocols in the TCP/IP suite use ports; when a program wants to initiate communication with these protocols it needs a local address, and if it doesn't ask for one explicitly the OS assigns one automatically. The problem lies in the lookup of available ports.
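To see what range your own box uses, here is a minimal sketch that reads it from procfs. It assumes Linux, where the sysctl net.ipv4.ip_local_port_range is exposed as /proc/sys/net/ipv4/ip_local_port_range. The function names parse_port_range and read_port_range are mine, made up for illustration.

```c
#include <stdio.h>

/* Parse the two integers found in ip_local_port_range ("low<TAB>high").
 * Returns 0 on success, -1 if the text is malformed. */
int parse_port_range(const char *text, int *low, int *high)
{
    if (sscanf(text, "%d %d", low, high) != 2 || *low > *high)
        return -1;
    return 0;
}

/* Read the current ephemeral range from procfs (Linux only).
 * Returns 0 on success, -1 on any error. */
int read_port_range(int *low, int *high)
{
    char buf[64];
    FILE *fp = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_port_range(buf, low, high);
}
```

Calling read_port_range(&low, &high) and computing high - low + 1 gives the n_ephemeral figure used in the rest of this post.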

The operating system tracks the port numbers in use and also the ones in use recently (to know how to handle leftover incoming network packets.) In Linux, this is done with a hash table. The TCP/IP networking code in the Linux kernel is quite complicated, lacks documentation and comments, and makes it hard to track what is defined where. After many days banging my head against the crude code, I finally got it. Random posts on the internet and a side comment in a standards draft said the ephemeral range was shared by all local addresses on most operating systems. I wanted to know where, how, and if possible, why. So far, I only got the first two and only hints of the last.

Ephemeral Port Assignment

The pattern to create a TCP socket for client software is to call:
int  sock_fd = socket(AF_INET, SOCK_STREAM, 0);
This will make a socket of the TCP/IP family, of stream type (connected), with the default protocol ("0" in this case, TCP.) After this you can manually assign it to a local address by calling:
bind(sock_fd, local_addr, local_addr_length);
That address should contain both the IP address and the port. If the port specified is 0, the kernel looks for an available port in the ephemeral range. After this you make the actual connection to the server with:
connect(sock_fd, destination_addr, destination_addr_length);
If the bind step was omitted the kernel's connect code does a similar, but slightly different lookup of available ports. Let's compare both lookups.
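The bind-with-port-0 step can be watched from user space. This is a small sketch, not anything from libevent or the kernel: bind_ephemeral is a name I made up, and it only shows which port the kernel hands out when asked for "any".

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a fresh TCP socket to `ip` with port 0 and report the port the
 * kernel picked. Returns the port (host byte order) or -1 on error. */
int bind_ephemeral(const char *ip)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);               /* 0 = let the kernel choose */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) == 1 &&
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);

    close(fd);                              /* releases the port again */
    return port;
}
```

For example, bind_ephemeral("127.0.0.1") should return some port inside the ephemeral range. The socket is closed before returning, so repeated calls do not exhaust anything.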

The bind lookup algorithm resides in net/ipv4/inet_connection_sock.c's function inet_csk_get_port():
/* Obtain a reference to a local port for the given sock,
 * if snum is zero it means select any available local port.
 */
int inet_csk_get_port(struct sock *sk, unsigned short snum)
{
    /* ... */
    if (!snum) {
        int remaining, rover, low, high;

        inet_get_local_port_range(&low, &high);
        remaining = (high - low) + 1;
        rover = net_random() % remaining + low;

        do {
            head = &hashinfo->bhash[inet_bhashfn(net, rover,
                    hashinfo->bhash_size)];
            spin_lock(&head->lock);
            inet_bind_bucket_for_each(tb, node, &head->chain)
                if (tb->ib_net == net && tb->port == rover)
                    goto next;
            break;
        next:
            spin_unlock(&head->lock);
            if (++rover > high)
                rover = low;
        } while (--remaining > 0);

        /* Exhausted local port range during search? It is not
         * possible for us to be holding one of the bind hash
         * locks if this test triggers, because if 'remaining'
         * drops to zero, we broke out of the do/while loop at
         * the top level, not from the 'break;' statement.
         */
        ret = 1;
        if (remaining <= 0)
            goto fail;

        /* OK, here is the one we will use. HEAD is
         * non-NULL and we hold it's mutex.
         */
        snum = rover;
    } else {
When snum is 0, it walks the range looking for a port with no bucket in the hash table; if there is anything for that port (any socket using it, or one recently closed) it keeps looking. If the search exhausts the range, the function fails. Note that the local IP address is not used in the hash table at all! The net argument isn't for that (it identifies the network namespace.) The hash table only cares about port numbers. In contrast, the port lookup on connect in net/ipv4/inet_hashtables.c does:
int __inet_hash_connect(struct inet_timewait_death_row *death_row,
        struct sock *sk, u32 port_offset,
        int (*check_established)(struct inet_timewait_death_row *,
            struct sock *, __u16, struct inet_timewait_sock **),
        void (*hash)(struct sock *sk))
{
    /* ... */
    if (!snum) {
        int i, remaining, low, high, port;
        static u32 hint;
        u32 offset = hint + port_offset;
        struct hlist_node *node;
        struct inet_timewait_sock *tw = NULL;

        inet_get_local_port_range(&low, &high);
        remaining = (high - low) + 1;

        local_bh_disable();
        for (i = 1; i <= remaining; i++) {
            port = low + (i + offset) % remaining;
            head = &hinfo->bhash[inet_bhashfn(net, port,
                    hinfo->bhash_size)];
            spin_lock(&head->lock);

            /* Does not bother with rcv_saddr checks,
             * because the established check is already
             * unique enough.
             */
            inet_bind_bucket_for_each(tb, node, &head->chain) {
                if (tb->ib_net == net && tb->port == port) {
                    WARN_ON(hlist_empty(&tb->owners));
                    if (tb->fastreuse >= 0)
                        goto next_port;
                    if (!check_established(death_row, sk,
                                port, &tw))
                        goto ok;
                    goto next_port;
                }
            }

            tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
                    net, head, port);
            if (!tb) {
                spin_unlock(&head->lock);
                break;
            }
            tb->fastreuse = -1;
            goto ok;

        next_port:
            spin_unlock(&head->lock);
        }
        local_bh_enable();

        return -EADDRNOTAVAIL;

ok:
        /* ... */
The algorithm is quite similar but if the hash table bucket for the port is in use, it calls check_established() to perform further checks:

/* called with local bh disabled */
static int __inet_check_established(struct inet_timewait_death_row *death_row,
        struct sock *sk, __u16 lport,
        struct inet_timewait_sock **twp)
{
    /* ... */
    /* Check TIME-WAIT sockets first. */
    sk_for_each(sk2, node, &head->twchain) {
        tw = inet_twsk(sk2);

        if (INET_TW_MATCH(sk2, net, hash, acookie,
                    saddr, daddr, ports, dif)) {
            if (twsk_unique(sk, sk2, twp))
                goto unique;
            else
                goto not_unique;
        }
    }
    tw = NULL;

    /* And established part... */
    sk_for_each(sk2, node, &head->chain) {
        if (INET_MATCH(sk2, net, hash, acookie,
                    saddr, daddr, ports, dif))
            goto not_unique;
    }

unique:
    /* Must record num and sport now. Otherwise we will see
     * in hash table socket with a funny identity. */
    inet->num = lport;
    inet->sport = htons(lport);
    sk->sk_hash = hash;
    WARN_ON(!sk_unhashed(sk));
    __sk_add_node(sk, &head->chain);
    sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
    write_unlock(lock);

    if (twp) {
        *twp = tw;
        NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED);
    } else if (tw) {
        /* Silly. Should hash-dance instead... */
        inet_twsk_deschedule(tw, death_row);
        NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED);

        inet_twsk_put(tw);
    }

    return 0;

not_unique:
    write_unlock(lock);
    return -EADDRNOTAVAIL;
}
This allows the same local port to be reused as long as the 5-tuple (protocol, source address, source port, destination address, destination port) doesn't exist already (the INET_MATCH call.)
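To make the difference between the two lookups concrete, here is a toy user-space model. It is a deliberate simplification and not the kernel's code: the range has only four ports, the scan is linear instead of starting at a random offset, TIME-WAIT and fastreuse are ignored, and every name in it (struct conn, struct table, toy_bind_lookup, toy_connect_lookup) is made up for illustration.

```c
#include <stdint.h>

#define LOW   50000           /* toy ephemeral range: 4 ports */
#define HIGH  50003
#define MAXC  64

struct conn { uint32_t saddr, daddr; uint16_t sport, dport; };

struct table {
    struct conn conns[MAXC];
    int n;
};

/* bind()-style test: a port is taken if any socket owns it,
 * whatever its local address. */
static int port_in_use(const struct table *t, uint16_t port)
{
    for (int i = 0; i < t->n; i++)
        if (t->conns[i].sport == port)
            return 1;
    return 0;
}

/* connect()-style test: only an identical 4-tuple is a conflict. */
static int tuple_in_use(const struct table *t, const struct conn *c)
{
    for (int i = 0; i < t->n; i++)
        if (t->conns[i].saddr == c->saddr && t->conns[i].daddr == c->daddr &&
            t->conns[i].sport == c->sport && t->conns[i].dport == c->dport)
            return 1;
    return 0;
}

/* Returns the port assigned, or -1 when the range is exhausted. */
int toy_bind_lookup(struct table *t, uint32_t saddr)
{
    for (int port = LOW; port <= HIGH; port++) {
        if (t->n < MAXC && !port_in_use(t, (uint16_t)port)) {
            struct conn c = { saddr, 0, (uint16_t)port, 0 };
            t->conns[t->n++] = c;
            return port;
        }
    }
    return -1;
}

/* Returns the port assigned, or -1 when no unique tuple is left. */
int toy_connect_lookup(struct table *t, uint32_t saddr,
                       uint32_t daddr, uint16_t dport)
{
    for (int port = LOW; port <= HIGH; port++) {
        struct conn c = { saddr, daddr, (uint16_t)port, dport };
        if (t->n < MAXC && !tuple_in_use(t, &c)) {
            t->conns[t->n++] = c;
            return port;
        }
    }
    return -1;
}
```

In this model the fifth toy_bind_lookup() fails even when it comes from a different source address, while toy_connect_lookup() keeps succeeding for as long as the destination port changes, which is the behaviour described above.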

Catch 22

So there is a dilemma on how to create more client TCP sockets than the number of available ephemeral ports (let's call them n_sockets and n_ephemeral.)
  • Increasing n_sockets by using multiple source IP addresses (the RJ approach) won't work, because it will fail on the bind lookup of available ephemeral ports (which doesn't care about the source address.)
  • If you make just a connect call you are limited to n_ephemeral connections to a given destination, because the lookup doesn't pick an (IP, ephemeral port) pair; it only looks for a port under the local IP already selected (as noted in the code comment above.)
[Note: there is no way to do an incomplete bind of only the IP address part, leaving the port to be assigned later.]

Given this situation RJ offered a patch to libevent to do it the way httperf does, binding both local address and port. This means the client code has to do the port allocation lookup itself, and if not carefully managed it will waste an incredible amount of work on failed bind() attempts. In my opinion, this is hackish and ugly. It's not their fault, they were cornered by poor implementations and poor interfaces. In RJ's case libevent always calls bind before connect, so as it stands there isn't even a chance to do it right.

Also I didn't like the idea of having to bother the user to have more local addresses and having to pass that to the client program.

My $.02

As a programmer, one way to allow so many connections to a server from a single host would be to instead increase the number of ports the server is listening on. This is very common, should be trivial to do, and scales very well (n_ephemeral times the number of server ports.) The only limitation is a firewall or some other kind of filter in between, but that is quite unlikely. In this particular case this would require a modification of libevent, to prevent it from calling bind() before connect if no local address is specified (for client code.) This is in effect a four line patch with no change to the libevent API (RJ's diff adds another API function and is about 16 lines):
--- http.c      2008-09-08 01:11:13.000000000 +0100
+++ http.c.new 2008-11-13 02:09:12.000000000 +0000
@@ -1731,7 +1731,10 @@
     assert(!(evcon->flags & EVHTTP_CON_INCOMING));
     evcon->flags |= EVHTTP_CON_OUTGOING;

-    evcon->fd = bind_socket(evcon->bind_address, 0 /*port*/, 0 /*reuse*/);
+    if (evcon->bind_address)
+        evcon->fd = bind_socket(evcon->bind_address, 0 /*port*/, 0 /*reuse*/);
+    else
+        evcon->fd = socket(AF_INET, SOCK_STREAM, 0); /* generic socket */
     if (evcon->fd == -1) {
         event_debug(("%s: failed to bind to \"%s\"",
             __func__, evcon->bind_address));
(Yes, I already mailed Niels a few days ago about it. But as usual, he'll probably have a better way to do it. Hi Niels ;)
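The scaling claim above (n_ephemeral times the number of server ports) is plain arithmetic, sketched here. The function name is made up, and the bound assumes a single local address connecting to a single server address with connect() only.

```c
/* Upper bound on simultaneous client connections from one local address
 * to one server address when only connect() is used: every destination
 * port can reuse the whole ephemeral range. Returns 0 on bad input. */
long max_client_connections(int low, int high, int n_server_ports)
{
    long n_ephemeral = (long)high - low + 1;

    if (n_ephemeral <= 0 || n_server_ports <= 0)
        return 0;
    return n_ephemeral * n_server_ports;
}
```

As an assumed example, with a range of 32768 to 61000 (28,233 ports) and a server listening on 4 ports, max_client_connections(32768, 61000, 4) gives 112,932. Check your own range before trusting that number.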

Trying to Make Sense of the Kernel Algorithm

Why are ephemeral ports searched this way? Why is bind() so strict? Well, at that point:
  • The kernel only knows it is a TCP socket.
  • It doesn't know if it is going to be a client or server (listen) socket.
  • And even if it knew it is a client, it wouldn't know yet the destination address and port.
Some peripheral comments on the subject on the Linux kernel mailing list mention issues with strange things like double connects (valid in TCP.) I am still not convinced this isn't just an archaic lookup that doesn't consider the local address. This issue is discussed by Fernando Gont (a fellow UTN-er, what a coincidence) in his IETF draft. It was written earlier this year (February 2008), I guess to work out the issues with port prediction (like Dan Kaminsky's DNS bug.) Very interesting read.

Extra


Here is some code to play with:
/*
Copyright (C) 2008 Alejo Sanchez

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

const char *addrs[] = { "127.0.0.1", "127.0.0.2" };

int
main (int argc, char **argv)
{
    struct rlimit rl; /* to bump up system limits for this process */
    int *sockets; /* ptr to array of sockets */
    int nsockets = 120000;
    int c, i;

    while ((c = getopt(argc, argv, "n:")) != -1) {
        switch (c) {
        case 'n':
            nsockets = atoi(optarg);
            break;
        default:
            fprintf(stderr, "Illegal argument \"%c\"\n", c);
            exit(1);
        }
    }

    rl.rlim_cur = rl.rlim_max = nsockets + 10;
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
        perror("setrlimit");
        exit(1);
    }

    if ((sockets = (int *) malloc(nsockets * sizeof(int))) == NULL) {
        perror("malloc");
        exit(1);
    }

    for (i = 0; i < nsockets; i++) {
#ifdef BIND_ONLY
        /* bind() only: rotate over the local addresses, port 0 = any */
        struct addrinfo *aitop, ai_hints = { .ai_family = AF_INET,
            .ai_socktype = SOCK_STREAM, .ai_flags = AI_PASSIVE };
        const char *addr = addrs[i % (sizeof(addrs) / sizeof (addrs[0]))];
        const char *portstr = "0";

        if (getaddrinfo(addr, portstr, &ai_hints, &aitop) != 0) {
            fprintf(stderr, "getaddrinfo failed for %s\n", addr);
            exit(1);
        }

        sockets[i] = socket(AF_INET, SOCK_STREAM, 0);

        if (bind(sockets[i], aitop->ai_addr, aitop->ai_addrlen) == -1) {
            fprintf(stderr, "Error binding %s, for %s : %s\n",
                strerror(errno), addr, portstr);
        } else
            fprintf(stderr, "ok addr: %s, i: %d\n", addr, i);
#else
        /* connect() only: rotate over four destination ports */
        struct addrinfo *aitop, ai_hints = { .ai_family = AF_INET,
            .ai_socktype = SOCK_STREAM, .ai_flags = AI_PASSIVE };
        char portstr[20];

        snprintf(portstr, sizeof(portstr), "%d", 8080 + (i % 4));
        if (getaddrinfo("127.0.0.1", portstr, &ai_hints, &aitop) != 0) { /* dst */
            fprintf(stderr, "getaddrinfo failed for port %s\n", portstr);
            exit(1);
        }
        sockets[i] = socket(AF_INET, SOCK_STREAM, 0);

        if (connect(sockets[i], aitop->ai_addr, aitop->ai_addrlen) == -1) {
            fprintf(stderr, "Error connecting %s, for port %s\n",
                strerror(errno), portstr);
        }
#endif

        freeaddrinfo(aitop); /* getaddrinfo results are not plain malloc memory */
    }

    printf("%i sockets created, check memory. Sleeping 10 sec.\n", i);
    sleep(10);

    exit(0);
}

Thursday 23 October 2008

A Gazillion-user Comet Server With libevent, Part 0

Abstract
After reading an inspiring saga about building a Comet server with Erlang and Mochiweb, I inadvertently snowballed into making my own full-blown challenger using C and libevent. The results hint at an order of magnitude increase in performance compared to state of the art open source servers. Comet is very important for Web 2.0 services: it reduces the number of requests the clients make to the backend and brings real time updates. This is a description of the many frustrations and achievements of doing this project. It will be posted in 4 installments, from 0 to 3.

Updates: Typos, spell check, thanks Stef/Nico.

Introduction

A recent post by Richard Jones (of last.fm fame) inspired me to start a Comet server that scales well, reviving old-school skills. In his post A Million-user Comet Application With Mochiweb Part 1 he presents a (mockup) prototype Erlang HTTP server. The goal of that project is to make a functional Comet server.
Gazillion (n.): Informal An indefinitely large number.
There is no working prototype in this first introductory installment, hence the name "Part 0." But there is plenty of code, don't despair.

The Comet Problem

Since Comet is a push technology, most possible solutions rely on keeping an HTTP connection open because the server can't connect back to clients. It's a type of subscription model with some hacks on top. Current open source Comet servers can handle 10,000 to 20,000 simultaneous connections on a stock server. Most are written in Java, Python, and Erlang. In the same article the developers of Liberator, a closed source commercial server (C or C-something I guess), claim to be able to sustain up to a million client updates per second for 10,000 clients. Their site expands on this, hinting it's running a daemon per core with the client side (browser/Javascript) doing load balancing. All these figures were reported by those projects' own developers. I couldn't find any independent benchmark. But they really sound like a good crowd, so I can take their word for it and you should too.

The scalability problem of AJAX, and now Comet, is a major problem for the adoption of web technologies. Imagine the dialog between a Javascript mail application and a server, with the client polling every X seconds:
Client: "Is there anything new?"
Server: "Not yet..."
Client: "Now?"
Server: "No..."
Client: "Are we there yet"
Server: "@#%@$^!" (HTTP 503 Service Unavailable)
Comet fixes that but pays the price of open connections. Word on the street is around 50KB per open connection for Java/Python with careful programming, and don't even think about having that many objects to write to the wire. Garbage Collection optimization can become your own private horror story.

So after all that introduction, this is my own multipart presentation of a (mockup) prototype. It should be an (ANSI/POSIX) C library and server using the fantastic libevent library (hi Niels!), and the popular libpcre regular expression engine library (more in later posts.) The goal is to crash the cool crowd party and show some old-school moves.

Among the many observations RJ makes, in his first installment he mentions:
The resident size of the mochiweb beam process with 10,000 active connections was 450MB - that’s 45KB per connection. CPU utilization on the machine was practically nothing, as expected.
(Edit: But on his second post he takes those numbers down to 8KB per user by tuning memory management. That is still about 8 GB for 1M users and without counting system resources!)
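The per-connection figures quoted above are simple divisions and multiplications, sketched here with decimal units (1 MB = 1,000 KB, 1 GB = 1,000,000 KB) to match RJ's rounding. The function names are mine, and this ignores kernel-side memory entirely.

```c
/* Average resident memory per connection, in KB. Returns 0 on bad input. */
double kb_per_connection(double resident_kb, long connections)
{
    if (connections <= 0)
        return 0.0;
    return resident_kb / (double)connections;
}

/* Projected resident memory in GB for `users` connections,
 * assuming memory grows linearly with the number of users. */
double projected_gb(double kb_per_conn, long users)
{
    return kb_per_conn * (double)users / 1e6;
}
```

With RJ's numbers, kb_per_connection(450000.0, 10000) is 45 KB, and projected_gb(8.0, 1000000) is 8 GB for a million users. The linear growth is an assumption; real servers rarely behave that neatly.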

Scalability: Some Ballpark Math

To have an idea of what to expect we need some ballpark calculations. These crude numbers would affect any kind of approach because they come from the Operating System side. A starting point is finding out what happens with any given program when there are many sockets connected. With this little program we can see:
/*
Copyright (C) 2008 Alejo Sanchez
(Inspired on bench.c by Niels Provos)

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char **argv)
{
    struct rlimit rl; /* to bump up system limits for this process */
    int *pipes; /* pipe (pairs) memory block */
    int *cp; /* traverse pipes */
    int npipes;
    int i, c;

    npipes = 100000; /* default */
    while ((c = getopt(argc, argv, "n:")) != -1) {
        switch (c) {
        case 'n':
            npipes = atoi(optarg);
            break;
        default:
            fprintf(stderr, "Illegal argument \"%c\"\n", c);
            exit(1);
        }
    }

    /* set the limit here so it is initialized even without -n */
    rl.rlim_cur = rl.rlim_max = npipes * 2 + 20;
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
        perror("setrlimit");
        exit(1);
    }

    if ((pipes = (int *) malloc(npipes * 2 * sizeof(int))) == NULL) {
        perror("malloc");
        exit(1);
    }

    for (cp = pipes, i = 0; i < npipes; i++, cp += 2) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, cp) == -1) {
            perror("socketpair");
            exit(1);
        }
    }

    printf("%i socket pairs created, check memory. Sleeping 1 sec.\n", i);
    sleep(1);

    exit(0);
}
A test with 200,000 sockets (note it's 100,000 pairs) showed a process size of 2MB, so far so good. But the command free showed about 210MB less free memory. You might think it is buffers and cache, but those numbers didn't move. Repeated tests gave a very similar number and it correlated with the number of sockets created. The output of free wasn't useful, same with top. A bit of investigation showed these changes in /proc/meminfo:
With the sockets open:
MemTotal: 2041864 kB
MemFree: 1007248 kB
Buffers: 57744 kB
Cached: 400772 kB
[13 uninteresting lines]
Slab: 257196 kB
SReclaimable: 136784 kB
SUnreclaim: 120412 kB

Without the sockets:
MemTotal: 2041864 kB
MemFree: 1225060 kB
Buffers: 57744 kB
Cached: 400772 kB
[13 uninteresting lines]
Slab: 40612 kB
SReclaimable: 34020 kB
SUnreclaim: 6592 kB
The difference is about 217MB, that is around 1KB per connected socket. The Linux kernel takes a large amount of memory for connected sockets, it seems. This memory is initialized and ready for those sockets, but not yet in use. There is a good writeup about the slab allocator.
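The subtraction can be automated by pulling a field out of /proc/meminfo text. This is a minimal sketch: meminfo_kb is a made-up helper that works on a string you have already read from the file, so it can be fed the two listings above.

```c
#include <stdio.h>
#include <string.h>

/* Return the value in kB of `key` (e.g. "Slab:") found at the start of a
 * line in meminfo-formatted text, or -1 if the key is not present. */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        long kb;

        if (strncmp(line, key, klen) == 0 &&
            sscanf(line + klen, "%ld", &kb) == 1)
            return kb;
        line = strchr(line, '\n');      /* jump to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Taking the two Slab values listed above, 257196 kB minus 40612 kB is 216584 kB, which divided by 200,000 sockets is roughly 1.1 KB each.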

OS X Crashing, and Linux too

The operating system imposes some limits on the number of open files. These and other limits can be modified by editing the file /etc/sysctl.conf on both Linux and OS X. The most important for our tests is fs.file-max (kern.maxfiles in OS X) as it controls the global maximum of open files (sockets included.) In Linux there is also a per user limit to set in /etc/security/limits.conf:
# /etc/security/limits.conf
#
#Each line describes a limit for a user in the form:
#
#domain type item value
alecco hard nofile 1001000

# End of file
To reload the sysctl.conf changes run sysctl -p, and log in again for the limits.conf changes to take effect. Then either the program has to raise its soft limit with setrlimit, or you run ulimit -n unlimited in the shell before invoking the program.
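The setrlimit route can be kept independent of any particular socket count by raising the soft limit to whatever hard limit the process already has. A minimal sketch, assuming Linux (other systems may reject a soft limit this high), with a function name I made up:

```c
#include <sys/resource.h>

/* Raise the soft limit on open files to the current hard limit.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int raise_nofile_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;          /* soft limit may go up to hard */
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    return 0;
}
```

This needs no special privileges because it never touches the hard limit. The hard limit itself still has to come from limits.conf as described above.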

Trying to make a million connected pipes, both OS X and Linux crashed or froze. I'm probably missing something here, as RJ claims he got his prototype to do it (according to the post title.) The maximum my Linux could handle was 400,000, and around those numbers some programs start to get killed.

I couldn't find any configuration for the behaviour of the slab allocator. There should be a way to prevent it from eating so much memory, IMHO. But there still isn't clear evidence that it is related to the crashes. Anyway, this is a fixable environment limit; the code clearly can scale as it never gets over 25MB. When RJ gets to explain a bit more perhaps this will just be a non-issue.

With libevent and Linux the scalability of the building blocks should be O(log n) as they show. To get to a more realistic number a test with libevent's HTTP support was needed. In about an hour I wrote a simple 137 line server. To attack it, what better than Apache's ab? Resources now jumped to a maximum of 21MB resident (25MB virtual) for 200,000 working connections, but once again the OS was showing ~450MB of extra memory used (400,000 connected sockets, as ab was running locally.) But, lo and behold, the thingie was starting to take shape.
  • For 10,000 parallel clients hammering it, the server could answer 44,000 requests per second (12,000 on OS X.)
  • For 10,000 parallel clients with a reconnect per request it was still high at 18,000 requests per second!
Not bad for a notebook, huh? On a single CPU core! The numbers were practically the same for repeated tests. Furthermore, libevent HTTP code does memory allocation all over the place and behaved much better than I expected (my hopes were for something around 10,000 req./sec.) Here's the code:

/*
Comet-c, a high performance Comet server.
Copyright (C) 2008 Alejo Sanchez

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <sys/types.h>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <event.h>
#include <evhttp.h>

int debug = 0;

void
generic_request_handler(struct evhttp_request *req, void *arg)
{
    struct evbuffer *evb = evbuffer_new();

    /*
    XXX add here code for managing non-subscription requests
    */
    if (debug)
        fprintf(stderr, "Request for %s from %s\n", req->uri, req->remote_host);

    evbuffer_add_printf(evb, "blah blah");
    evhttp_send_reply(req, HTTP_OK, "Hello", evb);
    evbuffer_free(evb);
    return;
}

/*
 * Bayeux /meta handler
 */

void
bayeux_meta_handler_cb(struct evhttp_request *req, void *arg)
{
    struct evbuffer *evb = evbuffer_new();

    if (debug)
        fprintf(stderr, "Request for %s from %s\n", req->uri, req->remote_host);
    /*
    XXX add here code for managing non-subscription requests
    */
    evbuffer_add_printf(evb, "blah blah");
    evhttp_send_reply(req, HTTP_OK, "Hello", evb);
    evbuffer_free(evb);
    return;
}

void
usage(const char *progname)
{
    fprintf(stderr,
        "%s: [-B] [-d] [-p port] [-l addr]\n"
        "\t -B enable Bayeux support (on)\n"
        "\t -d enable debug (off)\n"
        "\t -l addr local address to bind comet server on (127.0.0.1)\n"
        "\t -p port port number to create comet server on (8080)\n"
        "\t (C) Alejo Sanchez - AGPL\n",
        progname);
}

int
main(int argc, char **argv)
{
    extern char *optarg;
    extern int optind;
    short http_port = 8080;
    char *http_addr = "127.0.0.1";
    struct evhttp *http_server = NULL;
    int c;
    int bayeux = 1;

    /* -B and -d are flags; only -p and -l take an argument */
    while ((c = getopt(argc, argv, "Bdp:l:")) != -1)
        switch(c) {
        case 'B':
            bayeux++;
            break;
        case 'd':
            debug++;
            break;
        case 'p':
            http_port = atoi(optarg);
            if (http_port == 0) {
                usage(argv[0]);
                exit(1);
            }
            break;
        case 'l':
            http_addr = optarg;
            break;
        default:
            usage(argv[0]);
            exit(1);
        }
    argc -= optind;
    argv += optind;

    /* init libevent */
    event_init();

    http_server = evhttp_start(http_addr, http_port);
    if (http_server == NULL) {
        fprintf(stderr, "Error starting comet server on port %d\n",
            http_port);
        exit(1);
    }

    /* XXX bayeux /meta handler */
    if (bayeux)
        evhttp_set_cb(http_server, "/meta", bayeux_meta_handler_cb, NULL);

    /* XXX default handler */
    evhttp_set_gencb(http_server, generic_request_handler, NULL);

    fprintf(stderr, "Comet server started on port %d\n", http_port);
    event_dispatch(); /* Brooom, brooom */

    exit(0); /* UNREACHED ? */
}

A Comet version of the server would surely improve on those amounts, as the client doesn't need to pull from the server: each request is mostly server-side writes.

So that was a nice mockup but it's not a working prototype, yet. A prototype would manage registrations of clients to channels, perhaps using a standard transport protocol, and doing a little bit of this and that.
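To give an idea of what "registrations of clients to channels" could look like, here is a deliberately naive sketch with fixed-size tables. It is not the prototype's code and it is not Bayeux handling. Every name and limit in it (struct channel, struct registry, channel_subscribe, MAX_CHANNELS, MAX_SUBS) is hypothetical.

```c
#include <string.h>

#define MAX_CHANNELS   16
#define MAX_SUBS       32
#define CHANNEL_NAME   64

struct channel {
    char name[CHANNEL_NAME];
    int  subs[MAX_SUBS];      /* client ids, e.g. socket descriptors */
    int  nsubs;
};

struct registry {
    struct channel channels[MAX_CHANNELS];
    int nchannels;
};

/* Find a channel by name, creating it if needed.
 * Returns NULL when the table is full or the name is too long. */
static struct channel *get_channel(struct registry *r, const char *name)
{
    struct channel *ch;

    for (int i = 0; i < r->nchannels; i++)
        if (strcmp(r->channels[i].name, name) == 0)
            return &r->channels[i];
    if (r->nchannels == MAX_CHANNELS || strlen(name) >= CHANNEL_NAME)
        return NULL;
    ch = &r->channels[r->nchannels++];
    strcpy(ch->name, name);             /* length was checked above */
    ch->nsubs = 0;
    return ch;
}

/* Subscribe `client` to the channel `name`. Subscribing twice is a no-op.
 * Returns the channel's subscriber count, or -1 if something is full. */
int channel_subscribe(struct registry *r, const char *name, int client)
{
    struct channel *ch = get_channel(r, name);

    if (ch == NULL)
        return -1;
    for (int i = 0; i < ch->nsubs; i++)
        if (ch->subs[i] == client)
            return ch->nsubs;
    if (ch->nsubs == MAX_SUBS)
        return -1;
    ch->subs[ch->nsubs++] = client;
    return ch->nsubs;
}
```

The linear scans are fine for a demo and hopeless for a gazillion users. A real server would want a hash table keyed by channel name and growable subscriber lists.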

A report on the state of the art of Comet servers shows the most popular transport is Bayeux. So this prototype can't skip that.

So I should just plug in one of the JSON C parsers and it should be OK, right? Wrong again. Just like Richard Dawkins described this situation:
[So, programming was] a classic addiction: prolonged frustration, occasionally rewarded by a briefly glowing fix of achievement. It was that pernicious "just one more push to see what's over the next mountain and then I'll call it a day" syndrome. It was a lonely vice, interfering with sleeping, eating, useful work and healthy human intercourse. I'm glad it's over and I won't start up again. Except ... perhaps one day, just a little ...
Let's just say those JSON implementations didn't live up to my expectations. But, what a time waster...

So, if we got this far, let's make a little Bayeux parser! How bad could it be?

Coming up: First prototype, trying to do decent parsing in C without killing performance (right), more analysis on the original saga by Richard Jones, and I hope, please, some, sleep... But, well, I'm typing and it's just a matter of alt-tab, so perhaps, let's see just a little bit more... Just the one...

Alecco

PS: Sorry about posting licenses, it is for the lack of warranty part mostly.
 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.