4 <title>Blërg Documentation</title>
5 <link rel="stylesheet" href="/css/doc.css">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
12 Blërg is a minimalistic tagged text document database engine that also
13 pretends to be a <a href="/">microblogging system</a>. It is designed
to efficiently store small (&lt; 64K) pieces of text in a way that they
15 can be quickly retrieved by record number or by querying for tags
16 embedded in the text. Its native interface is HTTP — Blërg comes
17 as either a standalone HTTP server, or a CGI. Blërg is written in pure
21 <li><a href="#running">Running Blërg</a>
23 <li><a href="#getting_the_source">Getting the source</a></li>
24 <li><a href="#requirements">Requirements</a></li>
25 <li><a href="#configuring">Configuring</a></li>
26 <li><a href="#building">Building</a></li>
27 <li><a href="#installing">Installing</a></li>
30 <li><a href="#api">API</a>
32 <li><a href="#api_definitions">API Definitions</a></li>
<li><a href="#api_authorization">Authorization</a></li>
34 <li><a href="#api_create">/create - create a new user</a></li>
35 <li><a href="#api_login">/login - log in</a></li>
36 <li><a href="#api_logout">/logout - log out</a></li>
37 <li><a href="#api_put">/put - add a new record</a></li>
38 <li><a href="#api_get">/get/(user), /get/(user)/(start record)-(end record) - get records for a user</a></li>
39 <li><a href="#api_info">/info/(user) - Get information about a user</a></li>
40 <li><a href="#api_tag">/tag/(#|H|@)(tagname) - Retrieve records containing tags</a></li>
41 <li><a href="#api_subscribe">/subscribe/(user) - Subscribe to a user's updates</a></li>
42 <li><a href="#api_feed">/feed - Get updates for subscribed users</a></li>
43 <li><a href="#api_status">/status, /status/(user) - Get or clear general and user-specific status</a></li>
44 <li><a href="#api_passwd">/passwd - Change a user's password</a></li>
47 <li><a href="#libraries">Libraries</a>
49 <li><a href="#lib_c">C</a></li>
50 <li><a href="#lib_perl">Perl</a></li>
53 <li><a href="#design">Design</a>
55 <li><a href="#motivation">Motivation</a></li>
56 <li><a href="#web_app_stack">Web App Stack</a></li>
57 <li><a href="#database">Database</a></li>
58 <li><a href="#subscriptions">Subscriptions</a></li>
59 <li><a href="#problems">Problems and Future Work</a></li>
64 <h2><a name="running">Running Blërg</a></h2>
66 <h3><a name="getting_the_source">Getting the source</a></h3>
68 <p>There's no stable release yet, but you can get everything currently
69 running on blerg.dominionofawesome.com by cloning the git repository at
70 http://git.bytex64.net/blerg.git.
72 <h3><a name="requirements">Requirements</a></h3>
74 <p>Blërg has varying requirements depending on how you want to run it
75 — as a standalone HTTP server, or as a CGI. You will need:
<li><a href="http://lloyd.github.com/yajl/">yajl</a> &gt;= 1.0.0 and &lt; 2
79 (yajl is a JSON parser/generator written in C which, by some twisted
80 sense of humor, requires ruby to compile)</li>
<p>As a standalone HTTP server, you will also need:
86 <li><a href="http://www.gnu.org/software/libmicrohttpd/">GNU libmicrohttpd</a> >= 0.9.3</li>
89 <p>Or, as a CGI, you will need:
92 <li><a href="http://www.newbreedsoftware.com/cgi-util/download/">cgi-util</a> >= 2.2.1</li>
95 <h3><a name="configuring">Configuring</a></h3>
97 <p>Edit libs.mk and put in the paths where you can find headers and
98 libraries for the above requirements.
100 <p>Also, further apologies to BSD folks — I've probably committed
101 several unconscious Linux-isms. It would not surprise me if the
102 makefile refuses to work with BSD make, or if it fails to compile even
103 with gmake. If you have patches or suggestions on how to make Blërg
104 more portable, I'd be happy to hear them.
106 <h3><a name="building">Building</a></h3>
108 <p>At this point, it should be gravy. Type 'make' and in a few seconds,
109 you should have <code>blerg.httpd</code>, <code>blerg.cgi</code>,
110 <code>rss.cgi</code>, and <code>blergtool</code>. Each of those can be
111 made individually as well, if you, for example, don't want to install
112 the prerequisites for <code>blerg.httpd</code> or
113 <code>blerg.cgi</code>.
115 <p><strong>NOTE</strong>: blerg.httpd is deprecated and will not be
116 updated with new features.
118 <h3><a name="installing">Installing</a></h3>
120 <p>While it's not strictly required, Blërg will be easier to set up if
121 you configure it to work from the root of your website. For this
reason, it's better to use a subdomain (e.g., blerg.yoursite.com is
123 easier than yoursite.com/blerg/). If you do want to put it in a
124 subdirectory, you will have to modify <code>www/js/blerg.js</code> and
125 change baseURL at the top as well as a number of other self-references
126 in that file and <code>www/index.html</code>.
128 <p>You cannot serve the database and client from different domains
(e.g., yoursite.com vs othersite.net, or even foo.yoursite.com and
130 bar.yoursite.com). This is a requirement of the web browser — the
131 same origin policy will not allow an AJAX request to travel across
132 domains (though you can probably get around it these days with <a
133 href="http://en.wikipedia.org/wiki/Cross-origin_resource_sharing">Cross-origin
134 resource sharing</a>).
136 <h4>For straight CGI with Apache</h4>
138 <p>Copy the files in www/ to the root of your web server. Copy
139 <code>blerg.cgi</code> to your web server. Included in www-configs/ is
140 a .htaccess file for Apache that will rewrite the URLs. If you need to
141 call the CGI something other than <code>blerg.cgi</code>, the .htaccess
142 file will need to be modified.
146 <p>Nginx can't run CGI directly, and there's currently no FastCGI
147 version of Blërg, so you will have to run it under some kind of CGI to
148 FastCGI gateway, like the one described <a
149 href="http://wiki.nginx.org/SimpleCGI">here on the nginx wiki</a>. This
150 pretty much destroys the performance of Blërg, but it's all we've got
153 <h4>The extra RSS CGI</h4>
<p>There is an optional RSS CGI (<code>rss.cgi</code>) that will serve
RSS feeds for users. Install this like <code>blerg.cgi</code> above.
As of 1.9.0, this is a Perl FastCGI script, so you will have to make
sure the Perl libraries are available to it. A good way of doing that
159 is to install to an environment directory, as described below.
161 <h4>Installing to an environment directory</h4>
163 <p>The Makefile has support for installing Blërg into a directory that
164 includes tools, libraries, and configuration snippets for shell and web
servers. Use it as <code>make install-environment
ENV_DIR=&lt;directory&gt;</code>. Under &lt;directory&gt;/etc will be
167 a shell script that sets environment variables, and configuration
168 snippets for nginx and apache to do the same. This should make it
169 somewhat easier to use Blërg in a self-contained way.
171 <p>For example, this will install Blërg to an environment directory
172 inside your home directory:
174 <pre>user@devhost:~/blerg$ make install-environment ENV_DIR=$HOME/blerg-env
176 user@devhost:~/blerg$ . ~/blerg-env/etc/env.sh
179 <p>Then, you will be able to run tools like <code>blergtool</code>, and
180 it will operate on data inside <code>~/blerg-env/data</code>. Likewise,
182 <code>/home/user/blerg-env/etc/nginx-fastcgi-vars.conf</code> or
183 <code>/home/user/blerg-env/etc/apache-setenv.conf</code> in your
webserver to make the CGI/FastCGI scripts do the same thing.
187 <h2><a name="api">API</a></h2>
189 <p>Blërg's API was designed to be as simple as possible. Data sent from
190 the client is POSTed with the application/x-www-form-urlencoded
191 encoding, and a successful response is always JSON. The API endpoints
192 will be described as though the server were serving requests from the
195 <h3><a name="api_definitions">API Definitions</a></h3>
197 <p>On failure, all API calls return either a standard HTTP error
198 response, like 404 Not Found if a record or user doesn't exist, or a 200
199 response with a 'JSON failure', which will look like this:
201 <pre>{"status": "failure"}</pre>
203 <p>Blërg doesn't currently explain <i>why</i> there is a failure, and
204 I'm not sure it ever will.
206 <p>On success, you'll either get some JSON relating to your request (for
207 /get, /tag, or /info), or a 'JSON success' response (for /create, /put,
208 /login, or /logout), which looks like this:
210 <pre>{"status": "success"}</pre>
212 <p>For the CGI backend, you may get a 500 error if something goes wrong.
213 For the HTTP backend, you'll get nothing (since it will have crashed),
214 or maybe a 502 Bad Gateway if you have it behind another web server.
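<p>The success and failure rules above can be folded into one client-side
check. This is only a sketch in Python: the function name
<code>call_succeeded</code> is made up for illustration, and it applies to
the calls that answer with a bare status object (/create, /put, /login,
/logout), not to /get, /tag, or /info.</p>

<pre>
import json

def call_succeeded(http_status, body):
    """Classify the reply to a status-style call such as /create or /put."""
    if http_status != 200:
        return False          # standard HTTP error (403, 404, 500, 502, ...)
    try:
        reply = json.loads(body)
    except ValueError:
        return False          # a 200 whose body is not JSON at all
    return isinstance(reply, dict) and reply.get("status") == "success"
</pre>

<p>Since a failure carries no reason, a boolean is about all a client can
extract from it.</p>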
216 <p>All usernames must be 32 characters or less. Usernames must contain
217 only the ASCII characters 0-9, A-Z, a-z, underscore (_), and hyphen (-).
218 Passwords can be at most 64 bytes, and have no limits on characters (but
219 beware: if you have a null in the middle, it will stop checking there
220 because I use <code>strncmp(3)</code> to compare).
222 <p>Tags must be 64 characters or less, and can contain only the ASCII
223 characters 0-9, A-Z, a-z, underscore (_), and hyphen (-).
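<p>A client can check these limits before bothering the server. The Python
sketch below is not Blërg code; the function names are hypothetical, and it
assumes that empty names and empty passwords are rejected and that passwords
are sent as UTF-8, none of which is stated above.</p>

<pre>
import re

# Allowed characters: 0-9, A-Z, a-z, underscore, and hyphen.
NAME_RE = re.compile(r"[0-9A-Za-z_-]+")

def valid_name(name, max_len):
    """True if name is non-empty, short enough, and uses only allowed characters."""
    return len(name) in range(1, max_len + 1) and NAME_RE.fullmatch(name) is not None

def valid_username(name):
    return valid_name(name, 32)

def valid_tag(tag):
    return valid_name(tag, 64)

def valid_password(password):
    """At most 64 bytes, and no NUL byte, since the comparison stops at a NUL."""
    raw = password.encode("utf-8")      # assumed encoding
    return len(raw) in range(1, 65) and b"\x00" not in raw
</pre>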
225 <h3><a name="api_authorization">Authorization</a></h3>
227 <p>As the result of a successful <a href="#api_login">login</a>, the server
228 will send back a cookie named <code>auth</code>. This cookie authorizes
229 restricted requests, and must be sent for any API endpoint marked <span
230 class="feature">authorization</span>, or else you will get a 403 Forbidden
231 response. The cookie format looks like:
233 auth=username/abcdef0123456789abcdef0123456789
235 That is a username, a forward slash, and 32 hexadecimal digits which denote the
236 "token" identifying the session. On logout, the server will invalidate the
237 token and expire the cookie.
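<p>If a client needs to pull the username back out of the cookie, the format
is easy to take apart. A minimal Python sketch, with a hypothetical helper
name, assuming the token may use either upper or lower case hex digits:</p>

<pre>
import re

# username, a forward slash, then exactly 32 hexadecimal digits
AUTH_RE = re.compile(r"([0-9A-Za-z_-]{1,32})/([0-9A-Fa-f]{32})")

def parse_auth_cookie(value):
    """Return (username, token), or None if value does not fit the format."""
    match = AUTH_RE.fullmatch(value)
    if match is None:
        return None
    return match.group(1), match.group(2)
</pre>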
<h3><a name="api_create">/create</a> - create a new user</h3>
241 <p>To create a user, POST to /create with <code>username</code> and
242 <code>password</code> parameters for the new user. The server will
243 respond with JSON failure if the user exists, or if the user can't be
244 created for some other reason. The server will respond with JSON
245 success if the user is created.
<h3><a name="api_login">/login</a> - log in</h3>
249 <p>POST to /login with the <code>username</code> and
250 <code>password</code> parameters for an existing user. The server will
251 respond with JSON failure if the user does not exist or if the password
252 is incorrect. On success, the server will respond with JSON success,
253 and will set a cookie named 'auth' that must be sent by the client when
254 accessing restricted API functions (See <a
255 href="#api_authorization">Authorization</a> above).
<h3><a name="api_logout">/logout</a> - log out</h3>
258 <div class="feature">authorization</div>
260 <p>POST to /logout. The server will respond with JSON failure if the
261 user does not exist or if the request is unauthorized. The server will
262 respond with JSON success after the user is successfully logged out.
<h3><a name="api_put">/put</a> - add a new record</h3>
265 <div class="feature">authorization</div>
267 <p>POST to /put with a <code>data</code> parameter. The server will
268 respond with JSON failure if the request is unauthorized, if the user
269 doesn't exist, or if <code>data</code> contains more than 65535 bytes
270 <i>after</i> URL decoding. The server will respond with JSON success
271 after the record is successfully added.
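<p>For illustration, here is roughly what a /put request looks like when
built with Python's standard library. The server address is a placeholder,
<code>build_put_request</code> is a hypothetical name, and the sketch assumes
the text is sent as UTF-8. It only builds the request and does not send
it.</p>

<pre>
import urllib.parse
import urllib.request

BASE_URL = "http://blerg.example.com"   # placeholder server address
MAX_RECORD_BYTES = 65535

def build_put_request(auth_cookie, text):
    """Build (but do not send) a POST to /put carrying the auth cookie."""
    if len(text.encode("utf-8")) not in range(MAX_RECORD_BYTES + 1):
        raise ValueError("record is larger than 65535 bytes")
    body = urllib.parse.urlencode({"data": text}).encode("ascii")
    return urllib.request.Request(
        BASE_URL + "/put",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Cookie": "auth=" + auth_cookie,
        },
        method="POST",
    )
</pre>

<p>Passing the result to <code>urllib.request.urlopen</code> would send it.
The size check is done on the decoded text because that is what the server
limits.</p>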
<h3><a name="api_get">/get/(user), /get/(user)/(start record)-(end record)</a> - get records for a user</h3>
275 <p>A GET request to /get/(user), where (user) is the user desired, will
276 return the last 50 records for that user in a list of objects. The
277 record objects look like this:
282 "timestamp":1294309438,
283 "data":"eatin a taco on fifth street"
287 <p><code>record</code> is the record number, <code>timestamp</code> is
288 the UNIX epoch timestamp (i.e., the number of seconds since Jan 1 1970
289 00:00:00 GMT), and <code>data</code> is the content of the record. The
290 record number is sent as a string because while Blërg supports record
291 numbers up to 2<sup>64</sup> - 1, Javascript uses floating point for all
292 its numbers, and can only support integers without truncation up to
293 2<sup>53</sup>. This difference is largely academic, but I didn't want
294 this problem to sneak up on anyone who is more insane than I am. :]
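<p>To make the precision problem concrete, here is a small Python
demonstration. The record shown is made up (it uses the largest possible
record number), and Python's <code>float</code> stands in for a Javascript
number, since both are 64-bit doubles.</p>

<pre>
import json

# A record as /get might return it; the record number arrives as a string.
raw = '{"record": "18446744073709551615", "timestamp": 1294309438, "data": "eatin a taco on fifth street"}'
record = json.loads(raw)

# Python integers are arbitrary precision, so the full 64-bit value survives.
record_number = int(record["record"])

# A double cannot tell 2**53 and 2**53 + 1 apart.
doubles_collide = float(2**53) == float(2**53 + 1)
</pre>

<p>Had the server sent a bare number, a Javascript client would have rounded
it before any code got to look at it.</p>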
296 <p>The second form, /get/(user)/(start record)-(end record), retrieves a
297 specific range of records, from (start record) to (end record)
298 inclusive. You can retrieve at most 100 records this way. If (end
299 record) - (start record) specifies more than 100 records, or if the
300 range specifies invalid records, or if the end record is before the
301 start record, the server will respond with JSON failure.
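<p>A client can reject a bad range before sending it. The sketch below is
hypothetical Python, and it makes two assumptions the text above does not
settle: that record numbers start at zero, and that the limit of 100 counts
both endpoints.</p>

<pre>
MAX_RANGE = 100   # most records one ranged /get may return

def get_range_path(user, start, end):
    """Return the /get path for records start..end inclusive, or raise ValueError."""
    if start not in range(0, end + 1):
        raise ValueError("start must be non-negative and not after end")
    if end - start + 1 not in range(1, MAX_RANGE + 1):
        raise ValueError("a range may cover at most 100 records")
    return "/get/{}/{}-{}".format(user, start, end)
</pre>

<p>Under those assumptions <code>get_range_path("alice", 10, 109)</code>
covers exactly 100 records and passes, while an end of 110 is refused.</p>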
<h3><a name="api_info">/info/(user)</a> - Get information about a user</h3>
305 <p>A GET request to /info/(user) will return a JSON object with
306 information about the user (currently only the number of records). The
307 info object looks like this:
311 "record_count": "544"
315 <p>Again, the record count is sent as a string for 64-bit safety.
<h3><a name="api_tag">/tag/(#|H|@)(tagname)</a> - Retrieve records containing tags</h3>
319 <p>A GET request to this endpoint will return the last 50 records
320 associated with the given tag. The first character is either # or H for
321 hashtags, or @ for mentions (I call them ref tags). You should URL
322 encode the # or @, lest some servers complain at you. The H alias for #
323 was created because Apache helpfully strips the fragment of a URL
324 (everything from the # to the end) before handing it off to the CGI,
325 even if the hash is URL encoded. The record objects also contain an
326 extra <code>author</code> field, like so:
332 "timestamp":1294555793,
333 "data":"I'm taking #garfield to the vet."
<p>There is currently no support for getting more than 50 records for a tag, but /tag
338 will probably mutate to work like /get.
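<p>The encoding advice for /tag can be sketched in a few lines of Python.
The function name and its <code>use_h_alias</code> switch are hypothetical,
there only to show both spellings of a hashtag lookup.</p>

<pre>
import urllib.parse

def tag_path(kind, name, use_h_alias=False):
    """Build the /tag path; kind is "#" for hashtags or "@" for mentions."""
    if kind not in ("#", "@"):
        raise ValueError("kind must be '#' or '@'")
    if kind == "#" and use_h_alias:
        prefix = "H"                                  # alias for servers that drop the hash
    else:
        prefix = urllib.parse.quote(kind, safe="")    # "%23" or "%40"
    return "/tag/" + prefix + name
</pre>

<p>So a lookup for #garfield becomes /tag/%23garfield, or /tag/Hgarfield
behind Apache.</p>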
<h3><a name="api_subscribe">/subscribe/(user)</a> - Subscribe to a
user's updates</h3>
342 <div class="feature">authorization</div>
344 <p>POST to /subscribe/(user) with a <code>subscribed</code> parameter
345 that is either "true" or "false", indicating whether (user) should be
346 subscribed to or not. The server will respond with JSON failure if the
347 request is unauthorized or if the user doesn't exist. The server will
348 respond with JSON success after the subscription request is successfully
351 <h3><a name="api_feed">/feed</a> - Get updates for subscribed users</h3>
352 <div class="feature">authorization</div>
354 <p>POST to /feed, with a <code>username</code> parameter and an auth
355 cookie. The server will respond with a JSON list of the last 50 updates
356 from all subscribed users, in reverse chronological order. Fetching
357 /feed does not reset the new message count returned from /status. To do
358 that, look at <a href="#api_status">POST /status</a>.
360 <p>NOTE: subscription notifications are only stored while subscriptions
361 are active. Any records inserted before or after a subscription is
362 active will not show up in /feed.
<h3><a name="api_status">/status, /status/(user)</a> - Get or clear
general and user-specific status</h3>
366 <div class="feature">authorization</div>
368 <p>GET to /status to get information about your account. It tells you
369 the number of new subscription records since the last time the
370 subscription counter was reset, and a flag for whether the account was
371 mentioned since the last time the mention flag was cleared. The server
372 will respond with a JSON object:
381 <p>POST to /status with a <code>clear</code> parameter that is either
382 "feed" or "mentioned" to reset either the subscription counter or the
383 mention flag, respectively. There is not currently a way to clear both
384 with a single request. The server will respond with JSON success.
386 <p>GET to /status/(user) to get subscription information for a
387 particular user. The server will respond with a simple JSON object:
393 <p>The value of "subscribed" will be either true or false depending on
394 the subscription status.
<h3><a name="api_passwd">/passwd</a> - Change a user's password</h3>
397 <div class="feature">authorization</div>
399 <p>POST to /passwd with <code>password</code> and
400 <code>new_password</code> parameters to change the user's password. For
401 extra protection, changing a password requires sending the user's
402 current password in the <code>password</code> parameter. If
403 authentication is successful and the password matches, the user's
404 password is set to <code>new_password</code> and the server responds
If the password doesn't match, or either <code>password</code> or
<code>new_password</code> is missing, the server returns JSON failure.
410 <h2><a name="libraries">Libraries</a></h2>
412 <h3><a name="lib_c">C</a></h3>
414 <p>Most of Blërg's core functionality is packaged in a static library
415 called <code>blerg.a</code>. It's not designed to be public or
installed with <code>make install-environment</code>, but it should be relatively
417 straightforward to use it in C programs. Look at the headers under the
418 <code>database</code> directory.
420 <p>A secondary library called <code>blerg_auth.a</code> handles the
421 authentication layer of Blërg. To use it, look at
422 <code>common/auth.h</code>.
424 <h3><a name="lib_perl">Perl</a></h3>
426 <p>As of 1.9.0, Blërg includes a perl library called
427 <code>Blerg::Database</code>. It wraps the core and authentication
428 functionality in a perlish interface. The module has its own POD
429 documentation, which you can read with your favorite POD reader, from
430 the manual installed in an environment directory, or in HTML <a
431 href="perl/Blerg-Database.html">here</a>.
433 <h2><a name="design">Design</a></h2>
435 <h3><a name="motivation">Motivation</a></h3>
437 <p>Blërg was created as the result of a thought experiment: "What if
438 Twitter didn't need thousands of servers? What if its millions of users
439 could be handled by a single highly efficient server?" This is probably
440 an unreachable goal due to the sheer amount of I/O, but we can certainly
441 try to do better. Blërg was thus designed as a system with very simple
445 <li>Store and fetch small chunks of text efficiently</li>
446 <li>Create fast indexes for hash tags and @ mentions</li>
<li>Provide an HTTP interface web apps can use</li>
450 <p>And to further simplify, I didn't bother handling deletes, full text
451 search, or more complicated tag searches. Blërg only does the basics.
453 <h3><a name="web_app_stack">Web App Stack</a></h3>
455 <table class="pizzapie">
456 <tr><th>Classical model</th></tr>
458 <td style="background-color: blue; color: white"><b>Client App</b><br>HTML/Javascript</td>
461 <td style="background-color: #9F0000; color: white"><b>Webserver</b><br>Apache, lighttpd, nginx, etc.</td>
464 <td style="background-color: #009F00; color: white"><b>Server App</b><br>Python, Perl, Ruby, etc.</td>
467 <td style="background-color: #404040; color: white"><b>Database</b><br>MySQL, PostgreSQL, MongoDB, CouchDB, etc.</td>
471 <p>Modern web applications have at least a four-layer approach. You
472 have the client-side browser app, the web server, the server-side
473 application, and the database. Your data goes through a lot of layers
474 before it actually resides on disk somewhere (or, as they're calling it
475 these days, "The Cloud" *waves hands*). Each of those layers requires
476 some amount of computing resources, so to increase throughput, we must
477 make the layers more efficient, or reduce the number of layers.
479 <table class="pizzapie">
480 <tr><th>Blërg model</th></tr>
482 <td style="background-color: blue; color: white"><b>Blërg Client App</b><br>HTML/Javascript</td>
485 <td style="background-color: #404040; color: white"><b>Blërg Database</b><br>Fuckin' hardcore C and shit</td>
489 <p>Blërg does both by smashing the last two or three layers into one
490 application. Blërg can be run as either a standalone web server
491 (currently deprecated because maintaining two versions is hard), or as a
492 CGI (FastCGI support is planned, but I just don't care right now). Less
493 waste, more throughput. As a consequence of this, the entirety of the
494 application logic that the user sees is implemented in the client app in
495 Javascript. That's why all the URLs have #'s — the page is loaded
496 once and switched on the fly to show different views, further reducing
load on the server. Even parsing hash tags and URLs is done in client
500 <p>The API is simple and pragmatic. It's not entirely RESTful, but is
501 rather designed to work well with web-based front-ends. Client data is
502 always POSTed with the usual application/x-www-form-urlencoded encoding,
503 and server data is always returned in JSON format.
505 <p>The HTTP interface to the database idea has already been done by <a
506 href="http://couchdb.apache.org/">CouchDB</a>, though I didn't know that
507 until after I wrote Blërg. :)
509 <h3><a name="database">Database</a></h3>
511 <p>I was impressed by <a
512 href="http://www.varnish-cache.org/">varnish</a>'s design, so I decided
513 early in the design process that I'd try out mmaped I/O. Each user in
Blërg has their own database, which consists of a metadata file, and one
515 or more data and index files. The data and index files are memory
516 mapped, which hopefully makes things more efficient by letting the OS
517 handle when to read from disk (or maybe not — I haven't
518 benchmarked it). The index files are preallocated because I believe
it's more efficient than writing to them 40 bytes at a time as records are
520 added. The database's limits are reasonable:
522 <table class="statistics">
523 <tr><td>maximum record size</td><td>65535 bytes</td></tr>
524 <tr><td>maximum number of records per database</td><td>2<sup>64</sup> - 1</td></tr>
525 <tr><td>maximum number of tags per record</td><td>1024</td></tr>
528 <p>So as not to create grossly huge and unwieldy data files, the
529 database layer splits data and index files into many "segments"
530 containing at most 64K entries each. Those of you doing some quick
531 mental math may note that this could cause a problem on 32-bit machines
532 — if a full segment contains entries of the maximum length, you'll
533 have to mmap 4GB (32-bit Linux gives each process only 3GB of virtual
534 address space). Right now, 32-bit users should change
535 <code>RECORDS_PER_SEGMENT</code> in <code>config.h</code> to something
536 lower like 32768. In the future, I might do something smart like not
537 mmaping the whole fracking file.
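<p>The mental math, written out in Python. The only assumption is that "64K
entries" means 65536; the names are made up for the example.</p>

<pre>
MAX_RECORD_BYTES = 65535          # maximum record size from the table above
DEFAULT_PER_SEGMENT = 65536       # "64K entries", assumed to mean 2**16
REDUCED_PER_SEGMENT = 32768       # suggested value for 32-bit machines

def worst_case_segment_bytes(records_per_segment):
    """Size of a data segment if every record is the maximum length."""
    return records_per_segment * MAX_RECORD_BYTES

GIB = 2**30
default_gib = worst_case_segment_bytes(DEFAULT_PER_SEGMENT) / GIB   # just under 4
reduced_gib = worst_case_segment_bytes(REDUCED_PER_SEGMENT) / GIB   # just under 2
</pre>

<p>Halving the segment size brings the worst case down to just under 2GB,
which fits inside a 3GB address space.</p>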
539 <table class="bitstructure">
540 <tr><th>Record Index Structure</th></tr>
541 <tr><td class="B4">offset (32-bit integer)</td></tr>
542 <tr><td class="B2">length (16-bit integer)</td></tr>
543 <tr><td class="B2">flags (16-bit integer)</td></tr>
544 <tr><td class="B4">timestamp (32-bit integer)</td></tr>
547 <p>A record is stored by first appending the data to the data file, then
548 writing an entry in the index file containing the offset and length of
549 the data, as well as the timestamp. Since each index entry is fixed
550 length, we can find the index entry simply by multiplying the record
551 number we want by the size of the index entry. Upshot: constant-time
552 random-access reads and constant-time writes. As an added bonus,
553 because we're using append-only files, we get lockless reads.
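<p>Here is the same idea as a Python sketch, using in-memory byte arrays in
place of the memory-mapped files. It assumes an index entry is exactly the
four fields in the table above, in native byte order with no padding; the
real on-disk entry may differ. Segments are ignored, and the function names
are made up.</p>

<pre>
import struct

# Assumed layout: offset, length, flags, timestamp (12 bytes, no padding).
INDEX_ENTRY = struct.Struct("=IHHI")

def append_record(data_file, index_file, payload, timestamp):
    """Append payload to the data buffer, then write its index entry."""
    offset = len(data_file)
    data_file.extend(payload)
    index_file.extend(INDEX_ENTRY.pack(offset, len(payload), 0, timestamp))

def read_record(data_file, index_file, record_number):
    """Constant-time lookup: the entry lives at record_number * entry size."""
    position = record_number * INDEX_ENTRY.size
    offset, length, _flags, timestamp = INDEX_ENTRY.unpack_from(index_file, position)
    return timestamp, bytes(data_file[offset:offset + length])
</pre>

<p>Both buffers are <code>bytearray</code> objects here. Neither function
searches for anything; the read is one multiplication and two slices.</p>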
555 <table class="bitstructure">
556 <tr><th>Tag Structure</th></tr>
557 <tr><td class="B32">username (32 bytes)</td></tr>
558 <tr><td class="B8">record number (64-bit integer)</td></tr>
561 <p>Tags are handled by a separate set of indices, one per tag. When a
562 record is added, it is scanned for tags, then entries are appended to
563 each tag index for the tags found. Each index record simply stores the
564 user and record number. Tags are searched by opening the tag file,
565 reading the last 50 entries or so, and then reading all the records
566 listed. Voila, fast tag lookups.
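<p>A tag index can be sketched the same way in Python. The layout is taken
from the table above (a 32-byte username followed by a 64-bit record number,
40 bytes in all), assuming native byte order and no padding. The function
names, the newest-first ordering, and the in-memory buffer are choices made
for the example.</p>

<pre>
import struct

# Assumed layout: 32-byte username, then a 64-bit record number.
TAG_ENTRY = struct.Struct("=32sQ")

def append_tag_entry(tag_index, username, record_number):
    """Append one (user, record) entry; struct pads the name with NUL bytes."""
    tag_index.extend(TAG_ENTRY.pack(username.encode("ascii"), record_number))

def last_entries(tag_index, count=50):
    """Return up to count (username, record_number) pairs, newest first."""
    total = len(tag_index) // TAG_ENTRY.size
    result = []
    for i in range(total - 1, max(total - count, 0) - 1, -1):
        name, record_number = TAG_ENTRY.unpack_from(tag_index, i * TAG_ENTRY.size)
        result.append((name.rstrip(b"\x00").decode("ascii"), record_number))
    return result
</pre>

<p>Each pair returned would then be looked up in that user's own record
index to fetch the actual text.</p>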
568 <p>At this point, you're probably thinking, "Is that it?" Yep, that's
569 it. Blërg isn't revolutionary, it's just a system whose requirements
570 were pared down until the implementation could be made dead simple.
572 <p>Also, keeping with the style of modern object databases, I haven't
573 implemented any data safety (har har). Blërg does not sync anything to
574 disk before returning success. This should make Blërg extremely fast,
575 and totally unreliable in a crash. But that's the way you want it,
578 <h3><a name="subscriptions">Subscriptions</a></h3>
580 <p>When I first started thinking about the idea of subscriptions, I
581 immediately came up with the naïve solution: keep a list of users to
582 which users are subscribed, then when you want to get updates, iterate
583 over the list and find the last entries for each user. And that would
584 work, but it's kind of costly in terms of disk I/O. I have to visit
585 each user in the list, retrieve their last few entries, and store them
586 somewhere else to be sorted later. And worse, that computation has to
587 be done every time a user checks their feed. As the number of users and
588 subscriptions grows, that will become a problem.
590 <p>So instead, I thought about it the other way around. Instead of doing
591 all the work when the request is received, Blërg tries to do as much as
592 possible by "pushing" updates to subscribed users. You can think of it
593 kind of like a mail system. When a user posts new content, a
594 notification is "sent" out to each of that user's subscribers. Later,
595 when the subscribers want to see what's new, they simply check their
596 mailbox. Checking your mailbox is usually a lot more efficient than
597 going around and checking everyone's records yourself, even with the
598 overhead of the "mailman."
600 <p>The "mailbox" is a subscription index, which is identical to a tag
601 index, but is a per-user construct. When a user posts a new record, a
602 subscription index record is written for every subscriber. It's a
603 similar amount of I/O as the naïve version above, but the important
604 difference is that it's only done once. Retrieving records for accounts
605 you're subscribed to is then as simple as reading your subscription
606 index and reading the associated records. This is hopefully less I/O
607 than the naïve version, since you're reading, at most, as many accounts
608 as you have records in the last N entries of your subscription index,
609 instead of all of them. And as an added bonus, since subscription index
610 records are added as posts are created, the subscription index is
611 automatically sorted by time! To support this "mail" architecture, we
612 also keep a list of subscribers and subscrib...ees in each account.
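<p>The "mail" model fits in a few lines of Python. This is a toy with
in-memory dictionaries, not how Blërg stores anything, and every name in it
is made up. It does show the two properties claimed above: the work happens
once at post time, and each mailbox ends up in time order for free.</p>

<pre>
from collections import defaultdict

subscribers = defaultdict(set)    # author: set of users subscribed to them
mailboxes = defaultdict(list)     # user: list of (author, record_number)
record_counts = defaultdict(int)  # author: number of records posted so far

def subscribe(user, author):
    subscribers[author].add(user)

def post(author):
    """Add a record and push one notification to every current subscriber."""
    record_number = record_counts[author]
    record_counts[author] += 1
    for user in subscribers[author]:
        mailboxes[user].append((author, record_number))
    return record_number

def feed(user, count=50):
    """Newest first; the mailbox is already in posting order, so no sort is needed."""
    return mailboxes[user][-count:][::-1]
</pre>

<p>Note that a record posted before someone subscribes never lands in their
mailbox, which matches the behavior of /feed.</p>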
614 <h3><a name="problems">Problems, Caveats, and Future Work</a></h3>
616 <p>Blërg probably doesn't actually work like Twitter because I've never
617 actually had a Twitter account.
619 <p>I couldn't find a really good fast HTTP server library.
620 Libmicrohttpd is small, but it's focused on embedded applications, so it
621 often eschews speed for small memory footprint. This is especially
622 apparent when you watch it chew through a POST request 300 bytes at a
623 time even though you've specified a buffer size of 256K.
624 <code>blerg.httpd</code> is still pretty fast this way — on my
626 href="http://www.joedog.org/index/siege-home">siege</a> says it serves a
627 690-byte /get request at about 945 transactions per second, average
628 response time 0.05 seconds, with 100 concurrent accesses — but a
629 fast HTTP server implementation could knock this out of the park.
631 <p>Libmicrohttpd is also really difficult to work with. If you look at
632 the code, <code>http_blerg.c</code> is about 70% longer than
633 <code>cgi_blerg.c</code> simply because of all the iterator hoops I had
634 to jump through to process POST requests. And if you can believe it, I
635 wrote <code>http_blerg.c</code> first. If I'd done it the other way
636 around, I probably would have given up on libmicrohttpd. :-/
638 <p>The data structures written to disk are dependent on the size and
639 endianness of the primitive data types on your architecture and OS.
640 This means that the databases are not portable. A dump/import tool is
641 probably the easiest way to handle this.
643 <p>I do want to make a FastCGI version eventually, and this will
644 probably be a rather simple modification of cgi_blerg.
646 <p>Implementing deletes will be... interesting. There is room in the
647 record index for a 'deleted' flag, but the problem is deleting any tags
648 referenced in the data. This requires rescanning the record content and
649 putting a 'deleted' flag in the tag indices. This will not be pretty,
650 so I'm just going to ignore it and hope nobody makes any mistakes. ;]
652 <p>Tag indices can grow arbitrarily large, which will cause problems for
653 32-bit machines around the 3GB mark. Still, that's something like 80
654 million tags, so maybe it's not something to worry about.
656 <p>The API currently requires the client to transmit the user's password
657 in the clear. A digest-based authentication scheme would be better,
658 though for real security, the app should run over HTTPS.