www/doc/index.html

<!DOCTYPE html>
<html>
<head>
<title>Blërg Documentation</title>
<link rel="stylesheet" href="/css/doc.css">
</head>
<body>

<h1>Blërg</h1>

Blërg is a minimalistic tagged text document database engine that also
pretends to be a <a href="/">microblogging system</a>.  It is designed
to efficiently store small (&lt; 64K) pieces of text in a way that they
can be quickly retrieved by record number or by querying for tags
embedded in the text.  Its native interface is HTTP &mdash; Blërg comes
as either a standalone HTTP server, or a CGI.  Blërg is written in pure
C.

<ul class="toc">
  <li><a href="#installing">Installing</a>
    <ul>
      <li><a href="#getting_the_source">Getting the source</a></li>
      <li><a href="#requirements">Requirements</a></li>
      <li><a href="#configuring">Configuring</a></li>
      <li><a href="#building">Building</a></li>
      <li><a href="#installing">Installing</a></li>
    </ul>
  </li>
  <li><a href="#api">API</a>
    <ul>
      <li><a href="#api_definitions">API Definitions</a></li>
      <li><a href="#api_create">/create - create a new user</a></li>
      <li><a href="#api_login">/login - log in</a></li>
      <li><a href="#api_logout">/logout - log out</a></li>
      <li><a href="#api_put">/put - add a new record</a></li>
      <li><a href="#api_get">/get/(user), /get/(user)/(start record)-(end record) - get records for a user</a></li>
      <li><a href="#api_info">/info/(user) - Get information about a user</a></li>
      <li><a href="#api_tag">/tag/(#|H|@)(tagname) - Retrieve records containing tags</a></li>
      <li><a href="#api_subscribe">/subscribe/(user) - Subscribe to a user's updates</a></li>
      <li><a href="#api_unsubscribe">/unsubscribe/(user) - Unsubscribe from a user's updates</a></li>
      <li><a href="#api_feed">/feed - Get updates for subscribed users</a></li>
      <li><a href="#api_feedinfo">/feedinfo, /feedinfo/(user) - Get subscription status</a></li>
    </ul>
  </li>
  <li><a href="#design">Design</a>
    <ul>
      <li><a href="#motivation">Motivation</a></li>
      <li><a href="#web_app_stack">Web App Stack</a></li>
      <li><a href="#database">Database</a></li>
      <li><a href="#subscriptions">Subscriptions</a></li>
      <li><a href="#problems">Problems, Caveats, and Future Work</a></li>
    </ul>
  </li>
</ul>

<h2><a name="installing">Installing</a></h2>

<h3><a name="getting_the_source">Getting the source</a></h3>

<p>There's no stable release yet, but you can get everything currently
running on blerg.dominionofawesome.com by cloning the git repository at
http://git.bytex64.net/blerg.git.

<h3><a name="requirements">Requirements</a></h3>

<p>Blërg has varying requirements depending on how you want to run it
&mdash; as a standalone HTTP server, or as a CGI.  You will need:

<ul>
<li><a href="http://lloyd.github.com/yajl/">yajl</a> &gt;= 1.0.0
(yajl is a JSON parser/generator written in C which, by some twisted
sense of humor, requires Ruby to compile)</li>
</ul>

<p>As a standalone HTTP server, you will also need:

<ul>
<li><a href="http://www.gnu.org/software/libmicrohttpd/">GNU libmicrohttpd</a> &gt;= 0.9.3</li>
</ul>

<p>Or, as a CGI, you will need:

<ul>
<li><a href="http://www.newbreedsoftware.com/cgi-util/download/">cgi-util</a> &gt;= 2.2.1</li>
</ul>

<h3><a name="configuring">Configuring</a></h3>

<p>There is now an experimental autoconf build system.  If you run
<code>add-autoconf</code>, it'll do the magic and create a
<code>configure</code> script that'll do the familiar things.  If I ever
get around to distributing source packages, you should find that this
has already been done.

<p>If you'd rather stick with the manual system, you should edit
<code>libs.mk</code> and put in the paths where you can find headers and
libraries for the above requirements.

<p>Also, apologies to BSD folks &mdash; I've probably committed
several unconscious Linux-isms.  It would not surprise me if the
makefile refuses to work with BSD make, or if it fails to compile even
with gmake.  If you have patches or suggestions on how to make Blërg
more portable, I'd be happy to hear them.

<h3><a name="building">Building</a></h3>

<p>At this point, it should be gravy.  Type <code>make</code> and in a
few seconds, you should have <code>blerg.httpd</code>,
<code>blerg.cgi</code>, <code>rss.cgi</code>, and
<code>blergtool</code>.  Each of those can be made individually as well,
if you, for example, don't want to install the prerequisites for
<code>blerg.httpd</code> or <code>blerg.cgi</code>.

<p><strong>NOTE</strong>: <code>blerg.httpd</code> is deprecated and
will not be updated with new features.

<h3><a name="installing">Installing</a></h3>

<p>While it's not strictly required, Blërg will be easier to set up if
you configure it to work from the root of your website.  For this
reason, it's better to use a subdomain (e.g., blerg.yoursite.com is
easier than yoursite.com/blerg/).  If you do want to put it in a
subdirectory, you will have to modify <code>www/js/blerg.js</code> and
change baseURL at the top as well as a number of other self-references
in that file and <code>www/index.html</code>.  The CGI version should
work fine this way, but the HTTP version will require the request to be
rewritten, as it expects to be serving from the root.

<p>You cannot serve the database and client from different domains
(e.g., yoursite.com vs othersite.net, or even foo.yoursite.com and
bar.yoursite.com).  This is a requirement of the web browser &mdash; the
same origin policy will not allow an AJAX request to travel across
domains.

<h4>For the standalone web server:</h4>

<p>Right now, <code>blerg.httpd</code> doesn't serve any static assets,
so you're going to have to put it behind a real webserver like Apache,
lighttpd, nginx, or similar.  Set the document root to the www
directory, then proxy /info, /create, /login, /logout, /get, /tag, and
/put to <code>blerg.httpd</code>.  You can change the port
<code>blerg.httpd</code> listens on in <code>config.h</code>.

<h4>For the CGI version:</h4>

<p>Copy the files in www/ to the root of your web server.  Copy
<code>blerg.cgi</code> to your web server.  Included in www-configs/ is
a .htaccess file for Apache that will rewrite the URLs.  If you need to
call the CGI something other than <code>blerg.cgi</code>, the .htaccess
file will need to be modified.

<h4>The extra RSS CGI</h4>

<p>There is an optional RSS CGI (<code>rss.cgi</code>) that will serve
RSS feeds for users.  Install this like <code>blerg.cgi</code> above.


<h2><a name="api">API</a></h2>

<p>Blërg's API was designed to be as simple as possible.  Data sent from
the client is POSTed with the application/x-www-form-urlencoded
encoding, and a successful response is always JSON.  The API endpoints
will be described as though the server were serving requests from the
root of the website.

<h3><a name="api_definitions">API Definitions</a></h3>

<p>On failure, all API calls return either a standard HTTP error
response, like 404 Not Found if a record or user doesn't exist, or a 200
response with a 'JSON failure', which will look like this:

<p><code>{"status": "failure"}</code>

<p>Blërg doesn't currently explain <i>why</i> there is a failure, and
I'm not sure it ever will.

<p>On success, you'll either get some JSON relating to your request (for
/get, /tag, or /info), or a 'JSON success' response (for /create, /put,
/login, or /logout), which looks like this:

<p><code>{"status": "success"}</code>

<p>For the CGI backend, you may get a 500 error if something goes wrong.
For the HTTP backend, you'll get nothing (since it will have crashed),
or maybe a 502 Bad Gateway if you have it behind another web server.
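
<p>A client therefore has two failure channels to watch.  As a rough
sketch (written in Python purely for illustration; the helper name
<code>is_json_success</code> is made up here and is not part of Blërg),
the check for the status-style replies might look like this:

```python
import json


def is_json_success(body: str) -> bool:
    """Return True only for the documented 'JSON success' body."""
    try:
        reply = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an error page from a proxy in front.
        return False
    return isinstance(reply, dict) and reply.get("status") == "success"
```

<p>HTTP-level errors such as 404, 500, or 502 would still have to be
handled separately, before the body is parsed.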

<p>All usernames must be 32 characters or fewer.  Usernames must contain
only the ASCII characters 0-9, A-Z, a-z, underscore (_), and hyphen (-).
Passwords can be at most 64 bytes, and have no limits on characters (but
beware: if you have a null in the middle, it will stop checking there
because I use <code>strncmp(3)</code> to compare).

<p>Tags must be 64 characters or fewer, and can contain only the ASCII
characters 0-9, A-Z, a-z, underscore (_), and hyphen (-).
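
<p>These rules are easy to check on the client before a request is ever
sent.  The following is a minimal sketch, not Blërg's own validation
code: the function names are invented, the minimum length of one
character is an assumption, and rejecting passwords that contain a null
byte is only a precaution suggested by the <code>strncmp(3)</code>
caveat above.

```python
import re

# Character sets and length limits come from the rules above.
# Requiring at least one character is an assumption of this sketch.
USERNAME_RE = re.compile(r"[0-9A-Za-z_-]{1,32}")
TAG_RE = re.compile(r"[0-9A-Za-z_-]{1,64}")


def valid_username(name: str) -> bool:
    return USERNAME_RE.fullmatch(name) is not None


def valid_tag(tag: str) -> bool:
    return TAG_RE.fullmatch(tag) is not None


def valid_password(password: bytes) -> bool:
    # At most 64 bytes; a null byte would cut the comparison short,
    # so this sketch refuses it outright (a client-side choice).
    return len(password) <= 64 and b"\0" not in password
```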

<h3><a name="api_create">/create</a> - create a new user</h3>

<p>To create a user, POST to /create with <code>username</code> and
<code>password</code> parameters for the new user.  The server will
respond with JSON failure if the user exists, or if the user can't be
created for some other reason.  The server will respond with JSON
success if the user is created.
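
<p>For illustration, here is one way a client could assemble such a
request with Python's standard library.  The server address in
<code>BASE_URL</code>, the helper <code>build_post</code>, and the
example credentials are all hypothetical; the request is only built, not
sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://blerg.example.com"  # hypothetical server root


def build_post(path: str, **params: str) -> Request:
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(params).encode("ascii")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Example credentials are made up for this sketch.
create_request = build_post("/create", username="jon", password="l@sagna")
```

<p>Passing <code>create_request</code> to
<code>urllib.request.urlopen</code> would send it; the reply body is
then one of the two status objects described under API Definitions.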

<h3><a name="api_login">/login</a> - log in</h3>

<p>POST to /login with the <code>username</code> and
<code>password</code> parameters for an existing user.  The server will
respond with JSON failure if the user does not exist or if the password
is incorrect.  On success, the server will respond with JSON success,
and will set a cookie named 'auth' that must be sent by the client when
accessing restricted API functions (/put and /logout).

<h3><a name="api_logout">/logout</a> - log out</h3>

<p>POST to /logout with <code>username</code>, the user to log out,
along with the auth cookie in a Cookie header.  The server will respond
with JSON failure if the user does not exist or if the auth cookie is
bad.  The server will respond with JSON success after the user is
successfully logged out.

<h3><a name="api_put">/put</a> - add a new record</h3>

<p>POST to /put with <code>username</code> and <code>data</code>
parameters, and an auth cookie.  The server will respond with JSON
failure if the auth cookie is bad, if the user doesn't exist, or if
<code>data</code> contains more than 65535 bytes <i>after</i> URL
decoding.  The server will respond with JSON success after the record is
successfully added.
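
<p>Because the limit applies to the decoded bytes, a client should count
bytes rather than characters before posting.  A small sketch follows;
the function name is invented and UTF-8 is assumed as the text encoding,
which this document does not actually specify.

```python
MAX_RECORD_BYTES = 65535  # limit stated above, counted after URL decoding


def fits_in_record(text: str) -> bool:
    """Check the size of the data as the server would see it once decoded.

    Assumes the client sends UTF-8; that encoding is this sketch's choice.
    """
    return len(text.encode("utf-8")) <= MAX_RECORD_BYTES
```

<p>Note that a non-ASCII character such as ë counts as two bytes here,
so 40000 of them would already be too long.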

<h3><a name="api_get">/get/(user), /get/(user)/(start record)-(end record)</a> - get records for a user</h3>

<p>A GET request to /get/(user), where (user) is the user desired, will
return the last 50 records for that user in a list of objects.  The
record objects look like this:

<pre>
{
  "record":"0",
  "timestamp":1294309438,
  "data":"eatin a taco on fifth street"
}
</pre>

<p><code>record</code> is the record number, <code>timestamp</code> is
the UNIX epoch timestamp (i.e., the number of seconds since Jan 1 1970
00:00:00 GMT), and <code>data</code> is the content of the record.  The
record number is sent as a string because while Blërg supports record
numbers up to 2<sup>64</sup> - 1, JavaScript uses floating point for all
its numbers, and can only support integers without truncation up to
2<sup>53</sup>.  This difference is largely academic, but I didn't want
this problem to sneak up on anyone who is more insane than I am. :]
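
<p>The truncation is easy to demonstrate.  The sketch below parses the
sample record shown above and then shows two neighbouring integers
collapsing into the same double-precision value; Python is used only as
a convenient calculator here, and its own integers are not affected.

```python
import json

record = json.loads(
    '{"record":"0","timestamp":1294309438,"data":"eatin a taco on fifth street"}'
)
# The record number arrives as a string; convert it explicitly.
record_number = int(record["record"])

# A double has a 53-bit significand, so 2**53 and 2**53 + 1 are
# indistinguishable once stored as floating point.
big = 2**53
collapsed = float(big) == float(big + 1)
```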

<p>The second form, /get/(user)/(start record)-(end record), retrieves a
specific range of records, from (start record) to (end record)
inclusive.  You can retrieve at most 100 records this way.  If (end
record) - (start record) specifies more than 100 records, or if the
range specifies invalid records, or if the end record is before the
start record, the server will respond with JSON failure.
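
<p>A client can apply the same checks before asking.  This is a sketch
under stated assumptions: the helper name is invented, and counting the
range inclusively (end - start + 1) is the conservative reading of the
rule above, which may be stricter than what the server enforces.

```python
MAX_RANGE = 100  # most records a single ranged /get may return


def get_range_path(user: str, start: int, end: int) -> str:
    """Return the path for an inclusive record range, or raise ValueError."""
    if start < 0 or end < start:
        raise ValueError("end record must not precede start record")
    # Inclusive count; treating 100 as the ceiling is an assumption.
    if end - start + 1 > MAX_RANGE:
        raise ValueError("at most 100 records per request")
    return f"/get/{user}/{start}-{end}"
```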

<h3><a name="api_info">/info/(user)</a> - Get information about a user</h3>

<p>A GET request to /info/(user) will return a JSON object with
information about the user (currently only the number of records).  The
info object looks like this:

<pre>
{
  "record_count": "544"
}
</pre>

<p>Again, the record count is sent as a string for 64-bit safety.

<h3><a name="api_tag">/tag/(#|H|@)(tagname)</a> - Retrieve records containing tags</h3>

<p>A GET request to this endpoint will return the last 50 records
associated with the given tag.  The first character is either # or H for
hashtags, or @ for mentions (I call them ref tags).  You should URL
encode the # or @, lest some servers complain at you.  The H alias for #
was created because Apache helpfully strips the fragment of a URL
(everything from the # to the end) before handing it off to the CGI,
even if the hash is URL encoded.  The record objects also contain an
extra <code>author</code> field, like so:

<pre>
{
  "author":"Jon",
  "record":"57",
  "timestamp":1294555793,
  "data":"I'm taking #garfield to the vet."
}
</pre>

<p>There is currently no support for getting more than 50 tagged
records, but /tag will probably mutate to work like /get.
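
<p>As an illustration of the URL encoding advice, a path for /tag could
be built as follows.  The helper <code>tag_path</code> is hypothetical;
it simply percent-encodes the leading character and passes the H alias
through unchanged.

```python
from urllib.parse import quote


def tag_path(kind: str, name: str) -> str:
    """Build a /tag path; kind is '#', 'H' or '@' as described above."""
    if kind not in ("#", "H", "@"):
        raise ValueError("kind must be '#', 'H' or '@'")
    # safe="" forces '#' and '@' to be percent-encoded.
    return "/tag/" + quote(kind, safe="") + quote(name, safe="")
```

<p>With this, a hashtag lookup for garfield becomes /tag/%23garfield, or
/tag/Hgarfield when the alias is used.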

<h3><a name="api_subscribe">/subscribe/(user)</a> - Subscribe to a
user's updates</h3>

<p>POST to /subscribe/(user) with a <code>username</code> parameter and
an auth cookie, where (user) is the user whose updates you wish to
subscribe to.  The server will respond with JSON failure if the auth
cookie is bad or if the user doesn't exist.  The server will respond
with JSON success after the subscription is successfully registered.

<h3><a name="api_unsubscribe">/unsubscribe/(user)</a> - Unsubscribe from
a user's updates</h3>

<p>Identical to /subscribe, but removes the subscription.

<h3><a name="api_feed">/feed</a> - Get updates for subscribed users</h3>

<p>POST to /feed, with a <code>username</code> parameter and an auth
cookie.  The server will respond with a JSON list of the last 50 updates
from all subscribed users, in reverse chronological order.  Fetching
/feed resets the new message count returned from /feedinfo.

<p>NOTE: subscription notifications are only stored while subscriptions
are active.  Any records inserted before or after a subscription is
active will not show up in /feed.

<h3><a name="api_feedinfo">/feedinfo, /feedinfo/(user)</a> - Get subscription
status for a user</h3>

<p>POST to /feedinfo with a <code>username</code> parameter and an auth
cookie to get general information about your subscribed feeds.
Currently, this only tells you how many new records there are since the
last time /feed was fetched.  The server will respond with a JSON
object:

<pre>
{"new":3}
</pre>

<p>POST to /feedinfo/(user) with a <code>username</code> parameter and
an auth cookie, where (user) is a user whose subscription status you are
interested in.  The server will respond with a simple JSON object:

<pre>
{"subscribed":true}
</pre>

<p>The value of "subscribed" will be either true or false depending on
the subscription status.

<h2><a name="design">Design</a></h2>

<h3><a name="motivation">Motivation</a></h3>

<p>Blërg was created as the result of a thought experiment: "What if
Twitter didn't need thousands of servers? What if its millions of users
could be handled by a single highly efficient server?"  This is probably
an unreachable goal due to the sheer amount of I/O, but we can certainly
try to do better.  Blërg was thus designed as a system with very simple
requirements:

<ol>
<li>Store and fetch small chunks of text efficiently</li>
<li>Create fast indexes for hash tags and @ mentions</li>
<li>Provide an HTTP interface web apps can use</li>
</ol>

<p>And to further simplify, I didn't bother handling deletes, full text
search, or more complicated tag searches.  Blërg only does the basics.

<h3><a name="web_app_stack">Web App Stack</a></h3>

<table class="pizzapie">
<tr><th>Classical model</th></tr>
<tr>
  <td style="background-color: blue; color: white"><b>Client App</b><br>HTML/JavaScript</td>
</tr>
<tr>
  <td style="background-color: #9F0000; color: white"><b>Webserver</b><br>Apache, lighttpd, nginx, etc.</td>
</tr>
<tr>
  <td style="background-color: #009F00; color: white"><b>Server App</b><br>Python, Perl, Ruby, etc.</td>
</tr>
<tr>
  <td style="background-color: #404040; color: white"><b>Database</b><br>MySQL, PostgreSQL, MongoDB, CouchDB, etc.</td>
</tr>
</table>

<p>Modern web applications have at least a four-layer approach.  You
have the client-side browser app, the web server, the server-side
application, and the database.  Your data goes through a lot of layers
before it actually resides on disk somewhere (or, as they're calling it
these days, "The Cloud" *waves hands*).  Each of those layers requires
some amount of computing resources, so to increase throughput, we must
make the layers more efficient, or reduce the number of layers.

<table class="pizzapie">
<tr><th>Blërg model</th></tr>
<tr>
  <td style="background-color: blue; color: white"><b>Blërg Client App</b><br>HTML/JavaScript</td>
</tr>
<tr>
  <td style="background-color: #404040; color: white"><b>Blërg Database</b><br>Fuckin' hardcore C and shit</td>
</tr>
</table>

<p>Blërg does both by smashing the last two or three layers into one
application.  Blërg can be run as either a standalone web server, or as
a CGI (FastCGI support is planned, but I just don't care right now).
Less waste, more throughput.  As a consequence of this, the entirety of
the application logic that the user sees is implemented in the client
app in JavaScript.  That's why all the URLs have #'s &mdash; the page is
loaded once and switched on the fly to show different views, further
reducing load on the server.  Even parsing hash tags and URLs is done
in client JS.

<p>The API is simple and pragmatic.  It's not entirely RESTful, but is
rather designed to work well with web-based front-ends.  Client data is
always POSTed with the usual application/x-www-form-urlencoded encoding,
and server data is always returned in JSON format.

<p>The idea of an HTTP interface to the database has already been done
by <a href="http://couchdb.apache.org/">CouchDB</a>, though I didn't
know that until after I wrote Blërg. :)

<h3><a name="database">Database</a></h3>

<p>I was impressed by <a
href="http://www.varnish-cache.org/">varnish</a>'s design, so I decided
early in the design process that I'd try out mmapped I/O.  Each user in
Blërg has their own database, which consists of a metadata file, and one
or more data and index files.  The data and index files are memory
mapped, which hopefully makes things more efficient by letting the OS
handle when to read from disk (or maybe not &mdash; I haven't benchmarked
it).  The index files are preallocated because I believe it's more
efficient than writing to them 40 bytes at a time as records are added.
The database's limits are reasonable:

<table class="statistics">
<tr><td>maximum record size</td><td>65535 bytes</td></tr>
<tr><td>maximum number of records per database</td><td>2<sup>64</sup> - 1</td></tr>
<tr><td>maximum number of tags per record</td><td>1024</td></tr>
</table>

<p>So as not to create grossly huge and unwieldy data files, the
database layer splits data and index files into many "segments"
containing at most 64K entries each.  Those of you doing some quick math
in your heads may note that this could cause a problem on 32-bit
machines &mdash; if a full segment contains entries of the maximum
length, you'll have to mmap 4GB (32-bit Linux gives each process only
3GB of virtual address space).  Right now, 32-bit users should change
<code>RECORDS_PER_SEGMENT</code> in <code>config.h</code> to something
lower like 32768.  In the future, I might do something smart like not
mmapping the whole fracking file.
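
<p>The quick math can be spelled out.  The sketch below only multiplies
the numbers quoted above; the function and constant names are invented
for the example.

```python
MAX_RECORD_SIZE = 65535  # bytes, from the limits table
GIB = 2**30


def worst_case_segment_bytes(records_per_segment: int) -> int:
    """Largest possible data file for one full segment."""
    return records_per_segment * MAX_RECORD_SIZE


default_size = worst_case_segment_bytes(65536)  # just under 4 GiB
reduced_size = worst_case_segment_bytes(32768)  # just under 2 GiB
```

<p>The default worst case exceeds the 3GB a 32-bit Linux process can
address, while the reduced setting of 32768 stays below it.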

<table class="bitstructure">
<tr><th>Record Index Structure</th></tr>
<tr><td class="B4">offset (32-bit integer)</td></tr>
<tr><td class="B2">length (16-bit integer)</td></tr>
<tr><td class="B2">flags (16-bit integer)</td></tr>
<tr><td class="B4">timestamp (32-bit integer)</td></tr>
</table>

<p>A record is stored by first appending the data to the data file, then
writing an entry in the index file containing the offset and length of
the data, as well as the timestamp.  Since each index entry is fixed
length, we can find the index entry simply by multiplying the record
number we want by the size of the index entry.  Upshot: constant-time
random-access reads and constant-time writes.  As an added bonus,
because we're using append-only files, we get lockless reads.
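
<p>To make the lookup concrete, here is a simplified model of the scheme
using Python's <code>struct</code> module and in-memory files.  It is a
sketch, not Blërg's code: the function names are invented, the field
order follows the table above, and it ignores segments, preallocation,
and mmap entirely.

```python
import io
import struct

# offset (32-bit), length (16-bit), flags (16-bit), timestamp (32-bit).
# "=" selects host byte order with no padding; the exact on-disk layout
# is an assumption of this sketch.
INDEX_ENTRY = struct.Struct("=IHHI")


def append_record(data_file, index_file, payload: bytes, timestamp: int) -> int:
    """Append payload, then its index entry; return the new record number."""
    offset = data_file.seek(0, io.SEEK_END)
    data_file.write(payload)
    index_end = index_file.seek(0, io.SEEK_END)
    index_file.write(INDEX_ENTRY.pack(offset, len(payload), 0, timestamp))
    return index_end // INDEX_ENTRY.size


def read_record(data_file, index_file, record_number: int) -> bytes:
    """Fetch a record with one seek into the index and one into the data."""
    index_file.seek(record_number * INDEX_ENTRY.size)
    offset, length, _flags, _timestamp = INDEX_ENTRY.unpack(
        index_file.read(INDEX_ENTRY.size)
    )
    data_file.seek(offset)
    return data_file.read(length)
```

<p>Each entry packs to 12 bytes, so record <i>n</i> always starts at
byte 12 &times; <i>n</i> of the index, no matter how long the records
themselves are.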

<table class="bitstructure">
<tr><th>Tag Structure</th></tr>
<tr><td class="B32">username (32 bytes)</td></tr>
<tr><td class="B8">record number (64-bit integer)</td></tr>
</table>

<p>Tags are handled by a separate set of indices, one per tag.  When a
record is added, it is scanned for tags, then entries are appended to
each tag index for the tags found.  Each index record simply stores the
user and record number.  Tags are searched by opening the tag file,
reading the last 50 entries or so, and then reading all the records
listed.  Voila, fast tag lookups.
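
<p>The tag index can be modelled the same way.  Again this is only a
sketch with invented function names: the field sizes come from the table
above, while the null padding of short usernames and the byte order are
assumptions.

```python
import io
import struct

# username (32 bytes) followed by a 64-bit record number.
# Host byte order and null padding are assumptions of this sketch.
TAG_ENTRY = struct.Struct("=32sQ")


def append_tag_entry(tag_file, username: str, record_number: int) -> None:
    """Append one (user, record number) pair to a tag index."""
    tag_file.seek(0, io.SEEK_END)
    tag_file.write(TAG_ENTRY.pack(username.encode("ascii"), record_number))


def last_entries(tag_file, count: int = 50):
    """Return up to count (username, record number) pairs, oldest first."""
    total = tag_file.seek(0, io.SEEK_END) // TAG_ENTRY.size
    first = max(0, total - count)
    tag_file.seek(first * TAG_ENTRY.size)
    raw = tag_file.read((total - first) * TAG_ENTRY.size)
    return [
        (name.rstrip(b"\0").decode("ascii"), number)
        for name, number in TAG_ENTRY.iter_unpack(raw)
    ]
```

<p>Each pair returned by <code>last_entries</code> would then be
resolved through that user's record index to fetch the actual text.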

<p>At this point, you're probably thinking, "Is that it?"  Yep, that's
it.  Blërg isn't revolutionary, it's just a system whose requirements
were pared down until the implementation could be made dead simple.

<p>Also, keeping with the style of modern object databases, I haven't
implemented any data safety (har har).  Blërg does not sync anything to
disk before returning success.  This should make Blërg extremely fast,
and totally unreliable in a crash.  But that's the way you want it,
right? :]

<h3><a name="subscriptions">Subscriptions</a></h3>

<p>When I first started thinking about the idea of subscriptions, I
immediately came up with the naïve solution: keep a list of users to
which users are subscribed, then when you want to get updates, iterate
over the list and find the last entries for each user.  And that would
work, but it's kind of costly in terms of disk I/O.  I have to visit
each user in the list, retrieve their last few entries, and store them
somewhere else to be sorted later.  And worse, that computation has to
be done every time a user checks their feed. As the number of users and
subscriptions grows, that will become a problem.

<p>So instead, I thought about it the other way around. Instead of doing
all the work when the request is received, Blërg tries to do as much as
possible by "pushing" updates to subscribed users.  You can think of it
kind of like a mail system.  When a user posts new content, a
notification is "sent" out to each of that user's subscribers.  Later,
when the subscribers want to see what's new, they simply check their
mailbox.  Checking your mailbox is usually a lot more efficient than
going around and checking everyone's records yourself, even with the
overhead of the "mailman."

<p>The "mailbox" is a subscription index, which is identical to a tag
index, but is a per-user construct.  When a user posts a new record, a
subscription index record is written for every subscriber.  It's a
similar amount of I/O as the naïve version above, but the important
difference is that it's only done once.  Retrieving records for accounts
you're subscribed to is then as simple as reading your subscription
index and reading the associated records.  This is hopefully less I/O
than the naïve version, since you're reading, at most, as many accounts
as you have records in the last N entries of your subscription index,
instead of all of them.  And as an added bonus, since subscription index
records are added as posts are created, the subscription index is
automatically sorted by time!  To support this "mail" architecture, we
also keep a list of subscribers and subscrib...ees in each account.
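
<p>The "mail" model can be sketched in a few lines.  This is an
in-memory stand-in with invented names, not the on-disk implementation;
it only shows that the work happens once at post time and that a feed is
just the tail of the mailbox.

```python
from collections import defaultdict


class SubscriptionSketch:
    """In-memory stand-in for the on-disk subscription indices."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # author -> users following them
        self.mailbox = defaultdict(list)      # user -> [(author, record number)]
        self.record_count = defaultdict(int)  # author -> records posted so far

    def subscribe(self, user, author):
        self.subscribers[author].add(user)

    def unsubscribe(self, user, author):
        self.subscribers[author].discard(user)

    def post(self, author):
        """Record a new post and notify every current subscriber once."""
        number = self.record_count[author]
        self.record_count[author] += 1
        for user in self.subscribers[author]:
            self.mailbox[user].append((author, number))
        return number

    def feed(self, user, count=50):
        """Newest first; the mailbox is already in posting order."""
        return self.mailbox[user][-count:][::-1]
```

<p>Posts made before a subscription starts or after it ends never reach
the mailbox, which matches the note in the /feed section.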

<h3><a name="problems">Problems, Caveats, and Future Work</a></h3>

<p>Blërg probably doesn't actually work like Twitter because I've never
actually had a Twitter account.

<p>I couldn't find a really good fast HTTP server library.
Libmicrohttpd is small, but it's focused on embedded applications, so it
often eschews speed for small memory footprint.  This is especially
apparent when you watch it chew through a POST request 300 bytes at a
time even though you've specified a buffer size of 256K.
<code>blerg.httpd</code> is still pretty fast this way &mdash; on my
2GHz Opteron 246, <a
href="http://www.joedog.org/index/siege-home">siege</a> says it serves a
690-byte /get request at about 945 transactions per second, average
response time 0.05 seconds, with 100 concurrent accesses &mdash; but a
fast HTTP server implementation could knock this out of the park.

<p>Libmicrohttpd is also really difficult to work with.  If you look at
the code, <code>http_blerg.c</code> is about 70% longer than
<code>cgi_blerg.c</code> simply because of all the iterator hoops I had
to jump through to process POST requests.  And if you can believe it, I
wrote <code>http_blerg.c</code> first. If I'd done it the other way
around, I probably would have given up on libmicrohttpd. :-/

<p>The data structures written to disk are dependent on the size and
endianness of the primitive data types on your architecture and OS.
This means that the databases are not portable.  A dump/import tool is
probably the easiest way to handle this.
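
<p>The endianness problem is easy to see with a single 32-bit field.
The sketch below is only an illustration of what a dump/import format
would have to pin down; the helper names are invented and little-endian
is an arbitrary choice.

```python
timestamp = 1294309438

# The same value laid out by a little-endian and a big-endian machine.
little = timestamp.to_bytes(4, "little")
big = timestamp.to_bytes(4, "big")


def dump_u32(value: int) -> bytes:
    """Write a 32-bit field in one fixed byte order (little-endian here)."""
    return value.to_bytes(4, "little")


def load_u32(raw: bytes) -> int:
    """Read a field written by dump_u32, whatever the host byte order."""
    return int.from_bytes(raw, "little")
```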

<p>I do want to make a FastCGI version eventually, and this will
probably be a rather simple modification of <code>cgi_blerg</code>.

<p>Implementing deletes will be... interesting.  There is room in the
record index for a 'deleted' flag, but the problem is deleting any tags
referenced in the data.  This requires rescanning the record content and
putting a 'deleted' flag in the tag indices.  This will not be pretty,
so I'm just going to ignore it and hope nobody makes any mistakes. ;]

<p>Tag indices can grow arbitrarily large, which will cause problems for
32-bit machines around the 3GB mark.  Still, that's something like 80
million tags, so maybe it's not something to worry about.
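
<p>That estimate follows from the tag structure shown in the Database
section.  A quick check, with constant names invented for the example
and 3GB taken as 3 &times; 2<sup>30</sup> bytes:

```python
TAG_ENTRY_BYTES = 32 + 8    # username field plus 64-bit record number
ADDRESS_SPACE = 3 * 2**30   # roughly what 32-bit Linux gives a process

# Number of whole tag entries that fit before the mapping runs out.
max_entries = ADDRESS_SPACE // TAG_ENTRY_BYTES
```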

<p>The API currently requires the client to transmit the user's password
in the clear.  A digest-based authentication scheme would be better,
though for real security, the app should run over HTTPS.

</body>
</html>