<head>
<title>Blërg Documentation</title>
<link rel="stylesheet" href="/css/doc.css">
+<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
C.
<ul class="toc">
- <li><a href="#installing">Installing</a>
+ <li><a href="#running">Running Blërg</a>
<ul>
<li><a href="#getting_the_source">Getting the source</a></li>
<li><a href="#requirements">Requirements</a></li>
<li><a href="#api_get">/get/(user), /get/(user)/(start record)-(end record) - get records for a user</a></li>
<li><a href="#api_info">/info/(user) - Get information about a user</a></li>
<li><a href="#api_tag">/tag/(#|H|@)(tagname) - Retrieve records containing tags</a></li>
+ <li><a href="#api_subscribe">/subscribe/(user) - Subscribe to a user's updates</a></li>
+ <li><a href="#api_unsubscribe">/unsubscribe/(user) - Unsubscribe from a user's updates</a></li>
+ <li><a href="#api_feed">/feed - Get updates for subscribed users</a></li>
+ <li><a href="#api_feedinfo">/feedinfo, /feedinfo/(user) - Get subscription status</a></li>
+ <li><a href="#api_passwd">/passwd - Change a user's password</a></li>
+ </ul>
+ </li>
+ <li><a href="#libraries">Libraries</a>
+ <ul>
+ <li><a href="#lib_c">C</a></li>
+ <li><a href="#lib_perl">Perl</a></li>
</ul>
</li>
<li><a href="#design">Design</a>
<li><a href="#motivation">Motivation</a></li>
<li><a href="#web_app_stack">Web App Stack</a></li>
<li><a href="#database">Database</a></li>
+ <li><a href="#subscriptions">Subscriptions</a></li>
<li><a href="#problems">Problems and Future Work</a></li>
</ul>
</li>
</ul>
-<h2><a name="installing">Installing</a></h2>
+<h2><a name="running">Running Blërg</a></h2>
<h3><a name="getting_the_source">Getting the source</a></h3>
<h3><a name="configuring">Configuring</a></h3>
-<p>I know I'm gonna get shit for not using an autoconf-based system, but
-I really didn't want to spend time figuring it out. You should edit
-libs.mk and put in the paths where you can find headers and libraries
-for the above requirements.
+<p>There is now an experimental autoconf build system. If you run
+<code>add-autoconf</code>, it'll do the magic and create a
+<code>configure</code> script that'll do the familiar things. If I ever
+get around to distributing source packages, you should find that this
+has already been done.
+
+<p>If you'd rather stick with the manual system, you should edit libs.mk
+and put in the paths where you can find headers and libraries for the
+above requirements.
<p>Also, further apologies to BSD folks — I've probably committed
several unconscious Linux-isms. It would not surprise me if the
the prerequisites for <code>blerg.httpd</code> or
<code>blerg.cgi</code>.
+<p><strong>NOTE</strong>: blerg.httpd is deprecated and will not be
+updated with new features.
+
<h3><a name="installing">Installing</a></h3>
<p>While it's not strictly required, Blërg will be easier to set up if
easier than yoursite.com/blerg/). If you do want to put it in a
subdirectory, you will have to modify <code>www/js/blerg.js</code> and
change baseURL at the top as well as a number of other self-references
-in that file and <code>www/index.html</code>. The CGI version should
-work fine this way, but the HTTP version will require the request to be
-rewritten, as it expects to be serving from the root.
+in that file and <code>www/index.html</code>.
<p>You cannot serve the database and client from different domains
(i.e., yoursite.com vs othersite.net, or even foo.yoursite.com and
bar.yoursite.com). This is a requirement of the web browser — the
same origin policy will not allow an AJAX request to travel across
-domains.
+domains (though you can probably get around it these days with <a
+ href="http://en.wikipedia.org/wiki/Cross-origin_resource_sharing">Cross-origin
+ resource sharing</a>).
-<h4>For the standalone web server:</h4>
-
-<p>Right now, <code>blerg.httpd</code> doesn't serve any static assets,
-so you're going to have to put it behind a real webserver like apache,
-lighttpd, nginx, or similar. Set the document root to the www
-directory, then proxy /info, /create, /login, /logout, /get, /tag, and
-/put to blerg.httpd. You can change the port <code>blerg.httpd</code>
-listens on in <code>config.h</code>.
-
-<h4>For the CGI version:</h4>
+<h4>For straight CGI with Apache</h4>
<p>Copy the files in www/ to the root of your web server. Copy
<code>blerg.cgi</code> to your web server. Included in www-configs/ is
call the CGI something other than <code>blerg.cgi</code>, the .htaccess
file will need to be modified.
+<h4>For nginx</h4>
+
+<p>Nginx can't run CGI directly, and there's currently no FastCGI
+version of Blërg, so you will have to run it under some kind of CGI to
+FastCGI gateway, like the one described <a
+href="http://wiki.nginx.org/SimpleCGI">here on the nginx wiki</a>. This
+pretty much destroys the performance of Blërg, but it's all we've got
+right now.
+
<h4>The extra RSS CGI</h4>
<p>There is an optional RSS cgi (<code>rss.cgi</code>) that will serve
RSS feeds for users. Install this like <code>blerg.cgi</code> above.
+As of 1.9.0, this is a Perl FastCGI script, so you will have to make
+sure the Perl libraries are available to it. A good way of doing that
+is to install to an environment directory, as described below.
+
+<h4>Installing to an environment directory</h4>
+
+<p>The Makefile has support for installing Blërg into a directory that
+includes tools, libraries, and configuration snippets for shell and web
+servers. Use it as <code>make install-environment
+ ENV_DIR=&lt;directory&gt;</code>. Under &lt;directory&gt;/etc will be
+a shell script that sets environment variables, and configuration
+snippets for nginx and apache to do the same. This should make it
+somewhat easier to use Blërg in a self-contained way.
+
+<p>For example, this will install Blërg to an environment directory
+inside your home directory:
+
+<pre>user@devhost:~/blerg$ make install-environment ENV_DIR=$HOME/blerg-env
+...
+user@devhost:~/blerg$ . ~/blerg-env/etc/env.sh
+</pre>
+
+<p>Then, you will be able to run tools like <code>blergtool</code>, and
+it will operate on data inside <code>~/blerg-env/data</code>. Likewise,
+you can include
+<code>/home/user/blerg-env/etc/nginx-fastcgi-vars.conf</code> or
+<code>/home/user/blerg-env/etc/apache-setenv.conf</code> in your
+webserver configuration to make the CGI/FastCGI scripts do the same thing.
<h2><a name="api">API</a></h2>
<p>There is currently no support for getting more than 50 tags, but /tag
will probably mutate to work like /get.
+<h3><a name="api_subscribe">/subscribe/(user)</a> - Subscribe to a
+user's updates</h3>
+
+<p>POST to /subscribe/(user) with a <code>username</code> parameter and
+an auth cookie, where (user) is the user whose updates you wish to
+subscribe to. The server will respond with JSON failure if the auth
+cookie is bad or if the user doesn't exist. The server will respond
+with JSON success after the subscription is successfully registered.
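+
+<p>As a rough sketch of what such a request might look like from a
+script, the Python below only builds the request object and does not
+send it. The base URL, the helper name, the form-encoded body, and the
+cookie name <code>auth</code> are all assumptions made for illustration;
+check what your server actually sets at login.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://blerg.example.com"  # assumption: wherever Blërg is served


def build_subscribe_request(username, target, auth_token, action="subscribe"):
    """Build (but do not send) a POST to /subscribe/(user) or /unsubscribe/(user).

    The cookie name "auth" and the form encoding are assumptions; this
    document does not spell them out.
    """
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError("action must be 'subscribe' or 'unsubscribe'")
    url = f"{BASE_URL}/{action}/{quote(target, safe='')}"
    body = urlencode({"username": username}).encode("ascii")
    req = Request(url, data=body, method="POST")
    req.add_header("Cookie", f"auth={auth_token}")
    return req
```

+<p>Passing the result to <code>urllib.request.urlopen</code> would send
+it; the same helper covers /unsubscribe by passing
+<code>action="unsubscribe"</code>.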
+
+<h3><a name="api_unsubscribe">/unsubscribe/(user)</a> - Unsubscribe from
+a user's updates</h3>
+
+<p>Identical to /subscribe, but removes the subscription.
+
+<h3><a name="api_feed">/feed</a> - Get updates for subscribed users</h3>
+
+<p>POST to /feed, with a <code>username</code> parameter and an auth
+cookie. The server will respond with a JSON list of the last 50 updates
+from all subscribed users, in reverse chronological order. Fetching
+/feed resets the new message count returned from /feedinfo.
+
+<p>NOTE: subscription notifications are only stored while subscriptions
+are active. Any records inserted before or after a subscription is
+active will not show up in /feed.
+
+<h3><a name="api_feedinfo">/feedinfo, /feedinfo/(user)</a> - Get subscription
+status for a user</h3>
+
+<p>POST to /feedinfo with a <code>username</code> parameter and an auth
+cookie to get general information about your subscribed feeds.
+Currently, this only tells you how many new records there are since the
+last time /feed was fetched. The server will respond with a JSON
+object:
+
+<pre>
+{"new":3}
+</pre>
+
+<p>POST to /feedinfo/(user) with a <code>username</code> parameter and
+an auth cookie, where (user) is a user whose subscription status you are
+interested in. The server will respond with a simple JSON object:
+
+<pre>
+{"subscribed":true}
+</pre>
+
+<p>The value of "subscribed" will be either true or false depending on
+the subscription status.
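+
+<p>For illustration, a client might decode the two response shapes
+shown above like this. The function names are made up for this sketch,
+and it only handles the success bodies shown here, since the exact
+failure body is not described in this section.

```python
import json


def parse_feedinfo(body):
    """Return the number of new records from a /feedinfo response body."""
    info = json.loads(body)
    return int(info["new"])


def parse_subscription_status(body):
    """Return True or False from a /feedinfo/(user) response body."""
    info = json.loads(body)
    return bool(info["subscribed"])


# The sample bodies are the ones shown in this section.
new_count = parse_feedinfo('{"new":3}')
is_subscribed = parse_subscription_status('{"subscribed":true}')
```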
+
+<h3><a name="api_passwd">/passwd</a> - Change a user's password</h3>
+
+<p>POST to /passwd with a <code>username</code> parameter and an auth
+cookie, plus <code>password</code> and <code>new_password</code>
+parameters to change the user's password. For extra protection,
+changing a password requires sending the user's current password in the
+<code>password</code> parameter. If authentication is successful and
+the password matches, the user's password is set to
+<code>new_password</code> and the server responds with JSON success.
+
+<p>If the password doesn't match, or one of <code>password</code> or
+<code>new_password</code> is missing, the server returns JSON failure.
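+
+<p>A minimal sketch of assembling the /passwd parameters in Python,
+assuming an ordinary form-encoded POST body (the encoding and the helper
+name are assumptions). It refuses to build a request that the server
+would reject anyway for a missing field.

```python
from urllib.parse import urlencode


def build_passwd_body(username, password, new_password):
    """Return a form-encoded body for a POST to /passwd.

    Raises ValueError if either password is empty, mirroring the rule
    that a missing password or new_password yields JSON failure.
    """
    if not password or not new_password:
        raise ValueError("both password and new_password are required")
    params = {
        "username": username,
        "password": password,          # the user's current password
        "new_password": new_password,  # the password to switch to
    }
    return urlencode(params).encode("ascii")
```

+<p>The auth cookie still has to accompany the request; it is left out
+here because only the body differs from the other authenticated calls.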
+
+<h2><a name="libraries">Libraries</a></h2>
+
+<h3><a name="lib_c">C</a></h3>
+
+<p>Most of Blërg's core functionality is packaged in a static library
+called <code>blerg.a</code>. It's not designed to be public or
+installed with <code>make install-environment</code>, but it should be relatively
+straightforward to use it in C programs. Look at the headers under the
+<code>database</code> directory.
+
+<p>A secondary library called <code>blerg_auth.a</code> handles the
+authentication layer of Blërg. To use it, look at
+<code>common/auth.h</code>.
+
+<h3><a name="lib_perl">Perl</a></h3>
+
+<p>As of 1.9.0, Blërg includes a Perl library called
+<code>Blerg::Database</code>. It wraps the core and authentication
+functionality in a perlish interface. The module has its own POD
+documentation, which you can read with your favorite POD reader, from
+the manual installed in an environment directory, or in HTML <a
+href="perl/Blerg-Database.html">here</a>.
+
<h2><a name="design">Design</a></h2>
<h3><a name="motivation">Motivation</a></h3>
</table>
<p>Blërg does both by smashing the last two or three layers into one
-application. Blërg can be run as either a standalone web server, or as
-a CGI (FastCGI support is planned, but I just don't care right now).
-Less waste, more throughput. As a consequence of this, the entirety of
-the application logic that the user sees is implemented in the client
-app in Javascript. That's why all the URLs have #'s — the page is
-loaded once and switched on the fly to show different views, further
-reducing load on the server. Even parsing hash tags and URLs are done
-in client JS.
+application. Blërg can be run as either a standalone web server
+(currently deprecated because maintaining two versions is hard), or as a
+CGI (FastCGI support is planned, but I just don't care right now). Less
+waste, more throughput. As a consequence of this, the entirety of the
+application logic that the user sees is implemented in the client app in
+Javascript. That's why all the URLs have #'s — the page is loaded
+once and switched on the fly to show different views, further reducing
+load on the server. Even parsing hash tags and URLs is done in client
+JS.
<p>The API is simple and pragmatic. It's not entirely RESTful, but is
rather designed to work well with web-based front-ends. Client data is
Blërg has their own database, which consists of a metadata file, and one
or more data and index files. The data and index files are memory
mapped, which hopefully makes things more efficient by letting the OS
-handle when to read from disk (or maybe not &mdash I haven't benchmarked
-it). The index files are preallocated because I believe it's more
-efficient than writing to it 40 bytes at a time as records are added.
-The database's limits are reasonable:
+handle when to read from disk (or maybe not — I haven't
+benchmarked it). The index files are preallocated because I believe
+it's more efficient than writing to it 40 bytes at a time as records are
+added. The database's limits are reasonable:
<table class="statistics">
<tr><td>maximum record size</td><td>65535 bytes</td></tr>
-<tr><td>maximum number of records per database</td><td>2<sup>64</sup> - 1 bytes</td></tr>
+<tr><td>maximum number of records per database</td><td>2<sup>64</sup> - 1</td></tr>
<tr><td>maximum number of tags per record</td><td>1024</td></tr>
<table>
<p>So as not to create grossly huge and unwieldy data files, the
database layer splits data and index files into many "segments"
-containing at most 64K entries each. Those of you doing some quick math
-in your heads may note that this could cause a problem on 32-bit
-machines — if a full segment contains entries of the maximum
-length, you'll have to mmap 4GB (32-bit Linux gives each process only
-3GB of virtual address space). Right now, 32-bit users should change
+containing at most 64K entries each. Those of you doing some quick
+mental math may note that this could cause a problem on 32-bit machines
+— if a full segment contains entries of the maximum length, you'll
+have to mmap 4GB (32-bit Linux gives each process only 3GB of virtual
+address space). Right now, 32-bit users should change
<code>RECORDS_PER_SEGMENT</code> in <code>config.h</code> to something
lower like 32768. In the future, I might do something smart like not
mmaping the whole fracking file.
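+
+<p>The quick math above can be spelled out as follows. This is only
+arithmetic on the limits quoted in this section; the constant and
+function names are invented for the sketch.

```python
MAX_RECORD_SIZE = 65535              # bytes, from the limits table
LINUX_32BIT_USER_SPACE = 3 * 2**30   # 3GB of virtual address space


def worst_case_segment_bytes(records_per_segment):
    """Size of a data segment if every record is the maximum length."""
    return records_per_segment * MAX_RECORD_SIZE


for records in (65536, 32768):
    size = worst_case_segment_bytes(records)
    fits = size < LINUX_32BIT_USER_SPACE
    print(f"{records} records: {size} bytes, fits in 3GB: {fits}")
```

+<p>With 65536 records the worst case is 4294901760 bytes, just short of
+4GB and well over the 3GB limit; with 32768 it drops to 2147450880
+bytes. That fits on paper, though the process still needs address space
+for everything else it maps.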
and totally unreliable in a crash. But that's the way you want it,
right? :]
+<h3><a name="subscriptions">Subscriptions</a></h3>
+
+<p>When I first started thinking about the idea of subscriptions, I
+immediately came up with the naïve solution: keep a list of users to
+which users are subscribed, then when you want to get updates, iterate
+over the list and find the last entries for each user. And that would
+work, but it's kind of costly in terms of disk I/O. I have to visit
+each user in the list, retrieve their last few entries, and store them
+somewhere else to be sorted later. And worse, that computation has to
+be done every time a user checks their feed. As the number of users and
+subscriptions grows, that will become a problem.
+
+<p>So instead, I thought about it the other way around. Instead of doing
+all the work when the request is received, Blërg tries to do as much as
+possible by "pushing" updates to subscribed users. You can think of it
+kind of like a mail system. When a user posts new content, a
+notification is "sent" out to each of that user's subscribers. Later,
+when the subscribers want to see what's new, they simply check their
+mailbox. Checking your mailbox is usually a lot more efficient than
+going around and checking everyone's records yourself, even with the
+overhead of the "mailman."
+
+<p>The "mailbox" is a subscription index, which is identical to a tag
+index, but is a per-user construct. When a user posts a new record, a
+subscription index record is written for every subscriber. It's a
+similar amount of I/O as the naïve version above, but the important
+difference is that it's only done once. Retrieving records for accounts
+you're subscribed to is then as simple as reading your subscription
+index and reading the associated records. This is hopefully less I/O
+than the naïve version, since you're reading, at most, as many accounts
+as you have records in the last N entries of your subscription index,
+instead of all of them. And as an added bonus, since subscription index
+records are added as posts are created, the subscription index is
+automatically sorted by time! To support this "mail" architecture, we
+also keep a list of subscribers and subscrib...ees in each account.
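+
+<p>The "mailbox" idea can be illustrated with a toy in-memory model.
+This is not Blërg's actual code, which works on memory-mapped index
+files in C; the class and method names are invented, and Python dicts
+stand in for the on-disk structures.

```python
from collections import defaultdict


class SubscriptionSketch:
    """In-memory toy model of push-style subscriptions (not Blërg's code)."""

    def __init__(self):
        self.records = defaultdict(list)     # author -> list of posts
        self.subscribers = defaultdict(set)  # author -> users subscribed to them
        self.mailbox = defaultdict(list)     # user -> [(author, record number)]

    def subscribe(self, user, author):
        self.subscribers[author].add(user)

    def unsubscribe(self, user, author):
        self.subscribers[author].discard(user)

    def post(self, author, text):
        # Write the record once, then push one index entry per subscriber.
        self.records[author].append(text)
        record_number = len(self.records[author]) - 1
        for user in self.subscribers[author]:
            self.mailbox[user].append((author, record_number))

    def feed(self, user, limit=50):
        # The mailbox is already in posting order, so no sorting is needed:
        # take the last `limit` entries and reverse them (newest first).
        recent = self.mailbox[user][-limit:][::-1]
        return [(author, self.records[author][n]) for author, n in recent]
```

+<p>Because entries are pushed only at posting time, anything posted
+before a subscription starts or after it ends never reaches the
+mailbox, which matches the note under /feed.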
+
<h3><a name="problems">Problems, Caveats, and Future Work</a></h3>
<p>Blërg probably doesn't actually work like Twitter because I've never