Normalize /subscribe to accept a parameter rather than having two endpoints

diff --git a/www/doc/index.html b/www/doc/index.html
index 9b2e3df..61b8035 100644
--- a/www/doc/index.html
+++ b/www/doc/index.html
@@ -3,6 +3,7 @@
  <head>
  <title>Blërg Documentation</title>
  <link rel="stylesheet" href="/css/doc.css">
+<meta http-equiv="content-type" content="text/html; charset=utf8">
  </head>
  <body>
  
@@ -17,7 +18,7 @@ as either a standalone HTTP server, or a CGI.  Blërg is written in pure
  C.
  
  <ul class="toc">
-  <li><a href="#installing">Installing</a>
+  <li><a href="#running">Running Blërg</a>
      <ul>
        <li><a href="#getting_the_source">Getting the source</a></li>
        <li><a href="#requirements">Requirements</a></li>
@@ -36,6 +37,17 @@ C.
        <li><a href="#api_get">/get/(user), /get/(user)/(start record)-(end record) - get records for a user</a></li>
        <li><a href="#api_info">/info/(user) - Get information about a user</a></li>
        <li><a href="#api_tag">/tag/(#|H|@)(tagname) - Retrieve records containing tags</a></li>
+      <li><a href="#api_subscribe">/subscribe/(user) - Subscribe to a user's updates</a></li>
+      <li><a href="#api_unsubscribe">/unsubscribe/(user) - Unsubscribe from a user's updates</a></li>
+      <li><a href="#api_feed">/feed - Get updates for subscribed users</a></li>
+      <li><a href="#api_feedinfo">/feedinfo, /feedinfo/(user) - Get subscription status</a></li>
+      <li><a href="#api_passwd">/passwd - Change a user's password</a></li>
+    </ul>
+  </li>
+  <li><a href="#libraries">Libraries</a>
+    <ul>
+      <li><a href="#lib_c">C</a></li>
+      <li><a href="#lib_perl">Perl</a></li>
      </ul>
    </li>
    <li><a href="#design">Design</a>
@@ -43,12 +55,13 @@ C.
        <li><a href="#motivation">Motivation</a></li>
        <li><a href="#web_app_stack">Web App Stack</a></li>
        <li><a href="#database">Database</a></li>
+      <li><a href="#subscriptions">Subscriptions</a></li>
        <li><a href="#problems">Problems and Future Work</a></li>
      </ul>
    </li>
  </ul>
  
-<h2><a name="installing">Installing</a></h2>
+<h2><a name="running">Running Blërg</a></h2>
  
  <h3><a name="getting_the_source">Getting the source</a></h3>
  
@@ -81,10 +94,15 @@ sense of humor, requires ruby to compile)</li>
  
  <h3><a name="configuring">Configuring</a></h3>
  
-<p>I know I'm gonna get shit for not using an autoconf-based system, but
-I really didn't want to spend time figuring it out.  You should edit
-libs.mk and put in the paths where you can find headers and libraries
-for the above requirements.
+<p>There is now an experimental autoconf build system.  If you run
+<code>add-autoconf</code>, it'll do the magic and create a
+<code>configure</code> script that'll do the familiar things.  If I ever
+get around to distributing source packages, you should find that this
+has already been done.
+
+<p>If you'd rather stick with the manual system, you should edit libs.mk
+and put in the paths where you can find headers and libraries for the
+above requirements.
  
  <p>Also, further apologies to BSD folks &mdash; I've probably committed
  several unconscious Linux-isms.  It would not surprise me if the
@@ -101,6 +119,9 @@ made individually as well, if you, for example, don't want to install
  the prerequisites for <code>blerg.httpd</code> or
  <code>blerg.cgi</code>.
  
+<p><strong>NOTE</strong>: blerg.httpd is deprecated and will not be
+updated with new features.
+
  <h3><a name="installing">Installing</a></h3>
  
  <p>While it's not strictly required, Blërg will be easier to set up if
@@ -109,26 +130,17 @@ reason, it's better to use a subdomain (i.e., blerg.yoursite.com is
  easier than yoursite.com/blerg/).  If you do want to put it in a
  subdirectory, you will have to modify <code>www/js/blerg.js</code> and
  change baseURL at the top as well as a number of other self-references
-in that file and <code>www/index.html</code>.  The CGI version should
-work fine this way, but the HTTP version will require the request to be
-rewritten, as it expects to be serving from the root.
+in that file and <code>www/index.html</code>.
  
  <p>You cannot serve the database and client from different domains
  (i.e., yoursite.com vs othersite.net, or even foo.yoursite.com and
  bar.yoursite.com).  This is a requirement of the web browser &mdash; the
  same origin policy will not allow an AJAX request to travel across
-domains.
+domains (though you can probably get around it these days with <a
+  href="http://en.wikipedia.org/wiki/Cross-origin_resource_sharing">Cross-origin
+  resource sharing</a>).
  
-<h4>For the standalone web server:</h4>
-
-<p>Right now, <code>blerg.httpd</code> doesn't serve any static assets,
-so you're going to have to put it behind a real webserver like apache,
-lighttpd, nginx, or similar.  Set the document root to the www
-directory, then proxy /info, /create, /login, /logout, /get, /tag, and
-/put to blerg.httpd.  You can change the port <code>blerg.httpd</code>
-listens on in <code>config.h</code>.
-
-<h4>For the CGI version:</h4>
+<h4>For straight CGI with Apache</h4>
  
  <p>Copy the files in www/ to the root of your web server.  Copy
  <code>blerg.cgi</code> to your web server.  Included in www-configs/ is
@@ -136,10 +148,47 @@ a .htaccess file for Apache that will rewrite the URLs.  If you need to
  call the CGI something other than <code>blerg.cgi</code>, the .htaccess
  file will need to be modified.
  
+<h4>For nginx</h4>
+
+<p>Nginx can't run CGI directly, and there's currently no FastCGI
+version of Blërg, so you will have to run it under some kind of CGI to
+FastCGI gateway, like the one described <a
+href="http://wiki.nginx.org/SimpleCGI">here on the nginx wiki</a>.  This
+pretty much destroys the performance of Blërg, but it's all we've got
+right now.
+
  <h4>The extra RSS CGI</h4>
  
  <p>There is an optional RSS cgi (<code>rss.cgi</code>) that will serve
  RSS feeds for users.  Install this like <code>blerg.cgi</code> above.
+As of 1.9.0, this is a perl FastCGI script, so you will have to make
+sure the perl libraries are available to it.  A good way of doing that
+is to install to an environment directory, as described below.
+
+<h4>Installing to an environment directory</h4>
+
+<p>The Makefile has support for installing Blërg into a directory that
+includes tools, libraries, and configuration snippets for shell and web
+servers.  Use it as <code>make install-environment
+  ENV_DIR=&lt;directory&gt;</code>.  Under &lt;directory&gt;/etc will be
+a shell script that sets environment variables, and configuration
+snippets for nginx and apache to do the same.  This should make it
+somewhat easier to use Blërg in a self-contained way.
+
+<p>For example, this will install Blërg to an environment directory
+inside your home directory:
+
+<pre>user@devhost:~/blerg$ make install-environment ENV_DIR=$HOME/blerg-env
+...
+user@devhost:~/blerg$ . ~/blerg-env/etc/env.sh
+</pre>
+
+<p>Then, you will be able to run tools like <code>blergtool</code>, and
+it will operate on data inside <code>~/blerg-env/data</code>.  Likewise,
+you can include
+<code>/home/user/blerg-env/etc/nginx-fastcgi-vars.conf</code> or
+<code>/home/user/blerg-env/etc/apache-setenv.conf</code> in your
+webserver to make the CGI/FastCGI scripts do the same thing.
  
  
  <h2><a name="api">API</a></h2>
@@ -281,6 +330,91 @@ extra <code>author</code> field, like so:
  <p>There is currently no support for getting more than 50 tags, but /tag
  will probably mutate to work like /get.
  
+<h3><a name="api_subscribe">/subscribe/(user)</a> - Subscribe to a
+user's updates</h3>
+
+<p>POST to /subscribe/(user) with a <code>username</code> parameter and
+an auth cookie, where (user) is the user whose updates you wish to
+subscribe to.  The server will respond with JSON failure if the auth
+cookie is bad or if the user doesn't exist.  The server will respond
+with JSON success after the subscription is successfully registered.
+
+<h3><a name="api_unsubscribe">/unsubscribe/(user)</a> - Unsubscribe from
+a user's updates</h3>
+
+<p>Identical to /subscribe, but removes the subscription.
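
<p>As a rough illustration of the two calls above, the following Python
sketch builds (but does not send) the POST request.  The base URL, the
cookie name <code>auth</code>, and the form-encoded body are assumptions
made for the example; they are not taken from the Blërg source.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical values: neither the host nor the cookie name comes from
# the Blërg documentation; adjust both for a real installation.
BASE_URL = "http://blerg.example.com"
AUTH_COOKIE_NAME = "auth"


def build_subscription_request(action, target_user, username, auth_token):
    """Build (but do not send) a POST for /subscribe or /unsubscribe."""
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError("action must be 'subscribe' or 'unsubscribe'")
    # Assumption: the username parameter is sent as a form-encoded body.
    body = urlencode({"username": username}).encode("ascii")
    return Request(
        url=f"{BASE_URL}/{action}/{quote(target_user, safe='')}",
        data=body,
        headers={"Cookie": f"{AUTH_COOKIE_NAME}={auth_token}"},
        method="POST",
    )


# bob (already logged in) asks to follow alice.
req = build_subscription_request("subscribe", "alice", "bob", "token123")
print(req.get_method(), req.full_url, req.data)
```

<p>Passing the request to <code>urllib.request.urlopen</code> would
actually send it; the response body should then be the JSON success or
failure object described above.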
+
+<h3><a name="api_feed">/feed</a> - Get updates for subscribed users</h3>
+
+<p>POST to /feed, with a <code>username</code> parameter and an auth
+cookie.  The server will respond with a JSON list of the last 50 updates
+from all subscribed users, in reverse chronological order.  Fetching
+/feed resets the new message count returned from /feedinfo.
+
+<p>NOTE: subscription notifications are only stored while subscriptions
+are active.  Any records inserted before or after a subscription is
+active will not show up in /feed.
+
+<h3><a name="api_feedinfo">/feedinfo, /feedinfo/(user)</a> - Get subscription
+status for a user</h3>
+
+<p>POST to /feedinfo with a <code>username</code> parameter and an auth
+cookie to get general information about your subscribed feeds.
+Currently, this only tells you how many new records there are since the
+last time /feed was fetched.  The server will respond with a JSON
+object:
+
+<pre>
+{"new":3}
+</pre>
+
+<p>POST to /feedinfo/(user) with a <code>username</code> parameter and
+an auth cookie, where (user) is a user whose subscription status you are
+interested in.  The server will respond with a simple JSON object:
+
+<pre>
+{"subscribed":true}
+</pre>
+
+<p>The value of "subscribed" will be either true or false depending on
+the subscription status.
+
+<h3><a name="api_passwd">/passwd</a> - Change a user's password</h3>
+
+<p>POST to /passwd with a <code>username</code> parameter and an auth
+cookie, plus <code>password</code> and <code>new_password</code>
+parameters to change the user's password.  For extra protection,
+changing a password requires sending the user's current password in the
+<code>password</code> parameter.  If authentication is successful and
+the password matches, the user's password is set to
+<code>new_password</code> and the server responds with JSON success.
+
+<p>If the password doesn't match, or if either <code>password</code> or
+<code>new_password</code> is missing, the server returns JSON failure.
+
+<h2><a name="libraries">Libraries</a></h2>
+
+<h3><a name="lib_c">C</a></h3>
+
+<p>Most of Blërg's core functionality is packaged in a static library
+called <code>blerg.a</code>.  It's not designed to be public or
+installed with <code>make install-environment</code>, but it should be
+relatively straightforward to use it in C programs.  Look at the headers
+under the <code>database</code> directory.
+
+<p>A secondary library called <code>blerg_auth.a</code> handles the
+authentication layer of Blërg.  To use it, look at
+<code>common/auth.h</code>.
+
+<h3><a name="lib_perl">Perl</a></h3>
+
+<p>As of 1.9.0, Blërg includes a perl library called
+<code>Blerg::Database</code>.  It wraps the core and authentication
+functionality in a perlish interface.  The module has its own POD
+documentation, which you can read with your favorite POD reader, from
+the manual installed in an environment directory, or in HTML <a
+href="perl/Blerg-Database.html">here</a>.
+
  <h2><a name="design">Design</a></h2>
  
  <h3><a name="motivation">Motivation</a></h3>
@@ -338,14 +472,15 @@ make the layers more efficient, or reduce the number of layers.
  </table>
  
  <p>Blërg does both by smashing the last two or three layers into one
-application.  Blërg can be run as either a standalone web server, or as
-a CGI (FastCGI support is planned, but I just don't care right now).
-Less waste, more throughput.  As a consequence of this, the entirety of
-the application logic that the user sees is implemented in the client
-app in Javascript.  That's why all the URLs have #'s &mdash; the page is
-loaded once and switched on the fly to show different views, further
-reducing load on the server.  Even parsing hash tags and URLs are done
-in client JS.
+application.  Blërg can be run as either a standalone web server
+(currently deprecated because maintaining two versions is hard), or as a
+CGI (FastCGI support is planned, but I just don't care right now).  Less
+waste, more throughput.  As a consequence of this, the entirety of the
+application logic that the user sees is implemented in the client app in
+Javascript.  That's why all the URLs have #'s &mdash; the page is loaded
+once and switched on the fly to show different views, further reducing
+load on the server.  Even parsing hash tags and URLs is done in client
+JS.
  
  <p>The API is simple and pragmatic.  It's not entirely RESTful, but is
  rather designed to work well with web-based front-ends.  Client data is
@@ -364,24 +499,24 @@ early in the design process that I'd try out mmaped I/O.  Each user in
  Blërg has their own database, which consists of a metadata file, and one
  or more data and index files.  The data and index files are memory
  mapped, which hopefully makes things more efficient by letting the OS
-handle when to read from disk (or maybe not &mdash I haven't benchmarked
-it).  The index files are preallocated because I believe it's more
-efficient than writing to it 40 bytes at a time as records are added.
-The database's limits are reasonable:
+handle when to read from disk (or maybe not &mdash; I haven't
+benchmarked it).  The index files are preallocated because I believe
+it's more efficient than writing to it 40 bytes at a time as records are
+added.  The database's limits are reasonable:
  
  <table class="statistics">
  <tr><td>maximum record size</td><td>65535 bytes</td></tr>
-<tr><td>maximum number of records per database</td><td>2<sup>64</sup> - 1 bytes</td></tr>
+<tr><td>maximum number of records per database</td><td>2<sup>64</sup> - 1</td></tr>
  <tr><td>maximum number of tags per record</td><td>1024</td></tr>
  </table>
  
  <p>So as not to create grossly huge and unwieldy data files, the
  database layer splits data and index files into many "segments"
-containing at most 64K entries each.  Those of you doing some quick math
-in your heads may note that this could cause a problem on 32-bit
-machines &mdash; if a full segment contains entries of the maximum
-length, you'll have to mmap 4GB (32-bit Linux gives each process only
-3GB of virtual address space).  Right now, 32-bit users should change
+containing at most 64K entries each.  Those of you doing some quick
+mental math may note that this could cause a problem on 32-bit machines
+&mdash; if a full segment contains entries of the maximum length, you'll
+have to mmap 4GB (32-bit Linux gives each process only 3GB of virtual
+address space).  Right now, 32-bit users should change
  <code>RECORDS_PER_SEGMENT</code> in <code>config.h</code> to something
  lower like 32768.  In the future, I might do something smart like not
  mmaping the whole fracking file.
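
<p>The arithmetic behind that advice can be checked with a few lines of
Python.  This is only a back-of-the-envelope sketch: it assumes "64K
entries" means 65536, counts only the data file (not the index), and
takes the 3GB figure as 3 GiB.

```python
MAX_RECORD_SIZE = 65535               # bytes, from the limits table
DEFAULT_RECORDS_PER_SEGMENT = 65536   # assumed meaning of "64K entries"
REDUCED_RECORDS_PER_SEGMENT = 32768   # value suggested for 32-bit users
ADDRESS_SPACE_32BIT = 3 * 2**30       # 3 GiB of virtual address space


def worst_case_segment_bytes(records_per_segment):
    """Size of a data segment if every record has the maximum length."""
    return records_per_segment * MAX_RECORD_SIZE


full = worst_case_segment_bytes(DEFAULT_RECORDS_PER_SEGMENT)
reduced = worst_case_segment_bytes(REDUCED_RECORDS_PER_SEGMENT)

for label, size in (("default", full), ("reduced", reduced)):
    fits = "fits" if size < ADDRESS_SPACE_32BIT else "does not fit"
    print(f"{label}: {size} bytes ({size / 2**30:.2f} GiB), {fits}")
```

<p>With the default setting a full segment is just under 4 GiB, which
cannot be mapped in 3 GiB of address space; halving
<code>RECORDS_PER_SEGMENT</code> brings the worst case to just under
2 GiB.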
@@ -425,6 +560,42 @@ disk before returning success.  This should make Blërg extremely fast,
  and totally unreliable in a crash.  But that's the way you want it,
  right? :]
  
+<h3><a name="subscriptions">Subscriptions</a></h3>
+
+<p>When I first started thinking about the idea of subscriptions, I
+immediately came up with the naïve solution: keep a list of the users
+each user is subscribed to, then when you want to get updates, iterate
+over the list and find the last entries for each user.  And that would
+work, but it's kind of costly in terms of disk I/O.  I have to visit
+each user in the list, retrieve their last few entries, and store them
+somewhere else to be sorted later.  And worse, that computation has to
+be done every time a user checks their feed. As the number of users and
+subscriptions grows, that will become a problem.
+
+<p>So instead, I thought about it the other way around. Instead of doing
+all the work when the request is received, Blërg tries to do as much as
+possible by "pushing" updates to subscribed users.  You can think of it
+kind of like a mail system.  When a user posts new content, a
+notification is "sent" out to each of that user's subscribers.  Later,
+when the subscribers want to see what's new, they simply check their
+mailbox.  Checking your mailbox is usually a lot more efficient than
+going around and checking everyone's records yourself, even with the
+overhead of the "mailman."
+
+<p>The "mailbox" is a subscription index, which is identical to a tag
+index, but is a per-user construct.  When a user posts a new record, a
+subscription index record is written for every subscriber.  It's a
+similar amount of I/O as the naïve version above, but the important
+difference is that it's only done once.  Retrieving records for accounts
+you're subscribed to is then as simple as reading your subscription
+index and reading the associated records.  This is hopefully less I/O
+than the naïve version, since you're reading, at most, as many accounts
+as you have records in the last N entries of your subscription index,
+instead of all of them.  And as an added bonus, since subscription index
+records are added as posts are created, the subscription index is
+automatically sorted by time!  To support this "mail" architecture, we
+also keep a list of subscribers and subscrib...ees in each account.
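
<p>The "mailbox" idea can be sketched in a few lines of Python.  This is
an in-memory toy, not Blërg's on-disk format: the class and method names
are made up for illustration, and keeping old mailbox entries after an
unsubscribe is an assumption of the sketch.

```python
from collections import defaultdict


class FeedStore:
    """Toy model of push-style subscriptions (all names hypothetical)."""

    def __init__(self):
        self.records = defaultdict(list)      # author -> list of posts
        self.subscribers = defaultdict(set)   # author -> who follows them
        self.subscribees = defaultdict(set)   # user -> who they follow
        self.mailbox = defaultdict(list)      # user -> [(author, record_no)]

    def subscribe(self, user, author):
        self.subscribers[author].add(user)
        self.subscribees[user].add(author)

    def unsubscribe(self, user, author):
        self.subscribers[author].discard(user)
        self.subscribees[user].discard(author)

    def post(self, author, text):
        """Store a record, then "mail" an index entry to each subscriber."""
        record_no = len(self.records[author])
        self.records[author].append(text)
        for user in self.subscribers[author]:
            self.mailbox[user].append((author, record_no))
        return record_no

    def feed(self, user, limit=50):
        """Read the last `limit` mailbox entries, newest first."""
        entries = self.mailbox[user][-limit:]
        return [(a, self.records[a][n]) for a, n in reversed(entries)]


store = FeedStore()
store.post("alice", "early")          # before bob subscribes: not delivered
store.subscribe("bob", "alice")
store.subscribe("bob", "carol")
store.post("alice", "a1")
store.post("carol", "c1")
store.post("alice", "a2")
print(store.feed("bob"))
```

<p>Because entries are appended to the mailbox at post time, reading the
feed needs no sorting, and only the authors that appear in the last N
entries are touched.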
+
  <h3><a name="problems">Problems, Caveats, and Future Work</a></h3>
  
  <p>Blërg probably doesn't actually work like Twitter because I've never