There's no stable release yet, but you can get everything currently
running on blerg.dominionofawesome.com by cloning the git repository at
http://git.bytex64.net/blerg.git.

Blërg can be run in two ways — as a standalone HTTP server, or as a
CGI. You will need:
Edit libs.mk and put in the paths where you can find headers and
libraries for the above requirements.
Also, further apologies to BSD folks — I've probably committed
several unconscious Linux-isms. It would not surprise me if the
makefile refuses to work with BSD make, or if it fails to compile even
with gmake. If you have patches or suggestions on how to make Blërg
more portable, I'd be happy to hear them.
At this point, it should be gravy. Type 'make' and in a few seconds,
you should have blerg.httpd, blerg.cgi, rss.cgi, and blergtool. Each of
those can be made individually as well, if you, for example, don't want
to install the prerequisites for blerg.httpd or blerg.cgi.

NOTE: blerg.httpd is deprecated and will not be updated with new
features.
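As a concrete sketch of the build step (the commands are commented out
because they assume a blerg.git checkout; the final line simply prints
the artifact names listed above):

```shell
# Hypothetical build session, run from the top of a blerg.git checkout.
# Edit libs.mk first so the compiler can find your headers and libraries.
#
#   make              # builds all four targets
#   make blergtool    # or build any single target on its own
#
# After a full build, these are the artifacts described above:
printf '%s\n' blerg.cgi rss.cgi blergtool blerg.httpd
```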
While it's not strictly required, Blërg will be easier to set up if
you configure it to work from the root of your website. For this
reason, it's better to use a subdomain (i.e., blerg.yoursite.com is
easier than yoursite.com/blerg/). If you do want to put it in a
subdirectory, you will have to modify www/js/blerg.js and change
baseURL at the top as well as a number of other self-references in that
file and www/index.html.
You cannot serve the database and client from different domains
(i.e., yoursite.com vs othersite.net, or even foo.yoursite.com and
bar.yoursite.com). This is a requirement of the web browser — the
same-origin policy will not allow an AJAX request to travel across
domains (though you can probably get around it these days with
Cross-origin resource sharing).
Copy the files in www/ to the root of your web server. Copy blerg.cgi
to your web server. Included in www-configs/ is a .htaccess file for
Apache that will rewrite the URLs. If you need to call the CGI
something other than blerg.cgi, the .htaccess file will need to be
modified.
Nginx can't run CGI directly, and there's currently no FastCGI
version of Blërg, so you will have to run it under some kind of
CGI-to-FastCGI gateway, like the one described on the nginx wiki. This
pretty much destroys the performance of Blërg, but it's all we've got
right now.
There is an optional RSS cgi (rss.cgi) that will serve RSS feeds for
users. Install this like blerg.cgi above. As of 1.9.0, this is a perl
FastCGI script, so you will have to make sure the perl libraries are
available to it. A good way of doing that is to install to an
environment directory, as described below.
The Makefile has support for installing Blërg into a directory that
includes tools, libraries, and configuration snippets for shell and web
servers. Use it as make install-environment ENV_DIR=<directory>. Under
<directory>/etc will be a shell script that sets environment variables,
and configuration snippets for nginx and apache to do the same. This
should make it somewhat easier to use Blërg in a self-contained way.
For example, this will install Blërg to an environment directory
inside your home directory:

    user@devhost:~/blerg$ make install-environment ENV_DIR=$HOME/blerg-env
    ...
    user@devhost:~/blerg$ . ~/blerg-env/etc/env.sh
Then, you will be able to run tools like blergtool, and it will
operate on data inside ~/blerg-env/data. Likewise, you can include
/home/user/blerg-env/etc/nginx-fastcgi-vars.conf or
/home/user/blerg-env/etc/apache-setenv.conf in your webserver to make
the CGI/FastCGI scripts do the same thing.
On failure, all API calls return either a standard HTTP error
response, like 404 Not Found if a record or user doesn't exist, or a 200
response with a 'JSON failure', which will look like this:

    {"status": "failure"}
Blërg doesn't currently explain why there is a failure, and I'm not sure it ever will.
On success, you'll either get some JSON relating to your request (for
/get, /tag, or /info), or a 'JSON success' response (for /create, /put,
/login, or /logout), which looks like this:

    {"status": "success"}
For the CGI backend, you may get a 500 error if something goes wrong. For the HTTP backend, you'll get nothing (since it will have crashed), or maybe a 502 Bad Gateway if you have it behind another web server.
All usernames must be 32 characters or less. Usernames must contain
only the ASCII characters 0-9, A-Z, a-z, underscore (_), and hyphen (-).
Passwords can be at most 64 bytes, and have no limits on characters (but
beware: if you have a null in the middle, it will stop checking there
because I use strncmp(3) to compare).

Tags must be 64 characters or less, and can contain only the ASCII
characters 0-9, A-Z, a-z, underscore (_), and hyphen (-).
As the result of a successful login, the server will send back a
cookie named auth. This cookie authorizes restricted requests, and must
be sent for any API endpoint marked as requiring authorization, or else
you will get a 403 Forbidden response. The cookie format looks like:

    auth=username/abcdef0123456789abcdef0123456789

That is a username, a forward slash, and 32 hexadecimal digits which
denote the "token" identifying the session. On logout, the server will
invalidate the token and expire the cookie.
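A quick sanity check of that cookie shape (the username and token
below are fabricated):

```shell
# Validate the documented cookie form: auth=<username>/<32 hex digits>.
cookie='auth=alice/abcdef0123456789abcdef0123456789'
if printf '%s' "$cookie" | grep -Eq '^auth=[0-9A-Za-z_-]{1,32}/[0-9a-f]{32}$'; then
  echo "well-formed"
fi
```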
To create a user, POST to /create with username and password
parameters for the new user. The server will respond with JSON failure
if the user exists, or if the user can't be created for some other
reason. The server will respond with JSON success if the user is
created.
and
password
parameters for an existing user. The server will
-respond with failure if the user does not exist or if the password is
-incorrect. On success, the server will respond with success, and will
-set a cookie named 'auth' that must be sent by the client when accessing
-restricted API functions (/put and /logout).
+respond with JSON failure if the user does not exist or if the password
+is incorrect. On success, the server will respond with JSON success,
+and will set a cookie named 'auth' that must be sent by the client when
+accessing restricted API functions (See Authorization above).
POST to /logout with with username
, the user to log out,
-along with the auth cookie in a Cookie header. The server will respond
-with failure if the user does not exist or if the auth cookie is bad.
-The server will respond with success after the user is successfully
-logged out.
+
POST to /logout. The server will respond with JSON failure if the +user does not exist or if the request is unauthorized. The server will +respond with JSON success after the user is successfully logged out.
POST to /put with a data parameter. The server will respond with
JSON failure if the request is unauthorized, if the user doesn't exist,
or if data contains more than 65535 bytes after URL decoding. The
server will respond with JSON success after the record is successfully
added.
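A sketch of posting a record with curl (hypothetical host;
--data-urlencode does the URL encoding for you), plus the client-side
size check implied by the 65535-byte limit:

```shell
# Hypothetical post; the auth cookie saved at login is sent with -b.
#
#   curl -b cookies.txt --data-urlencode 'data=Trying out #blerg!' \
#        https://blerg.example.com/put
#
# The decoded data may be at most 65535 bytes; a client can check first:
data='Trying out #blerg!'
[ $(printf '%s' "$data" | wc -c) -le 65535 ] && echo "small enough to post"
```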
The second form, /get/(user)/(start record)-(end record), retrieves
a specific range of records, from (start record) to (end record)
inclusive. You can retrieve at most 100 records this way. If (end
record) - (start record) specifies more than 100 records, or if the
range specifies invalid records, or if the end record is before the
start record, the server will respond with JSON failure.
author field, like so:

    { "author":"Jon", "record":"57", "timestamp":1294555793, "data":"I'm taking #garfield to the vet." }
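A range request and the corresponding client-side validity check might
look like this (host and user are placeholders; the curl line is
commented out since it needs a live server):

```shell
# Fetch records 0 through 99 (inclusive) for a hypothetical user:
#
#   curl https://blerg.example.com/get/alice/0-99
#
# The server rejects spans of more than 100 records and backwards
# ranges; the same check, done client-side:
start=0; end=99
[ "$end" -ge "$start" ] && [ $((end - start + 1)) -le 100 ] && echo "range ok"
```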
There is currently no support for getting more than 50 tags, but /tag
will probably mutate to work like /get.
POST to /subscribe/(user) with a subscribed parameter that is either
"true" or "false", indicating whether (user) should be subscribed to or
not. The server will respond with JSON failure if the request is
unauthorized or if the user doesn't exist. The server will respond with
JSON success after the subscription request is successfully registered.
POST to /feed, with a username parameter and an auth cookie. The
server will respond with a JSON list of the last 50 updates from all
subscribed users, in reverse chronological order. Fetching /feed does
not reset the new message count returned from /status. To do that, look
at POST /status.

NOTE: subscription notifications are only stored while subscriptions
are active. Any records inserted before or after a subscription is
active will not show up in /feed.
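A hypothetical subscribe-then-read session (hosts and names are
placeholders; the echoed line is an illustrative guess at the feed
shape, based on the record-with-author example shown for /get above):

```shell
# Subscribe to user jon, then fetch your feed. Both are restricted
# requests, so the login cookie is sent with -b.
#
#   curl -b cookies.txt -d subscribed=true https://blerg.example.com/subscribe/jon
#   curl -b cookies.txt -d username=alice  https://blerg.example.com/feed
#
# /feed returns a JSON list of records, each carrying an author field:
echo '[{"author":"jon", "record":"58", "timestamp":1294555793, "data":"Taking #garfield to the vet."}]'
```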
GET to /status to get information about your account. It tells you
the number of new subscription records since the last time the
subscription counter was reset, and a flag for whether the account was
mentioned since the last time the mention flag was cleared. The server
will respond with a JSON object:

    {
      "feed_new": 3,
      "mentioned": false
    }
POST to /status with a clear parameter that is either "feed" or
"mentioned" to reset either the subscription counter or the mention
flag, respectively. There is not currently a way to clear both with a
single request. The server will respond with JSON success.
GET to /status/(user) to get subscription information for a
particular user. The server will respond with a simple JSON object:

    {"subscribed":true}

The value of "subscribed" will be either true or false depending on
the subscription status.
POST to /passwd with password and new_password parameters to change
the user's password. For extra protection, changing a password requires
sending the user's current password in the password parameter. If
authentication is successful and the password matches, the user's
password is set to new_password and the server responds with JSON
success.

If the password doesn't match, or one of password or new_password is
missing, the server returns JSON failure.
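A password change with curl might look like this (placeholders
throughout; note that the current password travels in the password
parameter, and the curl line is commented out since it needs a live
server):

```shell
# Hypothetical password change; requires the auth cookie.
#
#   curl -b cookies.txt -d password=hunter2 -d new_password=correct-horse \
#        https://blerg.example.com/passwd
#
# On success:
echo '{"status": "success"}'
```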
Most of Blërg's core functionality is packaged in a static library
called blerg.a. It's not designed to be public or installed with `make
install-environment`, but it should be relatively straightforward to use
it in C programs. Look at the headers under the database directory.
A secondary library called blerg_auth.a handles the authentication
layer of Blërg. To use it, look at common/auth.h.
As of 1.9.0, Blërg includes a perl library called Blerg::Database.
It wraps the core and authentication functionality in a perlish
interface. The module has its own POD documentation, which you can read
with your favorite POD reader, from the manual installed in an
environment directory, or in HTML here.
Blërg was created as the result of a thought experiment: "What if
Twitter didn't need thousands of servers? What if its millions of users
could be handled by a single highly efficient server?" This is probably
an unreachable goal due to the sheer amount of I/O, but we can certainly
try to do better. Blërg was thus designed as a system with very simple
requirements:
Modern web applications have at least a four-layer approach. You
have the client-side browser app, the web server, the server-side
application, and the database. Your data goes through a lot of layers
before it actually resides on disk somewhere (or, as they're calling it
these days, "The Cloud" *waves hands*). Each of those layers requires
some amount of computing resources, so to increase throughput, we must
make the layers more efficient, or reduce the number of layers.
The Blërg model:

    Blërg Client App (HTML/Javascript)
    Blërg Database (Fuckin' hardcore C and shit)
Blërg does both by smashing the last two or three layers into one
application. Blërg can be run as either a standalone web server
(currently deprecated because maintaining two versions is hard), or as a
CGI (FastCGI support is planned, but I just don't care right now). Less
waste, more throughput. As a consequence of this, the entirety of the
application logic that the user sees is implemented in the client app in
Javascript. That's why all the URLs have #'s — the page is loaded
once and switched on the fly to show different views, further reducing
load on the server. Even parsing hash tags and URLs is done in client
JS.
The API is simple and pragmatic. It's not entirely RESTful, but is
rather designed to work well with web-based front-ends. Client data is

until after I wrote Blërg. :)
I was impressed by varnish's design, so I decided early in the design
process that I'd try out mmaped I/O. Each user in Blërg has their own
database, which consists of a metadata file, and one or more data and
index files. The data and index files are memory mapped, which
hopefully makes things more efficient by letting the OS handle when to
read from disk (or maybe not — I haven't benchmarked it). The index
files are preallocated because I believe it's more efficient than
writing to them 40 bytes at a time as records are added. The database's
limits are reasonable:
    maximum record size: 65535 bytes
    maximum number of records per database: 2^64 - 1
    maximum number of tags per record: 1024
Record Index Structure:

    offset (32-bit integer)
    length (16-bit integer)
    flags (16-bit integer)
    timestamp (32-bit integer)
A record is stored by first appending the data to the data file, then
writing an entry in the index file containing the offset and length of
the data, as well as the timestamp. Since each index entry is fixed
length, we can find the index entry simply by multiplying the record
number we want by the size of the index entry. Upshot: constant-time
random-access reads and constant-time writes. As an added bonus,
because we're using append-only files, we get lockless reads.
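The constant-time lookup falls directly out of the fixed-size entries.
A sketch of the arithmetic (the 12-byte size is just the sum of the
fields listed above assuming a packed entry with no padding; the real
on-disk entry size may differ):

```shell
# Index entry for record N lives at byte offset N * ENTRY_SIZE.
# 4 (offset) + 2 (length) + 2 (flags) + 4 (timestamp) = 12 bytes,
# assuming no padding.
ENTRY_SIZE=12
N=57
echo "record $N is at index byte offset $((N * ENTRY_SIZE))"
```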
Tag Structure:

    username (32 bytes)
    record number (64-bit integer)
Tags are handled by a separate set of indices, one per tag. When a
record is added, it is scanned for tags, then entries are appended to
each tag index for the tags found. Each index record simply stores the
user and record number. Tags are searched by opening the tag file,
reading the last 50 entries or so, and then reading all the records
listed. Voila, fast tag lookups.
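The scan-for-tags step can be sketched with the tag rules given
earlier (up to 64 characters of 0-9, A-Z, a-z, underscore, hyphen); the
record text here is made up:

```shell
# Extract #tags from record data using the documented tag character set.
data='Taking #garfield to the #vet today'
printf '%s\n' "$data" | grep -oE '#[0-9A-Za-z_-]{1,64}'
```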
At this point, you're probably thinking, "Is that it?" Yep, that's
it. Blërg isn't revolutionary, it's just a system whose requirements

disk before returning success. This should make Blërg extremely fast,
and totally unreliable in a crash. But that's the way you want it,
right? :]
When I first started thinking about the idea of subscriptions, I
immediately came up with the naïve solution: keep a list of users to
which each user is subscribed, then when you want to get updates,
iterate over the list and find the last entries for each user. And that
would work, but it's kind of costly in terms of disk I/O. I have to
visit each user in the list, retrieve their last few entries, and store
them somewhere else to be sorted later. And worse, that computation has
to be done every time a user checks their feed. As the number of users
and subscriptions grows, that will become a problem.

So instead, I thought about it the other way around. Instead of doing
all the work when the request is received, Blërg tries to do as much as
possible by "pushing" updates to subscribers. You can think of it kind
of like a mail system. When a user posts new content, a notification is
"sent" out to each of that user's subscribers. Later, when the
subscribers want to see what's new, they simply check their mailbox.
Checking your mailbox is usually a lot more efficient than going around
and checking everyone's records yourself, even with the overhead of the
"mailman."

The "mailbox" is a subscription index, which is identical to a tag
index, but is a per-user construct. When a user posts a new record, a
subscription index record is written for every subscriber. It's a
similar amount of I/O as the naïve version above, but the important
difference is that it's only done once. Retrieving records for accounts
you're subscribed to is then as simple as reading your subscription
index and reading the associated records. This is hopefully less I/O
than the naïve version, since you're reading, at most, as many accounts
as you have records in the last N entries of your subscription index,
instead of all of them. And as an added bonus, since subscription index
records are added as posts are created, the subscription index is
automatically sorted by time! To support this "mail" architecture, we
also keep a list of subscribers and subscrib...ees in each account.
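A toy model of that fan-out, using plain text files in a temp
directory (all names are hypothetical, and the real subscription index
uses fixed-size binary entries, not text):

```shell
# One append per subscriber at post time, one sequential read at feed time.
dir=$(mktemp -d)
poster=jon; record=58
for s in alice bob carol; do
  # each subscriber's "mailbox" gets one (author, record) entry
  printf '%s %s\n' "$poster" "$record" >> "$dir/$s.feed"
done
# Reading alice's feed is a single sequential scan of her own index:
cat "$dir/alice.feed"
rm -r "$dir"
```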
Blërg probably doesn't actually work like Twitter because I've never
actually had a Twitter account.

Libmicrohttpd is small, but it's focused on embedded applications, so
it often eschews speed for small memory footprint. This is especially
apparent when you watch it chew through a POST request 300 bytes at a
time even though you've specified a buffer size of 256K. blerg.httpd is
still pretty fast this way — on my 2GHz Opteron 246, siege says it
serves a 690-byte /get request at about 945 transactions per second,
average response time 0.05 seconds, with 100 concurrent accesses — but
a fast HTTP server implementation could knock this out of the park.
Libmicrohttpd is also really difficult to work with. If you look at
the code, http_blerg.c is about 70% longer than cgi_blerg.c simply
because of all the iterator hoops I had to jump through to process POST
requests. And if you can believe it, I wrote http_blerg.c first. If
I'd done it the other way around, I probably would have given up on
libmicrohttpd. :-/
The data structures written to disk are dependent on the size and endianness of the primitive data types on your architecture and OS.