There's no stable release yet, but you can get everything currently
running on blerg.dominionofawesome.com by cloning the git repository
at http://git.bytex64.net/blerg.git.

[...]
There is now an experimental autoconf build system.  If you run
add-autoconf, it'll do the magic and create a configure script that'll
do the familiar things.  If I ever get around to distributing source
packages, you should find that this has already been done.

If you'd rather stick with the manual system, you should edit libs.mk
and put in the paths where you can find headers and libraries for the
above requirements.
Also, further apologies to BSD folks — I've probably committed
several unconscious Linux-isms.  It would not surprise me if the
makefile refuses to work with BSD make, or if it fails to compile even
with gmake.  If you have patches or suggestions on how to make Blërg
more portable, I'd be happy to hear them.
At this point, it should be gravy.  Type 'make' and in a few seconds,
you should have blerg.httpd, blerg.cgi, rss.cgi, and blergtool.  Each
of those can be made individually as well, if you, for example, don't
want to install the prerequisites for blerg.httpd or blerg.cgi.

NOTE: blerg.httpd is deprecated and will not be updated with new
features.
While it's not strictly required, Blërg will be easier to set up if
you configure it to work from the root of your website.  For this
reason, it's better to use a subdomain (e.g., blerg.yoursite.com is
easier than yoursite.com/blerg/).  If you do want to put it in a
subdirectory, you will have to modify www/js/blerg.js and change
baseURL at the top, as well as a number of other self-references in
that file and www/index.html.

You cannot serve the database and client from different domains
(i.e., yoursite.com vs. othersite.net, or even foo.yoursite.com and
bar.yoursite.com).  This is a requirement of the web browser — the
same-origin policy will not allow an AJAX request to travel across
domains (though you can probably get around it these days with
Cross-Origin Resource Sharing).
Copy the files in www/ to the root of your web server.  Copy
blerg.cgi to your web server.  Included in www-configs/ is a .htaccess
file for Apache that will rewrite the URLs.  If you need to call the
CGI something other than blerg.cgi, the .htaccess file will need to be
modified.

Nginx can't run CGI directly, and there's currently no FastCGI
version of Blërg, so you will have to run it under some kind of
CGI-to-FastCGI gateway, like the one described on the nginx wiki.
This pretty much destroys the performance of Blërg, but it's all
we've got right now.
There is an optional RSS CGI (rss.cgi) that will serve RSS feeds for
users.  Install this like blerg.cgi above.  As of 1.9.0, this is a
perl FastCGI script, so you will have to make sure the perl libraries
are available to it.  A good way of doing that is to install to an
environment directory, as described below.
The Makefile has support for installing Blërg into a directory that
includes tools, libraries, and configuration snippets for shell and
web servers.  Use it as make install-environment ENV_DIR=<directory>.
Under <directory>/etc will be a shell script that sets environment
variables, and configuration snippets for nginx and apache to do the
same.  This should make it somewhat easier to use Blërg in a
self-contained way.
For example, this will install Blërg to an environment directory
inside your home directory:
user@devhost:~/blerg$ make install-environment ENV_DIR=$HOME/blerg-env
...
user@devhost:~/blerg$ . ~/blerg-env/etc/env.sh
Then, you will be able to run tools like blergtool, and it will
operate on data inside ~/blerg-env/data.  Likewise, you can include
/home/user/blerg-env/etc/nginx-fastcgi-vars.conf or
/home/user/blerg-env/etc/apache-setenv.conf in your webserver
configuration to make the CGI/FastCGI scripts do the same thing.
Blërg's API was designed to be as simple as possible.  Data sent from
the client is POSTed with the application/x-www-form-urlencoded
encoding, and a successful response is always JSON.  The API endpoints
will be described as though the server were serving requests from the
root of the website.
On failure, all API calls return either a standard HTTP error +response, like 404 Not Found if a record or user doesn't exist, or a 200 +response with a 'JSON failure', which will look like this: + +
{"status": "failure"}
Blërg doesn't currently explain why there is a failure, and +I'm not sure it ever will. + +
On success, you'll either get some JSON relating to your request (for +/get, /tag, or /info), or a 'JSON success' response (for /create, /put, +/login, or /logout), which looks like this: + +
{"status": "success"}
For the CGI backend, you may get a 500 error if something goes wrong. +For the HTTP backend, you'll get nothing (since it will have crashed), +or maybe a 502 Bad Gateway if you have it behind another web server. + +
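As a concrete sketch, here is how a client might build and interpret these exchanges in Python.  The helper names and base URL are illustrative, not part of Blërg:

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://blerg.example.com"  # placeholder installation

def encode_post(params):
    # Blërg expects application/x-www-form-urlencoded request bodies.
    return urlencode(params).encode("ascii")

def parse_response(body):
    # A 200 response is JSON: either endpoint data or a status object.
    result = json.loads(body)
    if isinstance(result, dict) and result.get("status") == "failure":
        raise RuntimeError("JSON failure from server")
    return result

body = encode_post({"username": "alice", "password": "hunter2"})
# body is now ready to POST to BASE_URL + "/login" with, e.g.,
# urllib.request.urlopen(BASE_URL + "/login", data=body)
```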
All usernames must be 32 characters or less.  Usernames must contain
only the ASCII characters 0-9, A-Z, a-z, underscore (_), and hyphen
(-).  Passwords can be at most 64 bytes, and have no limits on
characters (but beware: if you have a null in the middle, it will stop
checking there because I use strncmp(3) to compare).
Tags must be 64 characters or less, and can contain only the ASCII +characters 0-9, A-Z, a-z, underscore (_), and hyphen (-). + +
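These limits are easy to mirror client-side.  A sketch (the helper names are our own, not part of Blërg):

```python
import re

# Matches the documented alphabet for usernames and tags.
NAME_RE = re.compile(r"^[0-9A-Za-z_-]+$")

def valid_username(name):
    return len(name) <= 32 and bool(NAME_RE.match(name))

def valid_tag(tag):
    return len(tag) <= 64 and bool(NAME_RE.match(tag))

def valid_password(pw):
    # At most 64 bytes, any content -- but bytes after an embedded
    # NUL are ignored by the server's strncmp(3) comparison.
    return 0 < len(pw.encode("utf-8")) <= 64
```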
To create a user, POST to /create with username and password
parameters for the new user.  The server will respond with JSON
failure if the user exists, or if the user can't be created for some
other reason.  The server will respond with JSON success if the user
is created.
POST to /login with the username and password parameters for an
existing user.  The server will respond with JSON failure if the user
does not exist or if the password is incorrect.  On success, the
server will respond with JSON success, and will set a cookie named
'auth' that must be sent by the client when accessing restricted API
functions (/put and /logout).
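A client needs to capture that cookie and replay it later.  A minimal sketch using the stdlib cookie parser (function name is ours):

```python
from http.cookies import SimpleCookie

def auth_cookie(set_cookie_header):
    # Pull the 'auth' cookie out of a /login response so it can be
    # replayed in a Cookie header on /put and /logout.
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    return jar["auth"].value if "auth" in jar else None

token = auth_cookie("auth=deadbeef; Path=/")
# replay as the request header:  Cookie: auth=deadbeef
```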
POST to /logout with username, the user to log out, along with the
auth cookie in a Cookie header.  The server will respond with JSON
failure if the user does not exist or if the auth cookie is bad.  The
server will respond with JSON success after the user is successfully
logged out.
POST to /put with username and data parameters, and an auth cookie.
The server will respond with JSON failure if the auth cookie is bad,
if the user doesn't exist, or if data contains more than 65535 bytes
after URL decoding.  The server will respond with JSON success after
the record is successfully added.
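Note that the limit is on the decoded bytes, not the URL-encoded wire length.  A hedged sketch of a client-side check (helper name is ours):

```python
from urllib.parse import urlencode

MAX_RECORD_BYTES = 65535  # limit applies after URL decoding

def put_body(username, data):
    # Reject over-long records client-side; the server measures the
    # decoded record, not its (longer) URL-encoded form.
    if len(data.encode("utf-8")) > MAX_RECORD_BYTES:
        raise ValueError("record too large for /put")
    return urlencode({"username": username, "data": data}).encode("ascii")
```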
A GET request to /get/(user), where (user) is the user desired, will +return the last 50 records for that user in a list of objects. The +record objects look like this: + +
+{ + "record":"0", + "timestamp":1294309438, + "data":"eatin a taco on fifth street" +} ++ +
record is the record number, timestamp is the UNIX epoch timestamp
(i.e., the number of seconds since Jan 1 1970 00:00:00 GMT), and data
is the content of the record.  The record number is sent as a string
because while Blërg supports record numbers up to 2^64 - 1, Javascript
uses floating point for all its numbers, and can only support integers
without truncation up to 2^53.  This difference is largely academic,
but I didn't want this problem to sneak up on anyone who is more
insane than I am. :]
The second form, /get/(user)/(start record)-(end record), retrieves a +specific range of records, from (start record) to (end record) +inclusive. You can retrieve at most 100 records this way. If (end +record) - (start record) specifies more than 100 records, or if the +range specifies invalid records, or if the end record is before the +start record, the server will respond with JSON failure. + +
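In a client with arbitrary-precision integers, the stringified record number is safe to convert directly.  For instance, in Python:

```python
import json

# A record object as returned by /get (example from above).
raw = '{"record":"0","timestamp":1294309438,"data":"eatin a taco on fifth street"}'
rec = json.loads(raw)

# Python ints are arbitrary precision, so the conversion is lossless
# even for record numbers beyond Javascript's 2**53 exact-integer
# ceiling (Blërg allows up to 2**64 - 1).
recno = int(rec["record"])
```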
A GET request to /info/(user) will return a JSON object with +information about the user (currently only the number of records). The +info object looks like this: + +
+{ + "record_count": "544" +} ++ +
Again, the record count is sent as a string for 64-bit safety. + +
A GET request to this endpoint will return the last 50 records
associated with the given tag.  The first character is either # or H
for hashtags, or @ for mentions (I call them ref tags).  You should
URL encode the # or @, lest some servers complain at you.  The H alias
for # was created because Apache helpfully strips the fragment of a
URL (everything from the # to the end) before handing it off to the
CGI, even if the hash is URL encoded.  The record objects also contain
an extra author field, like so:
+{ + "author":"Jon", + "record":"57", + "timestamp":1294555793, + "data":"I'm taking #garfield to the vet." +} ++ +
There is currently no support for getting more than 50 tags, but /tag +will probably mutate to work like /get. + +
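The encoding requirement looks like this in practice (the exact /tag/... path shape is an assumption for illustration):

```python
from urllib.parse import quote

def tag_path(tag):
    # URL encode the sigil: a literal '#' would be treated as a
    # fragment delimiter and stripped before reaching the server.
    # The H alias needs no encoding at all.
    return "/tag/" + quote(tag, safe="")
```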
POST to /subscribe/(user) with a username parameter and an auth
cookie, where (user) is the user whose updates you wish to subscribe
to.  The server will respond with JSON failure if the auth cookie is
bad or if the user doesn't exist.  The server will respond with JSON
success after the subscription is successfully registered.
Identical to /subscribe, but removes the subscription. + +
POST to /feed, with a username parameter and an auth cookie.  The
server will respond with a JSON list of the last 50 updates from all
subscribed users, in reverse chronological order.  Fetching /feed
resets the new message count returned from /feedinfo.
NOTE: subscription notifications are only stored while subscriptions +are active. Any records inserted before or after a subscription is +active will not show up in /feed. + +
POST to /feedinfo with a username parameter and an auth cookie to get
general information about your subscribed feeds.  Currently, this only
tells you how many new records there are since the last time /feed was
fetched.  The server will respond with a JSON object:
+{"new":3} ++ +
POST to /feedinfo/(user) with a username parameter and an auth
cookie, where (user) is a user whose subscription status you are
interested in.  The server will respond with a simple JSON object:
+{"subscribed":true} ++ +
The value of "subscribed" will be either true or false depending on +the subscription status. + +
POST to /passwd with a username parameter and an auth cookie, plus
password and new_password parameters to change the user's password.
For extra protection, changing a password requires sending the user's
current password in the password parameter.  If authentication is
successful and the password matches, the user's password is set to
new_password and the server responds with JSON success.

If the password doesn't match, or one of password or new_password is
missing, the server returns JSON failure.
Most of Blërg's core functionality is packaged in a static library
called blerg.a.  It's not designed to be public or installed with
`make install-environment`, but it should be relatively
straightforward to use it in C programs.  Look at the headers under
the database directory.
A secondary library called blerg_auth.a handles the authentication
layer of Blërg.  To use it, look at common/auth.h.
As of 1.9.0, Blërg includes a perl library called Blerg::Database.
It wraps the core and authentication functionality in a perlish
interface.  The module has its own POD documentation, which you can
read with your favorite POD reader, from the manual installed in an
environment directory, or in HTML here.
Blërg was created as the result of a thought experiment: "What if
Twitter didn't need thousands of servers?  What if its millions of
users could be handled by a single highly efficient server?"  This is
probably an unreachable goal due to the sheer amount of I/O, but we
can certainly try to do better.  Blërg was thus designed as a system
with very simple requirements:
Modern web applications have at least a four-layer approach.  You
have the client-side browser app, the web server, the server-side
application, and the database.  Your data goes through a lot of
layers before it actually resides on disk somewhere (or, as they're
calling it these days, "The Cloud" *waves hands*).  Each of those
layers requires some amount of computing resources, so to increase
throughput, we must make the layers more efficient, or reduce the
number of layers.
Blërg model:
  Blërg Client App (HTML/Javascript)
  Blërg Database (Fuckin' hardcore C and shit)
Blërg does both by smashing the last two or three layers into one
application.  Blërg can be run as either a standalone web server
(currently deprecated because maintaining two versions is hard), or
as a CGI (FastCGI support is planned, but I just don't care right
now).  Less waste, more throughput.  As a consequence of this, the
entirety of the application logic that the user sees is implemented
in the client app in Javascript.  That's why all the URLs have #'s
— the page is loaded once and switched on the fly to show
different views, further reducing load on the server.  Even parsing
hash tags and URLs is done in client JS.
The API is simple and pragmatic.  It's not entirely RESTful, but is
rather designed to work well with web-based front-ends.  Client data
is [...] until after I wrote Blërg. :)
I was impressed by varnish's design, so I decided early in the design
process that I'd try out mmaped I/O.  Each user in Blërg has their
own database, which consists of a metadata file, and one or more data
and index files.  The data and index files are memory mapped, which
hopefully makes things more efficient by letting the OS handle when
to read from disk (or maybe not — I haven't benchmarked it).  The
index files are preallocated because I believe it's more efficient
than writing to them 40 bytes at a time as records are added.  The
database's limits are reasonable:
maximum record size                    | 65535 bytes
maximum number of records per database | 2^64 - 1
maximum number of tags per record      | 1024
Record Index Structure
----------------------
offset    (32-bit integer)
length    (16-bit integer)
flags     (16-bit integer)
timestamp (32-bit integer)
A record is stored by first appending the data to the data file, then
writing an entry in the index file containing the offset and length
of the data, as well as the timestamp.  Since each index entry is
fixed length, we can find the index entry simply by multiplying the
record number we want by the size of the index entry.  Upshot:
constant-time random-access reads and constant-time writes.  As an
added bonus, because we're using append-only files, we get lockless
reads.
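The fixed 12-byte entry makes the offset arithmetic trivial.  A sketch in Python (field order follows the index structure above; little-endian packing is an assumption for illustration, not necessarily what Blërg writes):

```python
import struct

# Record index entry: offset (32-bit), length (16-bit),
# flags (16-bit), timestamp (32-bit) = 12 bytes.
INDEX_FMT = "<IHHI"
INDEX_SIZE = struct.calcsize(INDEX_FMT)

def index_entry_offset(record_number):
    # Constant-time lookup: entry N lives at byte N * INDEX_SIZE.
    return record_number * INDEX_SIZE

packed = struct.pack(INDEX_FMT, 4096, 28, 0, 1294309438)
```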
Tag Structure
-------------
username      (32 bytes)
record number (64-bit integer)

Tags are handled by a separate set of indices, one per tag.  When a
record is added, it is scanned for tags, then entries are appended to
each tag index for the tags found.  Each index record simply stores
the user and record number.  Tags are searched by opening the tag
file, reading the last 50 entries or so, and then reading all the
records listed.  Voila, fast tag lookups.
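The write-time scan and the 40-byte tag entry can be sketched like so (little-endian packing and the tag regex are our assumptions; the regex just follows the documented tag alphabet):

```python
import re
import struct

# Tag index entry: username (32 bytes, NUL padded) + record number
# (64-bit) = 40 bytes.
TAG_FMT = "<32sQ"
TAG_RE = re.compile(r"[#@]([0-9A-Za-z_-]{1,64})")

def tag_entries(username, record_number, data):
    # On write, scan the record and emit one entry per tag found.
    return {
        tag: struct.pack(TAG_FMT, username.encode("ascii"), record_number)
        for tag in TAG_RE.findall(data)
    }

entries = tag_entries("Jon", 57, "I'm taking #garfield to the vet.")
```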
At this point, you're probably thinking, "Is that it?"  Yep, that's
it.  Blërg isn't revolutionary, it's just a system whose requirements
[...] disk before returning success.  This should make Blërg
extremely fast, and totally unreliable in a crash.  But that's the
way you want it, right? :]
When I first started thinking about the idea of subscriptions, I +immediately came up with the naïve solution: keep a list of users to +which users are subscribed, then when you want to get updates, iterate +over the list and find the last entries for each user. And that would +work, but it's kind of costly in terms of disk I/O. I have to visit +each user in the list, retrieve their last few entries, and store them +somewhere else to be sorted later. And worse, that computation has to +be done every time a user checks their feed. As the number of users and +subscriptions grows, that will become a problem. + +
So instead, I thought about it the other way around. Instead of doing +all the work when the request is received, Blërg tries to do as much as +possible by "pushing" updates to subscribed users. You can think of it +kind of like a mail system. When a user posts new content, a +notification is "sent" out to each of that user's subscribers. Later, +when the subscribers want to see what's new, they simply check their +mailbox. Checking your mailbox is usually a lot more efficient than +going around and checking everyone's records yourself, even with the +overhead of the "mailman." + +
The "mailbox" is a subscription index, which is identical to a tag +index, but is a per-user construct. When a user posts a new record, a +subscription index record is written for every subscriber. It's a +similar amount of I/O as the naïve version above, but the important +difference is that it's only done once. Retrieving records for accounts +you're subscribed to is then as simple as reading your subscription +index and reading the associated records. This is hopefully less I/O +than the naïve version, since you're reading, at most, as many accounts +as you have records in the last N entries of your subscription index, +instead of all of them. And as an added bonus, since subscription index +records are added as posts are created, the subscription index is +automatically sorted by time! To support this "mail" architecture, we +also keep a list of subscribers and subscrib...ees in each account. + +
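The whole mail metaphor fits in a few lines.  This is an in-memory toy of the scheme described above (the real thing persists these as per-user subscription indices on disk):

```python
from collections import defaultdict

subscribers = defaultdict(set)  # author -> users subscribed to them
mailbox = defaultdict(list)     # user -> [(author, record number)]

def subscribe(user, author):
    subscribers[author].add(user)

def post(author, record_number):
    # Fan-out cost is paid once, at write time.
    for user in subscribers[author]:
        mailbox[user].append((author, record_number))

def feed(user, n=50):
    # Reading a feed is just reading your own index; entries are
    # already in post order, so the last n are the newest.
    return mailbox[user][-n:][::-1]
```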
Blërg probably doesn't actually work like Twitter because I've never
actually had a Twitter account.

[...]

Libmicrohttpd is small, but it's focused on embedded applications, so
it often eschews speed for small memory footprint.  This is
especially apparent when you watch it chew through a POST request 300
bytes at a time even though you've specified a buffer size of 256K.
blerg.httpd is still pretty fast this way — on my 2GHz Opteron
246, siege says it serves a 690-byte /get request at about 945
transactions per second, average response time 0.05 seconds, with 100
concurrent accesses — but a fast HTTP server implementation could
knock this out of the park.
Libmicrohttpd is also really difficult to work with.  If you look at
the code, http_blerg.c is about 70% longer than cgi_blerg.c simply
because of all the iterator hoops I had to jump through to process
POST requests.  And if you can believe it, I wrote http_blerg.c
first.  If I'd done it the other way around, I probably would have
given up on libmicrohttpd. :-/
The data structures written to disk are dependent on the size and endianness of the primitive data types on your architecture and OS.