# The Dreamlands
The Dreamlands is a (probably) infinite web of text and links, meant to ensnare unwitting visitors in a never-ending dream. It produces text from a Markov generator with a two-token lookup. The Markov data is stored in a local SQLite database, which keeps memory usage minimal at the cost of some CPU and I/O.
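"Two-token lookup" means the next token is chosen based on the previous two. A minimal Python sketch of the idea (the project itself is Perl, and the real schema in `schema.sql` differs; the `trigram` table and column names here are assumptions):

```python
import random
import sqlite3

# Hypothetical schema: trigram(t1, t2, t3, count) records how often
# token t3 followed the pair (t1, t2) in the training text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trigram (t1 TEXT, t2 TEXT, t3 TEXT, count INTEGER)")
db.executemany(
    "INSERT INTO trigram VALUES (?, ?, ?, ?)",
    [("the", "cat", "sat", 3), ("the", "cat", "ran", 1), ("cat", "sat", "down", 1)],
)

def next_token(t1, t2):
    # Weighted pick among all tokens seen after the pair (t1, t2).
    rows = db.execute(
        "SELECT t3, count FROM trigram WHERE t1 = ? AND t2 = ?", (t1, t2)
    ).fetchall()
    if not rows:
        return None
    tokens, weights = zip(*rows)
    return random.choices(tokens, weights=weights)[0]

def generate(t1, t2, n=10):
    out = [t1, t2]
    for _ in range(n):
        t3 = next_token(out[-2], out[-1])
        if t3 is None:
            break
        out.append(t3)
    return " ".join(out)
```

Because only the trigram counts for the current pair are ever loaded, memory stays small no matter how large the database grows.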
## Setup
The Dreamlands installs all of its dependencies locally, needing only a functioning Perl 5 (for values of 5 greater than 5.10) environment and `cpanm` (most likely available in your package manager as some variation of `cpanminus`).
First, set up the environment.

```
$ . ./env.sh  # Note the leading dot, it's important
```
Then, install dependencies and initialize the database.

```
$ ./init.sh
```
## Markov Training
The Markov generator must be trained before The Dreamlands can run. You will need UTF-8 text files to ingest into the Markov generator. Give them to `./train.pl`.

```
$ ./train.pl a_novel.txt
```
This will take some time as it tokenizes the text and calculates the token relationships. The process can be restarted; already-inserted tokens will be skipped. But the process is not incremental: the token relationships are erased and rebuilt every time it is trained. If you want to change the text source, remove `markov.db`, re-run `./init.sh`, and re-train.
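The restartable-but-not-incremental behavior can be sketched like this (a Python illustration of the idea only; the actual `train.pl` and schema differ, and all names here are assumptions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE token (word TEXT PRIMARY KEY);        -- restartable: duplicates skipped
    CREATE TABLE trigram (t1 TEXT, t2 TEXT, t3 TEXT);  -- rebuilt from scratch each run
""")

def train(words):
    # Token inserts are idempotent, so an interrupted run can be restarted
    # and already-inserted tokens are simply skipped...
    db.executemany("INSERT OR IGNORE INTO token VALUES (?)", [(w,) for w in words])
    # ...but the relationships are wiped and rebuilt, so training on a new
    # file replaces rather than extends the old relationships.
    db.execute("DELETE FROM trigram")
    db.executemany(
        "INSERT INTO trigram VALUES (?, ?, ?)",
        list(zip(words, words[1:], words[2:])),
    )

train(["the", "cat", "sat", "down"])
train(["the", "cat", "sat", "down"])  # safe to re-run; no duplicate tokens
```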
## Running
If you want to test that the Markov generator is working properly, you can run `./generate.pl`. It will output one paragraph of text by default, or you can give it a number to output that many paragraphs.

The web server runs from `./http.pl`, and by default starts on port 8080. It takes no arguments, but you can change the port and path prefix by editing the script.
## The Nitty Gritty
The obvious way to use this is as a trap for badly behaved web crawlers. I recommend you add the path to `/robots.txt`, so that well-behaved crawlers avoid it. Then any crawlers that do get stuck in it are obviously badly configured or hostile. You might use their presence as a source for a blocklist using e.g. fail2ban.
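For example, if The Dreamlands is mounted under a path prefix of `/dreamlands` (a hypothetical prefix; the real one is whatever you set in `./http.pl`), the `/robots.txt` entry would look like:

```
User-agent: *
Disallow: /dreamlands/
```

Well-behaved crawlers honor this and stay out; anything that wanders in anyway has identified itself.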
Or you might simply wish to give them a hard time by letting them go 'round and 'round until they fall over.
Each page randomly selects tokens seeded by a hash of the path, so each page's content should be stable. Links are randomly placed within the text, linking to other pages ad infinitum.
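Seeding from a hash of the path is what makes each page stable across requests while still looking random. A Python sketch of the technique (not the actual `./http.pl` code; the hash choice and helper names are assumptions):

```python
import hashlib
import random

def rng_for_path(path):
    # Derive a deterministic seed from the request path, so the same
    # path always yields the same "random" page content.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed)

def page_words(path, vocab, n=5):
    # Every request for `path` replays the same sequence of choices.
    rng = rng_for_path(path)
    return [rng.choice(vocab) for _ in range(n)]

vocab = ["dream", "gate", "silver", "key", "cats"]
```

Links generated the same way point at other paths, each of which seeds its own stable page, so the web extends indefinitely without storing any pages.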
## Templating
Pages are rendered with Template::Toolkit. The template is in `templates/page.tt`.
Clone: https://git.bytex64.net/dreamlands.git