# The Dreamlands
The Dreamlands is a (probably) infinite web of text and links, meant to ensnare unwitting visitors in a never-ending dream. It produces text from a Markov generator with a two-token lookup. The Markov data is stored in a local SQLite database, which keeps memory usage minimal at the cost of some CPU and I/O.
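"Two-token lookup" means the next token is chosen based on the previous two. A minimal Python sketch of the idea (the project itself is Perl, and the real schema in `schema.sql` differs; the `trigram` table and column names here are assumptions):

```python
import random
import sqlite3

# Hypothetical schema: trigram(t1, t2, t3, count) records how often
# token t3 followed the pair (t1, t2) in the training text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trigram (t1 TEXT, t2 TEXT, t3 TEXT, count INTEGER)")
db.executemany(
    "INSERT INTO trigram VALUES (?, ?, ?, ?)",
    [("the", "cat", "sat", 3), ("the", "cat", "ran", 1), ("cat", "sat", "down", 1)],
)

def next_token(t1, t2):
    # Weighted pick among all tokens seen after the pair (t1, t2).
    rows = db.execute(
        "SELECT t3, count FROM trigram WHERE t1 = ? AND t2 = ?", (t1, t2)
    ).fetchall()
    if not rows:
        return None
    tokens, weights = zip(*rows)
    return random.choices(tokens, weights=weights)[0]

def generate(t1, t2, n=10):
    out = [t1, t2]
    for _ in range(n):
        t3 = next_token(out[-2], out[-1])
        if t3 is None:
            break
        out.append(t3)
    return " ".join(out)
```

Because only the trigram counts for the current pair are ever loaded, memory stays small no matter how large the database grows.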
## Setup
The Dreamlands installs all of its dependencies locally, needing only a functioning Perl 5 (for values of 5 greater than 5.10) environment and `cpanm` (most likely available in your package manager as some variation of `cpanminus`).
First, set up the environment.

```
$ . ./env.sh  # Note the leading dot, it's important
```
Then, install dependencies and initialize the database.

```
$ ./init.sh
```
## Markov Training
The Markov generator must be trained before The Dreamlands can run. You will need UTF-8 text files to ingest into the Markov generator. Give them to `./train.pl`.

```
$ ./train.pl a_novel.txt
```
This will take some time as it tokenizes the text and calculates the token relationships. The process can be restarted; already-inserted tokens will be skipped. But the process is not incremental: the token relationships are erased and rebuilt every time it is trained. If you want to change the text source, remove `markov.db`, re-run `./init.sh`, and re-train.
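The restartable-but-not-incremental behavior can be sketched like this (a Python illustration of the idea only; the actual `train.pl` and schema differ, and all names here are assumptions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE token (word TEXT PRIMARY KEY);        -- restartable: duplicates skipped
    CREATE TABLE trigram (t1 TEXT, t2 TEXT, t3 TEXT);  -- rebuilt from scratch each run
""")

def train(words):
    # Token inserts are idempotent, so an interrupted run can be restarted
    # and already-inserted tokens are simply skipped...
    db.executemany("INSERT OR IGNORE INTO token VALUES (?)", [(w,) for w in words])
    # ...but the relationships are wiped and rebuilt, so training on a new
    # file replaces rather than extends the old relationships.
    db.execute("DELETE FROM trigram")
    db.executemany(
        "INSERT INTO trigram VALUES (?, ?, ?)",
        list(zip(words, words[1:], words[2:])),
    )

train(["the", "cat", "sat", "down"])
train(["the", "cat", "sat", "down"])  # safe to re-run; no duplicate tokens
```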
## Running
If you want to test that the Markov generator is working properly, you can run `./generate.pl`. It will output one paragraph of text by default, or you can give it a number to output that many paragraphs.

The web server runs from `./http.pl`, and by default starts on port 8080. It takes no arguments, but you can change the port and path prefix by editing the script.
## The Nitty Gritty
The obvious way to use this is as a trap for badly behaved web crawlers. I recommend you add the path to `/robots.txt`, so that well-behaved crawlers avoid it. Then any crawlers that do get stuck in it are obviously badly configured or hostile. You might use their presence as a source for a blocklist using e.g. fail2ban.
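For example, if The Dreamlands is mounted under a path prefix of `/dreamlands` (a hypothetical prefix; the real one is whatever you set in `./http.pl`), the `/robots.txt` entry would look like:

```
User-agent: *
Disallow: /dreamlands/
```

Well-behaved crawlers honor this and stay out; anything that wanders in anyway has identified itself.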
Or you might simply wish to give them a hard time by letting them go 'round and 'round until they fall over.
Each page randomly selects tokens seeded by a hash of the path, so each page's content should be stable. Links are randomly placed within the text, linking to other pages ad infinitum.
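Seeding from a hash of the path is what makes each page stable across requests while still looking random. A Python sketch of the technique (not the actual `./http.pl` code; the hash choice and helper names are assumptions):

```python
import hashlib
import random

def rng_for_path(path):
    # Derive a deterministic seed from the request path, so the same
    # path always yields the same "random" page content.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed)

def page_words(path, vocab, n=5):
    # Every request for `path` replays the same sequence of choices.
    rng = rng_for_path(path)
    return [rng.choice(vocab) for _ in range(n)]

vocab = ["dream", "gate", "silver", "key", "cats"]
```

Links generated the same way point at other paths, each of which seeds its own stable page, so the web extends indefinitely without storing any pages.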
## Templating
Pages are rendered with Template::Toolkit. The template is in `templates/page.tt`.
Clone: https://git.bytex64.net/dreamlands.git