Hello, World

— 25 December, 2016

It's cold out and Christmas is here, time to curl up with a hot cocoa and blanket and read about how this blog was set up! ☕☕☕

A "Hello, World!" program is a computer program that outputs or displays "Hello, World!" to the user. Being a very simple program in most programming languages, it is often used to illustrate the basic syntax of a programming language for a working program. -- Wikipedia

If you're reading this, hopefully it means that my little "Hello, World" program worked! What program, you ask? As it turns out, this entire blog was built with a simple static site generator that I wrote in a weekend using Python! Essentially, it operates much like Jekyll and Hexo, only with fewer features and significantly more bugs! All jokes aside, I had way more fun solving this one myself than I would have if I had just taken one of those prebuilt solutions. Better to learn something, right? :)

The Goal

We want our script to take a source directory that has posts organized in a nice, readable way (read: markdown, folders, etc.) and convert it to something accessible by a web browser (read: horrifying tag hell). I didn't really put a lot of thought into the directory structure I chose for the source files, but something like this seems to look pretty nice, at least:

$ tree
├── base
│   └── css
│       └── style.css
├── posts
│   ├── Hello, World
│   │   ├── date
│   │   └── post.md
│   └── Post Number 2
│       ├── date
│       ├── image1.png
│       ├── image2.png
│       └── post.md
└── templates
    ├── homepage.html
    └── post.html

Everything inside the base folder is copied indiscriminately to the final build. This is useful for pages that never change (like an "about me" section, CSS, etc.). The posts directory contains a folder for every post that the blog has. Each of these folders is required to have a post.md file with the text content (formatted in markdown!). If there is a date file, the date inside is used as the post timestamp; otherwise the timestamp falls back to the time the folder's contents were last modified. Any extra resources needed for the post (images, videos, audio, downloadable scripts, etc.) can also be included in the post's folder. Finally, pages in the blog are generated using the templates located under the templates folder.

After our build script runs, we should end up with a website that looks like this:

$ tree
├── 2016
│   └── 12
│       ├── hello-world.html
│       └── image-example-post.html
├── audio
├── css
│   └── style.css
├── img
│   ├── 20161225_image1.png
│   └── 20161225_image2.png
├── index.html
├── other
└── vid
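
To make the mapping from a post folder to its output file concrete, here is a minimal sketch of how a path like 2016/12/hello-world.html could be derived. The helper names slugify and output_path are hypothetical (they are not taken from the actual build script), and the slug rule is just my guess at something reasonable.

```python
import re
from datetime import datetime
from pathlib import Path


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    words = re.findall(r'[a-z0-9]+', title.lower())
    return '-'.join(words)


def output_path(build_dir, title, timestamp):
    """Build <build_dir>/<year>/<month>/<slug>.html for a post."""
    year = timestamp.strftime('%Y')
    month = timestamp.strftime('%m')
    return build_dir / year / month / (slugify(title) + '.html')


path = output_path(Path('build'), 'Hello, World', datetime(2016, 12, 25))
print(path.as_posix())  # build/2016/12/hello-world.html
```

Stripping everything except letters and digits keeps the URLs simple, at the cost of two differently punctuated titles possibly colliding on the same slug.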

Copying some files

This isn't exactly a one-step process, so let's break it up into pieces. First, we copy everything from base to our build directory, where our finished website will reside. This is pretty easy using Python's pathlib and shutil. However, shutil's copytree refuses to copy into a destination that already exists, so we need to clear out any conflicting path before copying.

from shutil import copy, copytree, rmtree

def copy_base(base_dir, build_dir):
    for f in base_dir.iterdir():
        target = build_dir.joinpath(f.relative_to(base_dir))
        # remove any conflicting path so the copy never overwrites
        if target.is_dir():
            rmtree(str(target))
        elif target.exists():
            target.unlink()
        if f.is_dir():  # copytree only handles directories
            copytree(str(f), str(target))
        else:
            copy(str(f), str(target))

One interesting thing to note about this code is that we have to convert our pathlib objects to strings before we can pass them to the shutil file manipulation functions. This is due to a weird bit of inconsistency in Python's standard library that should be fixed in version 3.6 (see PEP 519).
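
For comparison, here is a small sketch of what the same kind of copy looks like on a newer interpreter, where shutil accepts pathlib objects directly. It assumes Python 3.8 or later, because it also uses the dirs_exist_ok flag that copytree gained in that release; the temporary folder and file names are made up for the demonstration.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # a throwaway stand-in for the real base folder
    src = root / 'base'
    (src / 'css').mkdir(parents=True)
    (src / 'css' / 'style.css').write_text('body { margin: 0; }')

    dst = root / 'build'
    # Path objects are passed straight in, no str() needed
    shutil.copytree(src, dst)
    # a second copy into the existing folder works with dirs_exist_ok (3.8+)
    shutil.copytree(src, dst, dirs_exist_ok=True)

    copied = (dst / 'css' / 'style.css').read_text()

print(copied)
```

With dirs_exist_ok the delete-then-copy dance becomes unnecessary, although stale files left over from an earlier build would then stay in place.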

Getting the posts

Next, we'll retrieve information about all of the blog posts in the posts folder, again, using pathlib. We'll store the results of our search in a list of dictionaries, each dictionary containing the actual post (as markdown), the date the post was last modified at, the timestamp from the date file (if there is one), and lists of paths for all other resources used.

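from datetime import datetime
from dateutil.parser import parse  # assumption: parse() comes from python-dateutil

# retrieves information about the post in directory 'dir'
def get_post(dir):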
    modified = 0
    post = None
    date = None
    img = []
    vid = []
    audio = []
    other = []
    for f in dir.iterdir():
        if f.is_dir():
            raise OSError('subdirectories are not allowed for posts')
        # store most recent modification time
        if f.stat().st_mtime > modified:
            modified = f.stat().st_mtime

        if f.name == 'post.md':
            post = f
        elif f.name == 'date':
            date = f

        # categorize resource types
        elif f.name.endswith(('png', 'jpg', 'jpeg', 'gif')):
            img.append(f)
        elif f.name.endswith(('mp4', 'avi')):
            vid.append(f)
        elif f.name.endswith(('mp3', 'wav', 'ogg')):
            audio.append(f)
        else:
            other.append(f)

    # make sure post.md was found
    if post is None:
        raise OSError('post.md not found in ' + dir.name)

    # return everything in a dictionary
    d = {
        'name': dir.name,
        'md': post.read_text(),
        'img': img,
        'vid': vid,
        'audio': audio,
        'other': other,
        'modified': datetime.fromtimestamp(modified)
    }
    if date is not None:
        d['date'] = parse(date.read_text())

    return d

# gets the info for all posts in directory 'dir'
def get_posts(dir):
    posts = []
    for p in dir.iterdir():
        posts.append(get_post(p))
    return posts
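
Since each dictionary only carries a 'date' key when a date file was present, anything that needs the post's timestamp has to handle both cases. Here is a minimal sketch of that fallback, plus sorting newest first for the homepage. The helper post_timestamp is a hypothetical name, and the dictionaries are hand-made stand-ins for what get_posts returns.

```python
from datetime import datetime


def post_timestamp(post):
    """Prefer the explicit 'date' entry; fall back to 'modified'."""
    return post.get('date', post['modified'])


# hand-made dictionaries standing in for the output of get_posts()
posts = [
    {'name': 'Hello, World',
     'modified': datetime(2016, 12, 26),
     'date': datetime(2016, 12, 25)},
    {'name': 'Post Number 2',
     'modified': datetime(2017, 1, 3)},
]

newest_first = sorted(posts, key=post_timestamp, reverse=True)
print([p['name'] for p in newest_first])  # ['Post Number 2', 'Hello, World']
```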

A keen reader may have noticed that simply copying the original paths of our extra resources (like images) to the new build directory will not produce the result given earlier. Instead, we copy them to new directories based on their type (/img/, /vid/, /audio/, /other/) and salt their filenames with the date to avoid name conflicts in the future.

from shutil import copy

salt = post['modified'].strftime('%Y%m%d_')
# copy each resource into the build folder for its type, salting the filename
for kind in ('img', 'vid', 'audio', 'other'):
    for f in post[kind]:
        copy(str(f), str(BUILD_DIR.joinpath(kind, salt + f.name)))

HTML conversion

Finally, we fix all links in the original markdown to reference the new location and name. The approach used here may be a little redundant, considering we already have the new filenames, but I'm not going to worry about it too much.

def fix_links(md, salt, filenames):
    for f in filenames:
        if f.endswith(IMG_EXT):
            md = md.replace(f, '/img/' + salt + f)
        elif f.endswith(VID_EXT):
            md = md.replace(f, '/vid/' + salt + f)
        elif f.endswith(AUDIO_EXT):
            md = md.replace(f, '/audio/' + salt + f)
        else:
            md = md.replace(f, '/other/' + salt + f)
    return md
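
To see what this does to a piece of markdown, here is a self-contained run of the same function. The values of IMG_EXT, VID_EXT, and AUDIO_EXT are assumptions on my part (the constants aren't shown above), chosen to match the extensions used in get_post.

```python
# assumed values; the constants are not defined in the snippet above
IMG_EXT = ('png', 'jpg', 'jpeg', 'gif')
VID_EXT = ('mp4', 'avi')
AUDIO_EXT = ('mp3', 'wav', 'ogg')


def fix_links(md, salt, filenames):
    """Point every bare filename at its salted copy in the build folder."""
    for f in filenames:
        if f.endswith(IMG_EXT):
            md = md.replace(f, '/img/' + salt + f)
        elif f.endswith(VID_EXT):
            md = md.replace(f, '/vid/' + salt + f)
        elif f.endswith(AUDIO_EXT):
            md = md.replace(f, '/audio/' + salt + f)
        else:
            md = md.replace(f, '/other/' + salt + f)
    return md


md = '![a tree](image1.png) and [the script](build.py)'
fixed = fix_links(md, '20161225_', ['image1.png', 'build.py'])
print(fixed)
# ![a tree](/img/20161225_image1.png) and [the script](/other/20161225_build.py)
```

Because this is a plain string replace, a filename that also appears in the post's prose, or that is a substring of another filename, gets rewritten too. Matching only inside markdown link targets would be safer.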

Now that our markdown is ready, we can convert it to HTML using one of Python's many markdown modules. For convenience, we'll use the markdown package (Python-Markdown). Conversion is done in a single line of code:

html = markdown.markdown(post['md'])

All that's left is to construct the final HTML of the posts and homepage. For this, we'll use Mako, which is a superfast and easy to use HTML generation tool. Mako is a templating library, which means that it works by dropping variables and functionality into a normal HTML document. For example, consider this template:

<html>
    <body>Hey ${name}, how's it going?</body>
</html>

If we want our document to address someone named "Slick", then we can call this Python code:

from mako.template import Template

# ${name} is replaced with the keyword argument passed to render()
t = Template("<html><body>Hey ${name}, how's it going?</body></html>")
print(t.render(name='Slick'))

This outputs an HTML page that looks like this:

<html>
    <body>Hey Slick, how's it going?</body>
</html>

This is a pretty simplistic example, but luckily we're running a pretty simplistic blog, so writing the dynamic section of our post template shouldn't be too difficult:

<div id="wrapper">
    <h1><a href="#">${name}</a></h1>
    <h2>&mdash; ${date}</h2>
    ${content}
</div>

Similarly, the homepage:

<div id="wrapper">
    <ul>
    % for n, t, l in items:
        <li>${n} - <a href="${l}">${t}</a></li>
    % endfor
    </ul>
</div>

Much of the static HTML has been omitted for brevity, but this should give you an idea of how the templating works. With that finished, all that's left to do is create files in the build directory for index.html and each post. I'll leave this as an exercise for the reader, as this blog post has gotten much longer than I ever intended. The final build.py script for this site will be available soon™; I just have to clean up the code a bit. For a detailed look at the results, just look around you!
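
For anyone attempting that exercise, a rough starting point might look like the sketch below. It only covers the file-writing half: write_page is a hypothetical helper, and the HTML strings are placeholders standing in for whatever the rendered templates produce.

```python
import tempfile
from pathlib import Path


def write_page(build_dir, relative_path, html):
    """Write one rendered page, creating parent folders as needed."""
    target = build_dir / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding='utf-8')
    return target


with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp)
    # placeholder HTML; the real content would come from the templates
    write_page(build_dir, 'index.html', '<html>homepage</html>')
    write_page(build_dir, '2016/12/hello-world.html', '<html>post</html>')
    written = sorted(p.relative_to(build_dir).as_posix()
                     for p in build_dir.rglob('*.html'))

print(written)  # ['2016/12/hello-world.html', 'index.html']
```

Creating the parent folders on demand means the year and month directories never have to be set up ahead of time.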

Anyway, it's been a fun first entry, friends... and don't worry, it's probably one of the most code-heavy posts that will ever be written here! If nothing else, at least it will help me document my own code :)

Until next time,
Stephan <3

Update 1/4/17: Build script now available here! Make sure all of the dependencies are installed :)