Blog de Nathan Story

Using Jekyll to Power a Large Site (15k Pages)

I was browsing the website and found a dataset I thought was interesting: the Annual Survey of School System Finances. Lacking any better ideas, I decided to take all that data and make it available as a website: School District Finances.

In the past, I might have approached a project like this by first loading the data into a relational database (MySQL, most likely) and then building a web interface using Ruby on Rails or PHP. For this project, however, I thought it would be more fun to use the Ruby-based static site generator Jekyll.

Use a generator to automate creating pages

The website consists of a great many pages (14,541 HTML documents!). I create these pages programmatically using a Generator Plugin. The Jekyll manual documents how to do this, using “category pages” as an example. I generate all my pages using a similar approach.
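As a rough illustration of the generator approach, here is a minimal sketch modeled on the category-page example in the Jekyll manual. It is not the plugin used for this site: the names DistrictPages, DistrictPage and DistrictGenerator, the district.html layout, the 'name' and 'total_revenue' fields, and the assumption that the records are available as site.data['districts'] are all hypothetical. The Jekyll classes are only defined when Jekyll is loaded, so the record-to-page mapping can be exercised on its own.

```ruby
# Hypothetical mapping from data records to page descriptors.
module DistrictPages
  # Turn a district name into a URL-friendly slug.
  def self.slug(name)
    name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
  end

  # rows: an array of hashes with (assumed) 'name' and 'total_revenue' keys.
  def self.descriptors(rows)
    rows.map do |row|
      {
        dir:  File.join('districts', slug(row['name'])),
        name: 'index.html',
        data: { 'title' => row['name'],
                'total_revenue' => row['total_revenue'].to_i }
      }
    end
  end
end

# Only define the plugin classes when running inside Jekyll.
if defined?(Jekyll)
  class DistrictPage < Jekyll::Page
    def initialize(site, base, descriptor)
      @site = site
      @base = base
      @dir  = descriptor[:dir]
      @name = descriptor[:name]
      process(@name)
      # Assumes a layout file at _layouts/district.html.
      read_yaml(File.join(base, '_layouts'), 'district.html')
      data.merge!(descriptor[:data])
    end
  end

  class DistrictGenerator < Jekyll::Generator
    safe true

    def generate(site)
      # Assumes the records were loaded from a file under _data/.
      rows = site.data['districts'] || []
      DistrictPages.descriptors(rows).each do |descriptor|
        site.pages << DistrictPage.new(site, site.source, descriptor)
      end
    end
  end
end
```

Keeping the mapping in a plain module is a design choice for this sketch: it lets the slug and path logic be checked without running a full build.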

Deploy using s3_website

To host the site, I use an S3 bucket configured to serve a website (this blog is hosted similarly). See Amazon’s document Hosting a Static Website on Amazon S3.

To deploy from my local machine to S3, I use the s3_website gem. I have an s3_website.yml file at the project root, so deployment is as simple as jekyll build && s3_website push.

But is it fast enough?

My biggest concern going into this was that Jekyll might not be fast enough to make generating a large static site practical. Here I am building the 14,541 pages on my ancient Core 2 Duo MacBook:

$ time jekyll build
Configuration file: /Users/nstory/src/schools/_config.yml
            Source: /Users/nstory/src/schools
       Destination: /tmp/school_site
 Auto-regeneration: disabled. Use --watch to enable.

real  7m17.794s
user  6m5.595s
sys   0m6.286s

As I reckon it, seven minutes (roughly 30 ms per page), while not ideal, is acceptable for the occasional build. The issue is what to do during development: I don’t want to wait seven minutes to see how a minor tweak to my plugin changes the site. For that case, I use an environment variable to turn on a special “sample” mode. In this mode, only a small number of pages are generated.

In my plugin, I have a line similar to the following:

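  # only create 100 pages in sample mode ("records" is a placeholder name)
  records = records.first(100) if ENV['SAMPLE']
  # otherwise, create all the pages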

During development, I run Jekyll like so:

$ SAMPLE=true jekyll serve -w

The above sets the environment variable SAMPLE just for the executed command.
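For illustration, the sample-mode check can be sketched as a small helper. This is a sketch under assumptions, not the code from my plugin: the method names sample_mode? and records_to_render and the limit parameter are hypothetical. Unlike a bare ENV['SAMPLE'] test, it treats values such as "false" and "0" as off, so SAMPLE=false does not switch sample mode on by accident.

```ruby
# Hypothetical helper: decide whether sample mode is on.
# Unset, empty, "0", "false" and "no" all count as off.
def sample_mode?(env = ENV)
  value = env.fetch('SAMPLE', '').strip.downcase
  !['', '0', 'false', 'no'].include?(value)
end

# Return only the first `limit` records in sample mode, all records otherwise.
def records_to_render(records, env = ENV, limit: 100)
  sample_mode?(env) ? records.first(limit) : records
end
```

A generator would call records_to_render on its full list of records before creating pages, so the rest of the plugin does not need to know about the mode.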