I was browsing the Data.gov website and found a data set I thought was interesting: the Annual Survey of School System Finances. Lacking any better ideas, I decided to take all that data and make it available as a website: School District Finances.
In the past, I might have approached a project like this by first loading the data into a relational database (MySQL, most likely) and then building a web interface using Ruby on Rails or PHP. For this project, however, I thought it would be more fun to use the Ruby-based static site generator Jekyll.
Use a generator to automate creating pages
The website consists of a great many pages (14,541 HTML documents!). I create these pages programmatically using a Generator plugin. The Jekyll manual documents how to do this, using “category pages” as an example. I generate all my pages using a similar approach.
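To make the generator idea concrete, here is a minimal sketch modeled on the category-page pattern from the Jekyll docs. It is not the actual plugin behind the site: the file name, the `DISTRICTS` records, the `district_dir` helper, the `district.html` layout, and the class names are all assumptions made up for illustration, and the details of `Page` subclassing vary between Jekyll versions.

```ruby
# _plugins/district_pages.rb (hypothetical file name)

# Hypothetical placeholder records; a real plugin would load the survey data.
DISTRICTS = [
  { 'id' => '0001', 'name' => 'Example District A', 'state' => 'ZZ' },
  { 'id' => '0002', 'name' => 'Example District B', 'state' => 'ZZ' }
].freeze

# Hypothetical helper: output directory for one district,
# e.g. "districts/zz/0001".
def district_dir(district)
  File.join('districts', district['state'].downcase, district['id'])
end

# Jekyll is already loaded when it reads _plugins/, so no require is needed.
# The guard only lets this file load in plain Ruby as well (for example, to
# try out the helper above on its own).
if defined?(Jekyll::Generator)
  module Jekyll
    # One generated page per district, rendered with an assumed
    # _layouts/district.html layout.
    class DistrictPage < Page
      def initialize(site, base, dir, district)
        @site = site
        @base = base
        @dir  = dir
        @name = 'index.html'

        process(@name)
        read_yaml(File.join(base, '_layouts'), 'district.html')
        data['title']    = district['name']
        data['district'] = district
      end
    end

    # Jekyll calls #generate once per build; every page appended to
    # site.pages is written out alongside the regular pages.
    class DistrictPageGenerator < Generator
      safe true

      def generate(site)
        DISTRICTS.each do |district|
          site.pages << DistrictPage.new(site, site.source,
                                         district_dir(district), district)
        end
      end
    end
  end
end
```

With a layout in place, each record would end up as its own `index.html` under the directory returned by `district_dir`, which is how a few thousand rows of data can turn into a few thousand pages without any hand-written HTML.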
Deploy using s3_website
To host the site, I use an S3 bucket configured to serve a website (this blog is hosted similarly). See Amazon’s document Hosting a Static Website on Amazon S3.
To deploy from my local machine to S3, I use the s3_website gem. I have an s3_website.yml file at the project root, so deployment is as simple as:
$ jekyll build && s3_website push
But is it fast enough?
My biggest concern going into this was that Jekyll might not be fast enough to make generating a large static site practical. On my ancient Core 2 Duo MacBook, building all 14,541 pages takes about seven minutes.
By my reckoning, seven minutes, while not ideal, is acceptable for the occasional build. The issue is what to do during development: I don’t want to wait seven minutes to see how a minor tweak to my plugin changes the site. For that case, I use an environment variable to turn on a special “sample” mode. In this mode, only a small number of pages are generated.
In my plugin, I have a line similar to the following:
if ENV['SAMPLE']
  # only create 100 pages
else
  # create all the pages
end
During development, I run Jekyll like so:
$ SAMPLE=true jekyll serve -w
The above sets the environment variable SAMPLE just for the executed command.
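As a rough sketch of how the sample-mode check might be factored out, the helper below picks which records get turned into pages. The names `SAMPLE_SIZE` and `records_to_render` are hypothetical, and the `env` parameter exists only so the behavior can be exercised without touching the real environment.

```ruby
# Number of pages to build in sample mode (matches the 100 mentioned above).
SAMPLE_SIZE = 100

# Hypothetical helper: return the records that should become pages.
# `env` defaults to the real environment; any Hash-like object works.
def records_to_render(records, env = ENV)
  if env['SAMPLE']
    # Sample mode: only the first SAMPLE_SIZE records.
    records.first(SAMPLE_SIZE)
  else
    # Full build: every record.
    records
  end
end

# Example usage with stand-in data.
all_records = (1..250).to_a
puts records_to_render(all_records, {}).size
puts records_to_render(all_records, { 'SAMPLE' => 'true' }).size
```

Note that, as with the original check, any value counts as "on": `SAMPLE=false` would still enable sample mode, because the environment variable is a non-nil string either way.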