Building Beautiful Data Science Blogs with Astro
You asked for it (you didn’t) and I wrote it.
Before diving into the stack and how this site runs I wanted to discuss the
requirements I had in mind at the time I put this together.
- Ownership, principally I didn’t want my content living inside of a domain owned by someone else. I had to be able to pick it up and redeploy where I want or self host if needed.
- Static, there’s no need for a blog with a bit of styling to ship the 3 people that read it 30MB of JS overhead just to read a little text and show some images. Besides, who doesn’t love a perfect Lighthouse score?
- JS or Python based generation. The goal of the site was to allow me to share my thoughts with as little friction as possible, so introducing another language was out of the question.
- Customisation, I want to control my own CSS and styling as a first class citizen. I struggled with a few Python SSGs that treat the theme and content as two entirely different entities - it just doesn’t work for me.
What we’ll cover
This post will show you how to build and deploy a minimal site which can convert
your notebooks into blog posts and deploy them automatically on merges to master
on GitHub.
I’ll mostly neglect styling and any deep customisation, the goal here is to get
something working you can make your own later - or explore the many
themes that exist 😌.
The Stack
Taking the above into consideration, I landed on the following:
Content
Markdown - all content exists as markdown files. This allows me to write posts
directly in markdown (as this one is).
The Data Science notebook formats I currently use (Jupyter and Quarto)
both support markdown conversion without much hassle, allowing me to embed notebooks
with ease.
Static Site Generator
Astro. While a lot more feature rich than is needed for a static blog, it gives me the flexibility to customise the site to my liking without needing to fit within the confines of a predefined theme layout.
Styling
Tailwind. Astro has great support for plugins, so adding Tailwind to this build was super straightforward.
Deployment
Cloudflare Pages. Most importantly to every frugal
Data Science practitioner, it is absolutely free 🤑.
There are plenty of options here, but sitting behind a cache layer also helps
me in particular to deliver the photos hosted on this site without big egress bills.
Let’s Build It
Fire up your favourite definitely not AI generated YouTube mix and let’s copy paste some commands.
Kickoff your Astro project
The Astro CLI puts together our skeleton project with most of what we need. Assuming this is your first dive into Astro, I highly recommend following this guide without using a theme at first. While themes are super powerful (and much more aesthetic than what we’ll be building), they bundle together many advanced features, and you need some groundwork first to judge whether you even need them.
npm create astro@latest -- --add tailwind
# once we're built
npm run dev
What goes where
For most of the Astro setup we’ll be working in the src folder, so let’s break down
what goes where.
- Assets: Reusable page assets such as images
- Components: Reusable visual code blocks
- Layouts: Reusable page structures
- Pages: Navigable pages on our site
Let’s make some changes to these pages to get a better sense of this flow.
We can start by changing our Layout.astro file to the following:
// src/layouts/Layout.astro
---
const { title } = Astro.props;
---
<html lang="en">
<head>
<meta charset="utf-8" />
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<meta name="viewport" content="width=device-width" />
<meta name="generator" content={Astro.generator} />
<title>{title}</title>
</head>
<body class="w-screen">
<div class="mx-auto max-w-2xl pt-5">
<slot />
</div>
</body>
</html>
We should get a compile error at this point. We’re now using the Astro.props
object to let each rendered page pass in customisations which live
outside of that <slot />.
To fix this we’ll pass a title from our index.astro, alongside simplifying it.
// src/pages/index.astro
---
import Layout from '../layouts/Layout.astro';
---
<Layout title="Homepage">
<h1 class="text-7xl">My Blog Site</h1>
</Layout>
Adding in Some Content
Let’s add a new blog folder under the src directory and create the following file:
[comment]: src/blog/first-post.md:
---
title: "The Blog Title"
excerpt: "A short summary of our blog"
pubDate: 2024-12-10
---
Just feel so proud, finally got a blog going after reading
one tutorial.
## An Example heading
### More heading, idk feeling lazy
```python
def oooh():
    return "a code block!"
```
The frontmatter at the top of this post is totally customisable. It allows us to fill in posts with additional metadata outside of their content. This is mostly useful for populating other parts of the site that may link to the post as well as providing filters and tags that could be used for topics.
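To make the split between frontmatter and content concrete, here is a minimal TypeScript sketch that separates the two. It is only an illustration: Astro does this parsing for you with a full YAML parser, and the `splitFrontmatter` helper and its flat `key: value` handling are assumptions made for this example.

```typescript
// Illustrative only: Astro parses frontmatter for you with a real YAML parser.
// `splitFrontmatter` is a hypothetical helper that handles flat `key: value` pairs.
type ParsedPost = { data: Record<string, string>; body: string };

function splitFrontmatter(source: string): ParsedPost {
  // Frontmatter is the block between the first two `---` lines.
  const match = source.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
  if (!match) return { data: {}, body: source };

  const data: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not `key: value`
    const key = line.slice(0, idx).trim();
    // Strip surrounding double quotes from the value, if present.
    const value = line.slice(idx + 1).trim().replace(/^"(.*)"$/, "$1");
    data[key] = value;
  }
  return { data, body: match[2] };
}

const examplePost = [
  "---",
  'title: "The Blog Title"',
  "pubDate: 2024-12-10",
  "---",
  "Just feel so proud.",
].join("\n");

const parsed = splitFrontmatter(examplePost);
console.log(parsed.data); // the metadata, as plain strings
console.log(parsed.body); // everything after the closing `---`
```

Everything in `data` is what the rest of the site can use for cards, filters and tags; `body` is what ends up rendered as the post itself.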
To find this content we need to define an Astro “collection” for it to live in.
To do this we create a content.config.ts within our src directory.
This describes the frontmatter we expect on our posts (the schema), in addition
to where and how to find them.
// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";
const blog = defineCollection({
loader: glob({ pattern: "*.md", base: "./src/blog" }),
schema: z.object({
title: z.string(),
excerpt: z.string(),
pubDate: z.date(),
})
})
export const collections = { blog }
This is central to how content management works in Astro, so we should take a moment to
consider what is going on here.
Working backwards, ultimately we need to export a collections object from
this file, which can then be queried from either page templates (we’ll make one
in a moment) or other components that may want to use the metadata.
Astro uses the defineCollection call to build these, taking an object with two properties:
- loader, a required property defining where our data source lives.
- schema, an optional (but invaluable) property that allows us to define the values being passed back. You’ll see here how this matches our blog frontmatter.
Our loader here uses one of two inbuilt “content-grabbers”. glob lets us define
a location to search for content. file (unused here) allows you to define a
single file to parse and pull that content from. This is how the photography section
of this site has been defined 🥳.
Custom loaders are also definable if you have a more complex use case.
AutoGenerating your first page
We now need to define a template page which Astro can plug our content into.
To do so create a new Astro file at src/pages/blog/[...slug].astro.
// src/pages/blog/[...slug].astro
---
import Layout from "../../layouts/Layout.astro";
import { getCollection, render } from "astro:content";
export async function getStaticPaths() {
const posts = await getCollection("blog");
return posts.map((post) => ({
params: { slug: post.id },
props: { post },
}));
}
const { post } = Astro.props;
const { Content } = await render(post);
---
<Layout title={post.data.title}>
<article>
<h1>{post.data.title}</h1>
<Content />
</article>
</Layout>
The slug here will simply be the name of the file (without its extension), but can again be tailored
to your liking. We also include a props param containing the post entry. The
render function then allows us to convert the markdown into html.
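As a rough picture of how a file name ends up as a URL, the sketch below mimics the mapping in plain TypeScript. Both `entryIdFromPath` and `routeForEntry` are hypothetical helpers written for this post, not Astro APIs, and the real id generation in the glob loader handles more cases (nested folders, special characters, a slug set in the frontmatter).

```typescript
// Rough approximation of how a file under src/blog maps to a URL.
// `entryIdFromPath` and `routeForEntry` are hypothetical helpers, not Astro APIs.
function entryIdFromPath(relativePath: string): string {
  return relativePath
    .replace(/\\/g, "/") // normalise Windows separators
    .replace(/\.[^/.]+$/, "") // drop the file extension
    .toLowerCase()
    .replace(/\s+/g, "-"); // spaces become hyphens
}

// Mirrors the [...slug].astro template living under src/pages/blog/.
function routeForEntry(id: string): string {
  return `/blog/${id}/`;
}

console.log(routeForEntry(entryIdFromPath("first-post.md")));
```

So the first-post.md file created earlier is served from the /blog/first-post/ route.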
You may need to run npm run astro sync and restart your dev server
so that the new type definitions for the blog collection are refreshed.
Tailwind Typography
You may notice that while your blog post has rendered out correctly, it doesn’t look
the best out of the box as we’ve not styled any content.
For a quick win the Tailwind typography plugin gives us a nice out of the box experience.
Simply run npm install -D @tailwindcss/typography and add the following to
your tailwind.config.mjs.
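// tailwind.config.mjs
// .mjs files are ES modules, so use import / export default rather than require / module.exports
import typography from '@tailwindcss/typography';

/** @type {import('tailwindcss').Config} */
export default {
  theme: {
    // ...
  },
  plugins: [
    typography,
    // ...
  ],
};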
We can then just throw the prose class on our article: <article class="prose">.
Letting Users Find this post
This page should now be generated at localhost:$PORT/blog/first-post. But we
still need to allow people to navigate to it.
We can start by creating a quick Card component that’ll act as a link
to click into from our index.
// src/components/Card.astro
---
const { title, excerpt, pubDate } = Astro.props;
---
<div class="shadow rounded-lg p-8 relative">
<div class="md:flex items-center justify-between">
<h2 class="text-2xl font-semibold leading-6 text-gray-800">{title}</h2>
<p class="text-sm font-semibold md:mt-0 mt-4 leading-6 text-gray-800">{pubDate.toLocaleDateString("en-GB")}</p>
</div>
<p class="md:w-80 text-base leading-6 mt-4 text-gray-600">{excerpt}</p>
</div>
You’ll notice this just takes the frontmatter from our post.
We can then simply iterate over our posts in our index.astro file,
using each entry’s id property to build the URL to the post.
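// src/pages/index.astro
---
import Layout from '../layouts/Layout.astro';
import Card from '../components/Card.astro';
import { getCollection } from 'astro:content';

// fetch every entry in the blog collection defined in content.config.ts
const blogPosts = await getCollection('blog');
---
<Layout title="Homepage">
  <h1 class="text-7xl">My Blog Site</h1>
  <div class="mt-4 mx-auto max-w-4xl space-y-4">
    {
      blogPosts.map(({ data, id }) => (
        <a href={`/blog/${id}/`}>
          <Card {...data} />
        </a>
      ))
    }
  </div>
</Layout>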
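One thing worth handling once you have more than one post: as far as I know, the order in which entries come back from the collection is not guaranteed, so you will probably want to sort them yourself before mapping over them. Below is a minimal sketch in plain TypeScript; the `BlogEntry` type is a stand-in for the real collection entries and `sortNewestFirst` is a hypothetical helper.

```typescript
// Minimal stand-in for the entries returned by getCollection("blog").
type BlogEntry = { id: string; data: { title: string; pubDate: Date } };

// Returns a new array ordered newest-first; the input is left untouched.
function sortNewestFirst(posts: BlogEntry[]): BlogEntry[] {
  return [...posts].sort(
    (a, b) => b.data.pubDate.getTime() - a.data.pubDate.getTime()
  );
}

const samplePosts: BlogEntry[] = [
  { id: "first-post", data: { title: "The Blog Title", pubDate: new Date("2024-12-10") } },
  { id: "second-post", data: { title: "A Later Post", pubDate: new Date("2025-01-15") } },
];

const ordered = sortNewestFirst(samplePosts);
console.log(ordered.map((post) => post.id));
```

In the index page the same comparison could be applied to the result of getCollection before rendering the cards.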
Rest Here a While 🏕️
SO…
This gives us a minimal wrapper moving from markdown files into blog post pages.
However, there are a few Data Science specific bits of tooling to help our posts
along.
Math Support
Currently any equations placed in the markdown files will render out simply as
text.
Astro uses remark to parse markdown, so we’ll add the
relevant math extension - in addition to the mathjax rehype
plugin which renders it.
npm install rehype-mathjax remark-math
A KaTeX-based plugin (rehype-katex) is also available, but that requires you to add a stylesheet link (for example from a CDN) to correctly style your content.
We then just need to tell astro about these plugins.
// astro.config.mjs
// ...
import remarkMath from 'remark-math';
import rehypeMathJaxSvg from 'rehype-mathjax';
export default defineConfig({
// ...
markdown: {
remarkPlugins: [remarkMath],
rehypePlugins: [rehypeMathJaxSvg],
}
});
If we go ahead and add some mathjax to our blog post, we should see it render.
$$
x=\frac{a}{b}
$$
Writing Posts
Finally, while writing in a straight markdown file is a reasonable experience,
we really want to be able to just pipe our Jupyter notebooks straight into place
with all the cell outputs we care about.
Simply create a new .ipynb notebook and then run the following:
jupyter nbconvert <my-notebook>.ipynb --to=markdown
Make sure to copy over the generated markdown file and any corresponding output
images into your blog folder and it should all render as a new post 🙌.
Make sure you include the frontmatter discussed above as a single markdown cell
at the top of your notebook.
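If you would rather generate that frontmatter cell than type it by hand, a small helper along these lines could do it. This is a sketch under assumptions: `buildFrontmatter` is a hypothetical function, and the field names simply mirror the schema defined earlier in src/content.config.ts.

```typescript
// `buildFrontmatter` is a hypothetical helper; the field names mirror the
// schema defined earlier in src/content.config.ts.
type PostMeta = { title: string; excerpt: string; pubDate: Date };

function buildFrontmatter(meta: PostMeta): string {
  // Keep only the YYYY-MM-DD part of the ISO timestamp.
  const date = meta.pubDate.toISOString().slice(0, 10);
  return [
    "---",
    // JSON.stringify gives a double-quoted, escaped string, which YAML also accepts.
    `title: ${JSON.stringify(meta.title)}`,
    `excerpt: ${JSON.stringify(meta.excerpt)}`,
    `pubDate: ${date}`,
    "---",
  ].join("\n");
}

const frontmatter = buildFrontmatter({
  title: "The Blog Title",
  excerpt: "A short summary of our blog",
  pubDate: new Date("2024-12-10"),
});
console.log(frontmatter);
```

The resulting text can be pasted into the first markdown cell of the notebook, or prepended to the converted markdown file.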
Code Styling
Broadly, code styling this way works very well. Under the hood Astro uses
shiki for the
code syntax highlights. You can change to one of many themes by changing
the selected style.
The only obvious issue I have found here is in the presentation of the
cell outputs, which render as though they were simply another code block.
To address this we can add a custom transformer which conditionally modifies our plaintext cells to add some custom styling.
// astro.config.mjs
export default defineConfig({
  // ...
  markdown: {
    shikiConfig: {
      transformers: [
        {
          pre(hast) {
            if (hast.properties["dataLanguage"] === "plaintext") {
              // inject a > before each line
              this.lines.forEach((line) => {
                const prepend = {
                  type: 'element',
                  tagName: 'span',
                  properties: {},
                  children: [{ type: 'text', value: '>' }],
                };
                line.children.unshift(prepend);
              });
              // remove the top margin to pull the output up against the previous element
              hast.properties["style"] = `${hast.properties["style"] ?? ""};margin-top:-32px`;
            }
            return hast;
          },
        },
      ],
    },
  },
});
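To see what the transformer does to each line without running a full build, here is a standalone sketch of the same tree manipulation. The `ElementNode` and `TextNode` types are simplified stand-ins for the real hast node types, and `prependMarker` is a hypothetical helper that only mirrors the `unshift` step.

```typescript
// Minimal hast-like types, just enough for this illustration.
type TextNode = { type: "text"; value: string };
type ElementNode = {
  type: "element";
  tagName: string;
  properties: Record<string, string>;
  children: Array<ElementNode | TextNode>;
};

// Prepends a marker span to every line element (mutates the lines in place).
function prependMarker(lines: ElementNode[], marker = ">"): void {
  for (const line of lines) {
    line.children.unshift({
      type: "element",
      tagName: "span",
      properties: {},
      children: [{ type: "text", value: marker }],
    });
  }
}

// A single line of cell output, shaped roughly like a highlighted line.
const outputLine: ElementNode = {
  type: "element",
  tagName: "span",
  properties: { class: "line" },
  children: [{ type: "text", value: "a code block!" }],
};

prependMarker([outputLine]);
console.log(JSON.stringify(outputLine.children, null, 2));
```

Each output line ends up with an extra leading span holding the marker, which is what visually separates cell outputs from the code above them.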
NOTE: At the time of writing Astro has a bug where updates to the config
file do not result in the dev server refreshing. Simply delete your .astro/data-store.json
file and rerun the dev server to pick up any changes!
Deployment
There are a huge range of deployment options
for astro. Ultimately our final built site can be put into anything that will
serve static files.
My preference has been for Cloudflare Pages, as it handles automatic builds when pushing
to master, in addition to caching images on the site’s photography pages.
- Push your code to your remote git repository.
- Create an account, log in to the Cloudflare dashboard and select your account in Account Home > Workers & Pages.
- Hit Create, then the Pages tab and then select the Connect to Git option.
- You’ll need to authorise Cloudflare access to your repos.
- Use the following build settings:
Framework preset: Astro
Build command: npm run build
Build output directory: dist
That’s it! On every push to master you should now auto trigger a new update to your site.