Automating blogging workflow - Part Three
Mar 13, 2022 06:10 · 2162 words · 11 minute read
Introduction
In this third blog post of about the management of my site, I will go through the final iteration of improvements.
Background
In the final iteration, I wanted to automate spelling and grammar checks of my site, so that I could prevent trivial issues with the quality of my blog. I considered a few alternatives such as PySpelling, but ultimately I elected to go with Vale. More on this later.
Secondly, I wanted to issue a new identity and access management (IAM) user with the most restrictive access as possible. This IAM user should only be able to have permissions to perform the tasks directly related to the S3 bucket and the CloudFront distribution ID.
Finally, although not an explicit goal, I wanted to optimise the GitHub Actions configuration to use branch protection features to prevent deviations from the process and remove any duplication of jobs.
Final Iteration Workflow
The end state at the end of the final iteration is shown below:
Note that the only difference to the workflow is the automation of the editorial review, which I will go into more detail below.
Editorial Review
As mentioned above, I have decided to automate the spelling, grammar and writing style checks using Vale.
Vale is a command-line tool that brings code-like linting to prose, It’s written in Golang and works on multiple platforms.
Some of the key features of using Vale are:
- Support for markup files, which is perfect for this use-case
- Flexible extension system, so that you can enforce an editorial writing style
- Easy-to-install and comes in a standalone binary, supported on multiple platforms
The GitHub documentation for Vale has a functionality table which explains it benefits against alternatives.
I will now go through the components which comprise the Vale linting checks.
Vale Components
The main components of Vale are as follows:
- Configuration File - An INI configuration file where you can configure core settings, format associations, and format-specific settings.
- Styles - A powerful extension system to fully customise your own writing style or spelling checks.
- Vocab - A way to maintain custom lists of terminology independent of your styles.
I will now expand on how each component is configured for my site.
.vale.ini - Vale Configuration File
The vale configuration file is used to control the majority of Vale’s behavior, including what files to lint and how to lint them. The file adopts the INI configuration format to manage the configuration.
I will refer to snippets of code, so please use the copy on my GitHub repo which will display with line numbers included as a reference when following along:
# Vale configuration file
# Docs: https://docs.errata.ai/vale/config
# Define the styles/ directory as the path to find styles configuration
StylesPath = "styles"
# Only alert on errors
MinAlertLevel = error
# Specify a vocabulary called 'Blog'. This will contain any exception words.
Vocab = Blog
# Use Vale, write-good, Microsoft, Readability and Google styles for editorial checks on Markdown files only
[*.md]
BasedOnStyles = Vale, write-good, Microsoft, Readability, Google
# Tune the usage of exclamation in text down to 'warning' level.
Google.Exclamation = warning
On lines 1 to 3, I am documenting what this file is for. I like to do this where possible so I remember what it does, but also help others looking at the repo:
# Vale configuration file
# Docs: https://docs.errata.ai/vale/config
I will be using Vale styles to add editorial writing style and spelling checks to the site. Therefore, I need to tell Vale which directory the styles folders can be found as per lines 4 to 5:
# Define the styles/ directory as the path to find styles configuration
StylesPath = "styles"
Vale has three types of alert levels; suggestions
, warnings
, and errors
. By default, it will alert on warnings
and errors
.
In lines 7 to 8, I am adjusting the alerting to only inform me of errors
as the alerting is overwhelming for an existing content repository with approximately 15,000 words. If I was starting a new content repository from scratch, it would be worth leaving the level at default:
# Only alert on errors
MinAlertLevel = error
Next, I want to specify a Vocab, which I can use to maintain custom exception words on lines 9 to 10. I will go into this later, but it’s essentially a list of words which might not be in a standard dictionary, but I’ve whitelisted as legitimate words.
# Specify a vocabulary called 'Blog'. This will contain any exception words.
Vocab = Blog
On lines 13 to 15, we’re configuring Vale to perform five different writing style checks across all markdown extension files in the repo. These styles are the official styles provided by Vale, but there is nothing preventing you writing your own.
In fact, many companies have written their own writing styles
# Use Vale, write-good, Microsoft, Readability and Google styles for editorial checks on Markdown files only
[*.md]
BasedOnStyles = Vale, write-good, Microsoft, Readability, Google
Finally, the Google writing style doesn’t like the usage of exclamation points in text. Note that this is configured in the Google Exclamation YAML configuration file for reference. I like to use exclamation points, so I’ve tuned the alert level down from error
to warning
on lines 17 to 18.
# Tune the usage of exclamation in text down to 'warning' level.
Google.Exclamation = warning
Styles
Styles are used by Vale to enforce particular writing constructs. An individual style is made up of a collection of YAML files, also known as rules.
An example of a rule is below:
# An example rule from the "Microsoft" style.
extends: existence
message: "Don't use end punctuation in headings."
link: https://docs.microsoft.com/en-us/style-guide/punctuation/periods
nonword: true
level: warning
scope: heading
action:
name: edit
params:
- remove
- ".?!"
tokens:
- '[a-z0-9][.?!](?:\s|$)'
The collection of YAML files are housed inside a folder, which must correlate with the BasedOnStyles
setting in the Vale configuration file.
In this project, I’m using the following five styles for this site:
- Vale (built-in style)
- write-good
- Microsoft
- Readability
Each have their own benefits, but by using multiple style linters, I get better coverage.
Vocab
Vocabularies (or Vocab) are a way to maintain custom lists of terminology independent of your styles. Within this repository, I use a lot of network or automation specific words which would normally fail spell checks.
The name of the Vocab configured in the configuration file is called Blog
. This aligns with the file structure below:
styles/
└── Vocab
└── Blog
└── accept.txt
Within the accept.txt
file, you can add individual entries or use regular expression to match multiple accepted words. A truncated output is shown below:
head -n 20 styles/Vocab/Blog/accept.txt
(?i)Ansible
automators
blowback
boolean
Catalin
config
dfjt
gcloud
getters
GitHub
(?i)Hostname
impactful
incentivised
Makefil(e|es)
(i|e|jun|nx)os
Mihai
minimalistic
nautobot
(?i)netbox
(?i)netmiko
These regular expression patterns or words are added to every exception list in all styles listed in BasedOnStyles
, meaning that you now only need to update your project’s vocabulary to customize third-party styles (rather than the styles themselves).
The full Vocab file can be found at this link.
GitHub Actions - Vale Editorial Review
After going through the main components which make up Vale, I will now show how I use GitHub Actions to automate the editorial review.
I will refer to snippets of code in the GitHub Actions workflow file, so please use the copy on my GitHub repo which will display with line numbers included as a reference when following along:
---
name: Vale - Editorial Review
on: # yamllint disable rule:truthy
push:
branches:
- feature/*
jobs:
prose:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@master
- name: Vale - Editorial Review
uses: errata-ai/vale-action@v1.5.0
with:
# Use Vale, write-good, Microsoft, Readability and Google styles for editorial style review.
styles: |
https://github.com/errata-ai/Microsoft/releases/latest/download/Microsoft.zip
https://github.com/errata-ai/write-good/releases/latest/download/write-good.zip
https://github.com/errata-ai/Google/releases/latest/download/Google.zip
https://github.com/errata-ai/Readability/releases/latest/download/Readability.zip
files: content/
env:
# Required, set by GitHub actions automatically:
# https://docs.github.com/en/actions/security-guides/automatic-token-authentication#about-the-github_token-secret
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
On line 2, I name the workflow Vale - Editorial Review
so that it’s different from the other workflow used to build and deploy the site.
---
name: Vale - Editorial Review
On lines 3 to 6, we’re specifying to trigger this workflow on any pushes to branches which meet the feature/*
pattern. This means when I create a branch using my feature naming standard convention.
Note that there this workflow won’t trigger on pushes to the master
branch. We could potentially get into a situation where our editorial review wouldn’t run with a direct commit to master
. This is mitigated by using branch protection which I will cover in a later section.
On lines 8 to 9, we’re configuring GitHub Actions to run the job
named prose
.
jobs:
prose:
On lines 10 to 13, we’re performing this action on an ubuntu-latest
runner and checking out the repositories code.
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@master
On lines 14 to 15, I am using the official GitHub Action for Vale, to simplify my setup and configuration:
- name: Vale - Editorial Review
uses: errata-ai/vale-action@v1.5.0
The final lines of code are inputs to the GitHub Action, which are described more below.
On lines 16 to 22, I am configuring the runner to use the four non-default writing styles when performing the Vale checks:
with:
# Use Vale, write-good, Microsoft, Readability and Google styles for editorial style review.
styles: |
https://github.com/errata-ai/Microsoft/releases/latest/download/Microsoft.zip
https://github.com/errata-ai/write-good/releases/latest/download/write-good.zip
https://github.com/errata-ai/Google/releases/latest/download/Google.zip
https://github.com/errata-ai/Readability/releases/latest/download/Readability.zip
On lines 23 to 24, we’re specifying to only perform the checks on the top level content/
directory. This is where the Markdown files for the site are hosted:
files: content/
On lines 25 to 28, we’re addressing a limitation with the current workflow which is documented here.
env:
# Required, set by GitHub actions automatically:
# https://docs.github.com/en/actions/security-guides/automatic-token-authentication#about-the-github_token-secret
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
CI Examples
To show that this workflow actually detects errors, I have deliberately misspelled a word, not used proper spacing after a sentence, and not contracted words correctly.
Does not look like fun.
The end.Is near.
I can't spell chiar.
The summary workflow indicates that there was a failure and 4 errors:
It also shows the issues detected by Vale:
Below is another example of the Vale action passing as expected:
You can view the full history of the workflow at this link.
I have now shown how the editorial review is automated using GitHub Actions and Vale. It should be noted that you can install Vale locally to perform these checks whilst developing blog posts.
For more information, please consult the documentation:
https://docs.errata.ai/vale/install
IAM Role
As mentioned in the background, I sought to issue a new identity and access management (IAM) user with the most restrictive access as possible.
Recapping what’s needed to manage my site via GitHub Actions, it needs to perform the following actions:
- Synchronise my static site files to my specific Amazon Simple Storage Service (S3) bucket
- Invalidate the CloudFront cache after the new blog post is published for a specific distribution ID
As a result, I have built the following IAM policy document which was associated to my new IAM user:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"cloudfront:ListDistributions",
"cloudfront:ListStreamingDistributions"
],
"Resource": "*"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"s3:DeleteObjectVersion",
"cloudfront:GetInvalidation",
"s3:ListBucket",
"cloudfront:CreateInvalidation",
"s3:PutObject",
"s3:GetObjectAcl",
"s3:GetObject",
"cloudfront:GetDistribution",
"cloudfront:GetStreamingDistribution",
"cloudfront:ListInvalidations",
"s3:DeleteObject",
"cloudfront:GetDistributionConfig",
"s3:PutObjectAcl",
"s3:GetObjectVersion"
],
"Resource": [
"arn:aws:s3:::<S3_BUCKET_NAME>",
"arn:aws:s3:::<S3_BUCKET_NAME>/*",
"arn:aws:cloudfront::<ACCOUNT_ID>:distribution/<DISTRIBUTION_ID>",
"arn:aws:cloudfront::<ACCOUNT_ID>:streaming-distribution/<DISTRIBUTION_ID>"
]
}
]
}
The following substitutions were made on the JSON policy document:
<S3_BUCKET_NAME>
- Is the name of my S3 bucket<ACCOUNT_ID>
- Is my account ID<DISTRIBUTION_ID>
- Is my CloudFront distribution ID
The net result of this policy is that this user can’t manage any other S3 buckets or distribution IDs not explicitly stated here, which aligns granting least privilege access.
GitHub Branch Protection
As mentioned earlier, GitHub has the ability to protect certain branches..
On my repo, I’ve configured branch protection on the master
to ensure:
- All merges must be made via a pull request
- Ensure that the
Vale
status check has passed before merging
This is shown below in the following screenshots:
Conclusion
This concludes this blog post series into making improvements to the management of my site.
Below are some of my concluding thoughts:
It was a lot of content to cover, and upon reflection it shows that I’ve learned a significant amount in the last few years, other than just pure network automation.
Vale is a promising tool, and could be used to manage internal or public documentation. The idea of enforcing an agreed style certainly has benefits of consistency and quality documentation.
Whilst writing this blog post, I found over 130 errors with the current site content. Using automated tools to detect these errors has resulted in an uplift in quality of the site. I simply wouldn’t have been able to perform this at a timely scale on 15,000 words.
It also shows the value of maintaining data in version control, and the many benefits of using Git to manage that data.
Thank you for taking the time to read this and I hope this is of value to you.