---
draft: false
title: "Validating YAML frontmatter with JSONSchema"
aliases: ["Validating YAML frontmatter with JSONSchema"]
series: []
date: "2023-06-01"
author: "Nick Dumas"
cover: ""
keywords: ["", ""]
summary: "As a collection of Markdown documents grows organically, maintaining consistency is important. JSONSchema offers a way to automatically ensure frontmatter stays up to spec."
showFullContent: false
tags:
- yaml
- jsonschema
- golang
- obsidian
---
## Consistency is hard
In my time using Obsidian, I've authored around 400 notes. Along the way I've kept a relatively consistent schema for my tags and frontmatter attributes:
```markdown
---
publish: false
summary: ""
aliases: []
title: ""
source: []
tags:

- Status/New
---
```

Getting too deep into what all of these mean is outside the scope of this post. For now, it's enough to know that for any Obsidian note, these properties must be present in order for my pipelines to do their job.

## Manually Managed Metadata
Until now, I managed my note frontmatter by hand, or with `sed`/`grep`. I've got a bit of experience using these tools to manipulate text files, so it's been relatively comfortable but extremely manual.

## Configuration Drift
The problem is that over time, humans get sloppy, forget things, decide to do things differently. In practice, this doesn't impact the usage of my vault in Obsidian; I access most of my notes via the Quick Switcher so filenames and aliases are the things I really focus on. 

A place where consistency does matter is when you're automating tasks. Tools that work with Markdown, like static site generators, care a lot about frontmatter metadata.

For these tools to work the way I expect and need them to, I need to **guarantee** that my notes are configured correctly. 

## What are the options?
This is a project I've been meditating on for a long time. The specific problem I had is that most Markdown frontmatter is YAML. I'd done some cursory searching and come up with no satisfying results for a "YAML schema engine", something to formally validate the structure and content of a YAML document.

I was a fool. For years I'd known that YAML was a superset of JSON, and I'd assumed that the superset part meant no tool that expects JSON could ever be guaranteed to work on YAML, which isn't acceptable for automation.

The detail that matters is that only the *syntax* is a superset of JSON. The underlying data types (null, bool, integer, string, array, and object) still map onto JSON one to one. With that revelation, my work could finally begin.
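
Put another way, once a document has been decoded into generic Go values, nothing about it is YAML any more. Here's a minimal sketch of that idea, with the decoded frontmatter written out by hand rather than parsed. The `jsonType` helper and the sample values are made up for illustration, and the exact Go types a real decoder hands back are an assumption here.

```go
package main

import "fmt"

// jsonType is a hypothetical helper that names the JSON Schema type of a
// generically decoded value. The set of Go types handled is an assumption
// about what a YAML or JSON decoder produces when decoding into interface{}.
func jsonType(v interface{}) string {
	switch v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case int, int64, uint64:
		return "integer"
	case float64:
		return "number"
	case string:
		return "string"
	case []interface{}:
		return "array"
	case map[string]interface{}:
		return "object"
	default:
		return "unsupported"
	}
}

func main() {
	// Frontmatter values written by hand, standing in for decoder output.
	fields := []struct {
		key string
		val interface{}
	}{
		{"publish", false},
		{"summary", ""},
		{"aliases", []interface{}{}},
		{"source", nil}, // a key with no value is null
		{"tags", []interface{}{"Status/New"}},
	}
	for _, f := range fields {
		fmt.Printf("%s: %s\n", f.key, jsonType(f.val))
	}
}
```

Every value lands in one of the types a JSON schema already knows how to describe, so a JSON validator has everything it needs.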
## golang and jsonschema
My implementation language of choice is Go, naturally. Speed, type-safety, and cross-compilation all make for a great pipeline.

```go
import (
        "fmt"
        "io"

        "github.com/santhosh-tekuri/jsonschema/v5"
        _ "github.com/santhosh-tekuri/jsonschema/v5/httploader"
        "gopkg.in/yaml.v3"
)

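// Validate decodes a YAML document from r and checks it against the
// JSON Schema found at schemaURL.
func Validate(schemaURL string, r io.Reader) error {
        // Decode into a generic value so the validator sees plain Go data.
        var m interface{}

        dec := yaml.NewDecoder(r)
        err := dec.Decode(&m)
        if err != nil {
                return fmt.Errorf("error decoding YAML: %w", err)
        }

        // The blank httploader import above lets Compile fetch http(s) URLs.
        compiler := jsonschema.NewCompiler()
        schema, err := compiler.Compile(schemaURL)
        if err != nil {
                return fmt.Errorf("error compiling schema: %w", err)
        }
        if err := schema.Validate(m); err != nil {
                return fmt.Errorf("error validating target: %w", err)
        }

        return nil
}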
```

`Validate()` is basically all you need in terms of Go code. The [full code repo](https://code.ndumas.com/ndumas/obsidian-pipeline) has a bit more complexity because I'm wiring things through Cobra and stuff, but here's some sample output:

```
go run cmd/obp/*.go validate -s https://schemas.ndumas.com/obsidian/note.schema.json -t Resources/blog/published/
2023/06/01 10:31:27 scanning "mapping-aardwolf.md"
2023/06/01 10:31:27 scanning "schema-bad.md"
2023/06/01 10:31:27 validation error: &fmt.wrapError{msg:"error validating target: jsonschema: '' does not validate with https://schemas.ndumas.com/obsidian/note.schema.json#/required: missing properties: 'title', 'summary', 'tags'", err:(*jsonschema.ValidationError)(0xc0000b3740)}
2023/06/01 10:31:27 error count for "schema-bad.md": 1
2023/06/01 10:31:27 scanning "schema-good.md"
```

You get a relatively detailed summary of why validation failed and a non-zero exit code, exactly what you need to prevent malformed data from entering your pipeline.

### how to schema library? 
You might notice that when I specify a schema, it's hosted at `schemas.ndumas.com`. [Here](https://code.ndumas.com/ndumas/json-schemas) you can find the repository powering that domain. 

It's pretty simple, just a handful of folders and the following Drone pipeline:
```yaml
kind: pipeline
name: publish-schemas

clone:
  depth: 1


steps:
- name: publish
  image: drillster/drone-rsync
  settings:
    key:
      from_secret: BLOG_DEPLOY_KEY
    user: blog
    port: 22
    delete: true
    recursive: true
    hosts: ["schemas.ndumas.com"]
    source: /drone/src/
    target: /var/www/schemas.ndumas.com/
    include: ["*.schema.json"]
    exclude: ["**.*"]
```

and this Caddy configuration block:
```caddy
schemas.ndumas.com {
    encode gzip
    file_server {
      browse
    }
    root * /var/www/schemas.ndumas.com
}
```

Feel free to browse around the [schema site](https://schemas.ndumas.com).

## Success Story???
At time of writing, I haven't folded this into any pipelines. This code is basically my proof-of-concept for only a small part of a larger rewrite of my pipeline.

### Future Use Cases 
The one use-case that seemed really relevant was for users of the Breadcrumbs plugin. That one uses YAML metadata extensively to create complex hierarchies and relationships. Perfect candidate for a schema validation tool.