Simpler times

Markdown format

I’ve been writing a post every week, but I haven’t published any of them yet. That’s because publishing a post here isn’t straightforward. I write it in Google Docs and pass it to my wife for corrections (she is more fluent in English than I am). Then it has to be reformatted as HTML. After that, I have to regenerate the blog and upload it to the host over ssh.

The idea is to make this process more efficient. The first thing I tackled was the HTML formatting. The posts don’t have a complex HTML structure: they are paragraphs, inline code, blocks of code, and the occasional figure or table.

When I paste a post into Visual Studio Code for editing, I end up with one long line per paragraph, and I have to add the opening and closing tags manually. Then I have to break the lines at roughly 80 columns. After that, I add the tags for code and any additional HTML that is needed. It’s not much work, but it’s boring. And if I modify the contents after formatting, the lines are no longer 80 columns wide and I have to readjust them.

To solve that, I thought of writing the posts in Markdown. I could write it directly in Google Docs (it’s quite human-readable) and then paste it straight into VS Code’s editor. I expect there is an extension to reformat it to 80 columns, so I wouldn’t need to do anything else. Then I would convert the Markdown to HTML before rendering the post, keeping the original Markdown version in the database.
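To get a feel for what converting only "the subset I need" involves, here is a deliberately naive sketch in Python. It assumes a hypothetical subset: paragraphs separated by blank lines, inline code in backticks, and fenced code blocks. The names `inline` and `markdown_to_html` are made up for illustration. The sketch ignores lists, emphasis, links, nesting and Markdown’s escaping rules, which is where the real complexity lies.

```python
import html
import re

FENCE = "`" * 3  # a code fence: three backticks


def inline(text):
    """Escape HTML special characters, then turn `code` spans into <code>."""
    escaped = html.escape(text, quote=False)
    return re.sub(r"`([^`]+)`", r"<code>\1</code>", escaped)


def markdown_to_html(source):
    """Convert a toy Markdown subset (paragraphs, inline code, fences) to HTML."""
    output = []
    paragraph = []  # lines of the paragraph being collected
    code = None     # lines of the open fenced block, or None outside one

    def flush_paragraph():
        if paragraph:
            output.append("<p>" + inline(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    def flush_code():
        body = html.escape("\n".join(code), quote=False)
        output.append("<pre><code>" + body + "</code></pre>")

    for line in source.splitlines():
        if line.startswith(FENCE):
            if code is None:      # opening fence ends any open paragraph
                flush_paragraph()
                code = []
            else:                 # closing fence
                flush_code()
                code = None
        elif code is not None:    # inside a fence: keep the line verbatim
            code.append(line)
        elif not line.strip():    # blank line: paragraph boundary
            flush_paragraph()
        else:
            paragraph.append(line.strip())

    flush_paragraph()
    if code is not None:          # unclosed fence: emit what was collected
        flush_code()
    return "\n".join(output)
```

Keeping the Markdown in the database and calling a function like this at render time fits the plan described above. Anything beyond these three constructs would need a real parser.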

The idea with this blog is to learn, so why not implement my own Markdown-to-HTML converter? I know there are converters available, but I’d learn much more by building one myself. It’s been years since I implemented my last compiler (in Visual Basic 6!), and this would be a chance to refresh my knowledge.

The first thing I did was search for Markdown’s specification. To my surprise, there is no single, formal specification. There are lots of different flavours, plus an attempt to unify them called CommonMark. I was expecting to find the syntax rules written as a grammar, something like this:


expression = atom   | list
atom       = number | symbol
number     = [+-]?['0'-'9']+
symbol     = ['A'-'Z']['A'-'Z''0'-'9']*
list       = '(', expression*, ')'
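For comparison, this is roughly what a grammar like that turns into when it is implemented by hand. It uses a regular-expression tokenizer plus a recursive-descent parser, where each alternative of the grammar becomes a branch. This is a minimal Python sketch that assumes tokens are separated by whitespace. It reads the symbol rule as an uppercase letter followed by any number of uppercase letters or digits. The token names and the nested-list output are illustrative choices, not part of any specification.

```python
import re

# One alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:(\()|(\))|([+-]?[0-9]+)|([A-Z][A-Z0-9]*))")


def tokenize(text):
    """Split `text` into (kind, value) tuples following the grammar above."""
    tokens = []
    pos = 0
    text = text.rstrip()  # trailing whitespace would otherwise match no token
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        lparen, rparen, number, symbol = match.groups()
        if lparen:
            tokens.append(("LPAREN", lparen))
        elif rparen:
            tokens.append(("RPAREN", rparen))
        elif number:
            tokens.append(("NUMBER", int(number)))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens


def parse_expression(tokens, pos):
    """expression = atom | list; returns (value, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[pos]
    if kind in ("NUMBER", "SYMBOL"):  # atom = number | symbol
        return value, pos + 1
    if kind == "LPAREN":              # list = '(', expression*, ')'
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos][0] != "RPAREN":
            item, pos = parse_expression(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1         # skip the closing ')'
    raise SyntaxError(f"unexpected {value!r}")


def parse(tokens):
    """Parse a whole token list into exactly one expression."""
    expr, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after expression")
    return expr
```

For example, `parse(tokenize("(ADD 1 (MUL -2 X1))"))` returns `['ADD', 1, ['MUL', -2, 'X1']]`.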

But the only thing like that I found was in one of the Markdown implementations.

After that, I downloaded a couple of compiler design books: the classic “Compilers: Principles, Techniques, and Tools” and “Modern Compiler Design” by Dick Grune et al. I skimmed through them to refresh my knowledge, reviewing the steps needed to analyze a language and generate something from it.

The last step was to abandon the project.

I didn’t remember that building a compiler was so much work, and I didn’t know Markdown was so complex. Even if I implemented only a subset of the language, a simple compiler would take me at least a month, given the time I have for coding. Moreover, it wouldn’t have any practical use beyond practising some Python. So, after weighing the pros and cons, I think my time is better spent on other things.

That’s one of the things I’m learning as I get older: my time is limited, and whenever I do something, there is something else I won’t be able to do. I’d rather spend more time learning artificial intelligence. In the time it would take me to program a compiler, I could perhaps complete one of the deeplearning.ai courses.

But formatting paragraphs still bothers me, so I’ll try something simpler: writing some code to reformat tags and paragraphs. Perhaps I’ll look at VS Code extensions and implement my own beautifier. I think it would be quick to write one that does this simple formatting on HTML documents.
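A VS Code extension would be written in TypeScript or JavaScript, but the core of such a beautifier is small enough to sketch first as a standalone Python script. This minimal version makes two assumptions. Paragraphs are plain `<p>...</p>` elements without attributes, and they contain nothing whose line breaks matter, such as `<pre>` or `<br>`. The function names are hypothetical. The script collapses each paragraph’s whitespace and re-wraps it, tags included, at 80 columns using the standard `textwrap` module. Everything outside `<p>` elements is left untouched.

```python
import re
import textwrap

# Assumption: paragraphs are plain <p>...</p> elements (no attributes) and
# nothing inside them depends on line breaks (no <pre> or <br>).
PARAGRAPH_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def reflow_paragraph(match, width=80):
    """Re-wrap one matched <p> element to at most `width` columns."""
    # Collapse the old line breaks and runs of spaces into single spaces...
    text = " ".join(match.group(1).split())
    # ...then wrap the paragraph, tags included. Long words such as URLs are
    # never split, so a line may exceed `width` if a single word is too long.
    return textwrap.fill("<p>" + text + "</p>", width=width,
                         break_long_words=False, break_on_hyphens=False)


def reflow_html(document, width=80):
    """Re-wrap every <p> element in `document`; other markup is unchanged."""
    return PARAGRAPH_RE.sub(lambda m: reflow_paragraph(m, width), document)
```

Running `reflow_html` again after editing a post would restore the 80-column layout, which is currently the part that has to be redone by hand. Continuation lines start at column 0, so paragraphs nested inside indented markup lose their indentation.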