ngeor.com

My Maven release workflow v2

2022-12-03T07:53:17+00:00

An update on my Maven release workflow, in other words, how I release Maven libraries into the Maven central repository.

This is a follow-up to the previous post.

I haven’t updated the blog much lately, but I also haven’t been coding much in anything new, so there isn’t much to report. Most of the coding I did at home was in Rust, refactoring some parts of rusty-basic, with no new features.

However, I did make some changes in how I (seldom) release my Java libraries into the Maven central repository.

Instead of using a specially named branch as the CI pipeline trigger, I use a tag. So instead of a branch named release-x.y.z, the release starts with a tag vx.y.z.
Not using the Maven release plugin anymore. I find it a bit difficult to work with, so I’ve stopped using it. I wrote a custom tool to replace it, which switches from the snapshot to the release version, generates the changelog with git-cliff, pushes the tag, and finally switches back to the snapshot version. The tool also supports npm and Python projects (then again, like I said, I haven’t been using it much lately). When deploying to Maven central, I just run mvn deploy.

The previous approach with the release branch would run all the release ceremony on the CI server, while with the new approach the release is kicked off on my local laptop.

In terms of running commands, in the previous approach it would be something like:

git checkout -b release-1.2.3
git push -u origin HEAD

and then wait until the release is magically done on the CI server.

In the new approach, using the custom tool I mentioned before:

krt --type=maven minor

which prepares the release locally (changelog, changing the pom.xml, creating the git tag) and then pushes to the remote, where the CI pipeline triggered by the tag deploys to Maven central.
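To make the local part more concrete, here is a hypothetical dry-run sketch of the steps such a tool performs, in order. It is not krt’s actual code: it assumes the version switches are done with the versions-maven-plugin’s versions:set goal and the changelog with git-cliff’s --tag/--output options, the commit messages are placeholders, and it takes the release and next snapshot versions as inputs instead of deriving them from an argument like minor.

```python
from typing import List


def release_plan(release: str, next_snapshot: str) -> List[List[str]]:
    """Dry run: the commands a krt-like tool might run locally, in order.

    Hypothetical: assumes versions-maven-plugin for the version switches and
    git-cliff for the changelog; the real tool may work differently.
    """
    tag = f"v{release}"
    return [
        # switch pom.xml from the snapshot version to the release version
        ["mvn", "versions:set", f"-DnewVersion={release}", "-DgenerateBackupPoms=false"],
        # regenerate the changelog as if the new tag already existed
        ["git-cliff", "--tag", tag, "--output", "CHANGELOG.md"],
        ["git", "add", "pom.xml", "CHANGELOG.md"],
        ["git", "commit", "-m", f"chore(release): {release}"],
        # annotated tag, because `git push --follow-tags` only pushes annotated tags
        ["git", "tag", "-a", tag, "-m", tag],
        # switch back to a snapshot version for the next iteration
        ["mvn", "versions:set", f"-DnewVersion={next_snapshot}", "-DgenerateBackupPoms=false"],
        ["git", "commit", "-am", f"chore: prepare {next_snapshot}"],
        # push commits and tag together; the tag pipeline then deploys
        ["git", "push", "--follow-tags"],
    ]


for command in release_plan("1.3.0", "1.3.1-SNAPSHOT"):
    print(" ".join(command))
```

Keeping it as a list of commands makes it easy to print for review before actually running anything.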

The downside is that it requires some tooling to be present on the local machine (git-cliff and the krt tool) but on the positive side the CI pipeline is less complicated.

I still have a Python script, copy-pasted across every single Java library project, which performs the release on the tag pipeline. One day I would like to move it to a central place to avoid the duplication.
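The first thing a tag pipeline script has to decide is whether the current run is a release at all. A minimal sketch, assuming the pipeline runs on GitHub Actions (which sets GITHUB_REF_TYPE and GITHUB_REF_NAME, the same variables the previous post’s workflow checks) and tags of the form vx.y.z; it is an illustration, not the script mentioned above.

```python
import re
from typing import Mapping, Optional

# Release tags look like v1.2.3.
TAG_PATTERN = re.compile(r"v(\d+\.\d+\.\d+)")


def release_version(env: Mapping[str, str]) -> Optional[str]:
    """Return the version to deploy, or None if this run is not for a release tag.

    Assumes GitHub Actions, which sets GITHUB_REF_TYPE ("branch" or "tag")
    and GITHUB_REF_NAME (the branch or tag name).
    """
    if env.get("GITHUB_REF_TYPE") != "tag":
        return None
    match = TAG_PATTERN.fullmatch(env.get("GITHUB_REF_NAME", ""))
    return match.group(1) if match else None
```

In the pipeline it would be called as release_version(os.environ); taking a mapping instead of reading os.environ directly keeps it easy to test.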

My Maven release workflow

2022-02-05T07:33:40+00:00

In this post I’m describing my current setup for releasing a versioned library into the Maven central repository.

Update: see latest in v2

Initiating the release

The release starts from the default branch. There shouldn’t be any pending changes, and the version in the pom.xml must be a snapshot version. To trigger the release, I create and push a new branch named release-x.y.z, where x.y.z is the version that I want to release. This branch pattern is picked up by a dedicated GitHub Actions workflow which does the work.

To create and push the branch, I use a Python script (release.py), which I also use to perform and finalize the release. So for this first step, I run something like ./scripts/release.py initialize --version x.y.z.
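The initialize step boils down to checking the preconditions listed above and then pushing the branch. A hypothetical sketch (not the actual release.py): the function name and parameters are made up, pom_version stands for the version read from pom.xml, git_status for the output of git status --porcelain, and the "on the default branch" check is left out.

```python
import re
from typing import List

VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def initialize_commands(version: str, pom_version: str, git_status: str) -> List[List[str]]:
    """Validate the preconditions and return the git commands that start a release.

    Hypothetical sketch: pom_version is the current version from pom.xml and
    git_status is the output of `git status --porcelain` (empty when clean).
    """
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"expected a version like 1.2.3, got {version!r}")
    if not pom_version.endswith("-SNAPSHOT"):
        raise ValueError(f"pom.xml must have a snapshot version, got {pom_version!r}")
    if git_status.strip():
        raise RuntimeError("there are pending changes")
    # The release workflow is triggered by branches named release-x.y.z.
    return [
        ["git", "checkout", "-b", f"release-{version}"],
        ["git", "push", "-u", "origin", "HEAD"],
    ]
```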

Performing the release

The release is done through a GitHub Action. The custom release.py script does all the work, which is a lot:

Determine the release version. It checks the environment variables GITHUB_REF_TYPE, which must be set to branch, and GITHUB_REF_NAME, which must follow the pattern release-x.y.z.
Import a GPG key. In order to publish to the central Maven repository, there’s a process which I have long forgotten (not as easy as publishing to npm), which involves having your own GPG key. I store the key as a file in the repo in a secure way. To import and use the key, I need to configure two secrets in GitHub: GPG_KEY and GPG_PASSPHRASE. This also needs the gpg binary to be present on the system.
Configure git identity. Since we’re going to be making a few commits, git needs to know who we are. The script just runs git config user.name "My name" and git config user.email "My email".
Preparing the release. This uses the Maven release plugin and runs the release:prepare goal. I am passing -DreleaseVersion=x.y.z to use the version I indicated when I created the release branch. Also, -DpushChanges=false, because the plugin has some difficulty pushing when running through GitHub Actions which I didn’t want to dive into. When this step finishes, there are new commits on the current branch and a new git tag. The first commit switches from snapshot to release version (e.g. from 1.2.3-SNAPSHOT to 1.2.3) and then the next switches to the snapshot version of the next iteration (e.g. 1.3.0-SNAPSHOT).
Update the changelog. I use git-cliff to automatically generate a changelog stored in the git repo as CHANGELOG.md. This is a good opportunity to do so, as we just got a fresh tag and we haven’t pushed the changes yet. I install git-cliff in an earlier step of the GitHub Action by downloading the latest release with curl and extracting it somewhere in the PATH. Another important point: for git-cliff to work, the checkout action needs to fetch all commits with fetch-depth: 0, otherwise the changelog will be empty.
Push changes to the current branch. This is apparently supported out of the box in GitHub Actions, a pleasant surprise. It also doesn’t result in an infinite loop of workflow runs. Simply run git push --follow-tags (to also push the tag created by the Maven release plugin).
Perform the release. This again uses the Maven release plugin, but this time runs the release:perform goal. The script dynamically creates a Maven settings.xml file which contains my credentials for the Maven central repository server (also stored as GitHub secrets) and the GPG secrets as the properties gpg.keyname and gpg.passphrase. The server id in the settings.xml needs to match the server id in my pom.xml’s distributionManagement section, as well as the serverId property of the nexus-staging-maven-plugin. I pass -DlocalCheckout=true to the release plugin so that it doesn’t attempt to clone anything from GitHub.
Clean up: delete the GPG key.
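Generating the settings.xml is the fiddliest part of the perform step. A minimal sketch with Python’s standard xml.etree, assuming the GPG properties are supplied through an always-active profile (one common way to define properties in settings.xml; the original script may do it differently). The profile id gpg is a placeholder, and the server id you pass in must match distributionManagement and the nexus-staging-maven-plugin serverId, as described above.

```python
import xml.etree.ElementTree as ET


def settings_xml(server_id: str, username: str, password: str,
                 gpg_keyname: str, gpg_passphrase: str) -> str:
    """Build a Maven settings.xml with repository credentials and GPG properties."""
    settings = ET.Element("settings")

    # Credentials for the repository server; the id must match pom.xml.
    servers = ET.SubElement(settings, "servers")
    server = ET.SubElement(servers, "server")
    ET.SubElement(server, "id").text = server_id
    ET.SubElement(server, "username").text = username
    ET.SubElement(server, "password").text = password

    # Expose the GPG secrets as properties via an always-active profile
    # ("gpg" is a placeholder profile id).
    profiles = ET.SubElement(settings, "profiles")
    profile = ET.SubElement(profiles, "profile")
    ET.SubElement(profile, "id").text = "gpg"
    properties = ET.SubElement(profile, "properties")
    ET.SubElement(properties, "gpg.keyname").text = gpg_keyname
    ET.SubElement(properties, "gpg.passphrase").text = gpg_passphrase
    active = ET.SubElement(settings, "activeProfiles")
    ET.SubElement(active, "activeProfile").text = "gpg"

    # ElementTree escapes special characters such as & in secrets.
    return ET.tostring(settings, encoding="unicode")
```

Building the file with an XML library rather than string templates avoids breaking it when a password contains characters like & or <.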

Finalizing the release

Meanwhile, on my computer, I just wait for the pipeline to finish. I get an e-mail from Sonatype saying that it has analyzed my release and hasn’t found any vulnerabilities. At this point, my release is done. I need to merge the release branch and delete it. I use the same release.py script for it and run ./scripts/release.py finalize.

Show me the code

release.py: the custom Python script that orchestrates all of the above.
release.yml: the GitHub Actions workflow.
pom.xml: the Maven project file.

Pain points

Some of the complexity probably comes from Maven’s notion of snapshot and release versions, but I wanted to follow that convention.

One obvious problem is that it’s very easy to make a mistake in the version, e.g. push a branch like release-30.0.0 instead of release-3.0.0.

Furthermore, this probably only works for a team of one. It’s fine for me and my hobby project, but with two people it doesn’t offer any protection against both of them accidentally trying to create a release (one person thinking they should be pushing version 1.2.3 as a hotfix and another person thinking they should be pushing version 1.3.0 as a minor version). Essentially, the choice of the version number is outside the approval process: anyone with the right to push a branch can initiate a release without double checking with someone else.

Maven and monorepo

2021-09-26T07:47:00+00:00

In this post, I’m playing with releasing a subset of libraries from a Maven monorepo.

Nothing is simpler than a git repo that contains just one Maven project without Maven modules. Everything is simple: versioning, releasing, creating the automatic CI/CD pipeline… until the second project comes along.

If the second project is totally unrelated to the first one, you might as well create a second git repository. Admittedly, some copy-pasting will be annoying, e.g. copying the CI/CD pipeline or setting up again the credentials required to authenticate against the remote Maven repository. But it’s not the end of the world. With some tooling, you might even coordinate things like upgrading common non-functional stuff such as checkstyle.

On the other hand, if the second project depends on the first one, things are a bit more difficult. The change cycle is longer: you need to first implement the change in the first project, publish it to the Maven repository, and then consume it in the second project. If a mistake was made, you need to make a patch release of the first project. There’s more ceremony.

A monorepo starts to sound like a better solution. Set up one and only one git repository where all your Maven projects live, under a common parent Maven project. Life is good again: all source code is at your fingertips and you can change multiple libraries in a single commit.

But then you need to publish your libraries to a remote Maven repository (because they’re so awesome). And now versioning becomes a bit more complicated.

Let’s see what happens if we use the Maven release plugin. Preparing the release with mvn release:prepare will do the following:

interactively ask for the next release version for each library
interactively ask for the next snapshot version for each library
interactively ask for the git tag
create a commit with the release versions and tag it
create a follow-up commit setting the snapshot versions

Performing the release with mvn release:perform will deploy the release versions to the remote Maven repository.

The thing is, this will deploy all of the libraries. If your monorepo contains multiple libraries, which are all related (think of Spring for example, or Jackson), then maybe this is perfect for you. Your libraries all form a logical product, so it makes sense to release them all in one go, even if there haven’t been any changes to some of them. The versioning will probably be simple as well, one version number for everything in the monorepo.

But in a monorepo where libraries are not necessarily related to each other, this doesn’t scale well. You’ll have to answer a lot of questions if you want to have control over the versions. You can also skip the excessive interrogation with -B for batch mode, but then all versions will be automatically chosen (if using SemVer policy, it will bump a minor version).

Deploying all of your libraries, even if they had no changes since their last release, might be confusing to your users, who might be wondering why they need to upgrade at all.

In my search for a better approach, I found two nice links:

The motivation behind the first article is sparse checkouts and git performance when you’re dealing with a huge monorepo that tests the boundaries of git. But the trick we’re using is the same: modify the parent pom file, keeping only a subset of the modules we are interested in.

An interesting detail is that they don’t commit pom.xml files at all into the monorepo, to avoid git noise. They commit pom.xml.template files, which are used to auto-generate pom.xml files based on the desired Maven modules and version numbers.

In my repo, I already have pom.xml files and I didn’t want to remove them for now, but I think the template approach they’re taking is better.

I hacked together some code to automate this (works on my machine), but the principle is the following:

Make a backup of the parent pom.xml in case things go wrong
Make sure we’re on the default branch and there aren’t any pending changes
Create a branch for the release
Modify the parent pom.xml, keeping only the modules we want to release
Commit the pom.xml
Prepare the release with mvn release:prepare. This will create two commits, one with the release versions and one with the next snapshot versions. The first commit will be tagged and the command will push the tag to the remote git repository.
Have a CI job pick up the pushed tag from the previous step and deploy to the remote Maven repository. When the job finishes, you have published your Maven artifacts.
Modify the parent pom.xml again, putting back all the modules.
Because the version of the parent pom.xml has changed, we need to fix that version in all child modules that were previously excluded from the parent pom (you might want to write some tooling for that).
Commit the modified pom.xml files, merge the release branch into the default branch and delete it
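The "keep only the modules we want" step can be scripted. A minimal sketch using Python’s standard xml.etree and the standard Maven POM namespace; the function name is hypothetical. Note that ElementTree drops XML comments and doesn’t preserve formatting, so a real tool would likely need something more careful (or the template approach described above).

```python
import xml.etree.ElementTree as ET
from typing import Iterable

POM_NS = "http://maven.apache.org/POM/4.0.0"


def keep_modules(pom_xml: str, wanted: Iterable[str]) -> str:
    """Return the parent pom with only the wanted <module> entries left."""
    # Register the POM namespace as default so the output has no ns0: prefixes.
    ET.register_namespace("", POM_NS)
    root = ET.fromstring(pom_xml)
    ns = {"m": POM_NS}
    modules = root.find("m:modules", ns)
    if modules is None:
        raise ValueError("parent pom has no <modules> section")
    wanted_set = set(wanted)
    # Iterate over a copy, since we remove elements while looping.
    for module in list(modules.findall("m:module", ns)):
        if (module.text or "").strip() not in wanted_set:
            modules.remove(module)
    return ET.tostring(root, encoding="unicode")
```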

Bonus: in the era of gofmt, rustfmt, etc., I like formatters that are opinionated and not configurable. I keep pom.xml files sorted with a Maven plugin:

mvn com.github.ekryd.sortpom:sortpom-maven-plugin:sort \
  -Dsort.createBackupFile=false

How can this be improved further?

First of all, dependencies. If a library has changed, it is probably a good idea to automatically publish any libraries that depend on it.
Second, I shouldn’t even have to tell it which libraries have changed. It should go through git history, since the previous release, and figure out what paths have changed.
Finally, changelog. It should be possible to go over the git history and create a somewhat organized changelog, dividing the changes by affected library.
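The first two improvements can be sketched together: find the modules touched since the previous release, then add everything that depends on them. A hypothetical sketch assuming each module lives in a top-level directory named after it, that the changed paths come from something like git diff --name-only <previous-tag> HEAD, and that the dependency graph (module to its direct dependents, with every module present as a key) has already been extracted from the pom files.

```python
from collections import deque
from typing import Dict, Iterable, Set


def modules_to_release(changed_paths: Iterable[str],
                       dependents: Dict[str, Set[str]]) -> Set[str]:
    """Modules touched by changed_paths, plus everything that depends on them.

    dependents maps each module to the modules that directly depend on it;
    every module must appear as a key. Paths are assumed to start with the
    module's directory name.
    """
    changed = {path.split("/", 1)[0] for path in changed_paths if "/" in path}
    changed &= set(dependents)  # ignore top-level dirs that are not modules
    result = set(changed)
    # Breadth-first walk over the reverse dependency graph.
    queue = deque(changed)
    while queue:
        module = queue.popleft()
        for dependent in dependents.get(module, set()):
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result
```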

CI for Visual C++ 6

2021-09-18T06:02:01+00:00

Some time ago, I thought of hacking on some really old C/C++ code I had written. I have a VirtualBox image running Windows 2000 and Visual Studio 6 (yes, that old). However, I also wanted to be able to run the code in modern Visual Studio 2019. Verifying that I haven’t broken anything means I would have to manually build my code on two different IDEs. That’s just boring. Instead, I hacked together a way of running my build for both Visual Studio versions.

Normally, this would have been an easy task. Install TeamCity server. Install one TeamCity agent on a machine that has Visual Studio 2019 and one agent on a machine that has Visual Studio 6. The problem is that Windows 2000 is rather old, so you can’t install TeamCity there (or much else, for that matter; I tried installing various programming languages, to total failure).

So the Visual Studio 2019 part is easy. I install TeamCity (both server and agent) on my laptop, define a job, done. For the Visual Studio 6 part, I needed to somehow mimic the work of a build agent, which is more or less the following:

start the VM that has Visual Studio 6
get the source code into the machine
run the build
get the build artifacts out of the machine
stop the VM

You can go with variations regarding the lifecycle management of the VM, e.g. just leave it running instead of shutting it down, clone a new VM instead of reusing the same to achieve parallel builds, etc. In my case, I just start and stop it per job, as it’s not too time consuming. I rely on TeamCity’s shared resources feature to ensure only one job can use that VM at any time.

Luckily, VirtualBox has a CLI (VBoxManage.exe) which supports all needed operations:

startvm --type headless to start the VM without having it pop up on my screen while I’m working.
controlvm poweroff to turn off the machine and snapshot restorecurrent to fall back to the VirtualBox snapshot that brings the VM to the pristine state.
copyto to copy files into the VM (the source code)
run to run the build
copyfrom to copy files out of the VM (the interesting files are the artifacts, exe and dll files, and the build log)
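Put together, the per-job lifecycle can be sketched in Python. This is a hypothetical illustration, not the author’s build script: each guest step is the argument list that follows VBoxManage guestcontrol <vm> (the exact copyto/run/copyfrom arguments depend on your VirtualBox version), waiting for the guest to boot is left out, and the runner parameter exists only so the sequence can be tested without VirtualBox.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]
VBOX = "VBoxManage"


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def build_in_vm(vm: str, guest_steps: List[List[str]], runner: Runner = run) -> int:
    """Start the VM, run the guest steps in order, then always power off and restore.

    Returns 0 on success or the exit code of the first failing command.
    """
    code = runner([VBOX, "startvm", vm, "--type", "headless"])
    if code != 0:
        return code
    try:
        for step in guest_steps:
            code = runner([VBOX, "guestcontrol", vm, *step])
            if code != 0:
                return code
        return 0
    finally:
        # Back to the pristine snapshot, whatever happened above.
        runner([VBOX, "controlvm", vm, "poweroff"])
        runner([VBOX, "snapshot", vm, "restorecurrent"])
```

This sketch stops at the first failing step; a real job would probably still copy the build log out when the build itself fails, so it can be printed in TeamCity.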

Of course, in practice I encountered more problems. First of all, the startvm command returns immediately (on a real machine, that would be the moment you press the power button). I can’t start copying files into the VM yet, because Windows 2000 hasn’t loaded. As a workaround, I used the guestcontrol stat C:\ command in a wait loop. This command checks if the C:\ drive exists in the VM. If it succeeds, it means not only that Windows 2000 has loaded, but also that the VirtualBox Guest Additions are ready (which is what enables file transfers).
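A sketch of such a wait loop, with a timeout so a VM that never boots doesn’t hang the job forever. The placement of --username/--password after the stat subcommand is an assumption about the VBoxManage syntax (check it against your VirtualBox version), and the injectable runner, clock and sleep parameters are there only for testing.

```python
import subprocess
import time
from typing import Callable, Sequence


def quiet_run(cmd: Sequence[str]) -> int:
    """Run a command without echoing its output and return its exit code."""
    return subprocess.run(list(cmd), capture_output=True).returncode


def wait_for_guest(vm: str, username: str, password: str,
                   timeout: float = 300.0, interval: float = 5.0,
                   runner: Callable[[Sequence[str]], int] = quiet_run,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `guestcontrol stat C:\\` until it succeeds or the timeout expires.

    Success means Windows has booted and the Guest Additions accept commands.
    """
    cmd = ["VBoxManage", "guestcontrol", vm, "stat",
           "--username", username, "--password", password, "C:\\"]
    deadline = clock() + timeout
    while runner(cmd) != 0:
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```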

The second problem was that file transfers were unpredictable: sometimes they would fail or time out. In the end, I figured out that this had to do with how VirtualBox logs into the VM, as each login adds some overhead. To solve that, I configured Windows 2000 to automatically log in with my username and made sure I use that same user for the file transfers. This way, VirtualBox seems to be smart enough to reuse the same session, and everything goes smoothly.

Running the build itself was a bit tricky, but it worked. The command is something like $MSDEV $SlnInGuest /OUT $BuildLogAbsolute /MAKE ALL /REBUILD, where:

MSDEV points to Visual Studio 6’s CLI "C:\Program Files\Microsoft Visual Studio\Common\MSDev98\Bin\MSDEV.COM" (I think I had to use the .com instead of the .exe file…)
SlnInGuest points to the solution file
BuildLogAbsolute points to the build log file

The reason for capturing the log file is that I couldn’t get the build output to show directly in TeamCity. So instead I redirect it to a file, copy it out of the VM, and print it manually.

The exit code of MSDEV is fortunately returned by VirtualBox so I can indicate if the build was successful or not, without having to do some more hacky stuff like searching for “ERROR” in the build log.

Overall, the approach worked and I never ended up in some unexpected intermediate state where I had to go manually into the VM. I even configured TeamCity to publish the build status on GitHub. It is kind of cool to see on GitHub “VC6 build passed” in 2021.

Git repo juggling

2021-05-28T12:02:59+00:00

This post shows how to use git subtree commands to move git repositories around. Perfect if you’re indecisive about monorepo vs multirepo.

Move a repo into a different repo as a subfolder

Use case: you want to move a library into a monorepo (because you read a post that says it’s cool).

Before:

Source repo             Destination repo
|                       |
|-- src                 |-- packages
|   \-- index.ts        |   \-- bar
|-- package.json        \-- README.md
|-- package-lock.json
\-- README.md

After:

Destination repo
|
|-- packages
|   |-- bar
|   \-- foo (**moved here from source repo**)
|       |-- src
|       |   \-- index.ts
|       |-- package.json
|       |-- package-lock.json
|       \-- README.md
\-- README.md

Steps:

Clone the source repo in a folder named source
Clone the destination repo in a sibling folder named destination
In the destination repo, run: git subtree add -P packages/foo ../source master (assuming the main branch is called master)
Delete the source repo (I’m sure you won’t regret it)

Move a subfolder to a new repo

Use case: you want to move a library out of a monorepo into its own dedicated repo (because you read a different post that says monorepos are actually not cool).

Before:

Source repo
|
|-- packages
|   |-- bar
|   \-- foo
|       |-- package.json
|       \-- README.md
\-- README.md

After:

Destination repo (**keeping only the contents of foo**)
|-- package.json
\-- README.md

Steps:

Clone the source repo in a folder named source
Prepare a new empty bare repo in a folder named destination (with git init --bare)
In the source folder, run git subtree push -P packages/foo ../destination master
To double check everything went fine, you can clone the bare repo into a regular repo and inspect its contents.
Delete the library from the source repo

Move a subfolder to an existing repo

Use case: you have two monorepos and you want to move one library from one to the other (because the oxymoron of having multiple monorepos escapes you).

Before:

Source repo    Destination repo
|              |
|-- packages   |-- libs
    |-- bar        \-- bar
    \-- foo

After:

Source repo    Destination repo
|              |
|-- packages   |-- libs
    \-- foo        |-- bar
                   \-- bar_v2 (** moved here from source repo where it was called bar **)

Steps:

Clone the source repository into a folder named source
Clone the destination repository into a sibling folder named destination
In the source repo, keep only the desired library with git subtree split -P packages/bar -b temp (temp is the name of a branch we’ll use next)
Checkout the temp branch and double check the source repo folder only has the desired library
In the destination repo, add the library with git subtree add -P libs/bar_v2 ../source temp
Delete the library from the source repo
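If you do this more than once, the commands can be generated by a small helper. A hypothetical sketch (not the script mentioned in the update below) that returns (working directory, command) pairs for the split-and-add flow, assuming the two repos are cloned side by side as source and destination; the manual "double check the temp branch" step is left out, and the git rm still needs a commit afterwards.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (working directory, command)


def move_library_steps(source_prefix: str, dest_prefix: str,
                       branch: str = "temp") -> List[Step]:
    """git subtree steps to move a folder from ./source into ./destination."""
    return [
        # keep only the library's history on a temporary branch
        ("source", ["git", "subtree", "split", "-P", source_prefix, "-b", branch]),
        # graft that branch into the destination under its new path
        ("destination", ["git", "subtree", "add", "-P", dest_prefix, "../source", branch]),
        # remove the library from the source repo (commit afterwards)
        ("source", ["git", "rm", "-r", source_prefix]),
    ]
```

Each pair can be executed with subprocess.run(command, cwd=directory, check=True), or just printed for a dry run.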

Update: 2021-10-03

I have put together a script that handles some of these cases (especially moving back to a multirepo).

Using Lerna for TypeScript and npm

2021-05-13T07:08:32+00:00

In this post, I’m showing a way to set up a monorepo with Lerna, taking into account some pitfalls when publishing to npm.

Lerna is a tool for managing JavaScript projects with multiple packages. In my very simple example, I have a project with 3 packages: a library, a CLI, and a VS Code Extension. The library package is used by both the CLI and the VS Code Extension packages.

Lerna makes the following things easy in my workflow:

Linking dependent packages together for local development. Normally, you’d have to use the npm link command (multiple times, depending on how many packages you need to link). With Lerna, lerna bootstrap takes care of that when you first clone a project (together with doing npm install for you).
Version management. Conceptually, my packages are all part of the same project, so I would like all of them to have the same version. Lerna takes care of that easily, the version is declared in the lerna.json file and controls the version for all packages. Bumping the version is also easily done with the lerna version command.
Publishing to npm. For my workflow, I like to publish all packages to npm, even if they didn’t have any changes since the previous release. I’ll dive into npm publishing in more detail later, but this is done in two steps. On my local machine, I use the lerna version command to bump the version. This creates and pushes a git tag. On the CI side, I run the lerna publish command whenever a new tag is pushed.

Repo structure

Things get a bit more complicated because I use TypeScript instead of JavaScript. Here’s a rough outline of the structure of my repo:

[packages]
  |- [html-fmt-cli]
  |    |- [src]
  |    |    |- index.ts
  |    |    \- index.test.ts
  |    |- package.json
  |    |- tsconfig.json
  |- [html-fmt-core]
  \- [html-fmt-vscode]
README.md
lerna.json
package.json

On the top level, there’s the root package.json and lerna.json files. The root package.json just lists Lerna as a dev dependency. It’s flagged as private to prevent accidentally publishing this to npm. lerna.json holds the version of the packages and indicates that packages are to be found under the packages folder.

The three packages underneath the packages folder have a similar structure. Inside you’ll find a package.json and a tsconfig.json. To avoid mixing the code with other files, code is kept in a separate sub-directory, src. This will make life a bit more complicated when publishing to npm, which I’ll discuss later. Tests are kept side by side with the code they’re testing (e.g. index.test.ts is the unit test file for index.ts). I prefer this over keeping tests in a separate directory, because the tests are easy to locate and the import statements are easier to write, without having to figure out how many ../ to add to match the directory structure.

tsconfig

Moving on to TypeScript, this is what my tsconfig.json looks like for the library project, html-fmt-core:

{
  "compilerOptions": {
    "declaration": true,
    "outDir": "./out",
    "allowJs": false,
    "target": "es6",
    "module": "commonjs",
    "moduleResolution": "node",
    "sourceMap": true,
    "strict": true
  },
  "include": ["src/**/*.ts"]
}

The key elements here are:

Source code is in src folder
Output JavaScript code will be in out folder (sibling of src)
Generate declaration files ("declaration": true), this makes VS Code happy

Main script and subdirectories

This brings us to the next point, specifying the entry point in package.json:

{
  "main": "out/index.js"
}

The complications start from the fact that the main file is in a subdirectory. Let’s say that index.js exports a function named hello. You can use it in a different package like you would expect:

import { hello } from "html-fmt-core";

Let’s say now that we have a file named Formatter.ts (compiled into Formatter.js).

You would expect this might work:

import { whatever } from "html-fmt-core/Formatter";

but unfortunately it doesn’t. What does work is specifying the out directory:

import { whatever } from "html-fmt-core/out/Formatter";

To avoid this altogether, what I did is to re-export what I want in the index.ts file like this:

export * from "./Formatter";

and use it as if it was part of the index file:

import { hello, whatever } from "html-fmt-core";

Note that this complication is happening because I’ve got the source code in the src folder and the generated code in the out folder. It is also possible to avoid this pain by simply having the code at the root level of the package (side by side with package.json and tsconfig.json) and output the generated code also there (with some rules in .gitignore to avoid committing it to git accidentally).

Publishing to npm

As I said in the beginning, it’s easy to publish all lerna packages with one command. With my current setup, I will have some problems:

It will publish not only the JavaScript code but also the TypeScript code. I would like it to publish only the JavaScript code (together with the declaration files for TypeScript users). This can be solved with some .npmignore lines but I’ll do it a bit differently.
It will publish not only the code, but also the tests.
My README file will be missing, because it’s at the root folder of the project and I don’t have (and don’t want to have) a README per package.

The thing is, my JavaScript code is already nicely contained in the out folder, so I’d like to publish just that subdirectory. Lerna supports this, with a very spot-on description:

If you’re into unnecessarily complicated publishing, this will give you joy.

So as per the docs, I need to write a custom script that will run in the prepack step and:

create an artificial package.json
copy the README.md from the project root into the generated package root

My custom prepack script looks like this:

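const fs = require("fs");

// Copy the project-level README into the directory that gets published.
fs.copyFileSync("../../README.md", "out/README.md");

// Carry over the package's .npmignore (used to exclude the tests), if any.
if (fs.existsSync(".npmignore")) {
  fs.copyFileSync(".npmignore", "out/.npmignore");
}

// Derive out/package.json from the package's own package.json.
const packageJson = JSON.parse(
  fs.readFileSync("package.json", { encoding: "utf8" })
);
// Paths are now relative to out/, so out/index.js becomes index.js.
packageJson.main = "index.js";

// Same for the CLI entry point, if the package has one.
if (packageJson.bin) {
  packageJson.bin = "index.js";
}

// The build scripts are not needed in the published package.
packageJson.scripts = {};
fs.writeFileSync("out/package.json", JSON.stringify(packageJson));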

It copies over the README file from the project root into the output folder of the package
If an .npmignore file exists on the package root, copy it over to the out folder (I use the .npmignore file to ignore the tests)
Create an out/package.json based on the original with the following alterations:
- the main entry point will be index.js instead of out/index.js
- same for the bin script, if one exists (it exists for the CLI project)
- clear out the scripts because why not

The publish command for this is lerna publish -y --contents out from-package.

Summary

Looking back at this, perhaps the easiest way is to leave the TypeScript code at the package root and let the output JavaScript code live there too. However, if you like to have a separate source subdirectory and a separate output subdirectory, it’s definitely possible, with a bit of extra work.

Learning QBasic part 2

2020-11-30T06:50:25+00:00

This is the 2nd part of learning QBasic, still figuring out how QBasic works in order to write an interpreter for it.

Quick-ish summary of what was covered in the previous post:

5 types. There are 5 built-in types: integer, long, single, double, string (so 4 numeric types and one type for strings).
Implicit & explicit variables. A variable can be explicitly declared with the DIM statement, but it can also be implicitly declared simply by using it (e.g. A = 42).
Compact and extended DIM. A variable declaration can specify the type with the AS type clause, in which case I call it extended. Example: DIM A AS STRING. If it’s without the AS type clause, I call it compact. Example: DIM A, DIM A$. Implicit variables are by definition compact.
Bare and qualified variables. A variable name can be optionally followed by a type qualifier to indicate its type. The qualifiers are % for integer, & for long, ! for single, # for double, and $ for string. Example: Msg$ = "Hello, world!".
Resolving type. The type of a bare compact variable is determined by its name (in all other cases the type is known). The default type is single. The DEFxxx statements can change the default, based on the first letter of the variable name (e.g. a common idiom is DEFINT A-Z to make all bare compact variables integers).

You can read the previous post for a bit more context on how all these behave on edge cases.

More on the numeric types

The LEN built-in function can be used to measure the size of a variable. For the built-in numeric types, it shows how many bytes they use. For the string type, it shows the length of the string. Let’s use it to see how big the built-in types are:

PRINT "Size of integer is:", LEN(A%)
PRINT "Size of long is:", LEN(B&)
PRINT "Size of single is:", LEN(C!)
PRINT "Size of double is:", LEN(D#)

This will print out 2 for integer (16 bits), 4 for long (32 bits), 4 for single and 8 for double (64 bits). For the two integer types, the minimum and maximum represented values are:

Type	Min	Max
Integer	-32,768	32,767
Long	-2,147,483,648	2,147,483,647

Side note: in the current implementation of rusty basic, I use 32 and 64 bit types internally (i32 and i64) and the min-max limits for each type are enforced programmatically. This might change in the future.
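The limits in the table follow directly from the sizes: an n-bit signed (two’s complement) integer spans -2^(n-1) to 2^(n-1) - 1. A minimal Python sketch of enforcing them programmatically, in the spirit of the side note; rusty basic itself is written in Rust, so this is only an illustration, and the names are made up.

```python
from typing import Tuple

# Bit widths of QBasic's two integer types.
BITS = {"integer": 16, "long": 32}


def type_range(type_name: str) -> Tuple[int, int]:
    """Signed range: n bits hold -2**(n-1) .. 2**(n-1) - 1."""
    bits = BITS[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


class Overflow(ArithmeticError):
    """Raised when a value does not fit in the target type."""


def checked(value: int, type_name: str) -> int:
    """Return value unchanged if it fits in type_name, else raise Overflow."""
    low, high = type_range(type_name)
    if not low <= value <= high:
        raise Overflow(f"{value} does not fit in {type_name}")
    return value
```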

Literal types

Given the min-max values described above, consider the following program:

PRINT 32767     ' max int
PRINT 32768
PRINT 32767 + 1 ' oops!
PRINT 32768 + 1

The program gives an overflow error on the third line. This might be counterintuitive: if it can print the value 32,768 on the second line, why can’t it print 32,767 + 1? It’s discoveries like these that shed some light on how QBasic evaluates expressions under the hood (and help me with re-implementing them in rusty basic). What (probably) happens is that QBasic uses the smallest type that can hold each literal. On the first line, it sees 32,767 and stores it in an integer, as it fits. On the second line, it uses a long. On the third line, there are two values which both fit in the integer type on their own. Then, at runtime, it tries to add two integer values together, leading to an overflow as the result doesn’t fit in an integer anymore.

It is possible to qualify a number with a type qualifier to be explicit about its intended type (I haven’t fully implemented this in rusty basic yet):

'PRINT 32767 + 1 ' overflow error!
PRINT 32767& + 1 ' works
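The (probable) rule above can be modeled in a few lines: each literal gets the smallest integer type that fits unless a qualifier says otherwise, and a sum gets the wider of its operand types. A Python sketch of that hypothesis, not of QBasic’s or rusty basic’s actual code; it only handles non-negative integer literals with an optional % or & qualifier (larger literals, which would become floating point, are out of scope).

```python
import re
from typing import Tuple

INT_MAX = 32767
LONG_MAX = 2147483647
LIMITS = {"integer": (-32768, INT_MAX), "long": (-2147483648, LONG_MAX)}


class Overflow(ArithmeticError):
    pass


def check(value: int, type_name: str) -> int:
    low, high = LIMITS[type_name]
    if not low <= value <= high:
        raise Overflow(f"{value} does not fit in {type_name}")
    return value


def literal(text: str) -> Tuple[int, str]:
    """Type a non-negative integer literal like 42, 32768 or 32767&."""
    match = re.fullmatch(r"(\d+)([%&]?)", text)
    if not match:
        raise ValueError(f"unsupported literal {text!r}")
    value, qualifier = int(match.group(1)), match.group(2)
    if qualifier:
        type_name = "integer" if qualifier == "%" else "long"
    elif value <= INT_MAX:
        type_name = "integer"  # smallest type that fits
    elif value <= LONG_MAX:
        type_name = "long"
    else:
        raise ValueError("floating-point literals are out of scope here")
    return check(value, type_name), type_name


def add(left: str, right: str) -> Tuple[int, str]:
    """Add two literals; the result has the wider of the two operand types."""
    (lv, lt), (rv, rt) = literal(left), literal(right)
    type_name = "long" if "long" in (lt, rt) else "integer"
    return check(lv + rv, type_name), type_name
```

With this model, add("32767", "1") raises Overflow, while add("32768", "1") and add("32767&", "1") both produce a long, matching the programs above.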

Constants

QBasic supports constants with the CONST statement, e.g. CONST Answer = 42. What we’ve discussed so far regarding variable names and types works a bit differently for constants.

Type comes from the right side

For bare constants, i.e. where the name isn’t followed by a type qualifier, it is the constant value that determines the type of the constant. In the statement CONST A = 42, the type of A will be an integer, despite the fact that we haven’t added DEFINT A-Z in the program. We can even do CONST A = "hello". It is the type of the right hand expression that determines the type of the constant.

Let’s see this with an example:

DEFINT A-Z      ' bare variables are integers
A = 32768       ' overflow!
CONST B = 32768 ' allowed, B is long

In the variable A, the type is derived from the name. We have a DEFINT A-Z statement, which means any unqualified variable starting with a letter A-Z (so all variables) is an integer. Since A is an integer, assigning the value 32768 to it results in an overflow error, as it doesn’t fit in the integer type. For B however, the type is determined by the const value. The smallest type that can accommodate 32768 is long, so B becomes a long constant.
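The difference between the two rules can be sketched as two small functions: a variable’s type comes from its qualifier or its first letter (via DEFxxx, defaulting to single), while a bare constant’s type comes from its value. A Python model with made-up names; it only covers integer and string values, and only checks ranges for the integer types.

```python
import string
from typing import Dict, Union

QUALIFIERS = {"%": "integer", "&": "long", "!": "single", "#": "double", "$": "string"}
LIMITS = {"integer": (-32768, 32767), "long": (-2147483648, 2147483647)}


class Overflow(ArithmeticError):
    pass


def defint_a_z() -> Dict[str, str]:
    """Default types after DEFINT A-Z: every initial letter maps to integer."""
    return {letter: "integer" for letter in string.ascii_uppercase}


def variable_type(name: str, defaults: Dict[str, str]) -> str:
    """Qualified names carry their type; bare names use DEFxxx, else single."""
    if name[-1] in QUALIFIERS:
        return QUALIFIERS[name[-1]]
    return defaults.get(name[0].upper(), "single")


def constant_type(value: Union[int, str]) -> str:
    """A bare constant takes its type from the value on the right side."""
    if isinstance(value, str):
        return "string"
    for type_name in ("integer", "long"):
        low, high = LIMITS[type_name]
        if low <= value <= high:
            return type_name
    raise ValueError("floating-point values are out of scope here")


def assign(name: str, value: int, defaults: Dict[str, str]) -> str:
    """Check an integer assignment against the variable's type."""
    type_name = variable_type(name, defaults)
    if type_name in LIMITS:
        low, high = LIMITS[type_name]
        if not low <= value <= high:
            raise Overflow(f"{value} does not fit in {name} ({type_name})")
    return type_name
```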

Constants always follow extended naming rules

We saw in the previous post that it’s possible to do this with compact style variables:

DIM A$
DIM A%
A$ = "The answer is"
A% = 42
PRINT A$, A%

but not with extended style variables:

DIM A AS STRING
DIM A% ' duplicate definition!

Constants always behave like extended style variables in this respect, even though they can only be declared in the compact style (there is no such thing as CONST A AS STRING = "hello").

So if we have a constant with CONST Msg = "Hello" (or even CONST Msg$ = "Hello"):

Msg and Msg$ are both valid ways to reference that constant
It is not possible to reference Msg with any other qualifier, e.g. Msg% gives a duplicate definition error
It is not possible to declare a variable named Msg (with or without qualifier)
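The reference rules in the list above can be sketched in Rust as follows. This is a hypothetical model: Constant, resolve and allows_variable are illustrative names, not rusty basic’s API. It also assumes, as QBasic does, that names are case-insensitive.

```rust
// Hypothetical sketch: a constant named Msg (of type STRING) can be
// referenced as Msg or Msg$, any other qualifier is an error, and no
// variable may reuse the bare name.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Type { Integer, Long, Single, Double, Str }

fn qualifier_type(c: char) -> Option<Type> {
    match c {
        '%' => Some(Type::Integer),
        '&' => Some(Type::Long),
        '!' => Some(Type::Single),
        '#' => Some(Type::Double),
        '$' => Some(Type::Str),
        _ => None,
    }
}

/// Splits "Msg$" into ("Msg", Some(Str)); a bare name has no qualifier.
fn split_name(name: &str) -> (&str, Option<Type>) {
    match name.chars().last().and_then(qualifier_type) {
        // all qualifiers are single-byte ASCII, so slicing off one byte is safe
        Some(ty) => (&name[..name.len() - 1], Some(ty)),
        None => (name, None),
    }
}

struct Constant {
    name: String, // bare name, e.g. "Msg"
    ty: Type,     // type of the value, e.g. STRING for "Hello"
}

impl Constant {
    /// Resolves a reference such as Msg, Msg$ or Msg% to the constant's type.
    fn resolve(&self, reference: &str) -> Result<Type, &'static str> {
        let (bare, qualifier) = split_name(reference);
        if !bare.eq_ignore_ascii_case(&self.name) {
            return Err("not a reference to this constant");
        }
        match qualifier {
            None => Ok(self.ty),
            Some(q) if q == self.ty => Ok(self.ty),
            Some(_) => Err("Duplicate definition"),
        }
    }

    /// A variable (qualified or not) may not share the constant's bare name.
    fn allows_variable(&self, name: &str) -> bool {
        !split_name(name).0.eq_ignore_ascii_case(&self.name)
    }
}

fn main() {
    let msg = Constant { name: "Msg".to_string(), ty: Type::Str };
    for reference in ["Msg", "Msg$", "Msg%"] {
        println!("{} -> {:?}", reference, msg.resolve(reference));
    }
    println!("DIM Msg% allowed? {}", msg.allows_variable("Msg%"));
}
```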

Within the same sub-program, it is not possible to declare another constant named Msg (with or without a qualifier). It is possible, however, to redefine a global constant inside a sub-program (because why not):

CONST Msg = "hi"
PRINT Msg ' prints hi
Greetings ' prints 42

SUB Greetings
  ' shadow global constant with a local constant
  CONST Msg = 42
  PRINT Msg
END SUB

SUB Oops
  ' error: cannot shadow global constant with a variable
  DIM Msg$
END SUB

Wrap up

While writing rusty basic, one of the most challenging things was to identify what a name is, and do it in the same way QBasic does. This is what we have so far:

Defining a constant

Type determined by value, not from name
Cannot co-exist with constant or variable of the same name, qualified or unqualified
Can shadow global constant inside subprogram (FUNCTION or SUB)

graph TD
  A[CONST A = 42] --> B{constant exists in scope?}
  B --> |Yes| DD[Duplicate definition]
  B --> |No| C{variable exists in scope?}
  C --> |Yes| DD
  C --> |No| S[Success]

Defining an extended variable

Type provided by AS type clause
Cannot co-exist with constant of the same name, qualified or unqualified, local or global
Cannot co-exist with variable of the same name, qualified or unqualified

graph TD
  A[DIM A AS STRING] --> B{const exists in *any* scope?}
  B --> |Yes| DD[Duplicate definition]
  B --> |No| C{variable exists in scope?}
  C --> |Yes| DD
  C --> |No| S[Success]

Defining a compact variable

Type provided by qualifier if qualified, resolved by name if bare (affected by DEFxxx statements)
Cannot co-exist with constant of the same name, qualified or unqualified, local or global
Cannot co-exist with extended variable of the same name
Can co-exist with compact variables of the same name and different type

graph TD
  A[DIM A$] --> B{const exists in *any* scope?}
  B --> |Yes| DD[Duplicate definition]
  B --> |No| C{extended variable exists in scope?}
  C --> |Yes| DD
  C --> |No| E{compact variable A$ exists in scope?}
  E --> |Yes| DD
  E --> |No| S[Success]
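The three flowcharts can be combined into one small checker, sketched here in Rust. This is a hypothetical model, not rusty basic’s linter: Checker, Scope and Binding are illustrative names. Names are passed in bare, and for compact variables the caller is assumed to have already resolved the type from the qualifier or the DEFxxx rules. The scope list has the module level first and the current SUB/FUNCTION last, which is what allows a constant to be shadowed inside a sub-program.

```rust
// Hypothetical sketch of the three "defining a name" flowcharts.
use std::collections::{HashMap, HashSet};

#[allow(dead_code)] // not every type is used in this short example
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Type { Integer, Long, Single, Double, Str }

const DUP: &str = "Duplicate definition";

#[allow(dead_code)] // the extended type is kept for illustration only
enum Binding {
    Extended(Type),         // DIM A AS STRING
    Compact(HashSet<Type>), // DIM A$, DIM A% ... (several may co-exist)
}

#[derive(Default)]
struct Scope {
    consts: HashSet<String>,
    vars: HashMap<String, Binding>,
}

/// scopes[0] is the module level; the last entry is the current SUB/FUNCTION.
struct Checker {
    scopes: Vec<Scope>,
}

impl Checker {
    fn new() -> Self {
        Checker { scopes: vec![Scope::default()] }
    }

    fn enter_sub(&mut self) {
        self.scopes.push(Scope::default());
    }

    fn current(&mut self) -> &mut Scope {
        self.scopes.last_mut().expect("module scope always exists")
    }

    fn const_in_any_scope(&self, key: &str) -> bool {
        self.scopes.iter().any(|s| s.consts.contains(key))
    }

    /// CONST A = ...: only the current scope is checked, so a SUB may shadow a global constant.
    fn define_const(&mut self, name: &str) -> Result<(), &'static str> {
        let key = name.to_ascii_uppercase();
        let scope = self.current();
        if scope.consts.contains(&key) || scope.vars.contains_key(&key) {
            return Err(DUP);
        }
        scope.consts.insert(key);
        Ok(())
    }

    /// DIM A AS type: no constant anywhere, no variable of that name in this scope.
    fn define_extended(&mut self, name: &str, ty: Type) -> Result<(), &'static str> {
        let key = name.to_ascii_uppercase();
        if self.const_in_any_scope(&key) || self.current().vars.contains_key(&key) {
            return Err(DUP);
        }
        self.current().vars.insert(key, Binding::Extended(ty));
        Ok(())
    }

    /// DIM A$ (ty = STRING), DIM A% (ty = INTEGER), ...
    fn define_compact(&mut self, name: &str, ty: Type) -> Result<(), &'static str> {
        let key = name.to_ascii_uppercase();
        if self.const_in_any_scope(&key) {
            return Err(DUP);
        }
        let binding = self
            .current()
            .vars
            .entry(key)
            .or_insert_with(|| Binding::Compact(HashSet::new()));
        match binding {
            Binding::Extended(_) => Err(DUP),
            Binding::Compact(types) => {
                // insert returns false if this exact name and type already exists
                if types.insert(ty) { Ok(()) } else { Err(DUP) }
            }
        }
    }
}

fn main() {
    let mut c = Checker::new();
    println!("{:?}", c.define_compact("A", Type::Str)); // DIM A$ -> Ok(())
    println!("{:?}", c.define_compact("A", Type::Integer)); // DIM A% -> Ok(())
    println!("{:?}", c.define_extended("A", Type::Str)); // DIM A AS STRING -> Err
    println!("{:?}", c.define_const("Msg")); // CONST Msg = "hi" -> Ok(())
    c.enter_sub();
    println!("{:?}", c.define_const("Msg")); // shadowing inside a SUB -> Ok(())
    println!("{:?}", c.define_compact("Msg", Type::Str)); // DIM Msg$ -> Err
}
```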

Adding mermaid diagrams

2020-11-27T20:42:47+00:00

I noticed at work the other day that GitLab renders mermaid diagrams automatically in README files. I wanted to implement the same for my blog.

In case you don’t know mermaid, it’s a text-based approach to creating diagrams such as flow charts, sequence diagrams, etc. It’s similar to PlantUML, but it can also run in the browser, which makes it very easy to integrate. Here’s an example:

This is some text before the graph.

graph TD
  A[Client] --> B[Load Balancer]
  B --> C[Server1]
  B --> D[Server2]

This is some text after the graph.

This is the code that renders the graph:

<div class="mermaid">
  graph TD
  A[Client] --> B[Load Balancer]
  B --> C[Server1]
  B --> D[Server2]
</div>

And enabling it on my blog is as simple as adding these lines before the closing </body> tag:

<script src="https://unpkg.com/mermaid@8.8.0/dist/mermaid.min.js"></script>
<script>mermaid.initialize({startOnLoad:true});</script>

Update 2020-11-30: I changed the last line to:

<script>mermaid.initialize({startOnLoad:true, flowchart: { useMaxWidth: true }});</script>

so that the size of the diagram fits mobile devices too.

Dropwizard with Immutables

2020-10-11T12:19:06+00:00

In this post, I’m using Dropwizard together with the Immutables library and sharing some thoughts on OOP.

Dropwizard is a Java framework for developing RESTful web services. I’ve been using it at work for almost a year now. It’s not a mega framework like Spring, but it gets the job done.

Immutables is a Java library for creating immutable value objects.

Let’s imagine we have a Dropwizard resource that deals with books and has two methods: one for returning a list of books and one for adding a new book:

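@GET
public List<Book> getBooks() {
  // get the books (placeholder: an empty list)
  return List.of();
}

@POST
public Response addBook(@NotNull @Valid Book book) {
  // add a book, then reply with 200 OK
  return Response.ok().build();
}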

The most common way of implementing Book is to create a POJO class, with its getters and setters, in 20 lines of traditionally verbose Java code:

public class Book {
  private String isbn;
  private String title;

  public String getIsbn() {
    return isbn;
  }

  public void setIsbn(String isbn) {
    this.isbn = isbn;
  }

  public String getTitle() {
    return title;
  }

  public void setTitle(String title) {
    this.title = title;
  }
}

You could also skip the getters and setters and go for public fields:

public class Book {
  public String isbn;
  public String title;
}

In OOP, the above is frowned upon (and that’s putting it mildly). Exposing public fields violates encapsulation, one of the basic principles of OOP. So why would you go for this?

In my mind, OOP design is about classes that have state and behaviour (fields and methods). That’s all totally valid. But when you’re programming methods that receive or return JSON objects, these classes are supposed to be nothing more than data holders. I think the fact that Java is adding records (C# too, by the way) is a recognition that data transfer objects deserve first-class support, without having to write unnecessary boilerplate just to avoid the stigma of the heretic.

When I think of practices like immutability, or favoring composition over inheritance, I wonder if perhaps there ought to be a book “OOP - the good parts”.

Back to the Immutables library: I like to use Immutables instead of the two approaches above (especially when it comes to responses):

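import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;
import org.immutables.value.Value;

@Value.Immutable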
@JsonSerialize(as = ImmutableBook.class)
@JsonDeserialize(as = ImmutableBook.class)
public interface Book {
  String getIsbn();

  String getTitle();
}

Validation works as expected (e.g. let’s demand that both ISBN and title are not blank):

// @Value.Immutable and the Jackson annotations are omitted for brevity
public interface Book {
  @NotBlank
  String getIsbn();

  @NotBlank
  String getTitle();
}

Caveat: it only works with getter-style methods. With Immutables, it is not unusual to use a different naming convention for the accessor methods, like String isbn(). In that case, validation doesn’t work.

Creating the responses is done with a builder object:

public List<Book> getBooks() {
  return List.of(
    ImmutableBook.builder()
      .isbn("978-0078817038")
      .title("Turbo PASCAL 6: The Complete Reference (Programming series)")
      .build()
  );
}

I personally like this approach more, especially because of the builder object.

Learning QBasic part 1

2020-08-08T05:20:58+00:00

I’ve been writing an interpreter for QBasic for quite some time now, since March 28th to be precise. This was something I always wanted to do. In the process, I am learning Rust, which I like a lot. But, as I implement feature after feature, I am learning QBasic as well, especially things a developer normally never has to worry about.

Dynamic Typing? Nope

The first misconception I had was that QBasic is a dynamically typed language. In Ruby, you can do the following:

i = 42
puts i
i = "is the answer"
puts i

The variable i first gets the numeric value 42 and then the string value “is the answer”.

In QBasic, that doesn’t work. You can’t re-assign a variable to a different type. In fact, it goes a bit beyond that. The name of the variable dictates the type of values it can hold.

A = 42      ' works
A = "oops"  ' type mismatch error

In my interpreter, rusty basic, I refer to these variable names as bare. They consist of just letters and optionally numbers, e.g. A, B, Age, MysteriousVariable42.

Side note: I found out that variable names in QBasic can also contain dots, for crying out loud, which I haven’t supported yet.

By default, bare names hold single-precision floating point values. So the assignment A = 42 actually creates a single variable. To create a variable of a different type, we need to use what I call in rusty basic a qualified name: a variable name followed by a type qualifier. There are five such characters:

% declares an integer
& declares a long
! declares a single (which is the default)
# declares a double
$ declares a string

Some examples:

A% = 42
B& = 565337657
C! = 3.14
D# = 45644564.3353
E$ = "hello world"
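The qualifier-to-type mapping above can be expressed as a small Rust sketch. The function and enum names are hypothetical rather than rusty basic’s. The comments note the storage size of each QBasic type.

```rust
// Hypothetical sketch: splitting a name like E$ into its bare part and the
// type its qualifier declares (not rusty basic's actual code).

/// The five QBasic types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Integer, // %  16-bit signed integer
    Long,    // &  32-bit signed integer
    Single,  // !  single-precision float (4 bytes), the default
    Double,  // #  double-precision float (8 bytes)
    Str,     // $  string
}

fn qualifier_to_type(c: char) -> Option<Type> {
    match c {
        '%' => Some(Type::Integer),
        '&' => Some(Type::Long),
        '!' => Some(Type::Single),
        '#' => Some(Type::Double),
        '$' => Some(Type::Str),
        _ => None,
    }
}

/// Returns the bare name and the qualifier's type, or None for a bare name.
fn parse_name(name: &str) -> (&str, Option<Type>) {
    match name.chars().last().and_then(qualifier_to_type) {
        // qualifiers are single-byte ASCII, so dropping the last byte is safe
        Some(ty) => (&name[..name.len() - 1], Some(ty)),
        None => (name, None),
    }
}

fn main() {
    for name in ["A%", "B&", "C!", "D#", "E$", "Age"] {
        println!("{} -> {:?}", name, parse_name(name));
    }
}
```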

When coding the interpreter, I often worry about edge cases: things that a developer wouldn’t normally do, but which I dive into out of curiosity. For example, what happens if I define the same variable name with different type qualifiers?

A% = 42
A$ = "the answer"
PRINT A%, A$

Will the above program work? It turns out it works just fine: as far as QBasic is concerned, A% and A$ are two different variables.

Default type resolution

I mentioned that a bare variable holds a single float value by default. Consider the following program:

A = 41
A! = A! + 1
PRINT A

It prints 42. As the A variable is a single, it is resolved to A!. A and A! are the same variable.

It is possible to change the default type to one of the other four types with the keywords DEFINT (for integers), DEFLNG (for longs), DEFDBL (for doubles) and DEFSTR (for strings). There’s of course also DEFSNG for singles. In fact, in the games that came with QBasic, Gorillas and Nibbles, the first statement in each game is:

' Set default data type to integer for faster game play
DEFINT A-Z

The above statement says that any variable that starts with a letter between A and Z (so all variables) is an integer. I don’t know why integers aren’t the default to begin with; it might be a backwards compatibility issue with an even older BASIC flavor.

If we use this statement in the above program:

DEFINT A-Z
A = 41
A! = A! + 1
PRINT A

It will print 41 instead. Now we have two variables A (which is now an integer, so A%) and A!.

In rusty-basic, I have a layer called “linting” which sits between getting the parse tree and generating instructions. Among other things, the linter makes sure that all bare names are resolved to qualified names and that there are no type mismatches or duplicate definitions.
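The resolution step described above can be sketched in Rust. This is a minimal, hypothetical model, not the linter’s real code. Every name is resolved to a (bare name, type) pair using a per-letter default table that DEFxxx statements can change. All values are stored as f64 to keep it short. Running the two small programs from this section reproduces the 42 vs. 41 difference.

```rust
// Hypothetical sketch: bare names resolve to (NAME, type) using a per-letter
// default table, so A and A! are the same variable unless DEFINT changes
// the default for names starting with A. All values are kept as f64.
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Type { Integer, Long, Single, Double, Str }

/// Default type for each starting letter A..Z.
struct DefTypes([Type; 26]);

impl DefTypes {
    fn new() -> Self {
        DefTypes([Type::Single; 26]) // without DEFxxx, bare names are SINGLE
    }

    /// DEFINT A-Z is set_range('A', 'Z', Type::Integer), and so on.
    fn set_range(&mut self, from: char, to: char, ty: Type) {
        for c in from.to_ascii_uppercase()..=to.to_ascii_uppercase() {
            self.0[(c as u8 - b'A') as usize] = ty;
        }
    }

    /// Resolves "a" to ("A", default type) and "A!" to ("A", SINGLE).
    /// Assumes the name starts with an ASCII letter.
    fn resolve(&self, name: &str) -> (String, Type) {
        let upper = name.to_ascii_uppercase();
        let qualified = match upper.chars().last() {
            Some('%') => Some(Type::Integer),
            Some('&') => Some(Type::Long),
            Some('!') => Some(Type::Single),
            Some('#') => Some(Type::Double),
            Some('$') => Some(Type::Str),
            _ => None,
        };
        match qualified {
            Some(ty) => (upper[..upper.len() - 1].to_string(), ty),
            None => {
                let first = upper.as_bytes()[0];
                let ty = self.0[(first - b'A') as usize];
                (upper, ty)
            }
        }
    }
}

/// Runs [DEFINT A-Z] / A = 41 / A! = A! + 1 / PRINT A and returns what PRINT shows.
fn run(defint: bool) -> f64 {
    let mut defs = DefTypes::new();
    if defint {
        defs.set_range('A', 'Z', Type::Integer);
    }
    // unassigned variables read as 0, as in QBasic
    let mut vars: HashMap<(String, Type), f64> = HashMap::new();
    vars.insert(defs.resolve("A"), 41.0);
    let a_single = vars.get(&defs.resolve("A!")).copied().unwrap_or(0.0);
    vars.insert(defs.resolve("A!"), a_single + 1.0);
    vars.get(&defs.resolve("A")).copied().unwrap_or(0.0)
}

fn main() {
    println!("{}", run(false)); // 42
    println!("{}", run(true)); // 41
}
```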

Making it more complicated: DIM A AS STRING

I’m currently in the process of implementing the DIM statement. With DIM, you can declare a variable without assigning a value to it. It’s even possible to declare arrays and user-defined types, but I haven’t gone that far in the implementation yet.

There are two ways to declare a variable with DIM. The first is exactly the same as what we’ve seen so far:

DIM A
DIM A$
A = 42
PRINT A!
A$ = "the answer"
PRINT A$

The program works fine. The variable A is a bare variable resolved to a single (so it’s the same as A!) and A$ is a different variable.

It gets complicated when using the second way of declaring a variable:

DIM A AS INTEGER
A = 42
PRINT A

The keywords that match the five data types are INTEGER, LONG, SINGLE, DOUBLE and STRING.

For lack of a better name, I call this an extended variable (naming things is hard), while I call the variables seen earlier compact. It’s important to flag this as something special in the interpreter, because it follows different name resolution rules.

An extended variable cannot coexist with other variables of the same bare name. Each of the following throws a duplicate definition error:

Trying to use the same extended variable with different type (this one makes sense):

DIM A AS INTEGER
DIM A AS STRING

Trying to use an extended integer with a bare compact variable (which would otherwise be a single):

DIM A AS INTEGER
DIM A

Trying to use an extended integer with a compact string:

DIM A AS INTEGER
DIM A$

Trying to use an extended integer with an implicit compact string on assignment:

DIM A AS INTEGER
A$ = "hello"

Side note: every such edge case discovery has become a unit test in rusty basic.

You could argue at least the latter might be something the interpreter should allow, given it allows it for compact style variables, but it doesn’t.

The next difference is that A is now resolved to A% (as it is declared as an integer), bypassing the default type resolution rules:

DIM A AS INTEGER
A% = 42
PRINT A

So it’s still possible to use a type qualifier, as long as it matches the type declared at the DIM statement.
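The extended-variable rules from this section can be sketched for a single scope in Rust. This is a hypothetical model with illustrative names (Scope, dim_extended, dim_compact, reference), not rusty basic’s code. The default parameter stands for the type a bare name would get from the DEFxxx rules. Compact-versus-compact checks (such as DIM A$ twice) are left out to keep the focus on extended variables.

```rust
// Hypothetical sketch: once a name is declared with DIM ... AS, every other
// use of that bare name must agree with the declared type.
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Type { Integer, Long, Single, Double, Str }

const DUP: &str = "Duplicate definition";

fn qualifier(c: char) -> Option<Type> {
    match c {
        '%' => Some(Type::Integer),
        '&' => Some(Type::Long),
        '!' => Some(Type::Single),
        '#' => Some(Type::Double),
        '$' => Some(Type::Str),
        _ => None,
    }
}

/// Splits "a$" into ("A", Some(Str)); names are case-insensitive.
fn split(name: &str) -> (String, Option<Type>) {
    let q = name.chars().last().and_then(qualifier);
    let bare = if q.is_some() { &name[..name.len() - 1] } else { name };
    (bare.to_ascii_uppercase(), q)
}

/// How a bare name has been used so far.
enum Decl {
    Extended(Type), // DIM A AS INTEGER
    Compact,        // DIM A, DIM A$, A$ = "...", ...
}

#[derive(Default)]
struct Scope {
    decls: HashMap<String, Decl>,
}

impl Scope {
    /// DIM A AS <type>: the bare name must not be in use at all.
    fn dim_extended(&mut self, name: &str, ty: Type) -> Result<(), &'static str> {
        let (key, _) = split(name);
        if self.decls.contains_key(&key) {
            return Err(DUP);
        }
        self.decls.insert(key, Decl::Extended(ty));
        Ok(())
    }

    /// DIM A or DIM A$: not allowed if A is an extended variable.
    fn dim_compact(&mut self, name: &str, default: Type) -> Result<Type, &'static str> {
        let (key, q) = split(name);
        if let Some(Decl::Extended(_)) = self.decls.get(&key) {
            return Err(DUP);
        }
        self.decls.insert(key, Decl::Compact);
        Ok(q.unwrap_or(default))
    }

    /// A use such as A = 42, A% = 42 or A$ = "hello": resolves the variable's type.
    fn reference(&mut self, name: &str, default: Type) -> Result<Type, &'static str> {
        let (key, q) = split(name);
        if let Some(Decl::Extended(t)) = self.decls.get(&key) {
            let t = *t;
            // a qualifier is only allowed if it matches the declared type
            return match q {
                Some(q) if q != t => Err(DUP),
                _ => Ok(t),
            };
        }
        self.decls.insert(key, Decl::Compact);
        Ok(q.unwrap_or(default))
    }
}

fn main() {
    let mut scope = Scope::default();
    println!("{:?}", scope.dim_extended("A", Type::Integer)); // Ok(())
    println!("{:?}", scope.dim_compact("A$", Type::Single)); // Err("Duplicate definition")
    println!("{:?}", scope.reference("A$", Type::Single)); // Err("Duplicate definition")
    println!("{:?}", scope.reference("A%", Type::Single)); // Ok(Integer)
    println!("{:?}", scope.reference("A", Type::Single)); // Ok(Integer)
}
```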

Again, these are all probably things that a developer wouldn’t do. It would also be tempting on my side to just say “a variable must be unique, throw an error otherwise” and call it a day. But I’m curious to discover how it is supposed to work.