Archive for the ‘Open Source Software’ Category

Upgraded to Postgres 16.2

Tuesday, March 26th, 2024

I noticed that Postgres 16 was updated in Homebrew, so I took the time to upgrade the installation on my MacBook Pro, and it was remarkably easy. Because it was a simple "dot" upgrade, the installer did all the heavy lifting:

 $ brew upgrade postgresql@16

and then when it was done, simply restart the service with:

 $ brew services restart postgresql@16

And that was it. Everything is up and running just fine. What a treat. 🙂

Postgres JSONB Field Manipulations

Monday, December 4th, 2023

Since it was announced, the JSONB field in Postgres has been one of the most useful field types I've encountered. It gives you the ability to store complex data as a single field, with very powerful querying tools into that data - as needed. But one thing that has always been a little tough for me is manipulating the data, in SQL, so that it stays as JSON, but comes back altered from its as-stored value.

Let's say we have a table that has a JSONB field that has an array of Objects, each looking something like this:

  {
    "id": 101,
    "size": "large",
    "score": 91,
    "count": 1232
  }

and there can be more, but what you really want in your SELECT is just an Array of the id and score, and nothing else. It still needs to be an Array, and each element needs to be an Object, but a smaller Object, with just those two attributes.

From the docs, Postgres has jsonb_array_elements() that can turn an Array into something that looks like a table... and we can use the ->> notation to get at the value of an attribute, but how does it all fit together to disassemble, modify, and re-assemble all the elements? A sub-select.

Let's walk through one.

  SELECT jsonb_agg(
           jsonb_build_object(
             'id', t.value->>'id',
             'score', t.value->>'score'
           )
         )
    FROM jsonb_array_elements(c.targets) t

The jsonb_build_object() will take the key/value pairs it's given, and create a new Object. The source data for this is t, which is the output of the jsonb_array_elements() function run over the targets field - the JSONB field that is holding this data. The jsonb_agg() then gathers the rebuilt Objects back up into a single JSON Array.
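
If you want to experiment with this without a table handy, you can feed jsonb_array_elements() a literal array. One thing to watch: ->> returns text, so the rebuilt values come back as JSON strings; using -> instead keeps them as JSON numbers. A quick sketch to run in psql:

  SELECT jsonb_agg(
           jsonb_build_object(
             'id', t.value->'id',
             'score', t.value->'score'
           )
         ) AS targets
    FROM jsonb_array_elements(
           '[{"id": 101, "size": "large", "score": 91, "count": 1232}]'::jsonb
         ) t;

  -- returns: [{"id": 101, "score": 91}]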

Then to pull this together, you might have a query like:

  SELECT c.id, c.name,
         (SELECT jsonb_agg(jsonb_build_object(
                   'id', t.value->>'id',
                   'score', t.value->>'score'))
            FROM jsonb_array_elements(c.targets) t) AS targets
    FROM company c
   WHERE ...

It's not easy... but it's powerful, and there are a lot of things that can be done with this kind of manipulation... it's just going to take a little practice. 🙂

Moving to Postgres 16.0

Friday, October 20th, 2023

This morning I noticed that not only was Postgres 14.9 out, they had released 15.x and even 16.0. It's unusual for me to be a full major version behind on my main laptop, but to be two was just something that had to be corrected.

Several months ago, it was clear that the official Postgres builds were no longer being done by the Postgres group, and so the support for 15.0 wasn't in Homebrew. I figured it'd just be a little bit, and then things would start back up again. But that was not the case. What happened, instead, was that the Homebrew volunteers took it upon themselves to build the packages for 14.x, 15, and now 16.

So let's write this all down so it's easy to do next time we change a major version of Postgres. Start by saving everything in all the databases:

  $ pg_dumpall > dump.sql
  $ brew services stop postgresql@14

Now we can wipe out the old install and its data:

  $ brew uninstall postgresql@14
  $ rm -rf /opt/homebrew/var/postgresql@14

Now we install the new version, start it, and load back up the data:

  $ brew install postgresql@16
  $ brew services start postgresql@16
  $ psql -d postgres -f dump.sql
  $ psql -l

If the command psql doesn't show up in the path, just relink the package:

  $ brew link postgresql@16

Then it should be in the right spot.

At this point, it's all loaded up and you can ditch the dump.sql file, as it's no longer needed, and the new version is active:

  $ psql --version
  psql (PostgreSQL) 16.0 (Homebrew)

Not bad at all. 🙂

The Passing of a Legend

Sunday, August 6th, 2023

This morning, a friend sent an email with a link mentioning the passing of Bram Moolenaar, the initial creator of Vim. There aren't many folks who have impacted my professional life as much as the creators of Vi, and then Bram and Vim.

I remember first using Vi at Purdue in my final year of Undergrad on the ADM terminals, and then making sure I could find termcap entries for all the terminals I could get my hands on in the days when you had terminals hooked by serial lines to the Dual VAX 11/780 system at the Electrical Engineering Department. After that, it was Auburn, and Vim on my Amiga, and then Mac System 6, and on virtually every system I had from 1985 on.

The only tool that even comes close to that longevity is GCC.

I know nothing lasts forever, and I know people pass on, but I honestly didn't expect to be so surprised by this news. I read it again, a couple of hours later on one of the RSS Feeds I read on my iPad, where, again, I have Vim. Still the same sense of sadness.

Ask not for whom the bell tolls...

Nice Postgresql Trick

Thursday, June 29th, 2023

This morning I really wanted to be able to set the psql prompt in my sessions, because the system we have for creating databases doesn't really create nicely human-readable names, and even if it did, I'd like the name in the prompt to match the branch of the code I'm working on... it just works out better.

So I started digging, and the -c parameter is OK, but the psql session terminates after that - so that's not going to work. And piping in the \set commands seemed to be problematic, and then I found this:

  $ psql --set=PROMPT1="${br}%R%#%x "

where br is the name of the branch I'm working on. This could then be obtained from git easily, and then put into a function, and it works great!

  #
  # "platter psql (postgres command line tool)" - start a psql session on
  # the provided branch in the PLATTER_INSTANCE in the .env file in the
  # current directory (a Node project repo). If no branch is provided, then
  # the current git branch will be used.
  #
  function ppsql() {
    if [ -f .env ]; then
      set -o allexport; source .env; set +o allexport
      local br=$1
      if [ ${#br} -eq 0 ]; then
        br=`cat .git/HEAD | sed -e 's:^.*/::'`
      fi
      local url="`npx platter postgres branch url ${br} \
           --instance $PLATTER_INSTANCE | tr -d '\n'`?sslmode=require"
      psql --set=PROMPT1="${br}%R%#%x " --set=PROMPT2="${br}%R%#%x " $url
    else
      echo "Not in a Node project directory!"
    fi
  }

With this, it's so easy now to keep track of the database (branch) I'm on with Platter, and that makes a really big difference to my peace of mind. 🙂

Interesting Issues with Clearbit

Monday, April 24th, 2023

This weekend we had an interesting issue with Clearbit's Logo Search interface - a free service they provide on their Business Information Service system. You can basically hit their endpoint with a query param of the name of a Company, and they will respond with something that looks like:

  {
    name: 'Flexbase',
    domain: 'flexbase.app',
    logo: 'https://logo.clearbit.com/flexbase.app'
  }

which points to a nice little thumbnail logo for the Company. It's a very nice tool, and for the most part works flawlessly.

Until it doesn't.

The real issue was the Open Source Node client that was hitting the company's endpoint. It started with:

  topSuggestion(name){
    return new Promise((resolve, reject) => {
      resolve(getTopSuggestion(name));
    });
  }

which called:

  let getTopSuggestion = (query) => {
    return new Promise((resolve, reject) => {
      request(config.api.autocomplete + '?query=' + query, (err, response, body) => {
        resolve(JSON.parse(body)[0]);
      });
    });
  }

Now when everything is working as it should, this code is just fine. But over the weekend, the endpoint being hit in getTopSuggestion() was returning:

  Welcome to the Company API. For docs see https://clearbit.com/docs#company-api

which, of course, isn't JSON, so the JSON.parse() was throwing an exception. But because getTopSuggestion() only ever calls resolve() - never reject() - the exception was thrown inside the request callback and escaped the Promise entirely, so the caller had no way to catch it. This was bad news.
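
Had the library checked err and wrapped the parse in a try/catch, the failure could at least have been rejected and caught by the caller. A minimal sketch of that version of the same wrapper:

  let getTopSuggestion = (query) => {
    return new Promise((resolve, reject) => {
      request(config.api.autocomplete + '?query=' + query, (err, response, body) => {
        if (err) {
          return reject(err);
        }
        try {
          // a non-JSON body (like the maintenance message) now rejects the Promise
          resolve(JSON.parse(body)[0]);
        } catch (e) {
          reject(e);
        }
      });
    });
  }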

Now as it turned out, a coworker found that Clearbit was doing some maintenance, and that might have been the genesis of the issue, but it was made much worse by the fact that when several of us debugged this on our own machines, the issue didn't present itself. Only in production.

Still, it was clear this wasn't the right code to use, and the library was 6 years old without an update, and the code was small. So a coworker suggested we just make the one call ourselves:

  // (sketched here as a standalone async function taking the company name)
  const topSuggestion = async (name) => {
    let res = {}
    try {
      // Get the top URL Suggestion for a store name
      const url = new URL(config.api.autocomplete)
      url.searchParams.append('query', name)
      // ...now make the call, and parse the JSON payload
      const payload = await fetch(url).then(response => response.json())
      if (Array.isArray(payload) && payload.length > 0) {
        // ...pick off the top suggestion
        res = payload[0]
      }
    } catch (err) {
      log.error(`[logoFinder] ...message... Name: '${name}', Details: "${err.message}"`)
      return {
        success: false,
        error: errorMessages.badClearBitRequest,
        exception: err,
      }
    }
    return {
      success: true,
      ...res,
    }
  }

where the error message is really up to you, but the point is that this handles the plain text being returned by the endpoint: the JSON parsing still throws, but the exception is now caught and turned into a clean error result, without all the trouble of the library we were using.

There were a few things I liked about the new implementation we came up with:

  • Explicitly setting the query param on the URL - while it's possible that 90% of all name values would not lead to an issue, it's always nice to be safe and make sure that the proper encodings are done with the query params. It's two lines of code, but it makes sure that it's all handled properly.
  • The chaining of fetch() and then() - both fetch() and response.json() return Promises, so you might expect to see two awaits here, but there's only one. That's because then() returns a Promise that adopts whatever Promise its callback returns, so the single await resolves straight to the parsed payload - see the short comparison below.
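
Concretely, these two forms end up with the same payload (using the url built in the snippet above):

  // one await - then() flattens the Promise returned by response.json()
  const payload1 = await fetch(url).then(response => response.json())

  // two awaits - the same thing, just spelled out step by step
  const response = await fetch(url)
  const payload2 = await response.json()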

Sure, we still need to get the first element in the Array, but we also test that to make sure it's actually an array, and that there's something to get. It's just a lot more defensive coding than the original client had, and when we did this, we still got the good results on the dev machines, and at the same time, we got proper exception catching on the production instances.

Thankfully, the issues subsided about the time we got the fix written, tested, and into production, so it wasn't long-lived, but it was a problem for a while. And thanks to queues and retries we were able to recover from the errors, which is another saving grace that I was very thankful for.

Nothing like a little production outage to make the day exciting. 🙂

Ivory for Mastodon by Tapbots

Monday, January 30th, 2023

With the Early Release of Ivory by Tapbots, I think I've found the clear winner for a Mastodon client. It's already very good - and there's a lot more work they want to do on it... so it's only going to get better.

Right now, it's got everything I used on Twitter, plus other nice features that are part of Mastodon natively. All in all, it's actually a nice upgrade.

I would still like to see the local town alerts (weather, service interruptions, etc.) come to Mastodon, but even if they don't, it's far better than whatever Twitter is turning into.

Interesting Node sleep() Function

Wednesday, October 26th, 2022

Today I had a reason to look at some Node issues with async processing, and ran across these two little functions that are interesting, but quite deadly. They pause the entire Node runtime for the specified number of milliseconds, or seconds, which is nice when you have to have a delay - but all Node processing stops while they wait. That means none of the pending async calls get processed, either.

  // block the entire thread (and with it, the event loop) for n milliseconds
  function msleep(n) {
    Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, n);
  }
 
  // block the entire thread for n seconds
  function sleep(n) {
    msleep(n*1000);
  }

Simple. Easy. But not really what I was looking for. 🙂
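
What I was actually looking for was a delay that can be awaited without stalling the event loop - a minimal sketch (the asyncSleep name is just for illustration):

  // resolves after n milliseconds, letting the rest of the runtime keep going
  function asyncSleep(n) {
    return new Promise((resolve) => setTimeout(resolve, n));
  }
 
  // usage, inside an async function:  await asyncSleep(1500)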

Found a Nice Async Batching Library

Tuesday, October 18th, 2022

Yesterday, I was doing a little work and noticed that I was getting a lot of connection resets on a service that has been flawless for more than 18 months. To be fair, the load has been rising, and after digging into the cause, it appeared that we were overloading the Client with so many requests that it just failed.

Typically, a client will apply back-pressure on the caller to make sure that things don't get to this point, or it will queue the requests in memory so that they can be processed, in turn, as they arrive. I'm not exactly sure what's happening - the developers of the Client are looking into it - but I needed to find something to ease the load, and so I found asyncBatch().

Let's say I had the following code:

  const balances = (await Promise.all(companies
    .map(async c => {
      const bal = await minimumDueForCompany(user, c)
      if (bal?.success && !isNil(bal?.interestDue) && bal.billDate === today) {
        bal.company = c
        return bal
      }
      return undefined
    })))
    .filter(bal => bal !== undefined)

we're running through all the items in the companies array, and for each, we are calling minimumDueForCompany() and then checking a few things, and then filtering on those that we want to see. Simple.

But if we have more than 200 elements in the companies array, and minimumDueForCompany() employs several database queries, we could get to the point of launching more than a thousand hits at nearly the same time. If this is a background task, it could starve more important tasks with all the database work.

A batching solution was needed. And so I went looking.

asyncBatch() follows much the same style as the Promise.all() version; it just takes everything as arguments: the array, the function, and the batch size:

  const asyncBatch = require('async-batch').default
 
  const balances = (await asyncBatch(companies,
    async c => {
      const bal = await minimumDueForCompany(user, c)
      if (bal?.success && !isNil(bal?.interestDue) && bal.billDate === today) {
        bal.company = c
        return bal
      }
      return undefined
    }, 2))
    .filter(bal => bal !== undefined)

With a batch size of 2, we'll start simply, and let the background task take a little longer, while making sure the more immediate user-facing calls still get priority access.

Put this in and things are working better. It's not a perfect solution, and we still need to have the Client improved, but it gets around the two problems: Flooding the database when the use-case doesn't require it... and Failures on the Client to handle the flood. We can fine-tune the batch size later.

UPDATE: it turned out that the library launched all the work in an initial Promise.all() so it really wasn't batching the work as I'd expected. So I wrote my own using the chunk library:

  const chunk = require('chunk')
 
  /*
   * We need a function that will batch the equivalent of:
   *
   *   const resp = await Promise.all(arr.map(itm => fcn(itm)))
   *
   * but do it in batches, so that when we get a large workload, we don't
   * overwhelm the system. This is that function. The first argument is the
   * array to process, the second is the async function, that takes one
   * argument, and the last is the batch size that defaults to a reasonable
   * value.
   */
  const asyncBatch = async (arr, fcn, batchSize = 4) => {
    const ans = []
    for (const b of chunk(arr, batchSize)) {
      const blk = await Promise.all(b.map(async itm => await fcn.apply(null, [itm])))
      ans.push(...blk)
    }
    return ans
  }

This works exactly as expected, working on batchSize elements at a time, and then moving on to the next batch. Much cleaner.

Adding Let’s Encrypt Certs to Nginx

Thursday, October 13th, 2022

This morning I had some time and wanted to finish up the work of getting my Cloud VM running Ubuntu 22.04 working just fine as a development box - including inbound webhooks from vendors, and calls from apps like HTTPbot on my iPad Pro. The key was that I needed to be able to install and configure nginx to forward all port 443 traffic to port 6543, and that also meant getting the nginx server to be listening on port 443 with a legit certificate.

Turns out, it wasn't as bad as I thought it might be. 🙂

Starting with my Ubuntu 22.04 install, I added the packages I was going to need, based on this blog post on the nginx site.

  $ sudo apt-get -y install --no-install-recommends nginx certbot python3-certbot-nginx

Once these are installed, we could set the server_name in the nginx config:

  $ sudo vi /etc/nginx/sites-enabled/default

and update the server_name line to be:

  server_name mybox.mydomain.com;

and then we can get the initial certificate from Let's Encrypt and register a new email account with them with:

  $ sudo certbot --nginx -d mybox.mydomain.com -d mydomain.com

where the second -d argument adds an additional domain to the certificate. I didn't need it, so I just had the one -d pair on my certbot command.

After this, we edit the config file again, updating the port 443 section's location specification with:

  location / {
    # forward all HTTPS traffic to port 6543
    proxy_set_header  X-Forwarded-For $remote_addr;
    proxy_set_header  Host $http_host;
    proxy_pass        "http://127.0.0.1:6543";
  }

and then verify the nginx config with:

  $ sudo nginx -t

and then tell nginx to reload the config with:

  $ sudo nginx -s reload

At this point, the box is answering HTTPS traffic, and forwarding it on to the Node service at port 6543. Excellent. 🙂

In order to refresh the Let's Encrypt Certificate on time, let's add a simple crontab entry:

  $ crontab -e

and then have the entries:

  # run all the commands on Bash not Bourne Shell
  SHELL=/bin/bash
  # send all the mail to my main account
  MAILTO=bob@mydomain.com
 
  # check the Let's Encrypt certificate each day at noon UTC
  0 12 * * *   sudo /usr/bin/certbot renew --quiet

And that should do it.