Visualizing the Hacker News API with HCL and SQL

feature
Jul 27, 20227 mins
APIsCloud ComputingDatabases

Use Steampipe and GitHub Actions to create a ‘dashboard as code’ for exploring thousands of Hacker News stories, right up to the current hour.

Binary stream passing over rows of monitors, each also displaying binary streams.
Credit: Loops7 / Getty Images

In the last episode we looked at a Steampipe dashboard that joins search results across diverse APIs. That metasearch dashboard is defined entirely in code that blends two languages: SQL to query the APIs, and HCL (HashiCorp Configuration Language) to flow results into a dashboard table.

This dashboards-as-code approach leans into the infrastructure-as-code philosophy. We believe that dashboards are another kind of infrastructure that should be defined in software, managed in version-controlled repositories, and deployed from repos.

In this episode we’ll look at a dashboard that explores items posted to Hacker News. It illustrates several ways in which Steampipe can participate in software-defined workflows.

Here’s the dashboard in action.

And here’s the quick start to run it for yourself.

The visualizations cover more than 15,000 stories since June 21. That’s way more history than you can conveniently slurp from the Hacker News API in one gulp. So where does the historical data come from? It’s in the repo!

History

On an hourly basis a GitHub Action checks out the repo, runs Steampipe, queries for new items, exports them to a CSV file, commits that file, and pushes to the origin. So when you clone or update the repo you get fresh data. With Steampipe and the CSV plugin installed, here’s how you view the dashboard.

cd ~/steampipe-samples/all/hackernews
./update.sh
steampipe dashboard

And then visit https://localhost:9194. Repeat ./update.sh at any point to pull and incorporate up-to-the-hour data. The GitHub Action, scheduled to run hourly, does the following.

  • Check out the repo
  • Install Steampipe in the GitHub runner
  • Install the Hacker News plugin
  • Query the latest stories and send the output to a CSV file
  • Commit the new CSV file and push the changes

I was late to the GitHub Actions party but I’m there now. This is such a nice way to durably accumulate small snapshots!

For those keeping score, there are three instances of Steampipe in this workflow:

#1 takes the snapshots in a GitHub Action using the Hacker News plugin. #2 builds the table in your local Steampipe database, using the CSV plugin. #3 launches the local dashboard server your browser connects to.

HCL + SQL

The Sources panel offers a picklist of domains, and finds stories whose URLs match a selected domain. Here is the SQL query that reads all the URLs and extracts their domains. The trickiest part is the regular expression, but that’s true for regexes in all languages. The SQL itself is simple and easy to understand.

with domains as (
  select distinct
    substring(url from 'http[s]*://([^/$]+)') as domain
  from
    hn_items_all
)
select
  domain as label,
  domain as value
from
  domains
order by
  domain

The HCL that creates the picklist is likewise simple and easy to understand.

input "domain" {
  width = 3
  sql = <<EOQ
    -- as above
  EOQ    
}

That’s the pattern everywhere: SQL to get the data, a thin HCL wrapper to flow results into a widget. These two flavors of code, which together define the dashboard, are born in a repo, evolve there, and deploy from there. In low-or-no-code systems all the same information lives in a data store managed by an app. You may (I once did) have to export that info in order to record the system’s evolution in a repo. It’s better to have it always there.

The point of low-or-no-code systems is, of course, to wrap code in tooling that unlocks the creativity of people who don’t code. There isn’t yet such tooling for this HCL + SQL combo, but there can be. Systems defined in code are inherently friendly to tools that read and write that code. The Hacker News repo, for example, includes a whimsical example that animates a chart by programmatically editing a dashboard file and changing certain HCL values.

Although I didn’t use a parser for that example, there are of course HCL and SQL parsers that render code as data. In an earlier episode we showed how another Steampipe plugin turns Terraform files into Steampipe tables, thus enabling a compliance checker that validates the Terraform definitions. As you build and evolve dashboards, wouldn’t you want to validate them in the same way? An as-code architecture guarantees that it’s possible.

That said, coding Steampipe dashboards in pure HCL + SQL is easy, fun, and wildly productive. Here are two of my favorite things that make it so.

Live editing

When you edit dashboard code in an editor that auto-saves, the dashboard reacts immediately. As shown here, that’s true for changes in both the HCL code that orchestrates the data and the SQL code that fetches it.

This is a kind of direct manipulation interface: “an approach to interfaces which involves continuous representation of objects of interest together with rapid, reversible, and incremental actions and feedback.” (Wikipedia)

Bret Victor is a great champion of this approach. In this segment of Inventing on Principle he nails the problem that direct manipulation addresses.

I’ve got this code, and I’ve got this picture over here, and I have to maintain the mapping between the two in my head.

The longer it takes to see the results of code changes, the harder it is to maintain that mapping. Immediate feedback is priceless, and too often unavailable. We took it for granted when coding the early web, but now there is often a build step that thwarts direct manipulation.

When coding a Steampipe dashboard you get immediate feedback from both HCL and SQL. I’ve written a lot of Postgres queries in recent years. Writing them in an environment that refreshes query results as I type, or displays Postgres error messages when I make mistakes, has been transformative.

Web sensibility

Coding Steampipe dashboards reminds me of the early web in other ways too. Links, for example, are first-class citizens of Steampipe dashboards, and that’s not always true nowadays. The Submissions panel links usernames to another dashboard that explores all submissions from a selected user. It all works just as a web developer would expect.

There’s also a View Source button. Remember View Source? It used to matter on the web. Nowadays it usually just shows you minified line noise. But the HCL + SQL combo reminds me of the early HTML + JS combo — in a good way! Here’s a peek at the sources behind one of the dashboard panels.

Along with direct feedback, this kind of observability helped the early web take off. You’d see something, wonder how it was done, view the source, learn, and imitate. Newbie web builders can no longer participate in that kind of virtuous cycle. Newbie Steampipe dashboard builders can, and that bodes well for a thriving ecosystem.

Visualize data at the speed of thought

Because Steampipe extends Postgres you can use any tool to create dashboards backed by Steampipe data. If you’re proficient with Metabase, Tableau, or another tool that connects to Postgres, Steampipe’s API-to-database-table superpower will accelerate your ability to visualize data from diverse sources in those environments. But if you do try dashboarding the Steampipe way, let me know how it goes. In my own experience, it’s the fastest and most enjoyable way to turn an idea into a useful dashboard.