A primer on n8n

I've been wanting to use this thing for a while, and recently found an easy 20-minute-ish project to learn it with.

Reading Time: ~5 min

What even is n8n?

Some sort of low-code / no-code automation pipeline thing. A friend of mine uses it at work and I saw a NetworkChuck video about it too.

I’m mildly interested, and I’ve been waiting for the opportunity to use it.

It does market itself as a tool to perform tasks that shouldn’t exist, like user management (literally their first example). Have a proper identity provider for those things. But I get the potential.

“n8n GitHub Stars”

It absolutely exploded with the AI hype train. So AI integration is either its real strong point, or just a marketing scheme to get more people to use a tool that basically just manages the flow of data.

The “quick little project”

Yes, it’s just a web scraping project. It has to be something simple that I have already done using another technology, so I can truly weigh its pros and cons.

Besides, n8n markets itself as a workflow automation tool. What is that if not “collect data”, “parse data”, “perform action”?

I’m going to collect HTML pages, parse and re-collect if necessary, and skip the “perform action” step entirely because I don’t actually have anything to do with this data.

[!note] Even though it’s open-source, there’s telemetry, which is creepy as hell.

Making HTTP requests

This is as simple as it gets: create an “HTTP Request Node” and put the URL in there.

“An HTML Node”

Parsing a DOM

Very cool, now I want to filter out elements and stuff.

One way of doing this is using regex. For reasons I hope you understand, I’ll avoid that. Regex would be fine for a small, one-off match, but what we really want here is proper HTML parsing.

Approach number 2: Use JavaScript’s DOMParser() inside a “Code Node”. Very elegant… if it worked…

The issue is that the code being executed here is not browser JavaScript; it is Node.js code. That means there is no DOMParser(), or any DOM at all, for that matter. So what can we do?
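A quick way to see the difference for yourself is to ask the runtime what it has. This is just a minimal sketch (the `describeRuntime` name is mine, nothing n8n-specific); in plain Node.js I'd expect both DOM checks to come back false.

```javascript
// Hypothetical helper: reports which browser-only globals exist in this runtime.
function describeRuntime() {
  return {
    // Browsers expose DOMParser and document; plain Node.js does not.
    hasDomParser: typeof DOMParser !== "undefined",
    hasDocument: typeof document !== "undefined",
    // process.versions.node is only set when running on Node.js.
    isNode: typeof process !== "undefined" && Boolean(process.versions?.node),
  };
}

console.log(describeRuntime());
```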

There are quite a few packages that can parse HTML into a DOM, one of which is jsdom.

So a Code Node with something like:

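const jsdom = require("jsdom");
// $input.all() returns an array of items, not a string; JSDOM needs the raw HTML.
// Assumes the HTTP Request Node put the response body in the item's `data` field.
const dom = new jsdom.JSDOM($input.first().json.data);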

Should be enough to get started, right?

Module Not Found

Cannot find module 'jsdom' [line 1]… Of course! We didn’t install it! How can we do that?
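For what it's worth, in plain Node.js you can check for a module up front instead of crashing on line 1. This is a rough sketch with a made-up helper name (`tryRequire`); I'm assuming a CommonJS context where `require` exists, and I haven't checked whether n8n reports a blocked module with the same error code.

```javascript
// Hypothetical helper: returns the module if it can be loaded, or null if it is missing.
function tryRequire(name) {
  try {
    return require(name);
  } catch (err) {
    // Node.js sets this code when the module cannot be resolved.
    if (err.code === "MODULE_NOT_FOUND") return null;
    throw err; // anything else is a real error, so let it surface
  }
}

const maybeJsdom = tryRequire("jsdom");
console.log(maybeJsdom ? "jsdom is available" : "jsdom is not installed in this runtime");
```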

Using external JS modules

A typical n8n development deployment with Docker (and literally the one I’m using) looks something like this:

services:
  n8n:
    container_name: n8n
    image: docker.n8n.io/n8nio/n8n
    restart: always
    ports:
      - "5678:5678"
    environment:
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
      - N8N_RUNNERS_ENABLED=true
      - N8N_SECURE_COOKIE=false
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - N8N_PORT=5678
      - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
      - TZ=${GENERIC_TIMEZONE}
    volumes:
      - n8n_data:/home/node/.n8n
      - ./local-files:/files

volumes:
  n8n_data:

The key here is the image. n8n runs on Node.js inside the container, so that is where the packages need to live. What we can try is the following:

  1. Get inside the container:
docker exec --user root -it n8n sh
  2. Install jsdom with npm i -g jsdom.

  3. We can then try to use the jsdom module in our Code Node!

Aaaaaand it didn’t work… Turns out you also need to set the environment variable NODE_FUNCTION_ALLOW_EXTERNAL=jsdom, according to the docs.
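Conceptually this variable is an allow-list: installing a module is not enough, it also has to be named there. As far as I can tell it takes a comma-separated list of module names. The sketch below shows that idea only; it is NOT n8n's actual implementation, and both function names are made up for illustration.

```javascript
// Hypothetical sketch of an allow-list check; not n8n's real code.
function parseAllowList(value) {
  // Split "jsdom, lodash" into a set of trimmed, non-empty names.
  return new Set(
    (value ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0)
  );
}

function isModuleAllowed(moduleName, allowListValue) {
  return parseAllowList(allowListValue).has(moduleName);
}

console.log(isModuleAllowed("jsdom", "jsdom"));   // true
console.log(isModuleAllowed("cheerio", "jsdom")); // false
```

An unset variable behaves like an empty list here, which matches what I saw: nothing external loads until you name it.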

We can then make it work, and also make the module install permanent by creating a Dockerfile with the following contents:

FROM docker.n8n.io/n8nio/n8n
USER root
RUN npm install -g jsdom
USER node

And changing our compose.yaml to:

services:
  n8n:
    container_name: n8n
    # image: docker.n8n.io/n8nio/n8n
    build:
      context: .
      dockerfile: Dockerfile
    restart: always
    ports:
      - "5678:5678"
    environment:
      - NODE_FUNCTION_ALLOW_EXTERNAL=jsdom
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
      - N8N_RUNNERS_ENABLED=true
      - N8N_SECURE_COOKIE=false
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - N8N_PORT=5678
      - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
      - TZ=${GENERIC_TIMEZONE}
    volumes:
      - n8n_data:/home/node/.n8n
      - ./local-files:/files

volumes:
  n8n_data:

Now when we start our app with docker compose up --build -d and execute the code again:

JSDOM problems

Hell nah.

Now doing it properly

Very nice, we can get what we want. But that image manipulation there bothers me. I generally resort to such methods only when everything else fails.

[!note] And since the initial jsdom idea already failed I’m not even gonna bother.

You see, n8n is meant for this sort of task (one of its “marketing points” is the “low-codeness” it offers). I’d be pretty surprised if there were no way to parse the HTML output of an HTTP Request Node “natively”. So let’s look for a way to do this internally: HTML Nodes!

There’s an HTML Node with an operation meant precisely for extracting content from HTML, so we’re using that.

All we need is to extract the list of hrefs from the .sitemap unordered-list, so we can continue. Look how convenient the HTML Node is:
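For comparison, here is roughly what the same extraction would look like as code. It's a sketch under assumptions: the helper name is mine, and the `.sitemap a[href]` selector is my guess at the page structure. It takes any DOM `Document`, so in a Code Node with jsdom you would pass something like `new JSDOM(html).window.document`.

```javascript
// Hypothetical helper: collect every href found inside the `.sitemap` list.
// `document` can be any object implementing the standard DOM querySelectorAll.
function extractSitemapLinks(document) {
  const anchors = document.querySelectorAll(".sitemap a[href]");
  return Array.from(anchors)
    .map((anchor) => anchor.getAttribute("href"))
    // Drop anything that came back empty or null.
    .filter((href) => typeof href === "string" && href.length > 0);
}
```

The HTML Node does the same thing with a CSS selector and an attribute name typed into two form fields, which is the whole point of going native.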

HTML Node works

Thinking in “low-code” / “no-code” terms is very weird; it feels unnatural.

Iterating over results

Now, for each of them we have to do a similar task: run an HTTP request, parse the result. In this case though, we’re looking for “Brazil” in the breadcrumbs, to ensure we only query events from Brazil.
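In code terms, the filter is tiny. A minimal sketch, assuming each n8n item already carries the extracted breadcrumb texts in a `breadcrumbs` array (that field name is hypothetical; it depends on how you configure the HTML Node):

```javascript
// Hypothetical filter: keep only items whose breadcrumbs mention the country.
// n8n items have the shape { json: { ... } }.
function keepPagesFromCountry(items, country = "Brazil") {
  const wanted = country.toLowerCase();
  return items.filter((item) => {
    const crumbs = item.json.breadcrumbs ?? [];
    // Compare case-insensitively and ignore stray whitespace from the HTML.
    return crumbs.some((crumb) => crumb.trim().toLowerCase() === wanted);
  });
}
```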

Hey, wait a second, we only made the first of 26 queries… /cidades/a is just one of 26 pages, one per letter of the alphabet… goddamn.
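One way to cover the whole alphabet would be a small Code Node that emits one item per letter and feeds them into the HTTP Request Node. This is a sketch, not what I actually built: the function name and the `https://example.com` base URL are placeholders, and I'm assuming the pages run from /cidades/a to /cidades/z.

```javascript
// Hypothetical helper: build one n8n item per letter, a through z.
function buildLetterItems(baseUrl) {
  // Char code 97 is "a", so this yields the 26 lowercase letters.
  const letters = Array.from({ length: 26 }, (_, i) => String.fromCharCode(97 + i));
  return letters.map((letter) => ({ json: { url: `${baseUrl}/cidades/${letter}` } }));
}

const letterItems = buildLetterItems("https://example.com");
console.log(letterItems.length, letterItems[0].json.url);
```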

Conclusions

Could I have done all of this in a tiny fraction of the time by BeautifulSouping my way around? Absolutely. But I’m sure the power of this tool shines more in the AI integrations than in the actual data processing.

However, it’s definitely interesting that there are tools that allow you to pipeline and process data with little to no coding experience. This may actually allow some people to perform automation tasks themselves instead of asking you to do them.

The best kind of automation is not needing to do anything.

Maybe I’ll find a more complex project eventually where this tool makes more sense. But this was a very good “hello, world” with it.