
Hello World Wide Web

Intro

A new website is born! It makes me wonder: how many websites are born every day, or every hour, minute, or second? And how many websites are there right now? We cannot know for sure, and the definition of a website may vary, but various estimates put the number at over a billion. And to think that there was ever a first website at all is astonishing.

The moment I set up osherz.com, I saw many incoming web requests, most of which looked like bots, crawlers, and scanners probing for various vulnerabilities. How did they find this website so quickly? I had not published it anywhere or told anyone about it. Did they just happen to guess the URL?

Then I thought how cool it would be if we could also see other websites as they're being created in real time, each one like a new twinkle in a sky full of stars.

But how can this be achieved? How do scanners and bots find new websites right as they emerge?

Certificate Transparency Logs

So when I said I did not tell anyone about this site, I was half-lying… Setting up this website (or any other) usually involves issuing a new TLS certificate for its domain name, which means a Certificate Authority (CA) has to know about it. I was pleasantly surprised to learn about Certificate Transparency (CT), a system that publicly tracks issued certificates so that fraudulent or mistakenly issued certificates can be detected. As it turns out, the stream of newly issued certificates can also be used to discover new websites, or for target reconnaissance.

A CT log can be viewed as an append-only event stream of newly issued certificates. Each event usually corresponds either to a newly published website or to a mundane certificate renewal.
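To make the "event stream" idea more concrete, consider a log following RFC 6962. In such a log, each entry's leaf_input is a small binary MerkleTreeLeaf. It carries a millisecond timestamp, an entry type (a final certificate or a precertificate), and the certificate bytes. The Go sketch below decodes that header. It assumes leaf_input has already been base64-decoded, and the names LeafHeader and parseLeaf are hypothetical. For x509_entry leaves, the DER bytes can be passed to crypto/x509's ParseCertificate to read the domain names (DNSNames). Precertificate leaves carry only a TBSCertificate, which that function does not accept.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// Entry types for a TimestampedEntry (RFC 6962, section 3.4).
const (
	x509Entry    uint16 = 0
	precertEntry uint16 = 1
)

// LeafHeader is a hypothetical summary of one decoded MerkleTreeLeaf.
type LeafHeader struct {
	Timestamp time.Time // when the log accepted the entry (ms precision)
	EntryType uint16    // x509Entry or precertEntry
	DER       []byte    // the certificate (x509Entry) or TBSCertificate (precertEntry)
}

// parseLeaf decodes a base64-decoded leaf_input from get-entries.
func parseLeaf(b []byte) (LeafHeader, error) {
	var h LeafHeader
	// version(1) + leaf_type(1) + timestamp(8) + entry_type(2)
	if len(b) < 12 {
		return h, errors.New("leaf too short")
	}
	// Only v1 (0) and timestamped_entry (0) are defined.
	if b[0] != 0 || b[1] != 0 {
		return h, fmt.Errorf("unsupported version %d or leaf type %d", b[0], b[1])
	}
	h.Timestamp = time.UnixMilli(int64(binary.BigEndian.Uint64(b[2:10]))).UTC()
	h.EntryType = binary.BigEndian.Uint16(b[10:12])
	rest := b[12:]
	switch h.EntryType {
	case x509Entry:
		// The certificate follows immediately.
	case precertEntry:
		// Skip the 32-byte SHA-256 hash of the issuer's public key.
		if len(rest) < 32 {
			return h, errors.New("missing issuer_key_hash")
		}
		rest = rest[32:]
	default:
		return h, fmt.Errorf("unknown entry type %d", h.EntryType)
	}
	// The certificate bytes carry a 3-byte big-endian length prefix.
	if len(rest) < 3 {
		return h, errors.New("missing length prefix")
	}
	n := int(rest[0])<<16 | int(rest[1])<<8 | int(rest[2])
	if len(rest) < 3+n {
		return h, errors.New("truncated certificate")
	}
	h.DER = rest[3 : 3+n]
	return h, nil
}

func main() {
	// Hand-built x509_entry leaf: timestamp 1000 ms, a 3-byte dummy "certificate",
	// and empty extensions.
	leaf := []byte{0, 0, 0, 0, 0, 0, 0, 0, 0x03, 0xE8, 0, 0, 0, 0, 3, 0xAA, 0xBB, 0xCC, 0, 0}
	h, err := parseLeaf(leaf)
	fmt.Println(h.EntryType, len(h.DER), h.Timestamp.UnixMilli(), err) // prints: 0 3 1000 <nil>
}
```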

At the top of this page is a LIVE stream from a CT log operated by Let's Encrypt, where each "star" is a certificate that was recently issued for some website. You can click any of the stars to visit that website. To make this work, I wrote a small Go program that parses and streams the CT log, and this page reads the stream over a WebSocket.
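The program itself is not reproduced here, so what follows is only a rough sketch of how such a log follower could work. It assumes an RFC 6962-style log that exposes the get-sth and get-entries endpoints. Some newer logs expose the tile-based static-ct-api instead, which this sketch does not cover. The sketch polls the tree size, requests any new entries in batches, and pushes them onto a channel. Fanning that channel out to WebSocket clients is left out, since Go's standard library has no WebSocket server. The log URL, batch size, and all function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// entry mirrors one element of a get-entries response (RFC 6962, section 4.6).
// encoding/json decodes the base64 string into []byte automatically.
type entry struct {
	LeafInput []byte `json:"leaf_input"`
}

// getJSON performs a GET request and decodes the JSON body into v.
func getJSON(u string, v any) error {
	resp, err := http.Get(u)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", u, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(v)
}

// fetchTreeSize reads tree_size from the log's signed tree head (get-sth).
func fetchTreeSize(logURL string) (int64, error) {
	var sth struct {
		TreeSize int64 `json:"tree_size"`
	}
	err := getJSON(logURL+"/ct/v1/get-sth", &sth)
	return sth.TreeSize, err
}

// fetchEntries requests the inclusive range [start, end]; the log may return fewer.
func fetchEntries(logURL string, start, end int64) ([]entry, error) {
	q := url.Values{}
	q.Set("start", strconv.FormatInt(start, 10))
	q.Set("end", strconv.FormatInt(end, 10))
	var body struct {
		Entries []entry `json:"entries"`
	}
	err := getJSON(logURL+"/ct/v1/get-entries?"+q.Encode(), &body)
	return body.Entries, err
}

// nextRange returns the inclusive range for the next batch, given the next
// unread index and the current tree size; ok is false once caught up.
func nextRange(next, treeSize, maxBatch int64) (start, end int64, ok bool) {
	if next >= treeSize {
		return 0, 0, false
	}
	end = next + maxBatch - 1
	if end > treeSize-1 {
		end = treeSize - 1
	}
	return next, end, true
}

// follow polls the log indefinitely from index next, sending entries to out.
func follow(logURL string, next, maxBatch int64, every time.Duration, out chan<- entry) error {
	for {
		size, err := fetchTreeSize(logURL)
		if err != nil {
			return err
		}
		for {
			start, end, ok := nextRange(next, size, maxBatch)
			if !ok {
				break
			}
			batch, err := fetchEntries(logURL, start, end)
			if err != nil {
				return err
			}
			if len(batch) == 0 {
				break // nothing came back; try again after the next poll
			}
			for _, e := range batch {
				out <- e
			}
			// A log may return fewer entries than asked for, so advance by what arrived.
			next += int64(len(batch))
		}
		time.Sleep(every)
	}
}

func main() {
	// Only the pure batching logic runs here; follow needs a real log URL.
	fmt.Println(nextRange(100, 1050, 256)) // prints: 100 355 true
}
```

Advancing by the number of entries actually returned, rather than by the requested batch size, keeps the follower from silently skipping entries when a log caps its responses.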

Other approaches

The following are other approaches for finding new websites that I considered but ended up not pursuing.

Keeping track of the CT logs

Tracking the logs over a longer period of time helps determine whether a certificate was issued for a new website or was just a renewal (or an actual malicious issuance). A good tool that does this is crt.sh.

Scanning the entire IPv4 address range

This is a completely different approach: essentially a large-scale port scan. Doing that is a grey area and would probably get me flagged for abuse, although some organizations do manage it. In addition, I would have to store the state of the scan and periodically look for changes in it, e.g. newly active hosts or newly opened ports.
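For scale, the IPv4 space holds 2^32 (about 4.3 billion) addresses. Of the two halves, the storage-and-diff part is the simpler one, so this sketch shows only that. Given two hypothetical scan snapshots mapping each host to its open ports, it reports the ports that are open now but were not before. The snapshot and change types are made up for illustration, and no scanning code is included.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical record of one full scan: host -> set of open TCP ports.
type snapshot map[string]map[int]bool

// change is an open port present in the current scan but not in the previous one.
type change struct {
	Host    string
	Port    int
	NewHost bool // true if the host did not appear in the previous scan at all
}

// diff lists every (host, port) open in cur but not in prev, sorted for stable output.
func diff(prev, cur snapshot) []change {
	var out []change
	for host, ports := range cur {
		old, existed := prev[host] // old is nil for a new host; reading a nil map is safe
		for port, open := range ports {
			if open && !old[port] {
				out = append(out, change{Host: host, Port: port, NewHost: !existed})
			}
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Host != out[j].Host {
			return out[i].Host < out[j].Host
		}
		return out[i].Port < out[j].Port
	})
	return out
}

func main() {
	// Documentation-range addresses (192.0.2.0/24), not real hosts.
	prev := snapshot{"192.0.2.10": {22: true}}
	cur := snapshot{"192.0.2.10": {22: true, 443: true}, "192.0.2.20": {80: true}}
	for _, c := range diff(prev, cur) {
		fmt.Printf("%s:%d new host: %v\n", c.Host, c.Port, c.NewHost)
	}
}
```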

Monitoring or querying a DNS server for record updates / newly seen queries

DNS zone transfers are usually blocked, and I do not run a popular DNS server whose traffic I could watch for new names, so I ruled this out pretty quickly.

WHOIS querying

I started running WHOIS queries on the domains found in the CT logs, so that I could tell when each domain was created. Unfortunately, the extra latency these queries added clashed with the real-time effect I was going for. This approach is probably still very feasible with better engineering and more time at hand; I might write a future post about it.

Date: 2026-01-10

Author: Osher Jacob
