Temporal OSINT refers to the practice of studying how online information changes over time. This includes archived websites, cached pages, metadata timestamps, and content that may have been removed but still exists in backup systems or digital archives.
A colleague once advised: “Don’t write anything down you wouldn’t want in the Sunday papers—and definitely don’t expect it to disappear just because you deleted it.” This warning captures the core of temporal OSINT. Once published, digital content often leaves lasting traces that can be recovered and analysed.
Why Time Matters in OSINT Investigations

Investigations that rely on the evolution of digital evidence over weeks, months, or even years benefit enormously from temporal analysis. Key insights that emerge from this approach include:
Identifying staffing changes through historical job postings.
Discovering security misconfigurations from old site directories or removed admin links.
Recovering deleted email addresses or bios from archived contact pages.
Detecting changes in tone or policy through comparisons of web copy over time.
Understanding email naming conventions via older pages or PDF documents.
By examining the past states of a website or domain, investigators can construct a clearer picture of the organisation’s digital footprint and strategic shifts.
The Wayback Machine from the Internet Archive is the cornerstone of temporal OSINT. It provides historical snapshots of web pages going back to the late 1990s.
To automate access to archived pages, use the waybackpy Python library:
from waybackpy import WaybackMachineCDXServerAPI

url = "https://example.com"
user_agent = "Mozilla/5.0 (compatible; TemporalOSINTBot/1.0)"

# Query the Wayback Machine's CDX index for all captures of the URL
wayback = WaybackMachineCDXServerAPI(url, user_agent)
for snapshot in wayback.snapshots():
    # timestamp and archive_url are attributes on each snapshot object
    print(snapshot.timestamp, snapshot.archive_url)
This script iterates over the archived snapshots, each of which can then be individually reviewed or compared to detect changes in content, structure, or metadata.
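If you would rather not depend on waybackpy, the same CDX index can be queried directly over HTTP. The sketch below (the helper name build_cdx_url is my own) builds a query URL restricted to a date range:

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_url(target, start=None, end=None, limit=50):
    """Build a Wayback CDX query URL for captures of `target`.

    `start` and `end` are YYYYMMDD strings; both are optional.
    """
    params = {"url": target, "output": "json", "limit": limit}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

print(build_cdx_url("example.com", start="20200101", end="20221231"))
```

Fetching the resulting URL returns a JSON array whose first row is the field names (urlkey, timestamp, original, and so on), with one row per capture after that.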
Comparing different versions of the same web page can highlight meaningful changes, such as updated contact details, revised policies, or removed resources.
import requests
from difflib import unified_diff

# Two snapshots of the same page, captured two years apart
old_url = "https://web.archive.org/web/20200101000000/https://example.com"
new_url = "https://web.archive.org/web/20220101000000/https://example.com"

old_page = requests.get(old_url, timeout=30).text.splitlines()
new_page = requests.get(new_url, timeout=30).text.splitlines()

# Produce a unified diff labelled by capture year
diff = unified_diff(old_page, new_page, fromfile='2020', tofile='2022')
print("\n".join(diff))
Differences like these can reveal:
Added or removed staff members.
Changes in file paths or directory structure.
Altered terms of service, privacy policies, or disclaimers.
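To focus on what actually changed, the raw unified diff can be reduced to just the added and removed lines. A minimal sketch using the same difflib output (the helper name changed_lines is my own):

```python
from difflib import unified_diff

def changed_lines(old_lines, new_lines):
    """Return (added, removed) content lines from a unified diff."""
    added, removed = [], []
    for line in unified_diff(old_lines, new_lines, lineterm=""):
        # Skip the file headers ('---', '+++') and hunk markers ('@@')
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return added, removed

old = ["Contact: alice@example.com", "Privacy policy v1"]
new = ["Contact: bob@example.com", "Privacy policy v1"]
added, removed = changed_lines(old, new)
print(added)    # lines present only in the newer capture
print(removed)  # lines present only in the older capture
```

Feeding it the two archived page versions from the snippet above would surface, for example, a swapped contact address without wading through unchanged markup.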
Search engines frequently cache public web pages. Although less reliable for long-term access (Google, for example, has been phasing out its public cache links), cached pages can reveal recently deleted or edited content.
Google Cache Lookup

You can perform a manual lookup using the following Google search operator:
cache:example.com/contact
Or generate the URL programmatically:
def google_cache_url(path):
    # Build a Google cache lookup URL for the given page path
    return f"https://webcache.googleusercontent.com/search?q=cache:{path}"

print(google_cache_url("example.com/contact"))
Temporal OSINT is not just about looking into the past—it is about building a narrative of digital change. By exploring how content evolves, disappears, or resurfaces over time, investigators gain critical context that static scraping techniques often miss.
What it does: This Jupyter notebook teaches you how to analyse websites and digital content across time using the Internet Archive's Wayback Machine. It includes tools to discover historical snapshots, compare content changes, track keywords over time, and identify gaps in archival coverage.
Why it's useful: If you're investigating digital information, the ability to go back in time is a game-changer.
For journalists: Verify timelines, uncover edits, and detect censorship or misinformation by comparing past and present web content.
For security professionals: Conduct digital forensics, monitor online threat activity, and track changes in organisational behaviour.
For the general public: Fact-check claims, research brand histories, or preserve digital evidence — all without specialised tools.
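The keyword-tracking idea mentioned above can be sketched as a simple occurrence count per snapshot. The snapshot texts below are stand-ins; in practice each would be fetched from a snapshot's archive URL:

```python
def keyword_timeline(snapshots, keyword):
    """Count case-insensitive occurrences of `keyword` in each snapshot.

    `snapshots` maps a Wayback timestamp (YYYYMMDDhhmmss) to page text.
    Returns a list of (timestamp, count) pairs sorted chronologically.
    """
    kw = keyword.lower()
    return sorted(
        (ts, text.lower().count(kw)) for ts, text in snapshots.items()
    )

# Stand-in snapshot texts; real text would come from snapshot.archive_url
snapshots = {
    "20200101000000": "Contact us at info@example.com",
    "20220101000000": "Contact form only. No email listed.",
}
print(keyword_timeline(snapshots, "email"))
```

A rising or falling count across timestamps can flag when a term (a name, an address, a product) first appeared on a page or quietly vanished from it.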
Links:
Did You Know? As part of the Navy’s Smart Ship program, the USS Yorktown was outfitted with 27 Pentium Pro PCs running Windows NT to automate various systems, including propulsion. A crew member accidentally entered a “0” into a database field. This caused a divide-by-zero error in the ship’s Remote Database Manager, which was not properly handled. The error crashed the ship’s entire control LAN and disabled propulsion, leaving the cruiser dead in the water for nearly three hours. This showed how a simple data entry error—tied to time or logic operations—can disable mission-critical systems.
Till next time,
Disclaimer: All of the above tools should only be used in controlled, ethical environments — such as red team engagements, security testing, or awareness training. Using these tools without permission is illegal and unethical. Just so you know.