Bug Bounty Recon: Content Discovery (Efficiency pays $)


Content Discovery — the process of finding endpoints: URLs, parameters, and resources.

Example: We start with domain.com but how do we find domain.com/potentially/vulnerable/page, domain.com/?vulnerableParameter= or domain.com/secretPasswords.bak?

NOTE: This is the fourth step in bug bounty hunting, following on from the third step, Fingerprinting.

The last step of reconnaissance is Content Discovery. We now know which assets exist and what they are… but what do they do? How can we find site functionality and content which can be exploited?

Right now, we’re going to cover four areas of Content Discovery and show you how to take Content Discovery to a higher level, in ways other hunters don’t:

- Active Discovery — Brute Force (the right way) and Self Crawling.
- Passive Discovery — Common Crawl, WayBackMachine, existing databases, and more.
- Manual Discovery — Google Dorking, Aquatone.
- Essential Utility Tools.

Active Discovery means going straight to the source! You interact with the target directly, which means the target can see (and log) your requests.

Brute Force, the Correct Way.

Brute Forcing during content discovery just means trying different endpoints, over and over, until you find one that exists. We are looking for new URLs, new parameters, and files/resources.

Most hunters are familiar with this, and will be eager to fire up gobuster to do the job. Whilst gobuster definitely does this, I have recently been introduced to a new tool called feroxbuster.

Feroxbuster Logo. Source: https://github.com/epi052/feroxbuster/blob/main/img/logo/default-cropped.png

A simple, fast, recursive content discovery tool written in Rust.
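As a sketch of a basic run (the target URL and wordlist path below are placeholders; double-check the flags against feroxbuster's own help output):

# Hypothetical example: swap in your own target and wordlist.
# feroxbuster recurses into any directories it discovers by default.
feroxbuster -u https://example.com -w /path/to/wordlist.txt -x php,bak,zip -o ferox-results.txt

The -x option appends extensions to every wordlist entry, which is handy for catching files like the secretPasswords.bak example above.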

From my tests*, feroxbuster was 330.7% faster than gobuster!

Chart showing the difference between gobuster and feroxbuster during my test.

It is important to consider that most hunters will run automated recon tools on a VPS, which adds a financial cost to hunting. A VPS (with similar specs to the machine that ran my tests) would cost $48 per month, or $576 a year.

With feroxbuster running roughly 3.3× as fast, those brute-force jobs take about 70% less wall-clock time, which works out to roughly $401 a year of saved opportunity cost (576 × (1 − 1/3.307) ≈ 401), for doing nothing more than swapping tools!

Efficiency pays.

*My primitive test was on a content-dense domain, using a wordlist of 4615 entries. Recreations of the test will differ in their specific outcomes, yet the trend that feroxbuster is significantly faster should remain.

Meg

Another thing most hunters overlook: websites don’t like being spammed with requests.

If you send too many consecutive requests, you may be blocked, filtered, or given false results, which will slow you down and waste your time.

In order to avoid annoying a server, we can use the tool meg by the amazing TomNomNom.

Demonstration of using meg.

Meg takes two inputs: a list of domains to h̶a̶r̶a̶s̶s̶ test, and a wordlist of URL paths (or just one path). It will then try every domain against one entry from the wordlist before moving on to the next entry. This way, each domain has more time between requests.
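A minimal sketch, assuming you have prepared hosts.txt (one base URL per line) and paths.txt (one path per line); both file names are placeholders:

# meg requests each path against every host before moving on to the next path,
# so no single host gets hit with back-to-back requests
meg paths.txt hosts.txt out/
# responses are saved under out/ for grepping later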

Self Crawling

Self Crawling: Automated Curiosity. If we find a link on a page, where does it go? Where do the new links on that page go? And so on.

For this, we use what we call a “crawler” or a “spider”. There are a number of great tools for this, and it’s also a fun project to build your own (it’s surprisingly simple). I recommend hakrawler, a concise tool written in Go.

Example of using hakrawler on tesla.com
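A quick sketch (example.com stands in for an in-scope target; recent hakrawler releases read the start URL from stdin, while older ones used a flag instead, so check your version's help):

# crawl the target and collect every link the crawler can reach
echo https://example.com | hakrawler > crawled-urls.txt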

Passive Discovery is about visiting third-party sources that have information on our target. The target never receives a single request from us. Stealthy.

For this section, there are tools which automate and aggregate these methods. The most commonly known is gau (Get All URLs).
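A rough sketch of a gau run (example.com is a placeholder; gau accepts domains as arguments or on stdin):

# pull every URL the passive sources below have recorded for the domain
gau example.com > gau-urls.txt
# or, pipeline style:
echo example.com | gau > gau-urls.txt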

Common Crawl

Would you like 200TB worth of internet data for free? Welcome to Common Crawl: a long-running project to make internet data available to everyone.

Common Crawl Logo. https://commoncrawl.org/

This dataset can be accessed via gau, or it can be accessed directly via the Common Crawl Index.
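Querying the index directly is just an HTTP request. A sketch (the crawl ID in the URL changes with every new crawl, so treat it as illustrative and pick a current one from collinfo.json):

# list the available crawl indexes
curl -s https://index.commoncrawl.org/collinfo.json
# ask one index for everything it has recorded under example.com
curl -s 'https://index.commoncrawl.org/CC-MAIN-2023-50-index?url=*.example.com/*&output=json'

Each line of output is a JSON record describing one captured URL.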

WayBackMachine

The WayBackMachine (AKA the Internet Archive) is another free dataset of historical URLs, endpoints, and resources.

WayBackMachine Logo. Source: https://archive.org/web/

Fun Fact: gau is based on TomNomNom’s original tool, waybackurls, which exclusively pulled URLs from the WayBackMachine.
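A sketch of both routes (example.com is a placeholder, and the CDX query parameters shown are the commonly documented ones, so verify them against the current API docs):

# waybackurls: every archived URL the Wayback Machine has for the domain
echo example.com | waybackurls > wayback-urls.txt
# or query the Wayback CDX API directly
curl -s 'http://web.archive.org/cdx/search/cdx?url=example.com/*&fl=original&collapse=urlkey'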

Existing Databases: AlienVault’s OTX and URLScan.io

AlienVault’s OTX and URLScan.io are vast open-source security platforms whose datasets can also be queried. Both sources are included in gau.

AlienVault OTX Logo. https://otx.alienvault.com/
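You can also hit them without gau. A sketch (the endpoints and parameters reflect the public APIs as I understand them, so check the current docs before relying on them):

# AlienVault OTX: URLs observed for a domain (paginated)
curl -s 'https://otx.alienvault.com/api/v1/indicators/domain/example.com/url_list?limit=100&page=1'
# urlscan.io: public scans that involve the domain
curl -s 'https://urlscan.io/api/v1/search/?q=domain:example.com'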

Manual Discovery!? If we have all these amazing automated ways of Content Discovery, why on earth would we waste our precious time? It may sound contradictory, but access to all these automated tools is exactly why we need manual discovery.

If we have access to them, so does everyone else on the internet. Tens of thousands of hunters running the same tools means most of the low-hanging fruit will be picked before you can get to it. So if you put more effort into discovery than other, lazier hunters, you will reap more bounties.

Google Dorks

This method requires the lowest skill and rewards the biggest bounties*.

Source code, passwords, and personal information shouldn’t be public on a site, but developers are human and they make mistakes. We can search for these leaks using Google dorks!

- site:example.com filetype:bak — backups, source code, databases, logs, etc.

Demonstration of using Google dorks to find secrets.

- site:example.com filetype:mdf — SQL databases
- site:example.com filetype:zip — source code, large datasets, anything. (Other compression extensions apply here too!)
- site:example.com filetype:sql — SQL-related files
- site:example.com filetype:db — generic databases
- site:example.com filetype:pdf — internal secret reports

Sometimes we can find this stuff without even looking at the target site! Developers use lots of third-party tools for organisation, and sometimes reveal too much.

- site:trello.com “target” — secret endpoints, PII, etc.

Demonstration of using a Trello Google dork to find secret information about an internal API.

- site:pastebin.com “target” — leaks
- site:codepen.io “target” — source code

*BONUS TIP: These are low-hanging fruit and require quite a bit of luck. You can increase your chances of finding them by using other search engines that have different indexes, such as Bing, Yandex, DuckDuckGo, etc.!

Aquatone

Qualitative information cannot be automated, but it is incredibly valuable. Visiting each website to look at it with your own eyes is tedious and slow. Aquatone is a tool which takes screenshots of many URLs, subdomains, and domains, allowing you to browse them easily.

graph generated from aquatone. (This is why you should use aquatone and not do things manually).
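A sketch of feeding it the endpoints gathered so far (live-urls.txt is a placeholder file of URLs or hosts, one per line):

# screenshot every URL and build a browsable HTML report
cat live-urls.txt | aquatone -out aquatone-report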

What are we actually looking for?

This is hard to describe because you need to use your “hacker mentality” to tell if a page is interesting and worth investigating. But I can give you some questions to ask yourself about what you see:

- Does the page look old? — It could be out of date or unpatched.
- Does it look odd? — Pages that are custom made are prone to bugs.
- Is there interesting functionality? — Inputs can be fuzzed.
- Does it look like it shouldn’t be public? — Index pages or internal pages are sometimes exposed.
- Are there any error messages? — Errors are bad for developers, but they are good for us.

Index page exposed to the internet found by aquatone.

Very briefly, we’ll cover some essential utility tools which you will need now that you have endpoints.

GF by TomNomNom — This allows you to sort URLs into different vulnerability groups, letting you find bugs more easily.

httpx by ProjectDiscovery — This tells you which URLs are live (lots of URLs from these methods will be dead).

uniq/sort — cat subdomains.txt | sort | uniq will remove duplicate lines from a file; you will get duplicate subdomains and URLs from the various sources above.
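Tying those together, a rough end-of-recon pipeline might look like the sketch below (file names are placeholders, and the gf pattern names depend on which pattern files you have installed):

# collapse duplicates from all of the discovery steps above
sort -u discovered-urls.txt > unique-urls.txt
# keep only the endpoints that actually respond
cat unique-urls.txt | httpx -silent > live-urls.txt
# bucket live URLs by the bug class their parameters hint at
cat live-urls.txt | gf sqli
cat live-urls.txt | gf redirect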

By now you should have a long list of endpoints, a strong addition to the information gathered in the prior stage, Fingerprinting. That finishes the Recon stage: you now have a solid foundation for bug hunting and the Exploitation stage.

If you thought any of this information was useful, liking the article 👏 or following would be a free and easy way to support me. I look forward to posting more informational content.

DISCLAIMER: I take no responsibility for any actions committed by readers. Only perform enumeration on targets you have permission for. Always read and check the scope of a program for guidance.
