Part 1: https://medium.com/system-weakness/bug-hunting-recon-methodology-part1-legionhunter-975b7bbe3231
Welcome to Part 2 of the Bug Hunting Recon Methodology Series.
In this section, I will outline the steps I use to extract valuable details from archived URLs, which can often reveal information disclosure, IDOR (Insecure Direct Object Reference) candidates, and other sensitive endpoints worth investigating.
waymore -i domain.com -mode U -oU waymore_domain.com.txt
waymore gathers archived URLs from sources such as the Wayback Machine (web.archive.org), Common Crawl (index.commoncrawl.org), AlienVault OTX (otx.alienvault.com), URLScan (urlscan.io), and VirusTotal (virustotal.com).
-i domain.com specifies domain.com as the target domain.
-mode U retrieves URLs only, without downloading responses.
-oU waymore_domain.com.txt saves the unique URLs to the output file waymore_domain.com.txt.
If waymore is functioning properly, there is no need to run the following command.
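waybackurls domain.com > wayback_domain.com.txt
The commands in the rest of this article read from wayback_domain.com.txt; if you only ran waymore, substitute waymore_domain.com.txt.
Below is a mini helper script that I use to refine the URLs and assist in my manual URL analysis.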
import os
from urllib.parse import urlparse

from colorama import Fore, Style, init

# Initialize colorama for Windows support
init(autoreset=True)


def display_banner():
    # ASCII art in purple
    banner = r"""
_ _ __
/.\ _ ___ ____ FJ___ LJ _ _ ____
//_\\ J '__ ", F ___J. J __ `. J | | L F __ J
/ ___ \ | |__|-J| |---LJ | |--| | FJ J J F L| _____J
/ L___J \ F L `-'F L___--. F L J J J LJ\ \/ /FF L___--.
J__L J__LJ__L J\______/FJ__L J__LJ__L \\__//J\______/F
|__L J__||__L J______F |__L J__||__| \__/ J______F
___ _ _ _
F __". ____ ____ _ ___ FJ L] _ _ _ ___ FJ_
J |--\ L F __ J F __ J J '__ J J |__| L J | | L J '__ J J _|
| | J | | _____J | _____J | |--| | | __ | | | | | | |__| | | |-'
F L__J | F L___--. F L___--. F L__J J F L__J J F L__J J F L J J F |__-.
J______/FJ\______/FJ\______/FJ _____/LJ__L J__LJ\____,__LJ__L J__L\_____
|______F J______F J______F |_J_____F |__L J__| J____,__F|__L J__|J_____F
L_J
"""
    # Print the banner in purple
    print(Fore.MAGENTA + banner)
    # Print the "Script by LegionHunter" in green
    print(Fore.GREEN + "Script by LegionHunter\n" + Style.RESET_ALL)


def get_unique_extensions(filename):
    extensions = set()  # Use a set to avoid duplicates
    with open(filename, 'r') as file:
        for line in file:
            # Strip the line of any whitespace characters
            url = line.strip()
            # Take the extension from the URL path only, so that the host
            # (".com") and the query string are not mistaken for an extension
            ext = os.path.splitext(urlparse(url).path)[1]
            if ext:  # Add non-empty extensions
                extensions.add(ext)
    # Print the unique extensions found
    print("Unique Extensions:")
    for ext in sorted(extensions):
        print(ext)


# Display banner
display_banner()
# Example usage
filename = input("Enter the filename of waybackurls output: ")
get_unique_extensions(filename)
The regexes below help declutter the URL list. They are not 100% accurate, so expect false positives.
UUID🆔
A UUID (Universally Unique Identifier) is a 128-bit unique identifier used for resources like user accounts or records. Extracting UUIDs during bug hunting helps identify sensitive resources, which can lead to vulnerabilities like IDOR (Insecure Direct Object Reference) or access control flaws. Finding UUIDs can also expose hidden or deprecated endpoints for further analysis.
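For IDOR work it often helps to know which endpoints a given UUID appears in, not just that it exists. The following is a minimal Python sketch of that idea. The function name group_urls_by_uuid and the sample URLs are illustrative assumptions, and the pattern is the same version 1 to 5 UUID shape that the grep command in this section looks for.

```python
import re
from collections import defaultdict

# Same shape as the grep pattern in this section: UUID versions 1-5
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-"
    r"[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}"
)


def group_urls_by_uuid(urls):
    """Map each UUID (lowercased) to the set of URLs it appears in."""
    hits = defaultdict(set)
    for url in urls:
        url = url.strip()
        for match in UUID_RE.findall(url):
            hits[match.lower()].add(url)
    return dict(hits)


# Hypothetical sample data, for illustration only
sample = [
    "https://domain.com/api/users/123e4567-e89b-42d3-a456-426614174000/profile",
    "https://domain.com/api/orders?owner=123E4567-E89B-42D3-A456-426614174000",
    "https://domain.com/about",
]
for uuid, urls in group_urls_by_uuid(sample).items():
    print(uuid, len(urls))
```

A UUID that shows up under several different endpoints is usually a better starting point for access control testing than one that appears only once.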
grep -Eo '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}' wayback_domain.com.txt | sort -u
JWT (JSON Web Token)💰
A JWT (JSON Web Token) is a compact, URL-safe token that represents claims transferred between two parties, consisting of a header, payload, and signature. Extracting JWT tokens in bug hunting is vital because they often contain sensitive information about user identities and permissions, which can lead to potential unauthorized access. Additionally, JWTs may contain excessive information that can be exploited, such as user roles or scopes, allowing attackers to manipulate claims and escalate privileges. Analyzing JWTs can also expose weaknesses in session handling, making them a critical target in security assessments.
cat wayback_domain.com.txt | grep "eyJ"
Decode any matches at jwt.io to inspect the header and payload.
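Since the header and payload of a JWT are just base64url-encoded JSON, matches can also be decoded locally rather than pasted into a website. Below is a minimal sketch, assuming each token has the usual three dot-separated parts. The helper names are made up for illustration, and the signature is not verified.

```python
import base64
import json
import re

# Three base64url segments; "eyJ" is how base64-encoded JSON such as {"alg":... begins
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt(token):
    """Return (header, payload) as dicts. The signature is NOT verified."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload


def find_jwts(urls):
    """Yield (url, header, payload) for every JWT-looking string that decodes."""
    for url in urls:
        for token in JWT_RE.findall(url):
            try:
                header, payload = decode_jwt(token)
            except ValueError:  # bad base64 or bad JSON: not a real JWT
                continue
            yield url.strip(), header, payload
```

For example, iterating over find_jwts(open("wayback_domain.com.txt")) would print nothing for random strings that merely contain "eyJ", and would surface the claims (roles, scopes, email addresses) of anything that really is a token.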
Any suspicious keyword/path/number👻
grep -Eo '([a-zA-Z0-9_-]{20,})' wayback_domain.com.txt
SSN (Social Security Number)🔢
grep -Eo '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' wayback_domain.com.txt
Credit Card Numbers💳
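A run of 13 to 16 digits is far more often a timestamp or a numeric ID than a card number. One common way to trim the noise is the Luhn checksum that payment card numbers satisfy. The sketch below applies it to matches of the same digit pattern used in this section. The function names are illustrative, and a passing checksum still does not prove that a number is a real card.

```python
import re

# Same 13-16 digit shape as the grep pattern in this section
CARD_RE = re.compile(r"\b[0-9]{13,16}\b")


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk the digits right to left, doubling every second one
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_candidates(urls):
    """Yield (url, number) for 13-16 digit runs that pass the Luhn check."""
    for url in urls:
        for number in CARD_RE.findall(url):
            if luhn_valid(number):
                yield url.strip(), number
```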
grep -Eo '\b[0-9]{13,16}\b' wayback_domain.com.txt
Potential SessionIDs and cookies
grep -Eo '[a-zA-Z0-9]{32,}' wayback_domain.com.txt
Tokens + Secrets
cat wayback_domain.com.txt | grep "token"
cat wayback_domain.com.txt | grep "token="
cat wayback_domain.com.txt | grep "code"
cat wayback_domain.com.txt | grep "code="
cat wayback_domain.com.txt | grep "secret"
cat wayback_domain.com.txt | grep "secret="
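Plain substring greps match these keywords anywhere in the URL, including hostnames and paths such as /docs/tokenizer. One way to narrow the results is to parse each URL and look only at query parameter names. This is a rough sketch, where SENSITIVE_NAMES simply mirrors the keywords above and the function name is an assumption:

```python
from urllib.parse import urlsplit, parse_qsl

# Mirrors the keywords grepped for above; extend as needed
SENSITIVE_NAMES = ("token", "code", "secret")


def find_sensitive_params(urls, names=SENSITIVE_NAMES):
    """Yield (url, param, value) where the parameter name contains a keyword."""
    for url in urls:
        url = url.strip()
        try:
            query = urlsplit(url).query
        except ValueError:  # malformed URL, e.g. a broken IPv6 host
            continue
        for param, value in parse_qsl(query, keep_blank_values=True):
            if any(name in param.lower() for name in names):
                yield url, param, value
```

Because the values come back decoded, it is easy to eyeball which of them look like live credentials and which are placeholders.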
Others
cat wayback_domain.com.txt | grep "admin"
cat wayback_domain.com.txt | grep "pass"
cat wayback_domain.com.txt | grep "pwd"
cat wayback_domain.com.txt | grep "passwd"
cat wayback_domain.com.txt | grep "password"
cat wayback_domain.com.txt | grep "phone"
cat wayback_domain.com.txt | grep "mobile"
cat wayback_domain.com.txt | grep "number"
cat wayback_domain.com.txt | grep "mail"
Private IP Address🚨
Identifying private IP addresses is essential for uncovering hidden internal services that could be vulnerable to exploitation. It can reveal potential security misconfigurations that expose sensitive data or systems to unauthorized access. Furthermore, this information assists in mapping out the internal network.
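One way to make this check precise is to test each IPv4-looking string against the three RFC 1918 private ranges with Python's standard ipaddress module, which avoids the octet-range mistakes that are easy to make in a hand-written regex. This is a minimal sketch, and the function name is illustrative:

```python
import ipaddress
import re

# Loose IPv4 shape; real validation is left to the ipaddress module
IPV4_RE = re.compile(r"(?<![0-9.])(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?![0-9.])")

# The three RFC 1918 private ranges
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def find_private_ips(urls):
    """Return the sorted list of RFC 1918 addresses seen in the URLs."""
    found = set()
    for url in urls:
        for candidate in IPV4_RE.findall(url):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:  # e.g. 999.1.1.1 is not a valid address
                continue
            if any(address in net for net in PRIVATE_NETS):
                found.add(address)
    return [str(address) for address in sorted(found)]
```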
grep -Eo '\b(10\.[0-9]{1,3}|172\.(1[6-9]|2[0-9]|3[0-1])|192\.168)\.[0-9]{1,3}\.[0-9]{1,3}\b' wayback_domain.com.txt
IPv4🟢
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' wayback_domain.com.txt
IPv6🔴
grep -Eo '([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}' wayback_domain.com.txt
Payment💸
grep "payment" wayback_domain.com.txt
grep "order" wayback_domain.com.txt
grep "orderid" wayback_domain.com.txt
grep "payid" wayback_domain.com.txt
grep "invoice" wayback_domain.com.txt
grep "pay" wayback_domain.com.txt
API Endpoint👾
grep "/api/" wayback_domain.com.txt
grep "api\." wayback_domain.com.txt
grep "api" wayback_domain.com.txt
grep "/graphql" wayback_domain.com.txt
grep "graphql" wayback_domain.com.txt
Authentication & Authorization👮‍♂️
cat wayback_domain.com.txt | grep "sso"
cat wayback_domain.com.txt | grep "/sso"
cat wayback_domain.com.txt | grep "saml"
cat wayback_domain.com.txt | grep "/saml"
cat wayback_domain.com.txt | grep "oauth"
cat wayback_domain.com.txt | grep "/oauth"
cat wayback_domain.com.txt | grep "auth"
cat wayback_domain.com.txt | grep "/auth"
cat wayback_domain.com.txt | grep "callback"
cat wayback_domain.com.txt | grep "/callback"
Try to identify endpoints related to SSO, SAML, OAuth, and authentication, because they are critical for managing user identities and access control. These endpoints are often complex and can be misconfigured, leading to vulnerabilities such as unauthorized access or privilege escalation. In particular, misconfigured SSO or OAuth providers can expose sensitive data and introduce open redirect vulnerabilities that let attackers send users to malicious sites. By examining these endpoints, bug hunters can find and report such weaknesses so that the authentication and authorization mechanisms can be hardened.
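To shortlist open redirect candidates among these endpoints, one option is to flag query parameters whose value is itself a URL or a path. The sketch below is a heuristic under stated assumptions: the parameter names in REDIRECT_PARAMS are common examples chosen for illustration, not an exhaustive or authoritative list, and a hit only marks a URL for manual testing.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed, non-exhaustive list of parameter names that often carry a redirect target
REDIRECT_PARAMS = {
    "redirect_uri", "redirect", "return", "returnurl", "next", "url", "callback",
}


def find_redirect_candidates(urls):
    """Yield (url, param, value) for redirect-style parameters holding a URL or path."""
    for url in urls:
        url = url.strip()
        try:
            query = urlsplit(url).query
        except ValueError:  # malformed URL
            continue
        for param, value in parse_qsl(query):
            looks_like_target = value.startswith(("http://", "https://", "//", "/"))
            if param.lower() in REDIRECT_PARAMS and looks_like_target:
                yield url, param, value
```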
grep -Eo 'https?://[^ ]+\.(env|yaml|yml|json|xml|log|sql|ini|bak|conf|config|db|dbf|tar|gz|backup|swp|old|key|pem|crt|pfx|pdf|xlsx|xls|ppt|pptx)' wayback_domain.com.txt
Each matched file then needs to be opened and reviewed manually for disclosed sensitive information, including documents marked “CONFIDENTIAL”, “INTERNAL USE ONLY”, “HIGHLY CONFIDENTIAL”, “PRIVATE USE ONLY”, etc.
Thank you for joining me in Part 2 of this series! Stay tuned to my YouTube channel for the next part.