Hello world!
How y'all doing? Hope everybody is doing well.
As promised, I am here to start a new series on building our very own continuous recon system.
In my last series, we discovered how to build and maintain our very own recon database, where we organized our records (subdomains) according to the organization (program or scope) they belong to and their status code. If you haven't checked it out, I highly recommend you do so here.
Today, and throughout this series, I want to show you how we can do this on a continuous basis and simply build our own automation system. At the time of writing, the system is very simple, but hopefully, as we go further, we'll be able to develop a more advanced system together.
Before starting off, I would like to mention that a big part of this project was inspired by G0LDEN_INFOSEC (Gunnar Andrews) so a big shout out to him. Please go ahead and check out his YouTube video that inspired me on this topic as well.
One of the most important parts of recon, at least for me, is not wasting my time doing things that do not require human supervision and can be easily automated. I am of course talking about [passive] subdomain enumeration.
In my previous article we saw that after gathering subdomains, I like to find out which ones are dead and which ones are alive. So first, I had them running through dnsx for a quick resolve.
Next, I would run httpx on the live subdomains (by live I mean subdomains that resolve to a specific IP) to gather some information on each host, get their status codes, and, in the case of a redirect, find out where they're heading.
Afterwards, we'd have a huge text file with all this info, which we then separated into smaller text files according to status code. We then went ahead and formatted our text file the way we wanted and inserted those records into the database.
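To give you a feel for it, the manual version of that workflow looks roughly like the sketch below (the same tools we will wire into stage2.sh shortly, with a trimmed-down set of flags; the file names are just placeholders):
# Passive subdomain discovery, resolving, and HTTP probing, done by hand
subfinder -d hackerone.com -all -silent | anew -q all_subs.txt
dnsx -l all_subs.txt -silent | anew -q resolved.txt
httpx -l resolved.txt -sc -title -ip | anew metadata.txt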
As you can imagine, this is not a lot of fun and can be very time consuming if done manually. Even worse, if you are doing this on three, four, or even five different programs on a daily basis, you'll find yourself wasting an hour or two every other day that could be spent actually hacking on an application.
The idea here is to write a script (or several) that takes care of these actions for us on a continuous basis.
For simplicity, I have divided this project into 5 main components:
stage1.sh
stage2.sh
stage3.sh
insert.py
update.py
Let's go over each, one by one.
#!/bin/bash

baseDir="/root/dev"

# Prompt user for domain input
read -p "Enter root domain: " root_domain
read -p "Enter organization name: " org_name

if [ ! -d "${baseDir}/${org_name}" ]; then
    mkdir "${baseDir}/${org_name}"
    echo "Created directory: '$org_name'"
    echo "${root_domain}" > "${baseDir}/${org_name}/rootdomain.txt"
else
    echo "'$org_name' already exists."
    echo "Finding subdomains..."
fi

# Hand off to stage2 with the working directory and the org name
./stage2.sh "${baseDir}/${org_name}" "$org_name"
This script, written in bash, simply prompts you for two inputs: a root domain and an organization name.
The root domain is what we will be starting our subdomain discovery from, and the organization name is going to be the name of the directory we will keep the results in, and also will be used as the “org” name in our database later.
In the example below, the script is going to create a “hackerone” directory, and in it, will create a rootdomain.txt which has “hackerone.com” inside it.
It then goes ahead and calls the second script (stage2.sh) and provides it with the directory path and the organization name. Pretty simple right?
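Using HackerOne as an example, a run would look something like this (the prompts and messages come straight from the read and echo statements in the scripts):
$ ./stage1.sh
Enter root domain: hackerone.com
Enter organization name: hackerone
Created directory: 'hackerone'
Gathering subs for hackerone...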
#!/bin/bash

scriptsDir="/root/dev/scripts"

# Check that the correct number of arguments was provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 /path/to/directory <program/organization name>"
    exit 1
fi

# Extract the provided arguments
dir="$1"
programName="$2"

echo "Gathering subs for $programName..."
subfinder -dL "${dir}/rootdomain.txt" -all -silent | anew -q "${dir}/all_subs.txt"

echo "Resolving found subdomains..."
dnsx -l "${dir}/all_subs.txt" -silent | anew -q "${dir}/resolved.txt"

echo "Gathering http metadata..."
httpx -l "${dir}/resolved.txt" -sc -title -ct -location -server -td -method -ip -cname -asn -cdn | anew "${dir}/metadata.txt"

echo "Separating subs by status code..."
# Strip ANSI color codes from the metadata file so grep doesn't choke on them later
sed 's/\x1B\[[0-9;]*[JKmsu]//g' "${dir}/metadata.txt" > "${dir}/metadata.tmp"

grep '\[200\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/200.txt"
grep '\[301\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/301.txt"
grep '\[302\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/302.txt"
grep '\[401\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/401.txt"
grep '\[403\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/403.txt"
grep '\[404\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/404.txt"
grep '\[502\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/502.txt"
grep '\[503\]' "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/503.txt"

echo "Inserting records into the database..."
# Re-derive the program name from the directory (equivalent to $2, given how stage1.sh calls this script)
programName=$(basename "$dir")
python3 "${scriptsDir}/insert.py" "${dir}/all_subs.txt" "${programName}"

echo "Updating status codes..."
python3 "${scriptsDir}/update.py" "${dir}" "${programName}"
Now stage2 is actually where the magic happens.
First, it defines a scriptsDir variable, which is where we keep all our bash and Python files (change this according to your own needs).
It then checks that the correct number of arguments was provided (you don't need to worry about this since you won't be running it manually, but know that you actually can 😉).
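If you ever do want to run it by hand, the call would look something like this, reusing the hackerone example:
./stage2.sh /root/dev/hackerone hackerone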
Next, it starts the subdomain discovery from the rootdomain.txt file that we created in stage1. It runs subfinder to get the subdomains, passes the results to dnsx for resolving, and then passes the results from dnsx to httpx to collect metadata and status codes.
And then there's a whole bunch of data manipulation with sed, grep, and cut to massage the data into a usable format so it can be put into the database later.
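As a side note, if the repetition in those grep lines bothers you, the same separation can be written as a small loop; this is just an equivalent sketch, not part of the scripts as published:
# Split metadata.tmp into one file per status code (same behavior as the eight grep lines above)
for code in 200 301 302 401 403 404 502 503; do
    grep "\[${code}\]" "${dir}/metadata.tmp" | cut -d " " -f 1 | cut -d "/" -f 3 > "${dir}/${code}.txt"
done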
Then it calls insert.py and provides it with the path to the discovered-subdomains file plus the org (program) name.
And lastly, it calls update.py to update the records that were just inserted into the database, passing it the path to the directory holding the status-code text files, plus the org (program) name.
To show you how these two scripts work together, I am going to use “hackerone” as an example:
As you can see we have a “hackerone” directory created on our machine with all the results saved inside.
We can also double check the database to make sure the records are inserted properly.
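If your database happens to be SQLite, a quick sanity check might look like the line below; the database file, table, and column names here are hypothetical, so substitute whatever you set up in the previous series:
sqlite3 recon.db "SELECT subdomain, status_code FROM subdomains WHERE org = 'hackerone' LIMIT 10;"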
We are going to stop here today, so that I have something to write about in part two as well haha 😜. There we'll discuss insert.py, update.py, and stage3.sh, which is what actually makes this whole process automated and continuous.
But I would like to point out that this is a highly customizable script. With a little bit of Googling you can go ahead and implement your own recon methodology/workflow into it. As I mentioned at the beginning of this article, it is a very simple script that pipes a few well-known tools together (subfinder, dnsx, httpx, etc.).
You should have no problem integrating other subdomain discovery tools like sublist3r, amass, github-subs or even tools for subdomain bruteforcing. Nuclei can also be easily implemented on top of these.
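For instance, bolting on another passive source could be as simple as adding one more line to stage2.sh right before the dnsx step; this is just a sketch that assumes you have amass installed:
# Feed amass passive results into the same deduplicated subdomain list
amass enum -passive -d "$(cat "${dir}/rootdomain.txt")" | anew -q "${dir}/all_subs.txt"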
The point of this article is to help you understand how and why to develop your own automation system. You are going to learn a lot in the process, save a ton of time in the future, and end up with a highly customizable script (some might even call it a framework 😆) for your automation.
Okay, enough talking and screen time for today; I feel like my eyes are popping right out.
Hope you liked today's writing, and if you did, please consider giving it a clap.
Till next time.
Peace.