Creating a recon database with Flask, MongoDB, REST Api and ChatGPT — Part Two


Ali

Hello world!

This is my second blog post of the series “Creating a recon database with Flask, MongoDB, REST Api and ChatGPT”, so if you haven’t read the first one, make sure to check it out here.

In the last post we covered everything up to inserting our subdomain records into the database and making sure the entries are unique. So without further ado, let’s pick it up from there.

Our next goal is updating the existing records with a new status code. (This will make more sense in about a minute.)

The workflow of my recon is something like this:

1. Gather all possible subdomains, dead or alive → all-subs.txt

2. Resolve those subs using dnsx for subs that have an “A” record → resolved-subs.txt

3. Run everything through httpx to get status code and some metadata → metadata.txt

4. Separate the subs in metadata.txt according to their status code → for example we’ll have 200-subs.txt, 403-subs.txt, 301-subs.txt and so on
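Step 4 could be sketched in Python roughly like this. It is only an illustration: it assumes every line of metadata.txt looks like `https://sub.example.com [200]` (a URL, a space, then a single three-digit status code in square brackets, no colour codes), and the names `LINE_RE` and `group_by_status` are made up for the example.

```python
import re
from collections import defaultdict

# Assumed line shape: "<url> [<status>]", e.g. "https://sub.example.com [200]".
LINE_RE = re.compile(r"^(\S+)\s+\[(\d{3})\]")


def group_by_status(lines):
    """Return {status_code: [url, ...]} for the lines that match the assumed shape."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines in any other shape are skipped silently
            url, status = match.groups()
            groups[int(status)].append(url)
    return dict(groups)


sample = [
    "https://a.example.com [200]",
    "https://b.example.com [403]",
    "https://c.example.com [200]",
    "not a metadata line",
]
groups = group_by_status(sample)
for status, urls in groups.items():
    print(status, urls)
```

Each list in `groups` could then be written out to its own 200-subs.txt, 403-subs.txt and so on.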

Now what I want here is a script that constantly runs through the sc-subs.txt files (200/301/403 etc.) and inserts/updates records in the database.

So for example, if yesterday we had a document for “dev-123.booking.com” with a 500 status code, and today the status code has changed to 200, the code should be able to query “dev-123.booking.com” and change the status code accordingly (500 → 200).
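As a rough sketch of that "insert or update" behaviour for a single record, something like the following could work. The function name `set_status` is hypothetical, `collection` is assumed to be a pymongo collection (such as `collection_name` in this series), and the field names follow the documents used in this post.

```python
import time


def set_status(collection, subdomain, status, org="booking.com"):
    """Update the status of one subdomain, inserting the document if it is missing."""
    return collection.update_one(
        {"subdomain": subdomain},
        {
            # Fields refreshed on every run
            "$set": {"status": status, "update": time.time()},
            # Field written only when the document is created by the upsert
            "$setOnInsert": {"org": org},
        },
        upsert=True,
    )


# Example call, assuming a connected collection object:
# set_status(collection_name, "dev-123.booking.com", 200)
```

With `upsert=True`, a subdomain that is not in the collection yet gets inserted instead of being silently skipped, which covers both halves of "inserts/updates".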

Actually since I keep subs with different status codes in different text files it was easier than I initially thought.

Here’s what we have to do:

1. First, define the path for 200-subs.txt

2. Declare an empty list, list_200 = []

3. Insert each sub from the text file into the list with a for loop

4. Using the find_one_and_update() function and another for loop, update the records

Code:

import time

filename = "C:\\Users\\1337\\Desktop\\db\\sample-200.txt"
subdomains_list = read_subdomains_from_file(filename)

# Build one document per subdomain found in the 200 file
list_200 = []

for subdomain in subdomains_list:
    x = {}
    x['subdomain'] = subdomain
    x['org'] = 'booking.com'
    x['status'] = 200
    x['update'] = time.time()
    list_200.append(x)

# Set the status of every matching record to 200
for subdomain in list_200:
    collection_name.find_one_and_update(
        {"subdomain": subdomain['subdomain']},
        {"$set": {"status": 200}},
    )

The only thing is, since I’m using Jupyter Notebook for now, I’m running the code in small chunks, one at a time. I need to remember to clean the code up and turn this into a function later on.

Okay now I have pulled all the text files from my VPS into my local machine. Looks something like this:

It’s time to clean the code up a little bit and create different functions for each status code (tbh I’m not sure this is the best or even the fastest approach, but for now I don’t have a better idea so I’ll just stick with this till later maybe 😜)

As the first step now, I dropped the collection entirely to start fresh, and now I’m going to add all-subs.txt into the db with a function:

def insert_all_subs():
    filename = "C:\\Users\\1337\\Desktop\\db\\all-subs.txt"
    subdomains_list = read_subdomains_from_file(filename)

    all_subs = []

    for subdomain in subdomains_list:
        x = {}
        x['subdomain'] = subdomain
        x['org'] = 'booking.com'
        x['status'] = 'null'  # placeholder until a status code is known
        x['update'] = time.time()
        all_subs.append(x)

    collection_name.insert_many(all_subs)

and simply calling the function with insert_all_subs() inserts the records into our collection.

It’s worth mentioning that the read_subdomains_from_file() function was described earlier:

# Function to read subdomains from a text file and return as a list
def read_subdomains_from_file(filename):
    subdomains = []
    with open(filename, 'r') as file:
        for line in file:
            subdomains.append(line.strip())  # Remove newline characters and append to the list
    return subdomains
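One thing to keep in mind: with a unique index on "subdomain", insert_many() raises an error as soon as it hits a duplicate, and a blank line in the file becomes an empty-string subdomain. A possible variant of the reader that skips blanks and drops repeats could look like this (the name `read_unique_subdomains` is made up, and lowercasing the entries is my own assumption about what you want):

```python
def read_unique_subdomains(filename):
    """Read subdomains from a text file, skipping blank lines and duplicates."""
    seen = {}  # a dict keeps insertion order, so the original order survives
    with open(filename, "r", encoding="utf-8") as file:
        for line in file:
            name = line.strip().lower()
            if name:
                seen.setdefault(name, None)
    return list(seen)
```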

Almost forgot this one:

collection_name.create_index([("subdomain", pymongo.ASCENDING)], unique=True)

and tbh I’m not sure where best to put this…
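One option, sketched under the assumption that `collection` is a pymongo collection, is to wrap the call in a small setup function and run it once right after connecting, before any inserts. The name `ensure_indexes` is hypothetical.

```python
def ensure_indexes(collection):
    """Create the unique index on "subdomain" and return the index name."""
    # 1 is the value of pymongo.ASCENDING, so this is the same index as above.
    return collection.create_index([("subdomain", 1)], unique=True)


# Example call, assuming a connected collection object:
# ensure_indexes(collection_name)
```

Asking for an index that already exists with the same options does nothing, so calling this on every startup should be harmless. It does fail if the collection already holds duplicate subdomains, which is another reason to run it before inserting.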

Now that we have all the records in the db, it’s time to update them according to their status codes.

As mentioned earlier, I will be defining a separate function for each status code:

# For some reason this is not working
def update_200():
    filename = "C:\\Users\\1337\\Desktop\\db\\200.txt"
    subdomains_list = read_subdomains_from_file(filename)

    for subdomain in subdomains_list:
        collection_name.find_one_and_update(
            {"subdomain": subdomain},
            {"$set": {"status": "200"}},
        )
# It doesn't throw an error, it runs, just doesn't seem to update anything

It doesn’t throw an error, it runs, just doesn’t seem to update anything!

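If you are wondering the same thing as I was (WHY?), here’s what I thought at first:

Silly me! Instead of update_one() I was using find_one_and_update() and I don’t know why :)) 30 minutes gone just like that…

So at that point I was ready to tell you guys to be careful and NOT use find_one_and_update()!! I don’t even really know where I read about this damn function 😅 (In fairness to it: find_one_and_update() does update the matching document, it just returns the document instead of an update result. The real culprit shows up a bit further down.)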

Correction:

def update_200():
    filename = "C:\\Users\\1337\\Desktop\\db\\200.txt"
    subdomains_list = read_subdomains_from_file(filename)

    for subdomain in subdomains_list:
        collection_name.update_one({"subdomain": subdomain}, {"$set": {"status": "200"}})
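When an update "runs but changes nothing", a quick way to see whether the filter matched any document at all is to look at what update_one() returns. A minimal sketch, assuming `collection` is a pymongo collection (the helper name `update_and_report` is made up):

```python
def update_and_report(collection, subdomain, status):
    """Run one update and return (matched_count, modified_count)."""
    result = collection.update_one(
        {"subdomain": subdomain},
        {"$set": {"status": status}},
    )
    if result.matched_count == 0:
        # Nothing in the collection has this exact "subdomain" value
        print(f"no document matched {subdomain!r}")
    return result.matched_count, result.modified_count
```

A matched count of 0 points at the filter value (the string being searched for) rather than at the update call itself.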

OKAY NEW UPDATE AND IM PISSED OFF! But this is a good one:

Long story short, after checking the results I realized that the code above is not working as intended either (it only updated one record). So I spent another 40 minutes tinkering with the code and arguing with ChatGPT that its code doesn’t seem to work either…

ONLY TO REALIZE THIS:

My all-subs.txt had records in the following format:

addigygw.itspublic.booking.com
autocomplete.booking.com
business.booking.com
booking.com

and all the other status code text files were in this format:

https://addigygw.itspublic.booking.com
https://autocomplete.booking.com
https://business.booking.com
https://booking.com

FFFFFFFUUUUUUUUU 👹👺👺👺👺
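One way to avoid this kind of mismatch is to normalize every entry to a bare hostname before it goes anywhere near a query. A small sketch using only the standard library (the function name `to_hostname` is made up for the example):

```python
from urllib.parse import urlsplit


def to_hostname(entry):
    """Turn 'https://sub.example.com/path' or 'sub.example.com' into 'sub.example.com'."""
    entry = entry.strip()
    if "://" not in entry:
        # Prefix with '//' so urlsplit treats the text as a network location
        entry = "//" + entry
    return urlsplit(entry).hostname or ""


print(to_hostname("https://autocomplete.booking.com"))
print(to_hostname("autocomplete.booking.com"))
```

Both calls should print the same bare hostname, so entries from all-subs.txt and from the status code files end up comparable.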

But screw it at least I learned something new from ChatGPT:

# Update all records with the specified subdomains
result = collection.update_many({"subdomain": {"$in": subdomains_list}}, {"$set": {"status": status_to_update}})

# Print the number of documents matched and modified
print("Matched documents:", result.matched_count)
print("Modified documents:", result.modified_count)

Apparently you can match against a whole list directly inside the query with $in, and you can also save the query result and check the counts later, kewl!

So our final function looks like this:

def status_200():
    filename = "C:\\Users\\1337\\Desktop\\db\\200.txt"
    subdomains_list = read_subdomains_from_file(filename)

    for subdomain in subdomains_list:
        collection_name.update_one({"subdomain": subdomain}, {"$set": {"status": 200}})

Onto the next one:

def status_301():
    filename = "C:\\Users\\1337\\Desktop\\db\\301.txt"
    subdomains_list = read_subdomains_from_file(filename)

    for subdomain in subdomains_list:
        collection_name.update_one({"subdomain": subdomain}, {"$set": {"status": 301}})

Double checking everything in the db:

**BEGINNING OF INTERESTING IDEA FOR LATER***

Since my sc.txt files seemed to contain http/https, I can format them later to domain.tld:80/443

Basically https://sub.booking.com would turn into sub.booking.com:443

**END OF INTERESTING IDEA FOR LATER***
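For whenever "later" comes, the idea could be sketched like this with the standard library. The names `DEFAULT_PORTS` and `to_host_port` are made up, and the sketch assumes every entry carries an http or https scheme:

```python
from urllib.parse import urlsplit

# Default port per scheme, used when the URL does not name a port itself
DEFAULT_PORTS = {"http": 80, "https": 443}


def to_host_port(url):
    """Turn 'https://sub.example.com' into 'sub.example.com:443'."""
    parts = urlsplit(url.strip())
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if parts.hostname is None or port is None:
        raise ValueError(f"cannot derive host:port from {url!r}")
    return f"{parts.hostname}:{port}"


print(to_host_port("https://sub.booking.com"))
```

An explicit port in the URL (say `http://sub.booking.com:8080`) is kept as is, and an entry without a scheme raises a ValueError instead of guessing.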

And it’s pretty much just repetition after this until I update all the subs with their proper status codes.
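If the repetition gets old, the per-status functions could probably be folded into one. The sketch below is just one possible shape: `update_statuses` and `read_lines` are made-up names, `collection` is assumed to be a pymongo collection, and the files are assumed to already hold bare subdomains in the same format as the database.

```python
def read_lines(path):
    """Return the non-empty, stripped lines of a text file."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def update_statuses(collection, files_by_status):
    """files_by_status maps a status code to the text file listing its subdomains."""
    summary = {}
    for status, path in files_by_status.items():
        subdomains = read_lines(path)
        # One query per status code: $in matches every listed subdomain at once
        result = collection.update_many(
            {"subdomain": {"$in": subdomains}},
            {"$set": {"status": status}},
        )
        summary[status] = (result.matched_count, result.modified_count)
    return summary


# Example call, assuming a connected collection and local files:
# update_statuses(collection_name, {200: "200.txt", 301: "301.txt", 403: "403.txt"})
```

The returned summary gives matched and modified counts per status code, which makes it easy to spot a file that matched nothing.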

Now that the database is populated, the next step is starting with the API. For now I’m still thinking about where best to deploy it. The db I’ve been using so far is on MongoDB Atlas, so I’m not sure where to go next after this. I’m thinking of either doing everything locally or on my VPS. Hmm …

If you read until here, thank you very much for following along.

Stay tuned for part three and we’ll figure something out together ;-)
