As we discovered in the earlier posts, our flow looks something like this:
First, we start our subdomain discovery phase. This usually leaves us with a huge text file (we called it all-subs.txt) that needs to be cleaned up and filtered a little bit. But before doing any of that, we added all the entries to our database.
Second, via name resolution, we grabbed the subdomains that have an A record (or simply resolve) and saved them in our resolved-sub.txt file.
Third, we separated the resolving subdomains according to their status codes and updated each subdomain in our database with the proper status code.
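As a refresher, each asset sits in the collection as a document shaped roughly like this (the field names match what the code below queries; the values are taken from the example output later in this post):
{
    "_id": ObjectId("65b93164e5fbe1b942ed10f1"),
    "org": "booking.com",
    "status": 200,
    "subdomain": "authorityportal.booking.com"
}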
Now that we have our database populated with our initial recon data, it is time to set up a system that makes it easy for us to interact with the database. After all, this is the whole point of everything we have been doing up until now.
For that, as explained in the title, we are going to be using Flask. So we’ll install Flask on our local machine and create a very simple Flask app with a connection to our MongoDB Atlas database.
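Flask and the MongoDB driver both install with pip. One note on the extra: “pymongo[srv]” pulls in dnspython, which pymongo needs when your Atlas connection string starts with mongodb+srv:// (the format Atlas hands out by default):
pip install flask "pymongo[srv]"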
Here’s what the app looks like:
from flask import Flask, request, jsonify
from pymongo import MongoClient

app = Flask(__name__)

# Create a new client and connect to the server
client = MongoClient("<ENTER YOUR CONNECTION STRING HERE>")
db = client['assets']
collection = db['booking']

@app.route('/assets', methods=['GET'])
def get_assets():
    # Get query parameters
    org = request.args.get('org')
    status = request.args.get('status')

    # Prepare query based on parameters
    query = {}
    if org:
        query['org'] = org
    if status:
        query['status'] = int(status)

    # Query MongoDB
    assets = collection.find(query)

    # Prepare response
    result = []
    for asset in assets:
        result.append({
            'id': str(asset['_id']),
            'org': asset.get('org', ''),
            'status': asset.get('status', ''),
            'subdomain': asset.get('subdomain', '')
            # Add other fields as needed
        })
    return jsonify(result)

if __name__ == '__main__':
    app.run(debug=True)
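Assuming you save the code as app.py (the filename is arbitrary), starting the development server is a one-liner:
python app.py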
As can be seen in the code, we’ll be able to grab all the data in our database by visiting the “/assets” route on our app (that would be 127.0.0.1:5000/assets if you are running this on your localhost).
The route also accepts two query parameters: “org” and “status”.
The “org” parameter is of no real use to us right now, as we only have assets of one organization in our database. But later on, once you have a huge database from hunting on many different programs, this would be one way to navigate through different organizations and scopes. You can easily tinker with the app yourself, or with the help of ChatGPT, and make it accept whatever parameters you’d like; one possible tweak is sketched below.
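To illustrate (this is purely a hypothetical addition, not part of the app above), a “sub” parameter that filters subdomains by substring could be bolted onto get_assets() with a few lines, using MongoDB’s $regex operator:
    # Hypothetical extra filter: /assets?sub=api would return only
    # subdomains containing the string "api"
    sub = request.args.get('sub')
    if sub:
        query['subdomain'] = {'$regex': sub}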
The “status” parameter is what we will be focusing on for now. Let’s say I wanted to grab all subdomains (owned by booking.com) with a status code of 200; I could simply visit (or make a curl request to) the following URL:
http://127.0.0.1:5000/assets?org=booking.com&status=200
Output:
[{
"id": "65b93164e5fbe1b942ed10f1",
"org": "booking.com",
"status": 200,
"subdomain": "authorityportal.booking.com"
},
{
"id": "65b93164e5fbe1b942ed1149",
"org": "booking.com",
"status": 200,
"subdomain": "careers.booking.com"
},
{
...
One problem here, for me, is the way the data is represented. This is by no means what I had in mind when I started this project. The data is returned in JSON format, which is of no use to me.
The reason is that if I wanted to simply grab all the subdomains with a 401 status code from my database and pipe them into another tool, say for fuzzing or anything else, it wouldn’t work.
Example:
curl "https://mydomain.tld"/assets?status=401 | nuclei -t ~/nucleit-templtes/blah-blahIn in order for that to work, the code/output had to change a little bit.
First of all, I needed to get rid of all the extra info like “id”, “org”, or even “status” (since the org and status are part of the request, and the id is really not that helpful in the output).
Secondly, the output of the query should be built as a string rather than a list, so that the returned data looks like this:
authorityportal.booking.com
careers.booking.com
...
rather than:
[{
"authorityportal.booking.com"
},
{
"careers.booking.com"
},
...
With that in mind, I made the following adjustments.
I basically removed a chunk of the code and changed this:
result = []
for asset in assets:
    result.append({
        'id': str(asset['_id']),
        'org': asset.get('org', ''),
        'status': asset.get('status', ''),
        'subdomain': asset.get('subdomain', '')
        # Add other fields as needed
    })
return jsonify(result)
To this:
result = ""for asset in assets:
result += asset['subdomain'] + "\n"
return result
Now if I made a curl request to “mydomain.tld/assets?status=401”, I’d get the results exactly as I wanted: one subdomain per line.
I was very happy with that until I decided to take a look at it in the browser…
Urgh! The browser was not rendering the line breaks “\n” properly.
So naturally I tried changing “\n” to “<br>”.
This time, however, the browser was fine and curl was not… 😆
I was disappointed and couldn’t think of a fix myself, but once again ChatGPT rushed to the rescue.
ChatGPT: To make the output compatible with both the browser and the curl command, we can adjust the response format conditionally based on the request headers. If the request header indicates that the client accepts HTML, we’ll use <br> tags for line breaks. Otherwise, we’ll use \n for line breaks.
And so just like that, we got this:
result = ""for asset in assets:
result += asset['subdomain'] + "<br>"
accept_header = request.headers.get('Accept', '')
if 'text/html' in accept_header:
# Client accepts HTML, using <br> instead of "\n" for line breaks
result = result.replace("\n", "<br>")
return result
Now both the browser and the curl command output the data beautifully and I can go to sleep peacefully tonight 😅
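If you ever want to double-check both behaviors from the terminal, curl lets you spoof a browser-like Accept header (by default curl sends “Accept: */*”, so the text/html branch is skipped):
# Default curl request: no text/html in Accept, so we get "\n" line breaks
curl "http://127.0.0.1:5000/assets?status=401"

# Pretend to be a browser: text/html in Accept, so we get <br> tags
curl -H "Accept: text/html" "http://127.0.0.1:5000/assets?status=401"
As a side note, another way to solve the same problem would have been to keep the “\n” breaks and return the string as a plain-text response (flask.Response(result, mimetype='text/plain')), which browsers render with the line breaks intact; the Accept-header approach just stays closer to what ChatGPT suggested.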
So far we have learned how to build our own database with the information we get from the initial recon phase of subdomain enumeration, and how to store and access that data.
More importantly, we don’t need to keep our recon data in text files anymore. We now have all of it stored in our database, where it can be retrieved at any time, from anywhere, according to our needs, and used for further recon, content discovery, or exploitation.
If you read all the way down to here, thank you very much. I hope you learned a thing or two, as I did.
I’m thinking of maybe extending this series with another post or two on how to make this continuous. Or maybe I’ll just make a separate series on that. If you prefer one or the other, please let me know.
Peace ✌️