Unlocking Some Effective Information Gathering Methodologies


Prasanna Acharya

Information gathering is a crucial first step in the penetration testing process.

Basically, it refers to the process of collecting data about a target system, network, or individual to identify potential vulnerabilities.

Alright, when it comes to information gathering, you would most probably start thinking about the tools used for this process. However, in this blog I won’t be covering those tools one by one; instead, I will be talking about different methodologies that can be used in information gathering for web penetration testing (as hacking is more about methods and mindset, yk😉). This might be a bit long, but I hope it covers some important methodologies that are worth your time.

So, we will basically be covering the following:

- Search Engine Reconnaissance
- Web Server Fingerprinting
- Reviewing Sitemaps and Robots File
- Web Server Enumeration
- Reviewing Web Page Contents
- Entry Points Identification

Search Engine Reconnaissance

When it comes to hacking, information gathering, and search engines, how can we forget our good old friend Google Dorking (also known as Google Hacking)?

Some important information that can be obtained through Google Dorking includes:

- Third-party or cloud service configuration files
- Sensitive directories
- Emails and usernames
- Server and software details
- Sensitive exposed files

Some Important Search Operators

- site: limits the search to the provided domain.
- inurl: returns results that include the keyword in the URL.
- intitle: returns only results that have the keyword in the page title.
- intext or inbody: searches for the keyword only in the body of pages.
- filetype: matches only a specific file type, e.g. png or php.
- cache: views the cached contents of a page, which may have changed since it was indexed.
- “ ” (quotes): searches for an exact phrase or keyword match.

Example usage:

site:example.com intitle:"login" inurl:admin filetype:php
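
A few more examples of the kind of dorks you might try, loosely mapping to the categories above (example.com is just a placeholder domain):

site:example.com filetype:env OR filetype:log OR filetype:bak
site:example.com intitle:"index of" "parent directory"
site:example.com intext:"@example.com" filetype:xls OR filetype:csv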

Web Server Fingerprinting

Since we will likely need information about the target application’s web server while performing web penetration testing, we try to identify the server’s type and version.

1. Web tools

In the first method of fingerprinting, we can use web tools like Wappalyzer (a browser extension). This may tell us which web server the application is running on.

Wappalyzer

2. Banner grabbing

Banner grabbing is performed by sending an HTTP request to the web server and examining its response headers.

An example response from an Apache server:

HTTP/1.1 200 OK
Date: Thu, 05 Sep 2019 17:42:39 GMT
Server: Apache/2.4.41 (Unix)
Last-Modified: Thu, 05 Sep 2019 17:40:42 GMT
ETag: "75-591d1d21b6167"
Accept-Ranges: bytes
Content-Length: 117
Connection: close
Content-Type: text/html
...

Here, we get the web server type along with its version from the Server header.
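
If you prefer the command line, a quick way to grab these headers is a HEAD request with curl (example.com is a placeholder target):

curl -sI https://example.com/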

3. Malformed requests

When the above methods do not work, we can still determine the target application’s web server by sending a malformed request (generally using a web proxy tool like Burp Suite) and observing how the server responds.

For Apache server:

GET / SANTA CLAUS/1.1
HTTP/1.1 400 Bad Request
Date: Fri, 06 Sep 2019 19:21:01 GMT
Server: Apache/2.4.41 (Unix)
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>

For Nginx server:

GET / SANTA CLAUS/1.1
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.3</center>
</body>
</html>

Note the first line of the request, where a non-existent protocol (SANTA CLAUS/1.1) is used in place of HTTP/1.1; servers handle such malformed requests differently, and the resulting error page often reveals the server type and version.
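
A simple way to send such a malformed request outside of a proxy, assuming netcat is available and example.com is the target, is:

printf 'GET / SANTA CLAUS/1.1\r\nHost: example.com\r\n\r\n' | nc example.com 80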

robots.txt file

This file contains a list of paths that spiders, robots, or crawlers are instructed not to index or access.

It might list web directories that are sensitive, hidden pages and functionality, or internal files and resources.

An example robots.txt file:

User-agent: *
Disallow: /search
Allow: /search/about
Allow: /search/static
Allow: /search/howsearchworks
Disallow: /sdch
...
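
To pull this file quickly and list only the interesting entries (example.com being a placeholder target), something like this works:

curl -s https://example.com/robots.txt | grep -iE '^(dis)?allow'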

Sitemaps

A sitemap is a file where a developer or organization can provide information about the pages, videos, and other files offered by the site or application, and the relationship between them.

Testers can use sitemap.xml files to learn more about the site or application to explore it more completely.

Sometimes the sitemap location is listed in the robots.txt file; otherwise, sitemaps can generally be found in files like sitemap.xml or sitemap.txt, as in https://example.com/sitemap.xml.

sitemap
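
To fetch a sitemap and extract just the listed URLs, a quick sketch (assuming GNU grep with -P support, and example.com as a placeholder) is:

curl -s https://example.com/sitemap.xml | grep -oP '<loc>\K[^<]+'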

Web Server Enumeration

We can enumerate the web server to check for the following two things:

1. Non-standard ports

Web applications usually live on port 80 (HTTP) and 443 (HTTPS), but they may also be associated with arbitrary TCP ports, in which case they can be referenced by specifying the port number, as in http[s]://www.example.com:port/.

For example, we might find a web application serving a page on port 8080 (commonly Apache Tomcat). This might give us some extra information about the target.

To discover web applications listening on other ports, we can use Nmap.
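
A minimal Nmap sketch for this (substitute your target for example.com):

# full TCP port sweep with service/version detection
nmap -p- -sV --open example.com
# or a faster pass over common alternative web ports
nmap -p 80,443,8000,8080,8443,8888 -sV example.com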

2. Virtual hosts and sub-domains

A virtual host (VHost) in a web server configuration allows multiple domains (or websites) to be hosted on a single server. It is slightly different from a sub-domain: a VHost is basically a sub-domain served from the same server and the same IP address, so a single IP could be serving two or more different websites.

We can enumerate the web application to find its sub-domains and virtual hosts. For sub-domains, we can simply use command-line tools like subfinder.
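
For example, a basic passive sub-domain enumeration run (with example.com as a placeholder) could look like:

subfinder -d example.com -silent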

For virtual hosts, we should be using web fuzzing tools like ffuf or gobuster.

For more about ffuf and web fuzzing visit: Everything You Need to Know About Ffuf for Web Application Fuzzing

For fuzzing virtual hosts, the command is:

ffuf -w /path/to/subdomains-list:FUZZ -u http://mydomain.com/ -H 'Host: FUZZ.mydomain.com'
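
One practical note: with vhost fuzzing every candidate gets some response, so the default (wrong-host) response size usually has to be filtered out with ffuf’s -fs option; the 4242 below is just a placeholder for whatever size your target’s default response has.

ffuf -w /path/to/subdomains-list:FUZZ -u http://mydomain.com/ -H 'Host: FUZZ.mydomain.com' -fs 4242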

Reviewing Web Page Contents

1. Web page comments

Check the HTML source code for comments containing sensitive information that can help gain more insight into the application. These might include SQL code, usernames and passwords, internal IP addresses, or debugging information.

For example:

...
<div class="table2">
<div class="col1">1</div><div class="col2">Mary</div>
<div class="col1">2</div><div class="col2">Peter</div>
<div class="col1">3</div><div class="col2">Joe</div>
<!-- Query: SELECT id, name FROM app.users WHERE active='1' -->
</div>
...
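
A quick way to pull HTML comments out of a page is to grep the raw source; this rough sketch only catches single-line comments, and example.com is a placeholder:

curl -s https://example.com/ | grep -o '<!--.*-->'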

2. Metadata

Some META tags do not provide active attack vectors but instead allow an attacker to profile an application:

<META name="Author" content="Andrew Muller">

Others:

Refresh meta tag:

<META http-equiv="Refresh" content="15;URL=https://www.owasp.org/index.html">

Keywords meta tag:

<META name="keywords" lang="en-us" content="OWASP, security, sunshine, lollipops">

3. JavaScript code and files

Check JavaScript code for any sensitive information leaks which could be used by attackers to further abuse or manipulate the system. Look for values such as: API keys, internal IP addresses, sensitive routes, or credentials.

For example:

<script type="application/json">
...
{"GOOGLE_MAP_API_KEY":"AIzaSyDUEBnKgwiqMNpDplT6ozE4Z0XxuAbqDi4",
"RECAPTCHA_KEY":"6LcPscEUiAAAAHOwwM3fGvIx9rsPYUq62uRhGjJ0"}
...
</script>
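
A crude first pass over a JavaScript bundle for likely secrets might look like this; the file path is hypothetical and the keyword list is far from exhaustive:

curl -s https://example.com/static/js/main.js | grep -ioE '(api[_-]?key|secret|token|password)[^,}]*'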

4. Source map files

Source map files end with the .map extension, and checking them for sensitive information can help gain more insight into the application.

For example:

{
"version": 3,
"file": "static/js/main.chunk.js",
"sources": [
"/home/sysadmin/cashsystem/src/actions/index.js",
"/home/sysadmin/cashsystem/src/actions/reportAction.js",
"/home/sysadmin/cashsystem/src/actions/cashoutAction.js",
"/home/sysadmin/cashsystem/src/actions/userAction.js",
"..."
],
"..."
}
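
In practice, source maps are often discoverable simply by appending .map to a bundle’s URL, or by following the sourceMappingURL comment at the end of the JavaScript file (the paths below are hypothetical):

# the last line of a bundle usually contains //# sourceMappingURL=...
curl -s https://example.com/static/js/main.chunk.js | tail -n 1
curl -s https://example.com/static/js/main.chunk.js.map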

Entry Points Identification

To identify entry points early in the information gathering process, so that we can carry the penetration testing process forward, it helps to consider the following points:

- First, get a good understanding of the application and how the user and browser communicate with it.
- Pay attention to all HTTP requests, as well as every parameter and form field that is passed to the application.
- Pay special attention to when GET requests and when POST requests are used to pass parameters to the application.
- Make special note of any hidden form fields being passed to the application within POST requests.
- Take note of any interesting parameters in the URL, custom headers, or the body of requests/responses.

And that’s it! This might have been a lot, but hopefully it covered some important methodologies that can be used in the information gathering process for web penetration testing.

I hope this blog is helpful for your web penetration testing or bug hunting.

If you read the contents up to here, thanks a lot.😌

Feel free to contact me for any queries or just to say Hello.

Linkedin

Discord

Instagram

Github

Twitter
