Conduct Search Engine Discovery Reconnaissance for Information Leakage
Review Webserver Metafiles for Information Leakage
Enumerate Applications on Webserver
Review Webpage Comments and Metadata for Information Leakage
Identify Application Entry Points
Map Execution Paths Through Application
Fingerprint Web Application Framework
1. Search engines like Google and Bing support various advanced search operators to refine search queries. These operators are often referred to as "Google dorks".
We can use the "site:" operator in Google search to find all the sub-domains that Google has indexed for a domain. Google also supports the minus operator to exclude sub-domains that we are not interested in, for example: site:*.wikimedia.org -www -store -jobs -uk
The Bing search engine supports some advanced search operators as well. Like Google, Bing also supports a "site:" operator that you might want to check for any additional results beyond the Google search.
2. There are a lot of third-party services that aggregate massive DNS datasets and search through them to retrieve sub-domains for a given domain.
VirusTotal runs its own passive DNS replication service, built by storing the DNS resolutions performed when visiting URLs submitted by users. To retrieve the information for a domain, you just have to put the domain name in the search bar.
sub-domains found using VirusTotal
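VirusTotal also exposes an API that can return the same passive DNS data. Below is a minimal sketch using the older v2 domain-report endpoint; it assumes you have an API key exported as VT_API_KEY, and the exact field names may differ in newer API versions.
# Query the v2 domain report and print any sub-domains it returns (endpoint and field name per the older v2 API)
curl -s "https://www.virustotal.com/vtapi/v2/domain/report?apikey=$VT_API_KEY&domain=appsecco.com" | jq -r '.subdomains[]?'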
DNSdumpster is another interesting tool that can find a potentially large number of sub-domains for a given domain.
The OWASP Amass tool suite obtains subdomain names by scraping data sources, recursive brute forcing, crawling web archives, permuting/altering names and reverse DNS sweeping.
amass --passive -d appsecco.com # Amass 2.x
amass enum --passive -d appsecco.com # Amass 3.x
Using OWASP Amass to discover subdomains for a given domain
3. Certificate Transparency (CT) is a project under which a Certificate Authority (CA) has to publish every SSL/TLS certificate it issues to a public log. An SSL/TLS certificate usually contains domain names, sub-domain names and email addresses. This makes them a treasure trove of information for attackers. I wrote a series of technical blog posts on Certificate Transparency where I covered this technique in depth; you can read the series here.
The easiest way to look up certificates issued for a domain is to use search engines that collect the CT logs and let anyone search through them. A few of the popular ones are listed below:
https://crt.sh/
https://censys.io/
https://developers.facebook.com/tools/ct/
https://google.com/transparencyreport/https/ct/
Finding sub-domains of an organisation's primary domain using crt.sh
In addition to the web interface, crt.sh also provides access to its CT log data via a PostgreSQL interface. This makes it easy and flexible to run advanced queries. If you have the PostgreSQL client software installed, you can log in as follows:
psql -h crt.sh -p 5432 -U guest certwatch
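If you just need a quick list of names without the PostgreSQL interface, crt.sh also supports a JSON output mode that pairs well with jq (a rough sketch; %25 is simply the URL-encoded % wildcard):
# Pull all certificate identities matching *.icann.org and de-duplicate them
curl -s "https://crt.sh/?q=%25.icann.org&output=json" | jq -r '.[].name_value' | sort -u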
We wrote a few scripts to simplify the process of finding sub-domains using CT log search engines. The scripts are available in our GitHub repo.
Interesting sub-domain entry from CT logs for uber.com
The downside of using CT for sub-domain enumeration is that the domain names found in the CT logs may not exist anymore and thus they can’t be resolved to an IP address. You can use tools like massdns in conjunction with CT logs to quickly identify resolvable domain names.
# ct.py - extracts domain names from CT logs (shipped with massdns)
# massdns - finds resolvable domains & adds them to a file
./ct.py icann.org | ./bin/massdns -r resolvers.txt -t A -q -a -o -w icann_resolvable_domains.txt -
Using massdns to find resolvable domain names
4. Dictionary-based enumeration is another technique to find sub-domains with generic names. DNSRecon is a powerful DNS enumeration tool; one of its features is dictionary-based sub-domain enumeration using a pre-defined wordlist.
python dnsrecon.py -n ns1.insecuredns.com -d insecuredns.com -D subdomains-top1mil-5000.txt -t brt
Dictionary-based enumeration using DNSRecon
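The same dictionary idea can be sketched in plain shell when DNSRecon is not available (the wordlist name is reused from the command above and is purely illustrative):
# Resolve each candidate sub-domain from the wordlist and print the ones that exist
while read -r sub; do
  ip=$(dig +short "$sub.insecuredns.com" A)
  [ -n "$ip" ] && echo "$sub.insecuredns.com -> $ip"
done < subdomains-top1mil-5000.txt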
5. Permutation scanning is another interesting technique to identify sub-domains. In this technique, we identify new sub-domains using permutations, alterations and mutations of already known domains/sub-domains.
Altdns is a tool that allows for the discovery of sub-domains that conform to patterns.
python altdns.py -i icann.domains -o data_output -w icann.words -r -s results_output.txt
Finding sub-domains that match certain permutations/alterations using AltDNS
6. Finding Autonomous System (AS) numbers will help us identify netblocks belonging to an organization, which in turn may contain valid domains.
Resolve the IP address of a given domain using dig or host.
There are tools to find the ASN for a given IP address: https://asn.cymru.com/cgi-bin/whois.cgi
There are tools to find the ASN for a given domain name: http://bgp.he.net/
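A minimal sketch of that lookup chain using dig and Team Cymru's whois service (the IP address shown is only illustrative; substitute whatever dig returns for your target):
# Resolve the target domain to an IP address
dig +short icann.org A
# Map that IP to its origin AS number via Team Cymru's whois service
whois -h whois.cymru.com " -v 192.0.43.7"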
The AS numbers found can be used to find the netblocks of the domain. There is an Nmap NSE script for this: https://nmap.org/nsedoc/scripts/targets-asn.html
nmap --script targets-asn --script-args targets-asn.asn=17012 > netblocks.txt
7. Zone transfer is a type of DNS transaction where a DNS server passes a copy of all or part of its zone file to another DNS server. If zone transfers are not securely configured, anyone can initiate a zone transfer against a nameserver and get a copy of the zone file. By design, the zone file contains a lot of information about the zone and the hosts that reside in it.
dig +multi AXFR @ns1.insecuredns.com insecuredns.com
Successful zone transfer using the dig tool against a nameserver for a domain
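Because the misconfiguration can differ per server, it is worth attempting the transfer against every authoritative nameserver for the domain; a small sketch:
# Try a zone transfer against each authoritative nameserver
for ns in $(dig +short NS insecuredns.com); do
  dig +multi AXFR @"$ns" insecuredns.com
done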
8. Due to the way non-existent domains are handled in DNSSEC, it is possible to "walk" DNSSEC zones and enumerate all the domains in that zone. You can learn more about this technique here.
For DNSSEC zones that use NSEC records, zone walking can be performed using tools like ldns-walk.
ldns-walk @ns1.insecuredns.com insecuredns.com
Zone walking a DNSSEC zone with NSEC records
Some DNSSEC zones use NSEC3 records, which use hashed domain names to prevent attackers from gathering the plain-text domain names. An attacker can collect all the sub-domain hashes and crack them offline. Tools like nsec3walker and nsec3map help automate collecting NSEC3 hashes and cracking them. Once you install nsec3walker, you can use the following commands to enumerate the sub-domains of an NSEC3-protected zone:
# Collect NSEC3 hashes of a domain
$ ./collect icann.org > icann.org.collect
# Undo the hashing, expose the sub-domain information
$ ./unhash < icann.org.collect > icann.org.unhash
# List only the sub-domain part from the unhashed data
$ cat icann.org.unhash | grep "icann" | awk '{print $2;}'
del.icann.org.
access.icann.org.
charts.icann.org.
communications.icann.org.
fellowship.icann.org.
files.icann.org.
forms.icann.org.
mail.icann.org.
maintenance.icann.org.
new.icann.org.
public.icann.org.
research.icann.org.
9. There are projects that gather Internet-wide scan data and make it available to researchers and the security community. The datasets published by these projects are a treasure trove of sub-domain information. Although finding sub-domains in these massive datasets is like finding a needle in a haystack, it is worth the effort.
The Forward DNS dataset is published as part of Project Sonar. This data is created by extracting domain names from a number of sources and then sending an ANY query for each domain. The data format is a gzip-compressed JSON file. We can parse the dataset to find sub-domains for a given domain. The dataset is massive, though (20+ GB compressed, 300+ GB uncompressed).
# Command to parse & extract sub-domains for a given domain
curl --silent https://scans.io/data/rapid7/sonar.fdns_v2/20170417-fdns.json.gz | pigz -dc | grep '\.icann\.org' | jq -r '.name'
Enumerating domains/sub-domains using the FDNS dataset
10. Content Security Policy (CSP) defines the Content-Security-Policy HTTP header, which allows you to create a whitelist of sources of trusted content and instructs the browser to only execute or render resources from those sources. So basically, the Content-Security-Policy header will list a bunch of sources (domains) that might be of interest to us as attackers. There are deprecated forms of the CSP header: X-Content-Security-Policy and X-WebKit-CSP. You can use the following curl command to extract the CSP header for a given URL:
curl -s -I -L "https://www.newyorker.com/" | grep -Ei '^Content-Security-Policy:' | sed "s/;/;\n/g"
There are several common locations to consider in order to identify frameworks or components:
HTTP headers
Cookies
HTML source code
Specific files and folders
File extensions
Error messages
HTTP Headers
The most basic form of identifying a web framework is to look at the X-Powered-By field in the HTTP response header. Many tools can be used to fingerprint a target; the simplest one is netcat.
Consider the following HTTP Request-Response:
$ nc 127.0.0.1 80
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Server: nginx/1.0.14
[...]
X-Powered-By: Mono
From the X-Powered-By field, we understand that the web application framework is likely to be Mono. However, although this approach is simple and quick, it doesn't work in 100% of cases. The X-Powered-By header can easily be disabled with a proper configuration. There are also several techniques that allow a web site to obfuscate HTTP headers (see an example in the Remediation section). In the example above we can also note that a specific version of nginx is being used to serve the content.
So in the same example the tester could either miss the X-Powered-By header or obtain an answer like the following:
HTTP/1.1 200 OK
Server: nginx/1.0.14
Date: Sat, 07 Sep 2013 08:19:15 GMT
Content-Type: text/html;charset=ISO-8859-1
Connection: close
Vary: Accept-Encoding
X-Powered-By: Blood, sweat and tears
Sometimes there are more HTTP headers that point at a certain framework. In the following example, according to the information in the HTTP response, one can see that the X-Powered-By header contains the PHP version. However, the X-Generator header points out that the framework actually used is Swiftlet, which helps a penetration tester to expand their attack vectors. When performing fingerprinting, carefully inspect every HTTP header for such leaks.
HTTP/1.1 200 OK
Server: nginx/1.4.1
Date: Sat, 07 Sep 2013 09:22:52 GMT
Content-Type: text/html
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.4.16-1~dotdeb.1
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Generator: Swiftlet
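A quick way to dump all response headers for this kind of inspection is curl (the URL below is just a placeholder):
# -D - writes the response headers to stdout, -o /dev/null discards the body
curl -s -D - -o /dev/null "https://www.example.com/"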
Cookies
Another similar and somewhat more reliable way to determine the current web framework is via framework-specific cookies.
Consider the following HTTP-request:
Figure 4.1.8–7: Cakephp HTTP Request
The cookie CAKEPHP has automatically been set, which gives information about the framework being used. A list of common cookie names is presented in Cookies section. Limitations still exist in relying on this identification mechanism - it is possible to change the name of cookies. For example, for the selected CakePHP framework this could be done via the following configuration (excerpt from core.php):
/**
 * The name of CakePHP's session cookie.
 *
 * Note the guidelines for Session names states: "The session name references
 * the session id in cookies and URLs. It should contain only alphanumeric
 * characters."
 * @link http://php.net/session_name
 */
Configure::write('Session.cookie', 'CAKEPHP');
However, these changes are less likely to be made than changes to the X-Powered-By header, so this approach to identification can be considered as more reliable.
HTML Source Code
This technique is based on finding certain patterns in the HTML page source code. Often one can find a lot of information which helps a tester to recognize a specific component. One of the common markers are HTML comments that directly lead to framework disclosure. More often certain framework-specific paths can be found, i.e. links to framework-specific CSS or JS folders. Finally, specific script variables might also point to a certain framework.
From the screenshot below one can easily learn the used framework and its version by the mentioned markers. The comment, specific paths and script variables can all help an attacker to quickly determine an instance of ZK framework.
Figure 4.1.8–2: ZK Framework HTML Source Sample
Frequently such information is positioned in the <head> section of HTTP responses, in <meta> tags, or at the end of the page. Nevertheless, entire responses should be analyzed since it can be useful for other purposes such as inspection of other useful comments and hidden fields. Sometimes, web developers do not care much about hiding information about the frameworks or components used. It is still possible to stumble upon something like this at the bottom of the page:
Figure 4.1.8–3: Banshee Bottom Page
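A rough way to grep for such markers without opening a browser (the URL is a placeholder and the keywords are just the generic ones discussed above):
# Fetch the page and search the HTML source for common framework markers
curl -s "https://www.example.com/" | grep -iE 'generator|powered by|built upon'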
There is another approach which greatly helps an attacker or tester to identify applications or components with high accuracy. Every web component has its own specific file and folder structure on the server. As noted, specific paths can sometimes be seen in the HTML page source, but even when they are not explicitly presented there, the files may still reside on the server.
In order to uncover them a technique known as forced browsing or "dirbusting" is used. Dirbusting is brute forcing a target with known folder and file names and monitoring HTTP responses to enumerate server content. This information can be used both for finding default files and attacking them, and for fingerprinting the web application. Dirbusting can be done in several ways; the example below shows a successful dirbusting attack against a WordPress-powered target with the help of a defined list and the Intruder functionality of Burp Suite.
Figure 4.1.8–4: Dirbusting with Burp
We can see that for some WordPress-specific folders (for instance, /wp-includes/, /wp-admin/ and /wp-content/) HTTP responses are 403 (Forbidden), 302 (Found, redirection to wp-login.php), and 200 (OK) respectively. This is a good indicator that the target is WordPress powered. The same way it is possible to dirbust different application plugin folders and their versions. In the screenshot below one can see a typical CHANGELOG file of a Drupal plugin, which provides information on the application being used and discloses a vulnerable plugin version.
Figure 4.1.8–5: Drupal Botcha Disclosure
Tip: before starting with dirbusting, check the robots.txt file first. Sometimes application specific folders and other sensitive information can be found there as well. An example of such a robots.txt file is presented on a screenshot below.
Figure 4.1.8–6: Robots Info Disclosure
Specific files and folders are different for each specific application. If the identified application or component is Open Source there may be value in setting up a temporary installation during penetration tests in order to gain a better understanding of what infrastructure or functionality is presented, and what files might be left on the server. However, several good file lists already exist; one good example is FuzzDB wordlists of predictable files/folders.
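As a quick manual sketch of the WordPress example above, a handful of framework-specific paths can be probed with curl and the status codes compared (the hostname is a placeholder):
# Print the HTTP status code for a few WordPress-specific folders
for path in wp-includes wp-admin wp-content; do
  printf '%s: ' "$path"
  curl -s -o /dev/null -w '%{http_code}\n' "https://www.example.com/$path/"
done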
File Extensions
URLs may include file extensions, which can also help to identify the web platform or technology.
For example, the OWASP wiki used PHP:
https://wiki.owasp.org/index.php?title=Fingerprint_Web_Application_Framework&action=edit&section=4
Here are some common web file extensions and associated technologies:
.php – PHP
.aspx – Microsoft ASP.NET
.jsp – Java Server Pages
Error Messages
As can be seen in the following screenshot the listed file system path points to use of WordPress (wp-content). Also testers should be aware that WordPress is PHP based (functions.php).
Figure 4.1.8–7: WordPress Parse Error
Framework: Cookie name
Zope: zope3
CakePHP: cakephp
Kohana: kohanasession
Laravel: laravel_session
phpBB: phpbb3_
WordPress: wp-settings
1C-Bitrix: BITRIX_
AMPcms: AMP
Django CMS: django
DotNetNuke: DotNetNukeAnonymous
e107: e107_tz
EPiServer: EPiTrace, EPiServer
Graffiti CMS: graffitibot
Hotaru CMS: hotaru_mobile
ImpressCMS: ICMSession
Indico: MAKACSESSION
InstantCMS: InstantCMS[logdate]
Kentico CMS: CMSPreferredCulture
MODx: SN4[12symb]
TYPO3: fe_typo_user
Dynamicweb: Dynamicweb
LEPTON: lep[some_numeric_value]+sessionid
Wix: Domain=.wix.com
VIVVO: VivvoSessionId
Application: Keyword
WordPress: <meta name="generator" content="WordPress 3.9.2" />
phpBB: <body id="phpbb"
MediaWiki: <meta name="generator" content="MediaWiki 1.21.9" />
Joomla: <meta name="generator" content="Joomla! - Open Source Content Management" />
Drupal: <meta name="Generator" content="Drupal 7 (http://drupal.org)" />
DotNetNuke: DNN Platform - http://www.dnnsoftware.com
General Markers
%framework_name%
powered by
built upon
running
Specific Markers
Framework: Keyword
Adobe ColdFusion: <!-- START headerTags.cfm
Microsoft ASP.NET: __VIEWSTATE
ZK: <!-- ZK
Business Catalyst: <!-- BC_OBNW -->
Indexhibit: ndxz-studio
A list of general and well-known tools is presented below. There are also a lot of other utilities, as well as framework-based fingerprinting tools.
Website: https://github.com/urbanadventurer/WhatWeb
Currently one of the best fingerprinting tools on the market. Included in a default Kali Linux build. Language: Ruby. Matches for fingerprinting are made with:
Text strings (case sensitive)
Regular expressions
Google Hack Database queries (limited set of keywords)
MD5 hashes
URL recognition
HTML tag patterns
Custom Ruby code for passive and aggressive operations
Sample output is presented in the screenshot below:
Figure 4.1.8–8: Whatweb Output sample
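A typical invocation looks something like this (the target URL is a placeholder; -a sets the aggression level):
# Fingerprint a site with WhatWeb at aggression level 3
whatweb -a 3 "https://www.example.com/"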
Website: https://www.wappalyzer.com/
Wappalyzer is available in multiple usage models, the most popular of which is likely the Firefox/Chrome extension. It works only on regular-expression matching and doesn't need anything other than the page to be loaded in the browser. It works completely at the browser level and gives results in the form of icons. Although it sometimes produces false positives, it is very handy for getting a notion of which technologies were used to build a target website immediately after browsing a page.
Sample output of a plug-in is presented on a screenshot below.
Figure 4.1.8–9: Wappalyzer Output for OWASP Website
A web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically operated by search engines for the purpose of web indexing.
Spidering is the process of traversing a website and recording its content. A web crawler (also known as a web spider or web robot) is a program or automated script which browses the World Wide Web in a methodical, automated manner. This process is called web crawling or spidering.
DirBuster is a multi-threaded Java application designed to brute force directory and file names on web/application servers. … The lists were generated from scratch, by crawling the Internet and collecting the directories and files that are actually used by developers!
dirbuster Usage Example
root@kali:~# dirbuster
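Note that running dirbuster with no arguments launches its Java GUI. If you prefer a purely command-line alternative, dirb can be pointed at one of the DirBuster wordlists (a hedged sketch; the target URL is a placeholder and the wordlist path is the one shipped with Kali, which may differ on other systems):
# Brute force directories with dirb using a DirBuster wordlist
dirb "https://www.example.com/" /usr/share/dirbuster/wordlists/directory-list-2.3-medium.txt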