I’ve taken an interest in the Open-Source Intelligence (OSINT) research space. My first article covered sources for “Tracking Malware and Ransomware Activity”, and I feel I owe some further effort to share my expertise today.
Scope
I’ve chosen my own local business website as the test target for this exercise, so if you browse the demonstrated target, all the information and descriptions shown are directly connected to the author of this article.
Justification
Why, you might ask? As far as I understand it, there are a few reasons that justify this kind of research:
- Business Research
- Marketing Research
- Mapping contacts for an organization
- Finding reasonable connections for agreements

🚨 DISCLAIMER: The activities involved in the depth of your research are something for which you accept liability and responsibility, so be sure to check the terms of service of the organizations and websites, as well as the legal requirements for data handling in the countries where you conduct such research.
Active Footprinting
I chose this method because I want to interact directly with the website in scope: crawl the webpages within the site, then use pattern matching to extract the data we need.
If your local research environment already has Go installed, go ahead and install Katana and Nuclei as well to follow along:
go install github.com/projectdiscovery/katana/cmd/katana@latest
go install -v github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
Katana is used for crawling the website in scope to reveal all internal URL paths while Nuclei is used with a pattern matching template for revealing emails.
I started with the email-extractor.yaml template for Nuclei. This first attempt has some limitations, so additional methods are demonstrated after it.
I saved the template to my local Linux environment with the following command:
wget https://raw.githubusercontent.com/projectdiscovery/nuclei-templates/refs/heads/main/http/miscellaneous/email-extractor.yaml
Crawling and Harvesting (Katana + Nuclei)
katana -u site.com | nuclei -t email-extractor.yaml
The template identified two (2) emails from scraping my site, but they did not belong to me, so I redacted them from the attached image. They seemed to belong to the developer of a plugin used on our site.
Maybe the anti-bot mechanism is preventing the discovery of the other email attached to the site.
Harvesting Emails with JavaScript
The anti-bot mechanism can be circumvented to some extent by using in-browser JavaScript from the browser’s developer tools.
First Method — Harvesting from the Current Page
Using the same pattern matching mechanism as the email-extractor.yaml Nuclei template, I ported it to JavaScript code while adding recognition for .dev emails as well.
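A minimal sketch of such a port might look like the following. The pattern is an approximation rather than the template's exact regex, and the name extractEmails and the list of top-level domains (including dev) are assumptions chosen for illustration, so adjust them to your own requirements.

```javascript
// Illustrative pattern (an assumption, not the template's exact regex):
// a local part, "@", a domain, and a top-level domain from a short allow-list.
// "dev" is included so that .dev addresses are recognized.
const EMAIL_PATTERN =
  /[a-zA-Z0-9._-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*\.(?:com|org|net|io|gov|co|dev)\b/g;

// Return the unique, lowercased email addresses found in a block of text.
function extractEmails(text) {
  const matches = text.match(EMAIL_PATTERN) || [];
  return [...new Set(matches.map((match) => match.toLowerCase()))];
}

// In a browser console, scan the HTML of the page that is currently open.
if (typeof document !== "undefined") {
  console.log(extractEmails(document.documentElement.innerHTML));
}
```

Restricting the top-level domains keeps asset names such as logo@2x.png out of the results, at the cost of missing addresses on domains that are not in the list.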
Pasting this into the browser’s developer tools console, with the target site open in the current tab, reveals some results.
The only problem with this method is that it only scrapes the current webpage. This ignores the fact that emails may be present on other paths of the website in scope.
Second Method — Harvesting from the Crawled Website
This method crawls the URL paths of the website in scope and then harvests emails, thereby capturing emails that may not have been accounted for in the first method.
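One way to sketch this idea is a small breadth-first crawler that stays on the same origin and applies the pattern to every page it fetches. The names harvestSite, extractLinks, and defaultFetchPage, the maxPages limit of 25, and the regex itself are assumptions for illustration, not a definitive implementation.

```javascript
// Illustrative pattern (an assumption, not the Nuclei template's exact regex).
const SITE_EMAIL_PATTERN =
  /[a-zA-Z0-9._-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*\.(?:com|org|net|io|gov|co|dev)\b/g;

// Collect same-origin links from href="..." attributes in raw HTML.
function extractLinks(html, pageUrl) {
  const origin = new URL(pageUrl).origin;
  const links = new Set();
  for (const [, href] of html.matchAll(/href\s*=\s*["']([^"']+)["']/gi)) {
    try {
      const url = new URL(href, pageUrl); // resolves relative paths
      url.hash = ""; // treat /page#a and /page#b as the same page
      if (url.origin === origin) links.add(url.href);
    } catch {
      // Ignore href values that are not valid URLs.
    }
  }
  return [...links];
}

// Fetch a page and return its body, or an empty string for non-HTML responses.
async function defaultFetchPage(url) {
  const response = await fetch(url);
  const type = response.headers.get("content-type") || "";
  return type.includes("text/html") ? response.text() : "";
}

// Crawl breadth-first from startUrl, visiting at most maxPages pages,
// and return the unique, lowercased emails found along the way.
async function harvestSite(startUrl, { fetchPage = defaultFetchPage, maxPages = 25 } = {}) {
  const queue = [new URL(startUrl).href];
  const visited = new Set();
  const emails = new Set();
  while (queue.length > 0 && visited.size < maxPages) {
    const pageUrl = queue.shift();
    if (visited.has(pageUrl)) continue;
    visited.add(pageUrl);
    let html;
    try {
      html = await fetchPage(pageUrl);
    } catch {
      continue; // Skip pages that fail to load.
    }
    for (const match of html.match(SITE_EMAIL_PATTERN) || []) {
      emails.add(match.toLowerCase());
    }
    for (const link of extractLinks(html, pageUrl)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return [...emails];
}

// In a browser console, crawl the site that is currently open.
if (typeof document !== "undefined") {
  harvestSite(location.origin).then((emails) => console.log(emails));
}
```

Because the crawl runs from the page's own origin, the requests are same-origin and carry the browser's existing session. The maxPages cap keeps the crawl small and polite, so raise it only for sites you are permitted to test.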
The console first shows a pending Promise which, once fulfilled, returns the emails discovered by the specified pattern matching.
Remember to be aware of any data privacy or data handling laws and compliance requirements that may apply to this kind of research, to ensure the safety of yourself and the contacts involved.
Happy Researching!