In my previous post, Reconnaissance A Google-Dorking Affair, I outlined the basics of Google Dorking and the value it can return from an operational security, bug bounty or penetration testing perspective. Today I am going to expand on dorking by diving into the world of GitDorks!
GitDorks operate the same way as Google Dorks, but instead of leveraging Google keywords for searching we use GitHub qualifiers. As GitHub does a fantastic job of indexing the repositories, projects and code on its platform with a structured query model, we can uncover new endpoints, potentially disclosed secrets and much more!
Note: Whilst Git Dorking itself is legal, the abuse of information, applications or systems identified without consent of the target is not.
Very rarely do you want to search for every instance of the word “Password” on GitHub. Instead you may have a particular organisation you are researching or a repository you are testing, in which case understanding the GitHub qualifiers available and when to use them will help narrow your focus to the targets of value.
The GitHub Help Documentation has more details, including the full list of qualifiers. I am going to explore some of the more interesting ones here:
Organisation, User or Repository
The organisation (org), user (user) and repository (repo) qualifiers are the most effective way of focusing your search on your approved scope. Each is written as the qualifier name, a colon separator, then the search term, for example:
    org:github
    repo:github/docs
    user:octocat
None of these support wildcard or regular expression matching, so you need to include the full name. When more than one is used in a search GitHub will automatically apply AND logic, or we can use OR if we prefer:
    org:github repo:github/docs user:octocat
    org:github repo:github/docs OR user:octocat
These two searches will return different results.
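As a rough illustration of the scoping qualifiers above, here is a minimal Python sketch that builds the two example queries and turns them into search URLs you can paste into a browser. The `scope` and `search_url` helper names are my own for illustration, and the sketch only builds strings; it does not contact GitHub.

```python
from urllib.parse import urlencode


def scope(kind: str, name: str) -> str:
    """Return a scoping qualifier such as org:github."""
    if kind not in {"org", "user", "repo"}:
        raise ValueError(f"unsupported scope qualifier: {kind}")
    return f"{kind}:{name}"


def search_url(query: str) -> str:
    """Build a GitHub code search URL for the given query string."""
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


# Qualifiers separated by spaces are combined with AND.
and_query = " ".join(
    [scope("org", "github"), scope("repo", "github/docs"), scope("user", "octocat")]
)

# An explicit OR widens the search to either side of the operator.
or_query = (
    f'{scope("org", "github")} {scope("repo", "github/docs")} '
    f'OR {scope("user", "octocat")}'
)

print(search_url(and_query))
print(search_url(or_query))
```

Keeping the qualifier names in one place makes it harder to mistype a scope and accidentally search outside what you have been approved to test.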
Language
It is not uncommon to want to search for a specific language on GitHub. A full list of their supported languages can be found in their languages.yml document.
To filter with the language qualifier you can use
    language:ruby
There may be several common languages you are interested in, and they can be grouped with brackets such as
    (language:ruby OR language:csharp)
When brackets are used the OR is only applied to the items within the brackets, not the entire search.
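The bracket grouping described above is easy to get wrong when typing long queries by hand, so a small helper can build it for you. This is a minimal sketch; `any_of` is a hypothetical name of my own, and the `org:github` scope is only there to show the group sitting alongside another qualifier.

```python
from typing import Sequence


def any_of(qualifier: str, values: Sequence[str]) -> str:
    """Group several values of one qualifier into a bracketed OR clause."""
    if not values:
        raise ValueError("at least one value is required")
    terms = [f"{qualifier}:{value}" for value in values]
    # A single term needs no brackets; several terms are wrapped so the OR
    # stays inside the group and does not spill into the rest of the query.
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


query = f'org:github {any_of("language", ["ruby", "csharp"])}'
print(query)  # org:github (language:ruby OR language:csharp)
```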
Path
Path (path) can now be used as a replacement for the deprecated Filename qualifier, which you may still see referenced in some tools or scripts. Path can be used with Regular Expressions (RegEx) to check for filenames, search interesting directories or provide an absolute path to search.
A basic path search, which will return everything with “databases” in the path name, would be
    path:databases
This will return both software/databases/mysql and software/setup/databases.yml as both contain the string databases.
If we want to be more specific with our path searches we can use regex to include or exclude given filenames or extensions
    path:/(^|\/)setup\.php$/
This will return all files called setup.php. Whilst the query may look confusing it can be broken down as:
path:/ - Path qualifier, then a forward slash to start the regex
(^|\/) - Either the start of the path (^) or a /
setup\.php - Filename setup.php, with the . escaped by a \
$/ - End of the string, then a forward slash to close the regex
Path can be further expanded to include the wildcard (*) symbol, which matches any run of characters other than the / directory separator. For example
    path:*.txt - Returns 102 million results, any .txt file
    path:/src/*.txt - Returns 119 thousand results, as only .txt files directly inside a src directory are matched
Now that we have the building blocks of our queries we can start to formulate searches which may return high value findings. My example here is going to focus very specifically on Google API Keys; however, the process would be the same for any target.
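Before running a path regex like the setup.php one above, it can be sanity-checked locally against a few sample paths. The sketch below uses Python's `re` module and assumes it behaves the same as GitHub's regex engine for a pattern this simple; the sample paths are made up for illustration.

```python
import re

# The body of the GitHub path regex without the surrounding / delimiters.
# Python does not need the / escaped, so \/ becomes a plain /.
SETUP_PHP = re.compile(r"(^|/)setup\.php$")

sample_paths = [
    "setup.php",            # file at the repository root
    "admin/setup.php",      # file inside a directory
    "admin/setup.php.bak",  # extra extension, rejected by $
    "admin/mysetup.php",    # different filename, rejected by (^|/)
    "admin/setupXphp",      # no literal dot, rejected by \.
]

matches = [path for path in sample_paths if SETUP_PHP.search(path)]
print(matches)  # ['setup.php', 'admin/setup.php']
```

Checking the rejected cases is as useful as checking the matches, since an unescaped . or a missing $ is what usually lets false positives through.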
Firstly, we need to understand the structure we are looking for and a quick search for “Google API Key” returns the following structure
    AIza[0-9A-Za-z-_]{35}
This means the key starts with the letters AIza, followed by 35 characters made up of letters, numbers, hyphens or underscores. We can also assume that the word Google should be within the file as well, so let's start with a basic search:
    (AIza AND Google)
This returns files containing both AIza and Google. It returned over 1 million results and, as you can see from the screenshot, it did find API keys. However, we probably want to refine what we are looking for to reduce false positives.
The next step is to include common keywords that are linked to API keys, such as access_token and api_key.
    (access_key OR secret_key OR access_token OR api_key OR apikey OR api_secret OR apiSecret OR app_secret OR application_key OR app_key OR appkey OR auth_token OR authsecret) AND (AIza AND Google)
This reduces our results to 492,000 from the original 1 million plus. To refine this further we can look for specific files which may hold keys, rather than functions which might not:
    (path:*.xml OR path:*.json OR path:*.properties OR path:*.sql OR path:*.txt OR path:*.log OR path:*.tmp OR path:*.backup OR path:*.bak OR path:*.enc OR path:*.yml OR path:*.yaml OR path:*.toml OR path:*.ini OR path:*.config OR path:*.conf OR path:*.cfg OR path:*.env OR path:*.envrc OR path:*.prod OR path:*.secret OR path:*.private OR path:*.key)
By adding the common file extensions we return just over 149,000 results.
Up to this point we have not used an Organisation, Repository or User filter and have instead been searching all of GitHub. If you were doing this as part of an assessment on a company you would want to add the org: qualifier to the search to further refine your results. For example, adding org:teslamotors to the search returned 0 results (this search found no Google API keys in that organisation on GitHub).
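The three refinement steps above (keywords, key markers, file extensions, plus an optional organisation) can be assembled programmatically so the same dork is easy to reuse against different targets. This is a minimal sketch with shortened lists: `build_dork` and `or_group` are hypothetical helpers, `example-org` is a placeholder, and joining the path group to the rest of the query with AND is my assumption about how the clauses are combined.

```python
from typing import Optional, Sequence


def or_group(terms: Sequence[str]) -> str:
    """Wrap terms in brackets and join them with OR."""
    return "(" + " OR ".join(terms) + ")"


def build_dork(
    keywords: Sequence[str],
    markers: Sequence[str],
    extensions: Sequence[str],
    org: Optional[str] = None,
) -> str:
    """Combine keyword, marker and path clauses into one search query."""
    clauses = [
        or_group(keywords),                                 # any one keyword
        "(" + " AND ".join(markers) + ")",                  # every marker
        or_group([f"path:*.{ext}" for ext in extensions]),  # any one extension
    ]
    query = " AND ".join(clauses)
    if org:
        # Scope the search to a single organisation that is in scope.
        query += f" org:{org}"
    return query


dork = build_dork(
    keywords=["api_key", "apikey", "access_token"],
    markers=["AIza", "Google"],
    extensions=["json", "yml", "env"],
    org="example-org",
)
print(dork)
```

Swapping the marker list for another provider's key prefix reuses the same structure, which is the point of the process being the same for any target.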
Option 2 Regex
As we know the structure of the Google API key we can use GitHub RegEx to find matching strings. The limitation here is that RegEx is only supported in Code results and not in other result types.
    /AIza[0-9A-Za-z-_]{35}/
This returns 973,000 results, and adding Google as an additional term returns 877,000.
It is then down to you as the researcher to determine why there is such a difference, which is the most accurate search, or if you could combine them both.
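One way to compare the two approaches is to run the same key pattern locally over whatever text a search returns and see which hits actually have the right shape. The sketch below is a minimal example using Python's `re` module; `find_candidate_keys` is a hypothetical helper, and the key in the snippet is an obvious placeholder rather than a real credential.

```python
import re
from typing import List

# Same shape as AIza[0-9A-Za-z-_]{35}; the hyphen is moved to the end of the
# character class so it is unambiguously a literal hyphen in Python.
GOOGLE_KEY = re.compile(r"AIza[0-9A-Za-z_-]{35}")


def find_candidate_keys(text: str) -> List[str]:
    """Return every substring that has the shape of a Google API key."""
    return GOOGLE_KEY.findall(text)


fake_key = "AIza" + "x" * 35  # placeholder with the right shape, not a real key
snippet = (
    f'google_api_key = "{fake_key}"\n'
    'note = "AIza is only a prefix here"\n'
)

print(find_candidate_keys(snippet))
```

A shape match only tells you the string could be a key. Whether it is live, restricted or already revoked still has to be established within your approved scope.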
This article has covered the basics of GitHub qualifiers and how to identify and build search terms, but if you are trying to GitDork at scale then automation really can help. There are a couple of projects I would like to highlight for this:
GitDorker (GitHub Link) - This tool allows you to run a list of defined Dorks against either the whole of GitHub or a specific organisation. You can find the example Dorks in /Dorks and I suggest fully reviewing these to understand how loose some of the searches really are. (Note: You will need to update the Python code to limit it to 8 requests per minute for the personal access token, down from the tool's original 30.)
RegEx Tokens (GitHub Link) - This project provides RegEx examples for different tokens from major providers such as Google, Facebook, Stripe and many more.
Bug Bounty Wordlists (GitHub Link) - A list of 533 potential GitDorks to use. Note some of these include the Filename qualifier, which is no longer used.
This introduction to GitDorking was written with the intention of opening up a new world of reconnaissance activities, outlining the steps by which you can build your own dorks and sharing some of the tools available for automating these searches.
The manual vs automated approach is a decision you will need to make for yourself. Personally, I spend time manually building high quality dorks, then use the automated tools to perform wide scale searching. Some prefer default lists and some prefer the manual approach, but as with everything in this field the choice is yours.
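If you script your own wide scale searching, the request rate needs the same care as the GitDorker note above describes. The following is a minimal sketch of a throttle, assuming the 8 requests per minute figure quoted earlier. The `throttled` helper is hypothetical and is not taken from GitDorker, and the demo uses a fake clock so it runs instantly instead of really sleeping.

```python
import time
from typing import Callable, Iterable, Iterator, List


def throttled(
    items: Iterable[str],
    per_minute: int = 8,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items evenly spaced so that at most per_minute are released per minute."""
    interval = 60.0 / per_minute
    next_allowed = clock()
    for item in items:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        # Schedule the next release one interval after this one.
        next_allowed = max(next_allowed, clock()) + interval
        yield item


# Demo with a fake clock; in real use, leave clock and sleep at their defaults.
fake_now = [0.0]
waits: List[float] = []


def fake_sleep(seconds: float) -> None:
    waits.append(seconds)
    fake_now[0] += seconds


dorks = ["org:example-org AIza", "org:example-org api_key", "org:example-org apikey"]
seen: List[str] = []
for dork in throttled(dorks, per_minute=8, clock=lambda: fake_now[0], sleep=fake_sleep):
    seen.append(dork)  # a real script would send the search request here

print(waits)  # [7.5, 7.5]
```

Spacing requests evenly is gentler than sending a burst and then pausing, and the rate is a single parameter if the limit for your token differs.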