A Python script that scrapes GitHub for my passwords and secrets.

cyber security tools coding. Feel Happy

Today I am showing you how you can make your own GitHub scraper in Python.

It is generally not a good idea to store passwords or secrets in a publicly accessible place such as a GitHub repository, as anyone with access to the repository would be able to view them. Instead, it is better to use a secure password manager to store and manage your passwords, or to use a secure storage solution such as a password-protected encrypted file.
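As a small illustration of keeping a secret out of the repository, the sketch below reads a GitHub token from an environment variable instead of hardcoding it in the script. The variable name GITHUB_TOKEN and the helper load_token are assumptions made for this example; they are not part of the original script.

```python
import os


def load_token(env_var="GITHUB_TOKEN"):
    """Return the token stored in the given environment variable, or None.

    The variable name is an assumption for this sketch; use whatever name
    your shell or CI system actually exposes.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


if __name__ == "__main__":
    token = load_token()
    print("token found" if token else "no token set")
```

The value returned here could be passed wherever the examples in this article use "YOUR_ACCESS_TOKEN".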

If you still want to scrape your own repository for passwords or secrets, you can use the GitHub API to access the repository and search through its files. Here is an example of how you could do this using the Python library “PyGithub”:

import github

# Connect to GitHub using an access token
gh = github.Github("YOUR_ACCESS_TOKEN")

# Search the repository's files for the term "password".
# PyGithub exposes code search on the Github client (not on the Repository
# object), so the repository is selected with the "repo:" qualifier.
results = gh.search_code("password repo:USERNAME/REPO_NAME")

# Print the path and contents of each matching file
for result in results:
    print(result.path)
    print(result.decoded_content.decode("utf-8", errors="replace"))

Keep in mind that this searches only the file contents, not the file names. To match file names as well, you can modify the search query (GitHub's code search accepts an “in:path” qualifier) or add code that checks the names yourself.
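One way to cover file names as well is to build the query string with GitHub's “in:” qualifier. The sketch below is a minimal helper under that assumption; the function name build_code_query and its parameters are hypothetical and only serve the illustration.

```python
def build_code_query(term, repo=None, match_in=("file",)):
    """Build a GitHub code-search query string.

    match_in may contain "file" (file contents) and/or "path"
    (file and directory names). This helper is illustrative only.
    """
    allowed = {"file", "path"}
    unknown = set(match_in) - allowed
    if unknown:
        raise ValueError(f"unsupported in: qualifier(s): {sorted(unknown)}")

    parts = [term, "in:" + ",".join(match_in)]
    if repo:
        # Restrict the search to a single repository
        parts.append(f"repo:{repo}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_code_query("password", repo="USERNAME/REPO_NAME",
                           match_in=("file", "path")))
```

The resulting string can be used as the search query in place of the plain term "password".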

To access a repository on GitHub without using an access token, you can use anonymous access. However, this will only allow you to read the repository’s public content, and you will not be able to make any changes or access any private content.

Here is an example of how you could use the PyGithub library to search for the term “password” in the public files of a repository:

import github

# Connect to GitHub anonymously
gh = github.Github()

# Get the repository
repo = gh.get_repo("USERNAME/REPO_NAME")

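# GitHub's code search requires authentication, so an anonymous client
# walks the repository's public files and checks each one itself.
# Anonymous API access is limited to a small number of requests per hour.
contents = repo.get_contents("")
while contents:
    item = contents.pop(0)
    if item.type == "dir":
        # Queue the directory's entries for inspection
        contents.extend(repo.get_contents(item.path))
        continue
    if item.type != "file":
        continue  # skip symlinks and submodules
    text = item.decoded_content.decode("utf-8", errors="replace")
    if "password" in text:
        # Print the path and contents of each matching file
        print(item.path)
        print(text)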

As before, this only looks at the file contents, not the file names.

To search through multiple repositories on GitHub for specific keywords, you can use the GitHub Search API. This API allows you to search through the public content of all GitHub repositories, as long as you respect their rate limits.

Here is an example of how you could use the GitHub Search API and the Python requests library to search for the term “password” in the public files of all GitHub repositories:

import requests

# Set the search query and the access token
query = "password"
token = "YOUR_ACCESS_TOKEN"

# Set the headers for the request
headers = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"token {token}",
}

# Set the parameters for the request.
# The endpoint itself selects code search, so no "type" parameter is needed.
params = {
    "q": query,
}

# Send the request and stop early on an HTTP error (for example 401 or 403)
response = requests.get("https://api.github.com/search/code", headers=headers, params=params)
response.raise_for_status()

# Get the search results
results = response.json()["items"]

# Print the repository names and file paths of the search results
for result in results:
    print(result["repository"]["full_name"])
    print(result["path"])

This will search through the file contents of all public repositories for the term “password”. You can modify the search query to search for different terms, or use the other parameters of the Search API to narrow down the results. Keep in mind that the code search endpoint requires authentication and is limited to 10 requests per minute (the other search endpoints allow 30 requests per minute for authenticated requests and 10 for anonymous ones). You may need to add rate limiting to your code to avoid going over the limit.
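Besides a fixed delay, one way to stay under the limit is to read the rate-limit headers that GitHub returns with each API response. The sketch below assumes the X-RateLimit-Remaining and X-RateLimit-Reset headers are present; the helper name seconds_until_reset is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, in seconds.

    Reads the X-RateLimit-Remaining and X-RateLimit-Reset response headers.
    Returns 0.0 when requests remain or when the headers are missing.
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers.get("X-RateLimit-Remaining", 1))
        reset_at = int(headers.get("X-RateLimit-Reset", 0))
    except (TypeError, ValueError):
        # Unparseable headers: do not block the caller
        return 0.0
    if remaining > 0:
        return 0.0
    # The reset header is a Unix timestamp; never return a negative wait
    return max(0.0, reset_at - now)
```

With the requests library, this could be called as time.sleep(seconds_until_reset(response.headers)) before sending the next request.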

Now:

To make the previous code into a command-line application that supports arguments and rate limiting, you can use the Python library “argparse” to parse the arguments and the library “time” to add a delay between requests to stay within the rate limits of the GitHub Search API.

Why?

Some websites block traffic when too many requests arrive at once, which is why I chose to add rate limiting to my script.

Here is an example of how you could do this:

import argparse
import requests
import time

# Set the rate limit for the code search endpoint
RATE_LIMIT = 10  # requests per minute; code search requires a token
DELAY = 60 / RATE_LIMIT  # delay between requests in seconds

# Parse the command-line arguments
parser = argparse.ArgumentParser(description="Search GitHub repositories for a specific term")
parser.add_argument("term", type=str, help="the search term")
parser.add_argument("-t", "--token", type=str, help="a GitHub access token")
args = parser.parse_args()

# Set the search query and the access token
query = args.term
token = args.token

# Set the headers for the request
headers = {
    "Accept": "application/vnd.github+json",
}
if token:
    headers["Authorization"] = f"token {token}"

# Set the parameters for the request.
# The endpoint itself selects code search, so no "type" parameter is needed.
params = {
    "q": query,
}

# Initialize the page number and the search results
page = 1
results = []

# Keep sending requests until there are no more search results
while True:
    # Set the page parameter for the request
    params["page"] = page

    # Send the request
    response = requests.get("https://api.github.com/search/code", headers=headers, params=params)

    # Stop on an error response (missing token, rate limit hit, or paging
    # past the 1,000 results the Search API exposes per query)
    if response.status_code != 200:
        print(f"Request failed with status {response.status_code}: {response.text}")
        break

    # Get the current search results and the total number of search results
    data = response.json()
    current_results = data["items"]
    total_results = data["total_count"]

    # Add the current search results to the total search results
    results += current_results

    # Print the repository names and file paths of the current search results
    for result in current_results:
        print(result["repository"]["full_name"])
        print(result["path"])

    # Break the loop if there are no more search results
    if not current_results or len(results) >= total_results:
        break

    # Wait before sending the next request to stay within the rate limit
    time.sleep(DELAY)

    # Increment the page number
    page += 1
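The script above only lists where the search term appears. As a possible next step, the sketch below scans a file's text for lines that look like an assigned secret. The pattern list is illustrative and far from complete, and the names SECRET_PATTERN and find_secret_lines are hypothetical, not taken from the original project.

```python
import re

# Illustrative pattern only; real secret scanners use much larger rule sets.
SECRET_PATTERN = re.compile(
    r"""
    (password|passwd|secret|api[_-]?key|token)   # a suspicious name
    \s*[:=]\s*                                   # assignment or key/value separator
    ["']?([^\s"']{4,})                           # the value, at least 4 characters
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_secret_lines(text):
    """Return (line_number, keyword, line) tuples for suspicious lines."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1).lower(), line.strip()))
    return hits


if __name__ == "__main__":
    sample = 'user = "bob"\nDB_PASSWORD = "hunter22"\n'
    for number, keyword, line in find_secret_lines(sample):
        print(f"line {number}: possible {keyword}: {line}")
```

The text passed in would be the decoded contents of a file you have fetched from your own repository.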

Follow me here for more.

Check the full source code on GitHub: https://github.com/WarrenMu/Gold-digger
