Adem's Dev Journey

📜 RepoList - A tool to generate wordlists based on GitHub repositories

25 Nov 2023 | 4 mins read

Hello everyone, I am back with another tool. This time it is a tool that generates wordlists from GitHub repositories. I have named it Repolist.

It is a simple tool written in Python. The code is available on GitHub and the package is available on PyPI.

The story behind Repolist

I was pentesting a website and trying to brute-force its directories and files. The common wordlists from SecLists didn’t help much, so I thought of creating a custom wordlist.

I knew for a fact that the website used an open source e-commerce platform called PrestaShop for its backend. So I thought of creating a wordlist based on the files and directories of PrestaShop.

I didn’t want to manually copy the files and directories. So I thought of creating a tool that would do it for me.

I’m sure there are other tools that do the same thing. But I wanted to create my own tool just for fun. Python is not my primary language for development. So I thought it would be a good opportunity to use Python for this project.

What is Repolist?

Repolist is a tool that generates wordlists based on GitHub repositories. It uses the GitHub API to fetch the paths of all files and directories in a repository, then writes them out as a wordlist, one path per line.

To install Repolist, run the following command:

pip3 install repolist

To generate a wordlist, run the following command:

repolist -u "https://github.com/username/repository"
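The repository URL has to be split into an owner and a repository name before the GitHub API can be called. Here is a minimal sketch of how that could be done. The helper name `parse_repo_url` is hypothetical and this is not necessarily how Repolist does it internally.

```python
from urllib.parse import urlparse


def parse_repo_url(url):
    """Split a GitHub repository URL into (owner, repo).

    Hypothetical helper for illustration, not Repolist's actual code.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError("not a GitHub URL: {}".format(url))
    # The path looks like "/owner/repo", possibly with a trailing slash
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected https://github.com/<owner>/<repo>")
    owner, repo = parts[0], parts[1]
    # Clone URLs often end in ".git", which is not part of the repo name
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


owner, repo = parse_repo_url("https://github.com/WordPress/WordPress")
```

The two values can then be dropped into the API URL for the repository.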

Options

Arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     Github repository URL (required)
  -o OUTPUT, --output OUTPUT
                        Output file (optional)
  -b BRANCH, --branch BRANCH
                        Use a specific branch (optional)
  -t TOKEN, --token TOKEN
                        Github token (optional)
  -p PREFIX, --prefix PREFIX
                        Prefix (optional)
  -s SUFFIX, --suffix SUFFIX
                        Suffix (optional)
  -f, --files           Get only files (optional)
  -d, --directories     Get only directories (optional)
  -v, --verbose         Verbose mode (optional)
  --proxy PROXY         Proxy (optional)
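To give an idea of what the filtering and prefix/suffix options do, here is a rough sketch. It assumes the entries have the `path` and `type` keys returned by the GitHub Git Trees API, where files are `"blob"` and directories are `"tree"`. The function name `build_wordlist` and the exact way the prefix and suffix are joined are my assumptions for illustration, so the real tool may behave a bit differently.

```python
def build_wordlist(entries, prefix="", suffix="", only_files=False, only_dirs=False):
    """Turn tree entries into wordlist lines.

    entries: list of {"path": str, "type": "blob" | "tree" | ...} dicts.
    Hypothetical helper, shown only to illustrate the options.
    """
    words = []
    for entry in entries:
        if only_files and entry["type"] != "blob":
            continue  # keep files only
        if only_dirs and entry["type"] != "tree":
            continue  # keep directories only
        words.append("{}{}{}".format(prefix, entry["path"], suffix))
    return words


entries = [
    {"path": "admin", "type": "tree"},
    {"path": "admin/index.php", "type": "blob"},
]
print("\n".join(build_wordlist(entries, prefix="/", only_files=True)))
```

With a `/` prefix and the files-only filter, only `/admin/index.php` would be printed for the two sample entries.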

Combining Repolist with other tools

Using RepoList with tools like ffuf, httpx and gobuster can be very useful for penetration testing and bug bounty programs.

For example, you can use ffuf to bruteforce the files and directories of a website using the wordlist generated by Repolist.

repolist -u "https://github.com/WordPress/WordPress" | ffuf -u "http://example.com/FUZZ" -w -

If you have other tools in mind, please let me know in the comments below.

How I made Repolist

I used Python with Poetry to create Repolist. Poetry is fairly new to me, and it was a great experience: setup and dependency management are easy, and with a few commands I was able to create the project and publish it to PyPI. I will definitely use it for my future projects.

Argparse is used to parse the command line arguments. Requests is used to make the HTTP requests to the GitHub API.
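For those who have not used argparse before, here is a sketch of a parser that mirrors the options listed earlier. The flags and help texts come from the help output above. Defaults such as the empty prefix and suffix are my assumptions, and the real parser in Repolist may be set up differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="repolist")
    parser.add_argument("-u", "--url", required=True, help="Github repository URL")
    parser.add_argument("-o", "--output", help="Output file")
    parser.add_argument("-b", "--branch", help="Use a specific branch")
    parser.add_argument("-t", "--token", help="Github token")
    parser.add_argument("-p", "--prefix", default="", help="Prefix")  # default is assumed
    parser.add_argument("-s", "--suffix", default="", help="Suffix")  # default is assumed
    parser.add_argument("-f", "--files", action="store_true", help="Get only files")
    parser.add_argument("-d", "--directories", action="store_true",
                        help="Get only directories")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose mode")
    parser.add_argument("--proxy", help="Proxy")
    return parser


# Parse an explicit argument list instead of sys.argv, handy for trying it out
args = build_parser().parse_args(
    ["-u", "https://github.com/WordPress/WordPress", "-f", "-p", "/"]
)
print(args.url, args.files, args.prefix)
```

Passing a list to `parse_args` makes it easy to experiment without touching the real command line.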

The code behind Repolist

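The code is fairly simple. It calls the GitHub Git Trees endpoint with recursive=1, which returns the files and directories of a branch in a single response, and stores the path and type of each entry. The collected paths are then written out as the wordlist.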

Here is a small snippet of how it works:

    def _get_files_and_directories(self, username="", repo="", branch="main"):
        """
        Get files and directories from a repository (recursive)
        https://docs.github.com/en/rest/reference/git#trees
        """
        url = "https://api.github.com/repos/{}/{}/git/trees/{}?recursive=1".format(
            username, repo, branch)
        r = self._make_request(url) # add headers if token is specified
        if r.status_code == 200:
            # Each tree entry is a file ("blob") or a directory ("tree")
            for entry in r.json()["tree"]:
                self.repo_content.append({
                    "path": entry["path"],
                    "type": entry["type"]
                })
        else:
            # Log the HTTP status and response body, then stop with a non-zero exit code
            self._log_error(type=r.status_code, msg=r.text)
            raise SystemExit(1)

Using Poetry to publish to PyPI

Poetry makes it very easy to build and publish a package to PyPI. If you are new to Poetry, here is how you can do it:

poetry new repolist
cd repolist
poetry install
poetry build
poetry publish

You can read more about it here.

Rate limit and proxies

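The GitHub API has a rate limit: unauthenticated requests are capped at 60 per hour per IP address, while requests authenticated with a personal access token are allowed up to 5,000 per hour. That is why I added the --token and --proxy options. You can also use --branch to choose which branch the files and directories are taken from.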
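As an illustration, this is roughly how a token and a proxy can be passed to Requests. The helper name `build_request_options` and the 10 second timeout are my own choices for this sketch and are not taken from Repolist's source.

```python
def build_request_options(token=None, proxy=None):
    """Build keyword arguments for requests.get() (illustrative sketch)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests get a much higher rate limit
        headers["Authorization"] = "Bearer {}".format(token)
    options = {"headers": headers, "timeout": 10}  # timeout value is an assumption
    if proxy:
        # Route both plain and TLS traffic through the same proxy
        options["proxies"] = {"http": proxy, "https": proxy}
    return options


options = build_request_options(token="<your-token>", proxy="http://127.0.0.1:8080")
# With Requests installed, the call would then look like:
#   requests.get(url, **options)
```

Keeping this in one small function means the same headers and proxy settings are applied to every API call.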

Conclusion

If you read this far, thank you for reading. I hope you find RepoList useful. If you have any suggestions or feedback, please let me know in the comments below.