Wayback Machine — A way forward in finding bugs

Vuk Ivanovic

You have probably read those bug bounty write-ups where someone found leaks inside GitHub repos and the like. Those are great reads, but for some of us they are far harder to pull off in the real world. Still, there are always more ways to reach interesting endpoints, including leaked data such as API tokens that may still be valid.

The idea of using the Wayback Machine to look for old, interesting files is not a new one, and there is a tool that does it fairly well: https://github.com/tomnomnom/waybackurls
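For example, assuming waybackurls is installed and on your PATH (the domain below is just a placeholder), pulling every archived URL for a host is a one-liner:

echo target.com | waybackurls > wayback-urls.txt
wc -l wayback-urls.txt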

Now, while the tool above (and probably many other similar ones) gives you a list of links that the Wayback Machine has, I needed a way to quickly verify that those links still work. With a list of 5, maybe even 10 links, it is easy to do manually, but with a larger list it gets complicated. There is also the question of what to do when the links do respond, but (even though they have different page names) they all redirect to the main page or to some custom 404.
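For a handful of links, a manual check might look something like this minimal sketch (it just prints the status code next to each URL from the list created above); the scripts below replace this with ffuf:

# print the HTTP status code next to each archived URL
while read -r url; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
    echo "$code $url"
done < wayback-urls.txt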

Enter waywayback and its companion waywayback-ffuf. I did the naming per what sounded fun at the time and wrote the scripts for my own needs; they're definitely ugly, but they're mine :)

Also, they're not that complicated, so feel free to edit them (or should I say improve them) for your particular needs; merging them into one script is one option :)

You will need ffuf installed in order to fully utilize the scripts: https://github.com/ffuf/ffuf
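If you don't have it yet, one common way to install it is via the Go toolchain (or grab a release binary from the repo above):

go install github.com/ffuf/ffuf/v2@latest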

And, in order to run them inside any directory for better organizing, make symbolic links to waywayback and waywayback-ffuf somewhere on your PATH; otherwise I imagine you'll have to specify various paths inside the scripts.
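For example, assuming the scripts live in ~/tools and /usr/local/bin is on your PATH (adjust the paths to your own setup):

chmod +x ~/tools/waywayback ~/tools/waywayback-ffuf
sudo ln -s ~/tools/waywayback /usr/local/bin/waywayback
sudo ln -s ~/tools/waywayback-ffuf /usr/local/bin/waywayback-ffuf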

Script waywayback:

#!/bin/bash
# waywayback: pull archived URLs for a host from the Wayback Machine CDX API
# usage: waywayback https://x.target.com
DATED=$(date +'%m-%d-%Y')
prefix=$(cut -f1 -d':' <<< "$1")   # scheme, e.g. https
save=$(cut -f3 -d'/' <<< "$1")     # hostname, e.g. x.target.com
mkdir "$prefix-$save"
cd "$prefix-$save" || exit 1
# grab every archived URL for the host, collapsed by content digest
curl "http://web.archive.org/cdx/search/cdx?url=$save*&fl=original&collapse=digest" | sort -u > "$prefix-$save[$DATED].txt"
# blank out directory-style entries ending in /
sed -i "s/^http.*\/$//" "$prefix-$save[$DATED].txt"
# drop robots.txt and static assets (images, fonts, stylesheets)
grep -vE "robots\.txt$|robots\.txt\/$|\.gif$|\.jpg$|\.jpeg$|\.css$|\.tiff$|\.ico$|\.png$|\.svg$|\.ttf$|\.woff$|\.woff2$|\.eot$" "$prefix-$save[$DATED].txt" > "interesting-$prefix-$save.txt"
sort -u "interesting-$prefix-$save.txt" > "interesting-sort-$prefix-$save.txt"
wc -l "interesting-sort-$prefix-$save.txt"
read -p "Press enter to continue unless there are too many lines"
waywayback-ffuf "interesting-sort-$prefix-$save.txt"
rm "interesting-$prefix-$save.txt"

Script waywayback-ffuf:

#!/bin/bash
# waywayback-ffuf: turn the URL list into a path wordlist and fuzz the live host with ffuf
# strip scheme and host, keeping only the path portion of each URL
cut -d'/' -f4- "$1" > "$1-wordlist.txt"
prefix=$(cut -f4 -d'-' <<< "$1")
# recover the hostname from the input filename (interesting-sort-https-host.txt -> host)
save0=$(sed s/interesting-sort-//g <<< "$1")
save1=$(sed s/http//g <<< "$save0")
save2=$(sed s/^s-//1 <<< "$save1")
save=$(sed s/.txt//g <<< "$save2")
# fuzz the live site with the archived paths; X-Forwarded-For can sometimes bypass a WAF
ffuf -mc 200,204,301,302,307,401,403,405 -c -w "$1-wordlist.txt" -H "X-Forwarded-For: 127.0.0.1" -u "https://$save/FUZZ" -v -o "[ffwayback-$prefix-$save].csv" -of csv

usage: waywayback https://x.target.com

It will show how many lines/words are in the newly created wordlist; the idea, for me at least, is that I don't want to deal with anything above a certain number, but that depends on the target in question, whether I'm in the mood, etc.
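If the count is higher than you want to deal with, one option (not part of the scripts; the filename below is just an example) is to abort at the prompt, trim the list in place from inside the target directory, and run waywayback-ffuf on it manually, since that script derives the hostname from the filename:

head -n 50000 interesting-sort-https-x.target.com.txt > tmp.txt
mv tmp.txt interesting-sort-https-x.target.com.txt
waywayback-ffuf interesting-sort-https-x.target.com.txt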

Then, once it's done (assuming you decided to go with ffuf-ing through the wordlist):

- cd target

- cut -d',' -f2,5,6 *csv | grep -E ",200,|,405,|,302," | more

You get the point.

The basic idea behind those scripts:

- use curl to interact with the Wayback Machine CDX API and retrieve the data, while using sed and grep to keep only the hopefully interesting endpoints and turn them into a wordlist for ffuf (see the annotated example after this list)

- use wc -l to count the number of found links and prompt you on whether to run ffuf against the target with the newly created wordlist (ffuf has no issues with 100K links, and even going to 500K can be acceptable, but sometimes too much means something is not right with the target website, i.e. false positives or similar)

- use ffuf with the newly created wordlist to see what were false positives and what is interesting/boring; the -H "X-Forwarded-For: 127.0.0.1" header is added because it can sometimes bypass a WAF
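For reference, this is roughly what the CDX query in waywayback does, with the parameters spelled out (the domain is just a placeholder):

# url=target.com*   -> every archived URL under the host (prefix match)
# fl=original       -> return only the original URL column
# collapse=digest   -> collapse entries whose content digest is identical
curl "http://web.archive.org/cdx/search/cdx?url=target.com*&fl=original&collapse=digest" | sort -u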

Some tips to quickly go through csv results:

cut -d',' -f2,5,6 *csv | grep ",200," | more

The above is a good start. It shows just the columns for the URL, the status code, and the size of the page. If you see different URLs all having the same size, then just add | grep -v ",1234" where 1234 is the size in question, and if you find some other size repeated too many times for comfort, then add grep -vE ",1234|,4321" etc.
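One quick way to spot such a repeated size (just a suggestion on top of the above, assuming the same csv columns as before):

# count how often each response size appears; a size that dominates is likely a false positive
cut -d',' -f6 *csv | sort | uniq -c | sort -rn | head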

Interesting things to look for, beyond obviously curious endpoints: think API tokens/keys. While it is true that in some cases, well, in most cases, they tend to be expired and/or belong to a random low-level user, in some cases they may still be up and running. And they may belong to an admin, staff, or other company personnel. You get the picture of the potential there.
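A minimal sketch of that last step, assuming you fetch one of the endpoints that looked promising in the ffuf results and grep the response for key-looking strings (the endpoint and patterns here are purely illustrative, not exhaustive):

# pull an interesting endpoint and search it for credential-like strings
curl -s "https://x.target.com/old/config.js" | grep -iE "api[_-]?key|token|secret|passw"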
