Mastering/Understanding Crawler Indexing Before OSINT


Hi! It’s me, Jerry1319 (Md Hasan). After a long time, I am again writing a write-up/article, this time on the topic of OSINT analysis (via dorking).

Let’s jump straight to the topic.

While browsing the internet, you have probably seen confidential files (PDF, DOC, DOCX, XLSX, XLS) leaking publicly, e.g. Aadhaar cards turning up in a simple Google search, government site data leaking, payment invoices leaking, and so on.


The main reason behind all of these is unplanned indexing rules.

1. What is dorking, and how is data shown when we search (indexing)?

Dorking is an advanced search query method used to find information that is hidden or hard to reach via a normal search query.
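For illustration, here are a few common dork operators; example.com and the keywords are placeholders, not real targets:

```
# PDF files indexed on one site (example.com is a placeholder)
site:example.com filetype:pdf

# Pages whose URL contains "invoice"
site:example.com inurl:invoice

# Pages whose title contains "confidential"
site:example.com intitle:"confidential"
```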

Indexing is the process by which a crawler crawls the web and sends what it finds to the search engine’s servers, so that matching results can be shown when a user searches for a related query.
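As a rough sketch of the crawler side, here is a minimal Python example (standard library only; example.com and the URLs are placeholders) of how a well-behaved crawler consults robots.txt before fetching and indexing a page:

```python
from urllib import robotparser

# Load the site's robots.txt rules (example.com is a placeholder domain)
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A polite crawler asks robots.txt before fetching each URL
for url in ["https://example.com/blog/post-1",
            "https://example.com/private/invoices.pdf"]:
    if rp.can_fetch("Googlebot", url):
        print(f"ALLOWED : {url} -> crawl and send to the index")
    else:
        print(f"BLOCKED : {url} -> skip, do not index")
```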

2. How does indexing work, and what are its drawbacks when misconfigured?

Indexing of a website’s content is driven by the rules set up in the site’s robots.txt file, which tell crawlers what to index and what not to index. If those rules are not set up well, a crawler will crawl the whole site, including endpoints that contain sensitive information about customers, the company, PII data, and so on. When that happens, information disclosure and data leaks follow, causing heavy damage to the reputation of the company and harm to its users too.
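As a hypothetical illustration of such a misconfiguration, a robots.txt that allows everything leaves sensitive endpoints open to crawling (the paths below are invented for this example):

```
# Misconfigured robots.txt: every path is crawlable,
# including directories that hold sensitive documents
User-agent: *
Allow: /

# With no Disallow rules, paths like these may get indexed too:
#   /internal/invoices/
#   /uploads/customer-data.xlsx
# ...and can then surface through a simple dork such as:
#   site:example.com filetype:xlsx
```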

3. How do you set up a well-configured robots.txt to prevent sensitive data from being indexed?

First of all, create the robots.txt file, then clearly declare the crawling rules, i.e. what is allowed and what is not allowed, using the User-agent, Allow, Disallow, and Sitemap directives. If you want only a specific crawler to crawl the site, you can target that crawler’s User-agent too, as in the sketch below.
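A minimal sketch of such a file, assuming the sensitive content lives under invented paths like /admin/ and /internal/:

```
# Rules for all crawlers (paths are placeholders for this sketch)
User-agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /uploads/invoices/
Allow: /

# Extra rules for one specific crawler (here, Googlebot)
User-agent: Googlebot
Disallow: /staging/

# Point crawlers at the pages you DO want indexed
Sitemap: https://example.com/sitemap.xml
```

One caveat: robots.txt only asks well-behaved crawlers not to crawl those paths; the file itself is public and does not protect the data, so truly sensitive endpoints should also sit behind authentication.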


4. How do you remove the sensitive data/endpoints if a crawler still crawls them?

If, after all of this, a crawler still crawls the sensitive data, or has already crawled and indexed it due to the misconfigured indexing, you can request removal of the data from the search engine’s servers (for Google, via the Removals tool in Search Console), and you can send them a DMCA notice too.
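A quick way to verify whether the sensitive content is still indexed, before and after the removal request, is a site: dork (example.com and the paths are placeholders):

```
# Is anything from the sensitive path still indexed?
site:example.com inurl:/internal/

# Narrow it down to leaked document types
site:example.com filetype:pdf OR filetype:xlsx
```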

Note: most data leaks / dork results are caused by misconfigured indexing, and they harm and degrade the reputation of the company/organization too.

That’s all for today! Thank you for reading and supporting.
