A little bit on Content Discovery

A few thoughts on the phenomenon known as content discovery. Once again I ran into it in one of my projects and decided to write about it. The post will be of little use to those who already know how a security audit is conducted; for me, as a software developer, it was once a revelation. In information security, content discovery refers to the technique of searching for hidden resources, mainly in web applications.

I first encountered this phenomenon several years ago, when friends contacted me and said that their clients had begun to disappear. It was a small company: a sales department and a small one-page website where potential customers would leave their phone numbers. The sales managers would then call the customers back and close deals. So, from a certain point on, customers who came to the site were being snapped up by competitors before a manager even had a chance to call them.

At first, employees were the main suspects. Then I decided to take a look at the site itself. It turned out that the URL /clients (I don’t remember exactly) was available to anyone without any authentication, returning the list of customers who had submitted requests over the last few hours. The CRM polled this URL and added the new customers to its database.

At that time, it wasn’t clear to me how to locate this path on the site, because it did not appear anywhere – neither in sitemap.xml nor in robots.txt or anywhere else. I did not know anything about content discovery, so I gave it a thought and eventually dismissed it (the problem was already solved anyway). As it turned out, there are ways to search for such hidden resources. Let’s see what you can hide and how to find it.


Searching for URLs

The simplest and most common approach is to search for URLs. For this purpose there are dedicated tools (dirs3arch, dirb) and dictionaries; in this example we will use the dirs3arch utility.

dirs3arch usage example, directory bruteforce
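The core idea behind these tools can be sketched in a few lines of Python (the target URL and the wordlist here are illustrative assumptions; real dictionaries contain thousands of entries):

```python
import urllib.error
import urllib.request

# Hypothetical wordlist; real tools ship dictionaries with thousands of entries.
WORDLIST = ["admin", "backup", "clients", "config", ".git"]

def candidate_urls(base, words):
    """Build the list of URLs to probe from a base URL and a wordlist."""
    return [f"{base.rstrip('/')}/{w}" for w in words]

def probe(url, timeout=5):
    """Return the HTTP status for a URL, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
    except urllib.error.URLError:
        return None

if __name__ == "__main__":
    # The target is an assumption; only point this at hosts you may test.
    for url in candidate_urls("https://example.com", WORDLIST):
        status = probe(url)
        if status is not None and status != 404:
            print(status, url)
```

Anything that does not come back as 404 is worth a closer look – including 403, as the next section shows.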

I do not do audits professionally, but here is a small list of what I have been able to find this way (some of these I also saw in projects I was a part of):

  • All kinds of “secret” admin panels
  • Backup files
  • Configuration files
  • Service resources – I have repeatedly seen pages containing debugging information, home-grown monitoring, or even controls to perform some action – clear a queue, restart a task, etc. Once I even came across a whole scripting engine that let you reach into the internals of the application and run code. Someone had simply forgotten to turn it off.

Besides directly exposing resources, some dictionary entries are specific to certain products, which helps with fingerprinting. For example, if a request for .htaccess returns 403, the server is probably running Apache.
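This fingerprinting idea can be expressed as a tiny heuristic (the file-to-product mapping here is an illustrative assumption, not an exhaustive list):

```python
# Files whose mere presence hints at a specific server or product.
PRODUCT_FILES = {
    "/.htaccess": "Apache",
    "/.htpasswd": "Apache",
    "/web.config": "IIS",
}

def fingerprint(path, status):
    """A 403 on a product-specific file suggests the file exists but is
    protected, hinting at the corresponding server; a 404 tells us nothing."""
    if status == 403 and path in PRODUCT_FILES:
        return PRODUCT_FILES[path]
    return None
```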

Searching for subdomains

Let’s imagine that our company owns the “company.com” domain. You can have the main site on company.com and a bunch of subdomains, including non-public ones.

For example:

  • test.company.com – for testing
  • demo.company.com – for demo
  • stage.company.com – for staging, etc.

Such subdomains can be easily enumerated (again, there are ready-made dictionaries for this), brute-forcing them with a tool such as dnscan.

dnscan usage example, subdomain bruteforce
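At its core, subdomain brute-forcing is just resolving dictionary candidates; a minimal sketch (the domain and wordlist are assumptions):

```python
import socket

# Sample dictionary; real ones contain thousands of common subdomain names.
SUBDOMAIN_WORDS = ["test", "demo", "stage", "dev", "mail"]

def candidates(domain, words):
    """Build candidate hostnames from a base domain and a wordlist."""
    return [f"{w}.{domain}" for w in words]

def resolve(name):
    """Return the first IP for a hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None

if __name__ == "__main__":
    for host in candidates("company.com", SUBDOMAIN_WORDS):
        ip = resolve(host)
        if ip:
            print(host, ip)
```

Tools like dnscan add larger dictionaries, threading, and handling of wildcard DNS records on top of this basic loop.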

Certificate transparency

If the system uses SSL/TLS certificates, we can also check which subdomains those certificates were issued for – provided the certificate authority supports Certificate Transparency. To list the subdomains certificates were issued for, we can use the ctfr utility:

ctfr usage example, subdomain enumeration
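Certificate Transparency logs can also be queried by hand. A sketch using crt.sh, one public CT search service (its JSON output format, in particular the name_value field, is an assumption here; the domain is illustrative):

```python
import json
import urllib.parse
import urllib.request

def ct_log_url(domain):
    """Build a crt.sh query URL matching all certificates for *.domain."""
    q = urllib.parse.quote(f"%.{domain}")
    return f"https://crt.sh/?q={q}&output=json"

def extract_names(entries):
    """Collect unique hostnames from crt.sh JSON entries; name_value may
    hold several newline-separated names per certificate."""
    names = set()
    for e in entries:
        for name in e.get("name_value", "").splitlines():
            names.add(name.strip())
    return sorted(names)

if __name__ == "__main__":
    with urllib.request.urlopen(ct_log_url("company.com"), timeout=30) as resp:
        print("\n".join(extract_names(json.load(resp))))
```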


Another way to find subdomains is to ask the DNS server itself. In short, DNS servers can replicate their contents to one another; the zone transfer procedure exists precisely for that. To request a copy of a zone, you send the DNS server an AXFR query, and if the server is misconfigured, you get the full list of records. This trick may work if the company runs its own name servers:

dig -t AXFR domain.com @ns.domain.com


Searching for hidden query parameters

Sometimes during development it is convenient to disable extra checks for debugging and similar tasks. Such a switch can be an optional parameter of the HTTP request, hidden in cookies, headers, etc.

However, such things sometimes live on into the production environment. As a tool, you can use Burp Intruder. A couple of examples:

  • Hidden parameters – you can pass them as a form parameter, a JSON body field, etc. A vulnerable server-side check might look like this:
    if (request.param("isAdmin") == true) {
        // let's execute something unsafe
    }
  • Undocumented API methods – if the API has a POST method https://service/api/update/42, why not try delete in place of update, even if it is not explicitly mentioned in the documentation
  • etc.
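Probing for such parameters can be automated: append each candidate name to the request and watch for responses that differ from the baseline. A sketch (the parameter list and target URL are assumptions):

```python
import urllib.parse
import urllib.request

# Hypothetical parameter names often tried during an audit.
HIDDEN_PARAMS = ["debug", "test", "isAdmin", "admin"]

def with_param(url, name, value="true"):
    """Return a copy of url with one extra query parameter appended."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append((name, value))
    return urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(query)))

def probe_params(url):
    """Yield (param, status, body length) for each candidate parameter;
    a response that differs from the baseline in status or size is interesting."""
    for name in HIDDEN_PARAMS:
        try:
            with urllib.request.urlopen(with_param(url, name), timeout=5) as resp:
                yield name, resp.status, len(resp.read())
        except Exception:
            yield name, None, 0
```

Comparing each response’s status and length against a baseline request is roughly what Burp Intruder automates for you.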

The Uncatchable Joe

Let’s assume that we’re deploying a service on a known IP address (accessible from the Internet) and a non-standard port. It is only reachable via the IP address; no domains point to it. It seems that everything is in order: no one knows about this service, so no one will come. We act on the principle of the Uncatchable Joe – he’s out there, but no one has ever seen him, because no one cares to look for him. No such luck. There are services on the Internet, such as Shodan, whose sole purpose is to scan the entire Internet and collect banners and other useful information. Do an experiment – take a stock Apache, expose it to the Internet, and you will soon find a lot of interesting things in the access logs.

A story from my personal experience. A colleague of mine deployed a Linux server for some personal needs, moved the SSH port to a non-standard one, and left password authentication enabled instead of SSH keys. After a while, he discovered additional software on his server in the form of a cryptocurrency miner. You’ve probably also heard stories of huge, non-password-protected databases with personal data of Facebook users and other such things being discovered on the Internet.

Google Dorks

If your web resource was improperly configured and was visited by search engine robots, there is a chance that something has made it into the index that does not belong there. You can go to Google and search using special operators (so-called Google dorks), for example site:company.com filetype:sql or site:company.com inurl:admin. You can find some truly fascinating things.

To conclude:

  • If some hosts should not be externally visible – make sure that they are externally inaccessible.
  • If you have service resources on a web server, close them with authorization or place them in the admin panel.
  • Do not create hidden parameters, cookies, etc.
  • In addition to the content discovery methods listed above, there are other ways to search for hidden resources, so don’t even try to hide – if someone wants to find it, they will.
