Every web application penetration test should begin with an environment survey and a thorough mapping of the tested application. During mapping, you should reveal not only all the resources that the application links to or retrieves, but also resources that sit in a publicly accessible part of the website with no link pointing to them. Examples include web administration interfaces and backups.
Most of you have probably encountered the dirb or dirbuster tools in your practice. The luckier among you have come across dirsearch. All these tools have one thing in common: they map web applications and reveal resources that are stored in a publicly accessible part of the website but have no hyperlinks leading to them. In practice, they are used to search for configuration files, log files, files intended for inclusion, or backup files, for example.
PenterepTools – ptwebdiscover
We would like to introduce our ptwebdiscover tool, which, like the tools mentioned above, is used to reveal web application resources. Unlike them, however, ptwebdiscover has many features that you would look for in vain elsewhere. So let’s take a closer look at the individual features that this tool offers.
Searching for resources using a dictionary
One of the basic features of ptwebdiscover is searching for resources using dictionaries. You can select the desired dictionary with the -w, --wordlist switch.
ptwebdiscover -u https://www.penterepmail.loc -w /usr/share/wordlists/????
You can further modify individual terms from the dictionary using the -ch, --charset switch, which converts dictionary expressions to lowercase, uppercase, capitalize, or hybrid. You can use more than one option at a time.
ptwebdiscover -u https://www.penterepmail.loc -w /usr/share/wordlists/???? -ch lowercase uppercase capitalize
The hybrid option takes unique values from the dictionary (e.g., readme.txt) and tests the following case options: readme.txt, Readme.txt, README.txt, and README.TXT.
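To make the hybrid option more tangible, here is a minimal Python sketch that reproduces the four case variants listed above. The function name hybrid_variants is hypothetical, and the handling of words without an extension is our assumption; this is not how ptwebdiscover is necessarily implemented.

```python
def hybrid_variants(word: str) -> list[str]:
    """Return the case variants described for the hybrid option, without duplicates."""
    name, dot, ext = word.rpartition(".")
    if not dot:  # no extension: treat the whole word as the name (assumption)
        name, ext = word, ""
    candidates = [
        word.lower(),                           # readme.txt
        name.capitalize() + dot + ext.lower(),  # Readme.txt
        name.upper() + dot + ext.lower(),       # README.txt
        word.upper(),                           # README.TXT
    ]
    # dict.fromkeys keeps first-seen order while dropping duplicates
    return list(dict.fromkeys(candidates))


print(hybrid_variants("readme.txt"))
```

For a word without an extension, such as admin, several variants coincide, so the sketch returns only the distinct ones.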
If you know what characters the resource you are looking for begins with, you can use the -bw, --begin-with switch to filter the dictionary to include only expressions that begin with a specific sequence of characters.
ptwebdiscover -u https://www.penterepmail.loc -w /usr/share/wordlists/???? -bw ad
Using the -lm, --length-min and -lx, --length-max switches, you can further restrict the dictionary to expressions whose length falls within the specified range.
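The dictionary filters described above boil down to a simple predicate on each word. The following Python sketch illustrates the idea; filter_words and its parameter names are hypothetical, and the default bounds of 1 and 99 follow the defaults listed in the switch table at the end of this article.

```python
def filter_words(words, begin_with="", length_min=1, length_max=99):
    """Keep words that start with `begin_with` and whose length is within the bounds."""
    return [
        w for w in words
        if w.startswith(begin_with) and length_min <= len(w) <= length_max
    ]


words = ["admin", "administrator", "ads", "backup", "ad"]
print(filter_words(words, begin_with="ad", length_min=3, length_max=5))
```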
Searching for resources by brute force
If you would like to search for resources by brute force, just omit the -w switch.
ptwebdiscover -u https://www.penterepmail.loc
You can select the character set to be used with the -ch, --charset switch. You can choose from the predefined sets lowercase, uppercase, and numbers. You can also define your own character set, for example [abCD15_~].
ptwebdiscover -u https://www.penterepmail.loc -ch lowercase uppercase [15_]
You can control the length of the generated strings with the -lm, --length-min and -lx, --length-max switches.
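Conceptually, brute-force discovery enumerates every string over the chosen character set within the length bounds. Here is a rough Python sketch of that enumeration; build_alphabet, brute_force, and the CHARSETS mapping are hypothetical names, and the way predefined and custom sets are merged is an assumption rather than the tool's actual code.

```python
from itertools import product
import string

# Predefined sets named in the article; the mapping itself is an assumption.
CHARSETS = {
    "lowercase": string.ascii_lowercase,
    "uppercase": string.ascii_uppercase,
    "numbers": string.digits,
}


def build_alphabet(specs):
    """Merge predefined sets and custom [..] sets into one ordered alphabet."""
    chars = []
    for spec in specs:
        if spec.startswith("[") and spec.endswith("]"):
            chars.extend(spec[1:-1])      # custom set such as [15_]
        else:
            chars.extend(CHARSETS[spec])  # predefined set
    return "".join(dict.fromkeys(chars))  # drop duplicates, keep order


def brute_force(alphabet, length_min, length_max):
    """Yield every string over `alphabet` with length_min <= length <= length_max."""
    for length in range(length_min, length_max + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


alphabet = build_alphabet(["numbers", "[_]"])
print(list(brute_force("ab", 1, 2)))
print(sum(1 for _ in brute_force(alphabet, 1, 2)))
```

The search space grows quickly: 26 lowercase letters with lengths from one to five already give over 12 million candidates, which is why the length bounds matter. In ptwebdiscover itself, limiting candidates to two through five characters looks like this: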
ptwebdiscover -u https://www.penterepmail.loc -lm 2 -lx 5
Working with file extensions
Whether you search using a dictionary or by brute force, you will usually benefit from adding extensions to the tested string. ptwebdiscover has two switches for this purpose. The first, -e, --extensions, lets you define the extensions manually. If you want to search for terms both with and without the extensions you enter, include an empty string among the extensions. The second switch, -ef, --extension-file, loads a list of extensions from a file.
ptwebdiscover -u https://www.penterepmail.loc -e "" php php.bak bak log
ptwebdiscover -u https://www.penterepmail.loc -ef extensions.txt
Filtering of findings
Server responses can be filtered either by the returned status code or by the returned content. By default, the tool treats every response with a status code other than 404 as a found resource. If you would like to ignore other status codes as well, use the -isc, --ignore-status-codes switch.
ptwebdiscover -u https://www.penterepmail.loc -isc 302 400 401 404
Conversely, if you want the tool to treat only resources that return specific status codes as existing, define these codes with the -sc, --status-codes switch.
ptwebdiscover -u https://www.penterepmail.loc -sc 200
To filter results based on content, use the -sy, --string-in-response and -sn, --string-not-in-response switches. The -shy, --string-in-response-header and -shn, --string-not-in-response-header switches do the same for the content of response headers.
ptwebdiscover -u https://www.penterepmail.loc -sy content
ptwebdiscover -u https://www.penterepmail.loc -sn "page not found"
Using the -tm, --time-min and -tx, --time-max switches, you can also restrict the results to responses that the server returns within the specified time range.
ptwebdiscover -u https://www.penterepmail.loc -tx 200
The -clm, --content-length-min and -clx, --content-length-max switches can also be useful: they set the minimum and maximum size of the returned content.
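The filters described in this section can be thought of as a chain of checks that every response has to pass. The Python sketch below shows one possible way to combine them; the Response class, is_finding, and all parameter names are hypothetical, and the precedence between accepted and ignored status codes is our assumption.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    body: str
    elapsed_ms: float


def is_finding(resp, accept_codes=None, ignore_codes=(404,),
               must_contain=None, must_not_contain=None,
               time_max_ms=None, length_max=None):
    """Return True if the response passes every configured filter."""
    if accept_codes is not None:
        # An explicit accept list takes precedence over the ignore list (assumption).
        if resp.status not in accept_codes:
            return False
    elif resp.status in ignore_codes:
        return False
    if must_contain is not None and must_contain not in resp.body:
        return False
    if must_not_contain is not None and must_not_contain in resp.body:
        return False
    if time_max_ms is not None and resp.elapsed_ms > time_max_ms:
        return False
    if length_max is not None and len(resp.body) > length_max:
        return False
    return True


# A custom error page served with status 200 is rejected by the content filter.
soft_404 = Response(status=200, body="page not found", elapsed_ms=12.0)
print(is_finding(soft_404, must_not_contain="page not found"))
```

In ptwebdiscover's own syntax, the content-length limit on its own looks like this: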
ptwebdiscover -u https://www.penterepmail.loc -clx 50000
Recursive nesting
If the tool finds a directory, it can descend into it recursively and repeat the entire search within that directory. With recursion enabled, all found directories are browsed. Recursive directory browsing can be turned on with the -r, --recurse option.
ptwebdiscover -u https://www.penterepmail.loc -lx 4 -r
Searching for resources by content parsing
Resources can be detected not only by brute force or using a dictionary, but also by parsing content: finding links and external resources in HTML documents and in files such as sitemap.xml, robots.txt, etc. ptwebdiscover can parse all detected content when you use the -P, --parse switch.
ptwebdiscover -u https://www.penterepmail.loc -P
If you want to reveal only the URLs located in the source specified in the -u parameter, skipping brute force and the dictionary, use the -Po, --parse-only switch.
ptwebdiscover -u https://www.penterepmail.loc -Po
The tool can also crawl (spider) the entire website. All you have to do is combine recursive nesting with content parsing.
ptwebdiscover -u https://www.penterepmail.loc -Po -r
Content retrieving
If you are not parsing the content, ptwebdiscover sends HTTP requests using the HEAD method. If necessary, you can change this method with the -m, --method switch.
ptwebdiscover -u https://www.penterepmail.loc -m GET
You can also store the content returned by the server to create a local copy of all revealed resources. Use the -S, --save switch to do this, specifying the destination directory in which to save the downloaded content.
For example, to crawl the entire website and store its content locally, you can do the following:
ptwebdiscover -u https://www.penterepmail.loc -Po -r -S ~/localcopy
Smart backup search
The ptwebdiscover tool also provides a smart backup search feature. For each resource found, it automatically looks for files with the same name but with a suffix commonly used for backup files. For example, if the tool detects the index.php file, it will try to find the files:
index.bak, index.old, index.zal, index.php_, index.php~, index.zip, index.7z, index.rar, index.tar, index.tar.gz, index.tgz, index.php.bak, index.php.old, index.php.zal, index.php.zip, index.php.7z, index.php.rar, index.php.tar, index.php.tar.gz and index.php.tgz
Next, ptwebdiscover will try to detect backups of the entire website and the database. If the tested domain was www.penterepmail.loc, the tool would try to detect the following files in the root directory of the site:
www.penterepmail.loc.bak, www.penterepmail.loc.old, www.penterepmail.loc.zal, www.penterepmail.loc.zip, www.penterepmail.loc.rar, www.penterepmail.loc.tar, www.penterepmail.loc.7z, www.penterepmail.loc.tgz, www.penterepmail.loc.tar.gz, www.penterepmail.loc.sql, www.penterepmail.loc.db, penterepmail.loc.bak, penterepmail.loc.old, penterepmail.loc.zal, penterepmail.loc.zip, penterepmail.loc.7z, penterepmail.loc.rar, penterepmail.loc.tar, penterepmail.loc.tgz, penterepmail.loc.tar.gz, penterepmail.loc.sql, penterepmail.loc.db, penterepmail.bak, penterepmail.old, penterepmail.zal, penterepmail.zip, penterepmail.7z, penterepmail.rar, penterepmail.tar, penterepmail.tgz, penterepmail.tar.gz, penterepmail.sql, penterepmail.db
The same is repeated with the dot in the name replaced by the characters “_” and “-”, and also with this delimiter removed, for example:
wwwpenterepmailloc.bak, www_penterepmail_loc.bak, www-penterepmail-loc.bak, etc.
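The naming patterns above are regular enough to be generated programmatically. The following Python sketch derives candidate backup names for a single file and for the whole site. The function names are hypothetical, the suffix lists are taken from the examples above, and the rules for deriving base names from the host (dropping a leading www. and then the last label) are our assumption based on the single example shown.

```python
BACKUP_SUFFIXES = ["bak", "old", "zal", "zip", "7z", "rar", "tar", "tar.gz", "tgz"]
SITE_EXTRA_SUFFIXES = ["sql", "db"]  # database dumps, tried for whole-site names only


def file_backup_candidates(filename):
    """Backup names for one discovered file, e.g. index.php."""
    stem, dot, _ext = filename.rpartition(".")
    names = []
    if dot:
        names += [f"{stem}.{s}" for s in BACKUP_SUFFIXES]  # index.bak, ...
    names += [filename + "_", filename + "~"]              # index.php_, index.php~
    names += [f"{filename}.{s}" for s in BACKUP_SUFFIXES]  # index.php.bak, ...
    return list(dict.fromkeys(names))


def site_backup_candidates(host):
    """Backup names for the whole site, derived from the host name."""
    bases = [host]
    if host.startswith("www."):
        bases.append(host[len("www."):])           # penterepmail.loc
    if "." in bases[-1]:
        bases.append(bases[-1].rsplit(".", 1)[0])  # penterepmail
    names = []
    for base in bases:
        for delimiter in (".", "_", "-", ""):      # keep, replace, or drop the dot
            variant = base.replace(".", delimiter)
            names += [f"{variant}.{s}" for s in BACKUP_SUFFIXES + SITE_EXTRA_SUFFIXES]
    return list(dict.fromkeys(names))


print(len(file_backup_candidates("index.php")))
print(len(site_backup_candidates("www.penterepmail.loc")))
```

Even for one host name this produces dozens of candidates, which is why generating them automatically is more practical than maintaining them in a static dictionary.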
To turn on the smart backup search, use the -b, --backups switch.
ptwebdiscover -u https://www.penterepmail.loc -Po -r -b
If you only want to search for backups of the entire website without any further search for content by brute force or using a dictionary, use the -bo, --backups-only switch.
ptwebdiscover -u https://www.penterepmail.loc -bo
PenterepTools – ptwebdiscover searching for backups
Prefixes, suffixes, subdomain search and parameter fuzzing
When you know that the resource you are looking for begins with a specific prefix or ends with a certain suffix, the -up, --prefix and -us, --sufix switches come in handy. The following example will search for files: access_aaa_2021.log, access_aab_2021.log, access_aac_2021.log, …
ptwebdiscover -u https://www.penterepmail.loc -lm 3 -up access_ -us _2021 -e log
Sometimes it is also useful to search for resources elsewhere than in a specific path. In this case, you can use an asterisk (*) where you want the tested string to be inserted. You can use this method, for example, when the application retrieves resources from a location outside the publicly accessible part of the website, when you are looking for a specific file in an unknown directory, or when searching “nice” URLs.
ptwebdiscover -u https://www.penterepmail.loc/show?file=*
ptwebdiscover -u https://www.penterepmail.loc/*/show
The asterisk is also useful after you have used the ptiistilde tool to find short file names in the 8.3 format on IIS servers, such as admini~1.php. If you want to use brute force to detect the missing part of the file name, you can use the following command.
ptwebdiscover -u https://www.penterepmail.loc/admini*.php
If you would like to use a dictionary for the same task, you can try, for example, the following command, which selects only those values from the dictionary that begin with the string “admini”.
ptwebdiscover -u https://www.penterepmail.loc/ -w dictionary.txt -bw admini -e php
It is also worth mentioning the ability to search for existing subdomains. In this case, just place the asterisk at the correct place in the URL.
ptwebdiscover -u https://*.penterepmail.loc
PenterepTools – ptwebdiscover checking domain prefixes and suffixes
In this case, the subdomain is resolved through DNS, so you will only find out about subdomains that have a valid DNS record. However, during testing, it is often necessary to also check for subdomains that are hosted on the tested server but do not have a valid DNS record. You will commonly encounter this, for example, with development versions of the application located on subdomains such as dev, devel, development, or test, which appear to be non-existent, but which developers can access by adding the corresponding entry to the hosts file.
If you want to use ptwebdiscover to detect these subdomains as well, you’ll need to add the -tg, --target switch to define the target to which your requests will be sent. In this case, the domain specified in the -u parameter, which contains an asterisk in place of the subdomain, is used only as the value of the HTTP request Host header.
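To make the role of the Host header concrete, here is a minimal Python sketch that builds (but does not send) such a request with the standard urllib library. The helper vhost_request is a hypothetical name and the candidate dev is only an illustration; the sketch shows the general virtual-host technique, not ptwebdiscover's internals.

```python
from urllib.request import Request


def vhost_request(target_url, host_template, candidate):
    """Build a HEAD request to `target_url` whose Host header names the candidate subdomain."""
    host = host_template.replace("*", candidate)  # *.penterepmail.loc -> dev.penterepmail.loc
    return Request(target_url, headers={"Host": host}, method="HEAD")


req = vhost_request("http://www.penterepmail.loc", "*.penterepmail.loc", "dev")
print(req.full_url)            # where the connection would go
print(req.get_header("Host"))  # which virtual host is asked for
```

The connection goes to the known, resolvable target, while the web server selects the virtual host from the Host header, so no DNS record for the candidate is needed. The equivalent ptwebdiscover invocation: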
ptwebdiscover -u https://*.penterepmail.loc -tg http://www.penterepmail.loc
As the server can return the content of a default web page for non-existent subdomains, you may also need to add the -sn, --string-not-in-response switch, which specifies a string that must not be in the server’s response.
ptwebdiscover -u https://*.penterepmail.loc -tg http://www.penterepmail.loc -sn Welcome
Output formatting and saving findings to a file
You can customize the ptwebdiscover output to suit your needs. If you prefer a tree structure view that looks like the tree tool output, then you can use the -tr, --tree switch.
ptwebdiscover -u https://www.penterepmail.loc -Po -r -tr
PenterepTools – ptwebdiscover discovered file hierarchy
You can also decide whether the findings should list only the paths to the revealed resources or the complete URL, including the domain. By default, the output contains the entire URL. Use the -wd, --without-domain switch to suppress the domain.
ptwebdiscover -u https://www.penterepmail.loc -Po -r -wd
Sometimes you will also want the returned HTTP response headers displayed. In ptwebdiscover, you can turn on their listing using the -wh, --with-headers switch.
ptwebdiscover -u https://www.penterepmail.loc -Po -wh
You can view the complete output not only on screen; the -o, --output switch saves it to a file for further processing by other tools.
ptwebdiscover -u https://www.penterepmail.loc -Po -o output.txt
Using threads
The main advantage of ptwebdiscover over the dirb tool is its ability to run in multiple threads, which can make it many times faster. By default, it runs in twenty threads, but you can change this with the -t, --threads switch.
ptwebdiscover -u https://www.penterepmail.loc -t 100
Conversely, if you need to limit the number of requests sent to the tested server to prevent server congestion, you can use the -d, --delay switch to set the delay (in milliseconds) between requests. In this case, the test will run in only one thread.
ptwebdiscover -u https://www.penterepmail.loc -d 100
Using special dictionaries for technology identification
In addition to the above features, it is worth mentioning the possibility of using special dictionaries that contain data in the format location::technology. These dictionaries identify technologies based on resources typical of them. For example, revealing the wp-includes directory fairly reliably identifies the WordPress content management system. These dictionaries are otherwise used in the same way as common dictionaries.
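As an illustration of the location::technology format, the following Python sketch parses such a dictionary into a mapping from location to technology. The function name and the sample entries are hypothetical, and treating lines without a :: separator as plain entries is our assumption.

```python
def parse_technology_wordlist(lines):
    """Map each location to its technology label (None for plain entries)."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        location, sep, technology = line.partition("::")
        entries[location] = technology if sep else None
    return entries


# Hypothetical sample entries, not taken from a real ptwebdiscover dictionary.
sample = ["wp-includes/::WordPress", "wp-admin/::WordPress", "robots.txt"]
mapping = parse_technology_wordlist(sample)
print(mapping["wp-includes/"])
```

When a location from such a dictionary is found on the server, the associated label can be reported alongside the finding.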
Notes
If you try to use the tool against domains that exist only in your hosts file, add the -wdc, --without-dns-cache switch. The tool uses DNS caching, which unfortunately cannot process the content of the hosts file. Without this switch, the domain would appear unavailable.
Before the test itself, the tool verifies the availability of the target domain. If you would like to disable this check, use the -wac, --without-availability-check switch.
Complete list of implemented switches
| -u | --url | <url> | URL for test (usage of a star character as anchor) |
| -ch | --charsets | <charsets> | Specify charset (example: lowercase,uppercase,numbers,[custom_chars]) |
| -lm | --length-min | <length-min> | Minimal length of brute-force tested string (default 1) |
| -lx | --length-max | <length-max> | Maximal length of brute-force tested string (default 6 bf / 99 wl) |
| -w | --wordlist | [filename] | Use default or specified wordlist |
| -bw | --begin-with | <string> | Use only words from wordlist that begin with the specified string |
| -ci | --case-insensitive | | Case insensitive items from wordlist |
| -e | --extensions | <extensions> | Add extensions behind a tested string ("" for empty extension) |
| -ef | --extension-file | [filename] | Add extensions from default or specified file behind a tested string |
| -r | --recurse | | Recursive browsing of found directories |
| -b | --backups | | Find backups for db, all app and every discovered content |
| -bo | --backups-only | | Find backup of complete website only |
| -P | --parse | | Parse HTML response for URLs discovery |
| -Po | --parse-only | | Brute force method is disabled, crawling started on specified url |
| -D | --directory | | Add a slash at the ends of the strings too |
| -sy | --string-in-response | <string> | Print findings only if string in response (GET method is used) |
| -sn | --string-not-in-response | <string> | Print findings only if string not in response (GET method is used) |
| -sc | --status-codes | <status codes> | Accept only response with status codes |
| -isc | --ignore-status-codes | <status codes> | Ignore response with status codes (default 404) |
| -tm | --time-min | <milliseconds> | Minimal time for response |
| -tx | --time-max | <milliseconds> | Maximal time for response |
| -clm | --content-length-min | <bytes> | Minimal length of response |
| -clx | --content-length-max | <bytes> | Maximal length of response |
| -m | --method | <method> | Use said HTTP method. Default: HEAD |
| -d | --delay | <milliseconds> | Delay before each request in milliseconds |
| -p | --proxy | <proxy> | Use proxy (e.g. http://127.0.0.1:8080) |
| -T | --timeout | <milliseconds> | Manually set timeout (default 10000) |
| -H | --headers | <headers> | Use custom headers |
| -ua | --user-agent | <agent> | Use custom value of User-Agent header |
| -c | --cookie | <cookies> | Use cookie (-c "PHPSESSID=abc; any=123") |
| -a | --auth | <name:pass> | Use HTTP authentication |
| -t | --threads | <threads> | Number of threads (default 20) |
| -j | --json | | Output in JSON format |
| -wd | --without-domain | | Output of discovered sources without domain |
| -wh | --with-headers | | Output of discovered sources with headers |
| -ip | --include-parameters | | Include GET parameters and anchors to output |
| -tr | --tree | | Output as tree |
| -o | --output | <filename> | Output to file |
| -S | --save | <directory> | Save content locally |
| -E | --errors | | Show all errors |
| -s | --silent | | Do not show statistics in realtime |
| -v | --version | | Show script version |
| -h | --help | | Show this help message |




