Ptinsearcher: The best tool for extracting metadata and other interesting things from websites


During penetration testing, you often need to extract comments, phone numbers, email addresses, links, and other relevant information from websites. At other times, you need to extract metadata from documents and images. The ptinsearcher tool helps with both, as it can extract a large amount of information from web pages or from locally stored files.

Installing the ptinsearcher tool

You can install the tool on devices with Python version 3.6 or higher by using the command:

sudo pip install ptinsearcher
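Before installing, it may help to confirm that the interpreter meets the stated requirement. The following is a minimal sketch, assuming the interpreter is available as python3 on the PATH; the function name check_python is only illustrative and is not part of ptinsearcher.

```shell
# Hypothetical helper: report whether python3 exists and is version 3.6 or higher.
check_python() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 1
    fi
    # sys.version_info compares as a tuple, so (3, 10) correctly ranks above (3, 6).
    if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'; then
        echo "Python version is sufficient"
    else
        echo "Python 3.6 or higher is required"
        return 1
    fi
}

check_python || echo "Install or upgrade Python before running pip."
```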

Using the tool is simple. Run it against a URL given after the -u, --url switch, and use the -e, --extract switch to select the types of information you want to retrieve from that source. The tool can extract the following types of data:

P - Telephone numbers
E - E-mail addresses
I - IP addresses
U - Internal URL addresses
X - External URL addresses
S - Subdomains
C - HTML comments
F - HTML forms and their inputs
M - Metadata

Therefore, if you would like to extract the phone numbers listed at the URL address https://www.example.com, you will do so with the command:

ptinsearcher -u https://www.example.com -e P

Types of information to be extracted can also be combined. If you intend to obtain phone numbers, email addresses, and HTML comments, you can use this command:

ptinsearcher -u https://www.example.com -e PEC

To obtain metadata from any source, the command can be used similarly:

ptinsearcher -u https://www.example.com/document.doc -e M

Obtaining information from multiple URLs

Multiple sources can be specified together after the -u parameter, for example:

ptinsearcher -u https://www.example.com/page1.html https://www.example.com/page2.html -e PEC

Loading the sources from a file is often more practical. Prepare a text file, for example sources.txt, containing a list of URLs (one URL per line), then run ptinsearcher against this list using the -f, --file switch:

ptinsearcher -f sources.txt -e PEC

In this case, results are displayed for each source separately. If you want to merge them, use the -gc, --grouping-complete switch. You will get a list of all unique phone numbers, e-mails, comments, etc. from all the listed sources.

ptinsearcher -f sources.txt -e PEC -gc
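A source list such as sources.txt could be prepared as in the sketch below. The first two URLs come from the earlier example; contact.html is a placeholder added only for illustration.

```shell
# Write one URL per line into sources.txt (the quoted 'EOF' prevents any shell expansion).
cat > sources.txt <<'EOF'
https://www.example.com/page1.html
https://www.example.com/page2.html
https://www.example.com/contact.html
EOF

# Quick sanity check: count the non-empty lines in the list.
grep -c . sources.txt
```

The resulting file can then be passed to the tool with the -f switch as shown above.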

Combining ptwebdiscover and ptinsearcher tools

It is advisable to combine ptwebdiscover and ptinsearcher tools. First, use ptwebdiscover to discover resources of the web application and save the found URLs to a file.

ptwebdiscover -u https://www.example.com -Po -r -o sitemap.txt

Then, use ptinsearcher to extract all the required information from all revealed resources.

ptinsearcher -f sitemap.txt -e PEC -gc
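If the saved URL list turns out to contain blank or duplicate lines, it could be cleaned up between the two steps. This is a minimal sketch using standard grep and sort; the function name clean_url_list and the output file name are illustrative assumptions, and whether ptwebdiscover output needs such cleaning at all depends on the options used.

```shell
# Hypothetical helper: drop blank lines and duplicates from a URL list.
# $1 = input file, $2 = output file
clean_url_list() {
    grep -v '^[[:space:]]*$' "$1" | sort -u > "$2"
}

# Example usage (commented out because it needs an existing sitemap.txt):
# clean_url_list sitemap.txt sitemap_unique.txt
# ptinsearcher -f sitemap_unique.txt -e PEC -gc
```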

Searching for information in locally stored files

ptinsearcher can search for information not only in web resources but also in locally stored files. To do this, pass the name of the local file instead of a URL to the -u parameter, for example:

ptinsearcher -u local.html -e PEC
ptinsearcher -u /home/example/local.html -e PEC

You can also put a list of local files into a text file and pass that file with the -f switch.

ptinsearcher -f localSources.txt -e PEC -gc
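A file list like localSources.txt could be generated with find rather than written by hand. The sketch below assumes that only .html and .htm files are of interest; the function name list_local_files is illustrative, and paths are emitted exactly as find prints them.

```shell
# Hypothetical helper: print all HTML files under directory $1, one path per line.
list_local_files() {
    find "$1" -type f \( -name '*.html' -o -name '*.htm' \) | sort
}

# Example usage (commented out because it depends on local directories):
# list_local_files /home/example > localSources.txt
# ptinsearcher -f localSources.txt -e PEC -gc
```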

Complete list of implemented switches

-u, --url <url>                 Test URL
-f, --file <file>               Load URL list from file
-d, --domain <domain>           Domain - merge domain with filepath. Use when wordlist contains filepaths (e.g. /index.php)
-e, --extract <extract>         Specify types of data to extract [E, S, H, F, I, X, P, M, L, Q, A] (default A)
-o, --output <output>           Save output to file
-op, --output-parts             Save each extract_type to separate file
-gp, --group-parameters         Group parameters
-wp, --without-parameters       Without parameters
-g, --grouping                  One output table for all sites
-gc, --grouping-complete        Merge all results into one group
-r, --redirect                  Follow redirects (default False)
-c, --cookie <cookie=value>     Set cookie(s)
-H, --headers <header:value>    Set custom headers
-p, --proxy <proxy>             Set proxy (e.g. http://127.0.0.1:8080)
-ua, --user-agent <user-agent>  Set User-Agent (default Penterep Tools)
-j, --json                      Output in JSON format
-v, --version                   Show script version and exit
-h, --help                      Show this help message and exit
