During penetration testing, you will often need to extract comments, phone numbers, email addresses, links, and other relevant information from websites. At other times, you will need to extract metadata from documents and images. Our ptinsearcher tool helps with both, as it can extract a large amount of information from web pages or locally stored files.
Installing the ptinsearcher tool
You can install the tool on devices with Python version 3.6 or higher by using the command:
sudo pip install ptinsearcher
Using the tool is simple. Run it against the URL you specify after the -u, --url switch. Use the -e, --extract switch to select the types of information you want to retrieve from that source. The tool can extract the following types of data:
P - Telephone numbers
E - E-mail addresses
I - IP addresses
U - Internal URL addresses
X - External URL addresses
S - Subdomains
C - HTML comments
F - HTML forms and their inputs
M - Metadata
For example, to extract the phone numbers listed at https://www.example.com, use the command:
ptinsearcher -u https://www.example.com -e P
The types of information to extract can also be combined. To obtain phone numbers, email addresses, and HTML comments, use this command:
ptinsearcher -u https://www.example.com -e PEC
Metadata can be obtained from any source in the same way:
ptinsearcher -u https://www.example.com/document.doc -e M
Obtaining information from multiple URLs
Multiple sources can be specified together after the -u parameter, for example:
ptinsearcher -u https://www.example.com/page1.html https://www.example.com/page2.html -e PEC
However, loading the sources from a file is often more practical. Prepare a text file, named for example sources.txt, containing a list of URLs (one URL per line). Then run ptinsearcher against this source list using the -f, --file switch.
ptinsearcher -f sources.txt -e PEC
In this case, results are displayed for each source separately. To merge them, use the -gc, --grouping-complete switch. You will get a single list of all unique phone numbers, email addresses, comments, etc. from all the listed sources.
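Conceptually, complete grouping amounts to a per-category union across sources. The sketch below illustrates that idea only. The `per_source` dictionary layout, the category names, the sample values, and the `merge_results` helper are assumptions made for illustration, not ptinsearcher's internal data structures or output format.

```python
from typing import Dict, List


def merge_results(per_source: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Union the findings of every source, per category, keeping first-seen order."""
    merged: Dict[str, List[str]] = {}
    for findings in per_source.values():
        for category, values in findings.items():
            bucket = merged.setdefault(category, [])
            for value in values:
                if value not in bucket:  # drop duplicates across sources
                    bucket.append(value)
    return merged


# Hypothetical findings for two pages (illustrative data only).
per_source = {
    "https://www.example.com/page1.html": {
        "emails": ["info@example.com"],
        "phones": ["+420 123 456 789"],
    },
    "https://www.example.com/page2.html": {
        "emails": ["info@example.com", "sales@example.com"],
        "phones": [],
    },
}

merged = merge_results(per_source)
print(merged["emails"])  # ['info@example.com', 'sales@example.com']
```

The address that appears on both pages is reported once, which is the behavior the grouped output is described as having.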
ptinsearcher -f sources.txt -e PEC -gc
Combining ptwebdiscover and ptinsearcher tools
It is advisable to combine the ptwebdiscover and ptinsearcher tools. First, use ptwebdiscover to discover the resources of the web application and save the URLs it finds to a file.
ptwebdiscover -u https://www.example.com -Po -r -o sitemap.txt
Then, use ptinsearcher to extract the required information from all discovered resources.
ptinsearcher -f sitemap.txt -e PEC -gc
Searching for information in locally stored files
ptinsearcher can search for information not only in web resources but also in locally stored files. To do this, pass the name of the local file instead of a URL to the -u parameter, for example:
ptinsearcher -u local.html -e PEC
ptinsearcher -u /home/example/local.html -e PEC
You can also list local files in a text file and pass that file with the -f switch.
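One way to produce such a file list is a short script. The following is a minimal sketch, assuming the files of interest are HTML files under a single directory and that the list holds one path per line, as described for URL lists. The `write_source_list` helper and the `*.html` pattern are illustrative assumptions, not part of ptinsearcher.

```python
from pathlib import Path
from typing import List


def write_source_list(root: Path, list_file: Path, pattern: str = "*.html") -> List[str]:
    """Write one absolute file path per line and return the listed paths."""
    # Walk the directory tree below root and keep regular files only.
    paths = sorted(str(p.resolve()) for p in root.rglob(pattern) if p.is_file())
    # End the file with a newline unless there is nothing to list.
    list_file.write_text("\n".join(paths) + ("\n" if paths else ""), encoding="utf-8")
    return paths
```

Calling `write_source_list(Path("/home/example"), Path("localSources.txt"))` would, under these assumptions, produce a list suitable for the -f switch.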
ptinsearcher -f localSources.txt -e PEC -gc
Complete list of implemented switches
| -u | --url | <url> | Test URL |
| -f | --file | <file> | Load URL list from file |
| -d | --domain | <domain> | Domain – Merge domain with filepath. Use when wordlist contains filepaths (e.g. /index.php) |
| -e | --extract | <extract> | Specify types of data to extract [E, S, H, F, I, X, P, M, L, Q, A] (default A) |
| -o | --output | <output> | Save output to file |
| -op | --output-parts | | Save each extract_type to separate file |
| -gp | --group-parameters | | Group parameters |
| -wp | --without-parameters | | Without parameters |
| -g | --grouping | | One output table for all sites |
| -gc | --grouping-complete | | Merge all results into one group |
| -r | --redirect | | Follow redirects (default False) |
| -c | --cookie | <cookie=value> | Set cookie(s) |
| -H | --headers | <header:value> | Set custom headers |
| -p | --proxy | <proxy> | Set proxy (e.g. http://127.0.0.1:8080) |
| -ua | --user-agent | <user-agent> | Set User-Agent (default Penterep Tools) |
| -j | --json | | Output in JSON format |
| -v | --version | | Show script version and exit |
| -h | --help | | Show this help message and exit |
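When ptinsearcher is driven from a script, the switches above can be assembled programmatically. The following is a rough sketch: `build_command` and its parameter names are hypothetical helpers introduced here, only switches listed in the table are emitted, and requiring exactly one of -u or -f is a simplification of this sketch rather than a documented rule of the tool.

```python
from typing import List, Optional, Sequence


def build_command(
    urls: Sequence[str] = (),
    source_file: Optional[str] = None,
    extract: str = "A",
    grouping_complete: bool = False,
    output: Optional[str] = None,
    proxy: Optional[str] = None,
    json_output: bool = False,
) -> List[str]:
    """Assemble a ptinsearcher argument list from switches listed in the table."""
    # Simplification of this sketch: exactly one kind of source must be given.
    if bool(urls) == bool(source_file):
        raise ValueError("give exactly one of urls (-u) or source_file (-f)")
    cmd = ["ptinsearcher"]
    if urls:
        cmd += ["-u", *urls]        # one or more URLs or local file names
    else:
        cmd += ["-f", source_file]  # text file with one source per line
    cmd += ["-e", extract]          # letters selecting the data types
    if grouping_complete:
        cmd.append("-gc")           # merge all results into one group
    if output:
        cmd += ["-o", output]       # save output to file
    if proxy:
        cmd += ["-p", proxy]        # e.g. http://127.0.0.1:8080
    if json_output:
        cmd.append("-j")            # output in JSON format
    return cmd


cmd = build_command(source_file="sitemap.txt", extract="PEC", grouping_complete=True)
print(" ".join(cmd))  # ptinsearcher -f sitemap.txt -e PEC -gc
```

The returned list can be passed to Python's `subprocess.run(cmd)` once the tool is installed. Building a list rather than a single string avoids shell quoting issues with URLs.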




