By Seed Plugins:
WP Link Status Pro is a WordPress plugin to check and manage the HTTP status codes of all the links and images in your site content.
It works by creating a set of configurations called scans, which analyze your content to extract links and images, looking for broken links, redirections, and more.
The results of each scan can be filtered in many ways; you can perform several types of searches and generate a detailed report of the links and images detected.
From the scan results you can directly change URLs or link anchor text, and perform other actions such as unlinking or converting a link to nofollow, without having to edit the entire post.
Apart from these scan reports, this plugin provides an extra URL tool to perform quick, massive changes from a set of predefined URLs.
WP Link Status Pro works as a WordPress plugin, and the minimum recommended version is WordPress 3.4.
This plugin runs on servers with PHP version 5.1.3 or later.
To perform HTTP requests, the cURL library (libcurl) must be available as a PHP extension. No minimum cURL version is required, but updating to the latest version is strongly recommended.
On the client side, this plugin runs entirely in the WordPress administration area and uses jQuery (bundled with WordPress) as its main client-side framework.
To install this plugin, first locate the wp-link-status-pro.zip file at the root of your downloaded files.
Next, visit the Plugins page from your WordPress main menu and click the Add New link:
Now click on Upload plugin:
Click the Browse button, choose the wp-link-status-pro.zip file from your computer, and press Install now:
Once uploaded, the last step is to click Activate Plugin:
And that's all; a new WP Link Status menu appears:
The old way to install this plugin (FTP)
This plugin can be deactivated in the usual way via the WordPress Plugins administration page.
Once deactivated, it can be uninstalled from the same page with the Delete link provided by WordPress.
However, this usually removes only the plugin files. To remove the plugin completely, including options, user data, and its specific MySQL database tables, before uninstalling go to the plugin Settings section, select the Advanced tab, and mark the Data on uninstall field:
This ensures that all plugin data will be removed during the uninstall process.
This plugin's activity is organized into entities called scans. You can create multiple scans, each with a different configuration, run them as independent crawling processes, and therefore obtain a different set of results from each.
The first step to analyze your site links is to create a new scan and configure it. To get started, there is a link at the top of the plugin page with the text Add new scan:
Another link to create a new scan is located in the plugin submenu, with the caption New Scan:
The General tab defines the basics of the scan, and this is how it looks:
First of all, it is recommended to enter a description in the Scan name field so you can identify the scan later.
One of the most important decisions in this tab is the link types: whether to check only links, only images, or both. It is mandatory to choose at least one option.
The next field, Destination type, refers to the scope of the links: all site links, only internal links, or only links pointing to external sites.
One more option is Time scope. By default the scan inspects all the content, but it is possible to restrict it to recent periods, such as yesterday, one week ago, one month ago, etc.
In the next field, Crawl order, you choose where to start: with the most recent posts, or with the oldest content first. This option also determines the default order in which the scan results are presented.
When a link results in a redirection, selecting the next field, Check status of destination URLs, makes the scan follow the redirected URL and check its status code as well. For security and server performance, there is a maximum number of redirections to follow; you can check it in the General Settings section of this plugin.
Another option is whether to track malformed links. A malformed link is a badly formatted URL in your content whose HTTP status cannot be checked. To identify these elements, mark the Track malformed links option (selected by default).
In the last field, Send an e-mail when the scan is completed, you can choose to send to the default blog e-mail address or add more e-mail addresses (comma- or semicolon-separated).
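As an illustration of the accepted address format, a list using either separator could be split along these lines (this helper is invented for the example; it is not the plugin's own code):

```python
import re

def parse_recipients(value):
    """Split a comma- or semicolon-separated e-mail list into clean addresses."""
    return [addr.strip() for addr in re.split(r"[,;]", value) if addr.strip()]

print(parse_recipients("admin@example.com; editor@example.com,dev@example.com"))
# -> ['admin@example.com', 'editor@example.com', 'dev@example.com']
```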
In this tab we configure the content from which the links to inspect will be obtained:
First we need to select one or more of the registered Post types. By default, Posts and Pages are selected.
In the same way, for these post types it is necessary to choose one or more Post status values, which define the status scope of the post types. If any post types are selected, we need to mark one or more post statuses as well.
Mainly, links and images are extracted from the post content. But if you have custom post fields with extra data, it is possible to extract links from these fields too. The Custom fields section allows you to add them by field name, specifying whether the expected values are full URLs or HTML in which links can be found.
The next field is Comment links. You can mark Approved or Pending comments, but this field is optional. Comment links are inspected in the comment content and also in the comment author URL.
The last option is Blogroll links, which is compatible with the old WordPress system for creating sidebar blogrolls.
It is not necessary to mark every type of link source (post types, comments, or blogroll), but it is mandatory to select at least one of these content sources.
This section allows you to create several filters before generating the final results:
It is possible to admit only links whose anchor text meets a certain condition. With this filter you can enter a complete or partial anchor text and define its condition: whether the anchor contains this text (or not), starts or ends with it, etc.
This filter means that only matching URLs will be inspected. The condition can be any matching string, a prefix or suffix string, or a full URL match.
Keep in mind, for the prefix or full URL options, that this filter works with complete URL structures: URLs beginning with http:// or https://.
Relative links (and protocol-relative links) are outside the scope of this filter as written, because it only works with the absolute URL version of these cases.
The opposite of the previous section: excluding URLs is an inverse filter where matching URLs will not be collected.
Perhaps the most complicated filter, because it works only on link or image attributes. It allows you to define an HTML attribute (rel, title, alt, etc.) and filter by matching values.
The last field, Accelerate crawling process integrating filters in main database query, means that, before analyzing each content item, these filters can be made part of the database query, providing a very fast way to identify posts that meet these conditions and saving processing time.
This is the last section, where we can define additional scan settings:
As noted here, these are optional fields; they can be left empty, in which case default values are used.
The Number of threads is the number of internal HTTP processes working on this scan. The more processes assigned, the faster the scan will complete. But more speed can affect server performance, and it is generally not advisable to configure a scan with more than one thread (the default value). You can see more details about performance in the Run the crawler section.
Connection timeout and Request timeout are values related to the time spent on an HTTP request. Usually, 10 seconds for the connection timeout and 30 seconds for the request timeout are enough, but it is possible to define other values for exceptional cases.
Once the scan is configured properly, the next step is to save the scan settings. There are two buttons below the form to do this:
Save scan changes is the default button. Pressing it saves the scan settings and reloads the page, and you can continue editing the scan normally.
The other button, Save and run crawler, performs two actions: it saves the scan form and runs a process called the crawler. The crawler is the execution mode of the scan settings, and it is explained in the Run the crawler section.
With respect to the scan settings, running the crawler is a one-way operation. Once the crawler is activated, the scan settings fields become disabled (except the scan name text field), and it is not possible to go back to the previous mode:
There is an exception to this disabled mode. When a crawler is active but not completed, it is possible to change the values of the Advanced tab without stopping the crawling process. At the end, once the crawler has finished, this tab's fields will be disabled too.
The main plugin page shows the scans list, with details such as the scan status (whether the crawler is running, stopped, or not yet started), the results progress, and a plain-text summary of the scan configuration.
The scans, like all other data structures of this plugin, are not stored as custom post types, so they should not affect the performance of the WordPress core, the active theme, or other plugins.
By default, 5 scans per page are shown. You can change this number using the standard WordPress Screen Options button at the top of the page, which displays the scans pagination form:
To edit a scan, you can access its configuration screen through the scans list. Each scan row has action links with different purposes; in this case, click the Edit scan link.
Remember that editing the scan values is only possible when the crawler has never been started, as described in the Save scan section.
If some critical values are missing, the scan cannot start the crawler and is marked as not ready. In the scans list page such scans appear with an orange warning icon indicating something is wrong:
This kind of problem can originate in three sections of the edit scan form. In the General section, if at least one of the link types is not selected, an error message is displayed:
In the Content options tab there are two potential issues. Without selecting any post type, comment option, or blogroll, this screen appears:
In the same way, selecting one or more post types but no post status will raise this error as well:
The last type of error is related to the status codes; if none are checked, this error is shown:
You can remove any scan at any time without restrictions, even if the scan is crawling, stopped, not ready, etc.
There are three places to delete scans. The first is the scans list page, where each scan's row actions include a Delete scan link:
Another way is to check the desired scans and select the Delete option from the Bulk actions menu:
The last place is the edit scan page. Below the form, right next to the save button, there is another Delete Scan link:
In all these contexts a confirmation box appears to avoid accidental scan removal:
Running the crawler is the next logical step to apply the scan configuration and obtain link analysis reports. The first way to do this is directly from the edit scan form, by pressing the Save and run crawler button as described in the Save scan section.
The second way is from the scans list page. For scans whose crawler has never been activated, there is a handy Ready to start the crawler link showing this action:
In addition, for each scan whose crawler is not running, the row actions include a Start crawler link.
The crawler works in unattended mode, so you do not need to keep the browser open or a WordPress session active. When the crawler is running, an orange indicator appears before the scan name in the scans list.
The crawler module focuses on server performance, with mechanisms that prevent it from monopolizing database access and from overloading the web server with requests, executing its scripts for the minimum time possible. The recommended configuration is 1 thread per scan and at most 1 scan running at a time. You can check the default values in the Crawling Settings section, and per-scan values in the Advanced tab.
A note for developers: the crawler module works by submitting HTTP requests through internal plugin scripts. So if you are running this plugin in an environment without Internet access (e.g. a local or development server), you need to add the involved host names or domains to your hosts file, on both the client and the server where the plugin runs.
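For example, if the site runs at a local host name, an entry along these lines would be added (the name dev.example.local is invented for illustration):

```
# /etc/hosts (Linux/macOS) or C:\Windows\System32\drivers\etc\hosts (Windows)
127.0.0.1    dev.example.local
```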
In the same way, if the site you are crawling implements browser password protection, you need to remove this password restriction for the crawler to work properly.
Once started, you can visit the crawler results page immediately (without having to wait for the scan to complete) and watch the progress of the scan analysis.
As previously explained in the Save scan section, running the crawler module disables the scan edit fields. But bear in mind one exception to this rule: while the scan is running (or stopped, but not completed), it is possible to change the values of the Advanced tab:
A running crawler can be stopped at any time on the scans list page through the Stop crawler option of the row action links:
Once this is done, the indicator before the scan name changes to a black background, and the Start crawler option returns to the row actions:
Stopping sends a signal to the crawler process to stop checking links, stop reading the database content, and perform no more HTTP requests. Stopping a crawler is not necessarily immediate; it can take a few seconds to finish its current internal operations.
Even with the crawler turned off, the collected data and scan results remain available through the crawler results page.
There is a limit, defined in the settings section, on the number of crawlers running simultaneously. You can see this value in the menu Settings > Crawling tab:
The Max crawlers running value determines how many scans can coexist with the crawler in active or running mode. By default only one active crawler is allowed, because this has a special impact on server performance, but you can try different values based on your server capabilities.
When you start a crawler and this limit is already reached, instead of becoming active the scan enters a special queued mode. This mode is very similar to the stopped mode, with one difference: when the currently active scan finishes, the next queued scan changes to running mode automatically.
In the scans list the indicator changes to grey:
As can be seen, a new row action appears with the Unqueue crawling link. It is a simple mechanism to return to the previous mode, stopped or not started. Unqueuing a scan disables the self-start crawling mechanism.
From the main scans list page, you can access an existing crawler results section by clicking the scan name, or by clicking the Show results link in the scan row actions.
The crawler results page displays all the links found, as well as several filters and tools to take control of the detected links. This is how it looks:
At the top of the page there is a section with the current scan info. It is the same scan information shown in the scans list page, including the row action links, which you can also use here.
Before the results there is a section with a select box to perform bulk operations on the results. On the right side are the pagination links.
In the same way as the scans list, in the crawler results pages you can change the number of results per page by clicking Screen Options and changing the default value:
There are predefined filters working as a simple menu of status levels, where you can quickly filter and navigate through the status code groups found, or filter by request errors as well.
Next is the advanced search area, where we will see how to use many powerful filters in a very easy way.
Besides the status level menu, the advanced area displays a select box with all variations of the status codes, status levels, and request errors found by the crawler.
The content filter shows the content types selected in the scan configuration and the post types that have results. You can filter by entries, post types (posts, pages, etc.), comments, or blogroll links, if they contain any results.
This is a simple filter to differentiate between links and images.
The default report shows all the results. But there is a special situation, described in the URL actions section, where you can ignore or hide certain results. This filter option allows you to keep the default visibility, display only ignored results, or mix both states.
This filter is related to search engine optimization and how search engines read your URLs. It works with the link rel attribute, filtering results with nofollow values or without them (the dofollow option).
The protocol filter allows you to search for results with the usual http protocol, the secure https protocol, or the protocol-relative // form.
This filter finds links with special morphology. You can filter by relative or absolute URLs, spaced links (URLs with spaces before or after the quotes), and also malformed URLs.
It is possible to filter results by user actions. If you have modified a URL, changed the anchor text, etc., these actions are saved, and you can later filter to show modifications (edited URL or anchor, added/removed nofollow, or applied redirections), unlinked results, unmodified links, or rechecked URLs.
A simple filter to display internal or external links:
The last option is related to the order of results. By default, the crawl order from the scan configuration is selected, but you can change it to order by recent or oldest content, by domain name alphabetically, or by download time or download size.
Another way to filter results is to search by URL. The default mode is Matched string, which searches for any occurrence. But it is possible to change it to search by URL prefix, URL suffix, Full URL, or even the URL fragment (#) part.
This search works on complete URL structures, so for the URL prefix or Full URL modes, remember to include the http:// or https:// protocol.
Another relevant point is that this search also matches the final redirection URL for results responding with 3xx status codes.
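The search modes can be pictured with a small sketch (the function and mode names are invented for illustration; the plugin's internals are not shown here):

```python
from urllib.parse import urlparse

def url_matches(url, term, mode):
    """Illustrative matching semantics for the URL search modes."""
    if mode == "match":
        return term in url            # Matched string: any occurrence
    if mode == "prefix":
        return url.startswith(term)   # term should include http:// or https://
    if mode == "suffix":
        return url.endswith(term)
    if mode == "full":
        return url == term
    if mode == "fragment":
        return urlparse(url).fragment == term.lstrip("#")
    raise ValueError("unknown mode: " + mode)

print(url_matches("https://example.com/page#top", "https://example.com", "prefix"))  # True
print(url_matches("https://example.com/page#top", "#top", "fragment"))               # True
```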
The anchor text search works only for links, not images. By default it searches for any occurrence in Matched string mode, which can be changed to Starts with, Ends with, or Full anchor mode.
If you do not need all the advanced filters showing all the time, it is possible to hide the Advanced filters area. To do so, simply click the close button located at the right side.
This action is saved as a user preference and remembered the next time the crawler results page is loaded, to show or hide the advanced filters area.
If you are using many filters and you are not sure which ones are active, you can reset all the filters to their initial values and start a new search. The reset button is located below the close button.
Closing the Advanced filters area does not hide all the filters entirely. The area between the Bulk actions and the pagination links now shows the basic filters: Status codes, Content types and/or Post types, and basic Link types.
To return to the previous state and show the Advanced filters again, there is a button on the right side of the basic filters area:
One of the most interesting things about this plugin is the ability to edit the crawler results directly and modify the content without entering the post content editor, comment editor, or blogroll manager.
The row actions of each result item allow you to perform modifications such as editing the URL, unlinking links or removing images, rechecking status, showing request and response headers, etc.
You can change the URL by clicking the Edit URL option of the row action links:
Next, an inline form appears allowing you to modify the source URL. There is a Cancel button to go back and close the form. The Update URL button communicates with the server to update the content where the URL is located.
If everything is OK, a confirmation message appears, and the URL is marked with the modified tag. You can also search by this modification state in the advanced filters area.
In the case of redirected URLs responding with 3xx status codes, the result item is flagged with the redirect tag, which also shows the number of redirections.
Here is another way to modify a URL: replacing it with its final redirection. To do so, simply click the Apply Redirection link in the result row actions:
The next step asks you to confirm this replacement through an inline form. You can press Cancel to close the form and go back. Pressing Set redirection starts the server communication to perform this action.
If the server responds OK, the URL is replaced by the redirection URL, and the result is flagged with the modified tag.
Another possibility is to unlink links or remove images. Unlink means that the HTML a tag will be removed, leaving the anchor text in place. In the case of images, the entire img tag will be removed.
You can start this process by clicking the Unlink option (for links) or Remove (for images) in the row actions:
Next, the inline confirmation form appears, where you can press Cancel to close the form, or the Unlink/Remove button, which starts the process.
Finally, if the server response is OK, the result is flagged with the Unlinked tag. From here it is not possible to perform more operations on this result, and only the native WordPress actions in the Content column remain active.
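The effect of these two operations on the stored HTML can be sketched like this (simplified regexes for illustration only; the plugin's real content parser is not shown):

```python
import re

def unlink(html, url):
    """Replace the <a> tag pointing at url with just its anchor text."""
    pattern = r'<a\b[^>]*href=["\']' + re.escape(url) + r'["\'][^>]*>(.*?)</a>'
    return re.sub(pattern, r'\1', html)

def remove_image(html, url):
    """Remove the whole <img> tag whose src is the given URL."""
    pattern = r'<img\b[^>]*src=["\']' + re.escape(url) + r'["\'][^>]*/?>'
    return re.sub(pattern, '', html)

print(unlink('See <a href="https://old.example.com">this page</a>.',
             'https://old.example.com'))
# -> See this page.
```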
In addition to editing URLs, you can change the rel property of each link, adding or removing the nofollow attribute. For links without nofollow, there is an Add nofollow option in the row actions:
Next, the inline form appears, where you can Cancel this process, or press the Add nofollow button to start the server communication.
After the server responds that everything is OK, the URL is flagged with a nofollow tag:
Now a new Remove nofollow link appears in the row actions to perform the inverse operation:
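The transformation on the link markup amounts to editing the rel attribute, roughly as sketched below (an invented helper, not the plugin's own code):

```python
import re

def add_nofollow(tag):
    """Add nofollow to a link tag's rel attribute, creating it if missing."""
    if 'rel=' in tag:
        # Append nofollow to an existing rel value, unless already present.
        return re.sub(r'rel=["\']([^"\']*)["\']',
                      lambda m: m.group(0) if 'nofollow' in m.group(1)
                      else 'rel="%s nofollow"' % m.group(1),
                      tag)
    return tag.replace('<a ', '<a rel="nofollow" ', 1)

print(add_nofollow('<a href="https://example.com">link</a>'))
# -> <a rel="nofollow" href="https://example.com">link</a>
```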
It is possible to hide certain results by clicking the Ignore row action:
A confirmation inline form appears, where you can Cancel this operation and close the form, or click the Ignore button. This causes the result to be removed from the results list.
To see the ignored results again you need to use the advanced filters, where you can change the visibility from Not ignored results to Only ignored results or Ignored and not ignored.
These visibility modes bring back the hidden results, and you can reverse their status using the special Undo ignore row action:
This is a simple Visit row action that opens a new browser tab and loads the result URL. It has the same behavior as clicking the URL of the result directly:
From the status row you can check the HTTP status again by pressing the Recheck link. Once this operation is completed and the server data is received, the HTTP status code, download time, and size are updated, and the result is flagged with the rechecked tag:
This feature allows you to see the request and response headers of the last HTTP status check, simply by clicking the Show headers link:
This action displays a popup area where you can see the HTTP header details:
The anchor text can be changed with the Edit anchor text option of the row actions:
It shows an inline form to edit the anchor value. You can press Cancel to close this form, or start the process by clicking the Update anchor text button:
If the server response is OK, the anchor is flagged with the modified tag.
The content column shows the WordPress content entities such as posts, comments, or blogroll links. The row action operations for each result are linked to the default WordPress screens. Here is an example of the actions allowed for a single post:
To perform massive changes you can use the Bulk actions select box. It supports the same row actions for the URL, status, and anchor columns, and these operations are applied in the same way: a confirmation form that you can cancel, and a submit button to send the operations to the server:
This is a tool to perform quick changes to your content by searching for a set of defined URLs. You can load this screen by clicking the URL Tools submenu link:
This utility displays a single form. On one hand, a big text box where you can enter a set of complete URLs:
Next there is a select box to choose the desired operation:
Just beside it, another select box allows you to choose the mode in which these operations are executed:
Run a test without database changes, where you can see the expected final results.
Real execution mode; we recommend performing a database backup before using it.
The submit button displays a different caption for each operation. In test mode it shows Execute test process. When the update mode is selected, the button caption is Execute database update, and when pressed it also shows a confirmation box reminding you to make a database backup.
Sending the form starts a search for the URLs matching the posts content. For each URL result, the first column shows the posts or pages in which it appears; the second column shows the anchor text (in the case of links); the last column displays the response to the attempted operation:
The settings section allows you to change any default value used by this plugin. You can access the settings through the last submenu:
The first tab of the settings section shows all values related to the crawling process:
Number of crawler threads
Each scan works internally through a web server HTTP request. By default only one crawler script runs at a time (this is the recommended setting), but you can define more threads if your server performance supports it.
Max crawlers running
Usually one scan, with its own threads, runs at a time. But it is possible to run more than one scan at once by changing this value. Once again, we do not recommend changing this setting.
Max pack items
On each crawler execution, the database is inspected looking for URLs, stopping when a URL is found in order to check it. The Max pack items limit is used to avoid situations where the crawler script monopolizes database access. When this limit is reached, the script ends its execution to free up resources and starts another crawler request according to the web server priority (in other words, it re-enqueues the crawler script, allowing other resources to be managed or executed).
Max URL requests attempts
In case of an HTTP request error, the crawler makes several attempts before flagging the URL with an error. This setting defines the maximum number of attempts.
Max redirections allowed
For redirected URLs that result in further redirections, this value sets a limit to avoid infinite redirection loops.
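The idea behind this limit can be sketched as follows (the function and the redirect map are invented for the example; they simulate 3xx Location headers rather than perform real requests):

```python
def follow_redirects(url, redirect_map, max_redirects=5):
    """Follow a chain of redirects, giving up after max_redirects hops."""
    hops = 0
    while url in redirect_map:
        if hops >= max_redirects:
            raise RuntimeError("Too many redirections for " + url)
        url = redirect_map[url]
        hops += 1
    return url, hops

final, hops = follow_redirects(
    "http://a.example/",
    {"http://a.example/": "http://b.example/",
     "http://b.example/": "https://b.example/"})
print(final, hops)
# -> https://b.example/ 2
```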
Max download size
The size limit in KB, to avoid large downloads.
Default user agent
The user agent used for all requests.
This section defines all the settings related to durations and timeouts:
URL Connection timeout
Seconds allowed to connect to the request URL host.
URL Request timeout
Maximum time allowed to retrieve the headers and body from a URL.
URL Extra timeout check
A small grace period to avoid internal timeout conflicts.
Check crawler alive each
Period after which to check whether a crawler has been interrupted and, if so, restart it.
Total objects check each
Period after which to recalculate summaries of the total objects (posts, comments, or blogroll links).
Summary of status each
Period after which to recalculate status code totals, displaying data in real time.
Summary of URLs each
Period after which to update the number of URLs processed or waiting to be checked.
Summary of objects each
Period after which to update the summary of objects (posts, comments, or blogroll links) with processed URLs.
The last settings section is reserved for very technical details:
Some parts of the crawler module are recursive and, in extreme circumstances, could affect server memory. To avoid that, this setting limits the number of recursive iterations.
Data results pagination
A setting to enable or disable the MySQL feature SQL_CALC_FOUND_ROWS for calculating total rows in paginated results.
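For context, here is a sketch of the two pagination approaches this setting switches between (the table name is invented for illustration):

```sql
-- With SQL_CALC_FOUND_ROWS: one query pass plus FOUND_ROWS()
-- (note this feature is deprecated as of MySQL 8.0.17)
SELECT SQL_CALC_FOUND_ROWS * FROM wp_example_results LIMIT 0, 25;
SELECT FOUND_ROWS();

-- Without it: a separate COUNT query for the total
SELECT * FROM wp_example_results LIMIT 0, 25;
SELECT COUNT(*) FROM wp_example_results;
```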
Data on uninstall
As commented in the uninstalling section, if this option is checked, all plugin data will be removed on uninstall: options, user metadata, and MySQL tables.