MySQL REGEXP Validation / Error Handling

Allowing users to run regular expression (REGEXP) searches through a web interface provides excellent search capabilities with very little engineering. One need only change WHERE `field`=’value’ to WHERE `field` REGEXP ‘regex’.

A problem arises if the regex entered is invalid. MySQL will report an error.

One solution that works well is to wrap the mysql_query string in a function or class method. Then, when testing for an error, check (use stristr) to see if the error was a regex error. Regex errors should be handled as user errors, not application errors. They should be logged to allow later analysis (if there are alot of regex errors, help should be provided).

Another approach is to use the PREPARE STATEMENT (see link above). This will throw an error if there are any invalid regexes.

Unwelcome Visitors

Pay particular attention to the site visitor IP addresses in the web stats.

Any IP address that has a significant amount of requests for unknown reasons should be suspect.

Use the link above, or its overseas equivalent to determine who the IP address belongs to. If you don’t think they should be using your site heavily, use an .htaccess file to block them. Be sure to block the entire block reported by the whois system.

Apache allow,deny directives allow you to block requests: http://httpd.apache.org/docs/2.2/howto/access.html

These requests will show up as Forbidden in the stats later, and should eventually stop, since they aren’t being served.

In addition - if you are seeing alot of unexplained requests, from unknown sources, you should check your server for unauthorized content.

Speed up eZ publish

The following steps can be taken to make an existing eZ publish installation run more quickly:

  • Empty the trash
  • Clear the collected information
  • Clear the search stats
  • Delete all the drafts, especially those of the administrator

These steps remove unused information from the database and can really help the system run faster.

PHP session management and temporary file cleanup

By default, PHP manages sessions during requests. This simplifies installation, because no cron job is required. To allow performance tuning, several configuration settings are allowed to adjust the frequency of the session cleanup checking and execution.

In the default php.ini, there is the following line:

cd /path/to/sessions; find -cmin +24 | xargs rm

Added as a cron job, this can be executed at set intervals (every one to five minutes is probably good), to clear the session file storage directory of files older than 24 minutes. Once implemented the find command can replace the default PHP session cleanup, so those configuration variables can be set to never check for session timeouts. This will speed PHP processing, although it may be imperceptible. It will also improve your control of session length, because the checking and clearing are executed based on time, not PHP requests.

Name any temporary files required to support the session with the session id, plus an extension to indicate the type and use. For example 4365kh2kj54dfg2kjh12.wav, 4365kh2kj54dfg2kjh12.rpt.txt, 4365kh2kj54dfg2kjh12.rpt.html. This allows a second session cleanup script to check for the presence of the session file based on the temporary file’s basename, and delete the temporary files. The second session cleanup script can be included in the main PHP script, or run as a separate cron job, unless the contents of the temporary files could be considered protected data, in which case they should be removed as soon as the session is terminated.

If authentication (logout) is also used to destroy sessions, a mechanism should be provided to clear any related, sensitive, files.

Questions to Ask References for Web Development Companies

Check references for sites similar to the one you would like.

Key questions / topics:

  • Tell them about how much you are willing to spend to get an idea of what the site cost. If your budget is far below the site cost, it may not be a good match.
  • Was the work was done on time and within the budget?
  • Are any recurring fees? Hosting? Maintenance?
  • Issues with email, including spam?
  • Problem resolution - was the company helpful when difficulties arose or changes were requested.