Category: "LAMP"

How to Keep a Mobile Site Out of Search Engines

I have a mobile version of a site, which is accessed through auto-detection of mobile user-agents.

Since the content is virtually the same as the regular site, I didn’t want it indexed by search engines, both to prevent site visitors from landing on a mobile site, and to avoid SEO penalties for duplicate content.

I used the following RewriteCond/RewriteRule to deliver an alternate robots.txt file for the mobile site.

        RewriteCond %{HTTP_HOST} ^m\..*
        RewriteRule ^robots\.txt$ m.robots.txt [L]

The alternate robots.txt file is:

User-agent: *
Disallow: /

In this case, the site runs on eZ Publish, but the same strategy should work for any other mobile site that is served from a subdomain.
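To confirm that the rewrite really serves the blocking file on the mobile host, the response can be checked from a shell. The following is a minimal sketch, not part of the original setup: `blocks_all_crawlers` is a hypothetical helper name, and it only recognizes the simple two-line file shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads a robots.txt body on stdin and succeeds only if
# it contains both a "User-agent: *" line and a "Disallow: /" line.
blocks_all_crawlers() {
    # Strip carriage returns so files with Windows line endings still match.
    body=$(tr -d '\r')
    printf '%s\n' "$body" | grep -Eiq '^User-agent:[[:space:]]*\*[[:space:]]*$' &&
    printf '%s\n' "$body" | grep -Eiq '^Disallow:[[:space:]]*/[[:space:]]*$'
}
```

Assuming the mobile host is m.domain.com (a placeholder), it could be used as: curl -s http://m.domain.com/robots.txt | blocks_all_crawlers && echo 'mobile site is blocked'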

Passing Structured Data from Perl to PHP

After writing many lines of Perl to serialize data returned from database queries so it could be passed back to PHP, I found that JSON conversion routines are much faster and more reliable. In addition, unlike my awkward Perl, they work!

::::::::::::::
test.pl
::::::::::::::

#!/usr/bin/perl

use strict;
use JSON::XS;

# Thanks to: http://stackoverflow.com/questions/8463919/how-to-convert-a-simple-hash-to-json-in-perl
my $json=JSON::XS->new->utf8->pretty->allow_nonref;

# The nested hash must be a reference ({...}); a plain list in parentheses
# would be flattened into the outer hash.
my %data=('one',1,'two',"bee's",'three',0,'four',{'a','monkey','b','cat'});

my $json_text=$json->encode(\%data);
# Single quotes are left alone: \' is not a valid JSON escape sequence
# Remove newlines so the JSON fits on a single line of output
$json_text=~s/\n//g;

print<<RETURN;
json=$json_text
RETURN

::::::::::::::
test.php
::::::::::::::

<?php
include 'Zend/Json.php';

$aResult=null;
$aResponseLines=$aResult=array();

$sResponse=`perl test.pl`;

$aResponseLines=explode("\n",$sResponse);
if (count($aResponseLines)>0)
foreach ($aResponseLines as $k => $v)
{
        $e=strpos($v,'=');
        if ($e!==false)
                $aResult[substr($v,0,$e)]=substr($v,$e+1);
}

var_dump($aResult);

var_dump(Zend_Json::decode($aResult['json']));
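The PHP loop above splits each output line at the first "=" only, so a JSON value that itself contains "=" stays intact. The same split is sketched below in shell, purely as an illustration of that line format; `split_first_equals` is a hypothetical name and is not used by test.php.

```shell
#!/bin/sh
# Print "key<TAB>value" for each "key=value" line on stdin, splitting at the
# first "=" only. Lines without "=" are skipped, as in the PHP loop.
split_first_equals() {
    while IFS= read -r line; do
        case $line in
            *=*) printf '%s\t%s\n' "${line%%=*}" "${line#*=}" ;;
        esac
    done
}
```

For example, perl test.pl | split_first_equals would print the key json, a tab, and the JSON text.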

bash Version Control Check-In

I created a bash script to make it easier to copy a file from a development server to a version control server, review the differences, and then check the file in.

The script relies on SSH keys so the password does not need to be entered for each copy request. Thanks to: http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/

#!/bin/bash

# This script requires three parameters
#       server - The name of the server where the code will be retrieved from
#       file - The name of the file.  In this case, the path is hardcoded.
#       comment - A comment to include when checking in the file

if [ $# -lt 3 ]; then
        echo 'usage:'
        echo '        ~/get.sh server file comment'
else
        # Copy the file
        scp "root@$1.domain.com:/opt/system/web/portal/$2" NEW

        # If the files can be compared
        if [ -r NEW -a -r "$2" ]; then

                # Create tmp file for the diff results
                TMP=`mktemp`
                diff NEW "$2" > "$TMP"

                # If the files are different
                if [ $? -ne 0 ]; then
                        more "$TMP"
                        echo -n 'Update? y/[n]? '
                        read
                        if [ "$REPLY" == 'y' ]; then
                                # Checkout the file
                                cleartool co -nc "$2"
                                # Copy the new file over
                                cp NEW "$2"
                                # Checkin the updated file
                                cleartool ci -c "$3" "$2"
                        else
                                echo 'no changes made'
                        fi
                else
                        echo 'files are the same'
                fi
                rm "$TMP"
        else
                echo 'scp failed or files missing'
        fi
fi

Thanks to:
http://www.cyberciti.biz/tips/shell-scripting-bash-how-to-create-temporary-random-file-name.html
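The script treats any non-zero exit status from diff as "the files are different". diff exits with 0 when the files match, 1 when they differ, and a value greater than 1 when it runs into trouble (for example, an unreadable file), so a stricter version could tell those cases apart. A minimal sketch follows; `compare_files` is a hypothetical helper and not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: report how two files compare, based on diff's exit status.
compare_files() {
    status=0
    # "|| status=$?" captures the exit status without tripping "set -e".
    diff "$1" "$2" > /dev/null 2>&1 || status=$?
    case $status in
        0) echo same ;;
        1) echo different ;;
        *) echo error ;;
    esac
}
```

In the check-in script, only the "different" case would go on to the Update prompt, and the "error" case could print its own message.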

Apache IE8 HTML entities filter

One of the pages in a web application displays text log file output in popup browser windows.

If that output includes this statement:

<?xml version="1.0" encoding="utf-8"?>

IE8 will try to parse the content as XML, and it will show an error:

The XML page cannot be displayed
Cannot view XML input using style sheet.
Please correct the error and then click the Refresh button, or try again later.
Invalid at the top level of the document. Error processing resource:

I didn’t want to add any scripting to the pages, since they’re text, and I didn’t want to make any coding changes. One solution is to use an Apache output filter to convert the text into HTML entities, and force the document type to text/html.

ExtFilterDefine htmlentities mode=output cmd="/usr/bin/php -R 'echo htmlentities(fgets(STDIN));'"

<FilesMatch "\.txt$">
  ForceType text/html
  SetOutputFilter htmlentities
</FilesMatch>

This is a quick solution. It may not be ideal for every situation, and it could be refined.

The documents aren’t HTML; they are plain text. Anything in them that looks like a tag should be displayed as text, not interpreted as markup. Forcing the type to text/plain didn’t work.

Regardless, this is one way you can convert characters into HTML entities without modifying your code.
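To see what the conversion does to a line such as the XML declaration, the same kind of escaping can be sketched with sed. This is an illustration only, not the filter used above: it escapes just the four characters &, <, >, and ", whereas htmlentities also converts other characters that have named entities. `escape_html` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: escape the characters that a browser would treat as markup.
escape_html() {
    # "&" is handled first so the entities added by the later rules are not re-escaped.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}
```

Piping the XML declaration through escape_html turns its angle brackets into &lt; and &gt;, so the browser displays it as text instead of parsing it.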

Different solutions:

  • Extend the filter to add the HTML tags necessary for a true text/html document
  • Modify the code to convert the document to HTML
  • Install recode (see link above)
  • Do something entirely different

Some Apache RewriteRules for Improved Security

This is a set of Apache RewriteRules, with curl commands to test them. Always test the rules with curl and, if possible, with a browser. The curl output has been edited to make it easier to read.

Write the rules carefully so they don’t deny valid requests, and serve an appropriate 403 page. A rule may occasionally block a valid request, and real (good) people who arrive at that page should be able to understand what happened and how to request access.

Route admin access through HTTPS

With the prevalence of laptops and WiFi, HTTPS is important for site security. This rule assumes the site is administered through a subdomain (admin.domain.com), and routes any request where the server name begins with admin through HTTPS.

RewriteCond %{SERVER_NAME} ^admin
RewriteCond %{HTTPS} =off [NC]
RewriteRule .* https://admin.domain.com [L]

Test

[user@localhost Backup]$ curl -i http://admin.domain.com
HTTP/1.1 302 Found
Location: https://admin.domain.com
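The manual check above can be turned into a repeatable one. The sketch below reads the output of curl -i on stdin and succeeds only if the response is a redirect whose Location header points at an https URL. `redirects_to_https` is a hypothetical helper name, and accepting 301, 302, 307, and 308 is an assumption about which redirect codes are acceptable.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the HTTP response on stdin redirects to https.
redirects_to_https() {
    awk '
        { sub(/\r$/, "") }                              # drop the CR from HTTP line endings
        NR == 1 && $2 ~ /^30[1278]$/ { redirect = 1 }   # status line, e.g. HTTP/1.1 302 Found
        tolower($1) == "location:" && $2 ~ /^https:\/\// { https = 1 }
        END { exit !(redirect && https) }
    '
}
```

Usage, with the same placeholder host as above: curl -s -i http://admin.domain.com | redirects_to_https && echo 'admin is routed through HTTPS'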

Block Probing Requests, XSS Injection, and Unwelcome Referrers

Even if your server doesn’t have these scripts or URLs, it is good to block the requests. People or servers that are requesting them are not visiting your site, they’re attacking it.

This includes referrers that simply don’t make sense, or are seen in the logs or stats requesting content they shouldn’t.


RewriteCond %{REQUEST_URI} (\.aspx?|\.php)$ [NC,OR]
RewriteCond %{REQUEST_URI} (ldap|php\-?myadmin|scripts|mysql|wp\-login) [NC,OR]
RewriteCond %{QUERY_STRING} mouseover [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (sleuth|morfeus|wget|python|curl|perl|scanner|apache\-httpclient) [NC,OR]
RewriteCond %{HTTP_REFERER} \.(ws|in|ru|ua|tv)/?$ [NC,OR]
RewriteCond %{HTTP_ACCEPT_LANGUAGE} !en-us [NC]
RewriteRule ^.*  - [F]
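Before loading patterns like these into Apache, it can help to dry-run them against sample paths so that valid URLs are not caught by accident. The sketch below approximates the two REQUEST_URI conditions with grep -E. mod_rewrite uses PCRE rather than POSIX extended regular expressions, so this is only a rough check for simple patterns like these, and `uri_is_blocked` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the request path matches either REQUEST_URI
# pattern from the rules above. The match is case-insensitive, like [NC].
# In grep -E the hyphen needs no backslash, so "\-" is written as "-".
uri_is_blocked() {
    printf '%s\n' "$1" |
        grep -Eiq '(\.aspx?|\.php)$|(ldap|php-?myadmin|scripts|mysql|wp-login)'
}
```

For example, uri_is_blocked /login.aspx succeeds, while uri_is_blocked /about/contact fails, which is the outcome a valid page should have.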

Tests

RewriteCond %{REQUEST_URI} (\.aspx?|\.php)$ [NC,OR]

This is a PHP site that routes all requests through RewriteRules, so there should never be direct requests for .php files. Since it’s a PHP application, requests for .asp and .aspx should never be received either.

[user@localhost ~]$ curl -I http://domain.com/login.aspx
HTTP/1.1 403 Forbidden

RewriteCond %{HTTP_REFERER} \.(ws|in|ru|ua|tv)/?$ [NC,OR]

After too many requests from referrers ending in .ru, .ua, and .tv, I decided to block them.

[user@localhost ~]$ curl -I -e 'http://some.ru' http://domain.com
HTTP/1.1 403 Forbidden

RewriteCond %{QUERY_STRING} mouseover [NC,OR]

This rule is in response to some sort of XSS injection attack which included onmouseover.

[user@localhost ~]$ curl -I http://domain.com/?onmouseover
HTTP/1.1 403 Forbidden

RewriteCond %{REQUEST_URI} (ldap|php\-?myadmin|scripts|mysql|wp\-login) [NC,OR]

These are all common probes: LDAP, variations of phpMyAdmin, utility scripts, access to MySQL, and the login for WordPress. This server should never receive them.

[user@localhost ~]$ curl -I http://domain.com/scripts
HTTP/1.1 403 Forbidden

RewriteCond %{HTTP_ACCEPT_LANGUAGE} !en-us [NC]

The site I’m protecting is delivered in US English. If US English isn’t one of the languages accepted by the client, the request will be denied.

Checking the latest visitors log verified the rules are working properly.

Host: 113.53.253.77

/index.php HTTP Response: 403 Date: Feb 24 07:46:56 Bytes: 629
/admin/index.php HTTP Response: 403 Date: Feb 24 07:46:57 Bytes: 633
/admin/pma/index.php HTTP Response: 403 Date: Feb 24 07:46:58 Bytes: 636
/admin/phpmyadmin/index.php HTTP Response: 403 Date: Feb 24 07:46:59 Bytes: 639
/db/index.php HTTP Response: 403 Date: Feb 24 07:47:01 Bytes: 631
/dbadmin/index.php HTTP Response: 403 Date: Feb 24 07:47:02 Bytes: 634
/myadmin/index.php HTTP Response: 403 Date: Feb 24 07:47:03 Bytes: 634
/mysql/index.php HTTP Response: 403 Date: Feb 24 07:47:04 Bytes: 634
/mysqladmin/index.php HTTP Response: 403 Date: Feb 24 07:47:05 Bytes: 637
/typo3/phpmyadmin/index.php HTTP Response: 403 Date: Feb 24 07:47:06 Bytes: 640
/phpadmin/index.php HTTP Response: 403 Date: Feb 24 07:47:07 Bytes: 635
/phpmyadmin1/index.php HTTP Response: 403 Date: Feb 24 07:47:10 Bytes: 637
/phpmyadmin2/index.php HTTP Response: 403 Date: Feb 24 07:47:11 Bytes: 637
/pma/index.php HTTP Response: 403 Date: Feb 24 07:47:12 Bytes: 632
/web/phpMyAdmin/index.php HTTP Response: 403 Date: Feb 24 07:47:13 Bytes: 640
/xampp/phpmyadmin/index.php HTTP Response: 403 Date: Feb 24 07:47:14 Bytes: 641
/web/index.php HTTP Response: 403 Date: Feb 24 07:47:15 Bytes: 632
/websql/index.php HTTP Response: 403 Date: Feb 24 07:47:17 Bytes: 634
/phpmyadmin/index.php HTTP Response: 403 Date: Feb 24 07:47:18 Bytes: 636
/phpMyAdmin/index.php HTTP Response: 403 Date: Feb 24 07:47:20 Bytes: 637
/phpMyAdmin-2/index.php HTTP Response: 403 Date: Feb 24 07:47:21 Bytes: 639
/php-my-admin/index.php HTTP Response: 403 Date: Feb 24 07:47:22 Bytes: 638
/phpMyAdmin-2.2.3/index.php HTTP Response: 403 Date: Feb 24 07:47:23 Bytes: 642
/phpMyAdmin-2.2.6/index.php HTTP Response: 403 Date: Feb 24 07:47:24 Bytes: 643
/phpMyAdmin-2.5.1/index.php HTTP Response: 403 Date: Feb 24 07:47:24 Bytes: 642
/phpMyAdmin-2.5.4/index.php HTTP Response: 403 Date: Feb 24 07:47:25 Bytes: 643
/phpMyAdmin-2.5.5-rc1/index.php HTTP Response: 403 Date: Feb 24 07:47:31 Bytes: 646
/phpMyAdmin-2.5.5-rc2/index.php HTTP Response: 403 Date: Feb 24 07:47:32 Bytes: 646
/phpMyAdmin-2.5.5/index.php HTTP Response: 403 Date: Feb 24 07:47:32 Bytes: 643
/phpMyAdmin-2.5.5-pl1/index.php HTTP Response: 403 Date: Feb 24 07:47:33 Bytes: 646
/phpMyAdmin-2.5.6-rc1/index.php HTTP Response: 403 Date: Feb 24 07:47:34 Bytes: 646
/phpMyAdmin-2.5.6-rc2/index.php HTTP Response: 403 Date: Feb 24 07:47:35 Bytes: 646
/phpMyAdmin-2.5.6/index.php HTTP Response: 403 Date: Feb 24 07:47:36 Bytes: 643
/phpMyAdmin-2.5.7/index.php HTTP Response: 403 Date: Feb 24 07:47:37 Bytes: 643
/phpMyAdmin-2.5.7-pl1/index.php HTTP Response: 403 Date: Feb 24 07:47:38 Bytes: 646

* One last note: if the rules block curl, tests run with curl must set the user agent to something other than curl (for example with the -A option), or every test request will be blocked by the user agent rule.
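A test request that gets past the user agent rule might be sketched as follows. The user agent string is an arbitrary assumption; any value that avoids the blocked words will do. The Accept-Language header is included because curl does not send one by default, and requests that do not accept US English are denied. `status_with_agent` and the CURL override are hypothetical names added for illustration.

```shell
#!/bin/sh
# CURL can be pointed at a stub for offline testing; it defaults to the real curl.
CURL=${CURL:-curl}

# Hypothetical helper: print the HTTP status code for a URL, sending a
# browser-like User-Agent and an en-us Accept-Language so that only the rule
# under test can block the request.
status_with_agent() {
    "$CURL" -s -o /dev/null -w '%{http_code}' \
        -A 'Mozilla/5.0 (rule test)' \
        -H 'Accept-Language: en-us' \
        "$1"
}
```

With this in place, status_with_agent http://domain.com/scripts should print 403 because of the REQUEST_URI rule, and a valid page should print 200.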