
Filtering Spam - Tips & Ideas

SSH in to the server and find the affected inbox.

Quick Look at the inbox

This command shows the From, X-Spam-Bar, and Subject headers without the file names. If you want to see the file names, remove the -h. Your server may use a different header for the spam score.

grep -h "From: \|X-Spam-Bar: \|Subject: " * | more

The output gives you an overview of what has been delivered, where it came from, and the subject.
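Because grep searches the whole file, a "From: " line inside a message body also shows up in that output. The sketch below prints one tab-separated line per message instead: file name, spam bar, From, and Subject. It reads only the header section. It assumes one message per file, as in a Maildir directory, and it does not handle folded (multi-line) headers. The function name is illustrative.

```shell
# Print file, X-Spam-Bar, From, and Subject on one tab-separated line per
# message. awk stops at the first blank line, so body text is never read.
# Assumes one message per file; folded headers are not joined.
summarize_inbox() {
    for f in "$@"; do
        awk -v file="$f" '
            /^$/           { exit }                  # end of the headers
            /^X-Spam-Bar:/ { bar = substr($0, 13) }  # text after "X-Spam-Bar: "
            /^From:/       { from = substr($0, 7) }  # text after "From: "
            /^Subject:/    { subj = substr($0, 10) } # text after "Subject: "
            END { printf "%s\t%s\t%s\t%s\n", file, bar, from, subj }
        ' "$f"
    done
}
```

Run it from the inbox directory as `summarize_inbox * | less`. Messages without an X-Spam-Bar header show an empty second column.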

Spam Score

The spam score is an appealing tool because it adapts to the current spam environment and takes many factors into account, such as the sending IP address, domain, email address, SPF, and reverse DNS information.

This very simple script gives an overview of the spam bar values. It can be used as a first pass to set the threshold for filtering.

echo 'Items in Inbox'
ls -1 | wc -l
echo 'No Spam Bar (probably not spam)'
grep -L "X-Spam-Bar:" * | wc -l
echo 'X-Spam-Bar Counts'
echo '+'
grep -m1 "X-Spam-Bar: +$" * | wc -l
echo '++'
grep -m1 "X-Spam-Bar: ++$" * | wc -l
echo '+++'
grep -m1 "X-Spam-Bar: +++$" * | wc -l
echo '++++'
grep -m1 "X-Spam-Bar: ++++$" * | wc -l
echo '+++++'
grep -m1 "X-Spam-Bar: +++++$" * | wc -l
echo '++++++'
grep -m1 "X-Spam-Bar: ++++++$" * | wc -l
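The same counts can also be produced in a single pass, with one line for every distinct X-Spam-Bar value. This also catches values the fixed list above misses, such as bars longer than six "+" characters and negative scores (on many cPanel servers these appear as "-" characters). It assumes GNU or BSD grep, which both support -m. The function name is illustrative.

```shell
# Tally every distinct X-Spam-Bar value across the given message files.
# -m1 counts only the first X-Spam-Bar header per file; -h drops file names
# so identical values group together. LC_ALL=C keeps "+", "-", "/" ordering
# byte-based instead of locale-dependent.
spam_bar_histogram() {
    grep -h -m1 '^X-Spam-Bar:' "$@" |
        sed 's/^X-Spam-Bar: *//' |
        LC_ALL=C sort | uniq -c
}
```

Run it as `spam_bar_histogram *` in the inbox directory. Each output line is a count followed by the bar value. Files without the header are not counted, which is what the grep -L line above reports.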

It’s good to check the spam bar on valid emails as well. Real email is often given a spam score too, so the threshold has to leave room for it.

From addresses

Look for patterns in the From addresses. A common one is ‘info@somedomain.info’: "info" is frequently used as the sending user name, as the top-level domain, or both.

grep -h -m1 "From: " * | sort
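Counting messages per sender domain makes patterns like the ".info" one easier to spot than reading a sorted list. The sketch below assumes the domain is whatever follows the last "@" on the first From: line, up to a ">" or whitespace. The function name is illustrative.

```shell
# Count messages per sender domain, most frequent first.
# sed -n ... p keeps only From: lines that actually contain an "@".
from_domain_counts() {
    grep -h -m1 '^From:' "$@" |
        sed -n 's/.*@\([^>[:space:]]*\).*/\1/p' |
        tr '[:upper:]' '[:lower:]' |
        sort | uniq -c | sort -rn
}
```

For example, `from_domain_counts * | head -20` lists the twenty busiest sending domains in the inbox.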

Subject

Check the subjects for patterns in the same way as the From headers.
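A minimal sketch for the subject check lists the most common subjects first, so a repeated campaign stands out. It uses only the first Subject: line of each file. MIME-encoded subjects (=?UTF-8?...?=) are counted as they appear, not decoded. The function name is illustrative.

```shell
# List subjects by frequency, most common first.
subject_counts() {
    grep -h -m1 '^Subject:' "$@" |
        sed 's/^Subject: *//' |
        sort | uniq -c | sort -rn
}
```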

Body

Using the results of the From and Subject checks, review a few of the message bodies (read the emails). Look for common text that a person would not write. For example, "Dear email@domain.com": people don’t use an email address in a salutation, and neither do real newsletter senders or other reputable sources.
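Once a telltale phrase like that is found, grep can show how many other messages contain it. The pattern below is only an illustration of the "Dear <email address>" salutation; adjust it to the text you actually find. The function name is hypothetical.

```shell
# List files containing a salutation addressed to an email address,
# e.g. "Dear someone@example.com". Case-insensitive; searches the whole file.
find_email_salutations() {
    grep -l -i -E 'Dear[[:space:]]+[^[:space:]@]+@[^[:space:]]+' "$@"
}
```

`find_email_salutations * | wc -l` gives a quick count, and the listed files can be read to confirm they really are spam.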

Set the Filters

Read the documentation for your mail system’s filters. Every system works differently.

Test the filters with some valid emails and some spam to be sure they behave as intended.

I created three filters.

  • Discard - The discard filter catches glaring signs of spam, such as a server that has sent many spam messages, a From address pattern, or distinctive text that simply wouldn’t come from people or other valid sources. These messages are discarded without warning the sender.
  • Fail with message - The fail-with-message filter rejects anything that looks like spam but might still be valid email. Since only people will read the failure text, it’s worth sending a friendly message with a proposed solution. A good one is to suggest using the site’s contact form, which will usually bypass the spam filters. It’s probably not a good idea to put a URL in the message, since a creative spammer may use it.
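Before enabling filters on the server, the discard and fail-with-message tiers can be rehearsed with grep against sample messages. This sketch does not configure any mail system. It only shows which messages each pattern would catch. Both patterns are hypothetical examples built from the checks above; replace them with your own. The `\+{4,}` form assumes GNU grep -E.

```shell
# Hypothetical patterns: a ".info" sender or an email-address salutation
# means discard; a spam bar of four or more "+" means fail with message.
DISCARD_PATTERN='^From:.*@[^>]*\.info>?$|Dear [^ ]+@'
FAIL_PATTERN='^X-Spam-Bar: \+{4,}'

# Print the tier each message would land in: discard, fail, or deliver.
classify() {
    for f in "$@"; do
        if grep -q -E "$DISCARD_PATTERN" "$f"; then
            echo "discard $f"
        elif grep -q -E "$FAIL_PATTERN" "$f"; then
            echo "fail    $f"
        else
            echo "deliver $f"
        fi
    done
}
```

Copy some known-good messages into one folder and some spam into another, then run `classify good/* spam/*`. Anything under good/ that is not "deliver" is a false positive to fix before the filter goes live.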

Zend Framework - No translation for the language 'en_US' available.

Notice: No translation for the language 'en_US' available. in /var/www/html/ZendFramework-1.10.8-minimal/library/Zend/Translate/Adapter.php on line 441

I’ve been working on an application that uses Zend Framework as its foundation. One of the key requirements is internationalization (i18n) support, and the notice above was being displayed.

In addition to translating page text, I wanted every translated string converted with htmlentities automatically, without calling it explicitly.

I created an adapter which extends the Gettext adapter.


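<?php

// Gettext adapter that HTML-escapes every translated string,
// so views don't have to call htmlentities() themselves.
class CLOUD_Translate_Adapter_Gettext extends Zend_Translate_Adapter_Gettext
{
    public function translate($messageId, $locale = null)
    {
        return htmlentities(parent::translate($messageId, $locale), ENT_QUOTES, 'UTF-8');
    }
}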

To prevent the error from being displayed, I changed:

        
$translate = new Zend_Translate(array('adapter'=>'CLOUD_Translate_Adapter_Gettext',
                'content'=>$language_path,
                'scan' => Zend_Translate::LOCALE_DIRECTORY,
                'locale'=>$locale->toString()));
$translate->setOptions(array('disableNotices'=>true));

to:


$translate = new Zend_Translate(array('adapter'=>'CLOUD_Translate_Adapter_Gettext',
                'content'=>$language_path,
                'scan' => Zend_Translate::LOCALE_DIRECTORY,
                'locale'=>$locale->toString(),
                'disableNotices'=>true));

The difference is that disableNotices is passed when the object is instantiated, so the notice is suppressed while the translation object initializes. Setting it afterwards with setOptions() is too late, because the notice is raised during construction.

Suppressing the notice is harmless here. The default language for the pages is en-US, so there is nothing to translate.

Mozilla/4.0 (compatible;)

This user agent appeared in the middle of many page requests in my Apache logs, fetching content referenced by link tags in the head section.

After a bit of research on one of the link tag URLs, I ran this script:

IPS=$(grep Author access_log | cut -f 1 -d ' ' | sort -u)
for IP in $IPS
do
        echo "Testing $IP"
        host "$IP"
done

In almost every case, the requests came from large organizations - corporations, government agencies, and the military.

These institutions often use proxy servers, and Mozilla/4.0 (compatible;) appears to be a common user agent for requests made through those proxies.

In the one case where it wasn’t a large organization, it was a blacklisted IP, and the user agent was Java.

The sample set was limited, but the pattern was clear.
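Instead of searching for one link URL, you can count how many requests each address sent with this exact user agent. The sketch assumes Apache's "combined" log format, where the user agent is the last double-quoted field. It matches the quoted string exactly, so agents such as "Mozilla/4.0 (compatible; MSIE 8.0)" are not counted. The function name is illustrative.

```shell
# Count requests per client IP that used the exact user agent
# "Mozilla/4.0 (compatible;)", busiest addresses first.
ua_ip_counts() {
    grep -F '"Mozilla/4.0 (compatible;)"' "$1" |
        cut -f 1 -d ' ' |
        sort | uniq -c | sort -rn
}
```

The results can be fed to the same reverse lookup as before: `ua_ip_counts access_log | while read -r count ip; do echo "$count $ip"; host "$ip"; done`.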

Serializing Data to Pass between Perl and PHP

The objective of this task was to determine if data serialized by PHP could be decoded by Perl.

The first step was to create some serialized data.

In this case, the data is being used to define an interface. An associative array is used: the first element identifies the type of data, and the second contains the details of the interface. The details element is itself an associative array, where each entry includes a validation string, label, help or error string, default value, and entered value. This could be extended to include i18n and l10n information, as well as a wide variety of other data.

The PHP code serializes the array, echoes it, and then does a var_dump.

<?php
$aData=array(
'type'=>'Magic',
'details'=>array(
'url'=>array(
        'validation'=>'/^[\.\w\-]{1,255}$/',
        'label'=>'URL',
        'help'=>'Valid URL is letters, digits, dashes, periods',
        'default'=>'http://default.com',
        'value'=>'http://domain.com'),
'authid'=>array(
        'validation'=>'/^[\.\w\-]{1,255}$/',
        'label'=>'AuthId',
        'help'=>'Valid Id is letters, digits, dashes, periods',
        'default'=>'',
        'value'=>'')
));
$sSerialized=serialize($aData);
echo $sSerialized.PHP_EOL;
var_dump(unserialize($sSerialized));
echo PHP_EOL;

I took the serialized data echoed by PHP and pasted it into a Perl script.
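Pasting works only if the length prefixes survive intact. PHP writes each string as s:N:"...", where N is the length in bytes. A value whose N is wrong, for example s:8:"Magic" where "Magic" is 5 bytes, is rejected by PHP's unserialize(), and the Perl module is likely to fail on it too. This is a quick sketch that checks every string token. It is a simplification: it does not handle strings that themselves contain a double quote. The function name is illustrative.

```shell
# Check that each s:N:"..." token's declared length N matches its byte length.
# LC_ALL=C makes awk's length() count bytes, as PHP does.
# Prints each mismatch; returns non-zero if any were found.
check_serialized_lengths() {
    printf '%s' "$1" |
        grep -o 's:[0-9]*:"[^"]*"' |
        LC_ALL=C awk '{
            n = $0; sub(/^s:/, "", n); sub(/:.*/, "", n)          # declared length
            s = $0; sub(/^s:[0-9]*:"/, "", s); sub(/"$/, "", s)   # string body
            if (length(s) != n + 0) { print "length mismatch: " $0; bad = 1 }
        } END { exit bad }'
}
```

Usage: `check_serialized_lengths "$(php serialize_test.php | head -n1)"`, where serialize_test.php is a hypothetical name for the PHP script above. You can also paste the string directly in single quotes.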

The Perl script uses the PHP::Serialization module to unserialize the data. The code is based on http://www.ohmpie.com/serialization. This is a more limited example; the ohmpie.com page covers several different serialization approaches.

The printAll method prints all the attributes and values of the object. Note that the values can also be reached directly through the object, as the URL print statement shows.


#!/usr/bin/perl
# Thanks to: http://www.ohmpie.com/serialization/
use strict;
use warnings;
# TestClass.pm sits next to this script; Perl 5.26+ no longer searches "."
use FindBin;
use lib $FindBin::Bin;
use PHP::Serialization;
use TestClass;
my $encoded='a:2:{s:4:"type";s:5:"Magic";s:7:"details";a:2:{s:3:"url";a:5:{s:10:"validation";s:19:"/^[\.\w\-]{1,255}$/";s:5:"label";s:3:"URL";s:4:"help";s:45:"Valid URL is letters, digits, dashes, periods";s:7:"default";s:18:"http://default.com";s:5:"value";s:17:"http://domain.com";}s:6:"authid";a:5:{s:10:"validation";s:19:"/^[\.\w\-]{1,255}$/";s:5:"label";s:6:"AuthId";s:4:"help";s:44:"Valid Id is letters, digits, dashes, periods";s:7:"default";s:0:"";s:5:"value";s:0:"";}}}';
my $data = PHP::Serialization::unserialize($encoded);
bless($data,'TestClass');
$data->printAll;

print "URL: ".$data->{'details'}->{'url'}->{'value'}."\n";

print "\n";

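This is the TestClass package (module). It declares only the top two elements, type and details. PHP::Serialization fills them in through the unserialize call, and the script then blesses the resulting hash into TestClass, so the constructor is not actually called.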


#!/usr/bin/perl
# Thanks to: http://www.ohmpie.com/serialization/
#       http://www.perlhowto.com/iterate_through_a_hash
#       http://perl.about.com/od/packagesmodules/qt/perlcpan.htm
#       http://search.cpan.org/~bobtfish/PHP-Serialization-0.34/lib/PHP/Serialization.pm
package TestClass;
use strict;
use warnings;

# The constructor: an object holding the two top-level elements
sub new {
        my $class = shift;
        my $obj = {
                type    => undef,
                details => undef };
        return bless($obj, $class);
}

# Print the type, then every field of every entry under details
sub printAll {
        my $self = shift;
        print "Type: " . $self->{'type'} . "\n";
        my %hash = %{$self->{'details'}};
        while (my ($key, $value) = each %hash)
        {
                print "key: $key\n";
                my %innerhash = %{$value};
                while (my ($innerkey, $innervalue) = each %innerhash)
                {
                        print "\t$innerkey: $innervalue\n";
                }
        }
        print "\n";
}

1;

This approach allows data to be stored serialized in a database and read and updated by either Perl or PHP. The structure of the data can change, but the database schema would remain the same.

HTTP Blacklist - Http:BL PHP Code - Generic

This is a generic PHP script that can be used with Http:BL. Http:BL can be used to block requests to a web site based on the visitor’s IP address, and several configuration settings let you adjust which visitors are affected. In the code below, an IP address is flagged if Project Honey Pot lists it as something other than a search engine, and it was either active within the past 30 days or has a threat score of 100 or greater.

The easiest way to use it is to include it at the top level of the application, for example:

require_once 'bl.php';

This code just logs the requests and the scores. Once you’re comfortable with it, you can use it to redirect unwanted visitors to a 403 page, or down the rabbit hole.


<?php
/*
abcdefghijkl.2.1.9.127.dnsbl.httpbl.org

Response:
Octet 1: 127; any other value indicates an error
Octet 2: # of days since last activity
Octet 3: Threat score (0=No threat, 255=Extreme threat)
Octet 4: Visitor type
*/

define ('httpBL_API_key','!-- YOUR KEY HERE --!');
define ('httpBL_URL','dnsbl.httpbl.org');
 
/* These are the settings which control which visitors are blocked */
define ('DAYS_SINCE_LAST_ACTIVITY',30);  /* Visitors active within this many days are blocked */
define ('MAX_THREAT_SCORE',100);         /* Visitors with this threat score or higher are blocked */
define ('MAX_TYPE_VALUE',1);             /* Types below this (0 = search engine) are never blocked; the type is a bitmap, but a numeric test is enough here */
define ('VISITOR_MAP',3);
 
$aOctetMap=array(
'127'=>0,
'DAYS_SINCE_LAST_ACTIVITY'=>1,
'MAX_THREAT_SCORE'=>2,
'VISITOR_MAP'=>3
);
 
$aVisitorType=array(
0=>'Search Engine',
1=>'Suspicious',
2=>'Harvester',
4=>'Comment Spammer',
8=>'[Reserved for Future Use]',
16=>'[Reserved for Future Use]',
32=>'[Reserved for Future Use]',
64=>'[Reserved for Future Use]',
128=>'[Reserved for Future Use]'
);
         
$aSearchEngineSerials=array(
0=>'Undocumented',
1=>'AltaVista',
2=>'Ask',
3=>'Baidu',
4=>'Excite',
5=>'Google',
6=>'Looksmart',
7=>'Lycos',
8=>'MSN',
9=>'Yahoo',
10=>'Cuil',
11=>'InfoSeek',
12=>'Miscellaneous'
);
$sBL=httpBL($_SERVER['REMOTE_ADDR']);
if ($sBL!==null)
{
        /* Write out the information to a text file so you can see what is happening */
        file_put_contents('output.txt',$_SERVER['REMOTE_ADDR'].' '.$sBL.PHP_EOL,FILE_APPEND);
        /* Once you are comfortable with your code and settings, you can redirect unwanted visitors elsewhere */
}
         
function httpBL($sIP)
{
        global $aOctetMap;

        $sOctets=implode('.',array_reverse(explode('.',$sIP)));
        $sURL=httpBL_API_key.'.'.$sOctets.'.'.httpBL_URL;
        $aResult=dns_get_record($sURL,DNS_A);
        if (isset($aResult[0]) && isset($aResult[0]['ip']))
        {
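                $aResultOctet=explode('.',$sResult=$aResult[0]['ip']);
                /* A first octet other than 127 means the query failed */
                if ($aResultOctet[0]!=='127') return null;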
                if ((int)$aResultOctet[$aOctetMap['VISITOR_MAP']]<MAX_TYPE_VALUE) return null;
                if ((int)$aResultOctet[$aOctetMap['MAX_THREAT_SCORE']]>=MAX_THREAT_SCORE) return $sResult;
                if ((int)$aResultOctet[$aOctetMap['DAYS_SINCE_LAST_ACTIVITY']]<=DAYS_SINCE_LAST_ACTIVITY) return $sResult;
        }
        return null;
}

The advantage of this approach is that once an IP address has been cleared or cleaned up, access is restored without any admin action. Blocked addresses aren’t blocked forever, only for a month or so while they are potentially harmful. The .htaccess Allow,Deny configuration can also be used, but it must be maintained manually: you have to check the stats frequently and determine the owner and extent of each IP address block.
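To check an address by hand from the command line, the query name and the decision rules can also be sketched in shell. The thresholds mirror the PHP constants. The decision also requires the first octet of the answer to be 127, as the response notes in the PHP comment describe. Only IPv4 addresses are handled. The function names are illustrative.

```shell
# Thresholds matching the PHP settings
DAYS_SINCE_LAST_ACTIVITY=30
MAX_THREAT_SCORE=100
MAX_TYPE_VALUE=1

# Build "<key>.<reversed IPv4 octets>.dnsbl.httpbl.org"
httpbl_query_name() {
    key=$1 ip=$2
    rev=$(echo "$ip" | awk -F. '{ print $4 "." $3 "." $2 "." $1 }')
    echo "$key.$rev.dnsbl.httpbl.org"
}

# Decide from an answer such as 127.3.120.1 (127.days.threat.type).
# Prints "block" or "allow".
httpbl_decision() {
    echo "$1" | awk -F. -v days="$DAYS_SINCE_LAST_ACTIVITY" \
        -v score="$MAX_THREAT_SCORE" -v mintype="$MAX_TYPE_VALUE" '{
        if ($1 != 127)        { print "allow"; next }  # error, not a listing
        if ($4 + 0 < mintype) { print "allow"; next }  # search engine
        if ($3 + 0 >= score)  { print "block"; next }  # high threat score
        if ($2 + 0 <= days)   { print "block"; next }  # recently active
        print "allow"
    }'
}
```

With your access key in $KEY, `dig +short "$(httpbl_query_name "$KEY" 192.0.2.1)"` performs the lookup. An address that is not listed returns no answer, and a listed one returns a 127.x.x.x address that can be passed to httpbl_decision.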