SEO Automation: The basics and benefits

13:01 on Fri, 8 Apr 2011 | SEO

SEO will often lead you to perform repetitive tasks. It's handy that you will usually be performing these tasks on a computer; computers are the bee's knees at this sort of work, and with little effort they can be processing mundane tasks while you sit back and watch cat videos (or work on a more interesting search activity).

In this article I will explain some of the ways you can use automation to optimize the way you work and show some basic examples of SEO automation that you can take away and use. This article is not designed for professional programmers but for SEOs looking to be more efficient, so don't be alarmed to see some code further down the page; it's really not as hard as it looks.

So the question is: why carry out the same repetitive tasks yourself when they can be done by your server or home PC?

Benefits of automation
It's all about time. It could take an hour per day to check a bunch of URLs to see if they are live or not; alternatively, it could take four hours to research, write and test a simple script/tool which would then only take five minutes per day to run. The benefits are obvious.

Time = Money
Automation = More time
More time = More Money!

You could look at it as an investment with an incredible reward which is gained almost instantly. The less time you spend on the repetitive work, the more you can provide a client in terms of monthly time, effort and productivity. It's a win-win!

Types of automation
There are many tasks that can be automated which you may never have thought of until now. Ask yourself and your colleagues, “What repetitive tasks do we do?” They don't have to be online tasks; it could simply be converting documents into the correct format or copying files. But before you run off automating everything that you do, there are a few things to consider:

• Some websites and webmasters are not great fans of having automated scripts/tools run on their websites/resources, so read the terms and conditions of the sites in question carefully. If they don't like it, don't do it!
• Ask yourself if it is really necessary for certain things to be automated. If you are not going to benefit from it a great deal, remember that automation is not always as accurate as the human touch; sometimes your code doesn't know whether something is good or bad, it just does as it is told regardless of the consequences.
• There are many tools, plugins and modules which make writing and using automation much easier, so do a little research before starting your script.
• Don't get greedy or go insane. If you are in no rush for the data, don't let your automation go wild (for example, if you query Google PageRank too many times in quick succession, Google will block your IP temporarily or even permanently from querying PageRank!). Not a fun place to be! A simple way of pacing your requests is sketched just after this list.
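
To give a rough idea of what pacing looks like, here is a minimal sketch in Perl; the two-second delay and the example URLs are arbitrary placeholders, not any official limit:

#!/usr/bin/perl

use warnings;
use strict;

#Example URLs only - replace with whatever you are checking
my @urls = ('http://www.example.com/', 'http://www.example.org/');

foreach my $url (@urls)
    {
    #...run your query for $url here...
    print "Checked $url\n";
    #Wait a couple of seconds before moving on to the next request
    sleep 2;
    }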

What you will need
There are tools that automate some ridiculously complex tasks, but this blog post is aimed at the simple things that, altogether, add up to a lot of time. There are many existing scripts/tools for automation that can be found on the internet (most are not free), but here I'm going to explain some basic methods for making your own (give a man a fish, etc.). You will need a programming/scripting language:

 

• C, C++, C#
• Python
• Perl
• PHP
• Shell

 

For this blog post I will be using Perl, mainly because it is the language I use the most and because it is very good for the sort of tasks I am talking about. Many other languages are suitable, especially Python because, like Perl, it has many modules which make life much easier. You don't have to be a hardcore coder to understand and implement the ideas below.

We will start with something familiar: PageRank. Let's say you need to check the PageRank of 20 different websites a day; this takes a substantial amount of time to do manually over the course of a month.

 

#!/usr/bin/perl
use WWW::Google::PageRank;
my $pr = WWW::Google::PageRank->new;
print scalar($pr->get('http://www.yahoo.com/')), "\n";

 

This little script uses the WWW::Google::PageRank module and simply prints the PageRank of yahoo.com. Obviously running this script over and over on each URL by hand would be silly, so now we need to automate it. We have a list of our 20 URLs (I will be using 3) in a file called domains.txt, and we want our program to go through the list and print the following information to the screen: “PR and URL”.
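
As the script reads domains.txt one line at a time, the file simply needs one URL per line; for the example output further down it would contain something like:

http://www.yahoo.com
http://www.google.com
http://www.bing.com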

The commented code looks like this:

#!/usr/bin/perl

use warnings;
use strict;
#Load the Google PageRank module
use WWW::Google::PageRank;
#Assign the object to the variable $pr ready to use
my $pr = WWW::Google::PageRank->new;
#Open the file domains.txt and assign it to the variable $inputfile
open my $inputfile, '<', 'domains.txt' or die $!;
#Start a loop to look at all the lines in the file (domains.txt)
while (my $url = <$inputfile>)
    {
    #Remove the trailing newline from the URL
    chomp $url;
    #Print "PR", the result of our PageRank query for the current line, and the URL
    print "PR ", scalar($pr->get($url)), " $url\n";
    #Loop ends
    }
#File closes
close($inputfile);

 

The outcome should look like this (I called my script pr.pl):

[Ben@********]$ perl pr.pl
PR 9 http://www.yahoo.com
PR 10 http://www.google.com
PR 8 http://www.bing.com

 

Remember, to run this script you need Perl installed, which you can get from http://strawberryperl.com/, and the Perl module WWW::Google::PageRank, which you can get from http://search.cpan.org/~ykar/WWW-Google-PageRank-0.16/lib/WWW/Google/PageRank.pm. Don't forget the Perl script file name must end in .pl and the file must be executable.
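
If you haven't installed a CPAN module before, something along these lines should do the trick from the command line (exact commands depend on your setup, and the chmod step only applies on Linux/Mac):

cpan WWW::Google::PageRank
chmod +x pr.pl
perl pr.pl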

Without wandering into web spider territory, I am going to show you one of the handiest modules around for web automation. WWW::Mechanize is a programmatic web browser which supports SSL, cookies and most other things a normal web browser supports. Even though this module has many features and can perform highly complex tasks, we will just use it to perform a simple one.

This script tells us the status code (404, 200 etc) of a list of urls:

#!/usr/bin/perl

use warnings;
use strict;
#Load the Mechanize module
use WWW::Mechanize;
#Assign the object to the variable $mech ready to use
my $mech = WWW::Mechanize->new( autocheck => 0 );
#Don't follow redirects, otherwise 301s would never be displayed
$mech->max_redirect(0);
#Open the file domains.txt and assign it to the variable $inputfile
open my $inputfile, '<', 'domains.txt' or die $!;
#Start a loop to look at all the lines in the file (domains.txt)
while (my $url = <$inputfile>)
    {
    #Remove the trailing newline from the URL
    chomp $url;
    #Browse to the URL
    $mech->get($url);
    #Make a variable $response that will contain the status code
    my $response = $mech->status();
    #Print the status code, an arrow and the URL
    print "$response -> $url\n";
    #Loop ends
    }
#File closes
close($inputfile);

The outcome should look like this (I called my script header.pl):
[Ben@********* Desktop]$ perl header.pl
200 -> http://www.testurl.com/birds
200 -> http://www.testurl.com/cows
400 -> http://www.testurl.com/rabbits
200 -> http://www.testurl.com/horses
301 -> http://www.testurl.com/monkeys

 

I have written this script to be similar to the first so that you can see how easy it is to replicate scripts to perform many other tasks.
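
As a rough sketch of that (the target link below is just a placeholder), you could adapt the same pattern to check whether each page in domains.txt still contains a link to your site, which is handy for checking that backlinks are still live:

#!/usr/bin/perl

use warnings;
use strict;
use WWW::Mechanize;

#The link we are looking for on each page - a placeholder, change it to your own site
my $target = 'http://www.example.com/';

my $mech = WWW::Mechanize->new( autocheck => 0 );
open my $inputfile, '<', 'domains.txt' or die $!;
while (my $url = <$inputfile>)
    {
    chomp $url;
    #Fetch the page
    $mech->get($url);
    #find_link searches the fetched page for a link pointing at $target
    if ( $mech->success() && $mech->find_link( url => $target ) )
        {
        print "FOUND -> $url\n";
        }
    else
        {
        print "MISSING -> $url\n";
        }
    }
close($inputfile);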

It's worth remembering that most of the time these scripts do not have to be very user friendly, as they are primarily bespoke tools made for your own use or for people you are in direct contact with and can help to use the tools. Just because they don't have to be 100% accessible doesn't mean they can be sloppy. For example, make sure the data you want is output in a clean, readable and accurate way. There's nothing to stop you making them very accessible and stable, but remember that will usually take more time.

It's important to streamline your services and optimize the way that you do things so you can carry out more tasks in a day, which will eventually lead to more profitability and enable you to stay ahead of your competitors, with information available to you on demand.
 
