Rahul's Tech Blog

Growth Hacking, Just Hacking or No hacking !!!

Day2day PHP: URL Scout, get notified when a webpage changes

I found this old draft blog entry, which I wrote in college; it never made it to the front page of my blog.
Reminds me of my awesome college days :)
The library was completed, but the view files were not, and they are still missing. You won't find a full solution here, but if you understand the library, I don't think it will be difficult to write the view files.

Introduction

RSS feeds have made our lives easy. You can watch a website for changes by subscribing to its RSS feed, but what do you do when you need to watch a static page that does not provide an RSS feed?

I faced this problem long ago. While I was doing my B.Tech., I needed to watch some important government websites (which are old, static and mostly dirty) for changes. I decided to write a program to automate this task. I chose PHP because it's a server-side language and my servers (shared hosting at that time) were much more powerful than my PC. Most importantly, the server was always connected to the internet, while we had some connectivity issues in those days.

What will I learn from this tutorial?

Nothing, if you copy paste :)

Yeah, and the smiley doesn't mean I am not serious.

This tutorial is not about learning complex PHP functions but about using simple functions to meet your complex day2day requirements. In this tutorial you will learn how to download remote files, hash them, send mail and use XML to store data.

You might be wondering why we use XML to store data and not MySQL. It's because we won't be storing a lot of data, so XML is a better choice. It also makes our application (relatively more) portable, i.e. when you install it on your server, you won't have to go through the hassle of creating a database, tables, users and passwords. Just unzip the downloaded file and you are game.

Concept behind the application

A list of URLs will be kept. Every time the script is executed, it will download each page and calculate its hash. The hash will then be compared with the previously stored hash. If any change is found, the stored hash will be replaced with the new one and the user will be notified by email (or by SMS, if you prefer).
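The comparison step can be tried on its own before looking at the full class. This is only a minimal sketch: hasChanged() is a hypothetical helper that is not part of the library, and the two page strings are made-up sample data standing in for downloaded pages.

```php
<?php
// Hypothetical helper (not part of the original class): tells whether
// the given page content differs from the content behind $savedHash.
function hasChanged(string $content, string $savedHash): bool {
    // Strict comparison, because loose == may treat hash strings
    // such as "0e123" and "0e456" as equal numbers
    return md5($content) !== $savedHash;
}

// Made-up sample pages standing in for two downloads of the same URL
$oldPage = "<html><body>Results will be announced soon</body></html>";
$newPage = "<html><body>Results announced</body></html>";

// Hash stored on the previous run
$savedHash = md5($oldPage);

var_dump(hasChanged($oldPage, $savedHash)); // same content, so no change
var_dump(hasChanged($newPage, $savedHash)); // different content, so a change
```

In the real script the content would come from file_get_contents() and the saved hash from the XML file.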

Preparation

We will need an XML file to store each URL, its hash and related data such as a title and a comment. Then an include file (I call it the system file) where the class will be stored. And three pages: one for the cron job (checking and notifying), one for adding a new URL and one HTML form.
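To make the XML file easier to picture, here is a rough sketch of what one stored entry might look like, inferred from how the class reads and writes it: an element with a base64-encoded title attribute and url, hash and comment children. The root element name urls, the example URL and the sample texts are assumptions; the class itself never names the root element.

```php
<?php
// Values encoded the same way the class stores them
$title = base64_encode("Example page");                 // title attribute is base64 encoded
$url   = urlencode("http://example.com/notice.html");   // url child is URL encoded
$hash  = md5("previous page content");                  // hash child is an MD5 hex string

// Assumed file layout; the root name "urls" is a guess
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urls>
    <element title="{$title}">
        <url>{$url}</url>
        <hash>{$hash}</hash>
        <comment>Check for new notices</comment>
    </element>
</urls>
XML;

// Read it back the way the class does
$xml = new SimpleXMLElement($sample);
foreach ($xml->element as $element) {
    echo base64_decode((string) $element['title']), "\n";
    echo urldecode((string) $element->url), "\n";
    echo (string) $element->hash, "\n";
    echo (string) $element->comment, "\n";
}
```

Encoding the title and URL keeps characters such as & out of the raw XML, which is presumably why the class stores them that way.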

Code

Instead of just writing everything in a simple function (which can be done in relatively less time) I will write a class, so that it can be distributed as a package and customization can be done easily by modifying some constants.

The code is self-explanatory. Inline comments make it easier to understand.

class URL_OBSERVER {
private static $xml;                                    //      An object of SimpleXMLElement
private static $mailData;                               //      Changed elements are stored here so that they can be mailed
const FILE = "urls.xml";                        //      Path to the XML data file (file name is an assumption; the original listing never defined this constant)
const EMAIL = "yourmail@gmail.com";             //      Email address to which mail is sent
const PASSWORD = "yourpassword";                //      Password for the interface so that only you can use it.
//      This function initializes the URL_OBSERVER::$xml variable.
public static function initialize() {
        //      If $xml is not initialized yet, initialize it
        if(URL_OBSERVER::$xml === NULL) {
                //      Read data from the XML file (returns false in case of error)
                $xmlData = file_get_contents(URL_OBSERVER::FILE);
                //      Throw an exception in case an error occurs
                if($xmlData === false) {
                        throw new Exception("Unable to read the XML file");
                }
                //      Initialize $xml variable
                URL_OBSERVER::$xml = new SimpleXMLElement($xmlData);
        }
}
//      This function downloads each URL stored in the XML file, calculates its hash
//      and compares it with the previously stored hash. If there is a change it replaces the hash.
//      It also adds the element to $mailData in case there is a change
public static function updateHash() {
        URL_OBSERVER::initialize();
        URL_OBSERVER::$mailData = array();      // Reset the list of changed elements
        //      Loop through each element
        foreach(URL_OBSERVER::$xml->element as $object) {
                /*
                 * Calculate the hash of the page behind the URL
                 * If the hash is the same then OK, else replace the hash
                 * and add the element to the mail array
                 */

                // Decode the stored URL and download the page
                $content = file_get_contents(urldecode((string) $object->url[0]));
                //      Skip this URL if the download failed, so that a temporary outage
                //      is not reported as a change
                if($content === false) {
                        continue;
                }
                //      Calculate the hash of the downloaded page
                $currentHash = md5($content);
                //      Load previous hash (cast to string, SimpleXML returns an object)
                $savedHash = (string) $object->hash;
                // Check if the hash has changed (strict comparison, so that hashes
                // like "0e123..." are never compared as numbers)
                if($currentHash !== $savedHash) {
                        //      Store changed element in mailData so that the user can be notified
                        URL_OBSERVER::$mailData[] = $object;
                        // Save changed hash
                        $object->hash = $currentHash;
                }
        }
        //      If there is a change (mailData is not empty) then write the changed values to the file
        if(!empty(URL_OBSERVER::$mailData)) {
                //      asXML returns the XML data when nothing is supplied as argument,
                //      else it writes the XML data to the file (argument)
                URL_OBSERVER::$xml->asXML(URL_OBSERVER::FILE);
        }
}
//      This function calls updateHash and, if there is a change
//      (that is, if mailData is not empty), sends a notification to the email address
public static function updateAndSendNotification() {
        URL_OBSERVER::updateHash();
        // Check if there is a change after updating
        if(!empty(URL_OBSERVER::$mailData)) {
                // Start of the message: table header
                $messageBody = "<table>";
                $messageBody .= "<tbody><tr>";
                $messageBody .= "<td>Title</td>";
                $messageBody .= "<td>Link</td>";
                $messageBody .= "<td>Comment</td>";
                $messageBody .= "</tr>";
                $count = 0;
                // One table row per changed element
                foreach (URL_OBSERVER::$mailData as $key => $object) {
                        $messageBody .= "<tr>";
                        // Decode the title before sending
                        $title = base64_decode((string) $object['title']);
                        // Decode the url before sending
                        $url = urldecode((string) $object->url);
                        $messageBody .= "<td>{$title}</td>";
                        $messageBody .= "<td>{$url}</td>";
                        $messageBody .= "<td>{$object->comment}</td>";
                        $messageBody .= "</tr>";
                        $count++;
                }
                $messageBody .= "</tbody></table>";
                // Set headers to display HTML instead of just plain text
                $headers  = 'MIME-Version: 1.0' . "\r\n";
                $headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
                // Send mail
                mail(URL_OBSERVER::EMAIL, "Notification($count) from url observer", $messageBody, $headers);
        }
}
//      This function adds URL, TITLE, comment and HASH to the XML file
public static function addUrl($arg_title, $arg_url, $arg_comment = NULL) {
        //      Title and url can't be null
        if($arg_title == NULL || $arg_url == NULL) {
                throw new Exception("Title and URL must not be NULL");
        }
        // URL must not be already present
        URL_OBSERVER::initialize();
        foreach(URL_OBSERVER::$xml->element as $obj) {
                if(urldecode((string) $obj->url[0]) == $arg_url) {
                        throw new Exception("Url already present");
                }
        }
        //      Fetch data from the URL; throw an error if that fails
        //      ($arg_url arrives unencoded, so it is fetched as is)
        $content = file_get_contents($arg_url);
        if($content === false) {
                throw new Exception("Error opening URL");
        }
        //      Calculate Hash
        $hash = md5($content);
        //      Add to XML file
        //      Create a child, element
        $element = URL_OBSERVER::$xml->addChild('element');
        //      Add attribute to element
        $element->addAttribute('title', base64_encode($arg_title));
        //      Add children (URL, HASH and COMMENT) to the element that we just created
        $element->addChild('url', urlencode($arg_url));
        $element->addChild('hash', $hash);
        //      Assigning the property (instead of addChild) escapes characters like & in the comment
        $element->comment = (string) $arg_comment;
        //      Save the XML to file
        URL_OBSERVER::$xml->asXML(URL_OBSERVER::FILE);
}
}

Conclusion

I am very bad at conclusions, so draw your own and let me know.