MySQL and top pruner

The server had a cron job that dumped the output of the top command to a directory accessible over HTTP once every 5 minutes. The following script was written to get rid of those old dumps and also to prune the MySQL binary logs.

#! /bin/sh

# LOGGER_PATH, N_DAY, MYSQL_DATA_DIR and LOGSTOKEEP must be set before this point.

# find and delete all files
# in logger path older than n days

find "$LOGGER_PATH" -type f -mtime +"$N_DAY" -exec rm -f {} \;

# mysql pruner section.
# flush logs first,
# find the oldest log file we want to keep
# and purge binary logs up to that

# mysql flush logs here

FILENAMES=`ls "$MYSQL_DATA_DIR" | grep -E "^mysql-bin\.0+[[:digit:]]+$" | sort | tail -n "$LOGSTOKEEP"`
LINECOUNT=`echo "$FILENAMES" | grep -c .`

# linecount will be less than or equal to the number of logs we want to keep

if [ "$LINECOUNT" -lt "$LOGSTOKEEP" ]; then
    echo "$LOGSTOKEEP files were not found. So no purge is being done."
else
    LASTFILENAME=`echo $FILENAMES | cut -d' ' -f 1`
    # do purge up to LASTFILENAME
fi
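
The two decisions the script makes (which dump files are old enough to delete, and which binary log is the oldest one to keep) can be sketched in Python as below. This is only an illustration under assumptions: the function names are made up for this example, the age check is a plain "modified more than N days ago" test that only approximates what find -mtime does, and nothing here deletes files or talks to MySQL.

```python
import os
import re
import time

# Binary logs are assumed to be named mysql-bin.<digits>, as in the shell script.
BINLOG_RE = re.compile(r"^mysql-bin\.\d+$")


def files_older_than(directory, days, now=None):
    """Return paths of regular files in `directory` modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    old = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            old.append(path)
    return old


def oldest_binlog_to_keep(filenames, logs_to_keep):
    """Return the oldest binary log that should survive a purge.

    Returns None when fewer than `logs_to_keep` logs exist, in which case
    nothing should be purged (the same rule as the shell script).
    """
    if logs_to_keep < 1:
        raise ValueError("logs_to_keep must be at least 1")
    binlogs = sorted(name for name in filenames if BINLOG_RE.match(name))
    if len(binlogs) < logs_to_keep:
        return None
    return binlogs[-logs_to_keep]
```

Sorting by name works here because the numeric suffix of the log files is zero padded to a fixed width.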

Python script to get links from Yahoo search

This was a quick script I made to pull links from Yahoo search using the BOSS search API, and then list the unique domains.

If you want the full links, just modify it so that the whole link is appended to the list. Yahoo does not let you fetch all the results, only a certain predefined number, so this code extracts only about 800 domains. But that is still good enough for a start and for most uses.

I am also working on getting citation values from Google Scholar for a friend. I will post that here soon. Here's the code for now.

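#! /usr/bin/python
import urllib, json
from urlparse import urlparse

yahoo_application_id = ""  # put your BOSS application id here
links = []
nextresult = 0

#print yahoo_application_id

while True:
    #print ""+yahoo_application_id+"&format=xml"
    print "trying result from " + str(nextresult)
    # the search URL prefix goes in the empty string below
    f = urllib.urlopen(""+yahoo_application_id+"&format=json&count=100&start="+str(nextresult))
    ss = f.read()
    f.close()
    ssjson = json.loads(ss)
    totalhits = ssjson["ysearchresponse"]["totalhits"]
    print totalhits
    for x in ssjson["ysearchresponse"]["resultset_web"]:
        url = x["url"]
        o = urlparse(url)
        # keep only scheme://host so that each domain is listed once
        link = o[0]+"://"+o[1]
        if link not in links:
            links.append(link)
    nextresult = nextresult + 100
    if (nextresult>10000):
        break

print "Obtained results: " + str(nextresult) + " of which " + str(len(links)) + " were unique."
for x in links:
    print x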

Cool huh? If you want any help modifying this, drop me a line.
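
If you only want the deduplication step on its own, here is a small sketch of it. Note that it is written for Python 3 (the script above is Python 2), and the function name and sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def unique_domains(urls):
    """Return scheme://host for each URL, in first-seen order, without duplicates."""
    seen = set()
    domains = []
    for url in urls:
        parts = urlparse(url)
        if not parts.scheme or not parts.netloc:
            continue  # skip anything that is not an absolute URL
        domain = parts.scheme + "://" + parts.netloc
        if domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


# Hypothetical sample input, not real search results.
sample = [
    "http://example.com/a",
    "http://example.com/b?x=1",
    "https://example.org/",
    "not a url",
]
for domain in unique_domains(sample):
    print(domain)
```

Using a set for the membership check keeps the lookup fast even when the list of results grows, while the separate list preserves the order in which domains were first seen.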

Python script to get addresses from Google Maps

A simple Python script to get the addresses of businesses in a city. It is just a quick demo that I wrote for a client in an hour.

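import urllib

def getdata(idstr, matchstr):
    # return the value that follows matchstr in idstr, up to the closing quote
    line = ""
    start = idstr.find(matchstr)
    if start > -1:
        start = start + len(matchstr)
        end = idstr.find("\"", start)
        if end > -1:
            line = idstr[start:end]
    return line

url = ""   # the maps search url goes here
data = ""  # the fixed query parameters go here
location="&q=" + urllib.quote("airport loc: New Delhi, India") + "&btnG=" + urllib.quote("Search Maps")

fp=urllib.urlopen(url + "?" + data + location)
filecontents = ""
for line in fp.readlines():
    filecontents=filecontents + line
fp.close()

startloc = 0
morecontent = True
while morecontent==True:
    # each result starts at "id:" and ends at "}}}"
    startpos=filecontents.find("id:", startloc)
    endpos = -1
    if startpos>-1:
        endpos=filecontents.find("}}}", startpos)
    if endpos>-1:
        section = filecontents[startpos:endpos]
        sxti=getdata(section, "sxti:\"")
        sxsn=getdata(section, "sxsn:\"")
        sxst=getdata(section, "sxst:\"")
        sxpr=getdata(section, "sxpr:\"")
        sxpo=getdata(section, "sxpo:\"")
        sxph=getdata(section, "sxph:\"")
        actual_url=getdata(section, "actual_url:\"")
        print sxti + ", " + sxsn + ", " + sxst + ", " + sxpr + ", " + sxpo + ", " + sxph + ", " + actual_url
        startloc = endpos
    else:
        morecontent = False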
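
The string-scanning idea used above (cut the page into sections that run from "id:" to "}}}", then pull out each key:"value" pair) can be tried without any network access. The sketch below is Python 3 and runs on a made-up sample string; the helper names and the sample data are assumptions for illustration, and values containing escaped quotes are not handled.

```python
def iter_sections(text, opener="id:", closer="}}}"):
    """Yield each chunk of `text` that starts at `opener` and ends before `closer`."""
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            return
        end = text.find(closer, start)
        if end == -1:
            return
        yield text[start:end]
        pos = end + len(closer)


def get_field(section, key):
    """Return the double-quoted value that follows `key:` in `section`, or ''."""
    marker = key + ':"'
    start = section.find(marker)
    if start == -1:
        return ""
    start += len(marker)
    end = section.find('"', start)
    return section[start:end] if end != -1 else ""


# Hypothetical page fragment, shaped like the data the script looks for.
sample = '{id:"A",sxti:"Airport Hotel",sxph:"011-5550100"}}},{id:"B",sxti:"Airport Cafe",sxph:""}}}'

for section in iter_sections(sample):
    print(get_field(section, "sxti") + ", " + get_field(section, "sxph"))
```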

PHP and XML RPC – Searching for values

Many servers provide XML-RPC interfaces which allow other web applications to call and execute functions on them. It is actually quite simple once you get the hang of it. The servers return the response and any variables as XML-RPC response messages, which are basically XML documents.

There are probably a lot of ways to get this done, but this actually turned out pretty well using a combination of the phpxmlrpc library and the DOMDocument class in PHP.

Let us say that the server is at port 4567.
In order to make the call you first have to download the phpxmlrpc library and extract it to a folder in your web directory root.

In the beginning you may want to turn on the debugging feature of the RPC client. To do so, uncomment the setDebug line in the code.



<?php
//ini_set("display_errors", 1);

include("lib/xmlrpc.inc"); //adjust the path to wherever you extracted the phpxmlrpc library

//You do not need to set the transport as https here.
$xmlrpc_client=new xmlrpc_client('admin/admin', '',4567);

//Uncomment the next line to turn debugging on.
//$xmlrpc_client->setDebug(1);

//An xmlrpc call without any parameters is below.
$xmlrpc_msg=new xmlrpcmsg('rpcFunctionName');

//The next one is an xmlrpc call with parameters.
//$xmlrpc_msg=new xmlrpcmsg('rpcFunctionName', array(new xmlrpcval(7, "int"),new xmlrpcval("", "string"),new xmlrpcval(2, "int")));

//Here is where you set the transport as https. Look at the manual for more options.
$xmlrpc_resp=$xmlrpc_client->send($xmlrpc_msg, 200, 'https');

if ($xmlrpc_resp==False) {
    print "No response";
    die ('Error');
}

if (!$xmlrpc_resp->faultCode()) {
    //If you just want the xml, print out the return value from serialize()
    //print $xmlrpc_resp->serialize() ;

    $dom=new DomDocument();
    $dom->loadXML($xmlrpc_resp->serialize());

    //Examine the xml to find out the path to the actual things you need.
    $xpathString = "//methodResponse/params/param/value/struct/member[name='a_value']/value/string";
    $xp=new DOMXPath($dom);
    $domNodeList = $xp->query($xpathString);

    foreach($domNodeList as $domNode){
        //You may need to do more xpath queries as $xp->query($anotherXpathstr, $domNode). The search will be done under $domNode.
        $server_name=$domNode->nodeValue ;
    }
} else {
    print "Error: " . $xmlrpc_resp->faultString();
}
?>



Also do look up the documentation for xmlrpc_client property return_type.
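
To see what the XPath search is doing without a live server, the same lookup can be sketched in Python 3 with only the standard library. The response document and the value in it are made up for this example; only the member name a_value comes from the PHP code above.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# A hypothetical XML-RPC response, shaped the way the XPath above expects.
RESPONSE = """<?xml version="1.0"?>
<methodResponse>
  <params><param><value><struct>
    <member><name>a_value</name><value><string>server-one</string></value></member>
    <member><name>other</name><value><int>7</int></value></member>
  </struct></value></param></params>
</methodResponse>"""


def find_member_strings(xml_text, member_name):
    """Return the string values of every struct member called `member_name`."""
    root = ET.fromstring(xml_text)  # root is the methodResponse element
    path = "./params/param/value/struct/member[name='%s']/value/string" % member_name
    return [node.text for node in root.findall(path)]


print(find_member_strings(RESPONSE, "a_value"))

# The standard library can also decode the whole response into Python values.
params, method_name = xmlrpc.client.loads(RESPONSE)
print(params[0]["a_value"])
```

Searching the raw XML is handy when you want one value out of a deeply nested response. If you need most of the values, decoding the whole response is usually less work.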

5 open source applications to use daily

Open source is here to stay and is a major contender against closed source, proprietary software. Open source software is now quite easy to use, so if you are considering trying it because you have heard a lot about it, you might be wondering what software is available to get your routine jobs done.

Below are 5 software applications that most users use for more than 2 hours a day.

Ubuntu and Fedora

The most important piece of software on any machine is the operating system, and these two are the best. Ubuntu is one of the most user-friendly Linux distributions, with a huge number of contributors and the largest user base. Fedora is just right for a simple server that you do not have to babysit too much. Fedora can also quite easily be used as a desktop OS. I run Ubuntu on my home and work systems and Fedora on my test servers.


Firefox

The browser is the first software I start after my OS boots up and the last one I close before shutting down, which goes to show how network-oriented the computer world now is. The ability to install add-ons endeared Firefox to me and millions of others, and the way it is implemented is much better than Opera’s widgets and much safer than IE’s ActiveX controls. The sheer number of add-ons available is mind-boggling. The theme of the browser can be changed too, but I don’t think many people actually use that feature anymore. My most used add-ons are Adblock, TwitterFox and Firebug.

OpenOffice

My boss needs pretty reports and prettier statistics and charts. OpenOffice proves quite competent for the job. You have Writer for reports and text documents, Calc for spreadsheets and Impress for presentations. There are also Math and Draw, which come in useful at times. It does have performance issues and could do with an appearance makeover, but since this project is much more complicated than the others mentioned on this page, it can be forgiven.

Evolution (Kontact for KDE)

Both of these are mail clients with bundled contact managers, calendars and to-do lists. Both integrate with the clock in the system tray, in GNOME and KDE respectively, in most major distributions. Support for IMAP, POP, SMTP and Exchange servers is available. They should usually be available in your distribution’s official repositories if you are using Linux. For Windows, you could have a look at Thunderbird, though I find it unnecessarily complex for my simple needs.

XChat & Pidgin

These are chat clients. The former is exclusively for IRC, and the latter is a multiprotocol chat client which I use primarily for Google Talk and Yahoo chat. Pidgin does handle IRC too, but I like XChat better for IRC.

GEdit (Kate for KDE, Vim for ssh access)

When you gotta code, you gotta code. These no-nonsense text editors are perfect for writing up the code before you run your favorite compiler on it. Syntax highlighting for different languages is available, and Vim supports autocomplete for some languages with plugins. But I like the GUI editors better for their simple multi-tab interface, which makes them similar to modern-day browsers. I still haven’t gotten used to the tabs in Nautilus though.

Other software that I use frequently includes Banshee for music library management and playback, GIMP for image editing, FileZilla for FTP access, VLC Media Player for playing video files and Tomboy for taking quick notes while on the phone. There are probably hundreds of alternatives to these in the open source world, and the more you explore, the more treasures you will find. These, however, are the ones you will most commonly find installed out of the box on major distros. Have fun and let me know what you think about this post via the comments.

How to buy a new computer

TFT monitors aren't much costlier than CRTs, look cooler and save on your power bills


I am often asked by friends, or friends of friends, to help them get a new computer for their office. I know a lot of people in the NGO sector who are highly budget conscious, and they all prefer systems that not only get the job done but also keep maintenance costs to a minimum.

Here are a few tips on buying a new computer either for the office or for yourself or as a gift to someone you know. There are a lot of sites giving you the technical information so I am not going to be covering the specifications and prices. You can visit or itwares for latest prices in India. Some excellent forums where the members are helpful and provide excellent advice are thinkdigit forums, suggestafix forums and techspot forums. All these are Indian so you can even get the best configuration for your budget. You will have to register at these sites before you can post.

Before you decide on a system to purchase, here is what you should ideally do.

Decide on the purpose of the computer system

Your needs define the configuration of the system you are going to buy. If you need a system for gaming, you probably need lots of RAM, a good graphics card and probably the latest processors. Same for a design rig.

If you plan to watch a lot of movies and want a home entertainment machine, you will need lots of storage. If all you plan to do is draft email, write up reports and play the occasional game, a modest configuration will do quite well. Making a list of the activities you plan to do on the system will help you avoid purchasing a costly system you have no need for, and lets you use the money saved to buy some accessories or cool gadgets instead.

Decide on your budget and stay within it

This is the second thing you need to do. Or it could be the first. If you aren’t clear about how much you are willing to spend, you are going to spend more than you can afford and will then have to cut back on other expenses, like a good branded UPS or an external hard drive. Decide on a price range, and when you start looking, look for a computer that matches your lower limit. You ARE going to get suggestions to go for a better system and, before you know it, you will reach the top limit. Stop there. The latest specifications aren’t going to be the latest for more than a month, so don’t bother getting more than you need.

Decide what accessories you need

While budgeting, make a list of everything you need to buy. If you have frequent power cuts, get a UPS. If you need your data to be portable, get an external hard drive. If you need video chat, get a webcam and a microphone or headset. Plan for speakers. Plan for your Internet connection. Do you need a printer? How many prints are you going to need each month? If it is fewer than 100, get an inkjet printer. If it is more than a hundred, or if you can afford one, get a laser printer, which has lower running costs.

Get prices from various brands and compare with an assembled machine

Visit the websites of popular brands like HP, Lenovo, Dell and have a look at the available configuration and their prices. Get to your local assembler and ask for a similar configuration. Check the quality of the components, brands etc. Compare the prices. Sometimes branded machines come cheaper. Visit showrooms of the brands and see if there are some sales or old stock being pushed off at discount prices.

Read the fine print

While budgeting, don’t forget the taxes. It is 4% in Delhi for computers, and may be more where you are. Include it, especially when purchasing for your office. What about installation charges, do they cost extra? Is the operating system included? FreeDOS is pretty useless as an OS, if you did not already know. Is other software, like an office suite, bundled?

Check the warranties

Prefer systems with the longest warranties, preferably onsite. If the warranty is only for one year, ask how much an extension would cost. IBM provides MA packs through its partners that cost about 3000 for a Thinkcentre per year. Extending a warranty while buying would save you some money.

Decide the software needed

And finally, decide on what software you are planning to use. Does the system have bundled software? Is it licensed? Is it a full version or a trial version? Avoid using pirated software. Consider using a user-friendly Linux distribution like Ubuntu.

So this is a simple set of guidelines which I hope will help you. If you have any more queries or suggestions, do leave a comment.

Backing up all discussions in a Facebook group with Perl

The Facebook group named “A Consortium of Pub-Going, Loose and Forward Women” has been hacked more than 6 times in the last week alone. You must have heard of them; they’re the pinkchaddi girls. :p

This script is set as a cronjob on my computer to back up the group each hour (for this group with about 147 discussions, it takes about 20 minutes, but then, my ISP sucks). By changing the first url in the script, it should work properly on any group. The script takes the first discussion page of the group, then takes each discussion, compiles all the posts one after the other and dumps them to text files in a folder, then zips them and emails them to a specified email id.

To email, you need access to a mail provider who gives you SMTP access. The script can authenticate itself.

You need these dependencies for it to work.

libarchive-zip-perl
(Open SSL)
IO-Socket-SSL
Authen-SASL
Net::SMTP::Multipart

Just search for them with your favorite package manager or use cpan to install them. For my first Perl script, I am pretty happy with it. :) You can take this script and use it for yourself, and make any modifications you want. If you do make any improvements, consider posting them back in the comments, so I can use them too.

And oh, keep in mind that facebook officially isn’t too happy with you taking the content off their site through scripts.

#! /usr/bin/perl
# The following dependencies are required
# libarchive-zip-perl
# (Open SSL)
# IO-Socket-SSL
# Authen-SASL
# Net::SMTP::Multipart; had to install this from cpan on ubuntu

require LWP::UserAgent;
use LWP::Simple;
#customize here ################################
#the gmx mail service is used for sending mail.
$backuprecepient = ''; #to whom the email should be sent
$smtpuser = '';        #smtp login, also used as the From address
$smtppassword = '';
$firstlink = ''; #the script expects the link to the first page of the discussions. Login *must* not be required.

$ua=new LWP::UserAgent;
# we have two lists, one is the list with links that need to be navigated to, say 'tonav'.
# we get the page, then add the page link to the one navigated, say 'hasnav'.
# on each navigation, we get all pageno links and check if they exist in 'tonav' or 'hasnav', if not add those to 'tonav'. And add current page to 'hasnav'.
# the loop continues as long as there are links in tonav.
@tonav = ($firstlink);
@hasnav = ();
@discussionlinks = ();
while (scalar(@tonav)>0) {
    $discussionlistlink = shift(@tonav);
    push(@hasnav, $discussionlistlink);
    $request=new HTTP::Request('GET', $discussionlistlink);
    $maincontent = $ua->request($request)->content;
    $pagenossource = "";
    if ($maincontent =~ m{<li class="current">(.*?)</li></ul></div></div>}) { $pagenossource = $1; }
    #facebook wants to fuck up my script. :( ugly hack. must be a better way. firefox seems to interpret it properly though.
    #(the hrefs contain &amp; where the real link has a plain &)
    $pagenossource=~ s/amp;//gi;
    @links=$pagenossource=~ m/<a href="(.*?)">\d/gis;
    #we find the further discussion list pages links here
    for $link (@links) {
        $disclinkexists=0;
        for $tonavitem (@tonav) {
            if ($link eq $tonavitem) { $disclinkexists=1; }
        }
        for $hasnavitem (@hasnav) {
            if ($link eq $hasnavitem) { $disclinkexists=1; }
        }
        if ($disclinkexists==0) { push(@tonav,$link); }
    }
    #we get the actual discussion links here
    @links=$maincontent=~ m{<h2 class="topic_title datawrap"><a href="(.*?)">}gis;
    for $link (@links) {
        ($linkreg = $link) =~ s/amp;//gi;
        $disclinkexists=0;
        for $discitem (@discussionlinks) {
            if ($linkreg eq $discitem) { $disclinkexists=1; }
        }
        if ($disclinkexists==0) {
            push(@discussionlinks, $linkreg);
        }
    }
}
@timeData = localtime(time);
$directoryname="fbbackup".join('', @timeData);
mkdir "$directoryname", 0770 unless -d "$directoryname";
$discussioncount = 0;
$activitylog = "";
for $discussionlink (@discussionlinks) {
    #for each link, go to the discussion page, follow each page and find the page no. links. Finally hasnav will have the list of discussion pages
    $discussioncount++;
    @tonav = ($discussionlink);
    @hasnav = ();
    $topic = "";
    while (scalar(@tonav)>0) {
        $discussionpage = shift(@tonav);
        push(@hasnav, $discussionpage);
        $request=new HTTP::Request('GET', $discussionpage);
        $maincontent = $ua->request($request)->content;
        if ($topic eq "") {
            if ($maincontent =~ m{<h2>Topic: <span>(.*?)</span>}is) {
                $topicwrapper = $1;
                if ($topicwrapper =~ m/>(.*?)</is) { $topic = $1; }
            }
        }
        $pagenossource = "";
        if ($maincontent =~ m{<div class="pagerpro_container"><ul class="pagerpro">(.*?)</div></div>}is) { $pagenossource = $1; }
        #same ugly hack as above
        $pagenossource=~ s/amp;//gi;
        @links=$pagenossource=~ m/<a href="(.*?)" onclick/gis;
        #we find the further discussion pages links here
        for $link (@links) {
            $disclinkexists=0;
            for $tonavitem (@tonav) {
                if ($link eq $tonavitem) { $disclinkexists=1; }
            }
            for $hasnavitem (@hasnav) {
                if ($link eq $hasnavitem) { $disclinkexists=1; }
            }
            if ($disclinkexists==0) { push(@tonav,$link); }
        }
    }
    if ($topic) {
        #anything that is not a letter or a digit becomes an underscore
        (my $filename = $topic)=~ tr/a-zA-Z0-9/_/cs;
        $activity="=========================================\n$discussioncount of ". scalar(@discussionlinks) . "\nTopic: ".$topic."\n";
        print $activity;
        $activitylog .= $activity;
        if (scalar(@hasnav)>1) { pop(@hasnav); }
        $activity= scalar(@hasnav) . " pages in discussion.\n";
        print $activity;
        $activitylog .= $activity;
        $filecontent = "";
        $postcount = 0;
        for $hasnavitem (@hasnav) {
            $activity= $hasnavitem." initiated.";
            print $activity;
            $request2=new HTTP::Request('GET', $hasnavitem);
            $discussioncontent = $ua->request($request2)->content;
            $discussionpageallthreads = "";
            if ($discussioncontent =~ m{<div id="all_threads">(.*?)</div></div></div><div class="UIWashFrame_SidebarAds">}is) { $discussionpageallthreads = $1; }
            @posts=$discussionpageallthreads =~ m{<div class="post_index">(.*?)<ul class="actionspro">}gis;
            for $post (@posts) {
                $postreg = $post;
                $postcount++;
                ($postindex)   = $postreg =~ m/Post #(.*?)</is;
                ($author)      = $postreg =~ m{<span class="author_header"><strong>(.*?)</strong>}is;
                ($timestamp)   = $postreg =~ m{timestamp">(.*?)</span>}is;
                ($postmessage) = $postreg =~ m{<div class="post_message">(.*?)</div>}is;
                #turn line breaks into newlines, strip the remaining tags, squeeze spaces
                $postmessage=~ s{<br />}{\n}gi;
                $postmessage=~ s/<.*?>//gi;
                $postmessage=~ s/  */ /gi;
                $filecontent=$filecontent ."Post #" . $postindex . " by " . $author . " (" . $timestamp . ")\n";
                $filecontent=$filecontent . $postmessage . "\n";
                $filecontent=$filecontent . "-------------------------------------------------------------------------\n";
            }
            $activity = "...completed\n";
            print $activity;
        }
        $filecontent="Topic: " . $topic . "\n" . $postcount . " posts in discussion \n====================================\n" . $filecontent;
        open (DISCNPAGE, ">$directoryname/$filename");
        print DISCNPAGE $filecontent;
        close (DISCNPAGE);
    }
}

# Create a Zip file

use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
my $zip = Archive::Zip->new();

# Add a directory
my $dir_member = $zip->addTree( "$directoryname/", "$directoryname/" );

# Save the Zip file
unless ( $zip->writeToFileNamed("$directoryname".".zip") == AZ_OK ) {
    die "unable to save the zip file; the files are probably backed up in directory $directoryname";
}
#delete the backup folder
use File::Path;
$delA = "$directoryname";
rmtree($delA);

use Net::SMTP::Multipart;
my $to = $backuprecepient;
my $subject = "Backup $directoryname";
my $body = "Backup zip file attached.\n -------------------------\n\n$activitylog";

my $from = $smtpuser;
my $password = $smtppassword;
my $smtp;

#the mail server name goes in the empty string below
if (not $smtp = Net::SMTP::Multipart->new('',
        Port => 25,
        Debug => 0)) {
    die "Could not connect to server\n";
}

$smtp->auth($from, $password)
    || die "Authentication failed!\n";
$smtp->Header(To=>$to, Subj=>$subject, From=>"$from");
$smtp->Text($body);
$smtp->FileAttach("$directoryname.zip");
$smtp->End();
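
The tonav/hasnav bookkeeping is the heart of the script, so here is the same idea as a small Python 3 sketch that runs against a fake in-memory "site" instead of Facebook. The function name, the get_links callback and the page names are all made up for this example.

```python
from collections import deque


def crawl(first_link, get_links):
    """Visit every page reachable from `first_link` and return them in visit order.

    `get_links(page)` must return the links found on `page`.
    """
    to_nav = deque([first_link])   # pages still to be visited ('tonav')
    has_nav = []                   # pages already visited ('hasnav')
    seen = {first_link}            # everything that is in either list
    while to_nav:
        page = to_nav.popleft()
        has_nav.append(page)
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                to_nav.append(link)
    return has_nav


# Hypothetical pager links: every page links to some of the others.
FAKE_SITE = {
    "page1": ["page2", "page3"],
    "page2": ["page1", "page3"],
    "page3": ["page2"],
}

pages = crawl("page1", lambda page: FAKE_SITE.get(page, []))
print(pages)
```

A set replaces the two nested loops the Perl script uses to check whether a link is already known, which matters once a group has many pages.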


Aha! just found that there is a radio button ‘show HTML literally’ below the blogger edit box. Now I can post my code directly into blogger. :)

Printing a sheet to pdf silently in Excel VBA

This piece of code uses Adobe Acrobat to silently print a sheet to the location you specify.
Change PDFPath and strOutFile to choose where the PDF is saved and what it is called.

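'These declarations go at the top of the module (32-bit Office; 64-bit Office needs PtrSafe)
Private Declare Function RegOpenKeyA Lib "advapi32.dll" (ByVal hKey As Long, ByVal lpSubKey As String, phkResult As Long) As Long
Private Declare Function RegSetValueEx Lib "advapi32.dll" Alias "RegSetValueExA" (ByVal hKey As Long, ByVal lpValueName As String, ByVal Reserved As Long, ByVal dwType As Long, lpData As Any, ByVal cbData As Long) As Long
Private Declare Function RegCloseKey Lib "advapi32.dll" (ByVal hKey As Long) As Long
Private Const dhcRegSz As Long = 1 'REG_SZ

Sub PrintSheetToPDF()
    Dim strDefaultPrinter As String, strOutFile As String
    Dim lngRegResult As Long, lngResult As Long
    Dim dhcHKeyCurrentUser As Long
    Dim PDFPath As String
    dhcHKeyCurrentUser = &H80000001
    strDefaultPrinter = Application.ActivePrinter
    PDFPath = ThisWorkbook.Path & Application.PathSeparator 'The directory in which you want to save the file
    strOutFile = PDFPath & "sheet1.pdf" 'Change the pdf file name if required. This should have the fully qualified path

    'Tell the Adobe PDF printer where to write the next job coming from Excel
    lngRegResult = RegOpenKeyA(dhcHKeyCurrentUser, "Software\Adobe\Acrobat Distiller\PrinterJobControl", lngResult)
    lngRegResult = RegSetValueEx(lngResult, Application.Path & "\excel.exe", 0&, dhcRegSz, ByVal strOutFile, Len(strOutFile))
    lngRegResult = RegCloseKey(lngResult)
    ThisWorkbook.ActiveSheet.PrintOut copies:=1, ActivePrinter:="Adobe PDF"
    Application.ActivePrinter = strDefaultPrinter 'Switch back to the original printer
End Sub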

Using a passwords file with Excel VBA

To loop through a text file containing usernames and passwords in Excel VBA, use the following code. It is a good idea to store the passwords in the file encrypted rather than as plain text.

Dim FileNum As Integer
Dim UserName As String, UserPassword As String 'These are the values input by the user
Dim LoginOK As Boolean
Dim UNameIter As String, PasswordIter As String
    If Dir(ThisWorkbook.Path & Application.PathSeparator & "passwords.txt") = "" Then
        'password file does not exist. Exit with Error
        MsgBox "Password File Does not exist"
    Else
        FileNum = FreeFile
        Open ThisWorkbook.Path & Application.PathSeparator & "passwords.txt" For Input As #FileNum
        'loop through passwords in the file till you find the one matching the client
        LoginOK = False
        While Not EOF(FileNum) And Not LoginOK
            Input #FileNum, UNameIter, PasswordIter
            If UNameIter = UserName Then
                If PasswordIter = UserPassword Then
                    'Login OK
                    LoginOK = True
                End If
            End If
        Wend
        Close #FileNum
        If LoginOK = False Then
            'login not ok
            MsgBox "User ID / Password do not match"
        Else
            MsgBox "User ID and password are correct"
        End If
    End If
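
On the point about not keeping plain passwords in the file: one common approach (swapped in here, it is not what the VBA above does) is to store a salted hash of each password and compare hashes at login. Below is a rough Python 3 sketch of that. The file layout of username,salt,hash per line, the function names and the iteration count are all assumptions made for this example.

```python
import csv
import hashlib
import hmac


def hash_password(password, salt):
    """Return a hex PBKDF2-SHA256 hash of `password` using the given salt bytes."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000).hex()


def login_ok(lines, user_name, user_password):
    """Check a login against rows of 'username,salt_hex,hash_hex'.

    `lines` can be an open file or any iterable of text lines.
    """
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, salt_hex, stored_hash = row
        if name == user_name:
            candidate = hash_password(user_password, bytes.fromhex(salt_hex))
            # compare_digest avoids leaking how many characters matched
            return hmac.compare_digest(candidate, stored_hash)
    return False


# Hypothetical entry, built the same way a "create user" step would build it.
salt = bytes.fromhex("00112233")
password_lines = ["alice," + salt.hex() + "," + hash_password("s3cret", salt)]

print(login_ok(password_lines, "alice", "s3cret"))
print(login_ok(password_lines, "alice", "wrong"))
```

Unlike the VBA loop, this stops at the first row with a matching username, so it assumes usernames are unique in the file.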

Recovering Files from Damaged Camera Card

There was an important event this week at the workplace and we had called in a photographer to take some snaps for the press. After the event, to his horror, he found that his card wouldn’t open on any computer. Pretty scary, huh? He seems to have used a pretty old card, and though the card still showed all the images in his Nikon camera, it wouldn’t mount in any card reader. Directly connecting the camera with the cable didn’t work either. A dmesg run on Ubuntu showed lots of I/O errors, which pointed to physical damage in the storage area. Since the errors were clustered near each other, I was sure that the entire card was not a write-off. The event was quite important, so I had to find a way to get the images out of the card.

A bit of googling got me lucky. Zero Assumption Software has a product called Zero Assumption Recovery which, though it is a trial version, has the photo recovery part as a fully functional component. So technically the photo recovery software is freeware. The software first asks you to select the card (I put the card in the card reader and, though Windows couldn’t mount it, it kept running autorun repeatedly for some reason), scans it for damaged areas (this takes ages...) and then lists all the files which can be recovered. Be prepared for a long wait though. It took me 1 1/2 hours for the 512 MB card. The software even lets you make a disk image of the card, but I couldn’t figure out a way to use the image.

With a bit more luck I was able to recover most of the photos, losing only three of them. And the photographer was actually quite impressed with my ‘technical prowess’, as he called it. Well, that is one satisfied user. :)