Python script to get links from Yahoo search

This is a quick script I wrote to pull links from Yahoo search using the BOSS search API and then list the unique domains.

If you want the full links instead of just the domains, modify the script so that the whole URL is appended to the list. Yahoo does not let you fetch all the results for a query, only a predefined number, so this code extracts only about 800 domains. That is still good enough for a start and for most uses.
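To make the domains-versus-full-links change concrete, here is a small standalone sketch. It is written for Python 3 (the script itself is Python 2) and is not part of the original script; the `collect` function, its `keep_full_links` flag and the sample URLs are all made up for illustration.

```python
from urllib.parse import urlparse


def collect(urls, keep_full_links=False):
    """Return unique entries in first-seen order.

    With keep_full_links=False each URL is reduced to scheme://host,
    which is what the script does; with True the whole URL is kept.
    """
    seen = set()
    out = []
    for url in urls:
        if keep_full_links:
            item = url
        else:
            o = urlparse(url)
            item = o.scheme + "://" + o.netloc
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


# Made-up sample data, standing in for URLs returned by a search.
sample = [
    "http://example.com/a",
    "http://example.com/b",
    "https://example.org/index.html",
]

print(collect(sample))                        # two unique domains
print(collect(sample, keep_full_links=True))  # all three full links
```

Using a set for the membership check keeps the lookup fast even with thousands of links, while the separate list preserves the order in which results came back.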

I am also working on getting citation counts from Google Scholar for a friend. I will post that here soon. Here's the code for now.

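#! /usr/bin/python
import urllib,json
from urlparse import urlparse

# Placeholder: put your own BOSS application id here.
yahoo_application_id = "YOUR_APP_ID"
links = []       # unique scheme://domain entries, in the order found
nextresult = 0   # offset of the first result on the next page

#print yahoo_application_id

#print ""+yahoo_application_id+"&format=xml"
while True:
	print "trying result from " + str(nextresult)
	# Note: the endpoint URL prefix is missing from this post; it goes in the empty string below.
	f = urllib.urlopen(""+yahoo_application_id+"&format=json&count=100&start="+str(nextresult))
	ss = f.read()
	ssjson = json.loads(ss)
	# "totalhits" is assumed to be the field name in the BOSS JSON response.
	totalhits = ssjson["ysearchresponse"]["totalhits"]
	print totalhits
	for x in ssjson["ysearchresponse"]["resultset_web"]:
		url= x["url"]
		o = urlparse(url)
		# Keep only scheme and host, e.g. http://example.com
		link = o[0]+"://"+o[1]
		if link not in links:
			links.append(link)
	# Move on to the next page of 100 results.
	nextresult += 100
	if (nextresult>10000):
		break
print "Obtained results: " + str(nextresult) + " of which " + str(len(links)) + " were unique."
for x in links:
	print x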
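
The paging pattern the script relies on (ask for 100 results, bump the start offset, stop at a cap) can be tried without any network access. The following is a rough Python 3 sketch under stated assumptions, not the BOSS API itself: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` simply pretends the service hands out only 250 results, mimicking the way Yahoo stops well short of the total.

```python
def fetch_all(fetch_page, page_size=100, max_start=10000):
    """Collect results page by page.

    Stops when a page comes back empty or when the start offset
    would go past max_start, whichever happens first.
    """
    results = []
    start = 0
    while start <= max_start:
        page = fetch_page(start, page_size)
        if not page:
            break  # the source has nothing more to give
        results.extend(page)
        start += page_size
    return results


def fake_fetch(start, count, available=250):
    # Hypothetical stand-in for a search API call: returns result
    # numbers instead of links and runs dry after `available` items.
    return list(range(start, min(start + count, available)))


print(len(fetch_all(fake_fetch)))  # prints 250
```

Stopping on an empty page means the loop ends as soon as the service stops returning results, rather than requesting every offset up to the cap.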

Cool, huh? If you want any help modifying this, drop me a line.

