So you or your company has a website. You have optimized it for search engines so that you get organic traffic from Google, Bing, Yahoo, and so on. There are plenty of webmaster tools out there that will give you rankings for specific pages. However, if you want to see for yourself how interlinked the pages are in your own domain or in a competitor's domain, you will have to poke around with some code. I have been using Nutch for all my crawling and indexing. It scales fairly well with Hadoop. Bixo is another open-source tool, but I never got the hang of it, and Nutch provides just what I wanted: given a seed page, it crawls through the server and provides a list of inlinks for every URL. With some cleaning, this data can then be fed to TreeViz, a great visualization tool for tree structures. I will post some screenshots later today.
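In the meantime, here is a rough sketch of what the "some cleaning" step could look like. It assumes you have dumped the link database to text with Nutch's `readlinkdb` command and its `-dump` option, and that each record is a target URL followed by `Inlinks:` and then one indented `fromUrl: ... anchor: ...` line per incoming link. That layout is my assumption, so check it against your own dump. The function names and the sample data are made up for illustration and are not part of Nutch.

```python
from collections import defaultdict


def parse_inlinks(dump_text):
    """Map each target URL to the list of URLs that link to it.

    Assumes the text layout described above; adjust if your dump differs.
    """
    inlinks = defaultdict(list)
    target = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("Inlinks:"):
            # Header line: "<target url><tab>Inlinks:"
            target = line[: -len("Inlinks:")].strip()
            inlinks[target]  # register the target even if no sources follow
        elif line.startswith("fromUrl:") and target is not None:
            # Source line: "fromUrl: <source url> anchor: <anchor text>"
            rest = line[len("fromUrl:"):].strip()
            source = rest.split(" anchor:", 1)[0].strip()
            inlinks[target].append(source)
    return dict(inlinks)


def inlink_counts(inlinks):
    """Return (url, inlink count) pairs, most linked-to first."""
    pairs = ((url, len(sources)) for url, sources in inlinks.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample in the assumed dump layout.
SAMPLE = (
    "http://example.com/about\tInlinks:\n"
    " fromUrl: http://example.com/ anchor: About\n"
    " fromUrl: http://example.com/contact anchor: About us\n"
    "\n"
    "http://example.com/contact\tInlinks:\n"
    " fromUrl: http://example.com/ anchor: Contact\n"
)

if __name__ == "__main__":
    for url, count in inlink_counts(parse_inlinks(SAMPLE)):
        print(count, url)
```

The resulting dictionary of target URL to source URLs is a plain adjacency list, so it should be easy to reshape into whatever input a visualization tool expects.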










