whitworth photography Posted March 6, 2005
One of my sites was copied by a bot running for http://www.archive.org. Go out there and search for your domain name and see if you get a hit. I read through some of their "FAQs" and basically they want me to do some work on MY site to keep THEM from copying my work (i.e., add a robots.txt). What the heck is that all about?
From a legal standpoint, since when do I have to make changes to my site to keep someone from stealing my logos, pictures, etc.? I understand that, in reality, I should take steps to keep people from stealing my work, because it happens. What I don't understand is how they can so blatantly copy parts of my site from a legal standpoint.
I want to send them an email asking them to explain themselves, but before I do, I thought I'd see what you guys think about this.
TIA, Kirk
michaelkh Posted March 6, 2005
1) archive.org is a hugely valuable internet service that has no parallel. When you want to look up some content that was on a website that has long since disappeared, what else are you going to do?
2) They are no more 'stealing' your site than Google is. They are indexing it, and they are not making or attempting to make a profit from that. If a user wants to view your site as it is now, they'll just visit you. If they want to view it as it was a few years ago, they'll visit archive.org. If you don't want them to do that, you block the spider.
3) archive.org are up front about what they do, and a robots.txt file is a very _very_ simple thing to add. If you add the few lines they ask for, they won't index you. It's that simple. While you're at it, you can add rules that stop other well-behaved robots from sucking up your bandwidth.
4) Worry more about people who are going to steal your stuff and pretend it's theirs, and who won't be deterred by a simple robots.txt.
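To illustrate point 3, here is a minimal sketch of what "blocking the spider" looks like, checked with Python's standard `urllib.robotparser`. It assumes the archive.org crawler identifies itself as `ia_archiver` (the name their FAQ gave at the time; check their current documentation), and `www.example.com` is a placeholder domain, not anyone's real site.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: block only the archive.org crawler.
# "ia_archiver" is an assumption about the crawler's user-agent name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL on a hypothetical site.
page = "http://www.example.com/index.html"

archive_allowed = parser.can_fetch("ia_archiver", page)
google_allowed = parser.can_fetch("Googlebot", page)

print("ia_archiver may fetch:", archive_allowed)  # False
print("Googlebot may fetch:", google_allowed)     # True
```

Because the rule names a single user agent, other crawlers are unaffected. This only matters for robots that choose to honor robots.txt, which is the point of item 4.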
kai_griffin Posted March 6, 2005
Adding entries to robots.txt is a standard, documented procedure for anyone running a website who does not want one or more pages to be indexed by search engines. Since when? Since pretty much as long as search engine services like Google have existed.
Remember that if you do this for your home page, you'll drop off Google's radar (and that of every other search engine, too). Typically, you might add entries just for specific folders.
Anyway, as implied above, these guys aren't actually stealing anything from you.
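The "specific folders" approach mentioned above could look something like the sketch below. The folder names `/images/` and `/private/` and the domain `www.example.com` are made up for illustration. The check again uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every robot is kept out of two folders,
# while the rest of the site (including the home page) stays crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

home_allowed = parser.can_fetch("Googlebot", "http://www.example.com/")
images_allowed = parser.can_fetch("Googlebot", "http://www.example.com/images/logo.gif")

print("home page crawlable:", home_allowed)        # True
print("images folder crawlable:", images_allowed)  # False
```

Writing `Disallow: /` under `User-agent: *` instead would shut compliant robots out of the whole site, home page included, which is the "drop off the radar" case.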
eric merrill Posted March 6, 2005
This is the web. Various sites index the web. Archive.org is one of the nicer ones in that they actually honor the preferences you have expressed (via robots.txt) for indexing and archiving your site.
There's no reason to get upset. They are not doing anything wrong.
hugh_crawford1 Posted March 6, 2005
Note that your robots.txt can specify different instructions to different robots.
Also, archive.org is very much one of the good guys of the Internet. You can even use their bandwidth to serve a current mirror of your site via a special tag.
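As a rough sketch of per-robot instructions, a single robots.txt can hold one group of rules per user agent plus a catch-all `*` group. The paths, the domain, and the `ia_archiver` agent name below are assumptions for illustration, and the result is checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules with one group per robot and a catch-all group.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` on the placeholder site."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for agent in ("ia_archiver", "Googlebot", "SomeOtherBot"):
    for path in ("/index.html", "/drafts/post.html", "/cgi-bin/form.cgi"):
        print(f"{agent:13} {path:20} {allowed(agent, path)}")
```

A robot that matches a named group follows only that group, so in this sketch Googlebot is bound by the `/drafts/` rule but not by the catch-all `/cgi-bin/` rule.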
whitworth photography (Author) Posted March 7, 2005
I want to thank you guys for reining me back in. After seeing archive.org in my web server logs, I found an article about all the grief one guy went through trying to get archive.org to remove his site from their records. I guess reading through that got me a little worked up unnecessarily.