Wayback Machine Being Sued

Dave_Z
Just found this in one of my RSS feeds; only now did I get around to checking it:

http://www.nytimes.com/2005/07/13/t...7b4b470d4593e0&ei=5088&partner=rssnyt&emc=rss

Keeper of Expired Web Pages Is Sued Because Archive Was Used in Another Suit

By TOM ZELLER Jr.
Published: July 13, 2005

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to amend similar charges to its original suit against Health Advocate, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.

The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content, much of which would have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.

Most search engines program their Web crawlers to recognize a robots.txt file, and follow its commands. The Internet Archive goes a step further, allowing Web site administrators to use the robots.txt file to control the archiving of current content, as well as block access to any older versions already stored in the archive's database before a robots.txt file was put in place.
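The convention described above can be sketched with Python's standard-library robots.txt parser. The rules and example.com URLs below are hypothetical, and `ia_archiver` is the user-agent name the Internet Archive's crawler has historically honored:

```python
# Sketch: how a cooperating crawler consults robots.txt before fetching.
# The rules and example.com URLs below are hypothetical.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The archive's crawler is asked to stay away from the whole site:
print(parser.can_fetch("ia_archiver", "http://example.com/index.html"))  # False
# Other crawlers may fetch public pages, but not /private/:
print(parser.can_fetch("SomeBot", "http://example.com/index.html"))      # True
print(parser.can_fetch("SomeBot", "http://example.com/private/x.html"))  # False
```

As the article notes, this is purely cooperative: nothing in the protocol physically prevents a crawler (or a person with a browser) from ignoring the file.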

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, however, said no breach was ever made. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear that any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

If this was already posted somewhere, a thousand pardons.

Sheesh, this was bound to happen, I guess...
 
•••
hmm... it's actually a very interesting read. I think that archive.org may have a problem. Realistically, they do not have permission to copy everyone's pages. This could also affect Google's cache.
 
•••
Great info.
I think the archive guys will have a point there too.
 
•••
Hi,

Very interesting read; will have to keep an eye on the outcome...

Thanks
Tom Dahne
 
•••
The problem with the internet archive is that, simply put, they copy other people's material without permission. The ability to opt-out using robots.txt really doesn't help, since the relevant law does not say that you can copy unless someone says not to - you need permission to copy. They've gotten along by being decent folks, and complying with requests by content owners. But you can only get along by being nice so far. Eventually, you run into someone who is unreasonable and has the law on their side.
 
•••
I agree - personally I don't have a problem at all with the archive. I think it's a great tool and I use it myself. However, as jberry pointed out, I do believe that the law is unfortunately on the side of the company who's probably trying to make their money back.
 
•••
The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content
Wow
 
•••
The Wayback Machine only takes a snapshot of the publicly available pages, pages that are (or were) meant to be seen by the public. I don't see a reason why one would want their pages not archived just because they are getting into trouble... see what I am saying?
 
•••
The Wayback Machine only takes a snapshot of the publicly available pages, pages that are (or were) meant to be seen by the public. I don't see a reason why one would want their pages not archived just because they are getting into trouble.

Yes. If I am a professional photographer, I too want my work to be seen by the public. I do not want people taking snapshots of my photographs and distributing them.

Whether you see a reason or not is irrelevant to the fact that if Company X changes its website, then they have perfectly legal grounds for going after someone who is distributing copies of their old website. The Archive never had their permission to make that material available at archive.org, and that's the bottom line.
 
•••
I agree that legally they have no right to store/redistribute old material that belongs to others, but it'll be a real shame if/when we don't have the wayback machine as a tool any longer. It's incredibly useful.
 
•••
If the suit against archive.org succeeds, it could set a precedent for suing the $$$ out of Google, whose cache functionality violates the same copyright laws.
 
•••
Yahoo's new MyWeb Beta offers the same caching feature... It would be really unfair to see a non-profit organization like Archive.org getting sued while Google, Yahoo and the rest of the commercially oriented sites face no problems...
 
•••
lol, people just want money. Archive.org is not making any profit directly from the material that is "copied", nor do they claim it as their own. This should not count as copying; it should be thought of as a backup. A solution, I think, is for archive.org to implement a system that gives websites a way to exclude themselves from being backed up.
 
•••
This sort of thing just goes to show that the legal precedents surrounding Internet disputes are a long way from being settled.

Really, it is only the same (technicalities permitting) as somebody producing a pamphlet, mission statement, journal or some other form of traditionally printed document and it being stored in a library archive somewhere. There are no legal grounds to sue if such data is researched and used as legal evidence, is there? Businesses have to be accountable for the information they produce, past or present, and there should be no legal loopholes to hide behind.

Let's hope that common sense can prevail on this one instead of this simply opening the floodgates to a barrage of claims.

Also, technically, your browser makes copies of web page data to be able to read/cache web pages, but I bet they are not about to take action against each and every visitor of their own site. On the contrary, this helps them conduct business; they obviously want the best of both worlds.
 
•••
Your argument has merit, webmark. As in your library archive example, if the journal publisher knowingly allows people to make millions of xerox copies without doing anything for years, wouldn't that be sufficient basis for the document to pass into the public domain?

Patent holders have to actively protect their patents, as failure to do so could dilute and eventually render the patent unenforceable. Perhaps new laws need to be crafted to treat internet-published material more like patents and less like traditional copyrights.
 
•••
I think that it is terrible that a free service like that which has helped so many people should have to deal with this.
 
•••
They are not copying anything; they are only keeping records of what was on the net before.
They are also non-profit, so honestly, I think the suing company has not got a chance.
 
•••
Also, technically, your browser makes copies of web page data to be able to read/cache web pages, but I bet they are not about to take action against each and every visitor of their own site.

That is specifically exempted.

There is no "non-profit" exception to copyright law.

Some news sites, for example, make the last N days of news available for free, but charge a fee if you want to research farther back in time. They are perfectly entitled to do that - it is their content. But you are saying that if some non-profit site wanted to collect all of those articles and make them searchable for free, then they should be entitled to do that?

No, they shouldn't.
 
•••
It is not a copy, it is a cache, it simply shows what has been there in the past. It is not being used commercially so it is not actually copying. Breaching copyright would be photocopying a painting and selling prints.
 
•••
jberryhill said:
That is specifically exempted.

I realise that, just thought I would be a little pedantic....:tu:
 
•••
Zeeble said:
It is not a copy, it is a cache, it simply shows what has been there in the past. It is not being used commercially so it is not actually copying. Breaching copyright would be photocopying a painting and selling prints.

You don't seem to actually be observing the law here.

http://www.archive.org/about/terms.php

I think the terms at Archive are screwed. I think if a company with serious lawyers goes after them, they will lose. Someone now has a case that can prove damages. If this goes to court, which I predict it will, then they will lose. I bet Archive would change to an inclusive (opt-in) robots policy instead of the exclusive (opt-out) one it has now. I would rather add an archive rule to my robots.txt giving them permission to archive. Also, if Archive does lose in court, it would affect other sites such as Google, as Google also caches pages.
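The opt-in policy suggested here could be sketched roughly as follows. This is a hypothetical convention (real robots.txt semantics are opt-out, where silence means "allowed"), the `may_archive` helper is invented for illustration, and `ia_archiver` is the crawler name the archive has historically honored:

```python
# Hypothetical "opt-in" archiving check: only archive a site whose
# robots.txt explicitly names and allows the archive's user-agent.
# Under real robots.txt semantics, silence means "allowed"; here it
# means "do not archive".
from urllib.robotparser import RobotFileParser

ARCHIVE_AGENT = "ia_archiver"  # the Internet Archive's crawler name

def may_archive(robots_lines):
    """Return True only if robots.txt names the archive agent and allows it."""
    mentions_agent = any(
        line.split(":", 1)[-1].strip().lower() == ARCHIVE_AGENT
        for line in robots_lines
        if line.lower().startswith("user-agent")
    )
    if not mentions_agent:
        return False  # opt-in: silence means "do not archive"
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(ARCHIVE_AGENT, "http://example.com/")

# A site that explicitly opts in:
opted_in = ["User-agent: ia_archiver", "Allow: /"]
# A site whose robots.txt never mentions the archive:
silent = ["User-agent: *", "Disallow: /private/"]

print(may_archive(opted_in))  # True
print(may_archive(silent))    # False
```

The trade-off jberryhill raises later in the thread applies here: opt-in is on firmer legal ground, but the archive's coverage would shrink to only the sites that know about it and bother to opt in.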
 
•••
The DMCA distinguishes between a cache and an archive. Archive.org might be considered an archive but not a cache. A cache is temporary.

One could easily argue that archive.org is exempted under section 108. Presumably this is why archive.org is not being accused of copyright infringement. They are being sued for “breach of contract, promissory estoppel, breach of fiduciary duty, negligent dispossession and neglect.” The claim is basically that they did not properly secure their system to prevent users from accessing data they were supposed to restrict access to.

Harding Earley Follmer & Frailey are accused of violating the DMCA and CFAA, but not for archiving anything. They are accused of "hacking" the archive.org site in order to get access to information they weren't supposed to.

So the only copyright issue in the case is the DMCA "anti-circumvention" clause. But all this discussion about what would happen if archive.org were, hypothetically, sued for copyright infringement is sure interesting.
 
•••
Dang, Archive.org is so useful, I don't want it gone!!
 
•••
primacomputer said:
The DMCA distinguishes between a cache and an archive. Archive.org might be considered an archive but not a cache. A cache is temporary.

One could easily argue that archive.org is exempted under section 108. Presumably this is why archive.org is not being accused of copyright infringement. They are being sued for “breach of contract, promissory estoppel, breach of fiduciary duty, negligent dispossession and neglect.” The claim is basically that they did not properly secure their system to prevent users from accessing data they were supposed to restrict access to.

Harding Earley Follmer & Frailey are accused of violating the DMCA and CFAA, but not for archiving anything. They are accused of "hacking" the archive.org site in order to get access to information they weren't supposed to.

So the only copyright issue in the case is the DMCA "anti-circumvention" clause. But all this discussion about what would happen if archive.org were, hypothetically, sued for copyright infringement is sure interesting.

Nice analysis. It sheds good light on what is actually involved.
 
•••
It is not being used commercially so it is not actually copying. Breaching copyright would be photocopying a painting and selling prints.

Uh, those people being sued for sharing music files aren't doing it commercially either. There is no "non-commercial" exemption to copyright law.

I'm not sold on the 108 exception. The purpose of that exception is primarily to permit what are called inter-library loan programs to operate. If there is a journal article at library X, and my library Y does not carry that journal, then library X can make a copy of the article, send it to library Y, and library Y can let me read that copy.

Taking that into account, journal publishers often charge different institutional rates for publications, knowing that those publications will be subject to inter-library loan provisions.

Labrocca hits the nail on the head by stating the obvious proposition that opt-in is legally on better ground than opt-out. Like everyone else, I like the archive, I use the archive, and I would not like to see its usefulness impaired. But I wonder to what extent the vast majority of web publishers are even aware of its existence. The wayback forum there regularly features postings by publishers who have discovered the archive and want to know how to get their old material out of it, since they were wholly unaware the archive was doing that.

Finally, the archive is distinguishable from temporary caching.
 
•••