Copy a website

formatting link

Been using it for years. Works on MOST websites, though not all.

Gunner

"Confiscating wealth from those who have earned it, inherited it, or got lucky is never going to help 'the poor.' Poverty isn't caused by some people having more money than others, just as obesity isn't caused by McDonald's serving super-sized orders of French fries. Poverty, like obesity, is caused by the life choices that dictate results." - John Tucci

Gunner Asch

If you try downloading my website algebra.com, you will get into an infinite recursion through millions of pages. That's why I prevent most such bots from accessing my site. This would work only on very simple sites.

Ignoramus4791

Some bots, as you say, limit themselves to only 1, 2, or 3 levels deep.
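As a rough illustration of what a depth limit buys you, here is a minimal sketch in Python. The `fake_links` function is a made-up stand-in for a dynamic site where every page links to two brand-new URLs, so an unlimited crawl would never finish. None of these names come from any real crawler; they are assumptions for the example.

```python
def fake_links(url):
    """Hypothetical dynamic site: every page links to two new, unique URLs."""
    return [url + "/a", url + "/b"]


def crawl_limited(start, get_links, max_depth):
    """Visit pages level by level, stopping max_depth links away from start."""
    pages = []
    frontier = [start]
    for depth in range(max_depth + 1):
        pages.extend(frontier)
        if depth == max_depth:
            break  # at the limit: keep these pages but do not follow their links
        # Next level: every link found on the pages of the current level.
        frontier = [link for url in frontier for link in get_links(url)]
    return pages


pages = crawl_limited("site", fake_links, max_depth=2)
print(len(pages))  # 1 + 2 + 4 = 7 pages
```

With the limit set to 2, the crawl stops after 7 pages even though the site itself is endless. The cost is that anything deeper than the limit is simply never fetched.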

Martin

Martin H. Eastburn

You must be very smart to have such a complex and sophisticated website.

Cydrome Leader

Yours was one of the sites it doesn't work well on... chuckle

Been there, tried that..

But it works quite well on most others.

Gunner

Gunner Asch

How does your web server differentiate between a bot and a human user making http requests?

Regards,

Robin

robinstoddart

snipped-for-privacy@gmail.com fired this volley in news:b6e6ea8f-a3fe-4f46- snipped-for-privacy@f63g2000hsf.googlegroups.com:

Duh! It doesn't. The site has links back to the place where the link began. It wouldn't appear recursive to a human user, because that person would choose where he/she viewed. The spider can't tell, and ends up in recursions it can only abort by "counting out" repeats.

LLoyd

Lloyd E. Sponenburgh

I tend to just use wget. Helps if you've got *nix for an OS or the Cygwin utilities for windoze though.

Mark Rand RTFM

Mark Rand

I actually have some smarts in the server that can tell a bot from a human. But httrack is blocked on the spot in any case. I am not against it, as such, but it will not work on my site.

Ignoramus3863

For one, "wget" can certainly detect and ignore recursive loops.
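The general idea behind ignoring loops can be sketched in a few lines of Python. This is not wget's actual code, just the usual "remember what you have already fetched" technique, and the `LINKS` site map below is invented for the example.

```python
from collections import deque

# Hypothetical site map: page -> pages it links to.
# Note the loops: "about" and "manuals" both link back to "home".
LINKS = {
    "home": ["about", "manuals"],
    "about": ["home"],
    "manuals": ["home", "manual1"],
    "manual1": ["manuals"],
}


def crawl_once(start, links):
    """Breadth-first crawl that fetches each page at most once."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in visited:  # skip anything already queued or fetched
                visited.add(target)
                queue.append(target)
    return order


print(crawl_once("home", LINKS))  # ['home', 'about', 'manuals', 'manual1']
```

This only helps when the same URL comes around again. A dynamic site that generates a fresh URL for every page never repeats itself, so a visited set alone would not stop the crawl there.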

Richard J Kinch

Not a bot attempting to look human. Just bots that advertise their botness, by honest design or flawed hacking.

Richard J Kinch

Yes, even when a bot tries to look like a human (i.e., supplying a Referer and a browser-like User-Agent), I can still detect that it is a bot.

The way I detect it is that there is a hidden link that humans cannot see and cannot click, but bots would follow it. The hidden link is not permitted by robots.txt, so it catches all non-compliant bots.
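A minimal sketch of that kind of trap in Python follows. The `/trap/` path, the `BotTrap` class, and blocking by client IP are all assumptions made for illustration; the post does not say how the real server is built. The standard-library `urllib.robotparser` is used only to show what a compliant bot would conclude from the same robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the trap directory is off limits to every bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

TRAP_PREFIX = "/trap/"  # target of the hidden link (assumed path)


class BotTrap:
    """Blocks any client that ever requests a URL under the trap prefix."""

    def __init__(self):
        self.blocked = set()

    def handle(self, client_ip, path):
        """Return the HTTP status the server would send for this request."""
        if client_ip in self.blocked:
            return 403  # already caught earlier
        if path.startswith(TRAP_PREFIX):
            self.blocked.add(client_ip)  # followed the hidden link: flag it
            return 403
        return 200


# What a well-behaved bot would work out from robots.txt:
rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())
print(rules.can_fetch("AnyBot", "/trap/page"))   # False, so it stays out
print(rules.can_fetch("AnyBot", "/index.html"))  # True

# What happens to a bot that ignores robots.txt:
trap = BotTrap()
print(trap.handle("10.0.0.1", "/index.html"))  # 200
print(trap.handle("10.0.0.1", "/trap/page"))   # 403, and now blocked
print(trap.handle("10.0.0.1", "/index.html"))  # 403
```

A compliant bot never requests the trap URL, and a human never sees the link, so in this sketch only a bot that ignores robots.txt ends up on the blocked list.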

Ignoramus3863

Yes, that would be difficult to defeat.

Richard J Kinch

I have had a lot of trouble with httrack and other bots like it. The people who run them usually do not mean anything bad; they just do not realize that they should not run them against dynamic sites like mine. They may not even realize that my site is dynamic, because it tries not to look dynamic (search engine friendly and all).

I spent a very long time trying to 1) make a website which hopefully does not lead into too many infinite crawls, and 2) detect and stop bad bots early enough. But I still get problems from time to time.

Ignoramus3863

Why?

Gunner

Gunner Asch

What's wrong with bots harvesting your manuals?

Frankly, on dialup, I don't have the time to hit each and every manual and wait for a download to start and finish.

On sites such as yours, I run the program and go to bed.

Gunner

Gunner Asch

Nothing.

But if you go to algebra.com, you can accidentally go into an infinite loop with various scripts.

You may need to sleep a lot longer than anticipated.

i

Ignoramus32074

I use wget for this too, when simply saving a few pages doesn't work out so well. There are Windows binaries available, so there is no need for Cygwin. For example:

formatting link
Wget won't hold your hand, though: it is command-line only, and a little bit of reading/homework is suggested for it to be really useful...

Leon Fisk
