Nutch is a well matured, production ready Web crawler. Nutch 1.x enables fine grained configuration, relying on Apache Hadoop™ data structures, which are great for batch processing....
|
HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility.
It allows you to download a World Wide Web site from the Internet to a local directory, building recursively al...
|