we need a large distributed database with well-curated storage or effective relevance ranking. alternatively: searX is an example of where p2p is not required! it queries multiple sources and combines the results.
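to make "queries and combines" concrete, here's a minimal sketch of merging ranked result lists from several engines. it uses reciprocal rank fusion (RRF), a generic technique swapped in for illustration; it is not searX's own scoring code. the engine names, URLs, and the constant k=60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked URL lists into one ranking (RRF sketch).

    result_lists: dict mapping a hypothetical engine name to its ranked list of URLs.
    k: damping constant; larger k flattens the advantage of top positions.
    """
    scores = defaultdict(float)
    for urls in result_lists.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:  # count each URL once per engine
                continue
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # highest fused score first; ties broken by URL so the order is stable
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    # hypothetical results from three engines for the same query
    results = {
        "engine_a": ["https://a.example", "https://b.example", "https://c.example"],
        "engine_b": ["https://b.example", "https://d.example"],
        "engine_c": ["https://b.example", "https://a.example"],
    }
    for url in reciprocal_rank_fusion(results):
        print(url)
```

a URL that several engines agree on (b.example above) rises to the top even if no single engine ranked it first everywhere; that's the whole appeal of aggregating instead of keeping your own index.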
#yacy does similar things but with individual index creation, sharing of the index, and advanced (solr/lucene) ranking. yacy software is easy to set up. continuous operation is not rock solid, but it's not unusable. the interface design has a learning curve and is confusing. documentation is sparse but mostly sufficient for basic operation. i think the disk space requirement grows large, the sites crawled need to be more unique, blocking rules (spam) need to be used better, and too much useless metadata is indexed during crawls by default.
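on "sharing of the index": one common way peers split a shared word index is to hash each term and send its postings to the peers whose position on a hash ring follows it. below is a minimal consistent-hashing sketch under that assumption. it is not yacy's actual DHT code; the class name TermRing, the peer names, and the replica count are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # map a string to a point on a 2**64 ring using the first 8 bytes of SHA-256
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class TermRing:
    """Assign index terms to peers by consistent hashing (illustrative sketch)."""

    def __init__(self, peers, replicas=2):
        unique = set(peers)
        if not 1 <= replicas <= len(unique):
            raise ValueError("replicas must be between 1 and the number of peers")
        self.replicas = replicas
        self._ring = sorted((_hash(p), p) for p in unique)
        self._points = [h for h, _ in self._ring]

    def peers_for(self, term):
        # walk clockwise from the term's hash and take the next `replicas` peers
        start = bisect.bisect(self._points, _hash(term.lower()))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.replicas)]


if __name__ == "__main__":
    ring = TermRing(["peer-1", "peer-2", "peer-3", "peer-4"], replicas=2)
    for term in ["yacy", "searx", "lucene"]:
        print(term, ring.peers_for(term))
```

with N peers and r replicas, each peer ends up holding roughly r/N of the postings, so adding peers spreads the disk cost, but every extra replica multiplies it.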
there are costs for disks and hosting. then there's labor and adoption. and skill is required. incentive for collective resource sharing is pretty low compared to readily available options. (searX outsources the costs and heavy lifting to the popular engines.)
Anders Rytter Hansen
SearX seems interesting. Yacy doesn't work that well IMO