I noticed that a *lot* of the crawlers/bots we see on www.bbc.co.uk & www.bbc.com are spoofed e.g. a "Meta" crawler coming from 10s of different small ISPs across the world (the real one comes from a Meta ASN).
I deployed a change this morning which adds source ASN validation (alongside user-agent string analysis) to our "known crawlers/bots" classifier & well, the results speak for themselves. Attached graphs show RPS from "known crawlers/bots" to www.bbc.co.uk & www.bbc.com.
#WebDev #BBC #Bots
I deployed a change this morning which adds source ASN validation (alongside user-agent string analysis) to our "known crawlers/bots" classifier & well, the results speak for themselves. Attached graphs show RPS from "known crawlers/bots" to www.bbc.co.uk & www.bbc.com.
#WebDev #BBC #Bots