# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow: /

# Block YandexBot
User-agent: YandexBot
Disallow: /

# Block MegaIndex.ru
User-agent: MegaIndex.ru
Disallow: /

# Block dotbot
User-agent: dotbot
Disallow: /

# Block Sogou web spider
User-agent: Sogou web spider
Disallow: /

# Block SemrushBot
User-agent: SemrushBot
Disallow: /

# Block Baiduspider
User-agent: Baiduspider
Disallow: /

Sitemap: http://extension.psu.edu/sitemap/extension/sitemap.xml

# Crawlers Setup
User-agent: *
#Crawl-delay: 5
# Do not index courses pages hosted in Plone
Disallow: /courses
#
# Do not index iwd preview pages
Disallow: /*?iwd_preview12345*
#
# Do not index test product
User-agent: *
Disallow: /testing-reminder-iwd
Disallow: /test-product-iwd
Disallow: /york-county-iwd
#
# Do not index the page subcategories that are sorted or filtered.
Disallow: /*?dir=
Disallow: /*&dir=
Disallow: /*?limit=
Disallow: /*&limit=
Disallow: /*?mode=
Disallow: /*&mode=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?event=
Disallow: /*&event=
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /.git/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/
Disallow: /slf-permit-training-internal
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=