Is your feature request related to a problem or challenge?
As software development becomes increasingly agent-driven and search-driven, it is important that the content on the DataFusion website is among the sources used to make development decisions.
If the DataFusion website is invisible to agents, then we won't show up when people ask those agents to help them build tools, etc.
There are a few things that https://datafusion.apache.org is clearly missing:
- /robots.txt with clear crawl rules (basically, crawlers should be allowed to crawl everything), for example from DuckDB: https://duckdb.org/robots.txt
- /sitemap.xml listing canonical URLs, kept up to date on each publish, for example from DuckDB: https://duckdb.org/sitemap.xml
There is a bunch more stuff reported at https://isitagentready.com/datafusion.apache.org, but I think robots.txt and sitemap.xml are the place to start.
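To make the "crawl everything" rule concrete, here is a minimal sketch of what such a robots.txt could contain, checked with Python's standard `urllib.robotparser`. The exact rules, the `build_robots_txt` helper, and the assumption that the sitemap lives at `/sitemap.xml` on the site root are illustrative only; this is not a copy of DuckDB's file.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the site root used throughout this issue.
SITE = "https://datafusion.apache.org"


def build_robots_txt(site: str) -> str:
    """Return a permissive robots.txt body (hypothetical helper)."""
    return "\n".join(
        [
            "User-agent: *",  # applies to every crawler
            "Allow: /",  # nothing is off limits
            "",
            f"Sitemap: {site}/sitemap.xml",  # assumed sitemap location
            "",
        ]
    )


robots_txt = build_robots_txt(SITE)

# Sanity check: parse the generated rules and confirm an arbitrary
# agent may fetch an arbitrary page.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
allowed = parser.can_fetch("SomeAgent", f"{SITE}/user-guide/")
print(robots_txt)
```

Listing the sitemap in robots.txt lets crawlers discover it without guessing the path, which is why the two files are worth adding together even if they land in separate PRs.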
Describe the solution you'd like
Add robots.txt and sitemap.xml.
Ideally, use one PR for each feature.
The sitemap.xml should be auto-generated as part of the Sphinx build process.
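As a rough sketch of what auto-generation could look like, the following standard-library script walks a directory of built HTML and emits a sitemap. The `build_sitemap` helper, the temporary directory standing in for the Sphinx output, and the page names are all hypothetical; an existing Sphinx extension (for example sphinx-sitemap) may well be a better fit than a custom step like this.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(html_dir: Path, base_url: str) -> str:
    """Return sitemap XML listing every .html file under html_dir."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sorted(html_dir.rglob("*.html")):
        rel = page.relative_to(html_dir).as_posix()
        # Use the directory URL as the canonical form of index pages.
        if rel == "index.html":
            rel = ""
        elif rel.endswith("/index.html"):
            rel = rel[: -len("index.html")]
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{rel}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Demo against a throwaway directory that stands in for the build output.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "user-guide").mkdir()
    for name in ("index.html", "user-guide/index.html", "user-guide/sql.html"):
        (root / name).write_text("<html></html>")
    sitemap = build_sitemap(root, "https://datafusion.apache.org")

print(sitemap)
```

Running a step like this after the HTML build (or hooking equivalent logic into the build itself) keeps the sitemap in sync with whatever was actually published.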
Describe alternatives you've considered
No response
Additional context
No response