
Improve search engine / agent discoverability for DataFusion website #23258

Description

@alamb

Is your feature request related to a problem or challenge?

As software development in general becomes more and more agent-driven and search-driven, it is important to make sure that the content on the DataFusion website is part of the content used to make those development decisions.

If the DataFusion website is invisible to agents, then DataFusion won't show up when people ask those agents to help them build tools, etc.

There are a few things that https://datafusion.apache.org is clearly missing:

  1. /robots.txt with clear crawl rules (basically, crawlers should be allowed everywhere); for example, DuckDB's: https://duckdb.org/robots.txt
  2. /sitemap.xml listing canonical URLs, kept up to date on every publish; for example, DuckDB's: https://duckdb.org/sitemap.xml
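For reference, a minimal sketch of what the two files could contain, written with only the Python standard library. The "allow everything" rules, the example page paths, and the helper names (`make_robots_txt`, `make_sitemap_xml`, `crawler_allowed`) are hypothetical illustrations, not the contents of DuckDB's files or any existing DataFusion tooling; `urllib.robotparser` is used only to sanity-check that the rules admit every path.

```python
# Hypothetical sketch: a permissive robots.txt and a minimal sitemap.xml
# for https://datafusion.apache.org, using only the standard library.
import urllib.robotparser
import xml.etree.ElementTree as ET

BASE_URL = "https://datafusion.apache.org"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def make_robots_txt(base_url: str) -> str:
    """Allow every crawler to fetch every path and advertise the sitemap."""
    return "\n".join([
        "User-agent: *",
        "Allow: /",
        f"Sitemap: {base_url}/sitemap.xml",
        "",
    ])


def make_sitemap_xml(base_url: str, paths: list[str]) -> str:
    """Return a sitemaps.org <urlset> with one <url><loc> per path.

    Assumes each path starts with "/" and is already a canonical page URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url + path
    # Prepend the declaration by hand so it always says UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(
        urlset, encoding="unicode"
    )


def crawler_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check whether the given robots.txt rules let `agent` fetch `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = make_robots_txt(BASE_URL)
    print(robots)
    print(crawler_allowed(robots, BASE_URL + "/user-guide/introduction.html"))
    print(make_sitemap_xml(BASE_URL, ["/index.html", "/user-guide/introduction.html"]))
```

A real sitemap would list every published page and could optionally add a `<lastmod>` element per URL, as the sitemaps.org protocol allows.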

There is a bunch more on https://isitagentready.com/datafusion.apache.org, but I think robots.txt and sitemap.xml are the place to start.

Describe the solution you'd like

Add robots.txt and sitemap.xml, ideally using one PR for each feature.

The sitemap.xml should be auto-generated as part of the Sphinx build process.
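One possible way to wire both files into the Sphinx build, sketched as a `conf.py` excerpt. This assumes the third-party `sphinx-sitemap` extension (PyPI package `sphinx-sitemap`, module `sphinx_sitemap`), which derives each sitemap URL from `html_baseurl`, plus Sphinx's built-in `html_extra_path` option for copying a hand-written `robots.txt` to the site root. The file location and the `robots.txt` sitting next to `conf.py` are assumptions; the actual DataFusion docs layout may differ.

```python
# conf.py (hypothetical excerpt; the real DataFusion conf.py may be organized differently)

extensions = [
    # ... existing extensions ...
    "sphinx_sitemap",  # third-party: pip install sphinx-sitemap
]

# sphinx-sitemap builds each <loc> from html_baseurl plus the page path.
html_baseurl = "https://datafusion.apache.org/"

# Drop the default "{lang}{version}" prefix so URLs match the published site.
sitemap_url_scheme = "{link}"

# Built-in Sphinx option: copy these files verbatim into the HTML output root,
# so robots.txt would be served at https://datafusion.apache.org/robots.txt.
html_extra_path = ["robots.txt"]
```

Generating the sitemap inside the build, rather than committing it, keeps it in sync with the pages that are actually published.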

Describe alternatives you've considered

No response

Additional context

No response
