The solution seems to be a small model whose main job is RAG search over a multi-terabyte knowledge base.
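That trades GPU for storage, which is way cheaper: the knowledge lives on disk instead of in the weights of a big model that needs lots of GPU memory.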
Seems like this is what citadel chat is trying to do; this could just be taken to a much bigger scale with a massive knowledge base.
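A rough sketch of the retrieval half, just to pin the idea down. Everything here is hypothetical: `embed`, `KnowledgeIndex`, and `build_prompt` are made-up names, the hashed bag-of-words `embed` stands in for a real embedding model, and the in-memory lists stand in for an on-disk index that would actually hold the terabytes. The small model itself isn't shown; it would just get the prompt.

```python
import math
import re
import zlib
from collections import Counter

DIM = 1024  # toy embedding size (assumption); a real system would use a learned encoder


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalized. Stand-in for a real model."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        # crc32 is deterministic across runs, unlike Python's built-in str hash
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]


class KnowledgeIndex:
    """In-memory stand-in for a large on-disk vector index (hypothetical)."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Vectors are normalized, so the dot product is cosine similarity.
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: sum(a * b for a, b in zip(q, self.vectors[i])),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(question: str, index: KnowledgeIndex, k: int = 3) -> str:
    """Pack the top-k retrieved chunks into a prompt for the (not shown) small model."""
    context = "\n".join(f"- {c}" for c in index.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The split is the whole point: the index grows with cheap disk, while the model that reads the top-k chunks stays small enough to run on modest GPU.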