Running into the same problem. Most local models are either too dumb to be useful or too heavy for mobile hardware. The sweet spot is a ~7B model that's been mercilessly fine-tuned for ONE task instead of trying to be a general assistant. At any given parameter count, a specialist beats a generalist on the task it was tuned for.