Per Wikipedia, a point is 1/72 of an inch. It's not uncommon to see literature using 8x6 pixel character maps, so one might reasonably expect a character of 48 (8x6) pixels to be resolvable. A 9 point font is 9/72 = 1/8" high, so each pixel in a 9 point character that is 8 pixels tall would be 1/64 of an inch. Along the 2 1/2 foot (30 inch) axis, that suggests that 1920 pixels should be adequate. Along the 2 foot (24 inch) axis, 1536 pixels should be adequate.
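To make that arithmetic easy to re-run with other font sizes or document sizes, here is a minimal Python sketch. The function name `pixels_needed` and its parameters are my own for illustration, and it assumes one camera pixel per glyph row, perfectly aligned.

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # 1 point = 1/72 inch


def pixels_needed(font_points, glyph_rows, span_inches):
    """Pixels along one axis so that each glyph row lands on one camera pixel."""
    glyph_height_in = Fraction(font_points, POINTS_PER_INCH)  # 9 pt -> 1/8 inch
    pixel_size_in = glyph_height_in / glyph_rows              # 8 rows -> 1/64 inch
    return int(span_inches / pixel_size_in)


long_axis = pixels_needed(9, 8, 30)   # 2 1/2 feet = 30 inches -> 1920
short_axis = pixels_needed(9, 8, 24)  # 2 feet = 24 inches -> 1536
print(long_axis, short_axis)
```

Exact fractions are used so the 1/64 inch pixel size does not pick up floating point error.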
This could go poorly if my assumption, that a pixel of text is equal to a pixel of video, is bad. One could imagine that video pixels which are offset by exactly 50% from the text pixels would require up to 2x greater resolution in each dimension to resolve. Even if this is the case, one would expect that a camera of roughly 10 MP should be adequate.
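The 50% offset concern can be illustrated with a toy one-dimensional model. This is only a sketch under simplifying assumptions of my own: the text is an alternating black/white stripe pattern, and each camera pixel reports the plain average of whatever it covers. The function `sample_contrast` is hypothetical, not taken from any library.

```python
def sample_contrast(scale, offset_fraction, n_text_pixels=16, sub=8):
    """Contrast (max - min) a camera sees on alternating black/white text pixels.

    scale: camera pixels per text pixel (1 = same resolution, 2 = double).
    offset_fraction: misalignment as a fraction of one camera pixel.
    """
    # Fine-grained scene: each text pixel is `sub` samples wide, alternating 0 and 1.
    scene = [(i // sub) % 2 for i in range(n_text_pixels * sub)]
    width = sub // scale                          # camera pixel width in fine samples
    start = int(round(offset_fraction * width))   # shift of the camera grid
    values = []
    for left in range(start, len(scene) - width + 1, width):
        window = scene[left:left + width]
        values.append(sum(window) / width)        # a camera pixel averages its footprint
    return max(values) - min(values)


print(sample_contrast(1, 0.0))  # aligned, same resolution
print(sample_contrast(1, 0.5))  # half-pixel offset, same resolution
print(sample_contrast(2, 0.5))  # half-pixel offset, double resolution
```

In this model the aligned case keeps full contrast, the half-pixel offset at equal resolution averages every camera pixel to mid-grey (zero contrast), and doubling the resolution restores full contrast. Real optics and sensors are messier, so treat this as support for the "up to 2x" intuition rather than a measurement.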
Interestingly, guidance for document scanning for automated OCR suggests settings of 300 dpi, which would require 9,000 pixels across 2 1/2 feet (30 inches): not reasonably obtainable with today's affordable cameras. I don't recall the IPVM discussion that inspired this effort, but I attempted to OCR generalized scenes with embedded text such as shop signs and writing on vehicles. I was surprised to discover that standard OCR software such as Adobe Acrobat and ABBYY FineReader could not resolve text from a cluttered scene, even though humans had no difficulty doing so. By this I do not mean that the text was embedded in extraneous garbage from the rest of the scene (which is what I had expected to find). Instead I mean that no semblance of the images' text was recognizable anywhere in the resulting OCR output. This surprised me a great deal. It seems that standard OCR software expects very high contrast with few distractors, so I would expect automated OCR to be a bridge too far for this application.
The foregoing does not consider issues such as lighting, document motion, or skew from oblique presentation of printed material to the camera, any of which could further raise the frame rate and resolution required for adequate recognition.
Summarizing, if well-lit printed material is held perpendicular to the camera's line of sight, 1920x1536 pixels (about 3 MP) may provide legible text, but I would want to start with 10 MP (a little under 2x in each dimension) to account for misalignment between source and video pixels. I would expect automated OCR to require more resolution than reasonably accessible cameras offer, even if background suppression or text isolation were tenable.
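For comparison, the three cases discussed here can be put side by side. This is a rough sketch: the 30 x 24 inch field of view and the pixels-per-inch figures come from the estimates in this post, while the function name and scenario labels are my own.

```python
SPAN_INCHES = (30, 24)  # 2 1/2 ft by 2 ft field of view


def sensor_megapixels(pixels_per_inch):
    """Return (width_px, height_px, megapixels) for the field of view."""
    width, height = (round(side * pixels_per_inch) for side in SPAN_INCHES)
    return width, height, width * height / 1_000_000


scenarios = {
    "legible, pixels aligned (64 ppi)": 64,
    "legible, 2x margin (128 ppi)": 128,
    "OCR scanning guidance (300 ppi)": 300,
}
for label, ppi in scenarios.items():
    w, h, mp = sensor_megapixels(ppi)
    print(f"{label}: {w} x {h} = {mp:.1f} MP")
```

This works out to about 2.9 MP for the aligned case, 11.8 MP for a full 2x margin in each dimension, and 64.8 MP for 300 dpi, which is why a 10 MP camera lands slightly short of a full 2x margin and why OCR-grade resolution looks out of reach.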
It will be interesting to see how IPVM's definitive real-world testing aligns with these predictions.