There’s little evidence that “staging” the training of neural networks on language-like input – feeding them part of the problem space initially, and scaling that up as they learn – confers any consistent benefit for their long-term learning (as reviewed yesterday). To summarize that post, early computational demonstrations of the importance of…