Recent article from the Washington Post: "Inside the secret list of websites that make AI like ChatGPT sound smart"
Tech companies have grown secretive about what they feed the AI. So The Washington Post set out to analyze one of these data sets to fully reveal the types of proprietary, personal, and often offensive websites that go into an AI’s training data.
** **** ****** **** ***** ***, we ******** ******’* ** **** ***, a ******* ******** ** *** ******** of ** ******* ******** **** **** been **** ** ******** **** ****-******* English-language ***, ****** ***** ******** ******, including ******’* ** *** ********’* *****. (OpenAI **** *** ******** **** ******** it **** ** ***** *** ****** backing *** ******* *******, *******)
*** ******* *** ** *********** ****** feature ** **** **** ******** **** included. ****.*** ** #***, **** *.* million ******, ***** *.***% ** *** tokens ** *** *****.
**** ** ******** **** ******* **** previews *** ******** **** *** *** public. * ***** **** ***** **** LLaMa *** **** *** ****** *********** about *********'* ******** *******, ***** **** of ***** ******** *** ******.
****** *** *******! * *** *** seen **** *** ****** ** ****** it.
** *** *** ****, **** ** evidently ** **** ****, ** ******* a ****** ** *** ******* **** found **** **?
*** ******* ***** ******** * ***** was ***:
************* ********* *** *** ** ** well (* *** *** ****** *** though) *** * *** ** *** biggest *** *****:
*** ******** ************:
** *** ***** ****, * ** concerned, ** *** ******* ***** ***, that **** *********** ** ***** **** to ***** *******-*. *******, ****** **** ChatGPT-4 * **** ******, *** ******** security ******* *** ***** ********** ****, as #* ****** ***, **** ** our **** ** *********.