So the cat is apparently out of the bag!
As reported by several media outlets (e.g. https://lnkd.in/e-bvsSX8), Meta has now confirmed that it used the illegal pirate library LibGen to train its AI.
The justification given is that books are of course the best source for AI training, since they are often better in language, content and subject matter than any short snippets from social media (which is logical). They are “well-written representations of human language”.
That is why a whopping 270 terabytes of books (approx. 7.5 million books and 80 million scientific studies) were stolen; there is no other way to put it. From a copyright perspective, that is of course an absolute no-go.
Now, you can argue that Meta did not steal the data itself but “merely” used an illegally compiled collection for training. And you can argue that training AI does not constitute copyright infringement. The courts will decide all of this.
But once again you can also see: whatever can be done gets done, with no regard for law, statute or authors. And you can be sure that Meta is not the only one working this way. They just happened to get caught this time and were exposed.
Brave new world!
P.S.: Currently, the users of LLMs are responsible for their output. That is, if you use Meta’s Llama model and the text it generates reproduces content from the illegally used training data, you are responsible for it, not Meta!
#informatikersindcool #kiistdaundbleibt
