Yes, they count. The process of making and continually updating the underlying LLM is also what drains the lakes, and they are all built on pirated info (all the big ones for sure; I’ve not heard of a widely available, usable model trained 100% on legally obtained data, but I suppose it could exist).
Ethics and morality aside.