Q: What's the Relationship Between Transformer Network Size and Task Performance?
I'm exploring the relationship between the number of parameters in a Transformer network and the range of tasks it can perform. Basically, I would like to be able to say that networks of a given order of magnitude in size are capable of "X" types of tasks. I also wonder what very small networks are capable of. Do you know of any good sources, papers, or discussions on this?