Real computers have only a finite number of states, so what is the relevance of Turing machines to real computers?

Why is the finite automata model not enough?

While other answers have already mention many relevant aspects, I believe that the stronges advantage of Turing machines over finite automata is the separation of data and program. This allows you to analyze a quite finite program and make statements about how that program would handle different inputs, without restricting the size of the input.

While it is theoretically possible to describe both an actual computer and something like a Turing machine with finite tape as a state machine, that is not really feasible: the number of states is exponential in the amout of memory your machine has, and the general finite state automaton formalism requires you to explicitely list the transitions between these states. So for a general finite state automaton of that size it's quite infeasible to make any deductions based on a full enumeration of all state transitions.

Of course, in a real computer, states transitions can't happen arbitrarily. There is no command to swap a third of the bits in memory in a single step of the computation. So you could try to come up with a more compact specification for the state transitions. Something like the instruction set specification of your architecture. Of course, real computer architectures are complicated for the sake of performance, so you could simplify this further, to some very simple instruction set, which just performs very small steps using very limited input and output. In the end you may find that your architecture resembles something like a Turing machine interpreter: using some bits of program code and one bit of input, generate a bit of output and move around in your program code.

One alternative would be using the states of a finite state automaton just to represent the data being processed by the program, while encoding the program itself into the state transitions. That would entail the same problem of how to enumerate all states, and a compact representation might again be close to what a Turing machine does.

What is the point of studying these much stronger models with respect to real computers?

On the whole I'd say that a finite-tape Turing machine would probably be a better model for actual computers. But for many scientific questions, the distinction between a finite but large and an infinite tape is irrelevant, so just claiming an infinite tape makes things easier. For other questions, the amout of tape used is at the core of the question, but the model easily allows you to speak about the amount of tape usage without the hassle of specifying what happens if the computation runs out of tape.