Good points there in post #4, Merlin3189. Recursion doesn't scale down well.
Here's a general overview of stacks and recursion - stuff that Merlin3189 already knows, and chopnhack may also know most of this as well.
In order to implement recursion, some kind of stack is needed.
A stack is simply a block of storage (almost always an area of memory) that is used for automatic, dynamic memory allocation, to keep track of information which accumulates over time and is then dismantled in reverse order, in such a way that, usually, only the "top" information is in use.
The classic use for a stack is for storing return addresses and local variables for functions. When one piece of code calls another piece of code, for example when some code in function1() invokes function2(), the current execution point (code address) in funtion1() needs to be kept somewhere while control is transferred to function2(), so that when function2() returns, execution can continue from the next statement in function1().
Since function2() could also call function3(), and so on, a generalised solution is needed, and a stack is used to keep track of these return addresses. This stack is usually also used for local variables - variables that are used within each function, rather than being global - and for parameters passed to functions, because this is a simple and convenient way to allocate storage for data that is associated with a particular invocation of a function.
The stack is just an allocation system that uses a block of storage. Allocation and use of this storage starts at one end (called the bottom of the stack, although it may actually be the highest memory address in the block) and is tracked by a stack pointer (SP), which identifies the current top-of-stack location.
When new information is added to the stack, it is stored at the "top" of the stack, and the stack pointer is advanced to the next piece of storage to be used. This is called pushing data onto the stack. When the information at the top of the stack is no longer needed, and the previously pushed data needs to be accessed, data is popped off the stack by adjusting the stack pointer.
The actual data does not move; only the stack pointer changes. (And often, one or more other pointer registers in the CPU that also allow the stack memory to be addressed.)
The stack is used dynamically, according to the program flow. Every time a function is called, data is pushed onto the stack. Every time a function returns, data is popped off the stack. Therefore the stack is constantly growing and shrinking during program execution.
Since there is a limited amount of memory available for the stack, if the program runs in such a way that the nesting of function calls is too deep, the stack pointer will reach or pass the end of the memory area reserved for the stack, and you will have a stack overflow. Since recursion deliberately uses significant amounts of stack space, you could say that using it is kind of tempting fate!
The effect of a stack overflow depends on the environment - the programming language, the way it's executed (interpreted vs. compiled), and the complexity of the CPU or MCU (microcontroller unit) that's executing it.
Stack overflows cannot reliably be predicted by analysing the source code, because the exact execution path the program takes is often determined by real-time stimuli which can't be predicted - at least, not in their complex combinations over a period of time. Some limits can be calculated - for example, a factorial function that uses recursion can be written so it checks the number being factorialised, and aborts if it's so large that the estimated stack requirement is more than the runtime environment supports. If that amount is known at coding time!
The Python language is interpreted, or partly compiled using a JIT (just in time) compiler, I think. In either case, executing the program requires a properly designed runtime environment, which takes care of the allocation of all resources, including memory, and is able to abort the program with a tidy explanatory error message if the program tries to exceed the available stack memory. The Python runtime environment is too large and complicated to run on very resource-constrained microcontrollers, so it has plenty of memory available. So recursion can be done pretty safely in Python.
Java programs are compiled into bytecode which is then executed by the Java Runtime Environment (JRE). The JRE can run on somewhat resource-constrained devices - in fact it was originally designed for embedded applications, but in all cases, it allocates and monitors resources, and can detect stack overflow and abort the program with some kind of error indication.
Since a function's execution point and its dynamically allocated data (parameters, and dynamically allocated variables, called "automatic" variables in C) both need to be preserved when another function is called, and restored when that function returns, they are often both stored on the same stack, but not always. The choice depends on the design of the runtime environment, or in some cases, the hardware.
With very resource-constrained devices such as the Microchip PIC and the Atmel AVR, several techniques are used to deal with dynamic memory allocation. But many of these devices have so little RAM that a program's allowable function nesting depth and recursion are very restricted anyway, and recursion may not even be permitted.
Specifically, the 8-bit PIC microcontrollers (baseline, mid-range and "high-end") all have a stack that's not within main memory; in baseline and mid-range devices, this stack can only be used for return addresses for function calls (including automatic calls due to interrupts). This stack cannot be used for function parameters or local variables, because (a) it's tiny - only two levels in baseline devices, eight levels in mid-range devices, 16 levels in enhanced mid-range devices, and 28 levels in "high-end" devices, and (b) the core doesn't provide any way to access it (some limited kludges are provided in the "high-end" PICs).
The C programming environments for PICs don't allow recursion. (If there are any that do, they would implement it in a non-standard way.) In fact, parameters and local variables used by functions must be allocated statically (at fixed addresses) by the compiler, which analyses the program to figure out how memory usage can be minimised by overlapping allocation between functions that never appear on the same branch of call trees. The compiler goes to a lot of trouble to make generalised C code run on these architectures, but even then, there are limitations (such as recursion being disallowed) that they have to pass back up to the programmer.
The AVR and most microcontrollers put the stack in globally addressable RAM, so they are better able to implement C-language constructions, but again the small amount of RAM is often a limiting factor.
With small devices, stack overflow may not be automatically detected. On the AVR, and many other microcontrollers that have their stack in main memory, a stack overflow means that memory used for other purposes - statically allocated data, for example - gets overwritten without any warning! On the PIC, the stack "wraps around" and starts overwriting the oldest information first! (The "high-end" PICs can generate an exception on stack overflow.)
It's common for embedded systems programmers to fill the stack memory with a repeating pattern, such as "STACK---STACK---STACK---STACK---", at the start of program execution, so a memory dump later will show how much of the stack was actually used during program operation.
Recursion can be simulated using a loop. The dynamically allocated data associated with each invocation of the function is put into an array in memory, and the stack pointer is replaced by a variable containing either an entry number in that array or a pointer into it, that allows the loop code to locate the data at the current recursion "depth". This trick makes better use of memory, can be used in architectures where real recursion isn't allowed or is too limited, and has other advantages as well.
I wrote an anagram program in Java that used this technique. It would translate to Python pretty easily too. Let me know if you want to see it.