To put it simply, you can't really do image processing (in robotics it is called "machine vision") with just a CMOS sensor (that's the camera) and a microcontroller. There are a lot of tasks involved.
To get you started, check this link out:
http://emsys.denayer.wenk.be/?project=emcam&page=cases&id=16
For image processing you need significant processing power. Some of the tasks involved are compression, keeping up an acceptable frame rate, edge detection, motion detection, etc.
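To give a feel for why this eats processing power, here is a rough sketch of two of those tasks in plain Python: Sobel-style edge detection and frame-differencing motion detection. The function names, the thresholds and the "list of rows of 0-255 grey values" image format are my own assumptions for illustration, not taken from any particular camera or library. Note that every pixel needs several multiply-adds per frame, which is exactly what a small microcontroller struggles with.

```python
def sobel_edges(img, threshold=100):
    """Return a binary edge map for a grayscale image (list of rows, values 0-255).

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    The threshold value is an arbitrary assumption for this sketch.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            # |gx| + |gy| is a cheap approximation of the gradient magnitude.
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges


def motion_fraction(prev, curr, diff_threshold=30):
    """Fraction of pixels whose brightness changed by more than diff_threshold."""
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(p - c) > diff_threshold:
                changed += 1
    return changed / total


if __name__ == "__main__":
    # Tiny synthetic frame: dark left half, bright right half.
    frame = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
    for row in sobel_edges(frame):
        print(row)
```

On a real 640x480 stream the same loops run over roughly 300,000 pixels per frame, so in practice you would use an optimised library on the PC rather than hand-written loops like these.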
There is the "easy" but expensive way and the "hard" and cheap way.
The "easy" way is to use an integrated machine vision system like those in the attachment.
The "hard" way involves a camera that communicates, preferably wirelessly, with a PC. The PC runs your own image processing algorithms on the frames, works out the required data, and sends it to the robot so that it can perform the required action.
In my opinion this is the best way to do it, since you have "unlimited" processing power and several algorithms have already been developed. Additionally it is the cheapest way to do it.
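As a very rough sketch of the PC side of that "hard" way: take a frame, find a bright target in it, and turn its position into a command for the robot. Everything here is an assumption made for illustration: the frame format (list of rows of 0-255 grey values), the brightness threshold, the dead band, and the command strings "LEFT"/"RIGHT"/"FORWARD"/"STOP". How the frame arrives from the camera and how the command is sent to the robot depends on your hardware, so that part is left out.

```python
def target_centroid_x(frame, brightness_threshold=200):
    """Return the mean x position of all pixels at or above the threshold, or None."""
    xs = [x for row in frame for x, value in enumerate(row)
          if value >= brightness_threshold]
    if not xs:
        return None
    return sum(xs) / len(xs)


def steering_command(frame, dead_band=0.1):
    """Map the target position in the frame to a hypothetical robot command.

    dead_band is the fraction of the image width around the centre in which
    the robot is told to keep going straight.
    """
    width = len(frame[0])
    cx = target_centroid_x(frame)
    if cx is None:
        return "STOP"  # no target visible
    # Offset from the image centre, as a fraction of the image width.
    offset = (cx - (width - 1) / 2) / width
    if offset < -dead_band:
        return "LEFT"
    if offset > dead_band:
        return "RIGHT"
    return "FORWARD"


if __name__ == "__main__":
    # Synthetic 4x10 frame with one bright pixel near the right edge.
    frame = [[0] * 10 for _ in range(4)]
    frame[2][8] = 255
    print(steering_command(frame))
```

The robot itself then only has to receive a short command and drive the motors, which is well within what a microcontroller can do.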
Now, if you have other directions from your supervisor, we can talk about it again.
It would be great, though, if you could be more specific about the components you have access to. Do you have to choose your own components, and if so, what is the budget? If the components are already defined, what are they?
Remember that early mobile robots with computers on board needed almost 15 minutes to calculate the next step, so response time is another significant factor. Consequently, if you need fast response times, off-board processing is the way to go.