Introduction to Mapping

The Basics

The maps are constructed in the Immersal Cloud Service. You must first submit a set of images and metadata to the service and then start the map construction job. Small jobs of a few dozen images complete in seconds, while large jobs with hundreds of images can take tens of minutes.

The input for map construction is camera images and metadata. The metadata consists of:

  • camera intrinsics (pixel focal length and principal point offset)

  • camera pose information (relative image position and orientation in metric scale)

  • optional GPS coordinates
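To make the metadata list more concrete, here is a minimal sketch of one image's metadata as a Python record, together with the standard pinhole projection that the intrinsics describe (pixel focal length scales the point, principal point offset shifts it). The class names, field names, quaternion layout, and GPS tuple are hypothetical illustrations, not the Immersal REST API schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical record layout for illustration only; not the Immersal API schema.

@dataclass
class Intrinsics:
    fx: float  # horizontal focal length in pixels
    fy: float  # vertical focal length in pixels
    ox: float  # principal point x offset in pixels
    oy: float  # principal point y offset in pixels

@dataclass
class ImageMetadata:
    intrinsics: Intrinsics
    position: Tuple[float, float, float]           # metres, relative to the capture session
    rotation: Tuple[float, float, float, float]    # orientation as a quaternion (x, y, z, w), assumed
    gps: Optional[Tuple[float, float, float]] = None  # optional (latitude, longitude, altitude)

def project(k: Intrinsics, point_cam: Tuple[float, float, float]) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates (Z forward, metres) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Pinhole model: scale by focal length, then shift by the principal point.
    return (k.fx * x / z + k.ox, k.fy * y / z + k.oy)
```

For example, with `fx = fy = 1000`, `ox = 640`, `oy = 360`, a point 0.5 m to the right and 2 m in front of the camera lands at pixel (890.0, 360.0).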

Depth (LiDAR) data from the latest iPhones is currently not used because of its limited range.

The input must be images; the Cloud Service cannot convert a 3D point cloud into a machine-readable map for Visual Positioning.

In very simplified terms, the map construction process:

  • Finds distinct visual features, such as high contrast areas and shapes, in the input images

  • Matches visual features across images taken from different viewpoints

  • Uses the parallax in matched features to compute a 3D structure

The 3D structure forms the base for the output map. There is a lot more involved in making Visual Positioning work, but understanding the basics of map construction will help you learn how to map.
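The parallax step can be illustrated with the simplest possible case: two rectified images (same orientation, cameras shifted sideways by a known baseline). Depth then follows from how far a matched feature shifts between the images. This is a simplified two-view sketch for intuition, not the Cloud Service's actual multi-view reconstruction; the function names are hypothetical, and it assumes equal horizontal and vertical focal lengths.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in metres of a feature matched in two rectified images.

    focal_px:     focal length in pixels
    baseline_m:   sideways distance between the two camera positions
    disparity_px: horizontal shift of the matched feature (the parallax)
    """
    if disparity_px <= 0:
        raise ValueError("no parallax: depth cannot be recovered")
    return focal_px * baseline_m / disparity_px

def triangulate(focal_px: float, ox: float, oy: float, baseline_m: float,
                u_left: float, v_left: float, u_right: float):
    """3D point (x, y, z) in the left camera's frame from one matched feature."""
    z = depth_from_disparity(focal_px, baseline_m, u_left - u_right)
    # Back-project the left-image pixel at the recovered depth.
    x = (u_left - ox) * z / focal_px
    y = (v_left - oy) * z / focal_px
    return (x, y, z)
```

With a 1000 px focal length and a 0.5 m baseline, a 100 px shift gives a depth of 5.0 m. The same formula also shows why parallax matters: at a 2 px shift the depth is 250 m, and a single pixel of matching error doubles it to 500 m, so images taken from nearly the same spot produce very unreliable structure.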

Quick Start

You can use the REST API to submit images to the Cloud Service, but the easiest way to map is with our Immersal Mapper App.

Here are some tips to get you started:

  • Remember to move around the target object or in the target location. The map construction needs to have different viewpoints of the target area

  • Don't be afraid to capture plenty of images, even for small locations. Any single visual feature should be seen from at least three different viewpoints

  • To make sure you capture the visual features from multiple viewpoints, you should try to capture images that overlap with each other

  • A 30-50% overlap between two images is a good rule of thumb

  • You can capture both portrait and landscape images. Just try to have as much useful visual information in the images as possible. The sky, for example, is not very helpful in the map construction process

  • For a great map, think about what your users' cameras will see when using the finished AR application. For best results, the map should include similar viewpoints
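To turn the 30-50% overlap rule of thumb into a walking distance, here is a rough geometric sketch. It assumes the camera points straight at a flat surface and moves sideways parallel to it, which is a simplification; the 60° horizontal field of view in the example is an assumed value, not a figure from the Immersal Mapper, and the function names are hypothetical.

```python
import math

def footprint_width(distance_m: float, hfov_deg: float) -> float:
    """Width of the surface strip visible in one image (flat surface, camera facing it)."""
    return 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)

def overlap_fraction(distance_m: float, hfov_deg: float, step_m: float) -> float:
    """Horizontal overlap between two images taken step_m apart (sideways move)."""
    w = footprint_width(distance_m, hfov_deg)
    return max(0.0, 1.0 - step_m / w)

def max_step_for_overlap(distance_m: float, hfov_deg: float, overlap: float) -> float:
    """Largest sideways move that still keeps the requested overlap fraction."""
    return (1.0 - overlap) * footprint_width(distance_m, hfov_deg)
```

Standing 5 m from a wall with an assumed 60° field of view, each image covers about 5.8 m, so moving roughly 2.9 to 4.0 m between shots keeps a 50% to 30% overlap. Moving sideways, rather than only turning on the spot, also provides the parallax that map construction needs.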

A quick-start video tutorial for mapping with the Immersal Mapper

Space Requirements

Not all spaces can be mapped. Some locations are more suitable for Spatial Mapping and Visual Positioning than others.

For example, highly reflective surfaces don't have static visual features for map construction. The reflections move around depending on the viewpoint.

Locations that lack distinct visual detail and features are also difficult. An extreme example would be a blank white wall: it looks practically the same no matter which part of it you look at.

Highly dynamic lighting can cause problems as the mapped space can look visually very different in drastically different lighting. In these cases, it's best to use multiple maps of the same location for different lighting conditions.

Not enough visual features, and most of them are on an object that is likely to move.

Plenty of visual features. Some of them are still on a moving object, but many others are on static surfaces. This kind of location can be mapped easily.

Reflective surfaces cause the camera to see false visual features. These reflected features move depending on the viewpoint and cannot be used for map construction.

Low-light scenes are difficult for the camera to see. Any visual features that are detected at all will likely be fuzzy and noisy, and will cause problems.

Mapping Instructions

For a good map, you should see the same area from different viewpoints. Any single visual feature needs to be seen from at least three viewpoints. More varied viewpoints mean better accuracy.
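The three-viewpoint rule can be checked mechanically once features have been matched across images. The sketch below assumes a hypothetical input where each image ID maps to the set of matched feature IDs it contains; it counts images per feature, so it treats each image as a distinct viewpoint, which only holds if you actually moved between shots.

```python
from collections import Counter
from typing import Dict, List, Set

MIN_VIEWS = 3  # rule of thumb from the text: each feature seen from at least three viewpoints

def views_per_feature(observations: Dict[str, Set[int]]) -> Counter:
    """Count how many images each matched feature appears in."""
    counts: Counter = Counter()
    for features in observations.values():
        counts.update(features)
    return counts

def under_observed(observations: Dict[str, Set[int]], min_views: int = MIN_VIEWS) -> List[int]:
    """Feature IDs seen in fewer than min_views images, sorted."""
    counts = views_per_feature(observations)
    return sorted(f for f, n in counts.items() if n < min_views)
```

For example, with `{"img1": {1, 2, 3}, "img2": {2, 3, 4}, "img3": {3, 4}}`, only feature 3 appears in three images, so `under_observed` returns `[1, 2, 4]`: those areas would need more overlapping shots.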

No overlap between captured images, because each image looks in a different direction. No matching visual features can be found between the images, so no 3D structure can be computed.

This is the most common reason for failed map constructions!

No parallax between images: all images are captured from a single viewpoint. Matching visual features are found, but the resulting map will be very inaccurate.

An easy fix would be to just capture more images from new viewpoints.

A lot of overlap between images and a lot of matching visual features from different viewpoints. This would make a great map of the space!

Overlap between images illustrated. Visual features shown for each image
Overlap between images from a real map

The above images illustrate the overlap between captured images. A 30-50% overlap will make sure matching visual features are found.

The visual features in the target area are covered by only two images, and the viewpoints differ too much. There is no useful overlap, and matching the visual features of the images is difficult.


The visual features in the target area are covered by at least three images. There is enough overlap, and matching the visual features is easy.

Mapping Examples

"AR Hotspot"

Small to medium-sized, focused locations enhanced with Augmented Reality content. These types of locations are a very good use case and do not require mapping everything around the user, just the focused target area.

Examples of AR Hotspots are statues, murals, and other street art. Storefronts, building facades, pop-up stores, exhibition booths, and art installations fall in this category.

Examples of "AR Hotspots"

When mapping AR Hotspots, try to cover the area from as many angles as possible. Take a series of images that overlap with each other. If you need to cover a specific part of the hotspot with extra detail, you can capture additional close-ups.

Mapping an AR hotspot.

If you need to map a very long area, such as a building facade, that would be difficult to cover in one arc, you can try to cover it with multiple "mini-arcs". You can also take additional images from farther away. These help the localizer when the target area is viewed from a distance.

Mapping a very long area, like a building facade or a mural
A statue would make a good AR Hotspot

Landmarks like statues are often easy to map by just capturing a series of images in a circle around it. Try to fit all important visual features in the images. You can map either in landscape or portrait mode. You can also mix orientations when needed and take close-ups for extra accuracy.
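As a rough planning aid for the circular capture, the sketch below computes evenly spaced camera positions on a circle around a landmark, each facing its center. The ground-plane coordinates, the yaw convention (0° looks along +z), and the choice of 24 images (one every 15°) in the example are all assumptions for illustration; the text does not prescribe a number of images, and the Immersal Mapper does not require you to plan positions this way.

```python
import math
from typing import List, Tuple

def circle_capture_poses(center_x: float, center_z: float, radius_m: float,
                         num_images: int) -> List[Tuple[float, float, float]]:
    """Return (x, z, yaw_deg) for evenly spaced positions on a circle, each facing the center.

    Coordinates are in the ground plane; yaw 0 degrees looks along +z,
    and the facing direction is (sin(yaw), cos(yaw)).
    """
    poses = []
    for i in range(num_images):
        angle = 2.0 * math.pi * i / num_images
        x = center_x + radius_m * math.sin(angle)
        z = center_z + radius_m * math.cos(angle)
        # Yaw of the vector pointing from the camera back to the landmark.
        yaw = math.degrees(math.atan2(center_x - x, center_z - z))
        poses.append((x, z, yaw))
    return poses
```

Calling `circle_capture_poses(0.0, 0.0, 5.0, 24)` gives 24 positions 5 m from the statue, 15° apart, each aimed at its center. Close-ups and mixed orientations can be added on top of such a loop.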

Mapping a landmark from every angle.

Indoor Locations

Large indoor spaces should be divided into separate maps, for example one map per room. You can use the separate maps at the same time or combine them later. Mapping them separately makes the mapping process easier, map construction faster, and map updates more flexible.
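Because pose metadata is in metric scale, combining two separately built maps comes down to a rigid transform (rotation plus translation, no scaling) between their coordinate frames. The sketch below applies such a transform to points from map B to express them in map A's frame. It assumes both maps share a gravity-aligned vertical (y) axis so only a yaw rotation is needed, and that the yaw and offset are already known (for example, from aligning the maps by hand in Unity); `make_transform` is a hypothetical helper, not an Immersal or Unity API.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]

def make_transform(yaw_deg: float, translation: Vec3) -> Callable[[Vec3], Vec3]:
    """Rigid transform from map B to map A: rotate about the vertical (y) axis, then translate.

    Assumes both maps are metric and share the same vertical axis.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    tx, ty, tz = translation

    def apply(p: Vec3) -> Vec3:
        x, y, z = p
        # Rotation about y: [[c, 0, s], [0, 1, 0], [-s, 0, c]], followed by the offset.
        return (c * x + s * z + tx, y + ty, -s * x + c * z + tz)

    return apply
```

With a 90° yaw and a 10 m offset along x, a point at (1, 2, 0) in map B ends up at approximately (10, 2, -1) in map A. Keeping the transform rigid preserves the metric distances each map was built with.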

To map an indoor location, use the "outside-in" method. Take a series of images while moving around the perimeter of the room, looking across the space. Remember that you can use either landscape or portrait mode. In smaller rooms, landscape often works very well.

This basic approach works for all types of indoor locations with only a little tweaking. Just try to cover the whole area from as many angles as possible.

Mapping a single indoor space

If you need to map narrow areas or areas connected by narrow doorways, you should take extra care to make sure the different areas can be visually connected by the images.

Making sure you can see from one room to the other
A separate map for each different room
Two separate maps combined in Unity later

Outdoor Locations

Outdoor locations are usually just larger variations of the other types. You could map city streets the same way as you would map a mural. Capture images in multiple directions from multiple viewpoints.

Mapping both sides of a street in one map

For a wide, open area such as a market square, you could capture many "panoramas" from different viewpoints to get full coverage of the whole area.

Open area with an obstacle in the middle