How Do Source Maps Work?

Background

As a front-end web developer, you have probably come across JavaScript source map files, which end with the ".map" file extension. A source map is a file that browsers use to map positions in the minified, human-unreadable generated file back to positions in the original source files, so that while debugging you can see where a bug is located in the code you actually wrote.

But have you ever opened a ".map" file to see what's inside? How does the map file give the browser the information it needs to construct the mapping between the original files and the minified files?

In the following paragraphs, I will try my best to provide an explanation for the underlying details.

Source Map Format

There are three versions of the Source Map Proposal. Version 3 is the newest and gives the best control over the overall size of the source map.

The full Source Map Revision 3 Proposal can be found here.

A typical source map file has the following format:

{
  // Source Map Proposal Version, usually version 3
  "version": 3,
  // The source files that this source map file is associated with
  "sources": ["foo.js", "bar.js"],
  // A list of symbol names used by the "mappings" entry
  "names": ["src", "maps", "are", "fun"],
  // The core entry, its value contains the mappings data
  "mappings": "A,AAAB;;ABCDE;",

  // Optional, the generated file
  "file": "dist.min.js",
  // Optional, usually the path prefix shared by the entries in "sources"
  "sourceRoot": "",
  // Optional, corresponding to the content of each file in "sources" entry
  "sourcesContent": [null, null]
}

The most interesting part is probably the mysterious "mappings" entry, whose value is usually a large chunk of Base64 characters that is hard to make sense of at first glance. Indeed, if you don't know how these characters are encoded, you can hardly get anything useful out of them even after careful analysis.

So let's get straight to it. The value of the "mappings" entry is structured according to the following rules.

  • the value is first split by semicolons (';') into several groups, each of which holds the mapping data for one line of the generated file
  • each group is then split by commas (',') into several segments
  • each segment is a Base64 Variable-Length Quantity (VLQ) string that decodes to an array of 1, 4 or 5 numbers
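
As a rough illustration of the first two rules, the sketch below splits a "mappings" string into lines and segments without decoding them yet. The helper name `splitMappings` is made up for this example and is not part of any library.

```javascript
// Hypothetical helper: split a "mappings" string into lines and segments.
function splitMappings(mappings) {
  return mappings
    .split(';') // one group per line of the generated file
    .map((line) => (line === '' ? [] : line.split(','))); // empty group -> no segments
}

// The sample "mappings" value from the source map shown earlier.
const splitExample = splitMappings('A,AAAB;;ABCDE;');
console.log(JSON.stringify(splitExample));
// → [["A","AAAB"],[],["ABCDE"],[]]
```

An empty group (two semicolons in a row) simply means that the corresponding generated line has no mapping data.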

A typical "mappings" value looks like the following.

AAAA,IAAIA,KAAK,GAAG,CAAC;AAEb,IAAMC,MAAM,GAAG;EACXC,KAAK,EAAE

There are two semicolons, which separate the string into three parts, meaning that there are three lines. Take line 1 (zero-based, here and below) as an example.

AAEb,IAAMC,MAAM,GAAG

Three commas separate the line into four segments, each of which is a Base64 VLQ encoded string.

First, let's figure out what Base64 VLQ is.

Base64 VLQ

Simply put, Base64 Variable-Length Quantity (VLQ) encoding turns an array of integers into a string of Base64 characters. For example, the array "[ 0, 0, 2, -13 ]" is encoded as the string "AAEb".

A common VLQ, according to Wikipedia, is represented by a sequence of 8-bit bytes. In contrast, a Base64 VLQ is represented by a sequence of Base64 characters, each of which stands for 6 bits of binary data. The reason source maps use Base64 VLQ instead of the common VLQ is probably that the former is more convenient on the web: it can be written using only printable characters (A-Z, a-z, 0-9, + and /), so it can be embedded directly in a JSON string.

So what exactly is VLQ, and how is it related to source map mappings?

Well, the core concept of VLQ is its continuation bit. With it, an arbitrarily large number can be represented by a sequence of small fixed-size chunks, and, what's more, you don't need to worry about how consecutive numbers are separated: they can be joined without any separator and still be decoded unambiguously.

Let me show you an example. Suppose you were given a sequence of numbers, e.g. 1|3|5|7|9, and were told to find a way to represent the sequence without the repeated separator (i.e. "|"). That is easy: you can just use 13579 to represent 1|3|5|7|9, because each element has only a single digit. But what if the sequence is 1|3|5|7|9|11? You cannot just remove the separator, because 1357911 might be wrongly interpreted as 1|3|5|7|9|1|1.

VLQ solves this separator problem. The key point is the continuation bit in each chunk. In Base64 VLQ, a chunk is one Base64 character (6 bits) and consists of two parts: the continuation bit (1 bit, the most significant one) and the data bits (5 bits). The former indicates whether more chunks follow the current one, and the latter carry the actual value. One thing to keep in mind is that in the first chunk of each number, the last (least significant) bit is used as a sign bit indicating whether the number is positive (0) or negative (1), so that chunk carries only 4 bits of the value.
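
To make the chunk layout concrete, here is a minimal encoder sketch. The names `BASE64_CHARS`, `encodeVlq` and `encodeSegment` are chosen for this example only, and the code assumes values small enough for JavaScript's 32-bit bitwise operators. Real tools ship their own implementations.

```javascript
// Standard Base64 alphabet: the index of a character is its 6-bit value.
const BASE64_CHARS =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode one integer as a Base64 VLQ string (sketch, small integers only).
function encodeVlq(value) {
  // Move the sign into the lowest bit: 2 -> 0b100, -13 -> 0b11011.
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let encoded = '';
  do {
    let chunk = vlq & 0b011111; // take the lowest 5 data bits
    vlq >>>= 5;
    if (vlq > 0) chunk |= 0b100000; // more chunks follow: set the continuation bit
    encoded += BASE64_CHARS[chunk];
  } while (vlq > 0);
  return encoded;
}

// A segment is just the encoded fields joined without any separator.
function encodeSegment(fields) {
  return fields.map(encodeVlq).join('');
}

console.log(encodeSegment([0, 0, 2, -13])); // → AAEb
console.log(encodeVlq(16)); // → gB (needs two chunks)
```

The value 16 no longer fits into the 4 data bits of a first chunk, so the encoder emits a chunk with the continuation bit set ("g") followed by a final chunk ("B").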

To decode a Base64 VLQ string, first split it into characters, then look up the 6-bit Base64 value of each character. The first bit of each value is the continuation bit. A continuation bit of "1" means that the current character is not enough to decode a complete number, and that one or more following characters must be combined with it to form a larger number. Encoding is simply the reverse of this process. The breakdown below shows how the Base64 VLQ string "AAEb" is decoded.

Base64 VLQ string: AAEb

A => 000000 => 0
A => 000000 => 0
E => 000100 => 2
b => 011011 => -13

Result: [ 0, 0, 2, -13 ]

(In each 6-bit group above, the first bit is the Continuation Bit and the last bit is the Sign Bit.)
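
The decoding steps described above can be sketched as follows. This is an illustrative implementation, not the one used by any particular browser or library; `BASE64_ALPHABET` and `decodeVlq` are names invented for this example.

```javascript
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode one Base64 VLQ string (one segment) into an array of integers.
function decodeVlq(segment) {
  const numbers = [];
  let value = 0; // bits collected so far for the current number
  let shift = 0; // bit position of the next 5-bit chunk
  for (const char of segment) {
    const sextet = BASE64_ALPHABET.indexOf(char);
    if (sextet === -1) throw new Error(`Invalid Base64 character: ${char}`);
    value |= (sextet & 0b011111) << shift; // the low 5 bits carry data
    if (sextet & 0b100000) {
      shift += 5; // continuation bit set: more chunks follow
    } else {
      const isNegative = (value & 1) === 1; // the lowest bit is the sign bit
      const magnitude = value >>> 1;
      numbers.push(isNegative ? -magnitude : magnitude);
      value = 0;
      shift = 0;
    }
  }
  if (shift !== 0) throw new Error('Unterminated Base64 VLQ string');
  return numbers;
}

console.log(JSON.stringify(decodeVlq('AAEb'))); // → [0,0,2,-13]
console.log(JSON.stringify(decodeVlq('gB'))); // → [16]
```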

Now we know that a Base64 VLQ string is equivalent to an array of numbers. So another question arises: what does this array mean?

What does the array mean?

Before digging into the details, let's get back to the "mappings" data mentioned earlier. Following the rules introduced above, we can turn a line of Base64 VLQ strings into a 2-dimensional array.

AAEb,IAAMC,MAAM,GAAG
=>
[[0,0,2,-13],[4,0,0,6,1],[6,0,0,6],[3,0,0,3]]

And one step further, we can get a 3-dimensional array for multiple lines of Base64 VLQ strings in the same way.

AAAA,IAAIA,KAAK,GAAG,CAAC;AAEb,IAAMC,MAAM,GAAG;EACXC,KAAK,EAAE
=>
[
  [[0,0,0,0],[4,0,0,4,0],[5,0,0,5],[3,0,0,3],[1,0,0,1]],
  [[0,0,2,-13],[4,0,0,6,1],[6,0,0,6],[3,0,0,3]],
  [[2,0,1,-11,1],[5,0,0,5],[2,0,0,2]]
]
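
The conversion above can be reproduced with a short script. This is only a sketch: `VLQ_CHARS`, `decodeVlqSegment` and `decodeMappings` are hypothetical names, and the decoder repeats the Base64 VLQ logic described earlier so that the block runs on its own.

```javascript
const VLQ_CHARS =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode a single segment such as "AAEb" into its numeric fields.
function decodeVlqSegment(segment) {
  const fields = [];
  let value = 0;
  let shift = 0;
  for (const char of segment) {
    const sextet = VLQ_CHARS.indexOf(char);
    if (sextet === -1) throw new Error(`Invalid Base64 character: ${char}`);
    value |= (sextet & 31) << shift;
    if (sextet & 32) {
      shift += 5; // continuation bit set, keep reading
    } else {
      fields.push(value & 1 ? -(value >>> 1) : value >>> 1);
      value = 0;
      shift = 0;
    }
  }
  return fields;
}

// Lines are separated by ';', segments by ','.
function decodeMappings(mappings) {
  return mappings.split(';').map((line) =>
    line === '' ? [] : line.split(',').map((segment) => decodeVlqSegment(segment))
  );
}

const decodedExample = decodeMappings(
  'AAAA,IAAIA,KAAK,GAAG,CAAC;AAEb,IAAMC,MAAM,GAAG;EACXC,KAAK,EAAE'
);
console.log(JSON.stringify(decodedExample[1]));
// → [[0,0,2,-13],[4,0,0,6,1],[6,0,0,6],[3,0,0,3]]
```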

The nested arrays above describe the mapping between the generated file and the source files. The first dimension represents the whole generated file, i.e. its list of lines. The second dimension represents a single line in the generated file, i.e. its list of segments. The third dimension represents the fields of a single segment.

According to the Source Map Proposal (v3), a segment array has 1, 4 or 5 fields. Each field has its own meaning, as described below.

The fields in each segment are:

  • The zero-based starting column of the line in the generated code that the segment represents. If this is the first field of the first segment, or the first segment following a new generated line (“;”), then this field holds the whole base 64 VLQ. Otherwise, this field contains a base 64 VLQ that is relative to the previous occurrence of this field. Note that this is different than the fields below because the previous value is reset after every generated line.
  • If present, an zero-based index into the “sources” list. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.
  • If present, the zero-based starting line in the original source represented. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented. Always present if there is a source field.
  • If present, the zero-based starting column of the line in the source represented. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented. Always present if there is a source field.
  • If present, the zero-based index into the “names” list associated with this segment. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.

(quoted from Source Map Revision 3 Proposal)

I've commented each part of the array below to make it clearer. Note that the values are still relative at this stage.

[
  [[0,0,0,0],[4,0,0,4,0],[5,0,0,5],[3,0,0,3],[1,0,0,1]], // line 0 in the generated file
  [
    [
      0, // column in the generated file (first segment of the line, so absolute)
      0, // source file index in the "sources" entry (relative)
      2, // line number in the source file (relative)
      -13 // column number in the source file (relative)
    ], // segment 0 in line 1
    [
      4, // column in the generated file (relative to the previous segment)
      0, // source file index in the "sources" entry (relative)
      0, // line number in the source file (relative)
      6, // column number in the source file (relative)
      1 // symbol name index in the "names" entry (relative)
    ], // segment 1
    [6,0,0,6], // segment 2
    [3,0,0,3] // segment 3
  ], // line 1
  [[2,0,1,-11,1],[5,0,0,5],[2,0,0,2]] // line 2
]

According to the quoted content above, the value of each segment field is relative to the previous occurrence of the same field. The generated column is reset at the start of every generated line, while the other fields carry over across lines. In the following example, segment 2 (the third segment) in line 1, whose raw value is "[6,0,0,6]", is turned into "[10,0,2,12]" before it is used to look up code locations.

AAAA,IAAIA,KAAK,GAAG,CAAC;AAEb,IAAMC,MAAM,GAAG;EACXC,KAAK,EAAE
=>
[
  [[0,0,0,0],[4,0,0,4,0],[5,0,0,5],[3,0,0,3],[1,0,0,1]],
  [[0,0,2,-13],[4,0,0,6,1],[6,0,0,6],[3,0,0,3]],
  [[2,0,1,-11,1],[5,0,0,5],[2,0,0,2]]
]
=>
[
  [[0,0,0,0],[4,0,0,4,0],[9,0,0,9],[12,0,0,12],[13,0,0,13]],
  [[0,0,2,0],[4,0,2,6,1],[10,0,2,12],[13,0,2,15]],
  [[2,0,3,4,2],[7,0,3,9],[9,0,3,11]]
]
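
The second step above, turning relative values into absolute ones, can be sketched like this. The function name `toAbsolute` is made up for this example. The only subtlety is that the generated column (field 0) restarts from zero on every line, while the other four fields keep accumulating across lines.

```javascript
// Convert decoded (relative) segments into absolute positions.
function toAbsolute(relativeLines) {
  // Running totals: [generated column, source index, source line, source column, name index]
  const totals = [0, 0, 0, 0, 0];
  return relativeLines.map((segments) => {
    totals[0] = 0; // the generated column is reset for every generated line
    return segments.map((segment) =>
      segment.map((delta, fieldIndex) => (totals[fieldIndex] += delta))
    );
  });
}

const relativeExample = [
  [[0,0,0,0],[4,0,0,4,0],[5,0,0,5],[3,0,0,3],[1,0,0,1]],
  [[0,0,2,-13],[4,0,0,6,1],[6,0,0,6],[3,0,0,3]],
  [[2,0,1,-11,1],[5,0,0,5],[2,0,0,2]]
];
console.log(JSON.stringify(toAbsolute(relativeExample)[1]));
// → [[0,0,2,0],[4,0,2,6,1],[10,0,2,12],[13,0,2,15]]
```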

What this segment array means is that the starting character of the segment, at line 1, column 10 in the generated file, is mapped to the character at line 2, column 12 in source file 0 (i.e. the file at index 0 in the "sources" array). Note that a segment does not record its own length. The length is derived from the starting position of the next segment.
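
To illustrate how such a lookup might work, the sketch below finds the segment that covers a given position in the generated file. The function `originalPositionFor` here is a hypothetical helper written for this article's data; it is not the API of any library, and a real implementation would typically use a binary search instead of a linear scan.

```javascript
// Absolute segments from the example above, one array per generated line.
const absoluteLines = [
  [[0,0,0,0],[4,0,0,4,0],[9,0,0,9],[12,0,0,12],[13,0,0,13]],
  [[0,0,2,0],[4,0,2,6,1],[10,0,2,12],[13,0,2,15]],
  [[2,0,3,4,2],[7,0,3,9],[9,0,3,11]]
];

// Find the original position for a zero-based generated line and column.
function originalPositionFor(lines, generatedLine, generatedColumn) {
  const segments = lines[generatedLine] || [];
  let match = null;
  for (const segment of segments) {
    if (segment[0] > generatedColumn) break; // this segment starts after the column
    match = segment; // the last segment starting at or before the column wins
  }
  if (match === null || match.length === 1) return null; // unmapped position
  return {
    sourceIndex: match[1],
    line: match[2],
    column: match[3],
    nameIndex: match.length === 5 ? match[4] : null
  };
}

console.log(JSON.stringify(originalPositionFor(absoluteLines, 1, 11)));
// → {"sourceIndex":0,"line":2,"column":12,"nameIndex":null}
```

Column 11 falls between the segments starting at columns 10 and 13, so it belongs to the segment "[10,0,2,12]", which is how the missing length information is recovered.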

In the following example, I used babel-cli to generate a bundle file and a map file from two source files. From the mapping data in the map file, the contents of the source files and the contents of the generated file, I built a visual comparison tool that shows the mapping relationship between the source files and the generated file. In the tool, hovering over any part of a source file or of the generated file highlights the corresponding part on the other side. The contents of the three files are listed below.

../src/a.js
let apple = 1;
const banana = {
  color: 'yellow',
};
class Greeter {
  constructor(message) {
    this.greeting = message;
  }
  greet() {
    return "Hello, " + this.greeting;
  }
}
const greeter = new Sayings.Greeter("world");
const button = document.createElement('button');
button.innerText = "Say Hello";
button.onclick = function() {
  alert(greeter.greet());
};
document.body.appendChild(button);

../src/b.js
const cherry = 3;
function getColor(target) {
  return `${target}'s color: ${Math.random()}`;
}

dist.min.js
"use strict";
function _typeof(obj) { "@babel/helpers - typeof"; return _typeof = "function" == typeof Symbol && "symbol" == typeof Symbol.iterator ? function (obj) { return typeof obj; } : function (obj) { return obj && "function" == typeof Symbol && obj.constructor === Symbol && obj !== Symbol.prototype ? "symbol" : typeof obj; }, _typeof(obj); }
function _classCallCheck(instance, Constructor) { if (!(instance instanceof Constructor)) { throw new TypeError("Cannot call a class as a function"); } }
function _defineProperties(target, props) { for (var i = 0; i < props.length; i++) { var descriptor = props[i]; descriptor.enumerable = descriptor.enumerable || false; descriptor.configurable = true; if ("value" in descriptor) descriptor.writable = true; Object.defineProperty(target, _toPropertyKey(descriptor.key), descriptor); } }
function _createClass(Constructor, protoProps, staticProps) { if (protoProps) _defineProperties(Constructor.prototype, protoProps); if (staticProps) _defineProperties(Constructor, staticProps); Object.defineProperty(Constructor, "prototype", { writable: false }); return Constructor; }
function _toPropertyKey(arg) { var key = _toPrimitive(arg, "string"); return _typeof(key) === "symbol" ? key : String(key); }
function _toPrimitive(input, hint) { if (_typeof(input) !== "object" || input === null) return input; var prim = input[Symbol.toPrimitive]; if (prim !== undefined) { var res = prim.call(input, hint || "default"); if (_typeof(res) !== "object") return res; throw new TypeError("@@toPrimitive must return a primitive value."); } return (hint === "string" ? String : Number)(input); }
var apple = 1;
var banana = {
  color: 'yellow'
};
var Greeter = /*#__PURE__*/function () {
  function Greeter(message) {
    _classCallCheck(this, Greeter);
    this.greeting = message;
  }
  _createClass(Greeter, [{
    key: "greet",
    value: function greet() {
      return "Hello, " + this.greeting;
    }
  }]);
  return Greeter;
}();
var greeter = new Sayings.Greeter("world");
var button = document.createElement('button');
button.innerText = "Say Hello";
button.onclick = function () {
  alert(greeter.greet());
};
document.body.appendChild(button);
"use strict";
var cherry = 3;
function getColor(target) {
  return "".concat(target, "'s color: ").concat(Math.random());
}

Conclusion

That's all I wanted to say about JavaScript source maps. I hope this article helps. If you have any doubts or questions, please don't hesitate to let me know by joining our Discord (the invite link is at the top right corner of this page) and sending me a message.