The FileMaker Pro File Format
--
FileMaker Pro is a consumer-grade database program that uses a binary,
proprietary file format for storing tabular and non-tabular data. This
file describes what is needed to extract tabular data from files with the
extension fp3, fp5, fp7, or fmp12.
There are two basic kinds of FileMaker files, fp3/fp5 and fp7/fmp12. The
two varieties have a similar overall structure and design philosophy but are
otherwise incompatible. The rest of this document will describe their
respective layouts and refer to them by their latest incarnations, fp5 and
fmp12. It is based on the fp5dump project combined with my own efforts.
The fp5dump project is here:
https://github.com/qwesda/fp5dump
The source code has more information about the fp5 format than you will find
here. I welcome any attempts to merge that information into this document.
Preliminaries: Text Encoding
==
Text data in fp5 files uses the native character encoding of the machine that
created the file; in most cases, this is MacRoman. iconv can be used to
convert this text to a more modern encoding, e.g. UTF-8.
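The conversion can also be done without an external tool. Below is a minimal Python sketch that uses the standard library's `mac_roman` codec. The function name `decode_fp5_text` is hypothetical, and the MacRoman default is only an assumption about the machine that created the file.

```python
def decode_fp5_text(raw: bytes, encoding: str = "mac_roman") -> str:
    """Decode fp5 text; MacRoman is only the most common case, not a guarantee."""
    return raw.decode(encoding)


print(decode_fp5_text(b"caf\x8e"))  # MacRoman 0x8E is "é", so this prints café
```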
The story with fmp12 is more complicated. FileMaker began supporting Unicode
characters before UTF-8 achieved widespread popularity, and appears to use the
now-deprecated Standard Compression Scheme for Unicode (SCSU), which is
documented here:
https://www.unicode.org/reports/tr6/tr6-4.html
SCSU is largely Latin-1 compatible, so treating the raw bytes as ISO-8859-1 is a
good start. However, SCSU also uses control codes to switch to other "windows"
of Unicode characters. It also has a UTF-16BE mode and can reach the
supplementary Unicode planes.
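Below is a minimal sketch of an SCSU decoder. It was written from the UTS #6 specification linked above, not derived from FileMaker itself. Some points to keep in mind:

* `scsu_decode` and its helpers are hypothetical names.
* It assumes well-formed input; truncated input raises IndexError.
* It does not apply the fmp12 XOR step described later.

```python
import struct

# Fixed windows (used by SQn with a byte < 0x80) and initial dynamic windows, per UTS #6.
STATIC_WINDOWS = [0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000]
DEFAULT_DYNAMIC = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SPECIAL_OFFSETS = {0xF9: 0x00C0, 0xFA: 0x0250, 0xFB: 0x0370, 0xFC: 0x0530,
                   0xFD: 0x3040, 0xFE: 0x30A0, 0xFF: 0xFF60}


def window_offset(x: int) -> int:
    """Map the window byte that follows SDn/UDn to a code point offset."""
    if 0x01 <= x <= 0x67:
        return x * 0x80
    if 0x68 <= x <= 0xA7:
        return x * 0x80 + 0xAC00
    if x in SPECIAL_OFFSETS:
        return SPECIAL_OFFSETS[x]
    raise ValueError(f"reserved window offset byte 0x{x:02X}")


def extended_window(hi: int, lo: int) -> tuple:
    """Decode the two bytes after SDX/UDX into (window index, offset)."""
    return hi >> 5, 0x10000 + ((((hi & 0x1F) << 8) | lo) << 7)


def scsu_decode(data: bytes) -> str:
    dynamic = list(DEFAULT_DYNAMIC)
    active = 0            # index of the active dynamic window
    unicode_mode = False  # False = single-byte mode, True = UTF-16BE mode
    units = []            # UTF-16 code units, turned into a str at the end

    def emit(cp: int) -> None:
        if cp >= 0x10000:  # split supplementary code points into a surrogate pair
            cp -= 0x10000
            units.extend((0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)))
        else:
            units.append(cp)

    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if unicode_mode:
            if 0xE0 <= b <= 0xE7:      # UCn: back to single-byte mode, window n
                active, unicode_mode = b - 0xE0, False
            elif 0xE8 <= b <= 0xEF:    # UDn: redefine window n, single-byte mode
                active, unicode_mode = b - 0xE8, False
                dynamic[active] = window_offset(data[i])
                i += 1
            elif b == 0xF0:            # UQU: quote the next code unit verbatim
                emit((data[i] << 8) | data[i + 1])
                i += 2
            elif b == 0xF1:            # UDX: extended window, single-byte mode
                active, offset = extended_window(data[i], data[i + 1])
                dynamic[active], unicode_mode = offset, False
                i += 2
            elif b == 0xF2:
                raise ValueError("reserved tag 0xF2")
            else:                      # any other byte starts a UTF-16BE code unit
                emit((b << 8) | data[i])
                i += 1
        elif b in (0x00, 0x09, 0x0A, 0x0D) or 0x20 <= b <= 0x7F:
            emit(b)                    # ASCII passes through unchanged
        elif b >= 0x80:
            emit(dynamic[active] + b - 0x80)  # character from the active window
        elif 0x01 <= b <= 0x08:        # SQn: quote one character from window n
            n, c = b - 0x01, data[i]
            i += 1
            emit(STATIC_WINDOWS[n] + c if c < 0x80 else dynamic[n] + c - 0x80)
        elif b == 0x0B:                # SDX: define an extended window
            active, offset = extended_window(data[i], data[i + 1])
            dynamic[active] = offset
            i += 2
        elif b == 0x0E:                # SQU: quote one UTF-16BE code unit
            emit((data[i] << 8) | data[i + 1])
            i += 2
        elif b == 0x0F:                # SCU: switch to Unicode mode
            unicode_mode = True
        elif 0x10 <= b <= 0x17:        # SCn: make window n active
            active = b - 0x10
        elif 0x18 <= b <= 0x1F:        # SDn: redefine window n and make it active
            active = b - 0x18
            dynamic[active] = window_offset(data[i])
            i += 1
        else:                          # only 0x0C is left, and it is reserved
            raise ValueError(f"reserved tag 0x{b:02X}")
    return struct.pack(f">{len(units)}H", *units).decode("utf-16-be", "surrogatepass")
```

For example, `scsu_decode(bytes([0x12, 0x9C, 0xBE, 0xC1, 0xBA, 0xB2, 0xB0]))` returns "Москва". The SC2 tag selects the default window at U+0400, so each following byte maps into the Cyrillic block.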
Preliminaries: Integer Encoding
==
Most integer data (e.g. lengths) are encoded big-endian. However, certain
values appear to use a quasi-variable-length encoding. The encoding was fully
variable length in fp5, but seems to have been modified in fmp12. For reasons
that will become clear later, these will be referred to as "path integers" that
consist of one to three bytes.
In all cases, the actual length of the integer can be determined from context,
but the encoding appears designed so that each integer reports its own length,
similar to UTF-8 sequences. This feature is not needed to parse them, so for
simplicity the sequences are described assuming the total length is known in
advance.
One-byte integers have a range of 0 - 127, with the highest bit ignored.
Two-byte integers have a range of 128 - 32895. Ignore the highest bit of the
first byte, treat the remaining 15 bits as a big-endian number, and add 128.
[fp5 only] Three-byte integers have a range of 49152 and up. Ignore the highest
two bits of the first byte, treat the remaining 22 bits as a big-endian number,
and add 0xC000 (49152).
[fmp12 only] Three-byte integers have a range of 128 - 65663. Ignore the first
byte, treat the remaining two bytes as a big-endian number, and add 128.
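The rules above translate directly into a small decoder. This is a sketch: the name `decode_path_int` and the `fmp12` flag are hypothetical. As described above, the caller must already know the length from context.

```python
def decode_path_int(raw: bytes, fmp12: bool) -> int:
    """Decode a 1-3 byte "path integer" using the rules above."""
    if len(raw) == 1:
        return raw[0] & 0x7F                                  # highest bit ignored
    if len(raw) == 2:
        return 0x80 + (((raw[0] & 0x7F) << 8) | raw[1])       # 15 bits, plus 128
    if len(raw) == 3:
        if fmp12:
            return 0x80 + ((raw[1] << 8) | raw[2])            # first byte ignored
        return 0xC000 + (((raw[0] & 0x3F) << 16) | (raw[1] << 8) | raw[2])
    raise ValueError("path integers are 1 to 3 bytes long")
```

For instance, `decode_path_int(b"\x81\x00", fmp12=True)` gives 128 + 0x0100 = 384.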
File Structure
==
Files consist of a header sector followed by one or more body sectors. Each
sector contains 1024 bytes (fp5) or 4096 bytes (fmp12). In fp5 files, the first
body sector can be ignored, with the "real" processing starting at offset 2048.
Header Structure
==
The header begins with a 15-byte magic number:
00 01 00 00 00 02 00 01 00 05 00 02 00 02 C0
In fmp12, the magic number is followed by the ASCII sequence "HBAM7". This
sequence can be used to distinguish fp5 files from fmp12 files.
The name of the software that created the file is stored in the header as a
Pascal string. It consists of a one-byte length at offset 541, followed by that
many ASCII characters (not NUL-terminated), and usually has the form "Pro X.0",
where X is the version number.
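Below is a minimal sketch that tells the two families apart and reads the creator string. It assumes `header` holds at least the first 1024 bytes of the file, which covers the header fields above in both formats. `sniff_header` and `MAGIC` are hypothetical names.

```python
MAGIC = bytes([0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x01,
               0x00, 0x05, 0x00, 0x02, 0x00, 0x02, 0xC0])


def sniff_header(header: bytes) -> tuple:
    """Return (family, creator): family is "fp5" (fp3/fp5) or "fmp12" (fp7/fmp12)."""
    if header[:15] != MAGIC:
        raise ValueError("missing FileMaker magic number")
    family = "fmp12" if header[15:20] == b"HBAM7" else "fp5"
    length = header[541]                                  # Pascal string length byte
    creator = header[542:542 + length].decode("ascii", errors="replace")
    return family, creator
```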
Sector Structure
==
Sectors may be unordered; they are arranged as a doubly linked list, and
contain the ID of the previous sector as well as the next sector in the list.
By following the linked list from the beginning, you can traverse the data in
order.
fp5 sector layout:
Offset Length Value
0 1 Deleted? 1=Yes 0=No
1 1 Level (Integer)
2 4 Previous Sector ID (Integer)
6 4 Next Sector ID (Integer)
12 2 Payload Length = N (Integer)
14 N Payload
fmp12 sector layout:
Offset Length Value
0 1 Deleted? 1=Yes 0=No
1 1 Level (Integer)
4 4 Previous Sector ID (Integer)
8 4 Next Sector ID (Integer)
20 4076 Payload
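Below is a minimal sketch that reads the sector header fields listed above and follows the "next" links. It makes three assumptions:

* The IDs are big-endian, like most integers in these files.
* The input is a dict of already-parsed sectors keyed by ID, because how an ID maps to a file offset is not covered here.
* An ID that is not in the dict, or one already visited, ends the chain.

`Sector`, `parse_sector`, and `walk` are hypothetical names.

```python
import struct
from typing import Dict, Iterator, NamedTuple


class Sector(NamedTuple):
    deleted: bool
    level: int
    prev_id: int
    next_id: int
    payload: bytes


def parse_sector(buf: bytes, fmp12: bool) -> Sector:
    """Split one raw sector (1024 bytes for fp5, 4096 for fmp12) into fields."""
    if fmp12:
        prev_id, next_id = struct.unpack_from(">II", buf, 4)
        payload = buf[20:4096]
    else:
        prev_id, next_id = struct.unpack_from(">II", buf, 2)
        (length,) = struct.unpack_from(">H", buf, 12)
        payload = buf[14:14 + length]
    return Sector(buf[0] == 1, buf[1], prev_id, next_id, payload)


def walk(sectors: Dict[int, Sector], first_id: int) -> Iterator[Sector]:
    """Yield sectors in list order by following next_id links."""
    seen = set()
    sector_id = first_id
    # Assumption: an unknown ID (or a repeated one) marks the end of the chain.
    while sector_id in sectors and sector_id not in seen:
        seen.add(sector_id)
        yield sectors[sector_id]
        sector_id = sectors[sector_id].next_id
```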
The "Payload" is a byte-code stream that can be used to construct a series
of data chunks. For our purposes, there are six kinds of chunks:
* Path "push" operation (integer or byte sequence)
* Path "pop" operation
* Simple data (byte sequence)
* Segmented data (segment index + byte sequence)
* Simple key-value pair (integer => byte sequence)
* Long key-value pair (byte sequence => byte sequence)
The path operations define the logical position of the other kinds of data,
and are central to extracting data from the file. It is a primitive sort of
"file system" whose "folders" are usually (but not always) integers.
For example, the file may "push" the numbers 3, 1, and 5 onto the path, in
which case the next piece of data will have a path address of [3].[1].[5].
After a "pop" operation, the next piece of data will have the address [3].[1],
and so on.
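The push/pop bookkeeping is a plain stack. This minimal sketch makes two assumptions:

* The chunk stream is represented by hypothetical `("push", value)`, `("pop", None)`, and `("data", payload)` tuples.
* `apply_path_ops`, also a hypothetical name, pairs each data item with the path in effect when it appears.

```python
def apply_path_ops(ops):
    """Replay push/pop operations and record the path of each data item."""
    path = []
    located = []
    for op, arg in ops:
        if op == "push":
            path.append(arg)
        elif op == "pop":
            path.pop()
        else:                       # any other chunk is data at the current path
            located.append((tuple(path), arg))
    return located


ops = [("push", 3), ("push", 1), ("push", 5), ("data", "a"), ("pop", None), ("data", "b")]
print(apply_path_ops(ops))  # [((3, 1, 5), 'a'), ((3, 1), 'b')]
```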
A "simple data" chunk is just a sequence of bytes; its path will determine how
to interpret its contents. Most byte sequences in fmp12 need to be "decrypted"
by XOR'ing every byte with the hex value 0x5A.
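XOR with a constant is its own inverse, so the same function both masks and unmasks. Here is a one-line sketch. The name `fmp12_unmask` is hypothetical, and because whether a byte sequence needs unmasking depends on its path, the caller decides when to apply it.

```python
def fmp12_unmask(data: bytes) -> bytes:
    """Undo the fmp12 obfuscation by XOR'ing every byte with 0x5A."""
    return bytes(b ^ 0x5A for b in data)


print(fmp12_unmask(bytes([0x14, 0x3B, 0x37, 0x3F])))  # b'Name'
```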
Segmented data refers to data that does not fit into a single chunk, or even
into a single sector. Typically, large strings or objects are split into
1000-byte segments that share a path. Each segmented data chunk includes a
sequential index that can be used to reconstruct the large object.
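Reassembling a segmented object amounts to sorting its segments by index and concatenating them. The following sketch assumes the segments for one path have already been collected as `(index, bytes)` pairs. `reassemble` is a hypothetical name.

```python
def reassemble(segments):
    """Join (segment index, bytes) pairs that share a path, in index order."""
    return b"".join(data for _, data in sorted(segments, key=lambda s: s[0]))


print(reassemble([(2, b"lo"), (1, b"hel")]))  # b'hello'
```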
Key-value pairs are the most common kind of chunk; multiple key-value pairs
with the same path can represent associative arrays or records. The keys may be
integers or strings (but usually integers), and the values are byte sequences.
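The following minimal sketch turns key-value chunks that share a path into one record per path. It assumes the chunks have already been tagged with their paths as `(path, key, value)` tuples. `group_records` is a hypothetical name, and letting a repeated key overwrite the earlier value is an assumption.

```python
from collections import defaultdict


def group_records(chunks):
    """Collect (path, key, value) key-value chunks into one dict per path."""
    records = defaultdict(dict)
    for path, key, value in chunks:
        records[tuple(path)][key] = value   # a repeated key overwrites (assumption)
    return dict(records)
```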
The "Codes" sections describe the byte codes used to encode the six chunk
types; implementing them is enough to read any FileMaker file into memory.
The "Path Structure" sections describe how to convert these raw chunks into
meaningful data structures.
fp5 Codes
==
Each chunk can usually be identified by its first byte, although in a few cases
examining the second byte is necessary.
The possible chunk types and structures in fp5 files are:
Simple key-value
~~
Offset Length Value
0 1 0x00
1 1 N = Length (Integer)
2 N Value
Key = 0x00 (Integer)
Offset Length Value
0 1 0x40 <= C <= 0x7F
1 1 N = Length (Integer)
2 N Value (Bytes)
Key = C - 0x40 (Integer)
Offset Length Value
0 2 0xFF (0x40 <= C <= 0x80)
2 C-0x40 Key (Bytes)
C-0x3E 2 N = Length (Integer)
C-0x3C N Value (Bytes)
Long key-value
~~
Offset Length Value
0 1 0x01 <= C <= 0x3F
1 1 K = Key Length (Integer)
2 K Key (Bytes)
2+K 1 N = Length (Integer)
2+K+1 N Value (Bytes)
Offset Length Value
0 2 0xFF (0x01 <= K <= 0x04)
2 K Key (Bytes)
2+K 2 N = Length (Integer)
2+K+2 N Value (Bytes)
Simple data
~~
Offset Length Value
0 1 0x80 <= C <= 0xBF
1 C-0x80 Value (Bytes)
Path pop
~~
Offset Length Value
0 1 0xC0
Path push
~~
Offset Length Value
0 1 0xC1 <= C <= 0xFE
1 C-0xC0 Value (Bytes)
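The fp5 tables above can be turned into a straightforward decoder loop. This sketch follows the tables literally:

* In the 0x01-0x3F form, the byte at offset 1 is treated as the key length.
* The two 0xFF forms share one layout and differ only in how the key length is derived.

It yields hypothetical `("kv", key, value)`, `("data", value)`, `("push", raw_bytes)`, and `("pop",)` tuples, and it raises an error on any code the tables do not cover. `parse_fp5_payload` is a hypothetical name. Push values are returned as raw bytes; the path-integer rules can turn them into numbers.

```python
def parse_fp5_payload(payload: bytes):
    """Yield chunks from an fp5 sector payload, following the tables above."""
    p = 0
    while p < len(payload):
        c = payload[p]
        if c == 0x00 or 0x40 <= c <= 0x7F:          # simple key-value, 1-byte length
            key = 0 if c == 0x00 else c - 0x40
            n = payload[p + 1]
            yield ("kv", key, payload[p + 2:p + 2 + n])
            p += 2 + n
        elif 0x01 <= c <= 0x3F:                     # long key-value, 1-byte lengths
            k = payload[p + 1]
            n = payload[p + 2 + k]
            yield ("kv", payload[p + 2:p + 2 + k], payload[p + 3 + k:p + 3 + k + n])
            p += 3 + k + n
        elif 0x80 <= c <= 0xBF:                     # simple data
            n = c - 0x80
            yield ("data", payload[p + 1:p + 1 + n])
            p += 1 + n
        elif c == 0xC0:                             # path pop
            yield ("pop",)
            p += 1
        elif 0xC1 <= c <= 0xFE:                     # path push
            n = c - 0xC0
            yield ("push", payload[p + 1:p + 1 + n])
            p += 1 + n
        else:                                       # 0xFF: the second byte decides
            c2 = payload[p + 1]
            if 0x01 <= c2 <= 0x04:
                k = c2
            elif 0x40 <= c2 <= 0x80:
                k = c2 - 0x40
            else:
                raise ValueError(f"unknown chunk 0xFF 0x{c2:02X} at offset {p}")
            n = int.from_bytes(payload[p + 2 + k:p + 4 + k], "big")
            yield ("kv", payload[p + 2:p + 2 + k], payload[p + 4 + k:p + 4 + k + n])
            p += 4 + k + n
```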
fmp12 Codes
==
As with the fp5 codes, each chunk can usually be identified by its first byte,
although in a few cases examining the second byte is necessary.
The possible chunk types and structures are:
Simple data
~~
Offset Length Value
0 1 0x00
1 1 Value (Bytes)
Offset Length Value
0 1 0x08
1 2 Value (Bytes)
Offset Length Value
0 2 0x0E 0xFF
2 5 Value (Bytes)
Offset Length Value
0 1 0x10 <= C <= 0x11
1 3+(C-0x10) Value (Bytes)
Offset Length Value
0 1 0x12 <= C <= 0x15
1 1+2*(C-0x10) Value (Bytes)
Offset Length Value
0 1 0x19 or 0x23
1 1 Value (Bytes)
Offset Length Value
0 1 0x1A <= C <= 0x1D
1 2*(C-0x19) Value (Bytes)
Simple key-value
~~
Offset Length Value
0 1 0x01
1 1 Key (Integer)
2 1 Value (Bytes)
Offset Length Value
0 1 0x02 <= C <= 0x05
1 1 Key (Integer)
2 2*(C-1) Value (Bytes)
Offset Length Value
0 1 0x06
1 1 Key (Integer)
2 1 N = Length (Integer)
3 N Value (Bytes)
Offset Length Value
0 1 0x09
1 2 Key (Path Integer)
3 1 Value (Bytes)
Offset Length Value
0 1 0x0A <= C <= 0x0D
1 2 Key (Path Integer)
3 2*(C-9) Value (Bytes)
Offset Length Value
0 1 0x0E
1 2 Key (Path Integer)
3 1 N = Length (Integer)
4 N Value (Bytes)
Long key-value
~~
Offset Length Value
0 1 0x16
1 3 Key (Bytes)
4 1 N = Length (Integer)
5 N Value (Bytes)
Offset Length Value
0 1 0x17
1 3 Key (Bytes)
4 2 N = Length (Integer)
6 N Value (Bytes)
Offset Length Value
0 1 0x1E
1 1 K = Key Length (Integer)
2 K Key (Bytes)
2+K 1 N = Value Length (Integer)
2+K+1 N Value (Bytes)
Offset Length Value
0 1 0x1F
1 1 K = Key Length (Integer)
2 K Key (Bytes)
2+K 2 N = Value Length (Integer)
2+K+2 N Value (Bytes)
Segmented data
~~
Offset Length Value
0 1 0x07
1 1 Segment index (Integer)
2 2 N = Length (Integer)
4 N Value (Bytes)
Offset Length Value
0 1 0x0F
2 1 Segment index (Integer)
3 2 N = Length (Integer)
5 N Value (Bytes)
Path push
~~
Offset Length Value
0 1 0x20 or 0x0E
1 1 Value (Integer)
Offset Length Value
0 2 (0x20 or 0x0E) 0xFE
2 8 Value (Bytes)
Offset Length Value
0 1 0x28
1 2 Value (Path Integer)
Offset Length Value
0 1 0x30
1 3 Value (Path Integer)
Offset Length Value
0 1 0x38
1 1 N = Length (Integer)
2 N Value (Bytes)
Path pop
~~
Offset Length Value
0 1 0x3D or 0x40
No-op
~~
Offset Length Value
0 1 0x80
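Below is a sketch of the same loop for fmp12. It follows the tables above and produces the same hypothetical tuples, plus `("segment", index, value)`. Two assumptions are worth flagging:

* 0x0E is listed both as a simple key-value code and as a one-byte path push. This sketch treats it as the key-value form unless the second byte is 0xFE (path push) or 0xFF (simple data).
* The tables do not say how a chunk stream ends inside a sector, so the sketch simply runs to the end of the buffer.

Two-byte path-integer keys are decoded with the two-byte rule described earlier. Values are returned without the XOR step. `parse_fmp12_payload` is a hypothetical name.

```python
def parse_fmp12_payload(payload: bytes):
    """Yield chunks from an fmp12 sector payload, following the tables above."""

    def take(start, n):
        return payload[start:start + n]

    def u16(start):
        return int.from_bytes(payload[start:start + 2], "big")

    def key2(start):
        # Two-byte path integer: drop the top bit of the first byte, add 128.
        return 0x80 + (u16(start) & 0x7FFF)

    p = 0
    while p < len(payload):
        c = payload[p]
        c2 = payload[p + 1] if p + 1 < len(payload) else None
        chunk = None
        if c == 0x80:                                   # no-op
            p += 1
        elif c in (0x3D, 0x40):                         # path pop
            chunk, p = ("pop",), p + 1
        elif c in (0x0E, 0x20) and c2 == 0xFE:          # path push, 8 bytes
            chunk, p = ("push", take(p + 2, 8)), p + 10
        elif c == 0x0E and c2 == 0xFF:                  # simple data, 5 bytes
            chunk, p = ("data", take(p + 2, 5)), p + 7
        elif c == 0x20:                                 # path push, 1 byte
            chunk, p = ("push", take(p + 1, 1)), p + 2
        elif c in (0x28, 0x30):                         # path push, 2 or 3 bytes
            n = 2 if c == 0x28 else 3
            chunk, p = ("push", take(p + 1, n)), p + 1 + n
        elif c == 0x38:                                 # path push, explicit length
            n = payload[p + 1]
            chunk, p = ("push", take(p + 2, n)), p + 2 + n
        elif c in (0x00, 0x08, 0x19, 0x23) or 0x10 <= c <= 0x15 or 0x1A <= c <= 0x1D:
            if c == 0x08:                               # simple data, fixed lengths
                n = 2
            elif 0x10 <= c <= 0x11:
                n = 3 + (c - 0x10)
            elif 0x12 <= c <= 0x15:
                n = 1 + 2 * (c - 0x10)
            elif 0x1A <= c <= 0x1D:
                n = 2 * (c - 0x19)
            else:                                       # 0x00, 0x19, 0x23
                n = 1
            chunk, p = ("data", take(p + 1, n)), p + 1 + n
        elif 0x01 <= c <= 0x06:                         # simple key-value, 1-byte key
            if c == 0x06:
                n, start = payload[p + 2], p + 3
            else:
                n, start = (1 if c == 0x01 else 2 * (c - 1)), p + 2
            chunk, p = ("kv", payload[p + 1], take(start, n)), start + n
        elif 0x09 <= c <= 0x0E:                         # simple key-value, path-integer key
            if c == 0x0E:
                n, start = payload[p + 3], p + 4
            else:
                n, start = (1 if c == 0x09 else 2 * (c - 9)), p + 3
            chunk, p = ("kv", key2(p + 1), take(start, n)), start + n
        elif c in (0x16, 0x17):                         # long key-value, 3-byte key
            n, start = (payload[p + 4], p + 5) if c == 0x16 else (u16(p + 4), p + 6)
            chunk, p = ("kv", take(p + 1, 3), take(start, n)), start + n
        elif c in (0x1E, 0x1F):                         # long key-value, K-byte key
            k = payload[p + 1]
            if c == 0x1E:
                n, start = payload[p + 2 + k], p + 3 + k
            else:
                n, start = u16(p + 2 + k), p + 4 + k
            chunk, p = ("kv", take(p + 2, k), take(start, n)), start + n
        elif c in (0x07, 0x0F):                         # segmented data
            q = p + 1 if c == 0x07 else p + 2           # offset of the segment index
            n = u16(q + 1)
            chunk, p = ("segment", payload[q], take(q + 3, n)), q + 3 + n
        else:
            raise ValueError(f"unknown chunk code 0x{c:02X} at offset {p}")
        if chunk is not None:
            yield chunk
```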
fp5 Path Structure
==
fp5 files can contain only one table, which makes things easy. The
known paths are:
[1]: Some kind of word index?
[3].[1]: Column names => Index pairs (String key, Integer value)
These column names are uppercase.
[3].[5].[X]: Metadata for the Xth column (Key-value pairs)
[1] => Column name
[2] => Second byte indicates column type (1=String, 2=Integer)
[5].[X]: Xth record in the table (Path Integer key, String or Integer value)
Paths at [32] and above appear to be references to external FileMaker files
on the same hard drive.
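The following sketch assumes the chunks have already been decoded into `(path, key, value)` tuples, with each path a tuple of integers. It picks out the known fp5 paths. `fp5_table` is a hypothetical name. Values are left as raw bytes, and the sketch does not assume anything about how record keys relate to column numbers.

```python
def fp5_table(kv_chunks):
    """Sort fp5 (path, key, value) chunks into column names, types and records."""
    names, types, records = {}, {}, {}
    for path, key, value in kv_chunks:
        if len(path) == 3 and path[:2] == (3, 5):    # [3].[5].[X]: column X metadata
            if key == 1:
                names[path[2]] = value
            elif key == 2 and len(value) >= 2:       # second byte: 1=String, 2=Integer
                types[path[2]] = {1: "string", 2: "integer"}.get(value[1], "unknown")
        elif len(path) == 2 and path[0] == 5:        # [5].[X]: record X
            records.setdefault(path[1], {})[key] = value
    return names, types, records
```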
fmp12 Path Structure
==
fmp12 introduced the ability to store multiple tables in one file. Individual
tables have a similar layout to the fp5 files, but are stored in a root path
with a value of 128 or above.
For example, if the first table is stored at path [130], that table's column
metadata can be found at [130].[3].[5].
The semantics change slightly, as documented below. fmp12 appears to
eliminate the Integer column type in favor of all Strings.
[4].[1].[7].[X]: Metadata about the Xth table
[16] => Table name
[128+X].[3].[5].[Y]: Metadata for the Yth column of the Xth table
[128+X].[5].[Y]: Yth record in the Xth table (Path Integer key, String value)
Note that the table numbers are not necessarily contiguous: some values of X
may be unused.
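The same idea applies to fmp12, grouping everything by table number X. As before, the `(path, key, value)` tuple input and the name `fmp12_tables` are assumptions. Column metadata is kept as a raw key-value dict because only the table-name key (16) is documented above. The values still need the XOR and text decoding steps described earlier.

```python
def fmp12_tables(kv_chunks):
    """Group fmp12 (path, key, value) chunks by table number X."""
    tables = {}

    def table(x):
        return tables.setdefault(x, {"name": None, "columns": {}, "records": {}})

    for path, key, value in kv_chunks:
        if len(path) == 4 and path[:3] == (4, 1, 7) and key == 16:
            table(path[3])["name"] = value                      # [4].[1].[7].[X], key 16
        elif len(path) == 4 and path[0] >= 128 and path[1:3] == (3, 5):
            table(path[0] - 128)["columns"].setdefault(path[3], {})[key] = value
        elif len(path) == 3 and path[0] >= 128 and path[1] == 5:
            table(path[0] - 128)["records"].setdefault(path[2], {})[key] = value
    return tables
```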