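I am very glad that this repo can give junior students ideas for the assignment. However, please do not copy the code or sell it: this repo has already been submitted to the university's plagiarism-check database, and also to the TAs and the professor. Completing this assignment seriously greatly improves your coding ability. If you have questions, you can contact me: [email protected] .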
Large Efficient Flexible and Trusty (LEFT) Files Sharing Report
1. Abstract
The main purpose of this coursework is to synchronize large files between peers. Encryption and compression are used during transfer to make it more secure and faster. Breakpoint (resumable) transfer and retransmission of modified files are also supported to make the project more complete and flexible. The application layer protocol runs on top of TCP.
2. Introduction
2.1 Project requirement
The coursework requirements are as follows:
- All files or folders in the `./share` folder can be synchronized to the `peer` after the `peer`'s `ip` address is entered in the `main` function, regardless of when the `peer` comes online and when new files appear.
- During the transfer, if the peer goes down for any reason, the transfer resumes from where it stopped once the peer comes back online, and continues until it is completed.
- After a file has been modified, the modified file can be transferred to the `peer`, so that the file remains consistent.
- When opening this application, if there is a `-encryption yes` option in the parameters, the whole process will be transferred using encryption.
- All files transferred should be complete and error-free.
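As an illustration of how the peer addresses and the encryption switch might be read at startup, here is a minimal sketch using `argparse`. The `--ip` spelling follows the command used in the test plan and `-encryption yes` follows the requirement above; accepting `--encryption` as well, the `yes`/`no` value set, and the helper name `parse_args` are assumptions, not the project's actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse peer IPs and the encryption switch (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="LEFT file sharing")
    # Comma-separated peer addresses, e.g. --ip ip_VM1,ip_VM2
    parser.add_argument("--ip", default="", help="comma-separated peer IPs")
    # Assumption: the option takes a literal "yes" or "no"; both the
    # single-dash and double-dash spellings are accepted.
    parser.add_argument("-encryption", "--encryption",
                        choices=["yes", "no"], default="no")
    args = parser.parse_args(argv)
    peers = [ip.strip() for ip in args.ip.split(",") if ip.strip()]
    return peers, args.encryption == "yes"
```

Calling `parse_args(["--ip", "ip_VM1,ip_VM2", "-encryption", "yes"])` would return the two peer names and `True`.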
2.2 Background
In the information age, people frequently use their data on multiple devices, so an application for data synchronization is very necessary. Products such as iCloud, OneDrive, Dropbox, and Google Cloud were built by major companies in response to this need, which shows the importance of data synchronization. File synchronization tools have been developed since the 1990s [1]. Later, data synchronization algorithms were improved further to reduce transmission delay and improve reliability [2] [3].
I designed the entire application layer transport protocol, designed the entire application synchronization protocol, wrote the entire project code, and wrote this coursework report.
3. Methodology
3.1 Proposed protocol
This section describes the design of the application layer protocol.
3.1.1 Package
The package is laid out as follows:

    package:
    |--8Bytes-|
    +----+----+-----------+-----------------------------------------+
    |1234|5678|  header   |                  body                   |
    +----+----+-----------+-----------------------------------------+
      |    |
      |    +body_length
      +header_length
The header and body sections do not have a fixed length, which provides better extensibility and flexibility for future system upgrades. The first 8 bytes are used to delimit a message.
- Bytes 1 to 4 are the `header length`, which records the length of the `header`.
- Bytes 5 to 8 are the `body length`, which records the length of the `body`.
- The header holds meta information about this type of `message`.
- The body is the payload of the `message`.
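The framing described above can be sketched with Python's `struct` module. This is a minimal sketch, not the project's `Package` class: the function names are hypothetical, and encoding both lengths as unsigned 32-bit big-endian integers is an assumption, since the report only fixes their size at 4 bytes each.

```python
import struct

# Assumption: both lengths are unsigned 32-bit big-endian integers. The report
# only fixes their size (4 bytes each), not the byte order.
PREFIX = struct.Struct("!II")


def pack_package(header: bytes, body: bytes = b"") -> bytes:
    """Prepend the 8-byte length prefix to an encoded header and a body."""
    return PREFIX.pack(len(header), len(body)) + header + body


def unpack_package(data: bytes):
    """Split one complete package back into (header, body)."""
    header_length, body_length = PREFIX.unpack_from(data)
    body_start = PREFIX.size + header_length
    if len(data) < body_start + body_length:
        raise ValueError("incomplete package")
    return data[PREFIX.size:body_start], data[body_start:body_start + body_length]
```

Because the receiver learns both lengths from the first 8 bytes, it knows exactly how many more bytes to read from the TCP stream before the next message begins.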
3.1.2 Header
The header contains the following fields:
header fields:
(* marks a required field)
1. *methods
2. filename
3. start index
The `header` contains the `method` field of the `message`, together with the `filename` field and the `start index` field used for modification and retransmission. To make the system extensible and portable, the `header` of the `package` is transferred as `json`. This makes it possible to add new fields to the `header` and to give the `header` a deeper logical hierarchy.
Each `method` needs different header fields to match the functionality of that type of `message`.
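A JSON header of this kind could be encoded and decoded as in the sketch below. The JSON key names (`method`, `filename`, `start_index`) and both helper names are assumptions made for illustration; the report lists the fields but not their exact spelling on the wire.

```python
import json


def encode_header(method, filename=None, start_index=None, **extra):
    """Serialize header fields to UTF-8 JSON bytes (key names are assumed)."""
    header = {"method": method}
    if filename is not None:
        header["filename"] = filename
    if start_index is not None:
        header["start_index"] = start_index
    # New or nested fields can be added without changing the wire format.
    header.update(extra)
    return json.dumps(header).encode("utf-8")


def decode_header(raw: bytes) -> dict:
    """Parse header bytes back into a dictionary."""
    return json.loads(raw.decode("utf-8"))
```

For example, `encode_header("REQ", "a.txt", 1024)` produces the header of a request for `a.txt` starting at byte 1024, and an extra keyword argument such as `meta={"mtime": 1}` would simply appear as a nested object.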
All methods of the transfer protocol header:

    +------+--------+-------------------------------------------------------+
    | No.  | method | description                                           |
    +------+--------+-------------------------------------------------------+
    |  1   | REQ    | send a request message to get a missing file          |
    |  2   | SED    | send a send message to transfer a whole file          |
    |  3   | UPT    | send an update message to transfer part of a file     |
    |  4   | DEL    | send a delete message to delete all files in the list |
    +------+--------+-------------------------------------------------------+
The `body` carries a different `payload` for each type of `message`, as determined by the header.
The following table shows, for each `method`, the meaning of each field, whether it is required, and the type of `payload` carried in the `body`.
Method | Filename | Start_index | Body |
---|---|---|---|
request (REQ) | Name of the file to be resent | Index from which to resend | - |
send (SED) | Name of the file to be sent | - | The whole file |
update (UPT) | Name of the file to be updated | Start index of the updated data (in this coursework, the default is 0) | The modified file content, read from the start index |
delete (DEL) | - | - | List of file names to be deleted |
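The per-method rules in the table above could be checked on receipt with a small lookup table. This is a sketch under stated assumptions: the JSON key names and the helper `validate_message` are hypothetical, and a field is treated as required whenever the table gives it a meaning for that method.

```python
# Assumption: a field is required when the table above defines it for the
# method. Key names are illustrative, not taken from the project code.
REQUIRED_FIELDS = {
    "REQ": {"filename", "start_index"},
    "SED": {"filename"},
    "UPT": {"filename", "start_index"},
    "DEL": set(),
}


def validate_message(header: dict, body: bytes) -> None:
    """Raise ValueError if the header does not fit its declared method."""
    method = header.get("method")
    if method not in REQUIRED_FIELDS:
        raise ValueError(f"unknown method: {method!r}")
    missing = REQUIRED_FIELDS[method] - set(header)
    if missing:
        raise ValueError(f"{method} is missing fields: {sorted(missing)}")
    # A request only asks for data, so it carries no payload.
    if method == "REQ" and body:
        raise ValueError("REQ must not carry a body")
```

Rejecting malformed messages before any file is touched keeps the handlers for each method simple.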
3.1.3 Proposed functions and ideas
- I don't think files should be transferred in small chunks. This would waste header information and network bandwidth, and would cause unnecessary trouble when writing the code. When sending a complete file, I transfer the metadata of the file in the `header` and the payload (all the contents of the file) in the `body`.
- Use `compress` when the file is larger than `500M`, because for small files the compression process may take longer than the transfer.
- When receiving, first add the file name to the `transfering_set` of the `database`. These file names are persisted when the program stops, so that they can be used in the `<REQ> message` for the next retransmission, with the current size of the partially received file used as `start_index`.
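The compression threshold and the resume logic from the list above can be sketched as follows. Several details are assumptions: `zlib` stands in for whatever compressor the project uses, `500M` is read as 500 MiB, and the function names and header key names are hypothetical.

```python
import os
import zlib

COMPRESS_THRESHOLD = 500 * 1024 * 1024  # "500M" read as 500 MiB (assumption)


def maybe_compress(data: bytes, threshold: int = COMPRESS_THRESHOLD):
    """Compress only large payloads; return (payload, was_compressed)."""
    if len(data) > threshold:
        return zlib.compress(data), True
    return data, False


def build_resume_request(filename: str, share_dir: str = "./share") -> dict:
    """Build a REQ header asking for the bytes that are still missing."""
    path = os.path.join(share_dir, filename)
    # The size of the partial file on disk is the next byte to fetch.
    received = os.path.getsize(path) if os.path.exists(path) else 0
    return {"method": "REQ", "filename": filename, "start_index": received}


def read_from(path: str, start_index: int) -> bytes:
    """Sender side: read the rest of the file from start_index onward."""
    with open(path, "rb") as f:
        f.seek(start_index)
        return f.read()
```

After a restart, the receiver would call `build_resume_request` for every name persisted in `transfering_set`, and the sender would answer with `read_from(path, start_index)` so that only the missing tail is transferred.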
4. Implementation
4.1 Steps of implementation
I implemented this coursework in the following steps:
- Clarify the requirements of the coursework.
- Divide the required functionality into modules, and write down each function's name and purpose.
- Refine the functional modules. Functions that belong together form a class, and objects are used to perform the work.
- Extract functions that can be reused across modules, and try to make each function do only one thing.
- Clarify how the functional modules fit together, then write the multithreaded and multiprocess code.
- Test and debug.
4.2 Programming skills
Object-oriented programming, modular design, and multithreading are used in this coursework.
4.2.1 Object-Oriented Programming
- The `database` is treated as a singleton object. For thread safety, a mutex lock protects it during writes.
- The transmission data `package` is wrapped by the `Package` object, which automatically adds `header_length` and `body_length`.
- All files that need to be synchronized are abstracted into `SyncFile` objects, and `mtime` and `size` are obtained automatically.
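The singleton database and the `SyncFile` abstraction described above might look roughly like the sketch below. It is an illustration only: the class internals, the method `add_transferring`, and the JSON layout written to `db.json` are assumptions, while `transfering_set`, `mtime`, and `size` follow the names used in this report.

```python
import json
import os
import threading
from dataclasses import dataclass, field


class Database:
    """Process-wide singleton that persists the set of in-progress transfers."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls, path="db.json"):
        with cls._instance_lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst.path = path
                inst.write_lock = threading.Lock()
                inst.transfering_set = set()
                cls._instance = inst
        return cls._instance

    def add_transferring(self, filename):
        # The mutex guards both the in-memory set and the file write.
        with self.write_lock:
            self.transfering_set.add(filename)
            with open(self.path, "w") as f:
                json.dump(sorted(self.transfering_set), f)


@dataclass
class SyncFile:
    """A file to synchronize; mtime and size are read from disk on creation."""
    path: str
    mtime: float = field(init=False)
    size: int = field(init=False)

    def __post_init__(self):
        stat = os.stat(self.path)
        self.mtime, self.size = stat.st_mtime, stat.st_size
```

Every call to `Database()` returns the same object, so all threads share one lock and one view of the pending transfers.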
4.2.2 Multithreading
- Every time a request is received by the `listener`, a `receiver` sub-thread is opened for data reception.
- When the `receiver` receives data, it opens the `data_dump` thread to write the received data to the file.
- When the program starts, `file_sys` is started to continuously detect file changes.
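The thread structure above can be sketched with the standard `socket`, `threading`, and `queue` modules. The function names mirror the thread names in this report, but their signatures, the queue hand-off between `receiver` and `data_dump`, and the `path_for` callback are assumptions made for illustration; message parsing is left out to keep the focus on the threads.

```python
import queue
import socket
import threading


def data_dump(chunks: queue.Queue, path: str) -> None:
    """Write queued chunks to disk until the None sentinel arrives."""
    with open(path, "ab") as f:
        while True:
            chunk = chunks.get()
            if chunk is None:
                break
            f.write(chunk)


def receiver(conn: socket.socket, path: str) -> None:
    """Read from one connection and hand the data to a data_dump thread."""
    chunks = queue.Queue()
    writer = threading.Thread(target=data_dump, args=(chunks, path))
    writer.start()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:  # the peer closed the connection
                break
            chunks.put(data)
    chunks.put(None)  # tell the writer that no more data will come
    writer.join()


def listener(server: socket.socket, path_for) -> None:
    """Accept connections forever, starting one receiver thread for each."""
    while True:
        conn, addr = server.accept()
        threading.Thread(
            target=receiver, args=(conn, path_for(addr)), daemon=True
        ).start()
```

Separating network reads from disk writes lets a slow disk fall behind temporarily without stalling `recv`, at the cost of buffering the queued chunks in memory.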
4.2.3 Modular
- Functions related to `IO` operations are placed in `asysio.py`
- Code related to the file system is placed in `asysfs.py`
- Code about `socket send` is in `asystp.py`
- Code about processing requests and sending information is in `aserver.py`
- Stored data is kept in `db.json`
- The system configuration is in `config.json`
- The code for the operation of the entire system is in `asys.py`
- `main.py` is used to schedule multiple threads and the order of functions
5. Testing and results
5.1 Testing environment
Tiny Core Linux running in VirtualBox on Windows 10.
Linux system version:
tc@box:~$ cat /proc/version
Linux version 5.4.3-tinycore (tc@box) (gcc version 9.2.0 (GCC)) #2020 SMP Tue Dec 17 17:00:50 UTC 2019
5.2 Testing plan
5.2.1 Pre test
- Use `Virtual box` to open each of the 3 `tinycore linux` instances, and use `ifconfig` to get the ip address of each instance, giving `ip_VM1,ip_VM2`.
- Type `python3.6 main.py --ip ip_VM1,ip_VM2` on each of the 3 instances to start the program.
- Name the three instances `VM_1`, `VM_2`, `VM_3`.
- The files are all added using softlinks: `ln -s [source position] [destination position]`.
- Generate 2 test files, `file_original` and `file_modified`. The first 0.1% of the two files differs, and the last 99.9% is identical. (`file_original` consists entirely of "hello world", and `file_modified` has 10000 lines of "modifiedinfo" at the front of the file.)
5.3 Testing results
Order | Testing Operation | Result | Check |
---|---|---|---|
1 | Put a 20M small file into `VM_1` | `VM_2`, `VM_3` received | MD5 is correct |
2 | Put a small file (size 50M) with 50 random contents into `VM_1`, close `VM_2` after 2 seconds, restart `VM_2` after 2 seconds | `VM_2`, `VM_3` received | MD5 is correct |
3 | Put a 1000M large file into `VM_2` | `VM_1`, `VM_3` received | MD5 is correct |
4 | Put `file_original` into `./share` of `VM_2` and name it `test` | `VM_1`, `VM_3` received | MD5 is correct |
5 | Replace `file_original` with `file_modified` | `VM_1`, `VM_3` received | MD5 is correct |
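The MD5 check in the last column can be reproduced with a few lines of Python. This is a generic sketch using `hashlib`, not necessarily the tool used during testing; the helper name `md5_of_file` is hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running it on the source file and on the received copy should give the same hex string when the transfer is complete and error-free.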
6. Conclusion
In this coursework, I designed the application layer protocol for file synchronization, designed the system for file synchronization, and implemented it in `python`. This system can transfer files between different devices with encryption, compression, breakpoint transfer, and retransmission of modified files, and it passed all of my tests.
6.1 Future plan
- In terms of encryption, the file header should be included in the encryption.
- When receiving a `package`, the system should check whether it was generated by this system, to ensure security.
- Maintain a list for each file that records the hash of each file block. If a block's current hash differs from the recorded hash, that block has changed, and only that block needs to be retransmitted.
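The block-hash idea in the last item could be sketched as follows. This is not implemented in the project; the fixed 4 MiB block size, the choice of SHA-256, and both function names are assumptions.

```python
import hashlib


def block_hashes(path: str, block_size: int = 4 * 1024 * 1024) -> list:
    """Return one SHA-256 hex digest per fixed-size block of the file."""
    hashes = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes


def changed_blocks(recorded: list, current: list) -> list:
    """Indices of blocks whose hash differs or that are new in `current`."""
    return [
        i for i, digest in enumerate(current)
        if i >= len(recorded) or recorded[i] != digest
    ]
```

Note that fixed-size blocks only detect in-place changes cheaply: inserting bytes near the start of a file shifts every later block, so all of them would be reported as changed.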
7. References
[1] J. Christoffel, "Bal - A Tool to Synchronize Document Collections Between Computers," in LISA, Oct. 1997, pp. 85-88.
[2] S. Agarwal, D. Starobinski, and A. Trachtenberg, "On the scalability of data synchronization protocols for PDAs and mobile devices," IEEE network, vol. 16, no. 4, pp. 22-28, 2002.
[3] J. Hughes, B. Pierce, T. Arts and U. Norell, "Mysteries of DropBox: Property-Based Testing of a Distributed Synchronization Service," in 2016 IEEE International Conference on Software Testing, Verification and Validation (ICST), Chicago, IL, USA, 2016 pp. 135-145.