Add sanitization and validation for files #3
base: main
5903a04
dac2b73
dc84075
5e3e2d2
f485401
```diff
@@ -1,13 +1,46 @@
+use std::str::FromStr;
+
 use crate::{
     common::timestamp,
     traits::{HasPath, TimestampId, Validatable},
     APP_PATH,
 };
+use mime::Mime;
 use serde::{Deserialize, Serialize};

+use url::Url;
 #[cfg(feature = "openapi")]
 use utoipa::ToSchema;

+const MIN_NAME_LENGTH: usize = 1;
+const MAX_NAME_LENGTH: usize = 255;
+const MAX_SRC_LENGTH: usize = 1024;
+const MAX_SIZE: i64 = 10 * (1 << 20); // 10 MB
+
+const VALID_MIME_TYPES: &[&str] = &[
+    "application/javascript",
+    "application/json",
+    "application/octet-stream",
+    "application/pdf",
+    "application/x-www-form-urlencoded",
+    "application/xml",
+    "application/zip",
+    "audio/mpeg",
+    "audio/wav",
+    "image/gif",
+    "image/jpeg",
+    "image/png",
+    "image/svg+xml",
+    "image/webp",
+    "multipart/form-data",
+    "text/css",
+    "text/html",
+    "text/plain",
+    "text/xml",
+    "video/mp4",
+    "video/mpeg",
+];
+
 /// Represents a file uploaded by the user.
 /// URI: /pub/pubky.app/files/:file_id
 #[derive(Deserialize, Serialize, Debug, Default, Clone)]
```
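`validate` accepts a content type only if it parses and its essence (the `type/subtype` part, without parameters such as `charset`) appears in `VALID_MIME_TYPES`. The sketch below imitates that rule without the `mime` crate so it runs on its own. `essence` and `is_allowed` are hypothetical helpers, the parser is far looser than `Mime::from_str`, and `ALLOWED` is only a subset of the list above. It lowercases the essence because MIME type and subtype names are case-insensitive.

```rust
/// Subset of the PR's VALID_MIME_TYPES, for illustration only.
const ALLOWED: &[&str] = &["application/json", "image/png", "image/jpeg", "text/plain"];

/// Rough stand-in for `Mime::from_str(..).essence_str()`: drop parameters
/// after ';', trim, require "type/subtype", and lowercase the result.
fn essence(content_type: &str) -> Option<String> {
    // `split` always yields at least one piece, so `next()` is Some here.
    let base = content_type.split(';').next()?.trim();
    let (ty, subtype) = base.split_once('/')?;
    if ty.is_empty() || subtype.is_empty() || subtype.contains('/') {
        return None;
    }
    Some(base.to_ascii_lowercase())
}

/// Accept only parseable content types whose essence is allowlisted.
fn is_allowed(content_type: &str) -> bool {
    essence(content_type).map_or(false, |e| ALLOWED.contains(&e.as_str()))
}
```

With this shape, `"text/plain; charset=utf-8"` is accepted because only its essence is compared, while the tests' `"notavalid/content_type"` parses but is rejected by the allowlist.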
```diff
@@ -31,6 +64,7 @@ impl PubkyAppFile {
             content_type,
             size,
         }
+        .sanitize()
     }
 }

```
```diff
@@ -43,11 +77,66 @@ impl HasPath for PubkyAppFile {
 }

 impl Validatable for PubkyAppFile {
-    // TODO: content_type validation.
+    fn sanitize(self) -> Self {
+        let name = self.name.trim().chars().take(MAX_NAME_LENGTH).collect();
+
+        let sanitized_src = self
+            .src
+            .trim()
+            .chars()
+            .take(MAX_SRC_LENGTH)
+            .collect::<String>();
+
+        let src = match Url::parse(&sanitized_src) {
+            Ok(_) => Some(sanitized_src),
+            Err(_) => None, // Invalid src URL, set to None
+        };
+
+        let content_type = self.content_type.trim().to_string();
+
+        Self {
+            name,
+            created_at: self.created_at,
+            src: src.unwrap_or("".to_string()),
+            content_type,
+            size: self.size,
+        }
+    }
+
     fn validate(&self, id: &str) -> Result<(), String> {
         self.validate_id(id)?;
-        // TODO: content_type validation.
-        // TODO: size and other validation.
+
+        // Validate name
+        let name_length = self.name.chars().count();
+
+        if !(MIN_NAME_LENGTH..=MAX_NAME_LENGTH).contains(&name_length) {
+            return Err("Validation Error: Invalid name length".into());
+        }
+
+        // Validate src
+        if self.src.chars().count() == 0 {
+            return Err("Validation Error: Invalid src".into());
+        }
+        if self.src.chars().count() > MAX_SRC_LENGTH {
+            return Err("Validation Error: src exceeds maximum length".into());
+        }
+
+        // validate content type
+        match Mime::from_str(&self.content_type) {
+            Ok(mime) => {
+                if !VALID_MIME_TYPES.contains(&mime.essence_str()) {
```
Declared mime type could be

Yes. The user doesn't really have an incentive to lie about this.

Can you elaborate?

I'm saying that, for normal users, there's no need to lie about the content type of their files.
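The question in this thread is whether a declared `content_type` can be trusted: `validate` only checks the declared string and never looks at the bytes. If a server holding the blob wanted to cross-check, one option is to compare the declared type with the content's leading signature bytes. A minimal sketch, assuming only four well-known signatures (PNG, JPEG, GIF, PDF) and letting unrecognized content through; `sniff` and `declared_type_matches` are hypothetical names, not part of the PR.

```rust
/// Guess a MIME type from a few well-known leading signatures ("magic bytes").
/// Returns None for anything this short list does not recognize.
fn sniff(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("image/png")
    } else if bytes.starts_with(b"\xFF\xD8\xFF") {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    } else if bytes.starts_with(b"%PDF-") {
        Some("application/pdf")
    } else {
        None
    }
}

/// Hypothetical cross-check: fail only when the content is recognized and
/// disagrees with the declared type (parameters after ';' are ignored).
fn declared_type_matches(declared: &str, bytes: &[u8]) -> bool {
    let declared = declared.split(';').next().unwrap_or("").trim();
    match sniff(bytes) {
        Some(actual) => declared.eq_ignore_ascii_case(actual),
        None => true,
    }
}
```

Letting unknown content pass keeps the check from rejecting legitimate types the list does not cover; a stricter policy would need a much larger signature table.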
```diff
+                    return Err("Validation Error: Invalid content type".into());
+                }
+            }
+            Err(_) => {
+                return Err("Validation Error: Invalid content type".into());
+            }
+        }
+
+        // Validate size
+        if self.size <= 0 || self.size > MAX_SIZE {
```
Are we taking the client declared

I don't think the user has any incentive to lie about the size of the file.

Is this something we can write into this specs crate? Should be as part of the
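As written, `size` is whatever the client declares; `validate` only bounds it to `1..=MAX_SIZE`, where `10 * (1 << 20)` is 10,485,760 bytes (10 MiB). A sketch of an additional, hypothetical check that compares the declared size with the length of the blob it describes, assuming the validator has the blob bytes at hand (the PR's `validate` does not). `validate_declared_size` is an illustrative name.

```rust
/// Mirrors the PR's bound: 10 * 2^20 bytes.
const MAX_SIZE: i64 = 10 * (1 << 20);

/// Hypothetical cross-check between the `size` a client declares on the file
/// record and the length of the blob it points to.
fn validate_declared_size(declared: i64, blob: &[u8]) -> Result<(), String> {
    // Same range check as the PR's validate(): strictly positive, at most MAX_SIZE.
    if declared <= 0 || declared > MAX_SIZE {
        return Err("Validation Error: Invalid size".into());
    }
    // Extra check (not in the PR): the declared size must equal the real length.
    // `declared` is positive here, so the cast to u64 is lossless.
    if declared as u64 != blob.len() as u64 {
        return Err(format!(
            "Validation Error: declared size {} does not match blob length {}",
            declared,
            blob.len()
        ));
    }
    Ok(())
}
```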
```diff
+            return Err("Validation Error: Invalid size".into());
+        }
         Ok(())
     }
 }
```
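`new` now passes every record through `sanitize`, which never fails: it trims, truncates, and replaces an unparseable `src` with an empty string. `validate` then rejects what sanitizing cannot fix, such as an all-whitespace name (empty after trimming) or that empty `src`. The name steps, extracted into standalone hypothetical functions `sanitize_name` and `validate_name`, behave as follows. Truncation counts `char`s (Unicode scalar values), not bytes, so a 255-character name can take up to 1,020 bytes in UTF-8.

```rust
const MIN_NAME_LENGTH: usize = 1;
const MAX_NAME_LENGTH: usize = 255;

/// Same steps as the PR's sanitize() for `name`: trim surrounding whitespace,
/// then keep at most MAX_NAME_LENGTH chars (Unicode scalar values, not bytes).
fn sanitize_name(name: &str) -> String {
    name.trim().chars().take(MAX_NAME_LENGTH).collect()
}

/// Same check as the PR's validate() for `name`: char count must be in 1..=255.
fn validate_name(name: &str) -> Result<(), String> {
    let len = name.chars().count();
    if !(MIN_NAME_LENGTH..=MAX_NAME_LENGTH).contains(&len) {
        return Err("Validation Error: Invalid name length".into());
    }
    Ok(())
}
```

An over-long name is silently shortened and then passes, while a whitespace-only name survives sanitizing as an empty string and fails validation.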
```diff
@@ -97,7 +186,7 @@ mod tests {
     fn test_validate_valid() {
         let file = PubkyAppFile::new(
             "example.png".to_string(),
-            "/uploads/example.png".to_string(),
+            "pubky://user_id/pub/pubky.app/blobs/id".to_string(),
             "image/png".to_string(),
             1024,
         );
@@ -110,7 +199,7 @@ mod tests {
     fn test_validate_invalid_id() {
         let file = PubkyAppFile::new(
             "example.png".to_string(),
-            "/uploads/example.png".to_string(),
+            "pubky://user_id/pub/pubky.app/blobs/id".to_string(),
             "image/png".to_string(),
             1024,
         );
@@ -119,21 +208,60 @@ mod tests {
         assert!(result.is_err());
     }

+    #[test]
+    fn test_validate_invalid_content_type() {
+        let file = PubkyAppFile::new(
+            "example.png".to_string(),
+            "pubky://user_id/pub/pubky.app/blobs/id".to_string(),
+            "notavalid/content_type".to_string(),
+            1024,
+        );
+        let id = file.create_id();
+        let result = file.validate(&id);
+        assert!(result.is_err());
+    }
+
+    #[test]
+    fn test_validate_invalid_size() {
+        let file = PubkyAppFile::new(
+            "example.png".to_string(),
+            "pubky://user_id/pub/pubky.app/blobs/id".to_string(),
+            "notavalid/content_type".to_string(),
+            MAX_SIZE + 1,
+        );
+        let id = file.create_id();
+        let result = file.validate(&id);
+        assert!(result.is_err());
+    }
+
+    #[test]
+    fn test_validate_invalid_src() {
+        let file = PubkyAppFile::new(
+            "example.png".to_string(),
+            "not_a_url".to_string(),
+            "notavalid/content_type".to_string(),
+            MAX_SIZE + 1,
+        );
+        let id = file.create_id();
+        let result = file.validate(&id);
+        assert!(result.is_err());
+    }
+
     #[test]
     fn test_try_from_valid() {
         let file_json = r#"
         {
             "name": "example.png",
             "created_at": 1627849723,
-            "src": "/uploads/example.png",
+            "src": "pubky://user_id/pub/pubky.app/blobs/id",
             "content_type": "image/png",
             "size": 1024
         }
         "#;

         let file = PubkyAppFile::new(
             "example.png".to_string(),
-            "/uploads/example.png".to_string(),
+            "pubky://user_id/pub/pubky.app/blobs/id".to_string(),
             "image/png".to_string(),
             1024,
         );
@@ -143,7 +271,7 @@ mod tests {
         let file_parsed = <PubkyAppFile as Validatable>::try_from(&blob, &id).unwrap();

         assert_eq!(file_parsed.name, "example.png");
-        assert_eq!(file_parsed.src, "/uploads/example.png");
+        assert_eq!(file_parsed.src, "pubky://user_id/pub/pubky.app/blobs/id");
         assert_eq!(file_parsed.content_type, "image/png");
         assert_eq!(file_parsed.size, 1024);
     }
```
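The new negative tests change more than one field and only assert `is_err()`. Since `validate` checks name, then src, then content type, then size, `test_validate_invalid_size` returns at the content-type check (it also uses `"notavalid/content_type"`) and never reaches the size check; it would still pass if that check were removed. A sketch of the alternative, against a hypothetical standalone `validate` that mirrors the PR's check order and error messages, with a crude `scheme://rest` test standing in for `Url::parse` and an exact allowlist match standing in for MIME parsing: start from one valid record, change exactly one field per case, and assert on the specific error.

```rust
const MAX_SIZE: i64 = 10 * (1 << 20);
/// Subset of the PR's allowlist, matched exactly (no MIME parsing here).
const ALLOWED: &[&str] = &["image/png", "image/jpeg", "text/plain"];

/// Hypothetical standalone record mirroring PubkyAppFile's validated fields.
#[derive(Debug)]
struct FileFields {
    name: String,
    src: String,
    content_type: String,
    size: i64,
}

/// Same order and messages as the PR's validate(): name, src, content type, size.
fn validate(f: &FileFields) -> Result<(), String> {
    if !(1..=255_usize).contains(&f.name.chars().count()) {
        return Err("Validation Error: Invalid name length".into());
    }
    // Crude stand-in for `Url::parse`: require "scheme://rest" with both parts non-empty.
    let src_ok = matches!(
        f.src.split_once("://"),
        Some((scheme, rest)) if !scheme.is_empty() && !rest.is_empty()
    );
    if !src_ok {
        return Err("Validation Error: Invalid src".into());
    }
    if !ALLOWED.contains(&f.content_type.as_str()) {
        return Err("Validation Error: Invalid content type".into());
    }
    if f.size <= 0 || f.size > MAX_SIZE {
        return Err("Validation Error: Invalid size".into());
    }
    Ok(())
}

/// A record that passes every check; each negative case changes exactly one field.
fn valid_file() -> FileFields {
    FileFields {
        name: "example.png".into(),
        src: "pubky://user_id/pub/pubky.app/blobs/id".into(),
        content_type: "image/png".into(),
        size: 1024,
    }
}

/// Apply a single change to an otherwise valid record and validate the result.
fn validate_with(change: impl FnOnce(&mut FileFields)) -> Result<(), String> {
    let mut file = valid_file();
    change(&mut file);
    validate(&file)
}
```

Asserting on the message, for example `validate_with(|f| f.size = MAX_SIZE + 1) == Err("Validation Error: Invalid size".into())`, pins each test to the rule it is named after.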
```diff
@@ -0,0 +1,55 @@
+use crate::{
+    traits::{HasPath, HashId},
+    APP_PATH,
+};
+
+use serde::{Deserialize, Serialize};
+
+#[cfg(feature = "openapi")]
+use utoipa::ToSchema;
+
+const SAMPLE_SIZE: usize = 2 * 1024;
+
+/// Represents a file uploaded by the user.
+/// URI: /pub/pubky.app/files/:file_id
+#[derive(Deserialize, Serialize, Debug, Default, Clone)]
+#[cfg_attr(feature = "openapi", derive(ToSchema))]
+pub struct PubkyAppBlob(pub Vec<u8>);
+
+impl HashId for PubkyAppBlob {
+    fn get_id_data(&self) -> String {
+        // Get the start and end samples
+        let start = &self.0[..SAMPLE_SIZE.min(self.0.len())];
+        let end = if self.0.len() > SAMPLE_SIZE {
+            &self.0[self.0.len() - SAMPLE_SIZE..]
+        } else {
+            &[]
+        };
+
+        // Combine the samples
+        let mut combined = Vec::with_capacity(start.len() + end.len());
+        combined.extend_from_slice(start);
+        combined.extend_from_slice(end);
+
+        base32::encode(base32::Alphabet::Crockford, &combined)
+    }
+}
+
+impl HasPath for PubkyAppBlob {
+    fn create_path(&self) -> String {
+        format!("{}blobs/{}", APP_PATH, self.create_id())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::traits::HashId;
+
+    #[test]
+    fn test_get_id_data_size_is_smaller_than_sample() {
+        let blob = PubkyAppBlob(vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+        let id = blob.get_id_data();
+        assert_eq!(id, "041061050R3GG28A");
+    }
+}
```
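`get_id_data` does not hash the blob: it base32-encodes (Crockford alphabet) the first 2 KiB, plus the last 2 KiB when the blob is longer than that. What `create_id` then does with this string is not part of the diff. Two consequences can be checked with a standalone copy: bytes outside the two windows never affect the result, and every blob larger than 2 KiB yields exactly 4,096 sampled bytes, i.e. a 6,554-character string. The encoder is a hand-written stand-in for `base32::encode(base32::Alphabet::Crockford, ..)`, assumed unpadded; it reproduces the PR's expected `"041061050R3GG28A"` for bytes 1 to 10.

```rust
const SAMPLE_SIZE: usize = 2 * 1024;
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Unpadded Crockford base32: emit 5 bits per symbol, zero-filling the last
/// partial group on the right.
fn crockford_base32(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // only the low `bits` bits are meaningful
    let mut bits: u32 = 0;
    for &byte in data {
        buffer = (buffer << 8) | byte as u32;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(CROCKFORD[((buffer >> bits) & 0x1F) as usize] as char);
        }
    }
    if bits > 0 {
        out.push(CROCKFORD[((buffer << (5 - bits)) & 0x1F) as usize] as char);
    }
    out
}

/// Same sampling as the PR's get_id_data(): first SAMPLE_SIZE bytes, plus the
/// last SAMPLE_SIZE bytes when the blob is longer than SAMPLE_SIZE.
fn id_data(blob: &[u8]) -> String {
    let start = &blob[..SAMPLE_SIZE.min(blob.len())];
    let end: &[u8] = if blob.len() > SAMPLE_SIZE {
        &blob[blob.len() - SAMPLE_SIZE..]
    } else {
        &[]
    };
    let mut combined = Vec::with_capacity(start.len() + end.len());
    combined.extend_from_slice(start);
    combined.extend_from_slice(end);
    crockford_base32(&combined)
}
```

For example, two 10,000-byte blobs that differ only at offset 5,000 produce identical ID data.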
```diff
@@ -2,6 +2,7 @@ mod bookmark;
 mod common;
 mod feed;
 mod file;
+mod file_blob;
 mod follow;
 mod last_read;
 mod mute;
```
I don't understand why we cannot use `hash_id`. We already have a `created_at` timestamp, so the timestamp ID duplicates that information. A hash ID is a straightforward way for the user to post the same `.gif` three times without creating the file three times on their homeserver or in the indexer. Alternatively, we have the `name` field, which could serve as the file and blob name instead.

Why the emphasis on `hash_id`? I understand its use case for something like tags, but we don't need that here.

The use case you're mentioning is not solved by having the same `hash_id`. With the current approach, a sane user can create one blob and one file and then use that one file anywhere they want. Should we take away the ability to create multiple blobs and multiple files? I don't think so. It's important to remember that we're a proxy here and people can do whatever they want with their homeservers, so while we enable use cases, we shouldn't look for ways to prohibit them, however fringe they might seem to us, when there's no problem with having them.

Sorry, I'm not necessarily saying `hash_id` should go only on the `/file` object; I'm talking about the obvious big benefits of `hash_id` for the data blob. Are the `/blobs` and the way Nexus uses them covered by the specs?

There is no use for the timestamp IDs.

Consider what happens when, in pubky-app, I am a shitposter and I reply to 10 users by dropping the same animated gif into the post editor modal to make fun of them.

This is barely an argument, as the specs are created with the explicit intention of restricting the way data is written into homeservers to a common set of rules that allows interoperability between social pubky-app clients and indexers. A user can write whatever data he wants, in whatever spec-breaking way he wants, and simply share a URI.

I think free storage savings for anyone using pubky-app according to the specs is really good for such a small change (just `hash_id` for blobs). It might seem like a stupid optimization this early, but changing IDs and schemas post-launch will be harder.

For blob IDs, it does make sense to have hash IDs.
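The storage saving argued for above comes from content addressing: identical bytes always map to the same path, so re-uploading the same `.gif` is a no-op, while any number of file records can still point at that one blob. A toy sketch, assuming `APP_PATH` is `/pub/pubky.app/` (inferred from the doc comment's URI) and using 64-bit FNV-1a purely as a dependency-free stand-in; a real spec would want a collision-resistant hash such as SHA-256 or BLAKE3. `blob_path` and `BlobStore` are hypothetical.

```rust
use std::collections::HashMap;

/// Assumed value, inferred from the doc comment URI in the diff.
const APP_PATH: &str = "/pub/pubky.app/";

/// 64-bit FNV-1a over the whole blob. Non-cryptographic stand-in only.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Content-addressed path: the same bytes always give the same path.
fn blob_path(data: &[u8]) -> String {
    format!("{}blobs/{:016x}", APP_PATH, fnv1a_64(data))
}

/// Toy homeserver store keyed by path.
#[derive(Default)]
struct BlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore {
    /// Returns the blob's path and whether a new entry was written.
    fn put(&mut self, data: &[u8]) -> (String, bool) {
        let path = blob_path(data);
        if self.blobs.contains_key(&path) {
            return (path, false); // identical content already stored
        }
        self.blobs.insert(path.clone(), data.to_vec());
        (path, true)
    }
}
```

With timestamp IDs, each of the ten replies in the example above would write a new blob; here the second and later writes return the existing path.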