1. Validation
Draft: 1
Introduction
Validating input data is one of the most important functions of any system.
Most systems follow this pattern:
- get some input values (optional)
- do processing
- return some response (optional)
For example, when a user sends a GET request to /api/v1/users, you may not need to validate any input parameters, unless the endpoint is secured and you need to validate an API key. But if the user passes optional pagination parameters, as in /api/v1/users?page=5&limit=100, you may want to check that limit is in a range that makes sense in terms of performance.
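A minimal, std-only sketch of such a pagination check could look like the following. The names Pagination, validate_pagination, MAX_LIMIT and the default values are assumptions made for illustration; they are not part of any framework.

```rust
/// Upper bound for `limit`. The value 100 is an assumption; tune it for your data.
const MAX_LIMIT: u32 = 100;

/// Pagination parameters that have already been checked.
#[derive(Debug, PartialEq)]
pub struct Pagination {
    pub page: u32,
    pub limit: u32,
}

/// Checks the optional `page` and `limit` query parameters and fills in defaults.
pub fn validate_pagination(page: Option<u32>, limit: Option<u32>) -> Result<Pagination, String> {
    // Assumed defaults for requests that omit the parameters.
    let page = page.unwrap_or(1);
    let limit = limit.unwrap_or(20);

    if page == 0 {
        return Err("page must be at least 1".to_string());
    }
    if limit == 0 || limit > MAX_LIMIT {
        return Err(format!("limit must be between 1 and {}", MAX_LIMIT));
    }
    Ok(Pagination { page, limit })
}
```

With these assumed bounds, page=5&limit=100 passes, while limit=101 or page=0 is rejected before any query is executed.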
When a user sends a GET request to /api/v1/users/:user_id you may need to check if user_id is a proper UUID.
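To make the UUID check concrete, here is a rough std-only sketch that only verifies the textual 8-4-4-4-12 hexadecimal layout. The function name looks_like_uuid is hypothetical, and the sketch ignores version and variant bits; in a real project you would more likely rely on a dedicated crate such as uuid.

```rust
/// Returns true when `s` has the canonical 8-4-4-4-12 hexadecimal UUID layout.
/// This checks the shape only, not the UUID version or variant.
pub fn looks_like_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let expected = [8, 4, 4, 4, 12];

    groups.len() == expected.len()
        && groups
            .iter()
            .zip(expected)
            // Every group must have the expected length and contain only hex digits.
            .all(|(group, len)| group.len() == len && group.chars().all(|c| c.is_ascii_hexdigit()))
}
```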
When a user sends a POST request to /api/v1/users with the following payload:
{
  "email": "string",
  "password": "string",
  "repeat_password": "string"
}
you should check at least:
- if the email is a proper-looking email address
- if the password is not empty
- if repeat_password content matches the content of the password
(This is a short version for the sake of simplicity. The real-life scenario is more complicated.)
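The three checks listed above can be sketched by hand, without any library, roughly as follows. The Registration struct, the function names, and the very loose email rule are assumptions for illustration only; later sections use the validator crate for the same job.

```rust
/// Hypothetical mirror of the registration payload.
pub struct Registration {
    pub email: String,
    pub password: String,
    pub repeat_password: String,
}

/// Very rough shape check: exactly one '@', a non-empty local part,
/// and a domain made of at least two non-empty labels.
fn looks_like_email(s: &str) -> bool {
    match s.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && !domain.contains('@')
                && domain.split('.').count() >= 2
                && domain.split('.').all(|label| !label.is_empty())
        }
        None => false,
    }
}

/// Collects every problem instead of stopping at the first one.
pub fn validate_registration(input: &Registration) -> Vec<&'static str> {
    let mut errors = Vec::new();
    if !looks_like_email(&input.email) {
        errors.push("email is not a proper-looking address");
    }
    if input.password.is_empty() {
        errors.push("password must not be empty");
    }
    if input.repeat_password != input.password {
        errors.push("repeat_password must match password");
    }
    errors
}
```

Returning all problems at once, rather than only the first, lets a client show every failing field in a single round trip.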
You need to understand that input data does not come only from the user. Another automated system may want to communicate with your system. You may ask an LLM to process some data and return a response as structured JSON, which you will then want to validate. You may also take input data from third-party systems or databases.
Another important source of input is malicious actors who try to bypass the security of your system by submitting data that causes side effects for which your system is not prepared.
For web applications, you have 3 layers where you should validate data:
- client application
- server application
- database
Validation of data on each of these layers is important. You should never neglect any of these layers.
Client-side validation is usually easy for malicious actors to bypass. However, proper validation in the client application makes the whole system easier to use.
Server-side validation is the first real line of defense for your application. On the server side, it's much easier to validate data than on the database side.
The database side is the last line of defense against manipulated data for your application. Database constraints let you enforce uniqueness and valid foreign key values. The database will also return an error if you send a string that is too long or data of the wrong type.
In this chapter, we will focus on server-side validation.
String validation
We will create a simple web application that will demonstrate multiple examples of how to handle data validation.
Cargo.toml
[package]
name = "validation"
version = "0.1.0"
edition = "2024"
[dependencies]
axum = "0.8"
serde = { version = "1.0", features = ["derive"] }
tokio = { version = "1.44", features = ["full"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
validator = { version = "0.20", features = ["derive"] }
src/main.rs
use axum::{
    Json, Router,
    http::StatusCode,
    response::{IntoResponse, Response},
    routing::post,
};
use serde::{Deserialize, Serialize};
use tokio::net::TcpListener;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
use validator::{Validate, ValidationErrors};

#[derive(Debug, Deserialize, Serialize, Validate)]
pub struct StringPost {
    #[validate(length(
        max = 256,
        min = 2,
        message = "It cannot be empty. Minimal length = 2, maximal length = 256."
    ))]
    string: String,
}

async fn string_validation(Json(input): Json<StringPost>) -> Result<impl IntoResponse, AppError> {
    input.validate()?;
    Ok((StatusCode::OK, Json(input)).into_response())
}

#[tokio::main]
async fn main() {
    tracing_subscriber::registry()
        .with(
            tracing_subscriber::EnvFilter::try_from_default_env()
                .unwrap_or_else(|_| format!("{}=debug", env!("CARGO_CRATE_NAME")).into()),
        )
        .with(tracing_subscriber::fmt::layer())
        .init();
    let app = app();
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    tracing::debug!("listening on {}", listener.local_addr().unwrap());
    axum::serve(listener, app).await.unwrap();
}

fn app() -> Router {
    Router::new().route("/string-validation", post(string_validation))
}

#[derive(Debug)]
pub enum AppError {
    Validation(ValidationErrors),
}

impl IntoResponse for AppError {
    fn into_response(self) -> Response {
        let (status, error_message) = match self {
            AppError::Validation(error) => {
                let error_message = format!("Validation problem: {:?}", error);
                (StatusCode::BAD_REQUEST, error_message)
            },
        };
        let body = Json(ResponseError {
            error: error_message,
        });
        (status, body).into_response()
    }
}

impl From<ValidationErrors> for AppError {
    fn from(inner: ValidationErrors) -> Self {
        AppError::Validation(inner)
    }
}

#[derive(Debug, Serialize)]
pub struct ResponseError {
    pub error: String,
}
This is a heavily simplified example of a web application. Don't take any advice from it regarding structure or error handling. I just wanted to demonstrate how to use the validator crate.
Let's see how it works. First, we send a valid request.
curl --location 'http://localhost:8080/string-validation' --header 'Content-Type: application/json' --data '{
"string": "test"
}
'
{"string":"test"}
The response echoes the data we sent, and we receive a 200 response code.
Let's try to send invalid data.
curl --location 'http://localhost:8080/string-validation' --header 'Content-Type: application/json' --data '{
"string": "t"
}
'
{"error":"Validation problem: ValidationErrors({\"string\": Field([ValidationError { code: \"length\", message: Some(\"It cannot be empty. Minimal length = 2, maximal length = 256.\"), params: {\"value\": String(\"t\"), \"min\": Number(2), \"max\": Number(256)} }])})"}
The validator returns an error, and we receive a 400 response code.
OK, this error doesn't look nice. It should be handled better.
But let's get back to what is important: validation. Why would we want to check that a string has a certain number of characters? There are at least two good reasons:
- we know that certain values should not take more than X chars
- we have a VARCHAR type field in a database to store a certain value
If we know that a string should have a certain format and should not exceed a certain length, validating the length is just the first step. We can also use a regular expression to validate its content. For example, postal codes in Poland have the format xx-xxx, where each x is a digit. Knowing that, we may want to use a VARCHAR(6) field in the database to store them, and combine length and regex validation to make sure that we store only postal codes in a valid format.
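As a rough illustration of the postal code rule, the sketch below checks the length and the xx-xxx pattern by hand, using only the standard library instead of a regular expression. The function name is_polish_postal_code is hypothetical. With a regex engine, the equivalent pattern would be along the lines of ^[0-9]{2}-[0-9]{3}$.

```rust
/// Returns true for strings shaped like `dd-ddd`, for example "00-950".
/// Hand-written stand-in for a length check plus a regex check.
pub fn is_polish_postal_code(s: &str) -> bool {
    let bytes = s.as_bytes();

    // Exactly six bytes, a dash in the middle, ASCII digits everywhere else.
    bytes.len() == 6
        && bytes[2] == b'-'
        && bytes
            .iter()
            .enumerate()
            .all(|(i, b)| i == 2 || b.is_ascii_digit())
}
```

Only ASCII digits are accepted here, so a value that passes this check always fits into a six-character column.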
If we store values in a VARCHAR field, we want to validate their length before we send a request to the database, because data that exceeds the field length will be rejected with an error. The database guards data consistency: if we try to store the wrong data, we get an error. Handling database errors in a server application has a performance cost.
Because of that cost, malicious actors may try to use such a situation to cause a Denial of Service in our application. For example, if an endpoint is for some reason not secured very well, an attacker can send an overly long payload that triggers an error on the database side and an internal server error on our application side. Repeated at scale, this could be exploited in a Denial of Service attack.
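The length check described above can be sketched as a small guard that runs before any query is built. MAX_CHARS and check_fits_column are hypothetical names, and the sketch assumes the database counts VARCHAR length in characters rather than bytes, which you should verify for your database and column encoding.

```rust
/// Hypothetical column width, standing in for a VARCHAR(256) field.
const MAX_CHARS: usize = 256;

/// Rejects values that would not fit into the column before any query is built.
/// Assumes the database measures VARCHAR length in characters, not bytes.
pub fn check_fits_column(value: &str) -> Result<(), String> {
    // `chars().count()` counts Unicode scalar values; `len()` would count bytes.
    let chars = value.chars().count();
    if chars > MAX_CHARS {
        return Err(format!(
            "value has {} characters, the maximum is {}",
            chars, MAX_CHARS
        ));
    }
    Ok(())
}
```

Rejecting the value at this point costs one pass over the string, which is much cheaper than a round trip to the database that ends in an error.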
Email validation
Let's modify our sample program to check how to handle email validation.
src/main.rs
[..]
#[derive(Debug, Deserialize, Serialize, Validate)]
pub struct EmailPost {
    #[validate(email(message = "You need to provide a valid email address."))]
    email: String,
}

async fn email_validation(Json(input): Json<EmailPost>) -> Result<impl IntoResponse, AppError> {
    input.validate()?;
    Ok((StatusCode::OK, Json(input)).into_response())
}
[..]
fn app() -> Router {
    Router::new()
        .route("/email-validation", post(email_validation))
        .route("/string-validation", post(string_validation))
}
[..]
As with string validation, we first send a valid-looking payload.
curl --location 'http://localhost:8080/email-validation' --header 'Content-Type: application/json' --data '{
"email": "test@example.com"
}
'
{"email":"test@example.com"}
The response echoes the data we sent, and we receive a 200 response code.
Let's try to send invalid data.
curl --location 'http://localhost:8080/email-validation' --header 'Content-Type: application/json' --data '{
"email": "test"
}
'
{"error":"Validation problem: ValidationErrors({\"email\": Field([ValidationError { code: \"email\", message: Some(\"You need to provide a valid email address.\"), params: {\"value\": String(\"test\")} }])})"}
The validator returns an error, and we receive a 400 response code.
This is a simple, regular expression-based email validation. It doesn't check whether the email address exists. The best way to check that is to send a message to the address and ask for confirmation.
You want to validate all email addresses provided by the user, for at least two practical reasons:
- you may want to contact the user, and you need a valid (at least at the time of registration) email
- you may be obligated by the law to verify the identity of a user of your platform, and having a valid email address may be part of the process
You may want to combine two validators to check an email address, for example if you are using a VARCHAR field to store emails in the database.
#[derive(Debug, Deserialize, Serialize, Validate)]
pub struct EmailPost {
    #[validate(
        email(message = "You need to provide a valid email address."),
        length(max = 256, min = 10, message = "It cannot be empty. Minimal length = 10, maximal length = 256.")
    )]
    email: String,
}
It's not good practice to use TEXT fields to store emails in a database.
The first reason is that you should always think about performance. Most email addresses are short, so you don't need to design around the edge case of a user with a really long address. 256 characters for storing an email address ought to be enough for anybody.
The second reason is that, in most cases, you want to save emails in fields with a UNIQUE constraint. A few years ago, MySQL had a limit on the length of a UNIQUE constraint. It may no longer be a problem, but you cannot expect every database to handle UNIQUE constraints correctly on a TEXT field.