Upload and validate your data

Use the validation API endpoints to upload and validate data files, configure
validation rules, and download validation results.

Format a data file for validation

Before validation, your data files must:

  • Be in either Parquet or CSV format.
  • Use no compression, or gzip compression.
  • Have a time series index in one of the following formats:

    Date or datetime format    Example
    YYYY-MM-DD                 2023-01-01
    YYYY-MM-DD HH:MM           2023-01-01 23:59
    YYYY-MM-DD HH:MM:SS        2023-01-01 23:59:00

See the Data format reference for examples.
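
As a local pre-check before uploading, the three index formats listed above can be tested with the standard library. This is a minimal sketch: the function name is hypothetical, and the service's own parsing rules may be stricter or looser than this round-trip comparison.

```python
from datetime import datetime

# The three index formats listed above, written as strptime patterns.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S")


def matches_accepted_format(value: str) -> bool:
    """Return True if value is written exactly in one of the accepted formats."""
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # strptime tolerates unpadded fields such as "2023-1-1", so compare
        # the re-formatted value with the input to require zero padding.
        if parsed.strftime(fmt) == value:
            return True
    return False


print(matches_accepted_format("2023-01-01 23:59"))
print(matches_accepted_format("01/01/2023"))
```

Running the check over the first few index values of a file is usually enough to catch a wrongly formatted export before it is uploaded.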

Set up the environment

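Create a .env file that stores your access token in the SIG_API_KEY
environment variable.

You might also need to set REQUESTS_CA_BUNDLE to the path of a CA certificate
bundle, for example if your network requires custom certificates. The example
below sets both variables.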

SIG_API_KEY="abcdef123.abcdef123.abcdef123..."
REQUESTS_CA_BUNDLE="/path/to/certs.pem"

Load third-party libraries and environment variables:

from dotenv import load_dotenv
from json import dumps
from os import environ
from requests import get, post, put
from time import sleep

load_dotenv()

API_BASE_URL = "https://api.sigtech.com"
SIG_API_KEY = environ.get("SIG_API_KEY")
HEADERS = {
    "Authorization": f"Bearer {SIG_API_KEY}",
}
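
Because environ.get returns None when a variable is missing, an unset token would silently produce the header "Bearer None". A minimal sketch of a fail-fast alternative follows; the helper name build_headers is hypothetical and not part of the API.

```python
def build_headers(env) -> dict:
    """Build the Authorization header, failing early if the token is missing.

    env is any mapping of environment variables, such as os.environ.
    """
    token = env.get("SIG_API_KEY")
    if not token:
        raise RuntimeError("SIG_API_KEY is not set; check your .env file")
    return {"Authorization": f"Bearer {token}"}


# Example with a plain dict standing in for os.environ.
print(build_headers({"SIG_API_KEY": "abcdef123"}))
```

In the script above, HEADERS = build_headers(environ) would replace the dictionary literal.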

Upload the file

First, use PUT /validation/upload?filename={filename} to create a
presigned upload URL:

# Create upload URL
filename = "master.csv"
url = f"{API_BASE_URL}/validation/upload?filename={filename}"
response = put(url=url, headers=HEADERS)
response_content = response.json()
// <Response [200]>
{
    "uploadUrl": "https://s3.eu-west-1.amazonaws.com/...",
    "uploadFormData": {
        "key": "...",
        "AWSAccessKeyId": "...",
        "x-amz-security-token": "...",
        "policy": "...",
        "signature": "..."
    }
}

Then, send a POST request to the upload URL. In your request, include the
form data object and data file contents:

# Upload file
url = response_content["uploadUrl"]
data = response_content["uploadFormData"]
# Read the file as raw bytes so that gzip-compressed files upload unchanged
with open(filename, "rb") as f:
    b = f.read()
files = {"file": b}
response = post(url=url, data=data, files=files)  # Success: <Response [204]>
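
The snippets in this guide note the expected status code in comments (200 for creating the upload URL, 204 for the upload, 202 for starting validation) but do not check it. A minimal sketch of an explicit check is shown below; require_status is a hypothetical helper that works with any object exposing status_code and text, as requests responses do.

```python
from types import SimpleNamespace


def require_status(response, expected: int):
    """Return response unchanged if its status matches, otherwise raise."""
    if response.status_code != expected:
        # Include the start of the body, which often explains the failure.
        raise RuntimeError(
            f"Expected HTTP {expected}, got {response.status_code}: "
            f"{response.text[:200]}"
        )
    return response


# Stand-in object for demonstration; in practice pass a requests response.
fake_upload_response = SimpleNamespace(status_code=204, text="")
require_status(fake_upload_response, 204)
```

For example, require_status(post(url=url, data=data, files=files), 204) would stop the script immediately if the upload was rejected.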

Validate the file

Use POST /validation/run/{filename} to start validating the file:

# Validate file
url = f"{API_BASE_URL}/validation/run/{filename}"
response = post(url=url, headers=HEADERS)  # Success: <Response [202]>

Get the validation results

Poll GET /validation/results/{filename} for the results.

When the validation process has finished, the status field of the
response JSON content is "SUCCEEDED" and the results field is
populated.

# Get validation results
url = f"{API_BASE_URL}/validation/results/{filename}"
status = None
# Stop polling on either terminal status
while status not in ("SUCCEEDED", "FAILED"):
    response = get(url=url, headers=HEADERS)
    response_content = response.json()
    status = response_content["status"]
    if status not in ("SUCCEEDED", "FAILED"):
        sleep(5)  # wait before polling again
results = response_content["results"]
{
    "status": "SUCCEEDED",
    "results": {
        "lastValidatedOn": "2023-07-02T23:59:37.072786",
        "issuesDownloadUriCsv": "https://s3.eu-west-1.amazonaws.com/...",
        "issuesDownloadUriHtml": "https://s3.eu-west-1.amazonaws.com/...",
        "numberOfTotalRows": 6,
        "numberOfErrorRows": 1,
        "errorCountByRule": [
            {
                "name": "AbsoluteChangeRule",
                "description": "In open or close: absolute change > 9999.0",
                "numberOfErrors": 1
            }
        ]
    }
}
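
The polling loop above runs until a terminal status appears, so it never ends if the status stays non-terminal. A minimal sketch of a bounded version follows. The helper name, the 5-second interval, and the 10-minute timeout are assumptions; "SUCCEEDED" and "FAILED" are the statuses named in the loop above.

```python
from time import monotonic, sleep

TERMINAL_STATUSES = ("SUCCEEDED", "FAILED")


def poll_until_done(fetch, interval=5.0, timeout=600.0):
    """Call fetch() until its "status" is terminal or the timeout elapses.

    fetch is any zero-argument callable returning the decoded JSON body.
    """
    deadline = monotonic() + timeout
    while True:
        content = fetch()
        if content.get("status") in TERMINAL_STATUSES:
            return content
        if monotonic() >= deadline:
            raise TimeoutError(f"No terminal status after {timeout} seconds")
        sleep(interval)


# Demonstration with canned responses instead of real HTTP calls.
canned = iter([{"status": "RUNNING"}, {"status": "SUCCEEDED", "results": {}}])
print(poll_until_done(lambda: next(canned), interval=0)["status"])
```

With the names used in this guide, the call would be poll_until_done(lambda: get(url=url, headers=HEADERS).json()). Check the returned status before reading the results field, since it may not be populated after a failure.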

The results field contains download URLs for the detailed results in CSV and
HTML format. To download the detailed results:

# Download detailed results
url = results["issuesDownloadUriCsv"]
results_filename = f"{filename}_results.csv"
# To download the HTML report instead, uncomment:
# url = results["issuesDownloadUriHtml"]
# results_filename = f"{filename}_results.html"
response = get(url=url)
with open(results_filename, "w+b") as f:
    f.write(response.content)
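
For a quick overview without opening the downloaded file, the summary counts in the results field can be printed directly. This sketch assumes the field names shown in the example response earlier in this section; summarize_results is a hypothetical helper.

```python
def summarize_results(results: dict) -> str:
    """Format the row counts and per-rule error counts as plain text."""
    total = results["numberOfTotalRows"]
    errors = results["numberOfErrorRows"]
    lines = [f"{errors} of {total} rows have errors"]
    for rule in results.get("errorCountByRule", []):
        lines.append(
            f"  {rule['name']}: {rule['numberOfErrors']} ({rule['description']})"
        )
    return "\n".join(lines)


# Sample values copied from the example response in this guide.
sample_results = {
    "numberOfTotalRows": 6,
    "numberOfErrorRows": 1,
    "errorCountByRule": [
        {
            "name": "AbsoluteChangeRule",
            "description": "In open or close: absolute change > 9999.0",
            "numberOfErrors": 1,
        }
    ],
}
print(summarize_results(sample_results))
```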

Get the file's current validation configuration

Use GET /validation/config/{filename} to retrieve the file's current validation configuration:

# Get validation config
url = f"{API_BASE_URL}/validation/config/{filename}"
status = None
while status != "SUCCEEDED":
    response = get(url=url, headers=HEADERS)
    response_content = response.json()
    status = response_content["status"]
    if status == "RUNNING":
        sleep(5)
config = response_content["config"]
// <Response [200]>
{
    "status": "SUCCEEDED",
    "config": {
        "schema": {...},
        "index": {...},
        "rules": {...},
    }
}

Get a list of available validation rules

Use GET /validation/rules/{filename} to retrieve a list of the
validation rules available for the file:

# Get validation rules
url = f"{API_BASE_URL}/validation/rules/{filename}"
response = get(url=url, headers=HEADERS)
response_content = response.json()
// <Response [200]>
{
    "rules": [
        {
            "title": "Absolute change too large",
            "description": "...",
            "type": "AbsoluteChangeRule",
            ...
        },
        ...
    ]
}

See the Validation rules reference to learn more about the rules, their parameters,
and how to configure them.
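
To look up a single rule in the response, the list can be indexed by its type field. A minimal sketch, assuming each entry carries the title, description, and type fields shown in the example response; the helper name is hypothetical.

```python
def rules_by_type(rules: list) -> dict:
    """Map each rule's "type" value to the full rule entry."""
    return {rule["type"]: rule for rule in rules}


# Sample entry modelled on the example response; the description is elided there.
sample_rules = [
    {
        "title": "Absolute change too large",
        "description": "...",
        "type": "AbsoluteChangeRule",
    }
]
index = rules_by_type(sample_rules)
print(index["AbsoluteChangeRule"]["title"])
```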

Update the file's validation configuration

Use PUT /validation/config/{filename} to update the file's
validation configuration:

# Update validation config
url = f"{API_BASE_URL}/validation/config/{filename}"
rules = [{
    **rule,
    "properties": {
        **rule["properties"],
        "gap_size": 1,
    },
} if "LargeGaps" in rule["type"] else rule for rule in config["rules"]]
payload = {"config": {**config, "rules": rules}}
response = put(url=url, headers=HEADERS, json=payload)
response_content = response.json()
// <Response [200]>
{
    "config": {
        "schema": {...},
        "index": {...},
        "rules": {...},
    }
}
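
The list comprehension in the update snippet can also be written as a small function, which is easier to reuse for other rules and properties. This is a sketch, not part of the API: set_rule_property is a hypothetical helper, and "LargeGapsRule" in the demonstration is an assumed type name (the snippet above only matches on the substring "LargeGaps").

```python
def set_rule_property(rules: list, type_substring: str, key: str, value) -> list:
    """Return a new rule list with key=value set on every matching rule.

    A rule matches when type_substring occurs in its "type" field.
    The input list and its rule dictionaries are left unmodified.
    """
    updated = []
    for rule in rules:
        if type_substring in rule["type"]:
            properties = {**rule.get("properties", {}), key: value}
            rule = {**rule, "properties": properties}
        updated.append(rule)
    return updated


# Demonstration data; the property value 5 is illustrative only.
demo_rules = [
    {"type": "LargeGapsRule", "properties": {"gap_size": 5}},
    {"type": "AbsoluteChangeRule", "properties": {}},
]
print(set_rule_property(demo_rules, "LargeGaps", "gap_size", 1))
```

With the names from the snippet above, the equivalent call would be rules = set_rule_property(config["rules"], "LargeGaps", "gap_size", 1).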

Complete script

Here's the complete script to upload and validate a file:

from dotenv import load_dotenv
from json import dumps
from os import environ
from requests import get, post, put
from time import sleep

load_dotenv()

API_BASE_URL = "https://api.sigtech.com"
SIG_API_KEY = environ.get("SIG_API_KEY")
HEADERS = {
    "Authorization": f"Bearer {SIG_API_KEY}",
}

filename = "master.csv"

# Create upload URL
url = f"{API_BASE_URL}/validation/upload?filename={filename}"
response = put(url=url, headers=HEADERS)
response_content = response.json()

# Upload file
url = response_content["uploadUrl"]
data = response_content["uploadFormData"]
# Read the file as raw bytes so that gzip-compressed files upload unchanged
with open(filename, "rb") as f:
    b = f.read()
files = {"file": b}
response = post(url=url, data=data, files=files)

# Validate
url = f"{API_BASE_URL}/validation/run/{filename}"
response = post(url=url, headers=HEADERS)

# Get validation results
url = f"{API_BASE_URL}/validation/results/{filename}"
status = None
# Stop polling on either terminal status
while status not in ("SUCCEEDED", "FAILED"):
    response = get(url=url, headers=HEADERS)
    response_content = response.json()
    status = response_content["status"]
    if status not in ("SUCCEEDED", "FAILED"):
        sleep(5)  # wait before polling again
results = response_content["results"]

# Download full validation results
url = results["issuesDownloadUriCsv"]
results_filename = f"{filename}_results.csv"
# To download the HTML report instead, uncomment:
# url = results["issuesDownloadUriHtml"]
# results_filename = f"{filename}_results.html"
response = get(url=url)  # Success: <Response [200]>
with open(results_filename, "w+b") as f:
    f.write(response.content)

# Get available validation rules
url = f"{API_BASE_URL}/validation/rules/{filename}"
response = get(url=url, headers=HEADERS)
response_content = response.json()
available_rules = response_content["rules"]

# Get validation config
url = f"{API_BASE_URL}/validation/config/{filename}"
status = None
while status != "SUCCEEDED":
    response = get(url=url, headers=HEADERS)
    response_content = response.json()
    status = response_content["status"]
    if status == "RUNNING":
        sleep(5)
config = response_content["config"]

# Update validation config
url = f"{API_BASE_URL}/validation/config/{filename}"
rules = [{
    **rule,
    "properties": {
        **rule["properties"],
        "gap_size": 1,
    },
} if "LargeGaps" in rule["type"] else rule for rule in config["rules"]]
for rule in rules:
    print(dumps(rule, indent=4))
new_config = {**config, "rules": rules}
payload = {"config": new_config}
response = put(url=url, headers=HEADERS, json=payload)
response_content = response.json()
config = response_content["config"]
print(response)
print(response_content)
print(config)