The Data Cleaning Suite provides a set of endpoints to:
- Authenticate
- Create a Job
- Upload a File
- Update Mappings & Enrichments
- Retrieve the Enriched File
Below is a flow diagram that outlines how to use these endpoints effectively.
Before using any of the endpoints, you must authenticate. This ensures you have the necessary permissions to access the data.
POST /authenticate
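The following is a minimal sketch of the authentication step using only the Python standard library. It builds the request without sending it. Several details are assumptions for illustration and should be confirmed against your Creditsafe API reference:
- the base URL,
- the JSON body with username and password,
- the "token" field in the response,
- the Bearer header scheme used on later calls.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the environment you have access to.
BASE_URL = "https://api.example.com"


def build_auth_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) POST /authenticate.

    Assumption: the endpoint accepts a JSON body with username and password.
    """
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/authenticate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_headers(token: str) -> dict:
    """Headers for every later call (assumed Bearer scheme)."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


# Usage (network call, so left commented out; "token" is an assumed field name):
# with urllib.request.urlopen(build_auth_request("user", "pass")) as resp:
#     token = json.load(resp)["token"]
```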
This endpoint creates a data cleaning job, which acts as a container for the file and subsequent actions. Each job is uniquely identified by an id.
POST /dataCleaning/jobs
Example request body:
{
"name": "Data Cleaning Job 03-10-20xx"
}
Example response:
{
"id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
"name": "Testing From Technical Author",
"createdAt": "2025-02-07T14:07:10.8766667",
"modifiedAt": "2025-02-07T14:07:10.8766667",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"owningCustomerId": 987654321,
"owningUserId": 123456789,
"status": "created",
"source": "dataCleaning",
"archived": false,
}
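As a sketch, job creation and id extraction could look like the following. The base URL and the Bearer header are assumptions. The status check simply reflects the "created" value shown in the example response.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical base URL


def build_create_job_request(token: str, name: str) -> urllib.request.Request:
    """Build (but do not send) POST /dataCleaning/jobs with a job name."""
    return urllib.request.Request(
        f"{BASE_URL}/dataCleaning/jobs",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",  # assumed auth scheme
                 "Content-Type": "application/json"},
        method="POST",
    )


def job_id_from_response(raw: str) -> str:
    """Return the job id, which later endpoints take in their {id} path segment."""
    job = json.loads(raw)
    if job.get("status") != "created":
        raise ValueError(f"unexpected job status: {job.get('status')!r}")
    return job["id"]
```

Keeping the id in one variable makes it easy to pass to every subsequent /dataCleaning/jobs/{id}/... call.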
This endpoint uploads the file to be processed. The id from the job creation step must be passed in the path to associate the file with the job.
The file must be sent as form-data, and you must specify whether the file includes a header using the hasHeader property.
POST /dataCleaning/jobs/{id}/upload
Example response:
{
"correlationId": "2a7b5537-3950-4903-810d-9814c91d5564",
"id": "9e824d9e-0e77-43ef-1f10-08dd712f5830",
"sourceFilename": "test-file-input.csv",
"hasHeader": true,
"createdAt": "2025-04-03T12:24:32.2266667",
"modifiedAt": "2025-04-03T12:24:32.2266667",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"status": "uploaded",
"active": true
}
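The following sketch builds the multipart/form-data upload by hand with the standard library. It sends hasHeader as a text part alongside the file.

Two choices here are assumptions to check against the API reference:
- the form field name "file", and
- sending hasHeader as a form field rather than a query parameter.

The base URL and Bearer header are also hypothetical.

```python
import urllib.request
import uuid

BASE_URL = "https://api.example.com"  # hypothetical base URL


def build_upload_request(token: str, job_id: str, filename: str,
                         content: bytes, has_header: bool) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload for a job."""
    boundary = uuid.uuid4().hex
    crlf = "\r\n"
    # Text part carrying the hasHeader flag as "true" or "false".
    header_part = (
        f"--{boundary}{crlf}"
        f'Content-Disposition: form-data; name="hasHeader"{crlf}{crlf}'
        f"{'true' if has_header else 'false'}{crlf}"
    ).encode("utf-8")
    # File part; the field name "file" is an assumption.
    file_part = (
        f"--{boundary}{crlf}"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"{crlf}'
        f"Content-Type: text/csv{crlf}{crlf}"
    ).encode("utf-8") + content + crlf.encode("utf-8")
    closing = f"--{boundary}--{crlf}".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/dataCleaning/jobs/{job_id}/upload",
        data=header_part + file_part + closing,
        headers={"Authorization": f"Bearer {token}",  # assumed auth scheme
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

A third-party HTTP client would normally assemble the multipart body for you. Building it manually here just keeps the example dependency-free.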
This endpoint maps the columns of the uploaded file to the fields used for matching.
Use the available ENUMs (column headers) to tell the Creditsafe database which of your file's columns holds each field. Each mapping pairs an ENUM ("mapping") with a column position ("value").
NOTE Column positions are zero-based: the first column is position 0.
PUT /dataCleaning/jobs/{id}/mappings
[
{
"mapping": "companyId",
"value": "0"
},
{
"mapping": "orgNumber",
"value": "1"
},
{
"mapping": "name",
"value": "2"
}
]
NOTE Ensure your column headers match the available ENUMs as closely as possible. This forms the basis of the matching process.
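Because positions are zero-based, it can help to derive the mappings body from the file's header row rather than counting columns by hand. The sketch below does that. The header names ("Company ID", "Registration No", "Company Name") are hypothetical. Only the ENUMs companyId, orgNumber and name come from the example above.

```python
import json


def mappings_from_header(header_row: list, header_to_enum: dict) -> list:
    """Build the PUT .../mappings body from a header row.

    header_to_enum maps your own column headers to API ENUMs such as
    "companyId", "orgNumber" or "name". Columns with no mapping are skipped.
    Positions are zero-based and sent as strings, as in the example body.
    """
    body = []
    for position, header in enumerate(header_row):  # enumerate starts at 0
        enum_name = header_to_enum.get(header)
        if enum_name is not None:
            body.append({"mapping": enum_name, "value": str(position)})
    return body


if __name__ == "__main__":
    header = ["Company ID", "Registration No", "Company Name"]
    lookup = {"Company ID": "companyId", "Registration No": "orgNumber",
              "Company Name": "name"}
    print(json.dumps(mappings_from_header(header, lookup), indent=2))
```

With this header row, the function produces the same three mappings as the example body, with values "0", "1" and "2".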
This endpoint submits the file for matching against the Creditsafe database. The job id must be passed in the path, and an empty request body is required.
POST /dataCleaning/jobs/{id}/submit
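The submit step, like the enrich step later in this guide, is a POST with the job id in the path and an empty body. A small sketch can therefore build both.

This sketch treats "empty" as a zero-length body; if the API expects "{}" instead, change the data argument. The base URL and Bearer header are assumptions.

```python
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical base URL

# Both of these steps are POSTs with an empty body and the job id in the path.
JOB_ACTIONS = {"submit", "enrich"}


def build_job_action_request(token: str, job_id: str, action: str) -> urllib.request.Request:
    """Build (but do not send) POST /dataCleaning/jobs/{id}/{action}."""
    if action not in JOB_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        f"{BASE_URL}/dataCleaning/jobs/{job_id}/{action}",
        data=b"",  # zero-length body; assumption about what "empty" means
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="POST",
    )
```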
This endpoint retrieves the current status of the job and can be called periodically to track progress. The job must reach the jobMatchingComplete status before you proceed to enrichment.
GET /dataCleaning/jobs/{id}
{
"id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
"name": "Testing From Technical Author",
"createdAt": "2019-08-24T14:15:22Z",
"modifiedAT": "2019-08-24T14:15:22Z",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"owningCustomerId": 987654321,
"owningUserId": 123456789,
"status": "jobMatchingComplete",
"source": "dataCleaning",
"jobSummary": {
"totalRows": 0,
"matched": 20,
"manualMatched": 0,
"unmatched": 0,
"duplicates": 0
},
"archived": true
}
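Periodic status checks can be written as a simple polling loop. In the sketch below, the polling interval and timeout are arbitrary choices, not documented limits. The base URL and Bearer header are assumptions.

The same loop works for waiting on enrichmentComplete later in the flow.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.example.com"  # hypothetical base URL


def fetch_job(token: str, job_id: str) -> dict:
    """GET /dataCleaning/jobs/{id} and return the parsed JSON (network call)."""
    req = urllib.request.Request(
        f"{BASE_URL}/dataCleaning/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_status(get_job: Callable[[], dict], target: str,
                    interval: float = 10.0, timeout: float = 1800.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll get_job() until its "status" equals target, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        job = get_job()
        if job.get("status") == target:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)


# Usage (network call, so commented out):
# job = wait_for_status(lambda: fetch_job(token, job_id), "jobMatchingComplete")
# print(job["jobSummary"])
```

Passing sleep and clock in as parameters keeps the loop testable without real waiting.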
This endpoint applies the desired enrichment type to the matched data. Enrichment types include:
- basic
- basicPlus
- standard
PUT /dataCleaning/jobs/{id}/enrichments
You can remove enrichments that are not required for the chosen credit type, but you cannot add enrichments beyond the maximum allowed for that credit type.
{
"enrichments": [
{
"enrichment": "general.safeNumber"
},
{
"enrichment": "general.connectId"
},
{
"enrichment": "general.ggsId"
},
{
"enrichment": "general.companyName"
}
]
}
NOTE Refer to the API documentation for the full list of allowable enrichments for each type.
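The following is a sketch for assembling the enrichments body. The per-credit-type limit is not stated in this guide, so the caller supplies it as max_allowed, taken from the API documentation. Dropping duplicate names is a client-side convenience of this sketch, not documented API behaviour.

```python
import json
from typing import Optional


def build_enrichments(fields: list, max_allowed: Optional[int] = None) -> dict:
    """Build the PUT .../enrichments body from enrichment names.

    Duplicates are dropped (first occurrence kept). max_allowed is the limit for
    your credit type, taken from the API documentation; it is not hard-coded here.
    """
    unique = list(dict.fromkeys(fields))  # preserves order, removes duplicates
    if max_allowed is not None and len(unique) > max_allowed:
        raise ValueError(f"{len(unique)} enrichments requested, limit is {max_allowed}")
    return {"enrichments": [{"enrichment": name} for name in unique]}


if __name__ == "__main__":
    body = build_enrichments([
        "general.safeNumber", "general.connectId",
        "general.ggsId", "general.companyName",
    ])
    print(json.dumps(body, indent=2))
```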
This endpoint submits the request to enrich the matched data. The job id must be passed in the path, and an empty request body is required.
POST /dataCleaning/jobs/{id}/enrich
This endpoint checks the status of the enrichment request and can be called repeatedly for periodic checks.
The next endpoint (Return Enriched File) cannot be called until the job reaches the enrichmentComplete status.
The data cleaning job id must be passed in the path.
GET /dataCleaning/jobs/{id}
{
"id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
"name": "Testing From Technical Author",
"createdAt": "2019-08-24T14:15:22Z",
"modifiedAT": "2019-08-24T14:15:22Z",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"owningCustomerId": 987654321,
"owningUserId": 123456789,
"status": "enrichmentComplete",
"countryCode": "GB",
"portfolioId": "string",
"source": "dataCleaning",
"jobSummary": {
"totalRows": 0,
"matched": 20,
"manualMatched": 0,
"unmatched": 0,
"duplicates": 0
},
"jobEnrichmentSettings": {
"creditType": "basic"
},
"archived": true
}
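Before requesting the enriched file, a client can check the job returned by this endpoint. The hypothetical helper below refuses to proceed unless the status is enrichmentComplete. It also pulls out the summary fields shown in the example response.

```python
import json


def download_readiness(job: dict) -> dict:
    """Confirm a job from GET /dataCleaning/jobs/{id} is ready for download.

    Raises if the status is not enrichmentComplete; otherwise returns a short
    summary taken from jobSummary and jobEnrichmentSettings.
    """
    status = job.get("status")
    if status != "enrichmentComplete":
        raise RuntimeError(f"enriched file not available yet (status {status!r})")
    summary = job.get("jobSummary", {})
    return {
        "creditType": job.get("jobEnrichmentSettings", {}).get("creditType"),
        "matched": summary.get("matched", 0),
        "manualMatched": summary.get("manualMatched", 0),
        "unmatched": summary.get("unmatched", 0),
        "duplicates": summary.get("duplicates", 0),
    }
```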
This endpoint retrieves the completed, enriched file. The job id must be passed in the path.
By default, the file is returned as .csv; if it contains fewer than 300,000 rows, it can also be returned as .xlsx.
GET /dataCleaning/jobs/{id}/enrichedFile
{
"correlationId": "string",
"filePath": "string"
}
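The following is a small sketch of handling this final step: checking whether the .xlsx option applies, and reading the fields from the response shown above. This guide does not say how the .xlsx format is requested, so that part is deliberately left out. The helper names are hypothetical.

```python
import json

# Row threshold stated for the .xlsx option.
XLSX_ROW_LIMIT = 300_000


def can_return_xlsx(total_rows: int) -> bool:
    """True if the file is small enough to be returned as .xlsx instead of .csv."""
    return total_rows < XLSX_ROW_LIMIT


def parse_enriched_file_response(raw: str) -> tuple:
    """Return (correlationId, filePath) from the enrichedFile response."""
    data = json.loads(raw)
    return data["correlationId"], data["filePath"]
```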