The Data Cleaning Suite provides a set of endpoints to:
- Authenticate
- Create a Job
- Upload a File
- Update Mappings & Enrichments
- Retrieve the Enriched File
Below is a flow diagram that outlines how to use these endpoints effectively.
Before using any of the endpoints, you must authenticate. This ensures you have the necessary permissions to access the data.
POST /authenticate

This endpoint creates a data cleaning job, which acts as a container for the file and subsequent actions. Each job is uniquely identified by an id.
POST /dataCleaning/jobs
Example request body:
{
"name": "Data Cleaning Job 03-10-20xx"
}
Example response:
{
"id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
"name": "Testing From Technical Author",
"createdAt": "2025-02-07T14:07:10.8766667",
"modifiedAt": "2025-02-07T14:07:10.8766667",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"owningCustomerId": 987654321,
"owningUserId": 123456789,
"status": "created",
"source": "dataCleaning",
"archived": false
}

This endpoint uploads the file to be processed. The id from the job creation step must be passed in the path to associate the file with the job.
The file must be sent as form-data, and you must specify whether the file includes a header using the hasHeader property.
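As a rough illustration of the form-data requirement, the sketch below hand-encodes a multipart/form-data body using only the Python standard library. It assumes that hasHeader travels as a text form field alongside the file. The field name `file`, the `text/csv` content type, and the helper name `build_upload_body` are hypothetical choices made for illustration, not confirmed API details.

```python
import uuid


def build_upload_body(filename, file_bytes, has_header, file_field="file"):
    """Encode a file plus the hasHeader flag as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    `file_field` is an ASSUMED form field name; hasHeader comes from the guide.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        # Text field carrying the header flag as lowercase "true"/"false".
        delimiter,
        b'Content-Disposition: form-data; name="hasHeader"',
        b"",
        b"true" if has_header else b"false",
        # File part, labelled with the original filename.
        delimiter,
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: text/csv",
        b"",
        file_bytes,
        # Closing delimiter ends the body.
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(parts), "multipart/form-data; boundary=" + boundary


body, content_type = build_upload_body(
    "test-file-input.csv", b"companyId,orgNumber,name\n1,123,Acme\n", True
)
```

The returned body and Content-Type value could then be sent in a POST to /dataCleaning/jobs/{id}/upload, together with whatever authorization header the authentication step yields.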
POST /dataCleaning/jobs/{id}/upload
Example response:
{
"correlationId": "2a7b5537-3950-4903-810d-9814c91d5564",
"id": "9e824d9e-0e77-43ef-1f10-08dd712f5830",
"sourceFilename": "test-file-input.csv",
"hasHeader": true,
"createdAt": "2025-04-03T12:24:32.2266667",
"modifiedAt": "2025-04-03T12:24:32.2266667",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"status": "uploaded",
"active": true
}

This endpoint maps the columns of the uploaded file to the required fields for matching.
Use the available ENUMs (column headers) to match your file's columns to the Creditsafe database.
NOTE The first column starts at position '0'.
PUT /dataCleaning/jobs/{id}/mappings
Example request body:
[
{
"mapping": "companyId",
"value": "0"
},
{
"mapping": "orgNumber",
"value": "1"
},
{
"mapping": "name",
"value": "2"
}
]
NOTE Ensure your column headers match the available ENUMs as closely as possible. This forms the basis of the matching process.
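The zero-based positions in the mappings body can be generated from the column order rather than typed by hand. The following is a minimal sketch: `build_mappings` is a hypothetical helper name, and skipping unmapped columns with `None` is an assumption about how a caller might want to use it.

```python
import json


def build_mappings(columns):
    """Build the PUT /dataCleaning/jobs/{id}/mappings body.

    columns[i] is the mapping ENUM for file column i, or None to leave
    that column unmapped. Positions are zero-based and sent as strings,
    matching the example body in this guide.
    """
    return [
        {"mapping": name, "value": str(position)}
        for position, name in enumerate(columns)
        if name is not None
    ]


# Same three columns as the example body above.
payload = build_mappings(["companyId", "orgNumber", "name"])
print(json.dumps(payload, indent=2))
```

Deriving the positions with `enumerate` keeps the first column at "0" and avoids off-by-one slips when columns are added or reordered.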
This endpoint submits the file for matching against the Creditsafe database. The job id must be passed in the path, and an empty request body is required.
POST /dataCleaning/jobs/{id}/submit

This endpoint retrieves the current status of the job. It can be used periodically to track progress. The job must reach the jobMatchingComplete status before proceeding to enrichment.
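Periodic status checks of this kind are usually written as a bounded polling loop. The sketch below assumes the caller supplies `fetch_job`, a hypothetical function that performs GET /dataCleaning/jobs/{id} and returns the decoded JSON. The interval and attempt limit are arbitrary defaults, not documented values, and the loop does not look for failure statuses because this guide does not list any.

```python
import time


def wait_for_status(fetch_job, target, interval=5.0, max_attempts=60,
                    sleep=time.sleep):
    """Poll until the job's "status" equals `target`, then return the job.

    fetch_job: caller-supplied callable returning the job as a dict.
    sleep:     injectable so the loop can be exercised without real delays.
    """
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") == target:
            return job
        if attempt < max_attempts - 1:
            sleep(interval)  # pause before the next check
    raise TimeoutError(
        f"job did not reach {target!r} after {max_attempts} checks"
    )


# Example with canned responses standing in for real GET calls.
canned = iter([{"status": "created"}, {"status": "jobMatchingComplete"}])
job = wait_for_status(lambda: next(canned), "jobMatchingComplete",
                      sleep=lambda seconds: None)
```

Bounding the number of attempts means a job that never completes surfaces as an error instead of an endless loop.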
GET /dataCleaning/jobs/{id}
Example response:
{
"id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
"name": "Testing From Technical Author",
"createdAt": "2019-08-24T14:15:22Z",
"modifiedAt": "2019-08-24T14:15:22Z",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"owningCustomerId": 987654321,
"owningUserId": 123456789,
"status": "jobMatchingComplete",
"source": "dataCleaning",
"jobSummary": {
"totalRows": 0,
"matched": 20,
"manualMatched": 0,
"unmatched": 0,
"duplicates": 0
},
"archived": true
}

This endpoint applies the desired enrichment type to the matched data. Enrichment types include:
- basic
- basicPlus
- standard
PUT /dataCleaning/jobs/{id}/enrichments
You can remove properties that are not required for the chosen enrichment credit type. You cannot add tags beyond the maximum allowed for that credit type.
Example request body:
{
"enrichments": [
{
"enrichment": "general.safeNumber"
},
{
"enrichment": "general.connectId"
},
{
"enrichment": "general.ggsId"
},
{
"enrichment": "general.companyName"
}
]
}
NOTE Refer to the API documentation for the full list of allowable enrichments for each type.
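The enrichments body has a regular shape, so it can be assembled from a plain list of enrichment names. This is a minimal sketch: `build_enrichments` is a hypothetical helper, and dropping duplicate names is an assumption about sensible client behaviour, not a documented API rule. It does not check the per-type tag limit, which has to come from the API documentation.

```python
import json


def build_enrichments(names):
    """Build the PUT /dataCleaning/jobs/{id}/enrichments body.

    Duplicate names are dropped while the original order is kept.
    """
    unique = list(dict.fromkeys(names))  # dicts preserve insertion order
    return {"enrichments": [{"enrichment": name} for name in unique]}


payload = build_enrichments([
    "general.safeNumber",
    "general.connectId",
    "general.ggsId",
    "general.companyName",
])
print(json.dumps(payload, indent=2))
```

Serializing with `json.dumps` also guarantees syntactically valid JSON, which is easy to get wrong when the body is edited by hand.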
This endpoint submits the request to enrich the matched data. The job id must be passed in the path, and an empty request body is required.
POST /dataCleaning/jobs/{id}/enrich

This endpoint checks the status of the submitted enrichment request. It can be called repeatedly for periodic checks.
The next step (Retrieve the Enriched File) cannot be carried out until the job reaches the enrichmentComplete status.
The job id must be passed in the path.
GET /dataCleaning/jobs/{id}
Example response:
{
"id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
"name": "Testing From Technical Author",
"createdAt": "2019-08-24T14:15:22Z",
"modifiedAt": "2019-08-24T14:15:22Z",
"managingUserId": 123456789,
"managingCustomerId": 987654321,
"owningCustomerId": 987654321,
"owningUserId": 123456789,
"status": "enrichmentComplete",
"countryCode": "GB",
"portfolioId": "string",
"source": "dataCleaning",
"jobSummary": {
"totalRows": 0,
"matched": 20,
"manualMatched": 0,
"unmatched": 0,
"duplicates": 0
},
"jobEnrichmentSettings": {
"creditType": "basic"
},
"archived": true
}

This endpoint retrieves the completed, enriched file. The job id must be passed in the path.
By default, the response is a .csv file, but if the file contains fewer than 300,000 rows, it can also be returned as .xlsx.
GET /dataCleaning/jobs/{id}/enrichedFile
Example response:
{
"correlationId": "string",
"filePath": "string"
}
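
Taken together, the steps in this guide form a fixed sequence of calls. The sketch below lists them in order for a given job id. The methods and paths are those shown in this guide, while the helper name `workflow_steps` and the short descriptions are illustrative only.

```python
def workflow_steps(job_id):
    """Return (method, path, purpose) tuples in the order this guide uses."""
    jobs = "/dataCleaning/jobs"
    job = f"{jobs}/{job_id}"
    return [
        ("POST", "/authenticate", "authenticate"),
        ("POST", jobs, "create the job"),
        ("POST", f"{job}/upload", "upload the file as form-data"),
        ("PUT", f"{job}/mappings", "map file columns"),
        ("POST", f"{job}/submit", "submit for matching"),
        ("GET", job, "poll until status is jobMatchingComplete"),
        ("PUT", f"{job}/enrichments", "choose enrichments"),
        ("POST", f"{job}/enrich", "submit for enrichment"),
        ("GET", job, "poll until status is enrichmentComplete"),
        ("GET", f"{job}/enrichedFile", "retrieve the enriched file"),
    ]


for method, path, purpose in workflow_steps("f31c786a-1fa8-44d2-8193-c61d77ca2acd"):
    print(f"{method:<4} {path}  ({purpose})")
```

Both polling steps use the same GET endpoint and differ only in the status they wait for.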