Data Cleaning Endpoints

The Data Cleaning Suite provides a set of endpoints to:

  • Authenticate
  • Create a Job
  • Upload a File
  • Update Mappings & Enrichments
  • Retrieve the Enriched File

Below is a flow diagram that outlines how to use these endpoints effectively.


Flow Diagram

Authenticate → Create Job → Upload File → Update Mappings → Submit Job → Check Job Status → Update Enrichments → Start Enrichment → Retrieve Enriched File
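
As a quick reference, the flow can be written down as an ordered list of steps with the HTTP method and path given for each endpoint in this guide. This is a minimal sketch: the names FLOW and resolve_flow are illustrative helpers, not part of the API.

```python
# Ordered steps of the flow, using the methods and paths listed in this guide.
FLOW = [
    ("Authenticate", "POST", "/authenticate"),
    ("Create Job", "POST", "/dataCleaning/jobs"),
    ("Upload File", "POST", "/dataCleaning/jobs/{id}/upload"),
    ("Update Mappings", "PUT", "/dataCleaning/jobs/{id}/mappings"),
    ("Submit Job", "POST", "/dataCleaning/jobs/{id}/submit"),
    ("Check Job Status", "GET", "/dataCleaning/jobs/{id}"),
    ("Update Enrichments", "PUT", "/dataCleaning/jobs/{id}/enrichments"),
    ("Start Enrichment", "POST", "/dataCleaning/jobs/{id}/enrich"),
    ("Retrieve Enriched File", "GET", "/dataCleaning/jobs/{id}/enrichedFile"),
]


def resolve_flow(job_id):
    """Return (step, method, path) tuples with {id} replaced by job_id."""
    return [
        (step, method, path.replace("{id}", job_id))
        for step, method, path in FLOW
    ]


for step, method, path in resolve_flow("f31c786a-1fa8-44d2-8193-c61d77ca2acd"):
    print(f"{step}: {method} {path}")
```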

1. Authenticate

Before using any of the endpoints, you must authenticate. This ensures you have the necessary permissions to access the data.

Example Request

POST /authenticate
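
The sketch below shows one way to prepare this request in Python. This guide does not list the request body or the base URL, so everything beyond the POST /authenticate path is an assumption: BASE_URL is a placeholder, and the username and password fields are illustrative. Check the API documentation for the actual values.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the base URL from the API documentation.
BASE_URL = "https://api.example.com/v1"


def build_auth_request(username, password, base_url=BASE_URL):
    """Build (but do not send) a POST /authenticate request."""
    # Assumed body shape: a JSON object with "username" and "password".
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/authenticate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


auth_request = build_auth_request("user@example.com", "secret")
print(auth_request.get_method(), auth_request.full_url)
```

To send the request, pass it to urllib.request.urlopen and read the response body.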

2. Create A Job

This endpoint creates a data cleaning job, which acts as a container for the file and subsequent actions. Each job is uniquely identified by an id.

Example Request

POST /dataCleaning/jobs

Example requestBody

{
  "name": "Data Cleaning Job 03-10-20xx"
}

Example Response

{
  "id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
  "name": "Testing From Technical Author",
  "createdAt": "2025-02-07T14:07:10.8766667",
  "modifiedAt": "2025-02-07T14:07:10.8766667",
  "managingUserId": 123456789,
  "managingCustomerId": 987654321,
  "owningCustomerId": 987654321,
  "owningUserId": 123456789,
  "status": "created",
  "source": "dataCleaning",
  "archived": false
}
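
A minimal sketch of creating a job and keeping its id for the later steps. The path, the name property, and the id field come from this guide. BASE_URL is a placeholder, and sending the token as a Bearer Authorization header is an assumption that should be checked against the API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the base URL from the API documentation.
BASE_URL = "https://api.example.com/v1"


def build_create_job_request(name, token, base_url=BASE_URL):
    """Build (but do not send) a POST /dataCleaning/jobs request."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/dataCleaning/jobs",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the token is passed as a Bearer token.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


def job_id_from_response(response_text):
    """Extract the job id that every later endpoint needs in its path."""
    return json.loads(response_text)["id"]


sample_response = '{"id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd", "status": "created"}'
job_id = job_id_from_response(sample_response)
print(job_id)
```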

3. Upload A Job File

This endpoint uploads the file to be processed. The id from the job creation step must be passed in the path to associate the file with the job.

The file must be sent as form-data, and you must specify whether the file includes a header using the hasHeader property.

Example Request

POST /dataCleaning/jobs/{id}/upload

Example Response

{
    "correlationId": "2a7b5537-3950-4903-810d-9814c91d5564",
    "id": "9e824d9e-0e77-43ef-1f10-08dd712f5830",
    "sourceFilename": "test-file-input.csv",
    "hasHeader": true,
    "createdAt": "2025-04-03T12:24:32.2266667",
    "modifiedAt": "2025-04-03T12:24:32.2266667",
    "managingUserId": 123456789,
    "managingCustomerId": 987654321,
    "status": "uploaded",
    "active": true
}
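
The sketch below assembles a multipart form-data body by hand using only the standard library. The hasHeader property comes from this guide. The form field name for the file ("file"), the text/csv content type, and sending hasHeader as the text "true" or "false" are assumptions, so confirm them in the API documentation.

```python
import uuid


def build_upload_body(filename, file_bytes, has_header, file_field="file"):
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    lines = [
        f"--{boundary}",
        'Content-Disposition: form-data; name="hasHeader"',
        "",
        "true" if has_header else "false",
        f"--{boundary}",
        # Assumption: the file part is named "file" unless overridden.
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"',
        "Content-Type: text/csv",
        "",
    ]
    # The trailing CRLF adds the blank line that separates the part
    # headers from the file content.
    head = "\r\n".join(lines).encode("utf-8") + b"\r\n"
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return head + file_bytes + tail, content_type


body, content_type = build_upload_body(
    "test-file-input.csv", b"id,name\n1,Acme\n", has_header=True
)
print(content_type)
```

Send body as the request data for POST /dataCleaning/jobs/{id}/upload and set the Content-Type header to the returned content_type value.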

4. Update Mappings

This endpoint maps the columns of the uploaded file to the required fields for matching.

Use the available ENUMs (column headers) to match your file's columns to the Creditsafe database.

NOTE The first column starts at position '0'.

Example Request

PUT /dataCleaning/jobs/{id}/mappings

Example requestBody

[
  {
    "mapping": "companyId",
    "value": "0"
  },
  {
    "mapping": "orgNumber",
    "value": "1"
  },
  {
    "mapping": "name",
    "value": "2"
  }
]

NOTE Ensure your column headers match the available ENUMs as closely as possible. This forms the basis of the matching process.
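
To illustrate the zero-based positions, the sketch below derives the request body from a file's header row. The header names and the build_mappings helper are hypothetical; only the mapping and value properties and the companyId, orgNumber, and name ENUMs come from the example above.

```python
def build_mappings(header, column_to_mapping):
    """Build the mappings request body from a header row.

    header: column names in file order.
    column_to_mapping: file column name -> mapping ENUM.
    """
    mappings = []
    for position, column in enumerate(header):  # positions start at 0
        mapping = column_to_mapping.get(column)
        if mapping is not None:
            # The example request sends the position as a string.
            mappings.append({"mapping": mapping, "value": str(position)})
    return mappings


# Hypothetical header row of an uploaded file.
header = ["Company ID", "Org Number", "Company Name"]
column_to_mapping = {
    "Company ID": "companyId",
    "Org Number": "orgNumber",
    "Company Name": "name",
}
mappings = build_mappings(header, column_to_mapping)
print(mappings)
```

Columns without an entry in column_to_mapping are skipped, so unmapped columns do not appear in the request body.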


5. Submit Job

This endpoint submits the file for matching against the Creditsafe database. The job id must be passed in the path, and an empty request body is required.

Example Request

POST /dataCleaning/jobs/{id}/submit

6. Return Job By Id Number

This endpoint retrieves the current status of the job. It can be used periodically to track progress. The job must reach the jobMatchingComplete status before proceeding to enrichment.

Example Request

GET /dataCleaning/jobs/{id}

Example Response

{
  "id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
  "name": "Testing From Technical Author",
  "createdAt": "2019-08-24T14:15:22Z",
  "modifiedAt": "2019-08-24T14:15:22Z",
  "managingUserId": 123456789,
  "managingCustomerId": 987654321,
  "owningCustomerId": 987654321,
  "owningUserId": 123456789,
  "status": "jobMatchingComplete",
  "source": "dataCleaning",
  "jobSummary": {
    "totalRows": 0,
    "matched": 20,
    "manualMatched": 0,
    "unmatched": 0,
    "duplicates": 0
  },
  "archived": true
}
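
Periodic status checks can be sketched as a simple polling loop. The wait_for_status helper, its interval, and its attempt limit are illustrative choices, not values from the API. fetch_job stands for any function that calls GET /dataCleaning/jobs/{id} and returns the parsed JSON response.

```python
import time


def wait_for_status(fetch_job, target_status, interval_seconds=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Call fetch_job until the job reports target_status, then return the job."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") == target_status:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(
        f"job did not reach {target_status!r} after {max_attempts} checks"
    )


# Demonstration with canned responses instead of real HTTP calls.
canned = iter([{"status": "created"}, {"status": "jobMatchingComplete"}])
job = wait_for_status(lambda: next(canned), "jobMatchingComplete",
                      sleep=lambda seconds: None)
print(job["status"])
```

The same helper can wait for enrichmentComplete after enrichment has been started.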

7. Update Enrichments

This endpoint applies the desired enrichment type to the matched data. Enrichment types include:

  • basic
  • basicPlus
  • standard

Example Request

PUT /dataCleaning/jobs/{id}/enrichments

Example requestBody

You can remove properties that are not required for the enrichment credit type. You cannot add tags beyond the maximum allowed for that credit type.

{
  "enrichments": [
    {
      "enrichment": "general.safeNumber"
    },
    {
      "enrichment": "general.connectId"
    },
    {
      "enrichment": "general.ggsId"
    },
    {
      "enrichment": "general.companyName"
    }
  ]
}

NOTE Refer to the API documentation for the full list of allowable enrichments for each type.
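
The request body can be generated from a plain list of enrichment names, as in the sketch below. The build_enrichments_body helper is illustrative, and the names used are the four from the example above.

```python
import json


def build_enrichments_body(enrichment_names):
    """Wrap each enrichment name in the object shape shown in the example."""
    return {"enrichments": [{"enrichment": name} for name in enrichment_names]}


enrichments_body = build_enrichments_body([
    "general.safeNumber",
    "general.connectId",
    "general.ggsId",
    "general.companyName",
])
print(json.dumps(enrichments_body, indent=2))
```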


8. Start Enrichment

This endpoint submits the request to enrich the matched data. The job id must be passed in the path, and an empty request body is required.

Example Request

POST /dataCleaning/jobs/{id}/enrich

9. Return Job By Id Number

This endpoint checks the status of the enrichment request. It can be called repeatedly for periodic checks.

The next endpoint (Return Enriched Job File) cannot be used until the job reaches the enrichmentComplete status.

The job id must be passed in the path.

Example Request

GET /dataCleaning/jobs/{id}

Example Response

{
  "id": "f31c786a-1fa8-44d2-8193-c61d77ca2acd",
  "name": "Testing From Technical Author",
  "createdAt": "2019-08-24T14:15:22Z",
  "modifiedAt": "2019-08-24T14:15:22Z",
  "managingUserId": 123456789,
  "managingCustomerId": 987654321,
  "owningCustomerId": 987654321,
  "owningUserId": 123456789,
  "status": "enrichmentComplete",
  "countryCode": "GB",
  "portfolioId": "string",
  "source": "dataCleaning",
  "jobSummary": {
    "totalRows": 0,
    "matched": 20,
    "manualMatched": 0,
    "unmatched": 0,
    "duplicates": 0
  },
  "jobEnrichmentSettings": {
    "creditType": "basic"
  },
  "archived": true
}

10. Return Enriched Job File

This endpoint retrieves the completed, enriched file. The job id must be passed in the path.

By default, the response is a .csv file. If the file contains fewer than 300,000 rows, it can also be returned as .xlsx.

Example Request

GET /dataCleaning/jobs/{id}/enrichedFile

Example Response

{
  "correlationId": "string",
  "filePath": "string"
}
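
A small sketch of handling this step on the client side. The 300,000-row limit and the filePath field come from this guide. The helper names are illustrative, and because this guide does not say what filePath contains, the value is treated as an opaque string.

```python
import json

# From this guide: .xlsx is available only below this row count.
XLSX_ROW_LIMIT = 300_000


def xlsx_available(total_rows):
    """Return True when the file is small enough to be returned as .xlsx."""
    return total_rows < XLSX_ROW_LIMIT


def file_path_from_response(response_text):
    """Extract filePath from the Return Enriched Job File response."""
    return json.loads(response_text)["filePath"]


sample_response = '{"correlationId": "string", "filePath": "string"}'
print(file_path_from_response(sample_response))
print(xlsx_available(20), xlsx_available(300_000))
```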