Parameters for importing data
The import options let you define rules for handling updates, duplicates, and questionable (ambiguous) records.
Fields marked with an asterisk are mandatory. Settings that are key to transferring data between environments while maintaining data consistency are pointed out where relevant.
System name - identifies the type of data to which the settings apply. When importing to maintain data consistency between two environments, this name is irrelevant. For more advanced imports, it is useful to create separate sets of import options for each type of data being imported and to name them accordingly.
What to do with update records - available options are: OVERWRITE, SKIP, ASK_USER.
Default way to update fields - available options are: OVERWRITE, SKIP, FILL_EMPTY, ADD_NEW_VALUES, OVERWRITE_BY_NOT_EMPTY.
What to do with duplicate records - available options are: OVERWRITE, SKIP, ASK_USER.
Default way to update duplicate fields - available options are: OVERWRITE, SKIP, FILL_EMPTY, ADD_NEW_VALUES.
What to do with questionable records - available options are: ASK_USER, SKIP, SAVE_BEST_MATCH.
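The field update modes listed above can be illustrated with a small sketch. The semantics below are inferred from the option names only and are an assumption, not the system's confirmed behavior; `merge_field` and `is_empty` are hypothetical helper names.

```python
from enum import Enum


class FieldUpdateMode(Enum):
    OVERWRITE = "OVERWRITE"
    SKIP = "SKIP"
    FILL_EMPTY = "FILL_EMPTY"
    ADD_NEW_VALUES = "ADD_NEW_VALUES"
    OVERWRITE_BY_NOT_EMPTY = "OVERWRITE_BY_NOT_EMPTY"


def is_empty(value):
    """Assumption: None, empty strings and empty lists count as empty."""
    return value is None or value == "" or value == []


def merge_field(existing, incoming, mode):
    """Return the field value to store, given the existing and imported values."""
    if mode is FieldUpdateMode.OVERWRITE:
        return incoming  # imported value always wins
    if mode is FieldUpdateMode.SKIP:
        return existing  # existing value is never touched
    if mode is FieldUpdateMode.FILL_EMPTY:
        return incoming if is_empty(existing) else existing
    if mode is FieldUpdateMode.OVERWRITE_BY_NOT_EMPTY:
        return existing if is_empty(incoming) else incoming
    if mode is FieldUpdateMode.ADD_NEW_VALUES:
        # Assumed to apply to multi-valued (list) fields only.
        merged = list(existing or [])
        for item in incoming or []:
            if item not in merged:
                merged.append(item)
        return merged
    raise ValueError(f"Unknown mode: {mode}")
```

Under these assumptions, FILL_EMPTY protects existing data, while OVERWRITE_BY_NOT_EMPTY prevents an empty imported value from erasing it.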
Option | Meaning or available options |
What to do with update records | This option applies to imported records whose system IDs already exist in the database (duplicates by ID). Available options: overwrite / skip / ask the user |
What to do with duplicate records | This option applies to imported records that are considered duplicates but have no system ID. Available options: overwrite / skip / ask the user |
What to do with questionable records | This option applies to imported records with more than one match in the database. Available options: ask the user / skip / save without resolving doubts (best match) |
Defer questions to the user at the end of the import | Whether questions about questionable records are postponed until the end of the import |
Remember user answers | Whether to remember the user's previous answers |
Default way to update fields | Specifies how the fields of the overwritten record are updated by default. The selected action is applied to all fields for which no custom update rules are specified. Available options: overwrite / skip / fill empty / add new values / overwrite with a non-empty value |
Rules for updating fields | Custom rules for updating selected fields. You can define any number of rules. For each rule, specify:
|
Maximum number of authors | The maximum number of authors to be imported into the author field (for publications). If the imported publication has more authors than the specified number, the remaining authors are aggregated into the 'collective author' field |
Update external identifiers | Whether to update external identifiers in uniquely matched duplicates in the database during import, even if the duplicate is skipped. |
Update indicators | Whether to update the publication indicators of uniquely matched duplicates in the database during import, even if the duplicate is skipped. |
Matching by external ID - blacklist | This parameter lets you indicate which external identifiers are to be ignored when verifying whether an imported record is a duplicate. This excludes groups of local identifiers that may be duplicated across repositories (such as a personnel identifier or a directory identifier). Specify the system names of the identifiers from the local systems. |
Matching by external ID - whitelist | By default, during import, the system first attempts to match records based on external IDs. The whitelist parameter lets you narrow the list of verified IDs to selected ones (e.g. ORCID, POL-on). Specify the system names of the identifiers from the local systems. Once the parameter is set, the system verifies only the indicated IDs during import and ignores the others. If the whitelist is used, the blacklist is unnecessary. |
Push record status | Do not change / Complete / Incomplete |
Detect nested dependencies | This parameter is especially important when importing hierarchical data. It guarantees that a master record is imported before the records in which it is nested (as long as it is in the list of records to be imported). |
Save without matching | The option allows you to disable deduplication mechanisms, which speeds up the import process. |
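The interaction between the external ID whitelist and blacklist described in the table can be sketched as follows. This is a minimal illustration, assuming external identifiers are available as a mapping from identifier system name to value; the function name and the `STAFF_ID` identifier are hypothetical.

```python
def ids_used_for_matching(external_ids, whitelist=None, blacklist=None):
    """Return only the external IDs that should take part in duplicate matching.

    external_ids: dict mapping an identifier system name to its value.
    If a whitelist is given, only whitelisted names are kept and the
    blacklist is not consulted (the table calls it unnecessary in that case).
    """
    if whitelist:
        allowed = set(whitelist)
        return {name: value for name, value in external_ids.items() if name in allowed}
    blocked = set(blacklist or [])
    return {name: value for name, value in external_ids.items() if name not in blocked}


record_ids = {
    "ORCID": "0000-0000-0000-0000",  # placeholder value
    "POL-on": "12345",               # placeholder value
    "STAFF_ID": "A-17",              # hypothetical local identifier
}

# Blacklist: ignore the local identifier, keep everything else.
by_blacklist = ids_used_for_matching(record_ids, blacklist=["STAFF_ID"])

# Whitelist: verify only ORCID, ignore all other identifiers.
by_whitelist = ids_used_for_matching(record_ids, whitelist=["ORCID"])
```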
When setting import options for copying data between environments, the following settings are recommended:
update records: OVERWRITE
update fields: OVERWRITE
duplicate records: SKIP
duplicate fields: SKIP
Duplicate record min confidence - the minimum match percentage at which an imported record and an existing record are considered duplicates. The higher the value, the more elements must match.
Ambiguous record min confidence - the minimum match percentage at which records are considered ambiguous and are referred to the editor for a decision.
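One possible reading of how the two confidence thresholds work together is sketched below. It assumes that the match score is a percentage from 0 to 100 and that the ambiguous threshold is not higher than the duplicate threshold; neither assumption is confirmed by this page, and the function name and threshold values are hypothetical.

```python
def classify_match(score, duplicate_min, ambiguous_min):
    """Classify a match score (0-100) against the two confidence thresholds."""
    if not 0 <= ambiguous_min <= duplicate_min <= 100:
        raise ValueError("expected 0 <= ambiguous_min <= duplicate_min <= 100")
    if score >= duplicate_min:
        return "duplicate"   # handled by the duplicate record rules
    if score >= ambiguous_min:
        return "ambiguous"   # referred to the editor for a decision
    return "no_match"        # treated as a new record


# Hypothetical thresholds: 90% for duplicates, 60% for ambiguous records.
results = [classify_match(s, duplicate_min=90, ambiguous_min=60) for s in (95, 75, 40)]
```

Raising the duplicate threshold in this model moves more records into the ambiguous band, which increases the number of decisions left to the editor.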
Defer questions to the user at the end of the import - when this option is checked, the import does not finish on its own: the questionable records are held until the end and wait for the editor's decision.
When setting up the import for copying data between environments, it is recommended to check the "Save without matching" flag. The system then does not verify whether the record being imported is a duplicate; it simply overwrites what it finds in the system based on the record ID.
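The recommended settings for copying data between environments can be written down as a simple configuration sketch. The key names below are hypothetical and do not reflect the system's actual configuration format; only the option values come from this page.

```python
# Hypothetical key names; values are the recommendations from this page.
COPY_BETWEEN_ENVIRONMENTS = {
    "update_records": "OVERWRITE",
    "update_fields": "OVERWRITE",
    "duplicate_records": "SKIP",
    "duplicate_fields": "SKIP",
    "save_without_matching": True,
}

# Values documented on this page for each option.
ALLOWED_VALUES = {
    "update_records": {"OVERWRITE", "SKIP", "ASK_USER"},
    "update_fields": {"OVERWRITE", "SKIP", "FILL_EMPTY", "ADD_NEW_VALUES",
                      "OVERWRITE_BY_NOT_EMPTY"},
    "duplicate_records": {"OVERWRITE", "SKIP", "ASK_USER"},
    "duplicate_fields": {"OVERWRITE", "SKIP", "FILL_EMPTY", "ADD_NEW_VALUES"},
}


def find_invalid_options(options):
    """Return a list of messages for options whose value is not documented."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        value = options.get(key)
        if value not in allowed:
            problems.append(f"{key}: {value!r} is not one of {sorted(allowed)}")
    return problems
```

A check like `find_invalid_options(COPY_BETWEEN_ENVIRONMENTS)` returns an empty list for the recommended profile and reports any option set to an undocumented value.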
The import options also include the flag "Save log of the import process". If it is unchecked, no additional logs are saved after each import.