Structuring your import CSV to indicate complex relationships

by Rachel Lynn — May 24, 2022

We have talked with some of our friends and colleagues who manage library and cultural heritage repositories about their import experiences and pain points. With customer feedback and our own experience importing content in mind, we decided to focus some design sprints on improving the bulk import process. Our goal: Make the CSV import process intuitive and efficient to use.

As discussed in Preparing your import, the Tenejo bulk importer supports structured content with complex “parent/child” relationships. In this post I’ll discuss in finer detail how the process works. I’ll walk through sample content to show how I would set up my CSV with Collections, Works and Files so they display in the repository with the intended relationships.

Use case: I have three Collections containing multiple Works (or objects), with varying numbers of files per Work. I want to import everything at once, even though the relationships are complex. Some Collections have child Works with associated files. One Collection has two child Collections, each of which has its own child Works and associated files (see figure 1).

Fig.1 - Sample Content Summary

Here is what the CSV looks like (see figure 2). For the sake of this exercise, I’m only showing the columns that demonstrate the complex relationships.

Sample Import CSV

Fig.2 - Sample Import Content

For the importer to understand these complex relationships, each object needs a unique identifier. A child object then points to its parent by listing the parent's identifier as its own Parent identifier in the CSV.

The Identifier for the Pamphlet Collection [Pamphlet000] is used as the Parent identifier for its child Works: Dainty Desserts for Dainty People and Party Plans for Food and Games.
Likewise, the Identifier for the Postcard Collection [PC000] is used as the Parent identifier for its child Collections: Greater Minnesota Postcard Collection and Minneapolis Postcard Collection.
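
If you want to check the identifier and parent identifier pairing before you import, a short script can help. The following is a minimal sketch in Python, not part of Tenejo. The column names (object_type, identifier, parent, title) and every identifier other than Pamphlet000 and PC000 are assumptions made up for illustration, so adjust them to match your own CSV headers. It flags duplicate identifiers and any parent value that does not match an identifier elsewhere in the file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. Only Pamphlet000 and PC000 come from the post;
# the column names and the other identifiers are invented for illustration.
SAMPLE_CSV = """object_type,identifier,parent,title
Collection,Pamphlet000,,Pamphlet Collection
Work,Pamphlet001,Pamphlet000,Dainty Desserts for Dainty People
Work,Pamphlet002,Pamphlet000,Party Plans for Food and Games
Collection,PC000,,Postcard Collection
Collection,PC100,PC000,Greater Minnesota Postcard Collection
Collection,PC200,PC000,Minneapolis Postcard Collection
"""


def build_children(rows):
    """Map each parent identifier to the identifiers of its direct children."""
    children = defaultdict(list)
    for row in rows:
        parent = row["parent"].strip()
        if parent:
            children[parent].append(row["identifier"].strip())
    return dict(children)


def find_problems(rows):
    """Return messages for duplicate identifiers and unknown parents."""
    problems = []
    seen = set()
    for row in rows:
        identifier = row["identifier"].strip()
        if identifier in seen:
            problems.append(f"duplicate identifier: {identifier}")
        seen.add(identifier)
    for row in rows:
        parent = row["parent"].strip()
        if parent and parent not in seen:
            problems.append(f"{row['identifier'].strip()}: unknown parent {parent}")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
children = build_children(rows)
problems = find_problems(rows)

for parent, kids in children.items():
    print(parent, "->", ", ".join(kids))
print("problems:", problems or "none")
```

To run it against a real file, replace the io.StringIO(SAMPLE_CSV) call with an opened file handle. An empty problems list only means the identifiers are internally consistent; it does not guarantee the importer will accept the file.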

A note about files. We found that when a Work has only a few files (i.e., only a few file names in a given cell), using the “pipe tilde pipe” separator (|~|) works well. See the postcards in the example above, which have only two files each. However, when a Work has many files, like the pamphlets, a single packed cell is really hard to scan for mistakes. In that case, you can unfurl the files so each one occupies its own line in the CSV. It’s much easier to scan down a column of file names and spot inconsistencies, which speeds up proofing and correcting long CSVs prior to import.
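
For proofing, you could also unfurl a packed cell with a few lines of Python. This is a sketch under assumptions, not Tenejo code: the column names (identifier, files) and the file names are hypothetical, and it simply repeats the other columns on each new row. Check how your importer expects unfurled rows to be filled in before importing the result.

```python
import csv
import io

SEPARATOR = "|~|"  # the "pipe tilde pipe" separator described in the post

# Hypothetical sample row; the column names and file names are invented.
SAMPLE_CSV = """identifier,files
PC101,pc101_front.jpg|~|pc101_back.jpg
"""


def unfurl_files(rows, files_column="files"):
    """Yield one row per file name, copying the other columns unchanged."""
    for row in rows:
        names = [n.strip() for n in row[files_column].split(SEPARATOR) if n.strip()]
        if not names:
            # Rows without files (for example Collections) pass through as-is.
            yield dict(row)
            continue
        for name in names:
            yield {**row, files_column: name}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unfurled = list(unfurl_files(rows))

for row in unfurled:
    print(row["identifier"], row["files"])
```

Scanning the printed column of file names makes a stray extension or a misnumbered file easier to catch than reading one long cell.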

The above use case shows just some of the many complex relationships that can exist in a repository. Feel free to make your own CSV to play around with some of your content. Let us know if you come up with questions or have any feedback.

Summary: Tenejo supports structured content with complex relationships via the CSV bulk importer. The sample above demonstrates how to structure the CSV to accommodate the complex relationships in your Collections, Works and files so they display as intended in your repository.

Interested in learning more about Tenejo? Let's chat. Check out the links below or shoot me an email.