Data migrations, from an eTMF perspective – Part 2

February 14, 2017


Every company using an eTMF system has conducted (or soon will conduct) a study migration project of some scale. Whether it is finding a way to move an entire paper TMF into their new eTMF system, moving a CRO-conducted study into their eTMF, or transferring studies from a legacy eTMF system into the Wingspan eTMF, study migrations are an integral part of ensuring Completeness, Consistency and Compliance in your eTMF processes.

I sat down with Lou Pasquale, a Senior Director in Wingspan’s Professional Services organization, to learn more about his experiences leading various eTMF study migrations.

In the first part of this interview, Lou described common migration scenarios and some of the obstacles you can expect to face. Here in Part 2, Lou explains how Wingspan has refined its migration process to address these common migration obstacles.

Can you describe the strategy behind Wingspan’s migration process?

Our strategy is based on the assumption that a migration will be primarily automated with some manual components. TMFs can be quite large, and a migration conducted at any point after Study Start-up involves so many documents and so much metadata that an automated migration is almost always necessary.

How would you outline the steps in a Wingspan migration project?

There are four major steps in a migration project:

  1. Study Prioritization
  2. Document type mapping and content transformations
  3. Data dictionary alignment and metadata mapping
  4. Batching the migration


How do you prioritize studies?

We begin by determining which studies are in scope and which are out of scope for the migration. This allows us to set realistic timelines and ensures all subsequent activities focus on the proper studies.  During this phase, determining which studies will be migrated using an automated approach (rather than a manual migration) is very helpful.

We then group the studies in a logical manner that allows reuse of setup and tool configuration steps.  For example, all studies from the same source system or same CRO will likely have similar content and data structures and fairly comparable data quality.  Perhaps all Phase 3 studies will be migrated first, with a further refinement by CRO or source system.  Studies grouped in this manner will probably require the same (or very similar) data setup, cleanup, and document type / attribute value mappings, so a fair amount of the time spent on these activities can be leveraged for all studies in the group.  With any luck, the project can also work with the same point of contact at the CRO or within the client’s organization to resolve questions about the data and content.

Some migrations add another level of grouping by prioritizing ongoing studies, closed studies or by focusing on pivotal studies that have a more immediate impact on business operations.  If a prioritization approach is used it can be laid on top of the logical study grouping to determine the overall order for study migrations.  Priorities may change over the course of a large project to migrate many TMFs, but the decisions made during the logical grouping and initial prioritization can inform subsequent decisions about which studies are the current focus for the migration.

Once the studies are prioritized and grouped, how do you map the document types?

Each document type from the source system is mapped to a specific document type in the target eTMF.  The TMF Reference Model has made type mapping much more straightforward.  A source system that closely maps to the Reference Model is more easily migrated, since much of the initial data analysis is already complete, as reflected by the Reference Model ID mappings in the source system(s).

What common obstacles do you face during document type mapping?

Oftentimes type mapping will identify some ambiguous mapping scenarios where a single document type in the source system maps to more than one document type in the target eTMF.  In these cases, some other data point in the source document’s data will be necessary to define the proper target type.  In the worst cases, source documents that do not cleanly map to a target document type may need to be held aside for a “hard” mapping to a specific target document type (e.g., migrate this document ID to that target document type).  Ultimately, if a document cannot be mapped successfully it can be withheld from the automated migration for manual processing during the remediation phase of the migration.  One-to-one mapping on a per-document basis is labor intensive and subject to inconsistencies if the manual migration steps are not tightly controlled. It should be avoided whenever possible.
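The mapping logic described above can be sketched in a few lines. This is a hypothetical illustration, not Wingspan's actual tooling: the table names, the `subtype` disambiguation field, and the document IDs are all invented for the example.

```python
# One-to-one mappings: most source types map cleanly to a single target type.
SOURCE_TYPE_MAP = {
    "Protocol": "Trial Protocol",
    "CV": "Curriculum Vitae",
}

# Ambiguous mappings: a single source type resolves to different target
# types depending on another data point on the document (here, "subtype").
AMBIGUOUS_TYPE_MAP = {
    "Approval Letter": {
        "IRB": "IRB Approval",
        "Regulatory": "Regulatory Authority Approval",
    },
}

# "Hard" mappings for specific documents that fit neither rule
# (migrate this document ID to that target document type).
HARD_MAP = {"DOC-00412": "Note to File"}

def map_document_type(doc):
    """Return the target document type, or None to withhold the document
    from the automated migration for manual remediation."""
    if doc["id"] in HARD_MAP:
        return HARD_MAP[doc["id"]]
    src = doc["source_type"]
    if src in SOURCE_TYPE_MAP:
        return SOURCE_TYPE_MAP[src]
    if src in AMBIGUOUS_TYPE_MAP:
        return AMBIGUOUS_TYPE_MAP[src].get(doc.get("subtype"))
    return None
```

Documents for which the function returns `None` would land in the manual remediation queue rather than failing silently.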

What types of content transformations happen during this step?

Document content file formats require specific analysis. Does the source system have native content files and PDF renditions?  Are documents eApproved or eSigned?  Are signature pages manifested in the source system at time of viewing or are they part of the PDF content file / rendition?  In some cases – usually for closed studies –  only PDF files are migrated while for ongoing studies the native content and PDF renditions are both migrated.  As with all aspects of the migration, specific decisions on the handling of content files during the migration will be recorded in the Migration Plan.

The next step is Data Dictionary alignment and data type/value mapping. What is involved here?

It is during this part of the migration that you get a good picture of the data quality (or lack thereof) in the source system.  Invariably some aspect of the source system’s dictionary values for Vendor Names, Committee Types, Languages, etc. will vary from those of the destination eTMF.  The variations may be minor such as abbreviations in vendor names or they may be more challenging like different field lengths and data types.  All decisions made regarding the data dictionary alignment are recorded in the Migration Plan making it clear how source system data will be represented in the target system.

Not only are mapping decisions recorded in the plan; other changes to the source data are also detailed.  Truncating source data fields to fit in the target system fields, or otherwise changing the data value, is a specific data transformation that occurs quite often in a migration.  The transformation rules will be fully documented in the Migration Plan and some provision may be made to retain the original values in an external data file for later use.
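Dictionary alignment and field truncation, as described above, could be sketched as follows. The alignment table, field name, and length limit are assumptions for illustration only; a real Migration Plan would enumerate these rules explicitly.

```python
# Align source dictionary values (e.g., abbreviated vendor names) to the
# target eTMF's controlled vocabulary. Example values are invented.
VENDOR_ALIGNMENT = {
    "Acme Labs Inc.": "Acme Laboratories, Incorporated",
    "Acme Labs": "Acme Laboratories, Incorporated",
}

# Hypothetical target field length limits.
TARGET_FIELD_LENGTHS = {"comments": 255}

def transform_record(record, audit_log):
    """Apply alignment and truncation rules; append any original values
    that were changed to an external audit log for later use."""
    out = dict(record)
    vendor = out.get("vendor")
    if vendor in VENDOR_ALIGNMENT:
        out["vendor"] = VENDOR_ALIGNMENT[vendor]
    for field, max_len in TARGET_FIELD_LENGTHS.items():
        value = out.get(field)
        if value is not None and len(value) > max_len:
            # Retain the untruncated value, per the Migration Plan.
            audit_log.append({"id": out["id"], "field": field, "original": value})
            out[field] = value[:max_len]
    return out
```

The key design point is that nothing is discarded silently: every truncation leaves a record of the original value in the audit file.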

In some cases data elements of the source system may not have a place to “live” in the target eTMF.  This often occurs when migrating a study from a generic EDMS into a TMF-specific repository.  The generic source EDMS may have required attributes to support source system processing that have no equivalent in the destination eTMF.  Like data truncations or other transformations, this scenario will be documented in the Migration Plan and, if necessary, data not migrated may need to be retained in an external data file for archival purposes and to support inspections.

Conversely, there may be required data elements in the target eTMF that have no corresponding value in the source system.  The process for determining these elements – either specific values or a “data not present” concept – is also documented in the Migration Plan.

Once you have completed all the mapping, you batch the migration. What happens during this step?

Many study TMFs contain tens of thousands of documents.  If you are migrating an entire program of studies, the documents can number in the millions.  Simply “turning on” the migration and letting the tools chug through those millions of documents is rarely advisable.  Daily users of an in-production target eTMF may be impacted by a brute force migration during normal operating hours.  To address this, we break the studies into manageable chunks or batches of documents.  We can also “throttle” the rate at which the migration tools load documents – running things more slowly during peak usage times and then migrating at full speed when most users are not on the system.  This combination of manageable batches and the ability to dial up / dial down the migration rate allows us to keep the migration flowing while impacting the users as little as possible. The migration should also include a Communication Plan to keep users informed of when migration activity is occurring.  This will let them know when they can expect to see a study’s documents in the system but can also help them anticipate when the peak loading times will be in the event certain system functions may be less available (e.g., full-text or advanced indexing features can sometimes lag during large, fast-paced migration activities).
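The batch-and-throttle approach described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the batch size, delay values, and the `load_document` and `is_peak` callables are placeholders, not part of any real migration tool.

```python
import time

def run_migration(documents, load_document, batch_size=500,
                  peak_delay=1.0, off_peak_delay=0.1,
                  is_peak=lambda: False):
    """Load documents in manageable batches, dialing the rate down
    during peak usage hours and back up when most users are off the system."""
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        for doc in batch:
            load_document(doc)
            # Throttle: pause longer per document while daily users
            # are active on the in-production eTMF.
            time.sleep(peak_delay if is_peak() else off_peak_delay)
```

A production tool would also checkpoint progress between batches so a failed run can resume without reprocessing completed documents.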

At the end of the migration, how do you handle documents that were excluded from the automated migration?

On some level, the migration can be thought of as an accounting problem.  X documents for automated migration + Y documents to manually migrate + Z documents that will not be migrated = the total of all documents in the source system.  When the automated migration is complete some number of documents will have failed.  These will be remediated during subsequent automated migration runs or uploaded manually to the target system.  When it is all said and done the “accounting problem” must add up so our team can demonstrate a known outcome for each document in the source system.  This final reckoning is communicated in the Migration Summary report.
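The X + Y + Z accounting check described above amounts to a simple reconciliation. The bucket names and return shape here are illustrative, not the format of an actual Migration Summary report.

```python
from collections import Counter

def reconcile(outcomes, source_total):
    """outcomes maps document ID -> one of 'automated', 'manual',
    or 'not_migrated'. Returns per-bucket counts and whether the
    books balance against the source system's document total."""
    counts = Counter(outcomes.values())
    x = counts.get("automated", 0)
    y = counts.get("manual", 0)
    z = counts.get("not_migrated", 0)
    return {
        "automated": x,
        "manual": y,
        "not_migrated": z,
        # The accounting problem must add up: X + Y + Z == source total.
        "balanced": x + y + z == source_total,
    }
```

If `balanced` is false, some source documents have no recorded outcome and must be tracked down before the migration can be declared complete.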

With such powerful tools to automate migrations, why does your approach rely on some level of manual processing?

Yes, we have made a number of references to manual migrations, either as a component of a larger automated migration or as a planned approach for smaller studies.  The decision to migrate a document manually versus using the automated tools often boils down to a simple cost-benefit analysis.  Are there hundreds of documents that failed the automated migration for the same reason?  If so, perhaps another automated pass is the best approach.  If, however, document failures tend to be more one-off scenarios, then manual remediation may be the preferred approach.  Regardless of why a document is selected for manual migration, the question of who will do this manual work can be a burden to our clients.  To address those concerns, Wingspan offers Full Service Provider (FSP) resources.  These trained Study Owners, Document Specialists, and Contributors can be added to a migration project to handle all manual processing. You can contact us for more information about these services.

Lou, thank you for taking the time to walk us through a Wingspan eTMF migration. 

You’re welcome. While migration projects can seem overwhelming, we have established a very effective process with checkpoints along the way to help our clients better understand where we are in the process and what will happen to their data and documents. Our migrations to-date have been 100% “clean”; that is to say, we have accounted for each source system document with a known outcome in the migration.  As we say, “We have not lost a document yet.”