NIH DMS Webinar I Q&A

Q: What are the main differences between the new policy and the 2003 data sharing policy?

A: The main difference between the 2003 data sharing policy and the 2020 Data Management and Sharing Policy is that the 2003 policy applied only to applications seeking $500,000 or more per year, while the 2020 Data Management and Sharing Policy has no cost threshold, so it applies much more broadly. I think the other important distinction is that the 2003 policy expected only a relatively simple data sharing plan and did not spell out in as much detail what NIH expects people to describe.

Q: What are some of the expectations around sharing qualitative data?

A: Taunton Paine: Yes, sure. In terms of qualitative data, the policy does apply to studies that generate qualitative data, if the qualitative data in question meet the definition of scientific data, which the policy defines as data of sufficient quality to validate and replicate research findings. Qualitative data are often shared, and there are existing repositories that can accept them. But just as for other kinds of scientific data, we've provided a number of justifiable limitations and additional considerations for sharing that would also apply to qualitative data. I also want to point out, and I think this was mentioned during the webinar as well, that earlier this year we published for public comment a draft framework for protecting research participant privacy while sharing data. We will be finalizing that framework and will discuss privacy protections in more detail in the second webinar.

Q: We received a few questions about sharing software, particularly code that an investigator may have written themselves. Does that fall under what we expect to be shared?

A: The Data Management and Sharing Policy does not expect researchers to maximize sharing of software or code, whether it is used in the research or is itself a product of the research. However, the elements of a Data Management and Sharing Plan include one that relates to software or code needed to access or use the data, so there is an expectation that people will disclose whether such tools, software, or code exist, where they can be found, and whether they will be made available. Separately, NIH recently published a set of frequently asked questions on best practices for sharing software and code in those instances when they are shared.

Q: Regarding selecting an appropriate repository, this first question concerns generalist repositories like Figshare. If folks are already using one, should they be looking for more specific, discipline-specific repositories, or is it okay to continue using generalist repositories?

A: The policy does not differentiate between or express any preference among repositories, so it really comes down to investigators weighing the practical considerations that matter in selecting a repository and looking for repositories that meet the kinds of criteria we discussed earlier. In the end, the question is which repositories investigators feel can best handle their data and best maximize its usability in line with the FAIR principles, and, in terms of resources and budget, which combination of repositories makes sense given the scope of the project.

Q: What about very large data sets, such as imaging or genomic data? Some repositories may not be able to handle the size and the file types. What should folks be thinking about?

A: I think the ecosystem of repositories is constantly changing, and for discipline-specific and data-type-specific needs, it is possible to find repositories that can handle that data. Investigators can also discuss with project officers the options for particular data types and where best to share them successfully.

A: As mentioned earlier, one of the resources NIH maintains is a sortable, filterable list of repositories, so if folks are looking for a place to start, that is a great place to look, and there are also some other lists there. As you said, the ecosystem is large and constantly changing, but we're trying to help people navigate it as much as possible, and the lists are on the sharing website.

Q: What if the data generated can't be shared through a repository? For example, the data can't be de-identified, but could be shared under a data use agreement or some other arrangement outside a repository. What should folks be considering, and for how many years must the institution retain the data?

A: The policy is flexible about how data are most appropriately shared for a given project, so the Data Management and Sharing Plan should discuss, given any limitations specific to the project, the best proposed approach for sharing its data. How long data should be shared depends on a number of considerations. If the data are shared in a repository, the repository may have its own best practices. Institutions might also have their own policies with requirements for retaining data and records, and there might be other applicable laws and regulations, so several different considerations can drive how long data should be shared.

Q: Regarding funding and budgets: if the expenses of maintaining infrastructure for sharing data will extend beyond the project period, how should people account for that type of cost in their budget and funding requests?

A: The timing of sharing is flexible and left up to what makes sense for each project, but there will certainly be cases, probably most cases, where you would want to share data beyond the project period to ensure that the data you generated remain useful. In that case, all costs to support sharing, such as repository fees, must be incurred during the period of performance, and that includes costs for scientific data that will be preserved and shared beyond the award period. For example, if your DMS Plan proposes to preserve and share scientific data for 10 years in an established repository that charges a data deposition fee, the cost for that entire 10-year period must be paid before the end of the period of performance.

Q: Allowable and unallowable costs were mentioned, as well as the difference between general infrastructure for conducting the research and something very specific like a repository fee. What are those distinctions?

A: When you're planning the overall budget for your project, think about the different activities you'll need to carry out to preserve and share your data. Costs that are specific to your project will most likely be direct costs included in your budget request. Other types of costs, for example institutional data librarians who help across projects, might be funded out of your institution's indirect costs if your institution has such resources available. We can't answer every individual question about different situations, but generally the distinction is between project-specific data management and sharing costs and institutional overhead costs that are spread across multiple projects. However you account for those costs, keep in mind that you need to be consistent in how you apply direct and indirect costs throughout your budget, including your DMS costs.

Q: Will NIH be funding larger amounts of money per award to account for these costs?

A: We understand that thinking about these costs separately is going to be new for many, and we want to make you aware that NIH provides many resources to support data management and sharing generally, including repositories, tools, and, in certain cases, funding opportunities to support data sharing. NIH will continue to assess data management and sharing costs to determine appropriate resources in the future. This question also brings to mind questions we've been getting about whether NIH will be raising various caps, thresholds, and limits, and on that question I'd like to turn to OPERA for any insight.

As mentioned, this is a new policy, and we're using this initial year to work out the kinks and assess how we're going to move forward. With that said, NIH is currently finalizing strategies for addressing additional costs that may be incurred either during or after the project period, and we'll provide more formal guidance on that in the near future.

Q: Regarding research involving humans: DMS plans ask about some things that your IRB would also review. Should folks be working with their IRBs before applying and before creating their DMS plan? What would you suggest?

A: NIH doesn't require IRBs to review DMS plans, but part of the discussion [Indistinct] of the plan will include consideration of limitations on data sharing, and those limitations often come from informed consent or from IRB review. So as investigators develop their plans, if there are any questions about the appropriate applicable limitations, that's a good time to reach out to the IRB for clarification and discussion. Another point at which to do so is when developing informed consent forms and considering appropriate language to describe future data sharing. In May 2022, NIH issued sample informed consent language related to data sharing and future use of data, which is a good resource for that.

Q: How does NIH intend to approach data sharing requirements when researchers work with sovereign Native nations? I know you've been developing some additional guidance on that.

A: There are several relevant provisions in the policy and in related guidance for researchers working with tribes. To restate what's in the policy and the guidance: NIH respects and recognizes tribal sovereignty and the concerns that American Indian and Alaska Native communities sometimes have about data sharing, and we've indicated through an FAQ on the policy that justifiable limitations on data sharing include cases where explicit tribal laws, regulations, or policies prohibit or limit disclosure of data in some way. Earlier this year, NIH proposed additional considerations for working with tribes in draft supplemental information on responsible management and sharing of such data. As was mentioned during the webinar, a central tenet of these proposed considerations is to facilitate respectful partnerships and mutually agreed-upon data management and sharing practices, which may ultimately require limitations on sharing in some cases. We're working to finalize this document and plan to address these factors in more detail in the second webinar.

Q: Will there be additional guidance on controlled access sharing and associated data use agreements?

A: Among the draft supplemental information published earlier this year, there's a second document, on protecting research participant privacy, that goes into more detail about what NIH considers best practices and operational principles for sharing human research participant data. It also tries to provide a set of considerations for when it would be warranted to share data through a controlled-access mechanism, along with some thoughts on the use of data use agreements. We'll be finalizing that later this year as well and will try to take it up in more detail in the second webinar too.

Q: What about a competitive renewal? Are there any implications for data generated during the prior award period?

A: For projects with an ongoing award, you won't be expected to satisfy this policy retroactively. You come under the policy when you apply for a competitive renewal, and in that case we do expect you to propose a data management and sharing plan. The policy is quite flexible, so we'd like you to think about what data you will be generating. For projects that have been ongoing, have generated some data, and will continue to add to it, you might need to think this through a little differently than for a completely new project. The idea is that we're looking for proactive planning: when you come in for the next segment, think about the data you'll be generating and whether there are any considerations about the data you have already generated and will continue to use. It's a little hard to answer in general because it will vary by situation, but the idea is that, looking ahead, you'll be considering what data it might be appropriate to share.

Q: What about the allowability of hyperlinks in a DMS plan?

A: Sure. Hyperlinks are not allowed in DMS plans unless they're specifically permitted in a particular funding opportunity announcement or in the form field instructions, so you should not include hyperlinks in your DMS plan. You'll more likely be telling us what you've generated and where it's shared in the RPPR. Once your plan has been approved and you start generating and sharing data, we certainly want to hear about your progress, and you can point to where shared data are located in your progress reports. But because of the hyperlink policy, please do not include those links in your DMS plan.

Q: It was mentioned under applicability that training and fellowship awards are excluded from this policy. If these recipients will in fact be generating scientific data, will they still have to submit DMS plans?

A: If you are submitting a training or fellowship application, you will not be submitting a DMS plan, because the purpose of training and fellowship programs is to support training rather than research. Consistent with other NIH data sharing policies, the T's and the F's will not be subject to the DMS Policy. Training programs support stipends, tuition and fees, training-related expenses, travel, and facilities and administrative costs, and fellowship programs support stipends, tuition and fees, training-related expenses, and travel. These awards are not research-focused and do not support the generation of scientific data, so they are not subject to this policy. More generally, I've seen a number of questions about specific activity codes. NIH will continue to review the full scope of activity codes, will update the FAQs to communicate which ones are and are not subject to the policy, and will post more information on the website when it is available.

Q: Regarding the timing of sharing, must data be shared upon the first publication, or is it acceptable to share data once all publications have been submitted and approved?

A: The Data Management and Sharing Policy indicates that data should be shared no later than the time of the associated publication in a peer-reviewed journal. Data not associated with a publication should be shared by the end of the period of performance of the award.

Q: Who is responsible for confirming that the PI has complied with the plan? Is it NIH, or is it the institution?

A: For compliance, what we're looking for is that you have followed through on what you indicated in the approved version of your plan. Exactly what will be looked at will vary, but generally we expect plans to include things like timelines for generating data and depositing it in repositories. On an annual basis, as part of the RPPR reporting process, we'll ask recipients to report on their progress implementing their DMS plan. The NIH staff who oversee the award will review that information, check whether you did what you said you would, and work with you if there are any challenges. If anything changes over time, you will have the opportunity to update your plan as appropriate. As for who is responsible, NIH will certainly monitor your compliance, but we also ask you to indicate in your plan who at your institution will be responsible on your side, so it really is a partnership.

Q: Will data-sharing plans be made public, for example, in RePORTER or some other way?

A: We are looking to learn the best way to do this. We would like to make the plans public in a way that makes the most sense and that helps individuals locate data they may want to use in their research, for example. So we'll be looking at the plans we receive and considering how and where that information could best be made available. That question remains open, but we're definitely interested in it.

Q: What about negative data generated during a project that aren't going into a publication? Will those be expected to be shared?

A: The policy was pretty clear when it was published in October 2020 that NIH wants to include things like null results, and the data underlying them, under this policy. So if such data are not published, they would be expected to be shared by the end of the period of performance.

Q: Regarding consent: what if participants do not want their data shared? Must a participant agree to having their data shared before participating in the research?

A: This is something described in the protocol and the informed consent document, so informed consent documents should make very clear whether participation in a particular project involves sharing data. Consent forms often distinguish which aspects of participation are required and which are optional, and having data shared in the future could sometimes be optional. But it really is specific to a particular protocol, as described in the informed consent.

Q: Is a preprint considered published with regard to the time point for sharing?

A: The policy describes the time point for sharing data underlying a publication as the time of its peer-reviewed publication, in whatever format that publication takes. So it is correct to say that a preprint, which is not peer-reviewed, does not trigger that time point.

Q: What happens if the submitted DMS plan is incomplete or NIH does not deem it fully acceptable?

A: We want to provide as many resources as we can now and to point you to folks at your institution who can help you write a good DMS plan the first time. But we understand there will be a learning period at first, since this is a new activity you might not have done before. When the program staff assigned to your application assess your DMS plan, they will look at the information you provided to ensure that the elements of the plan have been adequately addressed and to assess the reasonableness of your responses. If information is missing and they're unable to determine whether the plan is reasonable, you can expect NIH staff to reach out to you during the just-in-time process to ask for clarification. In some cases, they might ask for a revised DMS plan, and you would work together to complete it so that it can be approved. Applications selected for funding will only be funded once program staff determine that the DMS plan is complete and acceptable.

Q: Regarding data that don't have an established best practice for sharing, or that may be useful but aren't necessary to replicate a finding: what should folks consider in terms of using a repository versus keeping the data on an institutional server and making them available as needed?

A: There are two questions here: first, whether there are best practices or standards for sharing data, and second, what to do with data that may not meet the policy's definition of scientific data. In the first case, we would generally expect people to describe the standards they anticipate using when sharing data. You can indicate that there are no consensus standards within your scientific field that apply to the data you're proposing to share, but again, that applies to scientific data. Where you have other data that might not meet the definition of scientific data, for instance because they are not necessary to validate or replicate research findings, the policy would generally not expect those to be shared or addressed in plans. Nothing prevents you from sharing those data as well. I also want to point out that other NIH policies, such as the NIH Genomic Data Sharing Policy, may expect sharing of data with a potentially broader scope than the Data Management and Sharing Policy, so other policies may expect those data to be shared even though this policy would not necessarily do so.

Our policy also states that scientific data that do not lead to a publication are still expected to be shared, but by the end of the award period, whereas data underlying a publication are expected to be shared as soon as possible and no later than the time of the associated publication. To clarify, that applies to data that meet the definition of scientific data; there may be some cases where you have additional data types that don't meet that definition.