Best Practices in Using Generative AI in Research

The Research Working Group’s view of best practices is largely informed by the University of Illinois Urbana-Champaign (UIUC) Library’s Guide on Generative AI and the University of Illinois System Policy on Integrity in Research and Publication.

Principles 

The responsible and ethical use of generative AI (GenAI) in research can be viewed as a special case of “responsible conduct of research” (RCR) principles and practices (e.g., Shamoo & Resnik, 2022).  These general principles include honesty, carefulness, transparency, accountability, confidentiality, fair use, and social responsibility. Furthermore, the responsible and ethical use of GenAI in research mirrors much of its use in teaching and learning.  The UIUC Research Working Group concurs with the UIUC Teaching and Learning Working Group’s principles of supporting generative AI literacy; fostering academic integrity; supporting appropriate roles for humans and their generative AI tools; and supporting equity, inclusion, access, privacy, and security. 

Best Practices for Effective and Ethical Use of Generative AI in Research 

We describe best practices at two levels: the individual researcher/creator and the organization/university. 

Best Practices for Individual Researchers/Creators 

Choose Wisely for Your Own Protection 

To the best of your ability, investigate the capabilities, limitations, and terms of service of your chosen generative AI tools. Choose the tools that best fit the task at hand and that meet your own ethical standards of conduct. In the same way that you might prefer fair trade coffee or sustainably made products, you may prefer to use products that are, or claim to be, “responsible AI.” This point is revisited below in the section on Organizational Best Practices.

Terms of service cover copyright, privacy, data ownership rights, and use of your data by the third-party GenAI provider. For example, to protect the privacy of your data, you should understand how to access, control, and delete your data held by the third-party GenAI provider. Unclear, ambiguous, or exploitative terms of service should be rejected. A number of third-party GenAI providers offer flexible options to opt out of data sharing and history tracking; use them!

Respect the Context for Dissemination in your Community 

As an author or creator, you should verify acceptable practices in the venue where you plan to publish or exhibit. Consider the publication ethics guidance for the particular journal or professional society where you plan to publish your work (e.g., Ganjavi et al., 2024). Consider the criteria for the particular venue for juried exhibitions. Follow the guidance for your specific context.

For example, as of June 2024, Springer Nature’s editorial policy on AI authorship states that Large Language Models (LLMs), such as ChatGPT, do not qualify as authors, but that their use should be properly documented in the Methods section of the manuscript. Generative AI images are not allowed unless obtained from agencies with which Springer Nature has a contractual relationship and that have created the images in a “legally acceptable manner.”

As another example, as part of its publication ethics policy, Elsevier’s AI author policy states that authors may use generative AI and AI-assisted technologies in the writing process before submission, but “only to improve the language and readability of their paper and with the appropriate disclosure,” per Elsevier’s Guide for Authors. Editors should not upload a submitted manuscript into a generative AI tool, even for the purpose of improving language and readability. This policy does not cover spelling and grammar checkers, nor “reference managers that enable authors to collect, organize, annotate and use references to scholarly articles — such as Mendeley, EndNote, Zotero and others.”

A third example is from the Institute of Electrical and Electronics Engineers (IEEE), a professional society that publishes more than 150 journals, transactions, and letters in a wide variety of engineering disciplines. IEEE author guidelines state that the use of AI-generated content (text, images, figures, code) should be disclosed in the Acknowledgements section of the manuscript. 

In the humanities, PMLA permits content created by an AI tool provided it is fully cited in the manuscript at submission, but the journal does not provide extensive guidelines on how AI-generated text should be documented.

Balance the Use of Generative AI Tools with Your Own Responsibility as an Author/Creator 

Most, if not all, publishers and professional societies do not outright prohibit the use of generative AI tools. Instead, the general guidance is to use generative AI tools for high-level “gists” or editorial tasks. “Gists” include using generated text to get a brief introduction to a topic, to assist in brainstorming and ideation, to help discover and understand existing literature in an area, and to produce summary tables or outlines. Editorial tasks include finding synonyms or keywords, copyediting (e.g., spelling and grammar checks), and suggesting stylistic changes (e.g., “make this more formal”).

However, as a researcher, you have an ethical obligation to check the provenance, quality, and sources of the outputs that any tool provides. You should verify facts and citations against reliable sources; ensure that proposed citations to the literature are accurate; maintain the privacy of high-risk, sensitive, and internal data as appropriate; and assess outputs for bias. Bias assessment is a complex task that may demand organizational resources to support individual researchers. In any case, you as the author or creator are ultimately responsible for your work.
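
Some of this verification can be partially automated. The sketch below is a minimal, hypothetical example of spot-checking whether a proposed citation’s DOI actually resolves, using the public Crossref REST API (api.crossref.org); a resolving DOI is necessary but not sufficient, so you must still read the source to confirm it supports the claim.

```python
# Minimal sketch (assumes the 'requests' package is installed): check
# whether a DOI suggested by a generative AI tool resolves in Crossref.
import requests

def check_doi(doi: str) -> None:
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        print(f"{doi}: not found; the citation may be fabricated")
        return
    title = resp.json()["message"].get("title", ["(no title)"])[0]
    print(f"{doi}: resolves to '{title}'; confirm it matches the cited claim")

check_doi("10.1136/bmj-2023-077192")  # DOI of Ganjavi et al. (2024), cited above
```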

Disclose Your Use of Generative AI in your Research Process 

Consistent with the general principles of honesty, transparency, and accountability, you should disclose your use of any generative AI tools in your publications and creative works. Research journals and creative venues vary in their requirements, so the following is a broad list of what to disclose, to be tailored to the applicable journal or venue:

  • Summarize why and how you used generative AI tools in your work; clearly explain which research tasks you performed with them. 
  • Provide full citations (tool name and version) for the generative AI tools, for example in MLA style (see the sample citation below). 
  • Document the dates and timestamps of your uses of the generative AI tools. 
  • Record the prompts that you gave to the generative AI tools. 
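
For illustration, the MLA Style Center recommends citing a generative AI tool by treating the prompt as the title of the source; the details below (prompt text, version, and date) are hypothetical:

  “Summarize the main methodological debates in soil microbiome research” prompt. ChatGPT, 4 July version, OpenAI, 15 July 2024, chat.openai.com.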

You also have an obligation to archive a copy of the unedited generative AI outputs for full traceability and to provide the copy when requested. 

Protect Your Work Online 

As an author/creator, you may be concerned about your work being reused without your permission by generative AI technologies. You can take a number of actions to protect your work online, such as choosing publishers that allow authors to opt out of data mining and avoiding websites and publishers whose terms of service allow data mining or are unclear on data privacy. Specific tactics include using a robots.txt file to discourage web crawlers from accessing your site (see the sketch below); protecting text by substituting other symbols for English letters (for example, replacing the English ‘a’ with a Cyrillic ‘а’, although that text then becomes inaccessible to screen readers); protecting images with products such as Glaze or Nightshade; and protecting audio files with products such as AntiFake.
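
A minimal, hypothetical sketch of the robots.txt tactic follows. The crawler names (GPTBot, CCBot, Google-Extended) are believed current as of this writing but change over time, so check each provider’s documentation; note also that compliance with robots.txt is voluntary on the crawler’s part.

```python
# Hypothetical sketch: write a robots.txt asking known AI training crawlers
# not to fetch any pages. robots.txt is advisory, not an enforcement
# mechanism; well-behaved crawlers honor it, others may not.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]  # names may change

rules = "\n".join(f"User-agent: {bot}\nDisallow: /\n" for bot in AI_CRAWLERS)

with open("robots.txt", "w", encoding="utf-8") as f:
    f.write(rules)
```

Place the resulting file at the root of your site (e.g., example.com/robots.txt).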

When publishing your work, review the author contract carefully for any clauses that permit the use of your work for text mining or language-model training. Keep in mind that academic publishers increasingly use AI-powered chatbots to help scholars find relevant articles, so opting out of text mining may reduce the discoverability of your research and negatively affect its impact. 

Use Best Practices for Prompt Engineering 

There is emerging guidance on how to write ‘good’ prompts that elicit useful results from a generative AI tool (Mollick, 2024; OpenAI, 2024). Similar to guidance on how to write a good set of instructions, principles of good prompt engineering include the following (see the sketch after this list): 

  • Be clear and specific. 
  • Use the imperative mood (i.e., state your prompt as a direct command rather than an open-ended statement or question). 
  • Use positive language, not negative language (i.e., state what you want, not what you don’t want). 
  • Break down complex questions into smaller parts. 
  • Engage in iterative testing and refinement. 
  • Do not violate any existing privacy or security laws, policies, guidelines, or measures; for example, do not base prompts on protected, sensitive, or high-risk data. 
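
The sketch below contrasts a vague prompt with one that follows these principles. The model name and the OpenAI Python SDK call are illustrative assumptions, not endorsements; substitute whatever tool your unit has approved.

```python
# Hypothetical sketch using the OpenAI Python SDK (v1.x); the model name
# and the task are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Vague: a question, negative framing, several asks tangled together.
vague_prompt = "Can you look at survey research and not be too technical?"

# Refined: imperative, specific, positively phrased, broken into steps.
refined_prompt = (
    "List the three most common methodological pitfalls in survey-based "
    "research. Write in formal academic English for a general scientific "
    "audience. For each pitfall, give a two-sentence description and one "
    "suggested mitigation."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": refined_prompt}],
)
print(response.choices[0].message.content)
```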

Organizational Best Practices 

In addition to individual researchers and creators performing due diligence in their use of generative AI tools, the organization itself should provide consistent guidance to relieve administrative burdens on individual researchers and creators. The university already has legal obligations to protect data, for example, regulated data (student data subject to FERPA, health data subject to HIPAA) and intellectual property. 

The University of Illinois Urbana-Champaign now has several approved vendors of generative AI tools whose terms of service are consistent with the University’s acceptable use and privacy policies. Other vendors and their products are under consideration. The National Center for Supercomputing Applications (NCSA) has developed its own chatbot infrastructure, uiuc.chat. 

The University of Illinois Urbana-Champaign provides numerous training and engagement opportunities and infrastructures, such as the Scholarly Commons, Library Guides, the Researcher – Technology Resource Guide, the Research IT portal, LinkedIn Learning, Coursera, and the OVCRI Research Training Portal. The University’s world-class research portfolio in artificial intelligence, machine learning, and related topics offers many sophisticated training and collaboration opportunities as well, including the Department of Computer Science, the Department of Electrical and Computer Engineering, the School of Information Sciences, NCSA’s Center for Artificial Intelligence Innovation, the C3.ai Digital Transformation Institute, the INVITE Institute, and AIFARMS. NCSA is also part of the National Science Foundation’s National Artificial Intelligence Research Resource (NAIRR) Pilot program. 

A broad range of training, coaching, and communications is needed to support the many kinds of research on campus. 

In the future, institutions can also provide a consistent approach to verifying ethical use (e.g., FairlyTrained.org), detecting biases, and validating the performance of generative AI tools against established metrics (such as the OECD’s catalogue of technical metrics for trustworthy AI). 

References 

Ganjavi C, Eppler MB, Pekcan A, Biedermann B, Abreu A, Collins GS, Gill IS, Cacciamani GE. Publishers’ and journals’ instructions to authors on use of generative artificial intelligence in academic and scientific publishing: bibliometric analysis. British Medical Journal. 2024 Jan 31;384:e077192. doi: 10.1136/bmj-2023-077192. PMID: 38296328; PMCID: PMC10828852. 

Mollick, E. Captain’s log: the irreducible weirdness of prompting AIs. One Useful Thing. 2024 March 4. https://www.oneusefulthing.org/p/captains-log-the-irreducible-weirdness. Retrieved August 13, 2024. 

OpenAI. Best practices for prompt engineering with the OpenAI API.  2024. https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api. Retrieved August 13, 2024. 

Shamoo AE and Resnik DB. Responsible Conduct of Research (fourth edition). 2022. Oxford University Press.