Abstract
Information Disguise (ID), a part of computational ethics in Natural Language Processing (NLP), is concerned with best practices for textual paraphrasing to prevent the non-consensual use of authors’ posts on the Internet. Research on ID becomes important when authors’ written online communication pertains to sensitive domains, e.g., mental health. Over time, researchers have used AI-based automated word spinners (e.g., SpinRewriter, WordAI) to paraphrase content. However, these tools fail to satisfy the purpose of ID because their paraphrased content still leads back to the source when queried on search engines. There is limited prior work on judging the effectiveness of paraphrasing methods for ID on search engines or their proxies, neural retriever (NeurIR) models. We propose a framework in which, for a given sentence from an author’s post, we iteratively perturb the sentence in the direction of paraphrasing, attempting to confuse the search mechanism of a NeurIR system when the sentence is queried on it. Our experiments use the subreddit “r/AmItheAsshole” as the source of public content and Dense Passage Retriever as a NeurIR-based proxy for search engines. Our work introduces a novel method of phrase-importance ranking using perplexity scores and performs multi-level phrase substitutions via beam search. Our multi-phrase substitution scheme succeeds in disguising sentences 82% of the time and hence takes an essential step towards enabling researchers to disguise sensitive content effectively before making it public. We also release the code of our approach.
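The multi-level phrase substitution via beam search can be illustrated with a minimal sketch. It is not the released implementation, and every name in it is hypothetical: `overlap_score` is a toy token-overlap stand-in for the Dense Passage Retriever similarity, the `candidates` dictionary is assumed to list phrases already ordered by importance (the perplexity-based ranking is not reproduced here), and the replacement phrases are supplied by hand.

```python
from typing import Callable, Dict, List, Tuple


def overlap_score(query: str, source: str) -> float:
    """Toy stand-in for a retriever score: Jaccard overlap of lowercase tokens."""
    q, s = set(query.lower().split()), set(source.lower().split())
    return len(q & s) / len(q | s) if q | s else 0.0


def beam_disguise(
    sentence: str,
    candidates: Dict[str, List[str]],
    score: Callable[[str, str], float] = overlap_score,
    beam_width: int = 2,
) -> Tuple[str, float]:
    """Substitute one phrase per level, keeping the lowest-scoring rewrites.

    `candidates` maps each phrase to its possible replacements; phrases are
    visited in dictionary order, assumed here to be importance order.
    """
    beam = [(score(sentence, sentence), sentence)]
    for phrase, replacements in candidates.items():
        # Keep the current hypotheses so that a phrase may be left unchanged.
        expanded = list(beam)
        for _, text in beam:
            if phrase not in text:
                continue
            for rep in replacements:
                new_text = text.replace(phrase, rep, 1)
                expanded.append((score(new_text, sentence), new_text))
        # A lower similarity to the source means a better disguise.
        beam = sorted(set(expanded))[:beam_width]
    best_score, best_text = beam[0]
    return best_text, best_score


source = "my roommate ate my leftover pizza"
options = {
    "my roommate": ["the person I live with"],
    "leftover pizza": ["remaining food"],
}
disguised, similarity = beam_disguise(source, options)
print(disguised, round(similarity, 3))
```

In this toy setting the search keeps the rewrite whose similarity to the source sentence is lowest after each substitution level. A real system would replace `overlap_score` with the retriever's rank or score for the source post, and would add a check that the rewrite preserves meaning.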