The Identification of Common Druggable Targets for Acute Dysentery in Four Pathogenic Bacteria: Shigella, Salmonella, Campylobacter and E. coli

Purpose: Dysentery is a severe form of diarrhea caused by four bacteria: Shigella, Campylobacter, E. coli and Salmonella. Together they are responsible for high dysentery-related morbidity and mortality across the world every year. Antibiotic therapy plays a critical role in decreasing both the prevalence and the fatality rate of this infection. However, management of the disease remains challenging owing to the overall increase in resistance against many antimicrobials, so novel drug and vaccine targets are needed to control it. This study used proteome-based mapping with a subtractive genomics approach to identify new drug targets for the development of drugs and vaccines against bacillary dysentery.
Methodology: The proteomes of Shigella, E. coli, Campylobacter and Salmonella were retrieved from UniProt. Paralogs in these proteomes were removed with CD-HIT. Gene essentiality was screened with the Geptop 2.0 server. Host-pathogen interaction was analyzed through the HPIDB database. To identify non-homologous proteins, the essential proteins were compared against Homo sapiens (H. sapiens) using BLASTp. Unique metabolic pathways were identified by comparing the metabolic pathways of the selected bacterial strains and H. sapiens in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Subcellular localization and functional classification of the proteins involved in unique metabolic pathways were analyzed with PSORTb and SVMProt. The druggability potential of the unique essential proteins was investigated using the DrugBank database.
Findings: The overall analysis identified 7 proteins that were common to all four bacteria and essential to the pathogens. Of these 7 proteins, 6 (MURA, MURG, DAPA, MURB, DAPE and DnaA) have been reported as drug targets in different bacteria, and one, DAPD, is novel. The study could enable the development of natural and cost-effective drugs against dysentery infections.
Recommendation: Further validation is necessary to confirm the drug effectiveness and biocompatibility of these targets.


INTRODUCTION
Dysentery, also known as bloody diarrhea, is usually caused by infection with various bacteria and parasites. The best-known bacterial causes of dysentery are Shigella, Campylobacter, E. coli and Salmonella, which cause the disease in humans worldwide. In the United States, it causes approximately 450,000 cases every year, with around four cases for each 100,000 population (Kotloff et al., 2018). Worldwide, these bacteria are estimated to cause 165 million infections every year. In this study, Shigella, Campylobacter, E. coli and Salmonella were examined to find drug targets common to all four bacteria.
Dysentery has become one of the most common endemic infectious diseases in both developed and developing countries. The main host and natural reservoir of these bacteria is the human gastrointestinal tract. They spread mainly through the fecal-oral route, sexual contact, and consumption of contaminated water or food. In various parts of the world, half or more of the strains are now resistant to multiple drugs. There is currently no effective vaccine or drug against dysentery, but frequent hand washing can often prevent person-to-person transmission. To combat this infection, there is a dire need to develop drugs against targets that are common to all these bacteria.
Shigella is a gram-negative, facultatively anaerobic, non-spore-forming, non-motile, rod-shaped bacterium that is closely related to E. coli. It is named after Kiyoshi Shiga, who first identified it in 1897 (Yabuuchi, 2002). It is the major agent of human dysentery and also produces disease in other mammals; it is normally found in humans and gorillas (Erken, 2008). Shigella is one of the foremost bacterial causes of dysentery worldwide, causing an estimated 80-165 million cases yearly. The number increases every year, with estimates ranging from 74,000 to 600,000 (Mani et al., 2016; McGuire et al., 2018).
It is among the four microbes responsible for moderate-to-severe dysentery in children in southern Africa and Asia (Kotloff et al., 2018). The species of Shigella are categorized into four serogroups: S. dysenteriae, S. flexneri, S. boydii and S. sonnei. S. dysenteriae and S. boydii are similar, while S. sonnei can be distinguished on the basis of biochemical assays (Baron et al., 1996). No vaccines or drugs are available for dysentery. The best way to prevent the disease is to wash hands thoroughly, repeatedly and carefully with soap and water before handling food and before and after using the washroom (Sati et al., 2019). Strict compliance with standard food and water safety precautions is also important. Water from lakes, ponds or untreated swimming pools should not be swallowed, and sexual contact with patients who have diarrhea should be avoided (Kotloff et al., 2018).
Various approaches, such as comparative and subtractive genomics, have been used to find targets in several human pathogens. Subtractive genomics is a recent methodology for detecting novel pathogen drug targets, and it is used to select potential drug targets for clinical studies. Drug targets are identified by determining which proteins in the pathogenic organism are both essential and non-homologous to the host. Identifying drug targets is a crucial step in computer-aided drug design, as is the rapid search for small molecules that bind to targets of biological interest. The availability of pathogen genome sequences has provided a wide range of knowledge that can help identify targets for vaccines and drugs. The differences between host and pathogen proteins can be used effectively to design pathogen-specific drugs (Jamal et al., 2017). The main objective of this study is to identify essential genes that show no homology with Homo sapiens, so that the resulting drug targets can be used with minimal off-target effects in humans. Here, we applied an integrated in silico subtractive genomics approach to the entire proteomes of four bacteria to identify candidate drug targets against bacillary dysentery.

METHODOLOGY
The main purpose of this study was to identify common druggable targets for acute dysentery in four pathogenic bacteria: Shigella, Salmonella, Campylobacter and E. coli. A subtractive genomics approach was used to obtain the results. The complete framework of the methodology is illustrated in Figure 1.
European Journal of Biology ISSN 2709-5886 (Online) Vol. 8, Issue 1, pp 13-22, 2023 www.ajpojournals.org
To exclude paralogous sequences from the Shigella, Campylobacter, E. coli and Salmonella proteomes, the retrieved proteome sequences (set0) were subjected to CD-HIT analysis. The analysis was carried out with an identity cutoff of 0.6, which removed redundant sequences sharing more than 60% identity from set0 (Huang et al., 2010). The remaining non-paralogous protein sequences were listed as set1.
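The CD-HIT step can be sketched as a command-line call. The snippet below is a minimal illustration rather than the authors' actual invocation: the file names `set0.fasta` and `set1.fasta` are hypothetical, and the word-size rule follows the ranges recommended in the CD-HIT user guide for protein clustering (`-n 4` for cutoffs between 0.6 and 0.7).

```python
import shlex


def cd_hit_command(fasta_in, fasta_out, identity=0.6):
    """Build a CD-HIT command line for a given sequence identity cutoff."""
    # Word size (-n) must match the identity cutoff (-c); these ranges are
    # the ones recommended in the CD-HIT user guide for protein sequences.
    if identity >= 0.7:
        word_size = 5
    elif identity >= 0.6:
        word_size = 4
    elif identity >= 0.5:
        word_size = 3
    else:
        word_size = 2
    return [
        "cd-hit",
        "-i", fasta_in,       # input proteome (hypothetical file name)
        "-o", fasta_out,      # non-redundant output (hypothetical file name)
        "-c", str(identity),  # identity cutoff, 0.6 in this study
        "-n", str(word_size),
    ]


cmd = cd_hit_command("set0.fasta", "set1.fasta")
print(shlex.join(cmd))
```

The list returned by `cd_hit_command` can be passed to `subprocess.run` on a machine where CD-HIT is installed; one run per proteome would reproduce the "one by one" processing described in the Findings.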

Identification of Essential Proteins
Druggable targets are drawn from essential genes, which are required for the proper functioning of the organism; non-essential genes do not make suitable targets. Non-essential genes were therefore removed from the proteomes using the Geptop 2.0 server (http://cefg.uestc.cn/geptop/). Essential proteins are necessary for an organism's survival and are considered the basis of life. The essential proteins retrieved from the server were listed as set2 (Wen et al., 2019).

Identification of Proteins Sequences Non-Homologous to the Human Proteome
This step aimed to avoid functional similarity with the human proteome and so prevent therapeutic compounds from binding to the active sites of homologous host proteins. The non-paralogous essential proteins of each pathogen were subjected to BLASTp against the Homo sapiens proteome using the RefSeq database. Proteins with hits at the E-value cutoff of 10⁻⁴ were removed, as they are considered to have a certain level of homology with the host. The resulting pathogen sequences, which had no detectable homology with the host proteome, were listed as set3 (Salgaonkar et al., 2011).
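The homology filter can be sketched as follows, assuming the BLASTp results are exported in the standard 12-column tabular format (`-outfmt 6`), where the first column is the query ID and the eleventh is the E-value. The protein and subject IDs are made up for illustration, and treating "at or below the cutoff" as homologous is an assumption about how the threshold was applied.

```python
def non_homologous(query_ids, blast_tabular, evalue_cutoff=1e-4):
    """Return the queries that have no human hit at or below the cutoff."""
    homologous = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            homologous.add(query_id)
    # Queries without any reported hit are kept as well.
    return [q for q in query_ids if q not in homologous]


# Toy BLASTp table (illustrative values only, not data from the study).
rows = [
    ["protA", "human_1", "45.2", "210", "100", "3", "1", "210", "5", "214", "2e-30", "120"],
    ["protB", "human_2", "28.0", "50", "36", "0", "10", "59", "80", "129", "0.5", "22.1"],
]
table = "\n".join("\t".join(row) for row in rows)

kept = non_homologous(["protA", "protB", "protC"], table)
print(kept)
```

In this toy input, `protA` has a strong human hit and is dropped, `protB` has only a weak hit above the cutoff, and `protC` has no hit at all, so both are retained.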

Analysis of Unique Metabolic Pathway
The metabolic pathways of the essential proteins that were non-homologous to human proteins were analyzed with KEGG, the Kyoto Encyclopedia of Genes and Genomes, which catalogs the metabolic pathways present in living organisms. Metabolic pathways that were unique to the pathogens and absent in humans were listed as set4 (Kanehisa et al., 2017).
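The pathway comparison amounts to a set difference on KEGG pathway numbers. The sketch below assumes pathway identifiers of the usual KEGG form, an organism prefix followed by a five-digit pathway number; the prefix `pat` is a placeholder for a pathogen organism code, and the short ID lists are toy inputs rather than the lists used in the study.

```python
def pathway_number(kegg_id):
    """Strip the organism prefix: 'hsa00010' -> '00010'."""
    return kegg_id[-5:]


def unique_pathways(pathogen_ids, human_ids):
    """Return pathogen pathways whose number does not occur in the human list."""
    human_numbers = {pathway_number(p) for p in human_ids}
    return sorted(p for p in pathogen_ids if pathway_number(p) not in human_numbers)


# Toy inputs: 00010 is glycolysis, 00550 peptidoglycan biosynthesis,
# 00300 lysine biosynthesis. 'pat' is a placeholder organism prefix.
pathogen = ["pat00010", "pat00550", "pat00300"]
human = ["hsa00010", "hsa00020"]

unique = unique_pathways(pathogen, human)
print(unique)
```

Comparing on the numeric part rather than the full identifier is what lets the same pathway be matched across organisms with different prefixes.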

Host-Pathogen Interaction
HPIDB is a host-pathogen protein-protein interaction (PPI) database that serves as an integrated resource for determining interactions between host and pathogen. Specifically, HPIDB integrates experimental PPIs from numerous public databases into a single, non-redundant, web-accessible resource. The database can be searched with a variety of options, such as sequence identifier, symbol, taxonomy, publication, author or interaction type. HPIDB also allows users to search protein sequences using BLASTp to retrieve homologous host or pathogen sequences. For high-throughput analysis, multiple protein sequences can be searched at a time using BLASTp, with results returned in tabular and sequence alignment formats. The resultant proteins were submitted to this database to analyze the level of interaction between host and pathogen (Kumar & Nanduri, 2010).

Subcellular Localization Analysis
The program PSORTb was used for subcellular localization prediction. Localization provides essential information about protein function and is helpful in genome annotation, in designing proteome experiments, and in identifying potential diagnostic, drug and vaccine targets among the proteins of bacterial pathogens. The functional annotation of the essential proteins was also retrieved. Determining subcellular localization was essential because it indicates the type of target a protein may serve as: a protein present in the outer membrane may be considered as a vaccine candidate. Here, however, the proteins present in the cytoplasm (set6) were selected for further analysis (Yu et al., 2010).

Protein Functionality Analysis
The tool SVMProt (http://bidd.group/cgi-bin/svmprot/svmprot.cgi) was used to determine the molecular functions and biological processes of the resultant proteins (Blum et al., 2020).

Identification of Druggable Targets
The proteins with cytoplasmic localization were searched against DrugBank (https://www.drugbank.com/), an online database used to find preferential druggable targets. An E-value of 10⁻⁵ was set, and druggable targets were obtained for the pathogens that cause bacillary dysentery in humans (Wishart et al., 2018).

FINDINGS
The primary purpose of this study was to search for novel therapeutic targets against Shigella, Campylobacter, Salmonella and E. coli. A subtractive genomics approach was applied to the whole proteomes using several online databases and computational tools. A brief overview of the findings for Shigella is shown in Table 1.

Removal of Paralogous Sequence
The proteomes of Shigella, Campylobacter, Salmonella and E. coli retrieved from UniProt consisted of 5894, 4048, 13314 and 26120 sequences, respectively. Paralogs were removed from each proteome in turn using CD-HIT. The overall results showed 1892, 1789, 8965 and 5886 non-redundant sequences in Shigella, Campylobacter, E. coli and Salmonella, respectively.

Selection of Essential Genes
Antibacterial compounds are mainly designed to bind and inhibit essential gene products, so essential proteins are regarded as the most advantageous drug targets. The Geptop 2.0 server was used to identify essential proteins. The four bacterial proteomes were submitted to the server one by one, yielding four lists. The numbers of essential proteins in Shigella, Campylobacter, Salmonella and E. coli were 418, 474, 1246 and 1489, respectively.

Removal of Human Homologous Protein
Homology between host and pathogen proteins can result in undesirable cross-reactions and cytotoxicity. Drugs developed to bind the target proteins of pathogens must therefore avoid cross-reactivity with homologous host proteins. The essential proteins were compared against the human proteome using BLASTp with the RefSeq database. A total of 247, 118, 389 and 549 non-homologous sequences were found in Shigella, Campylobacter, Salmonella and E. coli, respectively.

Host-Pathogen Interaction
The non-homologous sequences were submitted to the HPIDB database, which provides information on interactions between host and pathogen proteins. Proteins similar to human proteins were thus discarded, leaving 56, 39, 78 and 118 proteins for Shigella spp., Campylobacter, Salmonella and E. coli, respectively.

Analysis of Unique Metabolic Pathway
The KEGG database was used to identify the proteins involved in common and unique pathways, that is, pathways shared with humans and pathways present in the pathogen but absent in humans. A total of 56, 39, 78 and 118 proteins were submitted to the KEGG server, and only 14, 11, 8 and 18 were unique in Shigella, Campylobacter, Salmonella and E. coli, respectively.
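To make the narrowing effect of the successive filters concrete, the short calculation below tabulates the Shigella counts reported in the preceding subsections and computes the share of sequences retained at each stage. The stage labels are shorthand introduced here for illustration; the counts are those stated in the text.

```python
# Shigella counts as reported in the Findings, stage by stage.
SHIGELLA_STAGES = [
    ("UniProt proteome", 5894),
    ("CD-HIT non-redundant", 1892),
    ("Geptop essential", 418),
    ("BLASTp non-homologous to human", 247),
    ("HPIDB screening", 56),
    ("KEGG unique pathways", 14),
]


def retention(stages):
    """Return (stage, count, percent kept relative to the previous stage)."""
    rows = []
    previous = None
    for name, count in stages:
        pct = None if previous is None else round(100.0 * count / previous, 1)
        rows.append((name, count, pct))
        previous = count
    return rows


for name, count, pct in retention(SHIGELLA_STAGES):
    kept = "" if pct is None else f" ({pct}% of previous stage)"
    print(f"{name}: {count}{kept}")
```

The same function can be applied to the other three organisms once their stage counts are arranged in the same order.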

Analysis of Subcellular Localization
The localization of the remaining unique proteins was identified with the PSORTb server. All of the 14, 11, 8 and 18 proteins were submitted to PSORTb one by one. All proteins were cytoplasmic except MURG, which was localized to the cytoplasmic membrane. Proteins localized to the cytoplasm or the cytoplasmic membrane were preferred for further analysis.

Druggability Analysis
Once the localization had been identified, druggable targets were retrieved using DrugBank. Of the 14 Shigella proteins, 12 were found as targets; of the 11 Campylobacter proteins, 10; of the 8 Salmonella proteins, 7; and of the 18 E. coli proteins, 15 (Table 4.6). Among all these proteins, only 7 were common to the four bacteria: MURA, MURG, DAPA, DAPE, DAPD, MURB and DnaA.
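Selecting the targets shared by all four bacteria is a set intersection over the per-organism target lists. In the sketch below, the seven common protein names are those reported above, while every other entry is a generated placeholder: the full per-organism lists are not given in the text, so the placeholders only pad each list to its reported size (12, 10, 7 and 15).

```python
COMMON = ["MURA", "MURG", "DAPA", "DAPE", "DAPD", "MURB", "DnaA"]


def with_placeholders(tag, total):
    """Pad the common targets with placeholder names up to `total` entries."""
    extra = [f"{tag}_other_{i}" for i in range(1, total - len(COMMON) + 1)]
    return COMMON + extra


# Hypothetical per-organism target lists; only their sizes come from the text.
targets = {
    "Shigella": with_placeholders("shi", 12),
    "Campylobacter": with_placeholders("cam", 10),
    "Salmonella": with_placeholders("sal", 7),
    "E. coli": with_placeholders("eco", 15),
}


def common_targets(targets_by_organism):
    """Return the sorted names present in every organism's target list."""
    sets = [set(names) for names in targets_by_organism.values()]
    return sorted(set.intersection(*sets)) if sets else []


shared = common_targets(targets)
print(len(shared), shared)
```

In practice the protein names would need to be normalized (for example by gene symbol or ortholog group) before intersecting, since the same protein can carry different identifiers in different proteomes.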

Discussion
Dysentery affects millions of people annually and causes many deaths. Mild cases are treated without antimicrobial agents, and recovery is quite rapid. Antibiotics control the spread of the disease inside the intestine and limit its spread to the rest of the body. The use of anti-diarrheal medications is not recommended, as it may aggravate the disease (Sati et al., 2019). There is currently no effective vaccine for dysentery, but frequent and thorough hand washing can prevent person-to-person transmission.
To combat this life-threatening situation, there is a dire need to develop drugs against dysentery immediately. In this study, the subtractive genomics approach was employed to screen for drug targets in four bacteria that cause dysentery. The approach finds targets by determining which proteins in the pathogenic organisms are both essential and non-homologous to the host. Identifying drug targets is a crucial step in computer-based drug design (Hosen et al., 2014). Recent advances in bioinformatics and computational biology have created a variety of approaches to drug design and in silico analysis, reducing the time and expense associated with trial-and-error experimentation in drug development (Barh et al., 2011).
In recent years, a large number of preferential druggable targets have been identified for bacteria that are either drug resistant or have no available vaccine, each associated with a specific pathogen. In silico subtractive genomics was applied to find drug targets in four bacteria, Shigella, Salmonella, E. coli and Campylobacter, because it is a cost-effective and straightforward approach to target identification that can be completed in a short time, reducing overall research time. As discussed above, subtractive genomics involves the successive subtraction of sequences, so the sequences selected at the end of this study were those that were unique, non-homologous and essential for the bacterial pathogens Shigella, Salmonella, E. coli and Campylobacter. The result of this study was quite intriguing. Out of the more than twenty thousand proteome sequences downloaded, 7 potential druggable targets were found. Of these 7 targets, 6 have been reported in many other disease-causing bacteria, which underlines the significance of this approach. It narrows the proteins down to a small number, making the discovery of druggable targets easier.