Drones, games, and proteomics: Khoury College’s fall 2024 master’s apprenticeship showcase

On December 6, the fall 2024 Khoury research apprentices in Boston presented their ongoing research projects to a room of showcase attendees. Their work, guided by faculty advisors, featured innovations in fields ranging from AI to psychology to astronomy.  

The apprenticeship program, which began in 2019, allows faculty to submit research proposals alongside student nominees who excel academically and show promise in research fields. This semester, 19 master’s students were selected; their work is highlighted below. 

Kalli Hale: Decoding the Emotional Resonance of Narrative Play 

Hale’s project, guided by Bob De Schutter, focuses on the effectiveness of narrative games in evoking empathy from players. Specifically, she asks which demographic variables are correlated with greater empathy, as well as which barriers inhibit empathetic expression.  

“In our literature review, we discovered that existing measures don’t necessarily capture the underlying variable structures which would shed light on our key motivating questions, namely, who would be emotionally affected by a game?” Hale said, adding that she also plans to examine how certain themes in narrative games influence a player’s overall experience.

To collect data, Hale will recruit study participants to review selected scenes from Catch the Butterfly, a narrative game designed to recreate the story of a Syrian man’s experience migrating to the US. Study participants will complete a survey about their sociodemographic and experiential contexts, then self-report levels of empathy using a scale.  

Hale hopes to create a framework that demonstrates how narrative play can be used to foster collective empathy.  

“Empathy is both universal and profoundly shaped by context,” Hale said. “By iteratively testing and codesigning our instrument through a process of collaborative community input, we hope to shed light on the practical impact of narrative play.” 

Michelle Figueroa: AI in Health Care: LLMs, Clinical Summarization, Bias Mitigation, Explainability and Data Challenges 

Figueroa’s project, guided by Akram Bayat, explores how large language models can transform health care, specifically by analyzing clinical information and reducing bias. 

“The goal is to identify and develop innovative solutions that enhance patient outcomes, streamline clinical processes, and support providers with advanced data tools,” Figueroa said.  

The first stage of Figueroa’s project involved a literature review of AI models used for diagnosing illnesses, specifically pancreatic cancer and respiratory failure. More advanced versions of these models can process images like CT scans and MRIs, increasing the accuracy of diagnostic predictions.

So far, Figueroa has found that combining the strengths of multiple AI models leads to better performance on health care tasks. Her next steps include interviewing medical professionals in Boston and Silicon Valley to identify more gaps in AI models. This data will be used in the second stage of the project, in which Figueroa will develop a new model that addresses the weaknesses of existing ones.

Joyce Hsu: Exploring Behavioral Data Interpretation Through Dashboard Design 

Hsu’s project, guided by Vedant Das Swain and Varun Mishra, examines how users engage with behavioral data — specifically pertaining to mental health — and seeks to inform human-centered technological design. 

“The lack of transparency and user involvement often reduces user engagement, compromises privacy, and limits the user’s sense of control,” Hsu said.  

In response, Hsu applied machine learning concepts to develop a heuristic model for predicting depression risk based on participant sensing behaviors. She also designed an LLM prompt to display key activities and insights contributing to these predictions for a technology probe known as DYMOND, or Dynamic Monitoring for Depression. As Hsu describes it, DYMOND “empowers users to understand their behavioral data and actively influence how the algorithm assesses their mental health.”
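In spirit, such a heuristic might look like the following sketch, where the sensing features and thresholds are hypothetical stand-ins rather than the values DYMOND actually uses:

```python
# Illustrative sketch only: feature names and thresholds are hypothetical,
# not the heuristic DYMOND actually uses.
from dataclasses import dataclass

@dataclass
class DailySensing:
    sleep_hours: float        # from phone/wearable sleep detection
    screen_time_hours: float
    step_count: int
    conversations: int        # detected social interactions

def depression_risk_score(day: DailySensing) -> float:
    """Return a 0-1 risk score from simple behavioral rules."""
    score = 0.0
    if day.sleep_hours < 6 or day.sleep_hours > 10:
        score += 0.3   # disrupted sleep
    if day.screen_time_hours > 7:
        score += 0.2   # heavy phone use
    if day.step_count < 2000:
        score += 0.3   # low physical activity
    if day.conversations == 0:
        score += 0.2   # social withdrawal
    return min(score, 1.0)

print(depression_risk_score(DailySensing(5.0, 8.5, 1200, 0)))  # -> 1.0
```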

Study participants get access to a dashboard that displays their behavioral data, DYMOND’s depression prediction, and key behaviors influencing the predictions. The dashboard includes a configuration panel that allows participants to adjust DYMOND’s tracking so it’s more closely aligned with their behaviors.

To further understand how users engage with their data, Hsu conducted dashboard codesign exercises during check-in sessions. So far, she has found that users like filtering information by specific behaviors and insights, as well as viewing past journal entries. Users also expressed interest in integrating external elements like weather data and calendar events into the dashboard to provide more insight into their mental health. 

Future study steps could include a comparative analysis between a group using DYMOND and a control group to evaluate how the tool impacts user engagement and understanding of their mental health. 

Mahesh Babu Kommalapati: Towards Preventing Intimate Partner Violence by Detecting Disagreements in SMS Communications 

Kommalapati is creating a mechanism to detect disagreements between romantic partners with the goal of mitigating intimate partner violence among juvenile adolescents. Specifically, he employs machine learning and natural language processing techniques to analyze text data, which allows him to detect and track disagreements between adolescents over time.  

“Conflicts in relationships are pretty common, especially in high-risk populations such as juvenile adolescents,” he said. “If these conflicts are not resolved just in time, then they can lead to several detrimental effects such as depression or addiction to illegal substances.”  

For his study, Kommalapati used transcripts of digital messages between adolescents, pairing them with surveys about their emotions and questions related to disagreements. He then used several computational models, including large language models, to analyze the message patterns and detect potential conflicts. The study found that there are noticeable differences in communication patterns between disagreement and nondisagreement groups.
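As a rough illustration of the idea (not Kommalapati’s actual pipeline, which also draws on large language models and survey responses), a minimal text classifier for disagreement detection could be sketched like this:

```python
# Generic sketch of text-based disagreement detection; toy data,
# not the study's actual models or transcripts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "I can't believe you did that again",
    "You never listen to me",
    "Sounds good, see you at 7",
    "Thanks for picking that up, love you",
]
labels = [1, 1, 0, 0]  # 1 = disagreement, 0 = no disagreement

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(messages, labels)

print(clf.predict(["Why do you always ignore what I say"]))
```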

“To date, our models have demonstrated promising accuracy in detecting disagreements by analyzing interaction patterns alongside sentimental and contextual factors,” Kommalapati said. “These findings suggest the potential for developing real-time intimate partner violence prevention tools with further refinement.” 

Kommalapati, advised by Aarti Sathyanarayana, said these findings could help psychologists and relationship counselors, as well as other mental health practitioners, track disagreements over time and act proactively to prevent relationship violence. Going forward, Kommalapati aims to expand the existing dataset to enhance the model’s robustness. He also plans to transition from adolescents’ self-reported ecological momentary assessments to expert-validated ones to improve reliability.  

Zefeng Zhao: Sturgeon-GRAPH Application: Constrained Graph Generation from MIDI Examples 

Zhao’s project, guided by Seth Cooper, uses the Sturgeon-GRAPH system to model and generate new music in MIDI. Sturgeon-GRAPH is a system that can learn from example graphs and generate new graphs with similar characteristics; MIDI, or Musical Instrument Digital Interface, is a code used to describe and play music digitally.  

In Zhao’s system, a MIDI file is analyzed to extract important details like note pitch and duration. That information is turned into a graph that represents the music’s structure. The graph is then fed into Sturgeon-GRAPH, which analyzes the original music for structure and pattern. Once Sturgeon-GRAPH generates a new graph inspired by the original one, this graph is then converted back into a MIDI file, which can be played.  
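A simplified sketch of the first half of that pipeline, assuming the pretty_midi and networkx libraries and leaving the Sturgeon-GRAPH step as a placeholder, might look like this:

```python
# Sketch: extract notes from a MIDI file and build a simple structure graph.
# Assumes pretty_midi and networkx; the Sturgeon-GRAPH call is a placeholder.
import pretty_midi
import networkx as nx

def midi_to_graph(path: str) -> nx.DiGraph:
    midi = pretty_midi.PrettyMIDI(path)
    graph = nx.DiGraph()
    notes = sorted(
        (n for inst in midi.instruments for n in inst.notes),
        key=lambda n: n.start,
    )
    for i, note in enumerate(notes):
        graph.add_node(i, pitch=note.pitch, duration=note.end - note.start)
        if i > 0:
            graph.add_edge(i - 1, i)  # edge follows temporal order
    return graph

graph = midi_to_graph("example.mid")  # hypothetical input file
# new_graph = sturgeon_graph.generate(graph)   # placeholder for Sturgeon-GRAPH
# ...the generated graph would then be converted back into a playable MIDI file
```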

“The whole process creates new music based on the structure we analyzed and gives us a unique way to create new music,” Zhao said.  

Zhao said future work includes developing the system so it can capture more complex musical patterns. The team will also analyze the “listenability” of the generated music, such as its smoothness and quality, and work to improve it.

Lohitha Reddy Indupuru: Applications of Cryptography to Social Reporting System 

With the help of advisor Ariel Hamlin, Indupuru created a cryptographic system that aims to protect the security and privacy of users’ information when they report sensitive issues to organizations.   

“Have you ever been in a situation where you felt the need to report sensitive issues at work, but you were worried about your identity being revealed?” Indupuru said. “Our project is a solution to that problem.” 

Indupuru’s system uses cryptography, namely multiparty computation, to secure a user’s personal information while still getting the complaint to the relevant entities. The system works in stages: first, the client or employee submits a report to a secure server, attaching a keyword related to what they’re reporting. The server keeps the report encrypted until it receives the same keyword — like “harassment” — three times. At that point, it decrypts the report for the organization’s review, revealing only the information in the report.
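Stripped of the actual multiparty computation, the threshold idea can be sketched as follows; in the real system, no single server holds the decryption key the way this simplification does:

```python
# Simplified illustration of threshold-based release; the real system uses
# multiparty computation, so no single server holds the key like this.
from collections import defaultdict
from cryptography.fernet import Fernet

THRESHOLD = 3
vault = Fernet(Fernet.generate_key())
pending = defaultdict(list)   # keyword -> encrypted reports

def submit_report(keyword: str, report: str) -> list[str]:
    """Store the encrypted report; release all matching reports at threshold."""
    pending[keyword].append(vault.encrypt(report.encode()))
    if len(pending[keyword]) >= THRESHOLD:
        return [vault.decrypt(token).decode() for token in pending.pop(keyword)]
    return []

submit_report("harassment", "Report A")
submit_report("harassment", "Report B")
print(submit_report("harassment", "Report C"))  # third match releases all three
```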

“By doing this, we have achieved the privacy of the reporting entity and we have also distributed the trust among the multiple parties,” Indupuru said. 

Indupuru hopes to optimize the server for a faster and more resource-efficient operation, and to make the decryption threshold for reports customizable.  

Cai Peng: Expanding a Global Network to Tackle Misinformation: The IPIE Approach 

Peng’s project was inspired by the growing need to curb the spread of misinformation online. Her goal is to build an inclusive network of experts from a range of fields and regions who can provide legitimate information to consumers. 

“We wanted to create tools and strategies that make it easier to find and connect with qualified experts, especially in underrepresented areas, to address the unique challenges misinformation presents across different contexts,” Peng said.  

To do so, Peng focuses on expanding the network of the International Panel on the Information Environment (IPIE), which fights misinformation by providing policymakers with well-researched insights from experts. Currently, most IPIE experts hail from the US and UK, Peng’s analysis found. 

Under the mentorship of Saiph Savage — who leads the IPIE’s membership panel — Peng used machine learning and data analytics to identify global experts who could contribute to IPIE.  

“So far, we’ve shown that using these tools can help us build a more diverse and effective network, improving our ability to address misinformation from multiple perspectives,” she said.  

Peng’s next step involves enhancing the natural language processing model to better handle cultural and regional contexts. Peng hopes the finished tool will allow IPIE to recruit experts more strategically and with greater impact. 

Anshuman Raina: Automated Benchmarking Framework for MS-Based Proteomics Analysis 

Raina is creating a benchmarking framework to ensure that MSstats, a tool for analyzing proteomics data, remains reliable. Proteomics is the study of proteins and protein complexes in cells, which can be used to detect diseases before the onset of symptoms.  

Raina’s project was inspired by a recent study that found that traditional statistical methods for analyzing proteomics data can produce variable results and false positives.

“Proteomics data shows high variability, a lot of missing values and a lot of noise,” Raina said.  

With the guidance of Olga Vitek, Raina created a benchmarking framework that automatically tests MSstats for accuracy, precision, and false positives and negatives. The researchers integrated the framework with Northeastern’s high-performance computing resources and GitHub, which allows for the processing of large amounts of data.  
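One check such a benchmark might run, sketched here with synthetic ground truth and Benjamini-Hochberg correction rather than the framework’s actual MSstats runs, is estimating the empirical false discovery rate:

```python
# Sketch: estimate the empirical false discovery rate on a synthetic benchmark
# with known ground truth; the real framework runs MSstats on controlled data.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_proteins = 1000
truly_changed = np.zeros(n_proteins, dtype=bool)
truly_changed[:100] = True   # 10% of proteins truly differ between conditions

# Simulated p-values: small for true changes, uniform for the rest
pvals = np.where(truly_changed,
                 rng.beta(1, 50, n_proteins),
                 rng.uniform(size=n_proteins))

rejected, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
false_discoveries = np.sum(rejected & ~truly_changed)
fdr = false_discoveries / max(np.sum(rejected), 1)
print(f"Empirical FDR: {fdr:.3f}")
```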

Currently, the false discovery rate for the model is higher than expected, which Raina plans to investigate.  

Vinesh Gande: Data-Driven Automation of Nonverbal Behavior

Gande is using videos of nonverbal behavior, or human gestures, to generate robust and context-specific data for training realistic virtual humans and replicating human features. This process requires vast amounts of data, which is time-consuming and expensive to compile. Current gesture generation models also fail to account for a speaker’s role.  

“For example, politicians have very expansive gestures while clinicians use minimal gestures to convey the same thing,” Gande said. “Our goal is to automate the creation of this annotated data so that it could help the current machine learning and deep learning approaches to make better gestural generation and help psychological research perform better analysis on human gestures.” 

To achieve this, Gande, advised by Stacy Marsella, extracted frames from videos depicting nonverbal gestures. He analyzed the gestures, mapping key physical points and joint positions in each movement to identify what makes each unique. Combining this data with gesture theory and regression analysis, he created annotations showing how different types of people use distinct nonverbal gestures. This helps train deep learning and language models on automated data that considers a speaker’s background.   
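The frame-and-keypoint extraction step could be sketched as follows, assuming OpenCV and MediaPipe as stand-in tooling, since the article doesn’t specify which pose estimator Gande used:

```python
# Sketch of frame extraction and joint-keypoint mapping; OpenCV and MediaPipe
# are assumptions here, as the specific pose estimator isn't named.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture("speaker_video.mp4")   # hypothetical input video

keypoints_per_frame = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        # Each landmark is a body joint with normalized x, y, z coordinates
        keypoints_per_frame.append(
            [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]
        )
cap.release()
print(f"Extracted keypoints from {len(keypoints_per_frame)} frames")
```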

This development can be used to better replicate human gestures in various virtual contexts, including video games, virtual reality, online education, therapy sessions, and immersive entertainment experiences.  

Lokesh Saiphanibabu Saipureddi: Transform-Based Approach for Reverse Image Search

Saipureddi’s goal is to create an easier mechanism for people to search for products, specifically relying on images instead of text.  

“The e-commerce industry is growing rapidly every day,” Saipureddi said. “This justifies a different search method than just typing text.” 

Saipureddi, advised by Mohammad Toutiaee, used a large dataset of images and product categories to train his model. The system currently works in two phases. In the indexing phase, product images and types are processed using the OpenAI CLIP model, which extracts features from both the images and the related text; this data is then stored for later retrieval. In the searching phase, a user uploads an image of the product they’re looking for. CLIP processes the image, generating captions and features, which are compared against the index to retrieve similar products along with their product types.
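A minimal sketch of that index-and-search flow, using the Hugging Face CLIP implementation with an illustrative model name and catalog rather than Saipureddi’s exact setup, might look like this:

```python
# Sketch of CLIP-based reverse image search; model choice and catalog
# are illustrative, not the project's exact configuration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(path: str) -> torch.Tensor:
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)

# Indexing phase: embed every catalog image once and store the vectors
catalog = ["toy_car.jpg", "sneaker.jpg", "lamp.jpg"]   # hypothetical catalog
index = torch.cat([embed(p) for p in catalog])

# Searching phase: embed the query and rank catalog items by cosine similarity
query = embed("uploaded_photo.jpg")
scores = (index @ query.T).squeeze()
best = scores.argsort(descending=True)
print([catalog[int(i)] for i in best[:3]])
```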

Saipureddi tested the system using a toy image; the model returned similar products stored in the database in about two seconds. In the future, he plans to scale the architecture to larger datasets, refine indexing strategies, and explore model integration for enhanced accuracy.

Pruthvi Prakash Navada: Statistical Analysis and Interpretation of Biomolecular Networks

Navada’s project combines several existing analysis tools into one open-source model that helps biologists better interpret results of biomolecular experiments. Specifically, Navada, advised by Olga Vitek, is integrating MSstats, a system that identifies proteins with changes in concentration after an experiment, with INDRA, which assembles and stores protein regulatory relationships at scale. 

“Oftentimes, apart from the main targets that biologists have in mind, there could be other proteins for which the concentration would change,” Navada said. “Interpreting these results directly from the statistical measurements is difficult.” 

The new software helps biologists and researchers visualize protein interactions and regulatory relationships. 

“This would better help the biologist understand why a protein was upregulated or downregulated when the experiment was performed,” Navada said. 

Sai Chandra Pandraju: Security and Privacy in Large Language Models

Pandraju’s project, guided by Alina Oprea, was inspired by the increasing reliance on retrieval-augmented generation (RAG) systems, which large language models (LLMs) use to retrieve relevant information before answering a question. RAG helps improve some LLM limitations like knowledge cutoffs and hallucinations.  

“Despite RAG advantages, these systems are vulnerable to adversarial attacks that can compromise the integrity and reliability of retrieved content,” Pandraju said. “The project aimed to address these vulnerabilities by exploring how adversarial attacks target the retriever component and developing robust defense strategies to mitigate the associated risks.” 

Adversaries can inject poisoned documents into the RAG system, which filter into the LLMs’ responses. So Pandraju first studied how attackers can manipulate the data in RAG systems, then developed tools to defend against these attacks. The defenses included measures like ensuring the content makes sense, assessing reading difficulty, and checking the flow of ideas in a text.   
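One such lightweight check might be sketched as follows, using the textstat package and an assumed readability band rather than the project’s exact defenses:

```python
# Illustrative sketch of one lightweight check: score each retrieved passage
# for readability and hold outliers back for extra scrutiny before the LLM
# sees them. The textstat package and the score band are assumptions, not
# the project's exact defenses.
import textstat

PLAUSIBLE_BAND = (10.0, 90.0)   # Flesch reading-ease range treated as normal prose

def screen_passages(passages: list[str]) -> tuple[list[str], list[str]]:
    kept, suspicious = [], []
    for passage in passages:
        score = textstat.flesch_reading_ease(passage)
        if PLAUSIBLE_BAND[0] <= score <= PLAUSIBLE_BAND[1]:
            kept.append(passage)
        else:
            suspicious.append(passage)   # routed to coherence and flow checks
    return kept, suspicious
```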

“When we were creating these defenses, they needed to work on a wide variety of these adversarial documents,” he said. “Second, since the RAG architecture itself is complex, our defenses should be simple enough to not take too much time or resources to run.” 

Pandraju found that simpler attacks are easier to catch, while more sophisticated mechanisms require layered defenses. Next, he aims to refine the defense strategies to handle more complex attacks, then generalize these strategies to work with different types of RAG systems.  

“We hope our findings will strengthen the reliability of AI-powered tools used in health care, legal, research, and customer support, where accuracy is paramount,” Pandraju said.  

Paola Alsharabaty: SLFS: A Serverless Distributed File System 

Alsharabaty’s interest in her project was sparked when she took a distributed systems course. She realized she wanted to learn more about computing systems and gain practical experience in the field, making a serverless file system the perfect fit. 

Alsharabaty’s project, guided by Ji-Yong Shin, aims to build a distributed file system using the serverless framework, a cloud computing model where servers are abstracted away and where functions run without server management. Her research also seeks to address the limits of serverless environments, such as systems requiring extra time to run a function for the first time (cold starts).  

“The goal of the apprenticeship was to enhance an existing distributed file system,” Alsharabaty said. “Additionally, we designed and implemented an evaluation plan for that system.” 

The research team is in the process of completing experiments and drawing conclusions about the effectiveness of the system. 

“I hope this research can make an impact in the cloud community,” Alsharabaty said. “We are proposing suggestions and improvements to existing serverless frameworks. By enabling stateful applications, users will have the opportunity to optimize their systems and better utilize resources.” 

Rohan Jamadagni: Networking for a Next-Generation Minimal RISC-V OS

The goal of Jamadagni’s project is to add networking capabilities to an open-source operating system called EGOS, which helps students learn about operating systems by simplifying concepts like memory management and process scheduling. EGOS runs on the RISC-V platform and is used for courses at numerous universities, including Northeastern.

“One key thing missing from EGOS was networking,” Jamadagni said. “Networking is probably the most important component of an operating system in 2024.” 

So Jamadagni is creating a networking lab in EGOS to teach students to implement networking functionality and gain a practical understanding of how networking works within an operating system. With the help of Cheng Tan, Jamadagni created a custom Ethernet driver for the system, added a lightweight networking protocol, and built a simple web server as proof of concept.

Once the project is completed, students will be able to reimplement parts of Jamadagni’s project and gain hands-on experience with networking inside an operating system.

“Students will be able to implement their own web server from scratch,” Jamadagni said, “going from writing assembly for switching between processes to fully functioning web server in just one course.” 

Hardik Bishnoi: Fast Galaxy Image Simulation by Using Low-Rank Adapters on DDPMs 

Bishnoi’s project, advised by Tala Talaei Khoei, focuses on generating data for machine learning in astronomy, which is important considering the high cost of equipment such as telescopes.  

As a solution, he uses DDPMs — denoising diffusion probabilistic models, which generate high-quality images — to generate synthetic data. However, DDPMs are expensive to train and produce a lot of carbon emissions, so Bishnoi went one step further, fine-tuning pretrained DDPM models instead of training them from scratch.

Using a method known as LoRA, or low-rank adaptation, Bishnoi fine-tuned pretrained DDPM models by introducing efficient updates that shrink computational overhead while retaining model performance. 
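To make the idea concrete, here is a from-scratch sketch of a LoRA-style layer in PyTorch, where the pretrained weights stay frozen and only two small low-rank matrices are trained; it illustrates the general technique, not Bishnoi’s exact DDPM setup:

```python
# From-scratch sketch of a LoRA-style linear layer in PyTorch: the pretrained
# weights are frozen and only the low-rank update (B @ A) is trained.
# Illustrates the general technique, not the project's DDPM fine-tuning code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")   # far fewer than the frozen 512 x 512
```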

“We have some really good results for image similarity,” he said. “As for image quality, we see there’s some anomalies in certain cases, but we managed to control them using negative prompts.” 

Seunghan Lee: Drone Simulation Testbed Development 

Lee worked with a team to develop a drone simulator that allows users to write and test algorithms in a simulated software environment without flying real drones, enabling users to avoid the risk of physical testing. This addresses the lack of comprehensive open-source software for drone testing.  

“We wanted to build an infrastructure that allows experimenting with drone algorithms to further explore potential security issues in a controlled environment,” he said. “With the software, we can test the same algorithms without any risk, as well as conduct research with the collected data.”  

Lee’s role, which was guided by Aanjhan Ranganathan, was to develop obstacle-avoidance algorithms. The next steps in the project include developing more security-issue-tolerant algorithms and deploying the software in physical drones.  
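The article doesn’t name the specific algorithms Lee implemented, but as a generic illustration, a potential-field approach is one common way to steer a simulated drone around obstacles:

```python
# Generic potential-field obstacle-avoidance sketch; the simulator's actual
# algorithms aren't specified, so this is purely illustrative.
import numpy as np

def avoidance_velocity(position, goal, obstacles, repel_radius=3.0, gain=2.0):
    """Combine attraction to the goal with repulsion from nearby obstacles."""
    attract = goal - position
    repel = np.zeros_like(position)
    for obs in obstacles:
        offset = position - obs
        dist = np.linalg.norm(offset)
        if 1e-6 < dist < repel_radius:
            # Push away harder the closer the obstacle is
            repel += gain * (1.0 / dist - 1.0 / repel_radius) * offset / dist
    velocity = attract + repel
    return velocity / max(np.linalg.norm(velocity), 1e-6)

pos = np.array([0.0, 0.0, 1.0])
goal = np.array([10.0, 0.0, 1.0])
obstacles = [np.array([2.0, 0.5, 1.0])]
print(avoidance_velocity(pos, goal, obstacles))  # unit vector nudged away from the obstacle
```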

Soni Rusagara: Digital Ethnography of Black Femme Content Creators 

Rusagara’s project explores the challenges faced by marginalized content creators on social media and the ways that algorithms affect their visibility.  

“Social media has become a tool for creativity, but it’s also become a tool for financial opportunities,” Rusagara said. “For content creators of color, it comes with a challenge.” 

Rusagara is using TikTok’s API to collect and analyze a dataset of more than 30 videos created by Black women and nonbinary creators. With support from Alexandra To, Rusagara aims to uncover bias in TikTok’s algorithm and content moderation, which may affect creators’ compensation and reinforce existing beliefs about people of color.  

To select videos to analyze, Rusagara searched by specific keywords and hashtags to identify patterns in engagement and content moderation.

“We want to understand biases, and social media is not just for popularity; people are actually making money,” she said. “So, if we think algorithms are unfair to content creators of color, we want to analyze the algorithms and potentially change what they’re doing.” 

Moving forward, Rusagara’s project will expand its analysis, refine its research questions, and seek to understand even more about creator experiences.  

Yichen Yan: Visualization of WebAssembly Code 

Yan’s project visualizes the operation of WebAssembly code, a popular and high-performance language used in web browsers, to enhance understanding of the system. 

“WebAssembly is a low-level language and it’s very difficult to read and debug,” Yan said. “So, in this project, our goal is to generate a graph to capture the code’s logic and semantics.” 

Those graphs visualize elements such as control flow, data flow, and the relationships between action and data blocks.

“This could facilitate the program’s understanding and optimization,” he said.  

In the future, Yan hopes to integrate interactive features into the graphs, such as zooming in and expanding.  

Zitong Bao: Race, Gender, and the Visual Culture of Domestic Labor, 1870s to 1940s 

Bao’s project merges digital humanities and computer science to explore racial and gender stereotypes through an analysis of late 19th- and early 20th-century trade cards and postcards. The postcards are part of a private collection belonging to Satya Shikha Chakraborty of the College of New Jersey, who advises Bao along with Joydeep Mitra.

“We want to design a website to visualize these cards by using data visualization tools and image visualization tools to make these cards more accessible to critical analysis,” Bao said.  

Bao is building a website to digitize these artifacts, which reflect historical views on domestic labor. Because they depict sensitive topics, such as racism, she’s adding a layered architecture that allows website viewers to choose whether they see certain images. The project will also feature a spatial library that shows users where each postcard is from.

In the future, Bao said she wants to make the website more interactive, including enhancing its GIS (geographic information system) functionalities, or how data from the postcards is managed, analyzed, and mapped.
