Computer Science


In computer science, we work on and learn computer science related things. We participate in competitions such as ACSL and Cyberpatriot, and learn algorithmic design in Java.


Tokenizer

For one of my personal projects, I wrote a BPE tokenizer in C. It tokenizes text by splitting it into tokens and iteratively replacing common combinations of tokens. This is valueable for language models because they don't embed entire words or single characters - embedding single characters would lead to gibberish, and embedding entire words would lead to the inability to infer new words. Also, embedding entire words would lead to a small vocabulary size, and responses would probably be bland. The project runs decently. It was designed as the prelude to a language model, but I never got to that.

Code for Binary Counting

ICSP

My independent CS project was a gradient boosting model resembling XGBoost (Also the MTFC basis model) implemented in Rust. It is trained and tested on past data from the S&P 500, and for a bar, outputs a prediction about whether the next bar's close will be above or below it's open. It has a 61% accuracy rate, which is a fabulous edge if it can be preserved in a real-world implementation. The loss and leaf weights/values are incorrectly implemented according to the original XGBoost paper, but that is a topic for a future project.

Code for Binary Counting

Apps for Good: InfoGuard

As part of the apps for good project, me and my group focused on misinformation. A large issue with the modern day internet is that information is never guaranteed to be true, especially with the growing influence of AI text and image generation. The solution we created was a web extension, which allows the user to highlight text on a webpage, and check that text for misinformation based on scholarly articles. The target audience are people who would benefit from an easy interface to check information safety, for example, researchers who need quick fact checking, teachers who need to verify student claims, students who want to write correct claims, and the elderly, who want to consume trustworthy health information. The MVP (Minimum viable product) was an extension connected to a backend which could provide reliable, academic fact checking information. The strategy was a) scraping semantic scholar based on keywords from the original query, and sending the academic abstracts to an LLM through a Groq API call, asking it to decide whether the claim was true or false based on the information provided.

Code for Binary Counting