CS

In computer science, we work on and learn computer science related things. We participate in competitions such as ACSL and Cyberpatriot, and learn algorithmic design in Java.

Tokenizer

For one of my personal projects, I wrote a BPE tokenizer in C. It tokenizes text by splitting it into tokens and iteratively replacing common combinations of tokens. This is valueable for language models because they don't embed entire words or single characters - embedding single characters would lead to gibberish, and embedding entire words would lead to the inability to infer new words. Also, embedding entire words would lead to a small vocabulary size, and responses would probably be bland. The project runs decently. It was designed as the prelude to a language model, but I never got to that.

ICSP

My independent CS project was a gradient boosting model resembling XGBoost (Also the MTFC basis model) implemented in Rust. It is trained and tested on past data from the S&P 500, and for a bar, outputs a prediction about whether the next bar's close will be above or below it's open. It has a 61% accuracy rate, which is a fabulous edge if it can be preserved in a real-world implementation. The loss and leaf weights/values are incorrectly implemented according to the original XGBoost paper, but that is a topic for a future project.

Computer Science

Tokenizer

ICSP