Universal Pretraining - Exploring the Value of Modality
About the Project
Late in my undergraduate studies at SMU and early in my work on my Master's degree, I did some research into transformers and the value that different data types hold. We evaluated pretraining with different data modalities across different transformer architectures to explore how much the data itself matters. The work was inspired by An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale and Pretrained Transformers as Universal Computation Engines. The core question was whether certain modalities would inherently cause the model to improve at a faster rate.

This was an early exploration that would eventually give way to a new direction: studying the importance of data mixtures in multimodal pretraining (what are the impacts of the mixture?). The research was abandoned for several reasons: the methodology was deeply flawed, and getting it to produce meaningful results would have required too much compute and ablation effort. Additionally, we went our separate ways for our careers and have yet to develop the work further. We did have a paper in progress that we never released or submitted for publication (it still made a nice project to submit for some grad school work), and if you are interested in reading it in its current state, you can find it here.
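To make the idea a bit more concrete, here is a minimal, hypothetical sketch of the kind of experiment involved: the same small transformer backbone is pretrained with a simple masked-reconstruction objective on token sequences from different modalities, and the loss curves are compared. The synthetic data, the PyTorch backbone, and the objective below are my own illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch only: pretrain one small transformer backbone on inputs
# from different modalities and compare how quickly the loss falls.
# The data here is synthetic stand-in data, not a real corpus.
import torch
import torch.nn as nn

class ModalityBackbone(nn.Module):
    """Shared transformer encoder with a modality-specific input projection."""
    def __init__(self, input_dim: int, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(input_dim, d_model)   # per-modality "tokenizer"
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, input_dim)    # reconstruct the input tokens

    def forward(self, x):
        return self.head(self.encoder(self.embed(x)))

def pretrain(name: str, make_batch, input_dim: int, steps: int = 50):
    """Short masked-reconstruction pretraining loop; prints the loss curve."""
    model = ModalityBackbone(input_dim)
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    for step in range(steps):
        x = make_batch()                                        # (batch, seq, dim)
        mask = (torch.rand(x.shape[:2]) < 0.25).unsqueeze(-1)   # hide 25% of tokens
        pred = model(x.masked_fill(mask, 0.0))
        loss = ((pred - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)
        opt.zero_grad(); loss.backward(); opt.step()
        if step % 10 == 0:
            print(f"{name:>6} step {step:3d}  loss {loss.item():.4f}")

# Stand-ins for two modalities: "image" = flattened 16x16 grayscale patches,
# "text" = one-hot token vectors over a small vocabulary.
pretrain("image", lambda: torch.rand(8, 64, 256), input_dim=256)
pretrain("text",  lambda: torch.eye(64)[torch.randint(0, 64, (8, 32))], input_dim=64)
```

The actual study looked at real datasets and multiple architectures; the sketch just shows the shape of the comparison (same backbone, different modality, watch the learning curve).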
If you are interested, the code can be found on GitHub.
If you are interested in other research I have done or am working on, feel free to reach out. I am also open to any and all research positions.