Apple researchers built an LLM that taught itself to produce good user interface code - despite having almost no Swift examples to learn from


  • Apple started with almost no Swift examples and achieved surprising results
  • StarChat-Beta had to generate SwiftUI code with almost no Swift in its training data to guide it
  • Nearly one million working SwiftUI programs emerged after repeated iterations

Apple researchers recently revealed an experiment in which an AI model was trained to generate user interface code in SwiftUI, even though almost no SwiftUI examples were present in the original data.

The study began with StarChat-Beta, an open source model designed for coding. Its training sources, including The Stack and other collections, contained almost no Swift code.

This absence meant the model had no existing examples to guide its output, which makes the stronger system that eventually emerged all the more surprising.

Creating a loop of self-improvement

The team’s solution was to create a feedback cycle. They gave StarChat-Beta a set of interface descriptions and asked it to generate SwiftUI programs from those prompts.
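To make the task concrete, a "SwiftUI program" here is a small, self-contained view. The example below is a hypothetical illustration (not taken from the study) of what a description like "a login screen with a username field, a password field, and a sign-in button" might yield:

```swift
import SwiftUI

// Hypothetical illustration of the kind of program the model was asked to produce
// from a natural-language interface description. Not an actual sample from the study.
struct LoginView: View {
    @State private var username = ""
    @State private var password = ""

    var body: some View {
        VStack(spacing: 16) {
            TextField("Username", text: $username)
                .textFieldStyle(.roundedBorder)
            SecureField("Password", text: $password)
                .textFieldStyle(.roundedBorder)
            Button("Sign In") {
                // The sign-in action would go here.
            }
            .buttonStyle(.borderedProminent)
        }
        .padding()
    }
}
```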

Each generated program was compiled to ensure it actually ran. Interfaces that worked were then compared with the original descriptions using another model, GPT-4V, which judged whether the output matched the request.

Only those that passed both stages remained in the dataset. This cycle was repeated five times, and with every round, the cleaner dataset was fed back into the next model.
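In outline, the pipeline amounts to a filter-and-retrain loop. The Swift sketch below is a loose conceptual illustration, not Apple's actual implementation: generateProgram, compiles, and matchesDescription are hypothetical stand-ins for the model's sampling step, the compiler check, and the GPT-4V comparison against the prompt.

```swift
// Conceptual sketch of the generate -> compile -> judge -> retrain cycle described above.
// The three helpers are placeholders, not real APIs from the study.

func generateProgram(from description: String) -> String {
    // Placeholder: in the study, the current model samples a SwiftUI program here.
    return """
    import SwiftUI
    struct ContentView: View {
        var body: some View { Text("\(description)") }
    }
    """
}

func compiles(_ source: String) -> Bool {
    // Placeholder: the real pipeline compiles each program and discards any that fail to build.
    return !source.isEmpty
}

func matchesDescription(_ source: String, _ description: String) -> Bool {
    // Placeholder: the real pipeline asks a vision-language model (GPT-4V)
    // whether the rendered interface matches the original description.
    return true
}

let descriptions = [
    "A login form with two text fields and a sign-in button",
    "A settings screen with a list of toggles"
]

var dataset: [(description: String, program: String)] = []

// Five rounds of generate, filter, and retrain, as described in the article.
for round in 1...5 {
    dataset.removeAll()
    for description in descriptions {
        let program = generateProgram(from: description)
        // Keep only samples that both compile and match the request.
        if compiles(program) && matchesDescription(program, description) {
            dataset.append((description, program))
        }
    }
    // In the study, the model is fine-tuned on the surviving samples before the next
    // round, so each iteration generates from a slightly better model.
    print("Round \(round): kept \(dataset.count) samples")
}
```

Because only samples that pass both filters survive, the training data gets cleaner with every round without any human curation.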

By the end of the process, the researchers had nearly one million working SwiftUI samples and a model they called UICoder.

The model was then evaluated with both automated tests and human reviewers, and the results showed it not only outperformed its base model but also achieved a higher compilation success rate than GPT-4.

One of the striking aspects of the study is that Swift code had been almost entirely excluded from the initial training data.

According to the team, this exclusion happened by accident when The Stack dataset was assembled, leaving only scattered examples scraped from web pages.

This oversight rules out the idea that UICoder merely recycled code it had already seen - instead, its improvement came from the iterative cycle of generating, filtering, and retraining on its own outputs.

While the results centered on SwiftUI, the researchers suggested the approach “would likely generalize to other languages and UI toolkits.”

If so, this could open paths for more models to be trained in specialized domains where training data is limited.

The prospect raises questions about reliability, sustainability, and whether synthetic datasets can continue to scale without introducing hidden flaws.

UICoder was also trained under carefully controlled conditions, and its success in wider settings is not guaranteed.

Via 9to5mac

