Building a Dynamic VRChat World

VRChat, its idiosyncrasies, and things I learned

Background

I first tried VRChat in 2021 after a friend showed it to me, and I saved up and bought a headset as soon as I could. It was pretty much love at first sight and my only regret is not getting into it sooner.

VRChat is a social VR application, originally released in 2014 and currently available on PC and Meta Quest platforms. In contrast to most video games, VRChat doesn’t have any kind of plot. It’s more like a platform than a game, where user-generated content is king and the analogue of “gameplay” revolves around users’ interaction with that content and each other. Most players are using VR hardware, but it’s also usable with a standard keyboard and mouse or gamepad.

But that’s not to say that VRChat can’t be a game! The main forms of content creation are avatars (the 3D models that users control) and worlds (the places in which they do it, like a game map). Several of the most popular VRChat worlds are in fact implementations of games, but worlds really run the gamut from intense games, to group exercise areas, educational communities, places to cuddle, dance clubs, trippy visuals, and more. A friend likened VRChat to a modern-day Geocities, and a return to the personal homepages of the 90s internet - it’s entirely user-generated content with very little restriction on what you can publish, and (at present) almost no monetization.

With the expression possible through social VR and the unprecedented development of the VRChat platform both by the company itself and the community, I believe it is something to be treasured, and it’s something I want to contribute to. It has become extremely important in my life and has changed the way I interact with friends and loved ones forever. There is a lot of complexity to this and the freedom of the platform in this capitalist world may end up being its downfall one day, but that’s a topic for another article :)

My interest in chiptunes

I had always had a love for chiptune music, but it really picked up in 2017 when I got into retro consoles. It has been said many times that an artist can sometimes produce better work when working under a set of restrictions than they could if given carte blanche to do whatever they desired, and I think chiptune is an excellent demonstration of this. It’s the sort of thing that requires multidisciplinary knowledge to master - you’re given a set of primitive ways to create sounds and it’s entirely up to you to not only compose your music, but to construct the very sounds used in it. There are some toolkits that abstract away the programming side of it, but you really end up needing at least a basic understanding of the various synthesis parameters. Many of the best composers also had the skills to understand the hardware and write their own sound drivers in assembly.

After getting a Sega Mega Drive to relive some childhood nostalgia, I ended up embarking on a multi-year project to create standalone devices to play this music using the original sound chips. The culmination of my efforts is in a project I’ve called MegaGRRL, which ended up being released in a more DIY-friendly form factor as MegaGRRL Desktop. This project taught me a lot about hardware development and writing firmware, and I ended up gaining a pretty deep understanding of the sound chips used in the Mega Drive (and by extension, the whole Yamaha OPN series of chips).

Storing binary data in weird places

From my interest in obsolete video tech, I had been aware of PCM adaptors and products like ArVid. Video tape is a medium capable of dense digital data storage (for the time, at least), but taking advantage of that capacity is limited by VCRs only having standard video inputs and outputs. These products would encode/decode data to/from standard video signals, allowing the use of standard consumer VCRs instead of dedicated purpose-built decks (such as the Alesis ADAT XT or Technics SV-P100).

Binary data was also commonly stored on audio tape in years past, something I had become intimately familiar with from my research into animatronics control systems. This typically worked in a similar way, using purpose-built encoders and decoders to allow the use of commercially available audio tape decks. A drawback of encoding data for audio instead of video is that the available bandwidth is much lower - typically on the order of a few thousand baud for the more common encoding methods in the 80s, and certainly for methods that can be expected to survive a trip through a modern lossy audio codec. If you’re forced by some platform limitation to do a kind of archaic hack like this, using video is definitely the way to go.

With this knowledge stored in the back of my mind, it was like a solution waiting for a problem.

Can the virtual world communicate with the outside world?

Essentially immediately upon getting into VRChat, I went browsing through worlds looking for the most impressive productions. Many have beautiful level design with complex geometry and lighting that brought my PC to its knees, but this is something that people (whether rightfully or wrongfully) have come to expect from a modern game on a modern engine. Of greater interest to me personally were the numerous worlds that offload heavy processing to a GPU shader, for every purpose from simulating a huge ball pit to running Linux. Of course, what I was really looking for were worlds that pushed the boundaries of what the VRChat tooling allows creators to do, and I had finally started to find them.

One world that caught my eye was Club Orion. There are a lot of club worlds in VRChat and the rave scene deserves its own article, but Orion got my attention due to the lighting in the world being controlled by DMX, a real-world lighting control standard used in the live event industry. I had already done some research into the possibilities of VRChat’s SDK and knew that the only way to bring external data in-game (without using local MIDI or OSC) was to use video players. A few searches later, I found that Orion had implemented VR Stage Lighting, a (mostly) open source project. Looking at the project’s Git repo confirmed what I had suspected - the DMX data was being brought into VRChat by being embedded into video. Sound familiar?

SDK limitations

So a few details about building worlds for VRChat. VRChat uses the Unity engine, and by extension all worlds are created in Unity. The primary limitation is that typical Unity scripting is not available to world creators. As VRChat lets anyone upload anything and Unity scripting is totally capable of doing all sorts of evil things to the user’s computer, this obviously cannot be allowed.

In the latest SDK version (SDK3), VRChat has implemented the “Udon VM”, executing a custom bytecode format. A graph-based language and visual editor known as Udon is available (similar to Unreal Engine’s Blueprint). The Udon VM only has a few instructions - in fact, the full list is NOP, PUSH, POP, JUMP_IF_FALSE, JUMP, EXTERN, ANNOTATION, JUMP_INDIRECT, and COPY. EXTERN is the primary workhorse and simply calls into .NET functions. By limiting which functions EXTERN can reach to only those deemed safe, VRChat gives creators full freedom to write whatever they want while keeping them bound by those constraints at all times.

The Udon graph works great for those adding some simple scripting or interactivity to their worlds, but it’s cumbersome to build anything complex when you’re working with visual building blocks, and it’s slow. Like really slow. You have to call out to .NET functions for basically everything, even basic arithmetic, and this incurs a lot of overhead, both in the Udon assembly and in VRChat’s VM code. This is really just a fact of life that needs to be dealt with for VRChat world creators, and short of creative programming techniques or offloading processing to shaders, there’s not much that can be done about it.

That cumbersomeness has been addressed by the UdonSharp project. Written by a community member who has since been hired by VRChat, U# takes the logical route of compiling C# code to Udon assembly. This allows creators to write code almost as if they were directly coding for Unity, without the need for visual scripting or writing assembly.
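To give a sense of what U# looks like, here’s a trivial, made-up behaviour (not from my world - just the canonical “toggle an object when a player clicks it” example):

using UdonSharp;
using UnityEngine;

// Illustrative only. A minimal interactable: clicking/using the object this
// script is on toggles another object on and off.
public class LampToggle : UdonSharpBehaviour
{
    public GameObject lamp; // assigned in the Unity inspector

    public override void Interact()
    {
        lamp.SetActive(!lamp.activeSelf);
    }
}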

As an aside, I hope especially with the hire of the U# author that VRChat can look into some optimizations for the Udon VM. I’m sure there’s some desire to have creators build compelling experiences without the overhead of a mountain of code behind them, but requiring a mountain of code is just the nature of some things that are being built. Offloading processing to shaders is certainly possible, but passing data to/from the shader incurs its own performance overhead, and makes it much more difficult to interact with the standard Unity API.

I wanted to try building a world, and the idea was obvious

With my general love for music (especially chiptune), and after browsing around some music visualizer worlds, the project idea basically invented itself. I’d make a chiptune music visualizer world where the effects are driven by the raw sound chip data. No simple audio level meters or FFT processing here! We can see exactly which notes are being played, we can read all of the synthesis parameters, and we can do this for each channel on the sound chips individually which allows a very detailed look into the composition of these pieces of music. FM synthesis as used by the main YM2612 chip on the Mega Drive is a personal favorite of mine, so the decision was made - the YM2612 music visualizer project was born.

First steps

Immediately faced with the reality of having a cool idea but almost no clue how to make it happen, I started with a simple proof of concept of encoding and decoding data. Unity was completely alien to me but I had a good understanding of what needed to be done on the encoding side, so I set to work writing an encoder and trying to cobble together a simple decoder in Unity. A basic encoder was written, just encoding a few incrementing decimal values, with each bit of data output to an 8x8 pixel video region - black for a 0 bit, white for a 1 bit.

With each bit taking an 8x8 block, a 1920x1080 video can store almost 4KB of data per frame. I decided to double this resolution, using a 1920x2160 video frame with half for encoded data, and half for video content. I wanted to render oscilloscope views of each sound chip channel, and all ten would fit nicely in a 1920x1080 area with good resolution.
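For the curious, the arithmetic works out to (1920/8) x (1080/8) = 240 x 135 = 32,400 blocks, or 4,050 bytes per frame. Below is a rough sketch of the bit-to-block encoding idea, written in C# purely for illustration (it’s not the actual encoder, which is described next):

using System.IO;

// Sketch: expand a payload into an 8-bit grayscale 1920x1080 frame, where each
// bit becomes an 8x8 block - black for 0, white for 1 - and dump it as a PGM
// file that could be inspected or fed to ffmpeg.
class FrameEncoderSketch
{
    const int Width = 1920, Height = 1080, Block = 8;

    static byte[] Encode(byte[] payload)
    {
        var frame = new byte[Width * Height]; // row-major grayscale
        int blocksPerRow = Width / Block;
        for (int bit = 0; bit < payload.Length * 8; bit++)
        {
            bool set = ((payload[bit / 8] >> (7 - bit % 8)) & 1) != 0;
            int bx = (bit % blocksPerRow) * Block;
            int by = (bit / blocksPerRow) * Block;
            for (int y = 0; y < Block; y++)
                for (int x = 0; x < Block; x++)
                    frame[(by + y) * Width + bx + x] = set ? (byte)255 : (byte)0;
        }
        return frame;
    }

    static void Main()
    {
        byte[] data = { 0x00, 0x01, 0x02, 0x03 }; // a few incrementing test values
        byte[] frame = Encode(data);
        using (var f = new FileStream("frame0000.pgm", FileMode.Create))
        {
            byte[] header = System.Text.Encoding.ASCII.GetBytes($"P5\n{Width} {Height}\n255\n");
            f.Write(header, 0, header.Length);
            f.Write(frame, 0, frame.Length);
        }
    }
}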

The initial encoder took in the raw data and output a series of PNG files, one for each frame, which would be assembled into a video by ffmpeg. Obviously, at 1920x1080 60 FPS, the storage requirements get ridiculous very quickly. Some documentation browsing later, I had ffmpeg doing all the work - taking the binary data in, converting it to a visual representation, scaling it so each bit was an 8x8 block, and combining the video to output a final MP4.

Back in Unity, I created a simple media player and set out to write the decoder. This was my first time using Udon, UdonSharp, and actually my first time using Unity at all, so it was an uphill battle. It immediately became apparent that performance was going to be a serious concern, and sure enough it was something I had to keep in mind from the very first prototype up until final release. Not only was there serious overhead from decoding the data bit-by-bit in Udon, but the time it took to just copy in the entire 1920x2160 video frame was destroying the framerate. I don’t remember exact numbers, but it was bad. I couldn’t shrink the video resolution because I needed it for the oscilloscope views. There needed to be some kind of intermediate step to take in the full video frame and output a small, optimized texture that could then be read by Udon efficiently.

Oh no, I have to learn how shaders work

Let me begin this section with a disclaimer: I’m not a shader expert. I’m hardly anything more than a shader beginner, and my knowledge is entirely specific to this domain of the sort of processing I needed to do for my world. If you need a shader to apply a visual effect to something, I’m not your gal, I have no clue. As such, please don’t take anything here as The Right Way To Do Things - it’s simply what I figured out and what ended up working for me. There are probably lots of optimizations that could be done, or better methods altogether.

The solution to my problem, as in so many impressive worlds, was to offload processing to a shader. It was my first time writing a shader and I think it’s one of those things that seems extremely daunting until you actually try it. I was surprised that I could write code in a very natural way. After an evening of learning and experimentation, I had created a pixel shader that could decode data from the video and output to a small 64x64 texture, which is much faster to read and process from Udon.
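The decode itself is conceptually simple. Written out as plain C# purely as a reference for the idea (the real thing is a fragment shader sampling the video texture, and the bit order shown here is an assumption), each output byte is just eight block reads packed together:

// Reference of what the decoder does for one output pixel: sample the center
// of 8 consecutive 8x8 blocks in the video frame and pack them into one byte.
static byte DecodeByte(byte[] frame, int frameWidth, int blockX, int blockY)
{
    int value = 0;
    for (int i = 0; i < 8; i++)
    {
        int px = (blockX + i) * 8 + 4; // center of each 8x8 block
        int py = blockY * 8 + 4;
        bool bit = frame[py * frameWidth + px] > 127; // white = 1, black = 0
        value = (value << 1) | (bit ? 1 : 0);
    }
    return (byte)value;
}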

In the context of the full system in the world, this works as follows: The media player outputs the video image to a render texture. The render texture is used as an input on the shader, with the shader assigned to a material on a surface hidden far away in the world. For each pixel output from the shader, it reads 8 pixels (8 bits) from the input. A camera pointing at this surface captures what it sees to a 64x64 render texture, which will thus contain the decoded data in the red channel, one byte per pixel. Finally, an Udon script reads that texture’s color data into an array, which is then made available to other Udon scripts in the scene.
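For reference, the Udon side of the readback can look roughly like this. It’s a sketch rather than my actual code, and it assumes the behaviour lives on the capture camera (so the 64x64 render texture is the active target during OnPostRender) and that the Unity texture calls used here are exposed to Udon:

using UdonSharp;
using UnityEngine;

// Sketch: copy the decoded 64x64 texture into an array other scripts can read.
// One decoded byte per pixel, carried in the red channel.
public class DecodedFrameReader : UdonSharpBehaviour
{
    private Texture2D _readback;
    [HideInInspector] public byte[] frameBytes = new byte[64 * 64];

    void Start()
    {
        _readback = new Texture2D(64, 64, TextureFormat.RGBA32, false);
    }

    void OnPostRender()
    {
        // The camera this behaviour sits on targets the decoder's render texture,
        // so ReadPixels grabs the freshly rendered 64x64 result.
        _readback.ReadPixels(new Rect(0, 0, 64, 64), 0, 0, false);

        Color32[] pixels = _readback.GetPixels32();
        for (int i = 0; i < pixels.Length; i++)
        {
            frameBytes[i] = pixels[i].r;
        }
    }
}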

This is all much faster than the 100% Udon-based system. The shader-based decoder was the key to improving performance, and is one aspect of the world that remained unchanged throughout all later development.

Preparing the data - converting VGM to VRChat

The de facto standard for storing Mega Drive chiptunes is VGM. VGM is not a sampled audio format like WAV or MP3, but instead stores the data written to the sound chips in order to reproduce a given piece of music. Files consist of data blocks and a series of commands - write X value to Y chip, delay for Z amount of time, seek to offset A in data block and output the value through register B on chip C, etc. The format supports a huge number of sound chips, but we really only care about the ones relevant to the Mega Drive (YM2612 and an SN76489 variant known as the “Sega PSG”).

VGM is a format I’m very familiar with due to my work on MegaGRRL, and I have a parser I could drop into a project and go. But we can’t be streaming VGM directly into VRChat for a number of reasons:

  • The data rate is, in some cases, way too high.
  • Decoding the data through a shader in realtime (to remain synced to the audio) means we cannot expect to receive a consistent stream of data, due to the varying framerate of the game client, which can often drop below the video’s 60 FPS. Therefore, to avoid complicating the decoding process, each frame must contain the entire state needed by the world, with no reliance on past frames.
  • With Udon’s overhead, it would be impractically slow to maintain the state of the sound chips, apply the effects of commands to the state, and convert the state into useful information. Some tracks write to the sound chips tens of thousands of times per second.

Thus the decision was made to write a converter, abstracting all of this, and storing all the information relevant to the world in each frame. This includes whether channels are keyed-on (playing a note), the frequency of the note on each channel, etc.
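The core idea of the converter, stripped down to just key-on tracking and sketched in C# for consistency with the rest of this post (the real tool is written in C and tracks far more state - frequencies, levels, the full register set, and so on):

using System.Collections.Generic;

// Replay timestamped chip writes, keep the interesting state, and snapshot it
// once per 60 Hz video frame.
class Ym2612KeyOnTracker
{
    public bool[] KeyOn = new bool[6]; // one flag per FM channel

    public void Write(int part, byte reg, byte val)
    {
        // Key-on/off lives in register 0x28 (part I). Bits 0-2 select the
        // channel, bits 4-7 are the per-operator key bits.
        if (part == 0 && reg == 0x28)
        {
            int sel = val & 0x07;
            if (sel == 3 || sel == 7) return;  // invalid channel selects
            int ch = sel < 3 ? sel : sel - 1;  // 0,1,2,4,5,6 -> 0..5
            KeyOn[ch] = (val & 0xF0) != 0;     // any operator keyed = note playing
        }
        // ...the real converter also applies frequency, level, panning writes, etc.
    }

    // Emit one snapshot per 1/60 s while replaying the write log in time order.
    public static List<bool[]> SnapshotFrames(List<(double time, int part, byte reg, byte val)> writes, double trackLength)
    {
        var chip = new Ym2612KeyOnTracker();
        var frames = new List<bool[]>();
        double nextFrame = 0;
        foreach (var w in writes)
        {
            while (nextFrame <= w.time)
            {
                frames.Add((bool[])chip.KeyOn.Clone());
                nextFrame += 1.0 / 60.0;
            }
            chip.Write(w.part, w.reg, w.val);
        }
        while (nextFrame <= trackLength)
        {
            frames.Add((bool[])chip.KeyOn.Clone());
            nextFrame += 1.0 / 60.0;
        }
        return frames;
    }
}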

Parsing VGM is non-trivial, and while MegaGRRL has solid code for handling it, I decided to modify an existing PC-based player rather than porting my firmware. An intermediate format was created, storing simple “write X to chip Y” commands, each with a timestamp. I hacked up vgmtest to silently play through a VGM file (faster than real-time - another reason I didn’t base this on the MegaGRRL code) and output my simplified log format, and then began work on a conversion utility in C which would take in this log, store state internally, and then write out the world data for the entirety of the track (which would later be encoded to video by ffmpeg).

The binary data half of the video frame primarily contains:

  • Data ID, frame number, and other metadata
  • Two seconds of upcoming frequency and key-on data
  • Values representing the current audio level of each channel
  • The entire set of YM2612 registers
  • The current PCM data block playback position (presently unused in the world)

The other half of the video frame contains an oscilloscope view for each channel, generated by Corrscope.
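If it helps to picture it, you can think of the per-frame record as something like the following - a purely hypothetical C# mirror of the list above, since the actual field sizes and ordering aren’t documented here:

// Hypothetical illustration only; not the world's actual binary layout.
struct FrameRecord
{
    public ushort DataId;          // which encoded stream/track this belongs to
    public uint FrameNumber;
    public byte[] UpcomingNotes;   // ~2 s of per-channel key-on and frequency data
    public byte[] ChannelLevels;   // current audio level for each of the 10 channels
    public byte[] Ym2612Registers; // full register dump
    public uint PcmBlockPosition;  // presently unused in the world
}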

This clip shows the raw video containing binary data (left) and its intermediate decoded form as output by the shader (right):

Building the framework

Obviously the main control available in the world would be a way to select what music to listen to. I intended to have a large number of game soundtracks available, so I made a basic UI allowing game selection, followed by track selection within that game. With the video decoder system, it seemed like a good idea to have this be completely server-driven, and I set to work on some tools to generate and encode a track index using a custom binary serialization format. The world would load this index when first launched, and decode it to populate the track selector. (For those curious, you can view the unencoded serialized data here)

World construction progressed in a linear fashion, experimenting with different systems and learning Unity as I went. My background in various assembly dialects and bare-metal programming was useful in optimizing the Udon scripts, since Udon is relatively slow as discussed earlier. A lot of the common embedded tricks, such as LUTs and unrolled loops, are right at home here. There’s not really much else to say about the overall control systems in the world… it was pretty much a situation where what was required was obvious (tracklist display, seeking, continuing through playlists, syncing state to other users, etc) and I just sat down and banged it all out.
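As a made-up example of the kind of trick that pays off under Udon’s overhead: precompute anything you can once at startup, so the hot path is a lookup rather than repeated EXTERN-heavy math.

using UdonSharp;
using UnityEngine;

// Illustrative LUT: build the note-to-color mapping once in Start() instead of
// doing the color-space conversion for every note on every frame.
public class NoteColorLut : UdonSharpBehaviour
{
    private Color[] _noteColors;

    void Start()
    {
        _noteColors = new Color[128];
        for (int note = 0; note < 128; note++)
        {
            // Hue cycles once per octave (12 semitones).
            _noteColors[note] = Color.HSVToRGB((note % 12) / 12f, 1f, 1f);
        }
    }

    public Color ColorForNote(int note)
    {
        return _noteColors[note & 127]; // a single lookup in the hot path
    }
}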

While the player-to-player networking requirements for the world were simple, VRChat’s networking system (and multiplayer game networking in general, really) is rife with race conditions and other such problems. I’m very thankful for one of my testers in particular, a skilled programmer in her own right, who was instrumental in helping me track down these problems throughout testing. It’s a wonderful thing when a bug happens and someone immediately goes into troubleshooting mode, without even having any direct knowledge of the codebase.

Effecting effects

With access to the sound chip data, I implemented and tested various different effects that would be displayed in the world. It’s about 50/50 between what worked well and what ended up being scrapped.

What worked

  • Piano roll. Imagine note blocks falling as if onto a virtual keyboard, reminiscent of some popular MIDI visualizations on YouTube. This worked out great, aside from some performance concerns which will be discussed later.
  • Particle blasters. Using frequency and audio level data to control the color and size of particles, I created audio-reactive particle blasters that players could pick up and aim around. This took some fine-tuning but also ended up working out great.
  • Oscilloscope screens. Not much to say about these - they’re just screens that display an oscilloscope view of each sound chip channel, and I knew they’d work out well. Just a matter of sizing and positioning them so you can comfortably view them all in VR without having to look around too much. I also added some other data to the screens, like a colored bar based on the note being played, the audio level, left/right panning data, etc.
  • Haptics. A last-minute addition that I’m mostly happy with. Most soundtracks have a profile defined that specifies how the audio level of each channel should affect controller vibration (a minimal sketch of the call involved follows this list). The drawback here is the difference in how vibration works across VR controllers - the same haptic feedback can feel like anything from a nice pulsating buzz to a rough rattle depending on the controller used. Syncing to the audio was also a bit of a problem due to varying framerates, but I did what I could.
  • AudioLink. There’s a de-facto standard in VRChat known as AudioLink, used to apply sound-reactive effects to worlds and avatars. While this world obviously has its own method of reacting to music, I implemented AudioLink anyway so players with compatible avatars have working effects.
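As promised above, here’s roughly what the haptics call boils down to - a sketch with made-up values (the real profiles map channel levels to these parameters per soundtrack, and I’d double-check the API signature against the current SDK docs):

using UdonSharp;
using UnityEngine;
using VRC.SDKBase;

// Sketch: pulse both controllers based on a channel's current audio level.
public class ChannelHaptics : UdonSharpBehaviour
{
    void Update()
    {
        float level = GetBassChannelLevel(); // hypothetical: 0..1 from the decoded frame data
        if (level <= 0.01f) return;

        VRCPlayerApi player = Networking.LocalPlayer;
        if (player == null || !player.IsUserInVR()) return;

        // duration, amplitude, frequency - how this feels varies a lot per controller
        player.PlayHapticEventInHand(VRC_Pickup.PickupHand.Left, 0.05f, level, 80f);
        player.PlayHapticEventInHand(VRC_Pickup.PickupHand.Right, 0.05f, level, 80f);
    }

    private float GetBassChannelLevel()
    {
        return 0f; // placeholder - the real value comes from the decoded data
    }
}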

What didn’t work

  • Particle bounce effects. Since the data includes advance knowledge of upcoming notes that would be played, I made some particle systems that would spawn particles of a frequency-derived color ahead of time, and they would bounce off the floor at the moment the note was actually being played. I couldn’t figure out how to make this look very good, and the accuracy would drop at low framerates causing some notes to appear missed.
  • Moving objects. These were really just based on some initial tests of using music notes to move blocks around. They hung around until the end but were scrapped before release as they just don’t work well in VR. It was too difficult to scale their range of motion and position them in a way where they didn’t require you to constantly move your head to see.
  • “Hexbugs”. Initially it seemed like a great idea to have some sound chips rattling around the world as the music played, but it proved too difficult to get them to behave well. They’d either lay on the floor almost motionless, or zoom off into the air like a rocket. These could probably be made to work well, but I found the physics aspect of it too difficult to understand and they were scrapped. There are a number of chips laying around in the world that can be thrown at other players though :)
  • Audio-controlled lights. Looked great, but realtime lighting is a performance killer so these were scrapped.

Demo

The announcement video for the world is a good overview of the effects that ended up being used:

Learning to love VRCUrl

When you pass a video URL to a media player, you’d normally expect the URL to be a simple string. This is not the case in Udon. There is a special type, VRCUrl, which must be baked into the world at build time or created as a result of a user entering text into an input field. VRCUrls otherwise cannot be constructed or edited at runtime.

I don’t think the docs explicitly state why this restriction exists, but my assumption is that it’s for security reasons. It’s much more difficult to phone home data from a VRChat instance when you can’t populate a URL with data fields. (This hasn’t stopped people though - there’s a project that uses video players to call pre-baked Google Forms submission URLs!)

In any event, I can’t construct URLs at runtime; that’s just the way it is. I’m downloading and decoding an index file with a list of music tracks available in the world, but how can I specify the URL where each track is available? The solution was to bake an array of a few thousand URLs into the world using a custom Unity inspector, each simply specifying a track ID number. The data in the track index then specifies the index into the URL array to use for each track. Note: the mapping is not 1:1, as I have a “recommendations” playlist which points to tracks that already exist in their own game playlists.
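A sketch of how that kind of URL baking can be done - hypothetical code, not my actual inspector, and the host is a placeholder:

using UdonSharp;
using VRC.SDKBase;

// Runtime side: just holds the baked array and indexes into it.
public class TrackUrlTable : UdonSharpBehaviour
{
    public VRCUrl[] trackUrls; // filled at edit time, read-only at runtime

    public VRCUrl UrlForTrack(int index)
    {
        return trackUrls[index];
    }
}

// In a separate editor-only script:
#if UNITY_EDITOR
using UnityEditor;
using UnityEngine;
using VRC.SDKBase;

public class TrackUrlBaker
{
    [MenuItem("Tools/Bake Track URLs")]
    static void Bake()
    {
        var table = Object.FindObjectOfType<TrackUrlTable>();
        var urls = new VRCUrl[4000]; // "a few thousand"
        for (int i = 0; i < urls.Length; i++)
        {
            urls[i] = new VRCUrl("https://example.com/v2/track/" + i); // placeholder host
        }
        table.trackUrls = urls;
        EditorUtility.SetDirty(table);
    }
}
#endif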

Getting it all running on… a smartphone?! Shaders, part II

I’m being facetious here. The challenge was to get the world working on the native VRChat port for the Meta Quest, a series of headsets that use a mobile chipset and generally have a lot more in common with a smartphone than a desktop PC, down to running Android.

The restrictions for VRChat content on Quest are a lot tighter when it comes to avatars, but for worlds you can do most of the things you can do on PC. Notable restrictions are a lack of post-processing support and a lower file size limit. As I was only using post-processing for a fade transition and my world didn’t have a whole lot of data directly embedded in it, neither of these restrictions were a serious problem for me.

While you’re not too limited by what VRChat allows, you are absolutely limited by the reality of the Quest’s hardware. Anything you do on PC that feels slow is going to feel much slower on Quest. It immediately became obvious that the piano roll was a serious performance hog. With it disabled I was in the 30 FPS range, but when enabled I was dropping down to 10-15. This was with only me in the world, and obviously framerates would drop even lower when the world was full of other players’ avatars. Framerates weren’t great on PC either, with me getting about 40-50 FPS on my mid-range gaming machine. 30 FPS is what I would consider the lower bound of what feels fluid in VR, so obviously something had to be done.

Shaders are still an option on Quest. I’m sure there are some limitations due to running on a mobile GPU, but I just needed to read a texture and use that data to move cubes, hardly complicated stuff. A few hours of research and testing later, I had done it - the piano roll, originally controlled by Udon, had been converted to a vertex shader that would read the video data texture and act accordingly. Framerates on Quest more than doubled and PC had a similar reaction. Obviously pleased by these results, I doubled the length of the piano roll from one to two seconds, which looked great and still ran at a reasonable framerate.

A side note: I think I could have gone further than two seconds and still had good performance, but I was starting to push the boundaries of how much data I could store in each video frame, and I simply didn’t feel like putting any effort into increasing it. I was very happy with how much data was being stored versus the video bitrate, and reliability was perfect - I never saw a corrupt frame. This is definitely one area that could be improved though; my encoding, at four bits per macroblock, is not very dense.

There were also some issues with aliasing of the data as it was read by the vertex shader, on the Quest version only. I never figured it out, but it may have something to do with sRGB color levels. What I ended up doing as a workaround was directly reading the video texture instead of the decoded data texture, where I’m just looking for simple black/white values instead of needing 8 bits of precision per color. The performance impact of this, while initially concerning due to needing 16 texture reads instead of 2 for a 16-bit value, ended up being a total non-issue. I’ve read that there is aggressive caching of tex2Dlod reads so I guess it ends up not mattering much.
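Expressed as plain C# rather than the actual vertex shader code, the workaround boils down to this - sampling 16 blocks straight from the video frame and thresholding them, since only black vs. white matters:

// Reference for the idea: build a 16-bit value from 16 black/white blocks read
// directly from the video texture (the real thing uses tex2Dlod in the shader).
static int Read16FromVideo(UnityEngine.Color[] videoPixels, int videoWidth, int blockX, int blockY)
{
    int value = 0;
    for (int i = 0; i < 16; i++)
    {
        int px = (blockX + i) * 8 + 4;  // center of each 8x8 block
        int py = blockY * 8 + 4;
        bool bit = videoPixels[py * videoWidth + px].r > 0.5f; // threshold: white = 1
        value = (value << 1) | (bit ? 1 : 0);
    }
    return value;
}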

How VRChat media players work, and a little optimization

Media players in VRChat’s PC version are quite unique. Not only can you play a direct link to a video file (via HTTP/HTTPS) or stream (RTSP, HLS, etc), but links to sites like YouTube and Vimeo are supported, despite them not serving plain video files. This is supported with the help of a bundled copy of yt-dlp. For example, say a user plays a link to a YouTube video. VRChat will run yt-dlp with some parameters to get the link to a direct MP4, and then pass that link to the actual media player component.

The issue, and this is really a minor thing but it bothered me and I fixed it, is that all links are run through yt-dlp. This includes my own direct links to MP4s on my server, which don’t need the extra “resolution” step to a direct MP4 link the way YouTube links do. I had just switched to having my media server do a 302 redirect to BunnyCDN, and I noticed videos were now taking significantly longer to load. Some investigation revealed that several unnecessary steps were being taken.

  • The URL is requested to play in VRChat.
  • VRChat calls yt-dlp with the following parameters: --no-check-certificate --no-cache-dir --rm-cache-dir -f "mp4[height<=?$RES][height>=?64][width>=?64]/best[height<=?$RES][height>=?64][width>=?64]" --get-url "$URL" (where $RES is the maximum vertical resolution specified in the component properties and $URL is the requested URL).
  • yt-dlp requests that URL from my server.
  • My server responds with a 302 redirect to BunnyCDN where the actual video file can be found.
  • yt-dlp follows the 302, making a request to BunnyCDN for the media.
  • yt-dlp downloads up to a megabyte or two of media, before deciding that yes, this is indeed media, and returning the resulting media URL to VRChat.
  • VRChat passes the URL returned from yt-dlp to the media player component, and playback begins.

The issue presented is the extra time and wasted bandwidth spent by yt-dlp requesting the media file, only to throw it away and then have VRChat load it again. My initial attempt at optimizing this (while keeping media served by a redirect from my server - I didn’t want to embed CDN URLs directly into the world) was as follows:

  • Return 200 with some dummy content for user agents matching what yt-dlp uses (Chrome with a randomized version).
  • Return 302 for all other user agents.

While this prevented yt-dlp from starting to download the media file, it was not much of a speed improvement, as I had really just switched who was following the 302 - instead of yt-dlp doing it, the VRChat media player backend now had to. Moreover, since yt-dlp is forced to run for every request no matter what, the time spent launching it and making its request was still being wasted by deliberately serving it garbage. This solved the problem of excess data being transferred, but didn’t really make things faster.

After some discussion on the Fediverse, it was suggested that I try to take advantage of yt-dlp’s generic link extractor, and it worked!

  • Return 200 for yt-dlp’s user agent, but this time with a dummy HTML5 video element, where the source points to the media on the CDN.
    • yt-dlp returns the actual destination media URL to VRChat.
    • No request for the media is made by yt-dlp.
  • Return 302 for all other user agents.
    • This really doesn’t matter anymore, as VRChat’s media player backend is now requesting the target media directly!

For those interested, the NGINX config looks like this:

location ~ ^/v2/track/(?<trackid>[0-9]+)$ {
    if ($http_user_agent ~* Chrome) { #ytdlp
        add_header Content-Type text/html;
        return 200 "<video width=\"192\" height=\"216\" controls><source src=\"http://vrchat-fmc-pc.cdn.natalie.ee/trax/$trackid.mp4\" type=\"video/mp4\"></video>";
    }
    return 302 http://vrchat-fmc-pc.cdn.natalie.ee/trax/$trackid.mp4;
}

Side note: Using such a coarse user agent filter (“Chrome”) is okay here as the media server is not intended to serve web browsers. There are only three client types we really care about - yt-dlp for PC, WMFSDK for PC, and either stagefright or AVProMobileVideo on Quest.

HTTPS Everywhere! Well, except here…

One thing that came up during early testing and was never completely resolved is that none of the media files would load for a few people. If one person has the problem it’s probably their fault, but when it’s multiple then I need to look into it, so I worked with an affected person who very patiently helped me troubleshoot and provided info from her end.

Upon examining the VRChat client logs we found that yt-dlp was dying with a certificate error, which was weird because I could not find anything wrong with the server. The server is using a Let’s Encrypt certificate and these problems were only a few months after the LE root certificate change, which caused problems for a lot of people who were behind on OS updates (and thus updates to their root certificate stores). But that wasn’t the case here, she was fully updated and we confirmed the media server was working in all other browsers on her PC. She did not have any kind of antivirus or middlebox interfering with her network connection. We tried manually passing the URL to the yt-dlp exe that VRChat distributes (in case it was using its own embedded cert store - I don’t know how compiling a Python app for Windows works) and confusingly, it worked! This became even more confusing upon discovering that not only did her exe file (not working in-game) match mine (working in-game), but VRChat is passing the --no-check-certificate flag anyway, so certificate issues shouldn’t matter!

We never really got to the bottom of this. On the server side I tried various combinations of allowed SSL/TLS versions and ciphersuites, and even switched to the alternate LE certificate chain, with no luck at all. I could’ve rolled the dice by purchasing a paid certificate from another vendor or proxying all the traffic through a CDN like Cloudflare, but neither of those was something I wanted to do. Eventually, I compromised and switched to plaintext HTTP requests for the PC version of the world. The risk of using plain HTTP here is pretty low, and it resolved the problem for everyone who was experiencing it.

If anyone knows what’s up with this, I’d be very interested to hear from you.

Release

There’s not really a whole lot to say about release. Other than Unity crashes delaying my planned release time by about an hour, publishing the world went really smoothly. 60GB of media was served within two or three hours, and by the next day the world had already made it out of Community Labs (where newly published VRChat worlds go until they get enough traffic - users must opt in to see these worlds).

I haven’t spent too much time in public instances, but the general vibe seems to be “those who know, know”. This is definitely not a world with general appeal, and that’s fine. I made it specifically for chiptune enthusiasts, and if anyone else gets a kick out of it, that’s a nice bonus as well.

To quote the Alien Soldier title screen: For Megadrivers Custom!

Post-release fixes - AVPro saves the day

A persistent issue I noticed while building and testing the world was that the VRChat client lagging would result in audio stuttering. Serious stuttering, like one or two seconds of no audio at all followed by a few little glitches once it started playing again. I had initially dismissed this as a non-problem - if you were doing something that causes hitches, like playing with VRC’s photo camera, you could deal with some audio weirdness for a few seconds. But as VRChat introduced a new client UI (which actually caused serious hitching as you navigated around it), I decided something needed to be done. Furthermore, after the world was released and getting some traffic, you could also tell when someone new joined the instance, as the audio would stutter while their avatar loaded. Needless to say, this was kind of ruining the experience.

VRChat offers two media player backends - the Unity player, and AVPro. I knew from spending many hundreds of hours watching videos in other worlds that the AVPro player wasn’t as susceptible to this problem, but I had refrained from using it in my world as it doesn’t work in Unity’s “play mode”, which would have made world development extremely cumbersome. The time had finally come to deal with this. I had read some comments in community Discords that importing the AVPro trial package would let it work in play mode, but this didn’t work for me. As a result, I ended up writing wrappers around all media player functions, so the Unity player could be used in the editor and the AVPro player in the VRChat client. This also required a few helper objects to switch player-dependent material settings. This is another area that could be improved upon… there may have been a cleaner way to handle this, but I needed it done ASAP so I went with what I knew.
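Conceptually the wrapper is nothing fancy - something along these lines (a sketch, with SDK type and method names from memory, so check them against the current SDK; the real thing wraps many more functions):

using UdonSharp;
using VRC.SDKBase;
using VRC.SDK3.Video.Components.Base;

// Both backends derive from BaseVRCVideoPlayer, so the rest of the world only
// ever talks to this wrapper and doesn't care which one is actually active.
public class VideoPlayerWrapper : UdonSharpBehaviour
{
    public BaseVRCVideoPlayer unityPlayer; // used in the editor / play mode
    public BaseVRCVideoPlayer avproPlayer; // used in the published world
    public bool useAvpro;                  // switched per build, along with material settings

    private BaseVRCVideoPlayer Active()
    {
        return useAvpro ? avproPlayer : unityPlayer;
    }

    public void PlayTrack(VRCUrl url)
    {
        Active().PlayURL(url);
    }

    public void StopTrack()
    {
        Active().Stop();
    }

    public float CurrentTime()
    {
        return Active().GetTime();
    }
}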

Reality

This section will be very frank.

Building this world was one of the least fulfilling projects I have worked on in several years.

Not exactly the happiest way to end this article, is it? Don’t get me wrong, I’m glad it’s done. I learned a hell of a lot building it and I’m pretty happy with the end result. The main issue is that I’m not particularly interested in gamedev (at least on the client side), and a lot of the skills I picked up while working on this are skills I didn’t really have any desire to learn, and don’t think I will have much use for in the future. Time will tell, but that’s my feeling right now.

I think about this as a VRChat project first and foremost - Unity was merely a tool I needed to learn to reach the goal I wanted, rather than learning it being a goal in its own right. A lot of time was spent fighting with tools (if I had a nickel for every Unity crash…) and development was at times literally painful. Building something for VRChat specifically is what made it worth it. I don’t think I would have pushed through if I didn’t have as much emotional investment in the platform and a desire to bring things I love together… chiptune music and social VR.

This is not the end of my VRChat-related development. VRChat isn’t the problem here at all anyway. I have several other world ideas I’d like to make a reality, but I’ll definitely be tempering my expectations, and generally trying to take it easy and not push myself toward any particular result. In the meantime, I’ll be getting back to my hardware development and trying to push my skills further there. You can catch me in VRChat a few times a week, but you won’t catch me in Unity anywhere near as much.

Final words

Thank you for making it to the end. I hope this writeup has been informative and not structured too badly like spaghetti :). Feel free to check out the world, and if you have any thoughts you’d like to share, my socials are on my website. I would like to extend a special thanks to my testers (you know who you are <3) who provided valuable feedback and put up with my constant iteration on this project, and the people of the “Redahs Nilrem” Discord server, who I have not interacted with directly, but were the source of many chatlogs I searched to find information about weird Unity and VRChat issues I was having. And if you are even the slightest bit interested in social VR, I implore you to check out VRChat as soon as you can. The platform is really something special. Don’t let it slip away before you get to experience it yourself.
