Bugs of the Past

2021-07-02

I started digging into my kernel again and got some kind of inspiration back. This led me to fixing bugs, implementing new things, and so on. So let’s talk about the bugs. In the end there were quite a few of them, and I’m sure there are still many to be found.

SMP

Let’s start with SMP, the subject I mentioned in the previous post. I said it was buggy, but I had never figured out why. Sometimes it worked just fine and sometimes it didn’t.

I fixed one mutex issue back then, but SMP was still failing. It bothered me a lot. Luckily I found a case where the failure reproduced almost every time.

I really don’t want to admit how long it took me to get it straight. I went through all the code and even changed how the SMP initialization works. I tried memory barriers in case the problem was related to those.

After some pondering I figured out that all the SMP cores were actually sharing the same stack. I had failed to set up a separate stack for each core, which caused stack corruption. This explains all the issues. It was not trivial to fix, because the stack was set up at such an early phase, and trying to allocate a bigger one in the loader failed. Luckily I was able to postpone it and set up the stack elsewhere. Now it boots cleanly, but there’s still a very short window where this can all go wrong. And one more thing: I still don’t know why I need some extra data area in my loader on x86.
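To illustrate the idea of one stack per core, here is a minimal sketch. It is not the kernel’s actual code: the names MAX_CPUS, AP_STACK_SIZE and ap_stack_top, the sizes, and the use of a static array are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS      8      /* hypothetical core limit */
#define AP_STACK_SIZE 16384  /* hypothetical per-core stack size */

/* One private, 16-byte aligned stack area for every core. */
static _Alignas(16) uint8_t ap_stacks[MAX_CPUS][AP_STACK_SIZE];

/*
 * Return the initial stack pointer for the given core.
 * x86 stacks grow downwards, so this is the END of the core's area.
 */
static void *ap_stack_top(unsigned cpu_id)
{
    assert(cpu_id < MAX_CPUS);
    return ap_stacks[cpu_id] + AP_STACK_SIZE;
}
```

Each core would load its own ap_stack_top(cpu_id) value into the stack pointer before calling any C code, so no two cores ever push onto the same memory.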

printf

I don’t know how many printf implementations there are. However, I have one of my own, and it’s really needed for all the printing and debugging. One could claim that since I have musl, keeping my own implementation is just a waste.

I’d say that’s not true. My own implementation allows printing much earlier than relying on musl would. Also, it’s like a layered cake, and I have had issues trying to use musl in many places. It easily becomes a chicken-and-egg problem.

While debugging things I found out that my printf implementation was buggy. That wasn’t a big surprise, but it caused a lot of confusion. It really didn’t handle 64-bit variables and formats right. The fix was almost trivial, and I would say it’s one step better again. It’s not perfect and most probably never will be, but at least it’s usable.
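A common way this kind of bug shows up is that the formatter reads every integer argument as a 32-bit int, regardless of the l or ll length modifier. The sketch below shows one way to get that right. It is an illustration only, not the kernel’s printf: mini_format, fmt_u64 and put are hypothetical names, and only %u, %x, %d and %% are supported.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write value in the given base (2..16) into out, which must hold 65 bytes. */
static void fmt_u64(char *out, uint64_t value, unsigned base)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];
    size_t n = 0;

    assert(base >= 2 && base <= 16);
    do {                        /* digits come out least significant first */
        tmp[n++] = digits[value % base];
        value /= base;
    } while (value != 0);
    while (n > 0)               /* reverse them into the output buffer */
        *out++ = tmp[--n];
    *out = '\0';
}

/* Append one character, always leaving room for the terminating NUL. */
static void put(char *out, size_t size, size_t *pos, char c)
{
    if (*pos + 1 < size)
        out[(*pos)++] = c;
}

/* Hypothetical mini formatter: %u, %x, %d with optional l / ll, and %%. */
static size_t mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    while (*fmt != '\0') {
        char num[65] = "";
        uint64_t v;
        int longs = 0;

        if (*fmt != '%') {
            put(out, size, &pos, *fmt++);
            continue;
        }
        fmt++;
        while (*fmt == 'l') {   /* count the length modifiers */
            longs++;
            fmt++;
        }
        switch (*fmt) {
        case 'u':
        case 'x':
            /* The va_arg type has to match the length modifier. */
            if (longs >= 2)
                v = va_arg(ap, unsigned long long);
            else if (longs == 1)
                v = va_arg(ap, unsigned long);
            else
                v = va_arg(ap, unsigned int);
            fmt_u64(num, v, *fmt == 'x' ? 16 : 10);
            break;
        case 'd': {
            long long s;
            if (longs >= 2)
                s = va_arg(ap, long long);
            else if (longs == 1)
                s = va_arg(ap, long);
            else
                s = va_arg(ap, int);
            if (s < 0)
                put(out, size, &pos, '-');
            /* Negate in unsigned arithmetic so the minimum value works too. */
            fmt_u64(num, s < 0 ? 0 - (uint64_t)s : (uint64_t)s, 10);
            break;
        }
        case '%':
            put(out, size, &pos, '%');
            break;
        default:
            break;              /* unsupported conversions are skipped here */
        }
        for (size_t i = 0, len = strlen(num); i < len; i++)
            put(out, size, &pos, num[i]);
        if (*fmt != '\0')
            fmt++;
    }
    va_end(ap);
    out[pos] = '\0';
    return pos;
}
```

All values are widened to 64 bits before conversion, so there is a single digit loop and the only per-size logic is choosing the right va_arg type.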

Misc things

A lot of system call implementations were missing, and that’s a known issue. However, some other interesting issues came up. For instance, even though my SMP cores now use proper stacks, all my tasks were still using a stack of only 4096 bytes, and that ran out easily. For example:

char filename[MAX_FILE_NAME];
char new_filename[MAX_FILE_NAME];
char old_filename[MAX_FILE_NAME];

It doesn’t look bad at all. But all of these are reserved from the stack. Add the fact that MAX_FILE_NAME was 1024, and you might start seeing the problem: these three buffers alone take 3072 of the 4096 bytes. Increasing the stack to 32k fixed this issue, but I need to keep an eye on what the optimal stack size for apps could be.
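To put numbers on it, here is a small sketch that groups the three buffers into a struct so that sizeof can do the counting. The struct and the stack_left helper are made up for illustration. A real stack frame also holds return addresses, saved registers and other locals, so the actual headroom is smaller than what this computes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILE_NAME  1024         /* value mentioned in the post */
#define OLD_STACK_SIZE 4096         /* original task stack size */
#define NEW_STACK_SIZE (32 * 1024)  /* stack size after the fix */

/* The three local buffers, grouped only so that sizeof can count them. */
struct rename_locals {
    char filename[MAX_FILE_NAME];
    char new_filename[MAX_FILE_NAME];
    char old_filename[MAX_FILE_NAME];
};

static_assert(sizeof(struct rename_locals) == 3 * MAX_FILE_NAME,
              "char arrays pack without padding");

/* Bytes left for everything else once the buffers are on the stack. */
static size_t stack_left(size_t stack_size)
{
    return stack_size - sizeof(struct rename_locals);
}
```

With the old 4096-byte stack this leaves 1024 bytes for the whole rest of the call chain. With 32k it leaves 29696 bytes.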

Way forward?

For now I see the kernel has improved a lot, and I can again trust it to perform certain tasks reliably. And I believe I can fix issues if (and when) they come up. My dream has been to get Python running on it. By fixing the bugs and implementing syscalls, I got one step closer to that. However, now it fails because it can’t load libraries, modules and files.

Python module load fails

I have one filesystem there, but looking at it now I see all its shortcomings. Thus I decided to implement a USTAR filesystem. That’s in principle just a plain tar archive which is mounted read-only. The implementation is simple, and creating a filesystem image is easy on any computer which has tar. So in principle anywhere.
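To show why a read-only USTAR filesystem is simple, here is a rough sketch of looking up a file in a tar image that is already in memory. The header layout follows the POSIX ustar format. Everything else is an assumption for the example and not the kernel’s code: the names ustar_octal and ustar_find are hypothetical, and the prefix field, the type flag and the header checksum are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define USTAR_BLOCK 512

/* Header layout of the POSIX ustar format; every field is plain text. */
struct ustar_header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];      /* file size as octal ASCII digits */
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char magic[6];      /* "ustar" */
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];
    char pad[12];
};

static_assert(sizeof(struct ustar_header) == USTAR_BLOCK,
              "header must fill exactly one block");

/* Parse an octal text field; stops at the first non-octal character. */
static uint64_t ustar_octal(const char *field, size_t len)
{
    uint64_t value = 0;
    size_t i = 0;

    while (i < len && field[i] == ' ')      /* tolerate leading spaces */
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (uint64_t)(field[i] - '0');
    return value;
}

/*
 * Find path in the image. Returns a pointer to the file data and stores
 * its length in *size_out, or returns NULL if there is no such entry.
 */
static const uint8_t *ustar_find(const uint8_t *image, size_t image_size,
                                 const char *path, size_t *size_out)
{
    size_t off = 0;

    while (image_size - off >= USTAR_BLOCK) {
        struct ustar_header h;
        memcpy(&h, image + off, sizeof h);
        if (memcmp(h.magic, "ustar", 5) != 0)
            break;                          /* end marker or not ustar */

        uint64_t size = ustar_octal(h.size, sizeof h.size);
        size_t data = off + USTAR_BLOCK;    /* data follows the header */
        if (size > image_size - data)
            break;                          /* truncated image */

        if (strncmp(h.name, path, sizeof h.name) == 0) {
            *size_out = (size_t)size;
            return image + data;
        }

        /* File data is padded to a whole number of 512-byte blocks. */
        uint64_t padded = (size + USTAR_BLOCK - 1) / USTAR_BLOCK * USTAR_BLOCK;
        if (padded > image_size - data)
            break;
        off = data + (size_t)padded;
    }
    return NULL;
}
```

Because the archive is never modified, a lookup is just a linear walk over the headers, and the returned pointer can be used directly without copying the file data.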

I have the implementation already there, but it might have some slight issues. My hope is to get it working soon so that I can build and package Python modules there… Until then, let’s see what kind of things I encounter next!