Troydm's Blog

A personal blog about software development

Writing Parser Combinator Library in Clojure

TL;DR Writing parser combinator library in Clojure from scratch

Recently I’ve had an fortunate opportunity to participate in a commercial project that was heavy on parsing and I’ve decided to use Clojure as my main tool for doing it. As you might all know Clojure is not new to the game and is quite mature functional programming language with a hidden power of Lisp. One of the distinct features of Clojure which I personally adore is nrepl and Interactive Programming incremental development style which it cultivates. This is by no means something new, Common Lisp has it via slime/swank, some of Scheme implementations have geiser support and almost all Smalltalk variants could be considered as full interactive programming environments, one notable example being Pharo. There are some more specialized interactive programming environments like Processing and SuperCollider which deal with either sound or visual or even both but today we’ll be talking about parsing and all things related instead. Before I’ll start explaining what we are actually trying to build I would like to introduce you to a really nice open source library that I’ve successfully used in a commercial project I’ve mentioned before, called kern. But as all things good there is always some value in trying to build your own even better, at least for me it was an insightful journey. So yeah as title says we’ll be building parser combinator library in Clojure just to learn a thing or two but solely just for fun. The full source for impatient ones is in my ccp repo. So let’s get started then.

Violin Loli

Of All the Garbage in the World

TL;DR Writing tri-color (actually 4-color) incremental generational garbage collector in C

Previously we’ve talked about manual memory allocation and unraveled some shadows around it. Now let’s talk about automatic memory allocation, more specifically about Garbage Collection. Most if not all modern Programming Languages contain a mechanism that handles allocation of memory automatically at precisely the moment application needs it and deallocation of that memory at the moment it’s not needed anymore. However determining when memory region (or object) is safe to deallocate is not an easy feat as we need to somehow determine if the region is needed or not. And here is where Garbage Collection comes into a play. Simply speaking Garbage Collection is an algorithm that checks entire memory of an application for regions that aren’t referenced anywhere inside application by other memory regions and deallocates these regions because they aren’t needed anymore.

Space Dandy

Lifting Shadows Off a Memory Allocation

TL;DR Writing power-of-2 malloc memory allocator in C

Any sufficiently advanced technology is indistinguishable from magic or so they say. Today we are going to lift some shadows from the very basic thing that is dynamic memory management in an application process. This knowledge is essential for anyone who wants to write his/her own Programming Language so gaining this knowledge is not an optional thing since it opens a whole new lever of understanding how dynamic memory is managed by application and is split between different applications. To do that we need to have some basic understanding of how the actual physical thing is used by operating system. For those who don’t know anything about Virtual Memory or what Memory Pages are and which algorithms modern operating systems are using to manage those I suggest reading Operating Systems Concepts and for those who want to know all the guts of how Linux does this under the hood there is Understanding Linux Kernel. Police Loli

Write You a Monad for No Particular Reason at All!

TL;DR Writing functional programming library in Java

Functional Programming paradigm and languages that fully utilize it’s benefits are slowly gaining popularity nowadays not only in academia but also in enterprise becoming de facto standard for future generations of programmers. Maybe in 10 years from now humanity will come up with a better idea of how to make bug-free quality software that does the job and takes less effort to produce, however currently we are living in a century of Object Oriented programming systems slowly morphing into Functional Programming systems. There are a lot of learning resources on internet about Function Programming in general and Functional Languages such as Haskell, Erlang, Scala, Clojure and many other. You all probably heard about Learn you a Haskell for Great Good book, a really bright, colorful and interesting book that got me into Haskell programming few years ago. Some of you may even tried reading it. I think most of people without prior functional programming experience that start reading it sooner or later hit a bottom of their mental capacity while trying to comprehend Monads. This article is intended for this people! Yes this post is all about Functors and Monads you heard it right! In order to explore the ideas of functional programming let’s create a small functional programming library in Java that will have Functors and Monads and by doing so learn about functional programming from a really unusual perspective of using Object Oriented programming. Our target compiler will be Java 6 (yes we won’t be using new fancy Java 8) as most of the people that are reading this now are probably more familiar with it rather than newer version. For those of you who are too lazy to follow can check the whole thing at my fpj repository, however I strongly encourage you to try reimplement it all by yourself from scratch. The information described in this post implies that you have prior experience or basic understand of Monads but want to better understand them internally. So let’s get to work!

Rikka loli

Rewriting Micro Compiler in OCaml Using Ocamllex and Ocamlyacc

TL;DR Rewriting micro compiler in OCaml

In my previous post I’ve talked about writing micro compiler in OCaml under 300 lines of source code. There are number of ways to make our work easier and number of source code lines significantly smaller.

Potato Loli

Let’s rewrite our micro compiler using tools called lexer and parser generators. We’ll be using tools called ocamllex and ocamlyacc which are distributed with OCaml compiler and are modeled after famous lex and yacc tools for Unix operating systems. Those tools actually have better modern analogues called flex and bison which are described in detail in Flex & Bison: Text Processing Tools book. Nowadays however if you are writing a professional compiler in OCaml I strongly suggest you consider using sedlex and menhir instead of ocamllex and ocamlyacc as both tools are quite outdated and lack some significant features that their modern analogues have such as unicode support for lexing, parameterized parser generation and built-in grammar interpreter. So what are lexer and parser generators? To put it simply ocamllex and ocamlyacc take special .mll and .mly definition files of lexer and parser semantics mixed with OCaml source code and generate an .ml source code files that do the actual token generation and parsing for you. Pretty neat indeed, and it’s actually easier to use than it sounds so let’s rewrite our micro compiler using those tools. We’ll be using original source code of micro compiler as reference only as entire code base needs to be changed. You can see the end result of our rewrite in micro git repository branch called simple. For the actual description of the micro language see my previous post. So let’s get started!

Writing Micro Compiler in OCaml

TL;DR Writing micro compiler in OCaml

At one point or another every single software developer in the world comes to a realization in his career when the time is ripe and it’s time to write your own super cool programming language.

Lemon Loli

However the subject of creating your own programming language with an compiler is quite a complex one and can’t be tackled without some pre-research. That’s how I’ve started reading Crafting Compiler in C, an aged but really comprehensive book about developing your own compiler for an Ada-like programming language. Second chapter describes writing a really simple micro language targeting pseudo assembly-like output in order to explain the core concepts of developing your own compiler and writing an LL(1) parser.

Let’s try rewriting this micro compiler in OCaml, a language better suited for writing compilers that is becoming quite popular due to it’s clean syntax and strict evaluation semantics combined with functional and object-oriented programming styles. If you are not familiar with OCaml try reading Real World OCaml first. Instead of outputting pseudo assembly our micro compiler will output a real nasm source code which will be automatically compiled into a binary executable file.

Making 30 Years Old Pascal Code Run Again

TL;DR Fixing some really old Prolog implementation written in Pascal

Recently I’ve been interested in Logic Programming, notably in learning Prolog so I’m in a process of reading two great books, Programming for Artificial Intelligence and The Art of Prolog. If you want to get a quick feel of Prolog I recommend you take a look at Bernardo Pires’s Gentle Introduction to Prolog and Prologomenon blog. To put it simply Prolog is all about logic, deduction and backtracking

Sherlock Loli

While browsing /r/prolog I’ve stumbled upon Prolog for Programmers originally published in 1985, an old book indeed and honestly sometimes hard to follow. I can’t recommend it as a starter book about Prolog but it’s still quite interesting to read. However it has a whole two chapters describing implementation of Prolog interpreter which is quite a complex task and sparkled my interest in continuing reading this book. Authors provide source code of two version of Prolog interpreter, the one they originally wrote in Pascal back in 1983 and it’s port to C in 2013 which, as stated on their website, was done because they couldn’t compile old Pascal code with Free Pascal Compiler and because… well Pascal is quite out of fashion nowadays. Couldn’t compile?! Well, challenge accepted!

Processing & Broadcasting Financial Data in Scheme

TL;DR Writing data collecter, processor and broadcaster in Scheme

Any software developer who worked in financial industry will tell you that there are few key requirements to programming applications for real time market. Applications should be as fast as possible and they should be as easily modifiable as possible. First requirement is essential since getting and processing information takes time and sending processed information takes even additional precious time, and in financial world time equals money. Second requirement is determined by constantly changing business rules imposed on data processing.

(eq? 'money 'power)

Writing IRC Bot Using Perl 5 and POCO::IRC

TL;DR Writing IRC bot in Perl

Some people use IRC to chat, some don’t. It was invented a really long time ago and isn’t going away anytime soon despite some new generation alternatives popping up like Jabber.

Personally I always have my IRC client running (I’m using weechat + tmux) and chat with lots of interesting people who inspire me to try new technologies and learn something different every day. One person, who’s nickname I won’t name, was always telling me about how awesome Perl as a programming language is and how great it’s potential is thanks to CPAN that has almost 124k modules for any life situation. I always thought he was exaggerating and literally acting like a Perl fanboy. Perl was the first programming language I’ve learned back in the late 90’s and remembering how frustrating my experience with it was and how cryptic it really was for me do something with it when I was unexperienced and lacked lots of qualities that make up a any decent software engineer I was skeptic about using it again. Well, time passed, time always passes, and I haven’t written anything more than quick 50 line server scripts in Perl for almost 13 years. I’ve almost forgotten everything about Perl. Since lately I was having this crazy idea about writing IRC bot that could store and execute shell scripts on server so I could automate my servers through IRC, I thought why not write it in Perl. I’ve remembered that person who was always bragging about Perl’s greatness wrote an IRC bot in Perl using POE::Component::IRC so I’ve decided to try and use the same framework for my bot. It’s based on really popular POE event loop framework which is very easy to learn and use. Matt Cashner wrote a really good introduction article called Application Design with POE

Hosting Your Own Remote Private Torrent Tracker

TL;DR Modifying BitTorrent Tracker written in C++

Ever wanted to share a really big file (more than 4 GB) with someone without a hassle of uploading it to some file upload server?

BitTorrent to rescue, also there are alternatives like hosting your own ftp/sftp file server but I won’t consider them here! So you probably already have a dedicated home file server running on Linux/BSD/Solaris that also has a torrent client installed on it that you access through web interface?

Oh you don’t? Snap it’s it’s so useful that nowadays almost everyone has some kind of NAS that he/she is using for file storage and torrents. So if you don’t have one then you are behind of times

So what do we need to share some file over torrent? Yes indeed we need a torrent tracker