Because I'm an amazingly huge geek, I started working on creating my own programming language over the Thanksgiving holiday. I now have a working (but very primitive) byte-compiler and interpreter.
Planned features:

  • Imperative-style syntax (i.e., like Python/C/Perl/Java and unlike LISP/ML/Prolog)
  • Full support for continuations including callcc (a la Scheme) and serialization [1]
  • List, Unicode string, and byte-array types based on my blist data structure
  • Duck-typing and object prototypes (as in JavaScript) rather than classes
  • A pipe (|) operator (as in bash)
  • Easy access to C libraries (similar to Python's ctypes module)
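
For comparison, this is roughly how Python's ctypes calls a C function. It is a sketch that assumes a Linux or macOS system, where `ctypes.CDLL(None)` opens the running process and so exposes the already-loaded C library. This does not work on Windows. `c_strlen` is a hypothetical wrapper name.

```python
import ctypes

# Assumption: POSIX system. CDLL(None) opens the running process, whose
# global symbols include the already-loaded C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(s: str) -> int:
    """Hypothetical wrapper: call libc's strlen on the UTF-8 bytes of s."""
    return libc.strlen(s.encode("utf-8"))


print(c_strlen("Agthorr"))  # → 7
```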

Right now, continuations, serialization, and object prototypes appear to work. Many other important things (such as basic 4-function arithmetic!) remain unimplemented.
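
The post doesn't say how prototypes are implemented in the new language. As a point of reference, JavaScript-style prototype delegation can be sketched in Python as follows. `Proto` is a hypothetical helper. Lookups that miss an object's own slots walk its prototype chain, and any function found along the way is bound to the original receiver, which mirrors JavaScript's `this`.

```python
import types


class Proto:
    """Hypothetical prototype-based object, loosely modeled on JavaScript."""

    def __init__(self, proto=None, **slots):
        # Bypass our own __setattr__ while creating the bookkeeping fields.
        object.__setattr__(self, "_proto", proto)
        object.__setattr__(self, "_slots", dict(slots))

    def __getattr__(self, name):
        # Python calls this only for names that are not real attributes,
        # so every slot lookup lands here and walks the prototype chain.
        obj = self
        while obj is not None:
            if name in obj._slots:
                value = obj._slots[name]
                # Bind plain functions to the original receiver, like JS `this`.
                if isinstance(value, types.FunctionType):
                    return types.MethodType(value, self)
                return value
            obj = obj._proto
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assignment always writes an own slot, shadowing the prototype.
        self._slots[name] = value


animal = Proto(name="animal", sound="...",
               speak=lambda self: f"{self.name} says {self.sound}")
dog = Proto(animal, name="Rex", sound="woof")
print(dog.speak())     # → Rex says woof
dog.sound = "grr"      # shadows `sound` on dog only
print(dog.speak())     # → Rex says grr
print(animal.speak())  # → animal says ...
```

Any object with a `speak` slot anywhere in its chain works here, so duck typing falls out naturally without classes.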

For the pipe operator, I imagine being able to chain Python-like generators together, something like this:

open 'somefile.txt' | contains 'Agthorr' | sort > important_lines

which (imo) is much easier to read than the Python equivalent:

important_lines = sorted(line for line in open('somefile.txt') if 'Agthorr' in line)

I find the pipe operator lends itself better to left-to-right reading, as opposed to nesting things in layers of parentheses. Shells are the only languages I know of with a pipe operator, and they require spawning a separate process for each segment of the pipe.
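
You can approximate the same left-to-right style in plain Python by overloading `|` so that it feeds an iterable into a generator stage. This is a minimal sketch. `Stage`, `contains`, and `sort` are hypothetical helpers, not the new language's real built-ins. An in-memory `io.StringIO` stands in for somefile.txt, and the `> important_lines` redirection becomes ordinary assignment.

```python
import io


class Stage:
    """One pipeline segment: a function from an iterable to an iterable."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, upstream):
        # `iterable | stage` ends up here because the left operand has no
        # __or__ for Stage; feed the upstream data through this segment.
        return self.fn(upstream)


def contains(needle):
    # Lazily keep only items that contain `needle` (like grep).
    return Stage(lambda items: (x for x in items if needle in x))


# Sorting must consume its whole input before it can emit anything.
sort = Stage(sorted)

somefile = io.StringIO("alpha\nAgthorr was here\nbeta\nAgthorr again\n")
important_lines = somefile | contains("Agthorr") | sort
print(important_lines)  # → ['Agthorr again\n', 'Agthorr was here\n']
```

Every stage except `sort` is lazy, so lines stream through one at a time. `sort` necessarily waits for its entire input.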

Ideally, on multi-core machines I'd like to be able to assign each segment to a separate thread if the compiler can determine that the segments are fully independent (except for the piped data). I'm not sure how feasible it is for the compiler to determine independence other than in the most trivial cases, though.
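
The threading part can be sketched with one thread per segment, connected by bounded queues and an end-of-stream sentinel. This sketch assumes the segments really are independent apart from the piped data, which is exactly what the compiler would have to prove. `threaded_pipeline` is a hypothetical name. In CPython the global interpreter lock means pure-Python stages like these won't run truly in parallel, so the sketch shows the plumbing rather than a speedup.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def _drain(q):
    """Yield items from queue q until the end-of-stream sentinel."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def _feed(source, outbox):
    # Push every source item into the first queue, then signal the end.
    try:
        for item in source:
            outbox.put(item)
    finally:
        outbox.put(_DONE)


def _run_stage(fn, inbox, outbox):
    # Run one segment over its input queue and forward what it yields.
    # The finally clause ensures downstream always sees the sentinel;
    # this sketch does not propagate exceptions to the caller.
    try:
        for result in fn(_drain(inbox)):
            outbox.put(result)
    finally:
        outbox.put(_DONE)


def threaded_pipeline(source, *stages, maxsize=64):
    """Run each stage (a function iterable -> iterable) in its own thread.

    Bounded queues give backpressure so a fast stage cannot run far ahead."""
    queues = [queue.Queue(maxsize) for _ in range(len(stages) + 1)]
    workers = [threading.Thread(target=_feed, args=(source, queues[0]), daemon=True)]
    for i, fn in enumerate(stages):
        workers.append(threading.Thread(
            target=_run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True))
    for w in workers:
        w.start()
    # The calling thread consumes the output of the last segment.
    yield from _drain(queues[-1])


result = list(threaded_pipeline(
    ["b Agthorr", "x", "a Agthorr"],
    lambda items: (x for x in items if "Agthorr" in x),
    sorted,
))
print(result)  # → ['a Agthorr', 'b Agthorr']
```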

[1] Serializable continuations mean you can save the execution state of the program for later use. For example, you can save a CGI script's state to a database and resume where you left off when you get a hit with the matching cookie, without having to explicitly move all of the data, e.g.:

foo = parse_html(html)
send_form_page(foo)

# save state to DB and return to main event loop
foo2 = wait_for_form(event_loop)  
# main event loop called us back; resume sequence of pages for this user

send_next_page(foo, foo2)
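
Without serializable continuations, the same flow has to be written by hand. Each request handler saves whatever the next step needs, keyed by the session cookie, and the next request reloads it. The following is a minimal sketch of that manual bookkeeping. It uses `pickle` and an in-memory dict as a stand-in for the database. `handle_first_request`, `handle_form_submission`, and the placeholder `parse_html` are hypothetical.

```python
import pickle
import secrets

# In-memory stand-in for the database table: cookie -> pickled state.
SESSIONS = {}


def parse_html(html):
    # Hypothetical placeholder for the real parser.
    return {"source": html}


def handle_first_request(html):
    foo = parse_html(html)
    cookie = secrets.token_hex(16)
    # Explicitly save everything the next step will need.
    SESSIONS[cookie] = pickle.dumps({"foo": foo})
    return cookie, f"form page for {foo['source']}"


def handle_form_submission(cookie, form_data):
    # Reload the saved state (KeyError if the cookie is unknown) and
    # resume where handle_first_request stopped.
    state = pickle.loads(SESSIONS.pop(cookie))
    foo, foo2 = state["foo"], form_data
    return f"next page for {foo['source']} with {foo2}"


cookie, page = handle_first_request("hello")
print(page)  # → form page for hello
print(handle_form_submission(cookie, {"name": "Agthorr"}))
# → next page for hello with {'name': 'Agthorr'}
```

The fragile part is the `pickle.dumps({"foo": foo})` line, because every handler must know exactly which variables the next step needs. A serializable continuation would capture `foo`, and the rest of the stack, automatically.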

I have to put this project on the shelf for now, though. Maybe I'll make more progress over Christmas.


11 comments

  • neat.
    where is this project shelved? :)
    also, lol at your example sort criteria.

  • That’s quite a rabbit hole you’ve built for yourself.

    :)

  • The pipe operator makes some sense, but it fails if you need your data to source/sink from more than one data source at the same time: e.g. interleaving two tables into one. How do you plan on dealing with that sort of mess? Fall back on functional parameters?

    • Certainly for a first version, falling back on function parameters is the way to go. I don’t think this comes up all that often so I’m not too worried about it. I’m definitely open to other ideas, though.

      Got a concrete example, and/or other suggestions for solutions?

      • pmb

        How about the tee command in shell? It’s pretty handy, and it would be great if it could have a symmetric brother eet.

        • What syntax would you propose to pipe two different generators into eet?

          • You could set up one of the generators on the line before, running it into a temporary placeholder of some kind. Then on the next line you could pass that placeholder as the parameter to eet, and it would interleave back and forth between its standard input and the parameterized input. Optionally, eet could take more parameters specifying interleave size, etc.

            The basic concept of running a generator into a temporary holding structure kind of fits with your use of continuations, too…

          • Python has something like this: itertools.izip. In Python, it’d be:

            from itertools import izip  # Python 2; Python 3's built-in zip is already lazy
            a = generator1
            b = generator2
            c = izip(a, b)

            Where c is a generator and the first element it produces is a 2-tuple containing the first elements generated by a and b.

            Personally, I think keeping the two input generators on equal footing is much clearer than feeding one of them through a pipe.

          • pmb

            Named pipes? Or perhaps:

            (gen1 & gen2) | eet
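
Both proposals from this thread can be sketched with Python generators. In the sketch below, `eet` is a hypothetical helper that borrows pmb's name; it is not an existing command or library function. It round-robins between its piped input and a second iterable and takes `chunk` items from each in turn. Unlike `zip` (the Python 3 successor of `izip`), it does not pair items into tuples, and it keeps going after the shorter input runs out.

```python
from itertools import islice


def eet(piped, other, chunk=1):
    """Hypothetical 'reverse tee': interleave two iterables, taking
    `chunk` items from each in turn. When one input runs out, the rest
    of the other is passed through unchanged."""
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    sources = [iter(piped), iter(other)]
    while sources:
        for src in list(sources):  # copy, since we may remove from sources
            block = list(islice(src, chunk))
            if len(block) < chunk:
                # A short (or empty) block means this source is exhausted.
                sources.remove(src)
            yield from block


print(list(eet([1, 2, 3], "ab")))          # → [1, 'a', 2, 'b', 3]
print(list(eet(range(6), "xy", chunk=2)))  # → [0, 1, 'x', 'y', 2, 3, 4, 5]
print(list(zip([1, 2, 3], "ab")))          # → [(1, 'a'), (2, 'b')]
```

The `zip` style keeps both inputs on equal footing, which is the point made in the izip comment. The `eet` style fits a pipeline where one input arrives through the pipe and the other is passed as a parameter.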