builtin: concurrency as builtin functions #224

Open
katcipis opened this issue May 30, 2017 · 17 comments

Comments

@katcipis
Member

I thought about modeling our concurrency primitives as functions, instead of syntactic constructs or overloading the rfork call. The idea would be to have two builtin functions:

  • go()
  • channel()

The go function would receive a function as a parameter and execute it concurrently, like the go keyword:

go(fn() {
    # Do concurrent stuff
})

The semantics would be fairly similar to Go's (as usual =)).

The trick is in modeling channels as functions: the channel function returns three values, the receive, send, and close functions:

receive, send, close <= channel()

The receive and send functions work in pairs: calling receive blocks until send is called, and vice versa.
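A rough Go model may help pin down the intended semantics. This is a sketch under the assumption that channel() wraps a single unbuffered Go channel; channel, receive, send and closeFn are illustrative names, not an existing nash API (closeFn just avoids shadowing Go's builtin close).

```go
package main

import "fmt"

// channel is a hypothetical Go model of the proposed nash builtin. It returns
// receive, send and close functions that share one unbuffered channel.
func channel() (receive func() (string, bool), send func(string), closeFn func()) {
	ch := make(chan string) // unbuffered: each send waits for a receive
	receive = func() (string, bool) {
		v, ok := <-ch // ok is false once ch is closed and empty
		return v, ok
	}
	send = func(v string) { ch <- v }
	closeFn = func() { close(ch) }
	return receive, send, closeFn
}

func main() {
	receive, send, closeFn := channel()
	go send("hello") // blocks until main calls receive
	v, ok := receive()
	fmt.Println(v, ok) // hello true
	closeFn()
	_, ok = receive()
	fmt.Println(ok) // false
}
```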

If the idea makes sense, we need to define how to support buffered channels and the semantics of closed channels. It could be simpler than Go's, without any kind of select magic, just enabling very simple usages.

A very common case is waiting for N operations to finish. It could look something like this (I'll assume integers to make it simpler):

receive, send, close <= channel()
tasks = 10

for i = 0; i < $tasks; i++ {
    go(fn(){
        _, status <= cmd arg
        send($status)
    })
}

for i = 0; i < $tasks; i++ {
    status <= receive()
    print($status)
}

close()

A lot of this code is boilerplate, and further functions could make it even simpler, especially when you just want to run N concurrent instances of a command and wait for all of them to finish; that could be modeled as a helper function in the stdlib.
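Such a stdlib helper could look roughly like this Go sketch. runN is a hypothetical name, the task is a stand-in for running a command and returning its exit status, and the sketch swaps the channel for a sync.WaitGroup plus an indexed slice so the statuses come back in a deterministic order.

```go
package main

import (
	"fmt"
	"sync"
)

// runN is a hypothetical stdlib-style helper: it runs task n times
// concurrently and returns each run's exit status once all have finished.
func runN(n int, task func(i int) int) []int {
	statuses := make([]int, n)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			statuses[i] = task(i) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait() // block until every run has finished
	return statuses
}

func main() {
	// Stand-in for "cmd arg": pretend odd-numbered runs exit with status 1.
	statuses := runN(10, func(i int) int { return i % 2 })
	fmt.Println(statuses) // [0 1 0 1 0 1 0 1 0 1]
}
```

Collecting results through a channel, as in the nash example, works too, but the statuses would then arrive in completion order.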

This idea is in an EXTREME draft phase =)

@ppizarro

ppizarro commented May 31, 2017 via email

@katcipis
Member Author

What about a function called exit? The same thing happens as when you create any name that shadows a builtin: the builtin gets shadowed, because you did it explicitly in your code.

If we introduce a "go" keyword, how would you compile Go code in nash? =P

We could come up with some clever way to eliminate ambiguity, but I'm not sure if it is worth it.

@i4ki
Collaborator

i4ki commented May 31, 2017

I really enjoyed the idea. But if I'm not mistaken, your code example will block forever because close is called after the receiver for loop. Close must signal to receivers that no more data will come, so it must be called after the sender loop and before the reading loop.

Regarding buffered channels, maybe this could be a parameter of the channel() function.

s, r, c <= channel(10)

But I think it would be much handier with a channel type (instead of backing functions).
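Assuming channel() takes an optional capacity as suggested, only the blocking rule for send changes. A minimal Go sketch of that assumption; channel here is hypothetical and returns send, receive and close in the s, r, c order used above.

```go
package main

import "fmt"

// channel is a hypothetical model of a channel(size) builtin returning
// send, receive and close in the s, r, c order above; size 0 is unbuffered.
func channel(size int) (send func(int), recv func() (int, bool), closeFn func()) {
	ch := make(chan int, size)
	send = func(v int) { ch <- v } // blocks only while the buffer is full
	recv = func() (int, bool) {
		v, ok := <-ch
		return v, ok
	}
	closeFn = func() { close(ch) }
	return send, recv, closeFn
}

func main() {
	s, r, c := channel(10)
	// With capacity 10, all ten sends complete without a receiver running.
	for i := 0; i < 10; i++ {
		s(i)
	}
	c() // values already buffered stay readable after close
	sum := 0
	for {
		v, ok := r()
		if !ok {
			break
		}
		sum += v
	}
	fmt.Println(sum) // 45
}
```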

Select semantics can be achieved without builtins only if we add support for non-blocking reads/sends. We can give the channel function the power to create both blocking and non-blocking channels, or we can simply make the send/recv functions take an additional parameter indicating whether the send/recv must block.

For example:

fn getchan() {
    # the returned functions accept a parameter indicating if they must block
    s, r, c <= channel()
    return [$s, $r, $c]
}

# selectRead receives from one of a number of channels. If def is not passed,
# it blocks until some receive succeeds; otherwise, when no channel is ready,
# the def function is invoked and selectRead returns.
fn selectRead(alt, def...) {
    defFn = ""
    if len($def) > 0 {
        defFn = $def[0]
    }
    val = ""
    # something like select
    # shuffle(alt) # optional
    for {
        for a in $alt {
            # false tells recv not to block
            val, errcode <= $a["recv"](false)
            if $errcode == 0 { # receive succeeded
                $a["fn"]($val)
                return true
            }
            if $errcode == 1 { # channel closed
                return false
            }
            # any other $errcode value means this channel
            # is not ready (no data yet), so try the next one
        }
        # no channel was ready in this round
        if $defFn != "" {
            $defFn()
            return true
        }
        sleep 0.001
    }
    # unreachable
}

c1 <= getchan()
c2 <= getchan()

# writers
go(fn() { write($c1) })
go(fn() { write($c2) })

# print the results

# Alt structure contains the recv channel funcs and callbacks to be executed
# when data arrives.
alt = [{
    "recv": $c1[1],
    "fn": $print,
}, {
    "recv": $c2[1],
    "fn": $print,
}]

for {
    ok <= selectRead($alt)
    if !$ok {
        break
    }
}

print("Done")

The example relies on our new syntax changes.

That's much like the way select is implemented in Plan 9's C thread library. Take a look at the Alt structure and its documentation:
http://man.cat-v.org/plan_9/2/thread
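For comparison, here is a Go sketch of the non-blocking receive and the polling loop above, assuming the same status codes (0 received, 1 closed, anything else not ready). Real Go code would just use a select statement; select with a default case appears here only to implement the non-blocking recv. recv and pollSelect are illustrative names.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical status codes following the sketch above.
const (
	recvOK     = 0 // a value was received
	recvClosed = 1 // the channel is closed
	recvEmpty  = 2 // nothing ready yet
)

// recv receives from ch, blocking only when block is true.
func recv(ch chan string, block bool) (string, int) {
	if block {
		v, ok := <-ch
		if !ok {
			return "", recvClosed
		}
		return v, recvOK
	}
	select {
	case v, ok := <-ch:
		if !ok {
			return "", recvClosed
		}
		return v, recvOK
	default: // no buffered value and no sender waiting
		return "", recvEmpty
	}
}

// pollSelect keeps polling every channel without blocking, sleeping briefly
// between rounds, until one yields a value or is found closed.
func pollSelect(chans []chan string) (int, string, bool) {
	for {
		for i, ch := range chans {
			switch v, code := recv(ch, false); code {
			case recvOK:
				return i, v, true
			case recvClosed:
				return i, "", false
			}
		}
		time.Sleep(time.Millisecond)
	}
}

func main() {
	c1, c2 := make(chan string), make(chan string)
	go func() { c2 <- "from c2" }() // only c2 ever gets a sender
	i, v, ok := pollSelect([]chan string{c1, c2})
	fmt.Println(i, v, ok) // 1 from c2 true
}
```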

@katcipis
Member Author

"But if I'm not mistaken, your code example will block forever because close is being called after the receiver for loop"

The example is terrible =P, but terrible as it is, it does not look like a case of blocking forever. Every send has a matching receive call, N to N, so no one gets blocked; that would only happen if one of the senders panicked/exploded before sending. The close call was added just to show it... it's not really necessary.
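The N-to-N argument can be checked with a small Go equivalent of the original example. It is an analogy, assuming nash channels would behave like unbuffered Go channels: every send is matched by one receive, so nothing blocks forever, and a close after the receive loop is harmless.

```go
package main

import "fmt"

// exchange mirrors the first example: n goroutines each send one status on
// an unbuffered channel, and the caller performs exactly n receives.
func exchange(n int) int {
	ch := make(chan int)
	for i := 0; i < n; i++ {
		go func(status int) { ch <- status }(i)
	}
	total := 0
	for i := 0; i < n; i++ {
		total += <-ch // each receive pairs with one pending send
	}
	close(ch) // every send has completed, so this is safe (and optional)
	return total
}

func main() {
	fmt.Println(exchange(10)) // 45
}
```

In Go, closing the channel while a sender is still blocked would make that send panic, which is why closing is usually left to the sending side once it is done.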

I'll take a look at the rest of the ideas soon =)

@i4ki
Copy link
Collaborator

i4ki commented May 31, 2017

Ah, got it. My bad... I thought so because of the close at the end.

@i4ki
Collaborator

i4ki commented May 31, 2017

The example I made is also terrible; it's only meant to illustrate that select may not be required in the first version. Someone who really needs it could use something like that, but it's very ugly.

The Alt in Plan 9 is very, very hard to understand and use... a hack to circumvent missing features of C.

@ppizarro

ppizarro commented Jun 3, 2017 via email

@i4ki
Collaborator

i4ki commented Jun 3, 2017

Command names and functions have different syntaxes.

exit   # parsed as a command with name 'exit'
exit() # parsed as a function call

@katcipis
Member Author

katcipis commented Jun 3, 2017

@ppizarro, implementing your idea follows the same principle we use today; the only problem is that the parser has to look further ahead and do more aggressive backtracking, which makes it considerably more complex. But it is not impossible.

@tiago4orion can give more info or point out if I'm saying something stupid =)

@ppizarro

ppizarro commented Jun 3, 2017 via email

@ppizarro

ppizarro commented Jun 3, 2017 via email

@i4ki
Copy link
Collaborator

i4ki commented Jun 4, 2017

@ppizarro

Keywords are emitted early by the lexer, but their meaning is deferred to the parser.
There are two rules for their meaning:

1- If the keyword occurs in a command argument, it has no special meaning and is parsed into an unquoted ast.StringExpr.
2- If the keyword occurs outside command arguments, it has its own syntax and cannot be used as an identifier for variables or functions.

Examples:

# keyword used in command argument
λ> echo for
for
λ> echo for fn rfork
for fn rfork

# keyword used outside arguments
λ> fn for(a) { }
ERROR: <stdin line 74>:1:3: Unexpected token for. Expected '('
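A toy Go model of the two rules, purely to illustrate them; it is not how nash's parser (parser/parse.go) is written, and the keyword set is a partial, assumed list. kind and checkFnName are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// keywords is a partial, illustrative set, not nash's full list.
var keywords = map[string]bool{"for": true, "fn": true, "rfork": true, "if": true}

// kind classifies a word following the two rules: in argument position every
// word is a plain string (rule 1); elsewhere a keyword keeps its own syntax
// and anything else is an identifier (rule 2).
func kind(word string, inArgument bool) string {
	switch {
	case inArgument:
		return "string"
	case keywords[word]:
		return "keyword"
	default:
		return "identifier"
	}
}

// checkFnName rejects a keyword used as a function name, as in `fn for(a) { }`.
func checkFnName(name string) error {
	if kind(name, false) == "keyword" {
		return fmt.Errorf("unexpected token %s", name)
	}
	return nil
}

func main() {
	// echo for fn rfork: every word after echo is an argument.
	for _, w := range strings.Fields("for fn rfork") {
		fmt.Println(w, kind(w, true))
	}
	fmt.Println(checkFnName("for"))   // unexpected token for
	fmt.Println(checkFnName("build")) // <nil>
}
```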

What is an argument?
https://github.com/NeowayLabs/nash/blob/master/parser/parse.go#L1471

Parsers for keywords:
https://github.com/NeowayLabs/nash/blob/master/parser/parse.go#L40

This special syntax for command arguments makes every shell a non-context-free language.

About the syntax change you propose, I don't think it is good, because it would make everyday use of the shell in interactive mode harder:

λ> $(ls /etc)
λ> $(go build)
...

We need to balance the syntax between CLI and script usage. Unfortunately, it will never be pleasant for both cases because of the shell's nature (anything could be an argument value).

Let me know if this is useful, I can give more details.

@katcipis
Member Author

I was also thinking about select as a function. What about something like this (assuming maps exist):

receive, send, close <= channel()

select({
    "reader": $receive,
    "call": fn() {
        data <= $receive()
    },
},
{
    "writer": $send,
    "call": fn() {
        $send("hi")
    },
},
{
    "default": fn() {
        echo "nothing ready"
    },
},
)

The channel's send and receive functions are used as handles that select can use to identify both the channel and the direction.
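Go has no select function, but reflect.Select has the same shape: a list of cases built at run time, each a receive, a send or a default. A sketch assuming one Go callback per case; Case and selectCases are hypothetical names. Unlike the nash draft, the receive is performed by select itself and the value is handed to the callback, so the callback does not need to call receive again.

```go
package main

import (
	"fmt"
	"reflect"
)

// Case is a hypothetical Go analogue of one map passed to the proposed
// select(): set Recv, Send (with Value) or Default.
type Case struct {
	Recv    chan string  // "reader": channel to receive from
	Send    chan string  // "writer": channel to send on
	Value   string       // value to send when Send is set
	Call    func(string) // callback for a receive or send case
	Default func()       // "default": runs when nothing else is ready
}

// selectCases builds reflect.SelectCase values at run time and lets
// reflect.Select pick one ready case, as a select statement would.
// At most one Default case is allowed (reflect.Select panics otherwise).
func selectCases(cases ...Case) {
	rc := make([]reflect.SelectCase, len(cases))
	for i, c := range cases {
		switch {
		case c.Recv != nil:
			rc[i] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(c.Recv)}
		case c.Send != nil:
			rc[i] = reflect.SelectCase{Dir: reflect.SelectSend, Chan: reflect.ValueOf(c.Send), Send: reflect.ValueOf(c.Value)}
		default:
			rc[i] = reflect.SelectCase{Dir: reflect.SelectDefault}
		}
	}
	chosen, recv, recvOK := reflect.Select(rc)
	c := cases[chosen]
	switch {
	case c.Recv != nil && recvOK: // got a value; a closed channel is ignored
		c.Call(recv.String())
	case c.Send != nil:
		c.Call(c.Value)
	case c.Default != nil:
		c.Default()
	}
}

func main() {
	ch := make(chan string, 1)
	show := func(v string) { fmt.Println("received", v) }
	idle := func() { fmt.Println("nothing ready") }

	selectCases(Case{Recv: ch, Call: show}, Case{Default: idle}) // nothing ready
	ch <- "hi"
	selectCases(Case{Recv: ch, Call: show}, Case{Default: idle}) // received hi
}
```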

@katcipis
Member Author

I was thinking about what @tiago4orion said about concurrency being somewhat hard to implement, since nothing is ready for all the races that are going to happen. Add to that my frustration with languages where races are even possible because they allow shared state. I'm aware this is a tradeoff for efficiency in some cases, but I don't think that is the case for nash (we don't need to be stupidly slow, but I don't see it as a systems programming language =P).

With that in mind, here is my first draft of a different approach to concurrency (the idea is very new and may have obvious holes in it, which is why I'm sharing it).

We keep the idea of using functions, but what if the function passed to go ran completely isolated from everything else? It would be a very lightweight concurrent unit, but completely isolated (like Erlang processes).

Unlike Erlang, though, we could still use CSP. For this to make sense, every go call would have to create a channel to communicate with the concurrent function being executed. I imagined something like this:

chan <= go(fn(chan) {
    # receive stuff
    a <= receive($chan)
})
# send stuff
send($chan, "lala")

# waiting the go routine to end is to wait the channel to be closed
wait($chan)

The wait function could accept variadic arguments, making simple fan-ins very easy.

There are some holes... like what happens if you manually close the channel. I'm not sure that using the channel-closed semantics to mean the goroutine has ended is a good idea... I just wanted an easy way to do that, because our main use case in infrastructure building is basically running a lot of crap concurrently and waiting for it to finish.

Or we could see the channel-closed semantics as a way of knowing that the job is done... not that you can be SURE that the concurrent execution actually ended. Still, it would be useful to implement logic that guarantees a close when the function ends.
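A Go sketch of the go(fn(chan)) proposal, assuming the runtime creates the channel and closes it when the function returns, which gives the "closed means done" guarantee. spawn and wait are hypothetical names, and Go itself cannot enforce the isolation; here it is only a convention that the function uses nothing but its channel.

```go
package main

import (
	"fmt"
	"sync"
)

// spawn is a hypothetical model of go(fn(chan) {...}): it creates the
// channel, runs f concurrently and closes the channel whenever f returns.
// Go cannot enforce isolation; here it is only a convention that f touches
// nothing but ch.
func spawn(f func(ch chan string)) chan string {
	ch := make(chan string)
	go func() {
		defer close(ch) // "closed" means "f has returned"
		f(ch)
	}()
	return ch
}

// wait blocks until every given channel is closed, draining and discarding
// any values still sent on them, which gives a simple fan-in.
func wait(chans ...chan string) {
	var wg sync.WaitGroup
	for _, ch := range chans {
		wg.Add(1)
		go func(ch chan string) {
			defer wg.Done()
			for range ch {
			}
		}(ch)
	}
	wg.Wait()
}

func main() {
	ch := spawn(func(ch chan string) {
		a := <-ch // receive stuff
		fmt.Println("received", a)
	})
	ch <- "lala" // send stuff
	wait(ch)     // returns once the function has ended
	fmt.Println("done")
}
```

Because wait discards whatever is still sent on the channels, any results have to be received before calling it.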

Anyway... there will be some loose ends. Does the idea seem interesting enough to pursue?

@i4ki
Collaborator

i4ki commented Oct 20, 2017

I like the idea, but I'll stretch it a little bit =) Why coroutines, then? If there is no sharing and performance isn't a requirement, why design concurrency around a thread model (sharing)? We could use processes and message passing over some protocol (RPC, Unix sockets, TCP, shared memory, etc.). rfork in nash currently does exactly that, but without concurrency (using Unix sockets as the message transport):

A = "some value"

rfork u {
    echo $A  # error, A isn't defined
}

That's because rfork starts a new process that sets up the appropriate Linux namespaces.
If we go down this path (non-sharing threads), then I think it is a good time to unify nash's execution models and have a single way of getting concurrent units.

In my opinion, the most effective and simple approach is changing rfork's syntax and semantics to be concurrent (maybe its name too) and adding a way to communicate through channels (over a reliable transport).
What do you think?

@katcipis
Copy link
Member Author

I think this is a GREAT idea =), especially because we are not that concerned with performance (the only loss compared to the Go/Erlang model). This way, the OS will enforce that nothing is shared =D.

Let's start a design doc (like the one for vars) with some of the syntax that we already discussed for rfork =)

@katcipis
Member Author

Design doc started: https://github.com/NeowayLabs/nash/pull/247


3 participants