
Stream control structure #16

Open
niryuu opened this issue Dec 12, 2014 · 7 comments

Comments

@niryuu

niryuu commented Dec 12, 2014

As I see it, one of the key points of Streem is streaming data flows in concurrent situations. The FizzBuzz example shows one simple stream ([1..100]->FizzBuzz->STDOUT). But if we write complicated concurrent programs, we have to tame complicated relations between processes. For example:

  • switching to the next process
  • generating multiple processes
  • receiving from multiple processes and merging them into one data flow

Streem has a fascinating UNIX-pipe-like syntax. But it already has functions and an if statement, and we could implement process control structures using those. Alternatively, we could implement them by extending the pipe syntax. So it will be important which control structures are assigned to the pipe, since the choice affects usability and expressiveness. What do you think?
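The third pattern, receiving from multiple concurrent processes and merging into one flow, can be sketched with plain threads and a shared queue; a minimal Python sketch (the producer names and the `None` end-of-stream marker are illustrative conventions, not anything Streem defines):

```python
import threading
import queue

def producer(name, items, out):
    """A stand-in for one concurrent process feeding the shared flow."""
    for x in items:
        out.put((name, x))
    out.put((name, None))  # end-of-stream marker for this producer

out = queue.Queue()
threads = [
    threading.Thread(target=producer, args=("p1", [1, 2, 3], out)),
    threading.Thread(target=producer, args=("p2", [4, 5], out)),
]
for t in threads:
    t.start()

# Drain the shared queue until every producer has signalled end-of-stream.
done, merged = 0, []
while done < len(threads):
    name, x = out.get()
    if x is None:
        done += 1
    else:
        merged.append(x)

for t in threads:
    t.join()

print(sorted(merged))  # arrival order is nondeterministic; sorted: [1, 2, 3, 4, 5]
```

The interleaving of `p1` and `p2` elements varies from run to run, which is exactly the scheduling question the pipe syntax would have to answer.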

@matz
Owner

matz commented Dec 13, 2014

I think switching and generating processes can be implicit in the pipe syntax. At least I'd like to experiment with how far we can go without explicit process control.

We will have pipeline operations such as

  • zipping: zip(s1, s2, s3) gives tuples of elements drawn from each stream
  • mixing: mix(s1, s2) gives a stream of elements from any of the streams, as they arrive
  • concatenating: cat(s1, s2) gives the elements of s1, then those of s2

Some of them might have special operators (I have + and & in mind).
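Since Streem itself isn't runnable yet, the intended semantics of the three combinators can be approximated with Python iterables; `zip` and `itertools.chain` are direct analogues, while `mix` is shown round-robin here even though a concurrent mix would be nondeterministic:

```python
import itertools

s1, s2 = [1, 2, 3], ["a", "b", "c"]

# zip(s1, s2): tuples pairing up one element from each stream
print(list(zip(s1, s2)))              # [(1, 'a'), (2, 'b'), (3, 'c')]

# cat(s1, s2): all elements of s1, then all elements of s2
print(list(itertools.chain(s1, s2)))  # [1, 2, 3, 'a', 'b', 'c']

# mix(s1, s2): elements from both streams as they arrive; in a concurrent
# setting the order is nondeterministic, so round-robin stands in here.
def mix(*streams):
    iters = [iter(s) for s in streams]
    while iters:
        for it in iters[:]:          # iterate a copy so removal is safe
            try:
                yield next(it)
            except StopIteration:
                iters.remove(it)

print(list(mix(s1, s2)))             # [1, 'a', 2, 'b', 3, 'c']
```

Note how `zip` stops at the shortest stream while `mix` and `cat` drain everything; whether Streem's operators should behave the same way is an open design question.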

@oleksandr

Maybe something from Flow-Based Programming (FBP) could be useful here? UNIX pipes are quite limited, even together with GNU parallel. Multiple input/output streams and named ports for blocks would open up a lot of possibilities. Just a suggestion, as I don't know the original intent behind designing the Streem language...

@matz
Owner

matz commented Dec 13, 2014

@oleksandr thank you for the info. I will investigate.

@ekg

ekg commented Dec 13, 2014

@oleksandr Actually, you can handle multiple input and output streams with the shell. You can tee to multiple named pipes, for instance, or pee one input to multiple subshells defined by other commands. Merging is also straightforward, as you "just" need one process that opens and sorts/zips/mixes a variety of input files (or named pipes).
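The tee-then-merge recipe has an in-process analogue too; a minimal Python sketch, with `itertools.tee` standing in for tee(1) and two hypothetical consumer generators standing in for the subshells:

```python
import itertools

# Hypothetical consumers standing in for subshells fed via named pipes.
def consumer_a(stream):
    for x in stream:
        yield "a:%d" % x

def consumer_b(stream):
    for x in stream:
        yield "b:%d" % x

# tee: duplicate one input stream to multiple consumers.
s1, s2 = itertools.tee(range(3))

# merge: bring the consumers' outputs back into one flow (here by
# concatenation; a shell would typically cat the pipes the same way).
merged = list(itertools.chain(consumer_a(s1), consumer_b(s2)))
print(merged)  # ['a:0', 'a:1', 'a:2', 'b:0', 'b:1', 'b:2']
```

Unlike real named pipes, `itertools.tee` buffers in memory and runs the consumers sequentially, so this only mirrors the dataflow shape, not the concurrency.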

I'm interested in what streem could bring to the table that's not already well supported, in concurrent and parallel ways, by POSIX-compatible shells (given a handful of utilities like those found in moreutils and GNU parallel). I'm not saying the situation is ideal, but unix shells like bash and zsh are lacking surprisingly little, and I use them in my work all the time to implement concurrent, multiprocessor stream processing pipelines.

@oleksandr

@ekg Indeed, this is all possible. What I meant by "limited" is rather the inefficient resource usage in certain cases, and the DX (developer experience), which above all involves readability and the learning curve. Let me elaborate a bit on those.

  1. A simple stream transformation running as a process can be too heavy; running it as a native or green thread would be more appropriate. In UNIX pipes, each "block" is executed as a process (which involves allocating an address space, etc.). This is the price to pay for using any executable as a block, a kind of language-agnostic dataflow programming. In other dataflow systems the "blocks" are mapped in different ways. I don't know what kind of async execution @matz has in mind for Streem, as it's in progress. But I would also be interested in what it could bring to shells.
  2. A relatively complex UNIX pipe is hard to comprehend compared to a visual representation of a flow or an FBP DSL. I believe these kinds of systems should exploit visual programming as much as possible. Referring to Bret Victor's talk in 2013: we're still coding instead of directly manipulating data, we store code in text files instead of a spatial representation, and we still use a sequential programming model instead of thinking in terms of concurrency. If the dataflow/stream approach targets only the last point, the first two are not addressed. And to me they are very important for creating a nice DX and providing a moderate learning curve.

@bver

bver commented Dec 14, 2014

Not sure if this is the right thread, but:
I think Streem could be great for scripting code blocks in the form of simple independent agents running concurrently. IMHO, limiting the communication of such agents to the UNIX pipes model alone would restrict its potential.
We can imagine more messaging patterns here -- and I guess the zip, mix, cat operators are steps in the right direction. Good inspiration (FBP aside) can be found in e.g. 0MQ, which provides a nice set of communication patterns:
http://zguide.zeromq.org/page:all#toc32

@oleksandr

@bver Handling the connections between blocks is the second important aspect, after executing a block. In FBP, they talk about a connection as a bounded buffer with configurable capacity. 0MQ in particular has only the HWM (high-water mark) notion, and queueing depends on the socket type. It would be interesting to see how this could be specified in the Streem DSL.
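The FBP notion of a connection as a bounded buffer maps directly onto a blocking queue; a minimal Python sketch (the capacity of 2 is an arbitrary choice for illustration, and `None` is used as an ad-hoc end-of-stream marker):

```python
import threading
import queue

# FBP-style connection: a bounded buffer with configurable capacity.
conn = queue.Queue(maxsize=2)

def upstream():
    for x in range(5):
        conn.put(x)    # blocks when the connection is full: backpressure
    conn.put(None)     # end-of-stream marker

t = threading.Thread(target=upstream)
t.start()

# Downstream block: drain the connection until end-of-stream.
received = []
while True:
    x = conn.get()
    if x is None:
        break
    received.append(x)

t.join()
print(received)  # [0, 1, 2, 3, 4]
```

The capacity bound is what gives the flow control: the producer can never run more than two elements ahead of the consumer, which is the behavior a per-connection capacity in a Streem-like DSL would have to specify.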

P.S. Here's an experiment with FBP + 0MQ that we have been playing with for the last couple of months: https://github.com/cascades-fbp/cascades - it supports both the FBP DSL and the JSON format from the NoFlo guys.
