I can only run 400 concurrent requests on my machine(4c 8g). Is this normal? #15

Open
lixiangzzz2017 opened this issue Sep 13, 2024 · 4 comments

Comments

@lixiangzzz2017

As the title says.
CPU utilization rises to 95%.
The code is below.
Each goroutine sleeps 20 ms per frame to simulate a real phone call.

package main

import (
	"fmt"
	"io"
	"net/http"
	_ "net/http/pprof"
	"os"
	"postsuperman/codec/g729"
	"strconv"
	"sync"
	"time"
)

/*
#include <stdio.h>
int sum(int a, int b) {
    return a + b;
}
*/
import "C"

func main() {
	go func() {
		err := http.ListenAndServe("0.0.0.0:6065", nil)
		if err != nil {
			fmt.Println(err)
		}
	}()

	dirName := os.Args[1]
	list, err := os.ReadDir(dirName)
	if err != nil {
		panic(err)
	}

	concurrency, err := strconv.Atoi(os.Args[2])
	if err != nil {
		panic(err)
	}
	fmt.Println(concurrency)

	m := map[string][][]byte{}
	for _, entry := range list {
		fullName := dirName + "/" + entry.Name()
		inputWAV, err := os.Open(fullName)
		if err != nil {
			panic(err)
		}

		// skip the 44-byte WAV header; io.ReadFull guards against short reads
		wavHeader := make([]byte, 44)
		if _, err = io.ReadFull(inputWAV, wavHeader); err != nil {
			panic(err)
		}

		arr := [][]byte{}
		for {
			// buf is freshly allocated on every iteration, so it can be stored without copying
			buf := make([]byte, 160)
			if _, err := io.ReadFull(inputWAV, buf); err != nil {
				// io.EOF at end of file, or io.ErrUnexpectedEOF for a short
				// last frame, which is ignored because its size is invalid
				break
			}
			arr = append(arr, buf)
		}
		inputWAV.Close()
		m[entry.Name()] = arr
	}
	concurrentCh := make(chan struct{}, concurrency)
	for {
		wg := &sync.WaitGroup{}
		for i := 0; i < 500; i++ {
			for _, v := range m {
				wg.Add(1)
				go func(wg *sync.WaitGroup, v [][]byte) {
					concurrentCh <- struct{}{}
					defer func() {
						<-concurrentCh
					}()
					defer wg.Done()

					enc := g729.NewEncoder(false)
					defer enc.Close()
					dec := g729.NewDecoder()
					defer dec.Close()
					for _, value := range v {
						// for range v {
						time.Sleep(20 * time.Millisecond)
						// C.sum(C.int(2), C.int(3))
						if encodedByte, err := enc.Encode(value); err != nil {
							return
						} else {
							if _, err := dec.Decode(encodedByte); err != nil {
								return
							}
						}
					}
				}(wg, v)
			}
		}
		wg.Wait()
	}
}


@jeannotlapin
Member

Hi,
on a relatively old CPU (Intel Core i5-6600T), a single core can encode/decode around 250 streams (doing only that). 400 streams/core seems realistic for a more modern CPU core.

A common performance issue with this library is building it without the optimization flag (-O2). Without it, the CPU load increases by a factor of 3. If your test runs only on the CPU and does not use the GPU (which seems likely), that leaves 4 cores to run it. 100 streams/core without -O2 on the build line is more or less expected.

@lixiangzzz2017
Author

Thank you. At which step of the build should I add the -O2 flag? I built it according to the instructions on the homepage.

@jeannotlapin
Member

To enable the -O2 option on the compiler command line, you must add

-DCMAKE_BUILD_TYPE=RelWithDebInfo

to the cmake configuration command.

@lixiangzzz2017
Author

I've rebuilt with the -O2 option and the machine can now run 1200 streams, about 3 times as many as before. Much appreciated. Why don't you add this to the instructions on the homepage?
