have file_stem accept a full path #284

Merged
merged 1 commit into from
Aug 18, 2024

Conversation

mattseddon
Member

@mattseddon mattseddon commented Aug 13, 2024

Follow up to https://github.com/iterative/datachain/pull/101/files#r1697409039 and https://github.com/iterative/datachain/pull/101/files#r1709153373

This PR updates the compiled file_stem function to accept a full path instead of expecting users to write path.file_stem(path.name(C("file.path"))). This change leads to (slightly) better SQL being generated as well.
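
As a rough illustration of the behavior described above, here is a minimal pure-Python sketch of what a stem computed from a full path looks like on ordinary "/"-separated paths. The function name `file_stem_reference` is hypothetical and is not part of datachain; the trailing-dot handling is an assumption modeled on the outer `rtrim(..., '.')` in the generated SQL.

```python
def file_stem_reference(full_path: str) -> str:
    """Illustrative only: last path component without its final extension."""
    # Keep the part after the last "/" (the whole string if there is none).
    name = full_path.rpartition("/")[2]
    # Split on the last "." in the name; `dot` is empty when there is none.
    stem, dot, _ext = name.rpartition(".")
    # Assumption: strip trailing dots, like the outer rtrim(..., '.') in the SQL.
    return stem.rstrip(".") if dot else name


print(file_stem_reference("dir/sub/file.tar.gz"))  # file.tar
print(file_stem_reference("file.txt"))             # file
print(file_stem_reference("dir/README"))           # README
```

With this behavior the caller no longer has to reduce the path to a file name first; the full path goes straight in.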

For comparison I've put the current and newly generated SQL below:

Before PR

              rtrim(
                substr(
                  ltrim(
                    substr(
                      source__file__path,
                      length(
                        rtrim(
                          rtrim(
                            source__file__path,
                            replace(source__file__path, :param_1, :param_2)
                          ),
                          :param_1
                        )
                      ) + :length_16
                    ),
                    :param_1
                  ),
                  :substr_2,
                  length(
                    ltrim(
                      substr(
                        source__file__path,
                        length(
                          rtrim(
                            rtrim(
                              source__file__path,
                              replace(source__file__path, :param_1, :param_2)
                            ),
                            :param_1
                          )
                        ) + :length_17
                      ),
                      :param_1
                    )
                  ) - CASE WHEN (
                    instr(
                      ltrim(
                        substr(
                          ltrim(
                            substr(
                              source__file__path,
                              length(
                                rtrim(
                                  rtrim(
                                    source__file__path,
                                    replace(source__file__path, :param_1, :param_2)
                                  ),
                                  :param_1
                                )
                              ) + :length_18
                            ),
                            :param_1
                          ),
                          length(
                            rtrim(
                              rtrim(
                                ltrim(
                                  substr(
                                    source__file__path,
                                    length(
                                      rtrim(
                                        rtrim(
                                          source__file__path,
                                          replace(source__file__path, :param_1, :param_2)
                                        ),
                                        :param_1
                                      )
                                    ) + :length_19
                                  ),
                                  :param_1
                                ),
                                replace(
                                  ltrim(
                                    substr(
                                      source__file__path,
                                      length(
                                        rtrim(
                                          rtrim(
                                            source__file__path,
                                            replace(source__file__path, :param_1, :param_2)
                                          ),
                                          :param_1
                                        )
                                      ) + :length_20
                                    ),
                                    :param_1
                                  ),
                                  :param_1,
                                  :param_2
                                )
                              ),
                              :param_1
                            )
                          ) + :length_21
                        ),
                        :param_1
                      ),
                      :param_3
                    ) = :instr_2
                  ) THEN :param_5 ELSE length(
                    ltrim(
                      substr(
                        ltrim(
                          substr(
                            source__file__path,
                            length(
                              rtrim(
                                rtrim(
                                  source__file__path,
                                  replace(source__file__path, :param_1, :param_2)
                                ),
                                :param_1
                              )
                            ) + :length_22
                          ),
                          :param_1
                        ),
                        length(
                          rtrim(
                            rtrim(
                              ltrim(
                                substr(
                                  source__file__path,
                                  length(
                                    rtrim(
                                      rtrim(
                                        source__file__path,
                                        replace(source__file__path, :param_1, :param_2)
                                      ),
                                      :param_1
                                    )
                                  ) + :length_23
                                ),
                                :param_1
                              ),
                              replace(
                                ltrim(
                                  substr(
                                    source__file__path,
                                    length(
                                      rtrim(
                                        rtrim(
                                          source__file__path,
                                          replace(source__file__path, :param_1, :param_2)
                                        ),
                                        :param_1
                                      )
                                    ) + :length_24
                                  ),
                                  :param_1
                                ),
                                :param_1,
                                :param_2
                              )
                            ),
                            :param_1
                          )
                        ) + :length_21
                      ),
                      :param_1
                    )
                  ) - length(
                    rtrim(
                      ltrim(
                        substr(
                          ltrim(
                            substr(
                              source__file__path,
                              length(
                                rtrim(
                                  rtrim(
                                    source__file__path,
                                    replace(source__file__path, :param_1, :param_2)
                                  ),
                                  :param_1
                                )
                              ) + :length_25
                            ),
                            :param_1
                          ),
                          length(
                            rtrim(
                              rtrim(
                                ltrim(
                                  substr(
                                    source__file__path,
                                    length(
                                      rtrim(
                                        rtrim(
                                          source__file__path,
                                          replace(source__file__path, :param_1, :param_2)
                                        ),
                                        :param_1
                                      )
                                    ) + :length_26
                                  ),
                                  :param_1
                                ),
                                replace(
                                  ltrim(
                                    substr(
                                      source__file__path,
                                      length(
                                        rtrim(
                                          rtrim(
                                            source__file__path,
                                            replace(source__file__path, :param_1, :param_2)
                                          ),
                                          :param_1
                                        )
                                      ) + :length_27
                                    ),
                                    :param_1
                                  ),
                                  :param_1,
                                  :param_2
                                )
                              ),
                              :param_1
                            )
                          ) + :length_21
                        ),
                        :param_1
                      ),
                      replace(
                        ltrim(
                          substr(
                            ltrim(
                              substr(
                                source__file__path,
                                length(
                                  rtrim(
                                    rtrim(
                                      source__file__path,
                                      replace(source__file__path, :param_1, :param_2)
                                    ),
                                    :param_1
                                  )
                                ) + :length_28
                              ),
                              :param_1
                            ),
                            length(
                              rtrim(
                                rtrim(
                                  ltrim(
                                    substr(
                                      source__file__path,
                                      length(
                                        rtrim(
                                          rtrim(
                                            source__file__path,
                                            replace(source__file__path, :param_1, :param_2)
                                          ),
                                          :param_1
                                        )
                                      ) + :length_29
                                    ),
                                    :param_1
                                  ),
                                  replace(
                                    ltrim(
                                      substr(
                                        source__file__path,
                                        length(
                                          rtrim(
                                            rtrim(
                                              source__file__path,
                                              replace(source__file__path, :param_1, :param_2)
                                            ),
                                            :param_1
                                          )
                                        ) + :length_30
                                      ),
                                      :param_1
                                    ),
                                    :param_1,
                                    :param_2
                                  )
                                ),
                                :param_1
                              )
                            ) + :length_21
                          ),
                          :param_1
                        ),
                        :param_3,
                        :param_2
                      )
                    )
                  ) END
                ),
                :param_3
              ) AS stem
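
The size of the expression above is plausibly a result of nesting: when one string-templated function is applied to the output of another, every reference to the argument is replaced by the whole inner expression. The sketch below shows the effect with two toy templates, `name_sql` and `stem_sql`. Both are hypothetical, loosely modeled on the `rtrim`/`replace`/`substr` patterns in the SQL above, with literal `'/'` and `'.'` assumed in place of the bound parameters. This is not datachain's compiler.

```python
def name_sql(x: str) -> str:
    """Toy template: last path component of x (x is referenced 3 times)."""
    parent_len = f"length(rtrim(rtrim({x}, replace({x}, '/', '')), '/'))"
    return f"ltrim(substr({x}, {parent_len} + 1), '/')"


def stem_sql(x: str) -> str:
    """Toy template: x without its last extension (x is referenced 6 times)."""
    ext_len = (
        f"CASE WHEN instr({x}, '.') = 0 THEN 0 "
        f"ELSE length({x}) - length(rtrim({x}, replace({x}, '.', ''))) END"
    )
    return f"rtrim(substr({x}, 1, length({x}) - {ext_len}), '.')"


# Applying the stem template directly to the column ...
direct = stem_sql("file__path")
# ... versus applying it to the output of the name template.
nested = stem_sql(name_sql("file__path"))

print(direct.count("file__path"))  # 6
print(nested.count("file__path"))  # 18
```

Each extra layer of nesting multiplies the number of column references, which is why folding the name extraction into a single function can yield a shorter expression.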

After PR

              CASE WHEN (
                instr(file__path, '/') = 0
              ) THEN rtrim(
                substr(
                  file__path, 
                  1, 
                  length(file__path) - CASE WHEN (
                    instr(file__path, '.') = 0
                  ) THEN 0 ELSE length(file__path) - length(
                    rtrim(
                      file__path, 
                      replace(file__path, '.', '')
                    )
                  ) END
                ), 
                '.'
              ) ELSE ltrim(
                rtrim(
                  substr(
                    file__path, 
                    length(
                      rtrim(
                        rtrim(
                          file__path, 
                          replace(file__path, '/', '')
                        ), 
                        '/'
                      )
                    ) + 1, 
                    (
                      length(file__path) - length(
                        rtrim(
                          rtrim(
                            file__path, 
                            replace(file__path, '/', '')
                          ), 
                          '/'
                        )
                      )
                    ) - CASE WHEN (
                      instr(
                        ltrim(
                          substr(
                            file__path, 
                            length(
                              rtrim(
                                rtrim(
                                  file__path, 
                                  replace(file__path, '/', '')
                                ), 
                                '/'
                              )
                            ) + 1
                          ), 
                          '/'
                        ), 
                        '.'
                      ) = 0
                    ) THEN 0 ELSE length(
                      ltrim(
                        substr(
                          file__path, 
                          length(
                            rtrim(
                              rtrim(
                                file__path, 
                                replace(file__path, '/', '')
                              ), 
                              '/'
                            )
                          ) + 1
                        ), 
                        '/'
                      )
                    ) - length(
                      rtrim(
                        ltrim(
                          substr(
                            file__path, 
                            length(
                              rtrim(
                                rtrim(
                                  file__path, 
                                  replace(file__path, '/', '')
                                ), 
                                '/'
                              )
                            ) + 1
                          ), 
                          '/'
                        ), 
                        replace(
                          ltrim(
                            substr(
                              file__path, 
                              length(
                                rtrim(
                                  rtrim(
                                    file__path, 
                                    replace(file__path, '/', '')
                                  ), 
                                  '/'
                                )
                              ) + 1
                            ), 
                            '/'
                          ), 
                          '.', 
                          ''
                        )
                      )
                    ) END
                  ), 
                  '.'
                ), 
                '/'
              ) END AS stem, 

As you can see, the SQL is still hot garbage, but there is less of it and the UX is slightly better.
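
Most of the "After PR" expression repeats one idiom: `rtrim(path, replace(path, '/', ''))` is used to find the last `/`, presumably because there is no "last index of" function to call. The sketch below evaluates each step with Python's built-in `sqlite3` module, assuming SQLite semantics for `rtrim`, `ltrim`, `replace`, and `substr`. The sample path and the `evaluate` helper are hypothetical.

```python
import sqlite3

# Hypothetical sample value; any "/"-separated path works.
SAMPLE = "dir/sub/file.tar.gz"

conn = sqlite3.connect(":memory:")


def evaluate(expr: str, value: str = SAMPLE) -> str:
    """Evaluate a SQL expression with :p bound to `value`."""
    return conn.execute(f"SELECT {expr}", {"p": value}).fetchone()[0]


# 1. Every character of the path except "/".
non_slash = evaluate("replace(:p, '/', '')")
# 2. rtrim strips any of those characters from the right, so it stops at the last "/".
up_to_last_slash = evaluate("rtrim(:p, replace(:p, '/', ''))")
# 3. Dropping the trailing "/" leaves the parent directory.
parent = evaluate("rtrim(rtrim(:p, replace(:p, '/', '')), '/')")
# 4. Everything after the parent, minus the leading "/", is the file name.
name = evaluate(
    "ltrim(substr(:p, length(rtrim(rtrim(:p, replace(:p, '/', '')), '/')) + 1), '/')"
)

print(non_slash)         # dirsubfile.tar.gz
print(up_to_last_slash)  # dir/sub/
print(parent)            # dir/sub
print(name)              # file.tar.gz
```

The same idiom with `'.'` in place of `'/'` locates the last dot, which is how the extension length is derived in the `CASE` branches.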

cloudflare-workers-and-pages bot commented Aug 13, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: c95626d
Status: ✅  Deploy successful!
Preview URL: https://b2babfeb.datachain-documentation.pages.dev
Branch Preview URL: https://full-path-file-stem.datachain-documentation.pages.dev

@mattseddon mattseddon self-assigned this Aug 13, 2024
@mattseddon mattseddon marked this pull request as ready for review August 13, 2024 03:40
@mattseddon mattseddon requested a review from a team August 13, 2024 03:40
- stem=path.file_stem(path.name(C("file.path"))),
- ext=path.file_ext(path.name(C("file.path"))),
+ stem=path.file_stem(C("file.path")),
+ ext=path.file_ext(C("file.path")),
Member Author

[F] This accepted a full path already

@mattseddon
Member Author

Will need a studio companion PR.

Contributor

@dtulga dtulga left a comment

LGTM, thanks for adding this! Although, I would recommend only merging once the Studio/ClickHouse part is done and those tests pass as well.

@mattseddon
Member Author

mattseddon commented Aug 14, 2024

> LGTM, thanks for adding this! Although, I would recommend only merging once the Studio/ClickHouse part is done and those tests pass as well.

Yeah, I'm a bit stuck. The companion is up (https://github.com/iterative/studio/pull/10469), but the Studio tests are completely borked (see #292 / #293). I'll wait until we get a decision or a proper fix.

@dtulga
Contributor

dtulga commented Aug 14, 2024

Yeah, that is less than ideal - if all the tests pass except those that were already broken, that's fine by me. (Although ideally these tests should be fixed on main at some point. 😕 And it may not be possible to release / update Studio, so that may not be an option either until the tests are fixed.)

Contributor

@dreadatour dreadatour left a comment

Looks good to me! Thank you! 🙏

codecov bot commented Aug 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.86%. Comparing base (58c63a4) to head (c95626d).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #284   +/-   ##
=======================================
  Coverage   86.85%   86.86%           
=======================================
  Files          90       90           
  Lines        9868     9874    +6     
  Branches     1995     1995           
=======================================
+ Hits         8571     8577    +6     
  Misses        947      947           
  Partials      350      350           
| Flag | Coverage Δ |
|---|---|
| datachain | 86.79% <100.00%> (+<0.01%) ⬆️ |

@mattseddon mattseddon force-pushed the full-path-file-stem branch 3 times, most recently from 6c3c65d to db0b454 Compare August 18, 2024 09:25
@mattseddon mattseddon merged commit 61aeed4 into main Aug 18, 2024
38 checks passed
@mattseddon mattseddon deleted the full-path-file-stem branch August 18, 2024 19:16